Xbox Research conducts player research to understand behavior, preferences, and experiences across gaming ecosystems. But its historical focus on Seattle-based players created a fundamental limitation: as Xbox expands into global markets such as Southeast Asia, Latin America, and Eastern Europe, traditional research methods break down.
The organization faced a choice: accept the cost and time of hiring language-fluent researchers across time zones, or miss player insights in growth markets that represent billions in potential revenue.
This case study documents AURA (AI User Research Assistants), an integrated platform I designed and built to change how Xbox Research operates globally. Through co-design with researchers, I built four capabilities into one platform that together address language barriers, timezone constraints, and the slow crawl from raw data to usable insight.
AURA demonstration showing all four capabilities in action within the platform.
The Problem
"I collect the data, send it to vendors, wait weeks, then finally get back something I can analyze. It feels like I'm constantly waiting." (Xbox Research participant)
Xbox Research historically focused on Seattle players. While comprehensive and high-quality, this geographic focus meant insights primarily reflected US cultural contexts and gaming behaviors. As Xbox expanded into global markets, traditional research methods hit real barriers.
Conducting UX studies in markets like Thailand, Indonesia, or Colombia required hiring researchers fluent in local languages and dialects. Finding qualified UX researchers who understood both the local language and the local gaming culture often meant expensive vendor contracts or hiring delays of weeks or months. Translation costs added another layer: surveys, interview transcripts, and playtest feedback all required professional translation services.
Timezone constraints made scheduling brutal. Traditional moderated sessions demanded real-time coordination between Xbox researchers in Seattle and participants in, for example, Vietnam (14 to 15 hours ahead, depending on US daylight saving time) or Brazil (4 to 5 hours ahead for São Paulo, depending on the season). This meant narrow scheduling windows, higher no-show rates, and little ability to follow up immediately when something interesting came up during a session.
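To make the scheduling arithmetic concrete, here is a minimal Python sketch, assuming fixed UTC offsets (Seattle on PST at UTC-8, Vietnam at UTC+7) and hypothetical availability: a 9:00-17:00 researcher workday in Seattle and a participant free 18:00-22:00 in the evening in Vietnam. The helper `shared_hours` and the dates are illustrative only, not part of AURA. Real scheduling code would use the `zoneinfo` module so daylight saving time is handled automatically.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for illustration only (assumption: winter, so Seattle is on PST).
# Real scheduling should use zoneinfo, since Seattle moves to UTC-7 in summer.
SEATTLE_PST = timezone(timedelta(hours=-8), "PST")
VIETNAM_ICT = timezone(timedelta(hours=7), "ICT")

def shared_hours(researcher_window, participant_window):
    """Hours two timezone-aware (start, end) windows overlap; 0.0 if they don't."""
    start = max(researcher_window[0], participant_window[0])
    end = min(researcher_window[1], participant_window[1])
    return max(0.0, (end - start).total_seconds() / 3600)

# Hypothetical availability windows.
researcher = (datetime(2024, 1, 15, 9, tzinfo=SEATTLE_PST),
              datetime(2024, 1, 15, 17, tzinfo=SEATTLE_PST))
participant_evenings = [
    (datetime(2024, 1, day, 18, tzinfo=VIETNAM_ICT),
     datetime(2024, 1, day, 22, tzinfo=VIETNAM_ICT))
    for day in (15, 16)
]

# The whole Seattle workday falls between midnight and 8:00 the next day in Vietnam.
print(researcher[0].astimezone(VIETNAM_ICT))  # 2024-01-16 00:00:00+07:00
print(researcher[1].astimezone(VIETNAM_ICT))  # 2024-01-16 08:00:00+07:00
print([shared_hours(researcher, p) for p in participant_evenings])  # [0.0, 0.0]
```

Under these assumptions there is no overlap at all, so every live session requires someone on one side to work well outside normal hours.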
Vendor dependency slowed everything down. Researchers reported spending weeks waiting for vendors to clean interview data, pull key clips from video sessions, and arrange findings into something analyzable. By the time they had material to work with, product decisions had often already been made.
Without scalable solutions to conduct research across languages and time zones, Xbox risked missing player insights in high-growth international markets. Understanding how players in different cultural contexts experience games is essential for product decisions that affect millions of players.
My Approach
Phase 1 — Discovery Research
8 contextual inquiries and 5 interviews with researchers to understand current workflows, pain points, and opportunities for AI integration across all stages of the research process.
Phase 2 — Synthesis and Design
Analyzed qualitative data to identify core problems, then designed 4 integrated AI-powered tools addressing language, timezone, data processing, and collaboration challenges.
Phase 3 — Full-Stack Development
Built production-ready applications integrating LLM APIs (GPT-4, Claude) with intuitive interfaces designed specifically for UX researchers' mental models.
Phase 4 — Rigorous Evaluation
Conducted mixed-methods studies, including within-subjects comparisons, statistical analysis, and qualitative interviews, to validate effectiveness across four studies with 12, 43, 4, and 6 participants.
Phase 5 — Leadership Presentation
Demonstrated business impact (a 50% reduction in vendor costs) and organic adoption (four teams) to secure organizational commitment across Xbox Research.
Key Outcomes
AURA delivered measurable improvements in efficiency, cost, and research quality.
Co-Designing with Researchers
I knew early on that AI's potential here was easy to overestimate from the outside. Understanding exactly how it could support UX researchers required spending real time with those researchers, watching them work. To ground AURA in actual needs, I conducted 8 contextual inquiries (observing researchers in their workspace doing real work) and 5 semi-structured interviews with researchers across Xbox Research.
Time Gap Problem
Researchers spent weeks waiting for vendor services to clean data, arrange findings, and identify important clips. They were constantly blocked, waiting for the next phase of their own work to begin.
Geographic Constraints
Finding qualified UX researchers fluent in Thai, Indonesian, or Tagalog meant expensive vendor contracts or research gaps in critical growth markets.
Collaboration Friction
Distributed teams needed real-time discussion of findings, shared note-taking during analysis, and collaborative refinement. Traditional tools split this work across too many places.
AURA: One Platform, Four Capabilities
One thing I noticed during the contextual inquiries was that researchers weren't only frustrated by individual bottlenecks. They were frustrated by context-switching. Mid-study, a researcher would be reviewing an interview transcript, then jump to a different tool to apply codes, then open a third tool for synthesis, then a fourth for sharing findings with the team. Each switch broke their thinking. I decided early on that AURA needed to be a single platform rather than a collection of separate tools. Researchers should be able to move between capabilities without losing their place in the work.
Based on the pain points from co-design, I built four capabilities inside AURA, each addressing a specific bottleneck in the research lifecycle:
Interviews
The interview capability conducts and analyzes user interviews without a human moderator present, operating across languages and time zones. It uses conversational AI to follow up on participant responses with contextual questions in real time. Translation happens automatically, so researchers can review sessions conducted in Thai or Indonesian without waiting on a vendor.
Surveys
The survey capability adapts questions based on what participants say as they go, rather than presenting the same static sequence to everyone. LLM-based logic generates follow-up questions from previous answers, producing personalized survey paths that surface more specific detail than a one-size-fits-all instrument.
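The control flow behind an adaptive survey can be sketched in a few lines. This is a minimal illustration, not AURA's implementation: `run_adaptive_survey`, `SurveyTurn`, and the `generate_follow_up` callable are hypothetical names, and the keyword-based generator below is a deterministic stand-in for the LLM call so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class SurveyTurn:
    question: str
    answer: str

# Hypothetical signature: given the history so far, return the next question,
# or None to end the survey. In a real system an LLM would fill this role.
FollowUpGenerator = Callable[[list[SurveyTurn]], Optional[str]]

def run_adaptive_survey(opening_question: str,
                        ask: Callable[[str], str],
                        generate_follow_up: FollowUpGenerator,
                        max_questions: int = 5) -> list[SurveyTurn]:
    """Ask an opening question, then let the generator choose each next question."""
    history: list[SurveyTurn] = []
    question: Optional[str] = opening_question
    while question is not None and len(history) < max_questions:
        answer = ask(question)
        history.append(SurveyTurn(question, answer))
        question = generate_follow_up(history)
    return history

def keyword_follow_up(history: list[SurveyTurn]) -> Optional[str]:
    """Rule-based stand-in for the LLM: probe the first answer once, then stop."""
    if len(history) > 1:
        return None
    if "matchmaking" in history[-1].answer.lower():
        return "What happened during matchmaking that stood out to you?"
    return "Can you describe one moment from that session in more detail?"

# Scripted answers stand in for a live participant.
scripted_answers = iter(["Fun, but matchmaking took forever.", "I waited about five minutes."])
transcript = run_adaptive_survey(
    "How was your last play session?",
    ask=lambda q: next(scripted_answers),
    generate_follow_up=keyword_follow_up,
)
for turn in transcript:
    print(turn.question, "->", turn.answer)
```

Passing the generator in as a parameter keeps the survey loop independent of any particular model, and `max_questions` caps how long a participant can be kept in the adaptive path.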
Canvas
The synthesis canvas lets researchers organize qualitative data visually in real time, following a grounded theory approach with a human in the loop. Researchers arrange transcripts, survey responses, and behavioral observations spatially while the LLM suggests emerging themes, connections, and patterns for the researcher to accept or reject.
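The accept-or-reject step is what keeps the researcher in the loop. A minimal sketch of that idea, assuming a simple in-memory model (`Canvas`, `ThemeSuggestion`, and `Status` are hypothetical names, not AURA's data model): model suggestions start as pending, and only themes a researcher has accepted count as synthesis output.

```python
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"

@dataclass
class ThemeSuggestion:
    label: str
    evidence_ids: list[str]          # canvas notes the model grouped under this theme
    status: Status = Status.PENDING  # every suggestion waits for a researcher decision

@dataclass
class Canvas:
    notes: dict[str, str] = field(default_factory=dict)
    suggestions: list[ThemeSuggestion] = field(default_factory=list)

    def propose(self, label: str, evidence_ids: list[str]) -> ThemeSuggestion:
        """Record a model-suggested theme; refuse evidence that isn't on the canvas."""
        unknown = [i for i in evidence_ids if i not in self.notes]
        if unknown:
            raise KeyError(f"unknown notes: {unknown}")
        suggestion = ThemeSuggestion(label, list(evidence_ids))
        self.suggestions.append(suggestion)
        return suggestion

    def review(self, suggestion: ThemeSuggestion, accept: bool) -> None:
        """The researcher's decision, not the model's, sets the final status."""
        suggestion.status = Status.ACCEPTED if accept else Status.REJECTED

    def themes(self) -> dict[str, list[str]]:
        """Only accepted suggestions become themes, each with its supporting notes."""
        return {s.label: [self.notes[i] for i in s.evidence_ids]
                for s in self.suggestions if s.status is Status.ACCEPTED}

# Hypothetical playtest notes.
canvas = Canvas(notes={
    "n1": "Couldn't find the party invite button",
    "n2": "Menus felt slow on console",
    "n3": "Friend invite flow took three tries",
})
social = canvas.propose("Friction inviting friends", ["n1", "n3"])
perf = canvas.propose("Performance complaints", ["n2"])
canvas.review(social, accept=True)
canvas.review(perf, accept=False)
print(canvas.themes())
```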
Coder
The coding capability supports collaborative qualitative analysis by helping researchers develop and apply codebooks systematically. Integrated with Microsoft Copilot, it identifies candidate codes from data, applies them consistently across large datasets, and lets researchers refine codebooks iteratively as their understanding of the data evolves.
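Consistent application and iterative refinement can be illustrated with a versioned codebook. In this minimal sketch, a keyword matcher stands in for the Copilot-based classifier (an assumption made so the example runs offline), and `Code`, `Codebook`, and `apply_codebook` are hypothetical names. Every segment is coded with the same rule set, and each refinement bumps the codebook version so earlier codings can be identified as stale and re-run.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Code:
    name: str
    definition: str
    keywords: tuple[str, ...]  # stand-in signal; an LLM would work from the definition

@dataclass
class Codebook:
    version: int = 1
    codes: dict[str, Code] = field(default_factory=dict)

    def refine(self, code: Code) -> None:
        """Add or redefine a code; the version bump marks older codings as stale."""
        self.codes[code.name] = code
        self.version += 1

def apply_codebook(codebook: Codebook, segments: dict[str, str]) -> dict:
    """Apply every code to every segment with one rule set, so identical text gets identical codes."""
    coded = {}
    for seg_id, text in segments.items():
        lowered = text.lower()
        coded[seg_id] = sorted(c.name for c in codebook.codes.values()
                               if any(k in lowered for k in c.keywords))
    return {"codebook_version": codebook.version, "codes": coded}

# Hypothetical interview segments.
segments = {
    "s1": "The tutorial never explained the map.",
    "s2": "Inviting friends to a party was confusing.",
    "s3": "I skipped the tutorial and got lost.",
}
codebook = Codebook(codes={
    "Onboarding confusion": Code("Onboarding confusion",
                                 "Player did not understand early-game guidance",
                                 ("tutorial", "got lost")),
})
first = apply_codebook(codebook, segments)

# The researcher notices uncoded social complaints and refines the codebook.
codebook.refine(Code("Social friction", "Difficulty playing with friends", ("friend", "party")))
second = apply_codebook(codebook, segments)
print(first)
print(second)
```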
For detailed demonstrations of each capability, I recommend watching the video at the top of this page.
Evaluation
To evaluate the effectiveness of each AURA capability, I conducted mixed-methods experiments. Below I outline the methods and key findings for each.
Interviews Evaluation
Within-subjects experiment with 12 participants. Each completed two playtests, one using traditional unmoderated interviews and the other using AURA's interview capability. Order was counterbalanced to control for learning effects.
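For two conditions, counterbalancing usually means splitting participants between the orders AB and BA. The source doesn't say how participants were assigned to orders, so this minimal sketch assumes random assignment into two groups that are as equal in size as possible. `counterbalance` and its `seed` parameter are hypothetical.

```python
import random

def counterbalance(participant_ids, conditions=("traditional", "AURA"), seed=None):
    """Randomly assign each participant to order AB or BA, keeping the groups balanced."""
    orders = [tuple(conditions), tuple(reversed(conditions))]
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)  # random assignment to order groups
    # Alternating after the shuffle keeps group sizes within one of each other.
    return {pid: orders[i % 2] for i, pid in enumerate(ids)}

schedule = counterbalance([f"P{n:02d}" for n in range(1, 13)], seed=7)
for pid in sorted(schedule):
    print(pid, " then ".join(schedule[pid]))
```

With 12 participants this yields six in each order. With an odd count, such as the 43 survey participants, the groups differ by one.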
- Participants reported the AURA interview experience was more efficient for providing playtest feedback
- The adaptive questioning approach produced more in-depth responses than the static interview structure
- Participants preferred the AURA condition for its conversational feel and reduced friction
One result I didn't expect: in sessions conducted with participants whose primary language wasn't English, the AI-adaptive approach produced more nuanced responses than we'd seen from even experienced human moderators running sessions through interpreters. Participants seemed more willing to elaborate when the follow-up questions came in their own language without a visible interpreter in the loop. This wasn't something I had hypothesized going in.
Surveys Evaluation
Within-subjects experiment with 43 participants. Each completed two surveys about their gaming experience, one a traditional static survey and the other using AURA's adaptive survey capability. Order was counterbalanced. The survey capability was also deployed at scale during a Halo playtest, where it replaced the static post-session questionnaire with an adaptive instrument that probed each player's specific experience in real time. Interviews weren't the right fit for a large in-room playtest; surveys let every player respond simultaneously without the noise and coordination overhead of moderated sessions.
- Participants found the AURA survey more engaging and relevant to their specific situation
- Dynamic question generation produced more personalized paths through the survey
- Participants preferred the AURA condition for its ability to capture detail that static surveys miss
Canvas Evaluation
Between-subjects experiment with 4 researchers from Xbox Research. Each completed a playtest analysis using either traditional synthesis methods or AURA's canvas capability.
- The canvas produced better-organized synthesis artifacts compared to traditional methods
- Real-time collaboration features led to more active discussion among researchers during analysis
- Researchers preferred the canvas for making patterns visible across large volumes of data
Coder Evaluation
Between-subjects experiment with 6 researchers from Xbox Research. Each completed a qualitative coding task using either traditional methods or AURA's coding capability.
- The coding capability reduced the time researchers spent on initial code application
- AI-assisted coding produced more consistent application of codes across large datasets
- Researchers preferred the AURA condition for iterative codebook refinement
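The study doesn't specify how coding consistency was measured. One standard option is Cohen's kappa, which measures agreement between two coders on the same segments while correcting for agreement expected by chance. This minimal sketch assumes one label per segment (real qualitative coding is often multi-label), and the label lists are hypothetical.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' single labels on the same segments."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability both coders pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used the same single label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical labels: a researcher's codes versus AI-suggested codes.
researcher_labels = ["Onboarding", "Onboarding", "Social", "Social"]
ai_labels = ["Onboarding", "Social", "Social", "Social"]
print(round(cohens_kappa(researcher_labels, ai_labels), 2))  # 0.5
```

Here the coders agree on 3 of 4 segments (0.75), chance agreement is 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.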
Organizational Impact
AURA is now used by multiple teams across Xbox Research. What I find most interesting about how it spread is that nobody mandated it. The first researchers to use it started telling colleagues about specific things it had let them do: sessions in markets they couldn't previously reach, studies finished in a day instead of a week. Word moved through the org naturally. Four teams had integrated it into their regular workflow by the time I presented to leadership.
Teams report completing studies in roughly half the time previously required. Xbox can now conduct player research in markets that were effectively off-limits due to language or timezone constraints. The adaptive interview and survey capabilities generate more detailed participant responses than static methods produced, which has changed what researchers are able to bring to product reviews.
More Inclusive Research
Language-agnostic capabilities mean Xbox can understand player needs across diverse global markets, not just English-speaking regions.
Faster Decision Cycles
Shorter time-to-insight means research can inform product decisions before those decisions are made without it.
Innovation Catalyst
Seeing what was possible with AURA has prompted broader conversations inside Xbox Research about what other parts of the research process could be redesigned.
Conclusion
AURA shows that meaningful impact in enterprise research doesn't require massive teams or years of development. It requires understanding researcher needs in specific, concrete terms, then building something that fits directly into how they already think about their work.
The 50% reduction in vendor costs and adoption across four teams are good numbers, but the thing I keep coming back to is the adoption pattern. Nobody was told to use AURA. Researchers chose it because it solved problems they'd been living with for a long time. That kind of uptake is harder to manufacture than any metric.
The framework I used here could apply to other research contexts: co-design with the people who will actually use the system, build one integrated environment rather than a collection of disconnected tools, and test rigorously before claiming results. Any domain where qualitative expertise needs to scale across language and geography is probably facing a version of the same problem Xbox Research had.
Skills & Methods Demonstrated
Contextual Inquiry, Semi-Structured Interviews, Mixed-Methods Research, Data Synthesis, Usability Testing
UX Design, Full-Stack Development, LLM Integration, Python/JavaScript, Prototyping
Product Strategy, Stakeholder Management, Change Management, Presentation Skills