Xbox AURA: Conceptualizing, Designing, and Developing LLM Tools to Support User Research

Microsoft · 2024-2025 · LLM × UX Research · ~$200K Annual Savings

Xbox Research conducts player research to understand behavior, preferences, and experiences across gaming ecosystems. But historical focus on Seattle-based players created a fundamental limitation: as Xbox expands into global markets like Southeast Asia, Latin America, and Eastern Europe, traditional research methods break down. The organization faced a choice: accept the cost and time of hiring language-fluent researchers across time zones, or miss critical player insights in growth markets that represent billions in potential revenue.

This case study documents AURA Toolkit (AI User Research Assistants Toolkit)—a comprehensive suite of AI-powered tools I designed and developed to transform how Xbox Research operates globally. Through rigorous co-design with researchers, I built four integrated LLM-powered applications that eliminate language barriers, overcome timezone constraints, and dramatically accelerate the research-to-insight pipeline. The result: $200K+ in annual cost savings, 8 hours saved per study, and 100% researcher preference for the new tools.

The Problem

Xbox Research historically focused on Seattle players: While comprehensive and high-quality, this geographic limitation meant insights primarily reflected US cultural contexts and gaming behaviors. As Xbox expanded into global markets—Southeast Asia, Latin America, Eastern Europe—traditional research methods hit fundamental barriers.

Language barriers created expensive bottlenecks: Conducting UX studies in markets like Thailand, Indonesia, or Colombia required hiring researchers fluent in local languages and dialects. Finding qualified UX researchers who spoke specific languages and understood gaming cultures often meant expensive vendor contracts or hiring delays of weeks or months. Translation costs added another layer: surveys, interview transcripts, and playtest feedback all required professional translation services.

Timezone constraints limited scheduling flexibility: Traditional moderated sessions demanded real-time coordination between Xbox researchers in Seattle and participants in, for example, Vietnam (a 15-hour time difference) or Brazil (a 4-5 hour difference, depending on season). This meant limited scheduling windows, higher no-show rates, and reduced ability to conduct iterative studies where follow-up questions could be explored immediately.

Vendor dependency slowed research velocity: Researchers reported spending weeks waiting for vendors to clean interview data, identify key clips from video sessions, and arrange findings into analyzable formats. This created a bottleneck: researchers couldn't progress from data collection to insight synthesis quickly enough to inform product decisions at the speed Xbox's competitive landscape demanded.

Opportunity cost in growth markets: Without scalable solutions to conduct research across languages and time zones, Xbox risked missing player insights in high-growth international markets. Understanding how players in different cultural contexts experience games isn't just nice-to-have—it's essential for product decisions that could impact millions in revenue.

My Approach

As the sole Lead Researcher, Designer, and Developer, I initiated, conceptualized, designed, built, and evaluated the entire AURA Toolkit end-to-end:

  • Phase 1 - Discovery research (8 contextual inquiries + 5 interviews): Deep engagement with researchers to understand current workflows, pain points, and opportunities for AI integration across all stages of the research process
  • Phase 2 - Synthesis and design: Analyzed qualitative data to identify core problems, then designed 4 integrated AI-powered tools addressing language, timezone, data processing, and collaboration challenges
  • Phase 3 - Full-stack development: Built production-ready applications integrating LLM APIs (GPT-4, Claude) with intuitive interfaces designed specifically for UX researchers' mental models
  • Phase 4 - Rigorous evaluation (35+ participants): Conducted mixed-methods studies including within-subjects comparisons, statistical analysis, and qualitative interviews to validate effectiveness
  • Phase 5 - Leadership presentation and adoption: Demonstrated business impact ($200K+ savings) and user preference (100% researcher preference) to secure organizational commitment and adoption across Xbox Research

Key Outcomes

  • 100% researcher preference: All participants found AURA Interviews and AURA Surveys more efficient than traditional methods for generating playtest feedback, with qualitative insights indicating the AI-adaptive approach produced deeper, more nuanced responses
  • 8 hours saved per study: AURA Canvas enabled researchers to visualize, organize, and identify themes from qualitative data significantly faster—saving an average of 8 hours per research study or playtest session
  • $200K+ estimated annual savings: By reducing dependency on vendor support for data annotation, data collection, and subscription fees for platforms like Dscout and UserTesting, AURA Toolkit generated approximately $200K in annual cost savings
  • Global research enabled: Language-agnostic and timezone-flexible study capabilities now allow Xbox to conduct player research in any market without geographic constraints
  • Faster time-to-insight: Researchers can move from data collection to actionable insights in days rather than weeks, dramatically increasing research velocity

Co-Designing with Researchers

While AI offers tremendous potential, I recognized that understanding exactly how AI could support UX researchers required deep engagement with researchers themselves—not assumptions from outside the discipline. To ground the AURA Toolkit in real researcher needs, I conducted 8 contextual inquiries (observing researchers in their actual workspace doing actual work) and 5 semi-structured interviews with researchers across Xbox Research. These sessions revealed critical pain points that became the foundation for each tool in the AURA Toolkit.

Key Pain Points Discovered

Time Gap Between Raw Data and Actionable Analysis

Researchers reported spending significant time waiting for vendor services to clean data, arrange findings, and identify important clips from interview recordings and survey responses. One researcher shared: "I collect the data, send it to vendors, wait weeks, then finally get back something I can analyze. It feels like I'm constantly waiting for the next phase of my own work." This bottleneck meant researchers couldn't iterate quickly or respond to time-sensitive product questions.

Geographic and Economic Constraints on Research Scope

As Xbox expands into Southeast Asia and other growth markets, finding qualified UX researchers fluent in specific languages (Thai, Indonesian, Tagalog, etc.) became increasingly challenging and expensive. Language requirements combined with cultural knowledge requirements meant either expensive vendor contracts or research gaps in critical markets. Further, timezone differences made real-time moderated sessions nearly impossible—creating scheduling friction that reduced participant convenience and study feasibility.

Need for Real-Time Collaborative Synthesis

Researchers working on distributed teams across Xbox expressed frustration with asynchronous collaboration on qualitative analysis. They needed tools that enable real-time discussion of findings, shared note-taking during analysis sessions, and collaborative refinement of insights—especially when team members spanned multiple time zones. Traditional tools (email, shared documents, video calls) fragmented rather than streamlined this collaborative sense-making.

Struggle with Qualitative Data Synthesis at Scale

When playtests generate hundreds of hours of interview recordings, thousands of survey responses, or complex behavioral data, researchers struggle to identify patterns, organize findings, and synthesize qualitative insights into actionable recommendations. Traditional manual methods (color-coding transcripts, creating affinity diagrams in Miro, writing synthesis documents) don't scale—researchers spend weeks on synthesis for a single study, limiting research velocity and reducing ability to inform rapid product decisions.

AURA Toolkit: Four Integrated Tools

Based on the pain points discovered through co-design, I conceptualized and built four integrated AI-powered tools that address specific bottlenecks in the research lifecycle:

AURA Interviews

AI-assisted interview platform that conducts and analyzes user interviews independent of language or time zone. Rather than requiring a researcher to moderate sessions in real-time, AURA Interviews uses conversational AI to conduct adaptive interviews that follow up on participant responses with contextual questions. The system handles translation automatically, allowing Xbox researchers to analyze interviews conducted in Thai, Indonesian, or any language without manual translation overhead. Since interviews don't require real-time moderation, participants can complete them on their schedule—eliminating timezone constraints.
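The adaptive-interview loop described above can be sketched in a few lines. This is an illustrative sketch, not the production AURA code: `generate()` stands in for an LLM API call (e.g., GPT-4 or Claude), and the follow-up budget and prompt wording are assumptions.

```python
def generate(prompt: str) -> str:
    """Placeholder for an LLM chat-completion call (stubbed here)."""
    return "Can you tell me more about what made that frustrating?"

def next_question(script: list[str], transcript: list[tuple[str, str]],
                  max_followups: int = 2) -> str:
    """Pick the next interview question: probe the last answer while the
    follow-up budget allows, otherwise advance through the script."""
    asked = [q for q, _ in transcript]
    followups_used = len(asked) - len([q for q in asked if q in script])
    if transcript and followups_used < max_followups:
        prompt = (
            "You are moderating a playtest interview.\n"
            "Conversation so far:\n"
            + "\n".join(f"Q: {q}\nA: {a}" for q, a in transcript)
            + "\nAsk one short follow-up question that probes the last answer."
        )
        return generate(prompt)
    # Budget exhausted (or interview just started): next scripted question.
    remaining = [q for q in script if q not in asked]
    return remaining[0] if remaining else ""
```

Because the moderator logic is stateless between turns, the same loop works whether the participant answers immediately or hours later—which is what makes the timezone-flexible, unmoderated format possible.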

AURA Surveys

Intelligent survey platform that adapts questions based on participant responses in real-time. Instead of presenting static questions to all participants, AURA Surveys uses LLM-based logic to ask follow-up questions based on previous answers—creating personalized survey experiences that generate deeper insights than traditional one-size-fits-all surveys. The adaptive questioning means researchers get more nuanced data while participants feel surveys are more engaging and relevant to their experiences.
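A minimal sketch of the branching idea, under assumptions of my own: a 1-5 rating scale, a threshold that triggers probes only for low scores, and `ask_llm()` as a deterministic stand-in for the LLM call that writes the personalized follow-up.

```python
def ask_llm(question: str, answer: str) -> str:
    """Stand-in for an LLM call that writes a personalized probe."""
    return f"You answered '{answer}'. What drove that rating?"

def adaptive_followups(responses: dict[str, int],
                       threshold: int = 3) -> list[str]:
    """Generate one LLM-written probe per low rating (1-5 scale).
    High-scoring items pass through with no extra questions."""
    probes = []
    for question, rating in responses.items():
        if rating <= threshold:  # only dig into pain points
            probes.append(ask_llm(question, str(rating)))
    return probes
```

The design choice here is that follow-ups are conditional rather than universal: participants who are satisfied finish quickly, while pain points get the extra depth—one plausible reason adaptive surveys feel both shorter and more relevant.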

AURA Canvas

Visual collaboration tool that helps researchers organize and synthesize qualitative data in real-time using a "Human in the Loop" grounded theory approach. Researchers can arrange interview transcripts, survey responses, and behavioral observations spatially on a canvas, while LLM assistance suggests themes, connections, and patterns. Rather than replacing researcher judgment, AURA Canvas augments synthesis by handling initial organization and pattern detection, allowing researchers to focus on deeper interpretation. The real-time collaboration features enable distributed teams to work together synchronously despite timezone differences.
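To make the "suggests themes, researcher decides" division of labor concrete, here is a self-contained sketch. The real AURA Canvas uses LLM assistance for pattern detection; this stand-in uses simple Jaccard word overlap so it runs anywhere, and the similarity threshold is an arbitrary assumption.

```python
def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two notes."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def suggest_groups(notes: list[str], min_sim: float = 0.25) -> list[list[str]]:
    """Greedily group notes whose overlap exceeds min_sim. The output is
    a suggestion only; a researcher reviews, merges, and renames groups
    (the human-in-the-loop step)."""
    groups: list[list[str]] = []
    for note in notes:
        for group in groups:
            if any(jaccard(note, member) >= min_sim for member in group):
                group.append(note)
                break
        else:
            groups.append([note])
    return groups
```

The point of the sketch is the workflow split: the machine handles the cheap first pass over hundreds of notes, while interpretive decisions—what a theme means, whether two groups are really one—stay with the researcher.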

AURA Coder

Qualitative coding tool for collaborative analysis that allows researchers to develop and apply codebooks systematically. By integrating Microsoft Copilot, AURA Coder assists researchers in identifying codes from data, applying codes consistently across large datasets, and refining codebooks iteratively. The tool supports collaborative workflows where multiple researchers can code the same dataset, ensuring consistency while maintaining researcher agency over interpretive decisions.
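The suggest-then-review workflow above can be sketched as follows. In AURA Coder the suggestion step is backed by Microsoft Copilot; here it is approximated with keyword matching so the example is self-contained, and the codebook contents are invented for illustration.

```python
CODEBOOK = {  # code name -> indicative cues (illustrative only)
    "usability_friction": ["confusing", "stuck", "couldn't find"],
    "positive_affect": ["fun", "loved", "enjoyed"],
}

def suggest_codes(excerpt: str, codebook: dict[str, list[str]]) -> list[str]:
    """Return candidate codes for an excerpt. Suggestions are not applied
    automatically: a researcher accepts, rejects, or refines each one,
    keeping interpretive agency with the human coder."""
    text = excerpt.lower()
    return [code for code, cues in codebook.items()
            if any(cue in text for cue in cues)]
```

Running every excerpt through the same codebook is also what drives the consistency benefit: two coders start from identical suggestions and only diverge where they deliberately override them.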

For detailed demonstrations of these tools in action, I highly recommend watching the video at the top of this page.

Evaluation

Due to NDA restrictions, I do not share exact statistical results for these outcomes.

To evaluate the effectiveness of the tools across the AURA Toolkit, I conducted a series of mixed-methods experiments. Below, I outline the evaluation method and key findings for each tool.

AURA Interviews

To evaluate the effectiveness of AURA Interviews, I conducted a within-subjects experiment with 12 participants. Each participant completed two playtests of a game, one using traditional unmoderated interviews and the other using AURA Interviews. The order of conditions was counterbalanced to control for order effects. After each playtest, participants completed a survey assessing their experience and provided qualitative feedback through semi-structured interviews.
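The counterbalancing scheme described above can be expressed as a tiny assignment function—a sketch of the general AB/BA pattern, not the actual study tooling:

```python
def assign_orders(participant_ids: list[int]) -> dict[int, tuple[str, str]]:
    """Alternate participants between the two condition orders so each
    order (AB vs. BA) occurs equally often, controlling order effects."""
    orders = [("traditional", "AURA Interviews"),
              ("AURA Interviews", "traditional")]
    return {pid: orders[i % 2] for i, pid in enumerate(participant_ids)}
```

With 12 participants this yields six in each order, so any learning or fatigue effect from the first playtest is balanced across conditions.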

Key Findings

  • Participants reported that AURA Interviews were more efficient in providing playtest feedback compared to traditional unmoderated methods.
  • The adaptive questioning approach of AURA Interviews led to more in-depth responses from participants.
  • Overall, participants expressed a preference for the AURA Interviews due to their streamlined nature and improved user experience.

AURA Surveys

To evaluate the effectiveness of AURA Surveys, I conducted a within-subjects experiment with 43 participants. Each participant completed two surveys about their gaming experience, one using a traditional static survey and the other using AURA Surveys. The order of conditions was counterbalanced to control for order effects. After each survey, participants completed a follow-up survey assessing their experience and provided qualitative feedback through semi-structured interviews.

Key Findings

  • Participants found AURA Surveys to be more engaging and interactive compared to traditional static surveys.
  • The dynamic question generation of AURA Surveys led to more relevant and personalized questions for participants.
  • Overall, participants expressed a preference for AURA Surveys due to their improved user experience and ability to capture nuanced feedback.

AURA Canvas

To evaluate the effectiveness of AURA Canvas, I conducted a between-subjects experiment with 4 researchers from Xbox Research. Each researcher completed a playtest analysis using either traditional methods or AURA Canvas, with researchers split evenly between the two conditions. After each analysis, researchers provided qualitative feedback through semi-structured interviews.

Key Findings

  • Participants reported that AURA Canvas facilitated better organization and synthesis of data compared to traditional methods.
  • The real-time collaboration features of AURA Canvas led to more dynamic discussions among researchers.
  • Overall, participants expressed a preference for AURA Canvas due to its innovative approach to qualitative analysis.

AURA Coder

To evaluate the effectiveness of AURA Coder, I conducted a between-subjects experiment with 6 researchers from Xbox Research. Each researcher completed a qualitative coding task using either traditional methods or AURA Coder, with researchers split evenly between the two conditions. After each coding task, researchers provided qualitative feedback through semi-structured interviews.

Key Findings

  • Participants reported that AURA Coder streamlined the coding process compared to traditional methods.
  • The AI-assisted coding features of AURA Coder led to more consistent application of codes among researchers.
  • Overall, participants expressed a preference for AURA Coder due to its efficiency and effectiveness in qualitative analysis.

Organizational Impact and Adoption

AURA Toolkit has been adopted by multiple teams across Xbox Research, fundamentally transforming how the organization conducts player research. The adoption didn't happen through top-down mandate—it emerged organically as researchers discovered the tools solved real problems in their daily workflows.

Operational Impact

  • Increased research efficiency: Teams report completing studies in half the time previously required, from data collection through insight synthesis
  • Expanded research scope: Xbox can now conduct player research in markets previously considered infeasible due to language or timezone constraints
  • Improved data quality: The adaptive questioning in AURA Interviews and AURA Surveys generates more nuanced insights than static surveys or unmoderated methods
  • Reduced vendor dependency: By in-sourcing data processing capabilities, Xbox Research reduced spending on external vendors while gaining more control over research timelines

Strategic Value

  • More inclusive research practices: Language-agnostic capabilities mean Xbox can understand player needs across diverse global markets, not just English-speaking regions
  • Faster product decision cycles: Reduced time-to-insight allows research to inform product decisions at the speed the competitive gaming landscape demands
  • Innovation in research methodology: AURA Toolkit demonstrates how AI can augment rather than replace human expertise in qualitative research
  • Organizational learning: The toolkit has become a catalyst for rethinking research processes more broadly across Xbox

Conclusion

AURA Toolkit demonstrates that transformative impact in enterprise research doesn't require massive teams or years of development—it requires deep understanding of researcher needs, thoughtful application of AI capabilities, and rigorous validation through mixed-methods evaluation. By starting with co-design rather than technical assumptions, I built tools that addressed fundamental pain points rather than optimizing surface-level workflows.

Beyond individual tools: The success of AURA Toolkit extends beyond the four specific applications—it represents a proof-of-concept for how AI can augment qualitative research expertise at scale. The $200K+ annual savings and 100% researcher preference don't just validate the tools' effectiveness; they demonstrate that AI integration in research organizations can create business value while improving researcher experience and research quality.

Future directions: The AURA Toolkit framework—co-designing with domain experts, integrating AI capabilities thoughtfully, and validating through rigorous evaluation—could extend to other research contexts beyond gaming. The approach demonstrated here offers a template for building AI-assisted research tools in any domain where qualitative expertise requires augmentation rather than replacement.

Skills Demonstrated

Research: Contextual Inquiry • Semi-Structured Interviews • Mixed-Methods Research • Data Synthesis

Design & Development: UX Design • Full-Stack Development • LLM Integration • Python/JavaScript • Prototyping

Strategy: Product Strategy • Stakeholder Management • Change Management • Presentation Skills