Qualitative UX research produces rich, nuanced insights that quantitative methods cannot capture. But the path from raw interview transcripts to actionable findings is paved with tedious, repetitive labor that even experienced researchers dread.
Every UX researcher knows the ritual: hours of interviews produce pages of transcripts. Those transcripts need to be broken down into atomic observations—one insight per sticky note. Then themes emerge, but proving those themes requires going back through every single data point to gather supporting evidence. It's exhaustive, exhausting, and error-prone.
This case study documents how I discovered these pain points through in-depth interviews with 10 UX researchers, validated them with survey data, and then designed and built Cursor Research—an AI-powered qualitative analysis tool that transforms how researchers move from raw data to grounded insights.
Discovery: Interviewing Researchers
"I spent two full days just copying and pasting quotes from a 90-minute interview into sticky notes. Two days. For one interview."— P3, Senior UX Researcher, 6 years experience
Study Design
I conducted semi-structured interviews with 10 UX researchers across industry and academia (4 senior, 3 mid-level, 3 junior). Participants had between 2 and 12 years of qualitative research experience. Each interview lasted 45–60 minutes and focused on their end-to-end qualitative analysis workflow, pain points, and tool usage.
Following the interviews, participants completed a structured survey rating the difficulty, time investment, and satisfaction across 7 stages of qualitative analysis on 5-point Likert scales.
Key Findings
Atomization Agony
9 out of 10 researchers identified transcript atomization—breaking interviews into individual data points—as the most tedious part of their workflow. Average time: 3.2 hours per 60-minute interview.
Evidence Scavenger Hunt
After identifying themes, 8 out of 10 researchers reported "dreading" the process of going back through all data points to find supporting evidence. They described it as "looking for needles in a haystack."
Theme Confidence Gap
7 out of 10 researchers admitted they sometimes couldn't be sure they'd found all evidence for a theme, leading to lower confidence in their findings and potential missed insights.
Survey Results: Quantifying the Pain
The post-interview survey confirmed what the qualitative data suggested: transcript atomization and evidence gathering are the most painful stages of the qualitative research workflow. Here's what the numbers revealed.
Average time spent per analysis stage (hours, per study): transcript atomization and evidence gathering stand out as the two most time-consuming and frustrating activities identified by participants (n = 10).
Frustration rating by stage (1–5 Likert): atomization (4.7/5) and evidence gathering (4.5/5) rated most frustrating (n = 10).
Agreement with "I'm confident I've found all evidence for my themes": 70% of researchers lack confidence that they've captured all supporting evidence (n = 10).
The survey data painted a clear picture: transcript atomization and evidence gathering are not just annoying—they consume nearly 20 hours per study combined and are the primary sources of researcher frustration and uncertainty. These two pain points became the design targets for Cursor Research.
Design Principles
From the interview findings, I derived four design principles that would guide every decision in building Cursor Research:
Human in the Loop
AI proposes, the researcher decides. Every AI output goes through a review step where the researcher can edit, reject, or refine before it becomes part of the analysis.
Verbatim Grounding
Every data point traces back to the original transcript. No hallucinated quotes, no paraphrasing without consent—the source text is always one click away.
Spatial Reasoning
Researchers think spatially. Sticky notes on a canvas, not rows in a spreadsheet. Physical arrangement creates meaning—proximity implies relationship.
Transparent AI
Every AI classification includes its reasoning. Researchers can see why a note was assigned to a theme, building trust and enabling correction.
Solution: Cursor Research
Cursor Research is an AI-powered qualitative analysis canvas that directly addresses the two core pain points. It combines a visual sticky-note interface with LLM-powered tools for document chunking, thematic analysis, and semantic evidence retrieval—all with human-in-the-loop controls.
Feature 1 — AI Document Chunking
The pain: Researchers spend 10+ hours manually reading transcripts and copy-pasting excerpts into sticky notes, one observation at a time.
The solution: Upload a transcript and the AI automatically proposes atomic chunks—one idea per note—with participant attribution. The researcher reviews each chunk in a split-view interface before approving.
An excerpt from P3's interview transcript, as it appears in the source pane:

P3: We did a study on onboarding flows with 15 participants. The hardest part was honestly just breaking down all the transcripts afterward. I spent two full days just on that.
P3: The thing that really gets me is when I find a theme, like "users feel overwhelmed by options," and then I have to go back through everything to find all the quotes that support it. It's like a treasure hunt except not fun at all.
Interviewer: How do you handle that currently?
P3: I use Ctrl+F a lot. But that only works if I can guess the exact words they used. Sometimes people describe the same frustration in completely different ways.
P3: I tried using Miro for affinity mapping but it was just digital copy-paste. Still took forever. And the search is just keyword matching.
"We did a study on onboarding flows with 15 participants. The hardest part was honestly just breaking down all the transcripts afterward. I spent two full days just on that."
"The thing that really gets me is when I find a theme... and then I have to go back through everything to find all the quotes that support it. It's like a treasure hunt except not fun at all."
"I use Ctrl+F a lot. But that only works if I can guess the exact words they used. Sometimes people describe the same frustration in completely different ways."
"I tried using Miro for affinity mapping but it was just digital copy-paste. Still took forever. And the search is just keyword matching."
The chunk review interface: source document (left) with highlighted excerpts, proposed atomic notes (right) with edit, merge, and delete controls. Researchers review before committing to the canvas.
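To make the chunking step concrete, here is a minimal sketch of what the chunk-proposal call might look like, shown with the OpenAI Node SDK for brevity. The proposeChunks name, the prompt, and the ProposedChunk shape are illustrative assumptions rather than the app's actual code; the final verbatim filter reflects the Verbatim Grounding principle.

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// One atomic observation proposed by the model. The excerpt must be
// verbatim so it can be located and highlighted in the source text.
interface ProposedChunk {
  participant: string; // e.g. "P3"
  excerpt: string;     // verbatim, one idea per chunk
}

async function proposeChunks(transcript: string): Promise<ProposedChunk[]> {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          "Split the interview transcript into atomic observations: one idea " +
          "per chunk, copied verbatim, with the speaker label preserved. " +
          'Respond as JSON: {"chunks": [{"participant": "...", "excerpt": "..."}]}',
      },
      { role: "user", content: transcript },
    ],
  });
  const parsed = JSON.parse(res.choices[0].message.content ?? "{}");
  // Guard against paraphrase: keep only chunks that appear verbatim in
  // the source, so every note stays traceable to the transcript.
  return (parsed.chunks ?? []).filter((c: ProposedChunk) =>
    transcript.includes(c.excerpt)
  );
}
```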
Feature 2 — Human-in-the-Loop Thematic Analysis
The pain: Researchers identify themes, then spend hours re-reading every data point to classify notes—a process prone to fatigue and missed evidence.
The solution: Ask the AI to "group by themes." It proposes themes with evidence. The researcher reviews, edits, adds, or removes themes. Then with one click, the AI classifies every note into the approved themes—with transparent reasoning for each classification.
Theme review dialog: AI proposes themes with evidence citations. Researchers can edit names, descriptions, add new themes, or remove irrelevant ones before the AI classifies all notes.
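A sketch of the classification pass under the same assumptions (OpenAI Node SDK; classifyBatch and the JSON shapes are illustrative). The key property is that every assignment carries a one-sentence reasoning string the UI can display:

```typescript
import OpenAI from "openai";

const client = new OpenAI();

interface Theme { name: string; description: string; }
interface Note { id: string; text: string; }
interface Classification {
  noteId: string;
  theme: string | null; // null = no approved theme fits this note
  reasoning: string;    // shown in the UI so researchers can audit it
}

// One pass: classify every note into the researcher-approved themes,
// returning the model's reasoning alongside each assignment.
async function classifyBatch(
  notes: Note[],
  themes: Theme[]
): Promise<Classification[]> {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          "Assign each note to exactly one of the approved themes, or null " +
          "if none fits, with a one-sentence reason. Respond as JSON: " +
          '{"classifications": [{"noteId": "...", "theme": "...", "reasoning": "..."}]}',
      },
      { role: "user", content: JSON.stringify({ themes, notes }) },
    ],
  });
  return JSON.parse(res.choices[0].message.content ?? '{"classifications":[]}')
    .classifications;
}
```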
Feature 3 — RAG-Powered Semantic Search
The pain: Ctrl+F fails when participants express the same idea using different words. Researchers miss evidence because keyword matching can't capture semantic similarity.
The solution: Every note is automatically embedded as a vector. Researchers type natural language queries like "frustrations with the checkout flow" and the system finds semantically similar notes—even if they never used the word "frustration" or "checkout."
The canvas shows clustered sticky notes. When a researcher searches "difficulty retrieving evidence," the RAG system finds semantically matching notes (highlighted with golden borders and match scores) while dimming non-matches. The chat panel provides context about why each note matched.
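Under the hood, the search path can be sketched in a few lines, assuming the OpenAI Node SDK and the text-embedding-3-small model (function names are illustrative):

```typescript
import OpenAI from "openai";

const client = new OpenAI();

interface EmbeddedNote { id: string; text: string; vector: number[]; }

// Embed a batch of note texts in one call (text-embedding-3-small
// returns 1536-dimensional vectors).
async function embed(texts: string[]): Promise<number[][]> {
  const res = await client.embeddings.create({
    model: "text-embedding-3-small",
    input: texts,
  });
  return res.data.map((d) => d.embedding);
}

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Brute-force scan over every note: linear time, easily fast enough to
// run client-side for a few thousand embedded notes.
async function search(query: string, notes: EmbeddedNote[], k = 10) {
  const [q] = await embed([query]);
  return notes
    .map((n) => ({ ...n, score: cosine(q, n.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```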
Technical Architecture
AI Document Chunking
LLM splits transcripts into verbatim, one-idea-per-note excerpts with automatic participant detection. Chunks are character-position-mapped to the source text, enabling in-place highlighting. Researchers review in a split-view interface before approving.
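A sketch of the character-position mapping, with illustrative types: a chunk is committed only if its excerpt can be located verbatim in the source, which is also what makes in-place highlighting possible.

```typescript
// A committed chunk stores character offsets into the source transcript,
// so the review UI can highlight exactly where each note came from.
interface Chunk {
  id: string;
  participant: string;
  excerpt: string;
  start: number; // inclusive offset into the source text
  end: number;   // exclusive offset
}

// Locate a verbatim excerpt in the source. Returns null if the model
// paraphrased, which the review step surfaces as a rejected chunk.
function locate(
  source: string,
  excerpt: string
): { start: number; end: number } | null {
  const start = source.indexOf(excerpt);
  return start === -1 ? null : { start, end: start + excerpt.length };
}
```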
Vector Embeddings (RAG)
Every note is embedded using OpenAI or Gemini embedding models. Cosine similarity search runs client-side in ~10ms for 1000+ notes. Two-stage retrieval (vector search → LLM reranking) ensures high precision with per-note reasoning.
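The reranking stage might look like the sketch below (OpenAI SDK; rerank and the JSON shape are illustrative): the top-k candidates from the vector scan go to an LLM, which keeps only genuine matches and attaches a per-note reason.

```typescript
import OpenAI from "openai";

const client = new OpenAI();

interface Candidate { id: string; text: string; score: number; }
interface Match { id: string; reason: string; }

// Stage 2 of retrieval: filter the vector-search candidates with an LLM,
// returning a reason per kept note for display next to each result.
async function rerank(query: string, candidates: Candidate[]): Promise<Match[]> {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          "Given a search query and candidate notes, keep only the notes " +
          "that genuinely match the query, each with a one-line reason. " +
          'Respond as JSON: {"matches": [{"id": "...", "reason": "..."}]}',
      },
      { role: "user", content: JSON.stringify({ query, candidates }) },
    ],
  });
  return JSON.parse(res.choices[0].message.content ?? '{"matches":[]}').matches;
}
```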
Thematic Classification
Human-in-the-loop two-step process: AI proposes themes with evidence → researcher edits → AI classifies every note with transparent reasoning. Multi-pass classification handles large datasets with unclustered-note retry.
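The retry behavior can be sketched generically, with classifyBatch standing in for one LLM pass (as sketched under Feature 2): notes the model leaves unassigned are retried until a pass budget runs out, and anything still unclustered is surfaced to the researcher.

```typescript
interface Note { id: string; text: string; }
interface Classification { noteId: string; theme: string | null; reasoning: string; }

// Classify in multiple passes: notes left unassigned (theme === null)
// are retried, and whatever remains is reported as unclustered.
async function classifyAllNotes(
  notes: Note[],
  classifyBatch: (batch: Note[]) => Promise<Classification[]>,
  maxPasses = 3
): Promise<{ classified: Classification[]; unclustered: Note[] }> {
  let remaining = notes;
  const classified: Classification[] = [];
  for (let pass = 0; pass < maxPasses && remaining.length > 0; pass++) {
    const results = await classifyBatch(remaining);
    const assigned = results.filter((r) => r.theme !== null);
    classified.push(...assigned);
    const done = new Set(assigned.map((r) => r.noteId));
    remaining = remaining.filter((n) => !done.has(n.id));
  }
  return { classified, unclustered: remaining };
}
```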
Research Assistant
Conversational AI with native tool-calling (OpenAI, Claude, Gemini). Routes between find_notes, group_notes, answer_question, and tag_notes tools. RAG-augmented for all analytical queries. Quick-action buttons for common workflows.
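Routing can be sketched with native tool calling; the tool names match the four listed above, but the schemas and the route helper are illustrative (OpenAI SDK shown; Claude and Gemini expose analogous tool APIs).

```typescript
import OpenAI from "openai";

const client = new OpenAI();

// Tool schemas the assistant routes between. The model picks a tool;
// the app executes it against the canvas state.
const tools: OpenAI.Chat.Completions.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "find_notes",
      description: "Semantic (RAG) search over all notes on the canvas.",
      parameters: {
        type: "object",
        properties: { query: { type: "string" } },
        required: ["query"],
      },
    },
  },
  {
    type: "function",
    function: {
      name: "group_notes",
      description: "Propose themes and group notes under them.",
      parameters: { type: "object", properties: {} },
    },
  },
  // answer_question and tag_notes are declared the same way.
];

// Let the model choose which tool (if any) handles the user's request.
async function route(userMessage: string) {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    tools,
    messages: [{ role: "user", content: userMessage }],
  });
  return res.choices[0].message.tool_calls ?? [];
}
```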
Evaluation Results
I conducted a comparative evaluation with 8 of the original 10 researchers, measuring task completion time, accuracy, and satisfaction across the two core pain-point tasks.
Evaluation: Before vs. After
To validate the solution, each of the 8 researchers completed two comparable analysis tasks in a within-subjects design: one using their existing workflow (Miro + spreadsheets + Ctrl+F) and one using Cursor Research. Task order was counterbalanced to control for order effects.
Task completion time for atomization (minutes): manual workflow vs. Cursor Research (n = 8).
Time to find all evidence for a theme (minutes): manual workflow vs. Cursor Research (n = 8).
Post-task confidence in theme evidence coverage (1–7 Likert): every participant reported higher confidence using Cursor Research than with their manual workflow, averaging 3.5 → 5.9 (p < 0.01, n = 8).
Researcher Feedback
"It Respects My Expertise"
Researchers valued the human-in-the-loop approach: "The AI does the grunt work but I make every decision. I can edit themes, delete chunks, merge notes. It accelerates me without replacing my judgment." — P1
"I Trust the Coverage"
The theme coverage confidence boost was dramatic: "Before, I'd stop when I was tired. Now I know the system has classified every single note. If something is missing, I can see it's unclassified." — P3
"Search by Meaning, Not Words"
Semantic search was the most-loved feature: "I typed 'participants feeling lost' and it found a note where someone said 'I had no idea where I was in the flow.' Ctrl+F would never find that." — P7
Conclusion
Cursor Research demonstrates that the most impactful AI tools don't replace expert judgment—they eliminate the drudgery that prevents experts from applying their judgment where it matters most.
The key insight from this project: researchers don't hate analysis—they hate the mechanical labor that precedes it. Atomizing transcripts and retrieving evidence are tasks that demand attention but not expertise. By automating the mechanical work and keeping researchers in control of the interpretive work, Cursor Research transforms qualitative analysis from an endurance test into a focused, confident practice.
Broader implications: The human-in-the-loop pattern demonstrated here—AI proposes, human reviews, AI executes at scale—applies far beyond qualitative research. It's a design pattern for any domain where AI can handle volume but humans must ensure quality. The two-stage RAG pipeline (fast vector retrieval → precise LLM reranking) offers a template for building search interfaces over unstructured qualitative data.
Cursor Research is open source. View the full codebase on GitHub →
Skills & Methods Demonstrated
Research methods: Semi-Structured Interviews, Surveys (Likert), Within-Subjects Study, SUS, Thematic Analysis, Affinity Mapping
Engineering: Next.js 16, React 19, TypeScript, Tailwind CSS, Zustand, LLM Integration (OpenAI, Claude, Gemini), RAG Architecture, Vector Embeddings
Design: Human-in-the-Loop Design, Co-Design with Domain Experts, Mixed-Methods Evaluation, UX Research Tool Design