Qualitative UX research produces rich, nuanced insights that quantitative methods cannot capture. But the path from raw interview transcripts to actionable findings is paved with tedious, repetitive labor that even experienced researchers dread.
Every UX researcher knows the ritual: hours of interviews produce pages of transcripts. Those transcripts need to be broken down into atomic observations—one insight per sticky note. Then themes emerge, but proving those themes requires going back through every single data point to gather supporting evidence. It's exhaustive, exhausting, and error-prone.
This case study documents how I discovered these pain points through in-depth interviews with 10 UX researchers, validated them with survey data, and then designed and built Cursor Research—an AI-powered qualitative analysis tool that transforms how researchers move from raw data to grounded insights.
Discovery: Interviewing Researchers
"I spent two full days just copying and pasting quotes from a 90-minute interview into sticky notes. Two days. For one interview."— P3, Senior UX Researcher, 6 years experience
Study Design
I conducted semi-structured interviews with 10 UX researchers across industry and academia (4 senior, 3 mid-level, 3 junior). Participants had between 2 and 12 years of qualitative research experience. Each interview lasted 45–60 minutes and focused on their end-to-end qualitative analysis workflow, pain points, and tool usage.
Following the interviews, participants completed a structured survey rating the difficulty, time investment, and satisfaction across 7 stages of qualitative analysis on 5-point Likert scales.
Key Findings
Atomization Agony
9 out of 10 researchers identified transcript atomization—breaking interviews into individual data points—as the most tedious part of their workflow. Average time: 3.2 hours per 60-minute interview.
Evidence Scavenger Hunt
After identifying themes, 8 out of 10 researchers reported "dreading" the process of going back through all data points to find supporting evidence. They described it as "looking for needles in a haystack."
Theme Confidence Gap
7 out of 10 researchers admitted they sometimes couldn't be sure they'd found all evidence for a theme, leading to lower confidence in their findings and potential missed insights.
Survey Results: Quantifying the Pain
The post-interview survey confirmed what the qualitative data suggested: transcript atomization and evidence gathering are the most painful stages of the qualitative research workflow. Here's what the numbers revealed.
Average time spent per analysis stage (hours, per study): transcript atomization and evidence gathering stand out as the two most time-consuming and frustrating activities identified by participants (n = 10).
Frustration rating by stage (1–5 Likert): atomization (4.7/5) and evidence gathering (4.5/5) rated most frustrating (n = 10).
Agreement with "I'm confident I've found all evidence for my themes": 70% of researchers lack confidence that they've captured all supporting evidence (n = 10).
The survey data painted a clear picture: transcript atomization and evidence gathering are not just annoying—they consume nearly 20 hours per study combined and are the primary sources of researcher frustration and uncertainty. These two pain points became the design targets for Cursor Research.
Design Principles
From the interview findings, I derived four design principles that would guide every decision in building Cursor Research:
Human in the Loop
AI proposes, the researcher decides. Every AI output goes through a review step where the researcher can edit, reject, or refine before it becomes part of the analysis.
Verbatim Grounding
Every data point traces back to the original transcript. No hallucinated quotes, no paraphrasing without consent—the source text is always one click away.
Spatial Reasoning
Researchers think spatially. Sticky notes on a canvas, not rows in a spreadsheet. Physical arrangement creates meaning—proximity implies relationship.
Transparent AI
Every AI classification includes its reasoning. Researchers can see why a note was assigned to a theme, building trust and enabling correction.
Solution: Cursor Research
Cursor Research is an AI-powered qualitative analysis canvas that directly addresses the two core pain points. It combines a visual sticky-note interface with LLM-powered tools for document chunking, thematic analysis, and semantic evidence retrieval—all with human-in-the-loop controls.
Feature 1 — AI Document Chunking
The pain: Researchers spend 10+ hours manually reading transcripts and copy-pasting excerpts into sticky notes, one observation at a time.
The solution: Upload a transcript and the AI automatically proposes atomic chunks—one idea per note—with participant attribution. The researcher reviews each chunk in a split-view interface before approving.
An excerpt from P3's interview transcript, as it appears in the source pane:

P3: We did a study on onboarding flows with 15 participants. The hardest part was honestly just breaking down all the transcripts afterward. I spent two full days just on that.
P3: The thing that really gets me is when I find a theme, like "users feel overwhelmed by options," and then I have to go back through everything to find all the quotes that support it. It's like a treasure hunt except not fun at all.
Interviewer: How do you handle that currently?
P3: I use Ctrl+F a lot. But that only works if I can guess the exact words they used. Sometimes people describe the same frustration in completely different ways.
P3: I tried using Miro for affinity mapping but it was just digital copy-paste. Still took forever. And the search is just keyword matching.
"We did a study on onboarding flows with 15 participants. The hardest part was honestly just breaking down all the transcripts afterward. I spent two full days just on that."
"The thing that really gets me is when I find a theme... and then I have to go back through everything to find all the quotes that support it. It's like a treasure hunt except not fun at all."
"I use Ctrl+F a lot. But that only works if I can guess the exact words they used. Sometimes people describe the same frustration in completely different ways."
"I tried using Miro for affinity mapping but it was just digital copy-paste. Still took forever. And the search is just keyword matching."
The chunk review interface: source document (left) with highlighted excerpts, proposed atomic notes (right) with edit, merge, and delete controls. Researchers review before committing to the canvas.
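To make the chunking step concrete, here is a minimal sketch of what the chunk-proposal call might look like, shown with the OpenAI Node SDK for brevity. The proposeChunks name, the prompt, and the ProposedChunk shape are illustrative assumptions rather than the app's actual code; the final verbatim filter reflects the Verbatim Grounding principle.

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// One atomic observation proposed by the model. The excerpt must be
// verbatim so it can be located and highlighted in the source text.
interface ProposedChunk {
  participant: string; // e.g. "P3"
  excerpt: string;     // verbatim, one idea per chunk
}

async function proposeChunks(transcript: string): Promise<ProposedChunk[]> {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          "Split the interview transcript into atomic observations: one idea " +
          "per chunk, copied verbatim, with the speaker label preserved. " +
          'Respond as JSON: {"chunks": [{"participant": "...", "excerpt": "..."}]}',
      },
      { role: "user", content: transcript },
    ],
  });
  const parsed = JSON.parse(res.choices[0].message.content ?? "{}");
  // Guard against paraphrase: keep only chunks that appear verbatim in
  // the source, so every note stays traceable to the transcript.
  return (parsed.chunks ?? []).filter((c: ProposedChunk) =>
    transcript.includes(c.excerpt)
  );
}
```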
Feature 2 — Human-in-the-Loop Thematic Analysis
The pain: Researchers identify themes, then spend hours re-reading every data point to classify notes—a process prone to fatigue and missed evidence.
The solution: Ask the AI to "group by themes." It proposes themes with evidence. The researcher reviews, edits, adds, or removes themes. Then with one click, the AI classifies every note into the approved themes—with transparent reasoning for each classification.
Theme review dialog: AI proposes themes with evidence citations. Researchers can edit names, descriptions, add new themes, or remove irrelevant ones before the AI classifies all notes.
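A sketch of the classification pass under the same assumptions (OpenAI Node SDK; classifyBatch and the JSON shapes are illustrative). The key property is that every assignment carries a one-sentence reasoning string the UI can display:

```typescript
import OpenAI from "openai";

const client = new OpenAI();

interface Theme { name: string; description: string; }
interface Note { id: string; text: string; }
interface Classification {
  noteId: string;
  theme: string | null; // null = no approved theme fits this note
  reasoning: string;    // shown in the UI so researchers can audit it
}

// One pass: classify every note into the researcher-approved themes,
// returning the model's reasoning alongside each assignment.
async function classifyBatch(
  notes: Note[],
  themes: Theme[]
): Promise<Classification[]> {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          "Assign each note to exactly one of the approved themes, or null " +
          "if none fits, with a one-sentence reason. Respond as JSON: " +
          '{"classifications": [{"noteId": "...", "theme": "...", "reasoning": "..."}]}',
      },
      { role: "user", content: JSON.stringify({ themes, notes }) },
    ],
  });
  return JSON.parse(res.choices[0].message.content ?? '{"classifications":[]}')
    .classifications;
}
```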
Feature 3 — RAG-Powered Semantic Search
The pain: Ctrl+F fails when participants express the same idea using different words. Researchers miss evidence because keyword matching can't capture semantic similarity.
The solution: Every note is automatically embedded as a vector. Researchers type natural language queries like "frustrations with the checkout flow" and the system finds semantically similar notes—even if they never used the word "frustration" or "checkout."
The canvas shows clustered sticky notes. When a researcher searches "difficulty retrieving evidence," the RAG system finds semantically matching notes (highlighted with golden borders and match scores) while dimming non-matches. The chat panel provides context about why each note matched.
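Under the hood, the search path can be sketched in a few lines, assuming the OpenAI Node SDK and the text-embedding-3-small model (function names are illustrative):

```typescript
import OpenAI from "openai";

const client = new OpenAI();

interface EmbeddedNote { id: string; text: string; vector: number[]; }

// Embed a batch of note texts in one call (text-embedding-3-small
// returns 1536-dimensional vectors).
async function embed(texts: string[]): Promise<number[][]> {
  const res = await client.embeddings.create({
    model: "text-embedding-3-small",
    input: texts,
  });
  return res.data.map((d) => d.embedding);
}

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Brute-force scan over every note: linear time, easily fast enough to
// run client-side for a few thousand embedded notes.
async function search(query: string, notes: EmbeddedNote[], k = 10) {
  const [q] = await embed([query]);
  return notes
    .map((n) => ({ ...n, score: cosine(q, n.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```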
Technical Architecture
AI Document Chunking
LLM splits transcripts into verbatim, one-idea-per-note excerpts with automatic participant detection. Chunks are character-position-mapped to the source text, enabling in-place highlighting. Researchers review in a split-view interface before approving.
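A sketch of the character-position mapping, with illustrative types: a chunk is committed only if its excerpt can be located verbatim in the source, which is also what makes in-place highlighting possible.

```typescript
// A committed chunk stores character offsets into the source transcript,
// so the review UI can highlight exactly where each note came from.
interface Chunk {
  id: string;
  participant: string;
  excerpt: string;
  start: number; // inclusive offset into the source text
  end: number;   // exclusive offset
}

// Locate a verbatim excerpt in the source. Returns null if the model
// paraphrased, which the review step surfaces as a rejected chunk.
function locate(
  source: string,
  excerpt: string
): { start: number; end: number } | null {
  const start = source.indexOf(excerpt);
  return start === -1 ? null : { start, end: start + excerpt.length };
}
```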
Vector Embeddings (RAG)
Every note is embedded using OpenAI or Gemini embedding models. Cosine similarity search runs client-side in ~10ms for 1000+ notes. Two-stage retrieval (vector search → LLM reranking) ensures high precision with per-note reasoning.
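The reranking stage might look like the sketch below (OpenAI SDK; rerank and the JSON shape are illustrative): the top-k candidates from the vector scan go to an LLM, which keeps only genuine matches and attaches a per-note reason.

```typescript
import OpenAI from "openai";

const client = new OpenAI();

interface Candidate { id: string; text: string; score: number; }
interface Match { id: string; reason: string; }

// Stage 2 of retrieval: filter the vector-search candidates with an LLM,
// returning a reason per kept note for display next to each result.
async function rerank(query: string, candidates: Candidate[]): Promise<Match[]> {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          "Given a search query and candidate notes, keep only the notes " +
          "that genuinely match the query, each with a one-line reason. " +
          'Respond as JSON: {"matches": [{"id": "...", "reason": "..."}]}',
      },
      { role: "user", content: JSON.stringify({ query, candidates }) },
    ],
  });
  return JSON.parse(res.choices[0].message.content ?? '{"matches":[]}').matches;
}
```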
Thematic Classification
Human-in-the-loop two-step process: AI proposes themes with evidence → researcher edits → AI classifies every note with transparent reasoning. Multi-pass classification handles large datasets with unclustered-note retry.
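The retry behavior can be sketched generically, with classifyBatch standing in for one LLM pass (as sketched under Feature 2): notes the model leaves unassigned are retried until a pass budget runs out, and anything still unclustered is surfaced to the researcher.

```typescript
interface Note { id: string; text: string; }
interface Classification { noteId: string; theme: string | null; reasoning: string; }

// Classify in multiple passes: notes left unassigned (theme === null)
// are retried, and whatever remains is reported as unclustered.
async function classifyAllNotes(
  notes: Note[],
  classifyBatch: (batch: Note[]) => Promise<Classification[]>,
  maxPasses = 3
): Promise<{ classified: Classification[]; unclustered: Note[] }> {
  let remaining = notes;
  const classified: Classification[] = [];
  for (let pass = 0; pass < maxPasses && remaining.length > 0; pass++) {
    const results = await classifyBatch(remaining);
    const assigned = results.filter((r) => r.theme !== null);
    classified.push(...assigned);
    const done = new Set(assigned.map((r) => r.noteId));
    remaining = remaining.filter((n) => !done.has(n.id));
  }
  return { classified, unclustered: remaining };
}
```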
Research Assistant
Conversational AI with native tool-calling (OpenAI, Claude, Gemini). Routes between find_notes, group_notes, answer_question, and tag_notes tools. RAG-augmented for all analytical queries. Quick-action buttons for common workflows.
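Routing can be sketched with native tool calling; the tool names match the four listed above, but the schemas and the route helper are illustrative (OpenAI SDK shown; Claude and Gemini expose analogous tool APIs).

```typescript
import OpenAI from "openai";

const client = new OpenAI();

// Tool schemas the assistant routes between. The model picks a tool;
// the app executes it against the canvas state.
const tools: OpenAI.Chat.Completions.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "find_notes",
      description: "Semantic (RAG) search over all notes on the canvas.",
      parameters: {
        type: "object",
        properties: { query: { type: "string" } },
        required: ["query"],
      },
    },
  },
  {
    type: "function",
    function: {
      name: "group_notes",
      description: "Propose themes and group notes under them.",
      parameters: { type: "object", properties: {} },
    },
  },
  // answer_question and tag_notes are declared the same way.
];

// Let the model choose which tool (if any) handles the user's request.
async function route(userMessage: string) {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    tools,
    messages: [{ role: "user", content: userMessage }],
  });
  return res.choices[0].message.tool_calls ?? [];
}
```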
Evaluation Results
I conducted a comparative evaluation with 8 of the original 10 researchers, measuring task completion time, accuracy, and satisfaction across the two core pain-point tasks.
Evaluation: Before vs. After
To validate the solution, each of the 8 researchers completed two comparable analysis tasks in a within-subjects design: one using their existing workflow (Miro + spreadsheets + Ctrl+F) and one using Cursor Research. Task order was counterbalanced to control for order effects.
Task completion time for atomization (minutes): manual workflow vs. Cursor Research (n = 8).
Time to find all evidence for a theme (minutes): manual workflow vs. Cursor Research (n = 8).
Post-task confidence in theme evidence coverage (1–7 Likert): every participant reported higher confidence using Cursor Research than with their manual workflow, averaging 3.5 → 5.9 (p < 0.01, n = 8).
Researcher Feedback
"It Respects My Expertise"
Researchers valued the human-in-the-loop approach: "The AI does the grunt work but I make every decision. I can edit themes, delete chunks, merge notes. It accelerates me without replacing my judgment." — P1
"I Trust the Coverage"
The theme coverage confidence boost was dramatic: "Before, I'd stop when I was tired. Now I know the system has classified every single note. If something is missing, I can see it's unclassified." — P3
"Search by Meaning, Not Words"
Semantic search was the most-loved feature: "I typed 'participants feeling lost' and it found a note where someone said 'I had no idea where I was in the flow.' Ctrl+F would never find that." — P7
Conclusion
Cursor Research demonstrates that the most impactful AI tools don't replace expert judgment—they eliminate the drudgery that prevents experts from applying their judgment where it matters most.
The key insight from this project: researchers don't hate analysis—they hate the mechanical labor that precedes it. Atomizing transcripts and retrieving evidence are tasks that demand attention but not expertise. By automating the mechanical work and keeping researchers in control of the interpretive work, Cursor Research transforms qualitative analysis from an endurance test into a focused, confident practice.
Broader implications: The human-in-the-loop pattern demonstrated here—AI proposes, human reviews, AI executes at scale—applies far beyond qualitative research. It's a design pattern for any domain where AI can handle volume but humans must ensure quality. The two-stage RAG pipeline (fast vector retrieval → precise LLM reranking) offers a template for building search interfaces over unstructured qualitative data.
Cursor Research is open source. View the full codebase on GitHub →
Skills & Methods Demonstrated
Research methods: Semi-Structured Interviews, Surveys (Likert), Within-Subjects Study, SUS, Thematic Analysis, Affinity Mapping
Engineering: Next.js 16, React 19, TypeScript, Tailwind CSS, Zustand, LLM Integration (OpenAI, Claude, Gemini), RAG Architecture, Vector Embeddings
Design: Human-in-the-Loop Design, Co-Design with Domain Experts, Mixed-Methods Evaluation, UX Research Tool Design