Two opposite designs for AI meeting notes: transcribe everything vs enhance what you typed

A developer compared two AI meeting note-taking tools, Otter and Granola, and found they represent opposite design philosophies: Otter transcribes everything and then summarizes, while Granola enhances user-typed notes with audio context. The key insight is that Granola's approach often produces better notes because it incorporates the user's relevance signal, but it fails if the user takes no notes. The developer argues that capturing human judgment through typed bullets is more impactful than upgrading the AI model.

I ran the same meeting through two AI notetakers, Otter and Granola, expecting to compare accuracy. The accuracy was close. What actually separated them was something more interesting: they aren't built to do the same job. They sit on opposite models of what a meeting "note" even is, and once you see the two designs, the "which is better" question dissolves into "which model fits your workflow."

Strip away the branding and you get two functions. This is conceptual: I'm describing the designs, not either company's internal code.

    # Model A: transcribe everything, then summarize (Otter)
    notes = summarize(transcribe(audio))

    # Model B: enhance what you typed (Granola)
    notes = enhance(user_bullets, transcribe(audio))

Model A captures the whole call verbatim, labels who spoke, and then runs an extractive summary over that complete record. The transcript is the primary artifact; the summary is derived from it. You did nothing during the meeting; the tool recorded everything.

Model B inverts the inputs. You jot a few rough bullets during the call, and the tool merges your notes with what it heard, using your fragments as a seed and filling in the specifics from the audio. The finished summary is the primary artifact; the raw transcript is secondary (Granola even deletes the audio once it has the text).

The difference isn't cosmetic. The two models take different inputs, so they produce structurally different outputs and fail in different ways.

I fed both an 80-second, two-speaker product meeting (synthetic voices, so I knew the exact right answer) and, for Granola, typed six sloppy bullets while it listened: fragments like "get API latency under 200ms, file P1." Same meeting, opposite artifacts: one a faithful transcript, the other a send-ready recap.

Here's the part worth internalizing if you build with LLMs at all. Model B tends to produce more useful notes not because of a better model, but because it has a better input: your relevance signal.
When you type "file P1," you've told the system this mattered. That is a piece of human judgment an extractive summarizer over a raw transcript simply doesn't have. The augmentation model gets to condition on what a participant already decided was important.

That also predicts its failure mode exactly: type nothing and Model B collapses toward Model A. Granola's whole edge assumes you take notes; give it no bullets and you lose the augmentation that makes it special. Model A has no such dependency. It records everything whether you participate or not, which is precisely why it's the safer default for a meeting you can't also take notes in.

So the trade is structural, not a quality ranking. Model A asks nothing of you and hands back a complete record; Model B needs your bullets and hands back a finished recap. Audio capture (whether a bot joins the call or the tool records your system audio) is a separate axis from note generation; I'm only talking about the latter here.

For choosing a tool, the framing that actually helps isn't "which is more accurate." Both are fine on clean audio. It's "do I want a record or a recap?"

A record you can search and quote → the transcribe-everything model.
A finished write-up you'll forward without editing → the enhance-your-notes model.

I put the full head-to-head (pricing, privacy postures, and which one wins for which kind of meeting) in this tested comparison of Otter and Granola (https://aialleyway.com/otter-vs-granola/), but the model question is the one to settle first, because it's upstream of every feature.

And the lesson for anyone building AI summarization: the cheapest way to make a summary feel smarter is often to capture the user's relevance signal, not to upgrade the model. A few human-typed bullets as a conditioning input can beat a larger model summarizing cold, because the human already solved the hardest part: deciding what mattered. Design for that input and your "dumb" summarizer punches well above its weight.
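To make the two designs concrete, here is a minimal Python sketch of the idea. It is an assumption-laden toy, not how Otter or Granola work internally: `summarize` is a word-frequency extractive picker standing in for Model A, `enhance` uses keyword overlap with the typed bullets standing in for Model B, and the four-sentence transcript is invented for illustration (only the bullet text comes from my test).

```python
import re
from collections import Counter

# Hypothetical stopword list, just enough for this toy example.
STOPWORDS = {"the", "a", "an", "and", "is", "to", "for", "of", "on",
             "we", "so", "that", "will", "please"}

def tokens(text: str) -> list[str]:
    """Lowercase alphanumeric words, minus stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS]

def summarize(transcript: list[str], k: int = 2) -> list[str]:
    """Model A stand-in: keep the k sentences whose words are most frequent."""
    freq = Counter(w for s in transcript for w in tokens(s))

    def score(sentence: str) -> float:
        uniq = set(tokens(sentence))
        return sum(freq[w] for w in uniq) / len(uniq) if uniq else 0.0

    top = sorted(range(len(transcript)),
                 key=lambda i: score(transcript[i]), reverse=True)[:k]
    return [transcript[i] for i in sorted(top)]  # keep meeting order

def enhance(user_bullets: list[str], transcript: list[str], k: int = 2) -> list[str]:
    """Model B stand-in: let each typed bullet pick its best-matching sentence."""
    if not user_bullets:
        # No relevance signal: fall back to the cold summary (Model A).
        return summarize(transcript, k)
    picked: list[int] = []
    for bullet in user_bullets:
        want = set(tokens(bullet))
        best = max(range(len(transcript)),
                   key=lambda i: len(want & set(tokens(transcript[i]))))
        if want & set(tokens(transcript[best])) and best not in picked:
            picked.append(best)
    return [transcript[i] for i in picked]

# Invented transcript, for illustration only.
transcript = [
    "Thanks everyone for joining the weekly product sync.",
    "The onboarding redesign is on track and the onboarding copy is done.",
    "We need to get API latency under 200ms, so please file a P1 for that.",
    "The onboarding team will demo the redesign next week.",
]
bullets = ["get API latency under 200ms, file P1"]

print("cold:    ", summarize(transcript))
print("enhanced:", enhance(bullets, transcript))
print("no notes:", enhance([], transcript))
```

In this toy, the cold summary favors the topic that was talked about the most (onboarding) and skips the latency item, while the bullet steers the enhanced version straight to it. Passing no bullets returns exactly the cold summary, which is the collapse toward Model A described above. A real system would put an LLM where the scoring functions are, but the shape of the inputs is the point.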
If you've built a summarizer that captures relevance signal some other way — reactions, highlights, edits-as-feedback — I'd like to hear how in the comments.