{"slug": "an-offline-meeting-transcriber", "title": "An Offline Meeting Transcriber", "summary": "A developer used Claude Code to build an offline meeting transcriber that processes audio into Granola-style markdown notes entirely on a MacBook, ensuring sensitive meeting recordings never leave the device. The tool runs three local stages — transcription via mlx-whisper, speaker diarization via pyannote, and summarization via Ollama — and was migrated to the Swamp platform for composability after the initial bash version proved difficult to reuse. The project highlights how AI-assisted coding can rapidly produce functional tools while underscoring the importance of modular architecture for long-term maintainability.", "body_md": "I take \"handwritten\" meeting notes in Obsidian when I can. Some of these meetings I'd want transcribed as well, but they are high-trust, private, and sensitive: exactly the kind of meetings you don't invite strangers to. So the notes and recordings shouldn't leave my laptop, whether to be processed or to be stored by strangers.\n\nOn a whim, I had Claude Code build `offline-meeting-transcriber`: audio in, Granola-style markdown out, nothing leaves the MacBook. To be clear, I didn't write any of this by hand. Claude Code did the bash, the Python, and later the swamp models, while I steered, reviewed diffs, and made the design calls. The first version was a bash wrapper around four Python files. It worked. It was also a dead end the moment I wanted to reuse any piece of it for something else.\n\nI had it migrated to [swamp](https://swamp.club) for the same reason I migrated [ADW](https://matgreten.dev/posts/visibility-into-the-black-box/), but this time the lever was composability, not observability. The pieces wanted to be reused. 
Bash had them welded together.\n\n## What it Does\n\n`bin/meeting-process samples/standup.m4a \"Engineering Standup\"` runs three stages locally:\n\n- `mlx-whisper` transcribes the audio to segment-level JSON. `--no-speech-threshold 0.6` is in there because without it Whisper hallucinates \"Thanks for watching!\" on every silent stretch.\n- `pyannote/speaker-diarization-3.1` tags each Whisper segment with the speaker whose timeline overlaps it the most. Segments reach the LLM as `[SPEAKER_00]: …`, `[SPEAKER_01]: …`. Label stability inside a meeting matters more than getting names right; renaming `SPEAKER_00` → `Alice` is a one-line `sed` after the fact.\n- `qwen3.6:35b-a3b-nvfp4` on local Ollama summarizes the transcript into a Granola-style note, chunked at ~2500 tokens with a 200-token overlap. The token estimator is `len(text.split()) * 1.3`: no tiktoken, no extra dep, accurate enough for chunk boundaries.\n\nFinal markdown lands in `~/Obsidian/Meetings/Unsorted/` for now.\n\n## Why Swamp\n\nThree things in the bash version were obviously reusable and obviously stuck:\n\n- **Transcription.** `mlx_whisper` doesn't care that it's transcribing a meeting. It would transcribe a podcast, an interview, a voice memo I left myself in the car. The bash script only knew about meetings.\n- **Diarization.** Same model, same merge logic, same `[SPEAKER_xx]: …` output. Useful anywhere I have multi-speaker audio.\n- **Summarization with a Granola-style prompt.** This one is meeting-shaped, but the *chunking + merge* infrastructure underneath it isn't. Different prompt, different downstream consumer, same machinery.\n\nIn bash, none of those were components. They were lines in a single script, with data passed implicitly through filenames in `out/`, and the only way to \"reuse\" any of them was to copy-paste the script and edit it. 
That's the path I've been on for years and the path I keep regretting.\n\nSo I had the pipeline broken out into three swamp models under `models/@mgreten/`, all published on swamp.club:\n\n- `@mgreten/mlx-whisper`: wraps the binary, exposes `transcribe(audioPath)`, and outputs a typed transcript artifact.\n- `@mgreten/pyannote-diarizer`: takes the audio and the transcript artifact, returns a diarized artifact. Soft-fails when the HF token is missing; the workflow continues on the undiarized transcript. A bad diarization never blocks the note.\n- `@mgreten/meeting-summarizer`: takes the transcript artifact and a model tag and returns markdown, plus a separate `write_note` method that lands the file in the vault. The `combine_notes` method (handwritten + analysis merge) lives here too.\n\nEach one has a typed input, a typed output, and exactly one job. The output of each step is a data artifact the next step pulls by name, not a file path I have to remember to clean up:\n\n```yaml\n- name: summarize\n  steps:\n    - name: run-summarize\n      task:\n        type: model_method\n        modelIdOrName: meeting-summarizer\n        methodName: summarize\n        inputs:\n          transcriptJson: ${{ data.latest(\"pyannote-diarizer\", inputs.noteName).attributes.transcriptJson }}\n          instanceName: ${{ inputs.noteName }}\n  dependsOn:\n    - job: diarize\n      condition:\n        type: succeeded\n```\n\nThe workflow YAML is one way of wiring those three models. It's not the only way. That's the whole point.\n\n## What Composability Actually Buys Me\n\nHermes is the obvious next caller. When I want my agent to be able to transcribe a recording I just dropped into it, I don't expose `bin/meeting-process` to it as a shell command. I expose the model. Same typed inputs, same typed outputs, no shell quoting, no parsing stdout. 
The bash wrapper still exists for me at the terminal (it shells out to swamp now and gives me a progress counter), but it's no longer the only entry point.\n\nThe watch-folder daemon is the next one after that. v2 territory, not built. But when I build it, it's calling the workflow, not re-implementing it.\n\nThis is the part bash couldn't give me. Not \"the pipeline is observable.\" The pipeline is *separable*. The day I want to reuse the diarizer in something that has nothing to do with meetings, I'm not copy-pasting anything.\n\n## The `--notes` Flag is the Real Design Insight\n\nThe agents almost shipped a tool that overwrote my handwritten meeting notes.\n\nThe first version they built generated *the* meeting note. Beautiful, clean, full of action items. The problem: I take \"handwritten\" notes during meetings, and those notes have context the audio doesn't, like the side conversation, the link I scribbled, the thing I almost said. The generated note has different value. It's complete where mine is partial, but it's also confidently wrong in places my human note isn't. That's the kind of call no agent was going to make for me; I had to catch it in review and redirect.\n\nSo `--notes` was added. When you pass it an existing note, the pipeline:\n\n- Leaves the handwritten note untouched.\n- Writes the LLM summary as `<base name> Analysis.md`.\n- Writes a `<base name> - Combined Notes.md` that links to the original, embeds the handwritten content, nests the analysis underneath, and leaves placeholder headers for personal observations and next steps.\n\nThree files instead of one. The human note stays canonical. The analysis is a sibling, not a replacement. 
This is also a composability story: the combine step is its own model method, callable on any analysis + existing-note pair, not a special case buried in the meeting pipeline.\n\n## What I Got Wrong\n\nTwo things are in the friction log because they should be.\n\n**The v1.5 LOC budget overshot.** The plan capped v1 at 250 lines. v1.5 is 383 lines across `bin/ + src/ + prompts/`. Most of the overage is wiring around `pyannote`: the gated HF model, the soft-fail path, and the `--turns-json` test escape hatch that lets me feed synthetic turns when I don't want to download an 80 MB checkpoint just to run a unit test. None of that complexity is in the chunker or the summarizer. It's the price of treating pyannote as a real component instead of a shell call. Worth it, but I should stop being surprised by it.\n\n**I shipped v1.5 without exercising pyannote end-to-end on a real meeting.** The HF token wasn't provisioned. The merge logic (Whisper segment ↔ pyannote turn, max-overlap assignment) was verified against a synthetic 2-speaker JSON. There's a 22-second 2-speaker fixture on disk (`samples/test-2speakers.m4a`, alternating `say -v Alex` and `say -v Samantha` clips, ffmpeg-concatenated) waiting for the first token-equipped run. The friction log entry is there partly so I don't forget that the test passed but the system is still unproven.\n\n## What I'm Building Next\n\n- **Hermes as a caller.** Wire the agent to invoke the workflow directly instead of shelling out. First real test of whether the composability story holds outside my own terminal.\n- **Watch folder.** Drop an `.m4a` in a directory; the workflow picks it up, processes it, and writes the note. v2 territory.\n- **Real multi-person meeting validation.** The 4-person diarization test from the original plan is still owed. I won't trust the speaker labels until I've seen them hold on a 60-minute meeting with people I know. 
I'm testing this right now, actually.\n\nThe offline-by-default constraint forced the components to be the unit. There was no cloud API to lean on, and once I'd built local pieces for transcribe, diarize, and summarize, swamp let me keep them separate without paying the integration tax every time I wired them together. The same three models will power the watch folder, the agent caller, and whatever else I haven't thought of yet. That's the part I couldn't get from a script.", "url": "https://wpnews.pro/news/an-offline-meeting-transcriber", "canonical_source": "https://matgreten.dev/posts/offline-meeting-transcriber/", "published_at": "2026-05-29 03:32:12+00:00", "updated_at": "2026-05-29 03:45:34.941117+00:00", "lang": "en", "topics": ["ai-tools", "ai-products", "machine-learning", "natural-language-processing"], "entities": ["Claude Code", "Obsidian", "Granola", "swamp", "mlx-whisper", "pyannote"], "alternates": {"html": "https://wpnews.pro/news/an-offline-meeting-transcriber", "markdown": "https://wpnews.pro/news/an-offline-meeting-transcriber.md", "text": "https://wpnews.pro/news/an-offline-meeting-transcriber.txt", "jsonld": "https://wpnews.pro/news/an-offline-meeting-transcriber.jsonld"}}