{"slug": "show-hn-cody-voice-control-for-neovim-using-its-own-commands-lsp", "title": "Show HN: Cody – Voice control for Neovim using its own commands, LSP", "summary": "Developer Juan Garza released Cody, an open-source Neovim plugin that enables voice control of the editor using its native commands, LSP, and plugin integrations. The tool routes spoken commands through a Node.js bridge to OpenAI's Realtime API, allowing hands-free editing without leaving the terminal. Cody is available on GitHub under the MIT license.", "body_md": "Neovim-first voice control for developers who want hands-free, low-latency command of their editor without leaving it.\n\nCody deliberately stays inside the editor: no screen overlay, no mouse pointer, no general desktop assistant. It lives inside Neovim and turns short voice or text commands into editor actions.\n\nExamples:\n\n```\n:CodyDo go to line 48\n:CodyDo go to file src/server.ts\n:CodyDo edit this line to return early when request.user is missing\n```\n\nCody should not rebuild editor primitives. It should route voice intent into the editor command surface developers already use, and install missing command providers only when that is explicitly supported by the user's setup.\n\n```\nNeovim Lua plugin          Node Realtime bridge          OpenAI Realtime\n----------------          --------------------          ---------------\n:CodyDo / voice cmds  ->   JSONL over stdio        ->    WebSocket session\neditor command adapter <-   function-call router    <-    gpt-realtime-2\nbuffer/cursor context  ->   prompt + tool schemas   ->    text/audio input\n```\n\nRather than capturing the screen and pointing at UI elements, Cody sends editor state:\n\n- current file, filetype, cursor line/column\n- current line and nearby buffer lines\n- available editor commands from native Neovim, LSP, and installed plugins\n\nThe important layer is not \"go to line\" itself. Neovim already has that. 
The useful layer is:\n\n- detect what the editor can already do\n- expose those capabilities to GPT Realtime as callable tools\n- install a missing provider when the user's plugin manager supports it\n- route the spoken command to the best existing command\n\nInitial command providers:\n\n- Native Neovim: line jumps, file edits, buffers, windows, quickfix\n- LSP: rename, code actions, references, definitions\n- Pickers: Telescope, fzf-lua, Snacks picker, mini.pick\n- AI/code edit plugins: CodeCompanion, Avante, Copilot Chat, or a Cody-owned Realtime edit fallback\n\nThis means there is no separate Phase 1 for proving basic editor commands. We start at the adapter.\n\nRequirements:\n\n- Neovim 0.10+\n- Node.js 20+\n- `OPENAI_API_KEY` for intelligent commands\n- `sox` for voice input: `brew install sox`\n\nInstall dependencies and build the local bridge:\n\n```\nnpm install\nnpm run build\n```\n\nInstall with your plugin manager. The Node bridge must be built, so use a build\nhook. With **lazy.nvim**:\n\n```\n{\n  \"juancgarza/cody\",\n  build = \"npm install && npm run build\", -- compiles the Node bridge (dist/)\n  opts = {\n    -- enable_shell = true,    -- on by default once setup() runs\n    -- enable_commands = true, -- on by default once setup() runs\n    -- tts_enabled = true, tts_voice_id = \"<elevenlabs-voice-id>\",\n  },\n  -- lazy.nvim calls require(\"cody\").setup(opts) automatically.\n}\n```\n\nThen export `OPENAI_API_KEY` (and `ELEVENLABS_API_KEY` for TTS) in the shell you\nlaunch Neovim from, and run `:CodyStart`.\n\nWithout a plugin manager, or during development:\n\n``` vim\nset runtimepath^=/path/to/cody\nruntime plugin/cody.lua\nlua require(\"cody\").setup()\n```\n\nIf you pass the runtimepath before Neovim starts, the `plugin/` file is sourced\nautomatically:\n\n```\nnvim --cmd 'set runtimepath^=/path/to/cody'\n```\n\nFor `nvim -u NONE`, plugin loading is disabled; use the explicit `runtime plugin/cody.lua` form 
above.\n\nOptional quick-command routing:\n\n``` lua\nrequire(\"cody\").setup({\n  quick_commands = \"fallback\", -- \"fallback\" | \"always\" | \"off\"\n\n  -- Shell command tool (lets Cody run allowlisted terminal commands via\n  -- vim.system). ON by default once setup() runs; set false to disable.\n  enable_shell = true,\n  shell_skip_confirm = true,        -- default true (no prompt); set false to confirm each command\n  -- shell_allowlist = nil,         -- list of allowed executables; nil = built-in default set\n  -- shell_timeout_ms = 15000,      -- per-command timeout (clamped 1000..120000)\n  -- shell_output_max_bytes = 8000, -- cap stdout+stderr returned to the model\n\n  -- Ex-command tool (lets Cody run :CodyTranscript, :split, and change settings\n  -- via :CodySet). ON by default once setup() runs; set false to disable.\n  enable_commands = true,\n  -- commands_confirm = false,      -- ask before each command (default off; allowlist is the guard)\n  -- commands_allowlist = nil,      -- list of allowed command names; nil = built-in default set\n\n  show_assistant_messages = true,\n  feedback_panel = true,\n  feedback_auto_open = true,\n  feedback_height = 16,\n  feedback_width = 96,\n  feedback_recent_lines = 4,\n  feedback_conversation_items = 12,\n  context_max_lines = 2000,\n  context_max_bytes = 240000,\n\n  -- Optional spoken feedback (ElevenLabs). 
Off unless tts_enabled = true.\n  tts_enabled = false,\n  tts_provider = \"elevenlabs\",\n  tts_voice_id = nil,          -- falls back to $ELEVENLABS_VOICE_ID\n  tts_model_id = nil,          -- falls back to $ELEVENLABS_MODEL_ID, then eleven_flash_v2_5\n  tts_speak_phases = false,    -- opt-in \"Listening.\" / \"Thinking.\"\n  tts_speak_actions = true,    -- \"Editing range.\" / \"Renaming.\"\n  tts_speak_results = true,    -- \"Done.\" / \"Failed: <reason>.\"\n  tts_speak_messages = true,   -- short final assistant replies\n  tts_message_max_chars = 160, -- skip spoken messages longer than this\n  tts_request_timeout_ms = nil, -- falls back to $CODY_TTS_REQUEST_TIMEOUT_MS, then 10000\n})\n```\n\n`fallback` is the default: typed `:CodyDo` commands go through GPT Realtime when\n`OPENAI_API_KEY` is set, and simple local regex commands are used only when the\nkey is absent.\nAssistant messages are shown by default and truncated to fit the command line;\nset `show_assistant_messages = false` to suppress prose during voice sessions.\nCody sends the active buffer with line numbers and a cursor marker when it fits\nthe context limits above; larger buffers are cursor-centered and marked as\ntruncated with omitted-line counts.\nThe feedback panel is enabled by default and auto-opens on Cody activity. 
It\nshows the current phase, intent, transcript, selected tool/action, result,\nassistant message text, and a short recent event stream.\n\nThen start it:\n\n```\n:CodyStart\n```\n\nCody can optionally speak short, high-signal confirmations using\n[ElevenLabs](https://elevenlabs.io/docs/api-reference/text-to-speech/convert).\nIt is off unless you opt in, and it is deliberately terse: it never reads back\nyour command, streamed transcript, or streamed assistant text.\n\nWhat gets spoken, by category (each toggleable):\n\n- `tts_speak_phases`: `Listening.`, `Thinking.` (off by default)\n- `tts_speak_actions`: the selected tool, e.g. `Editing range.`, `Renaming.` (read-only locator/context tools stay silent)\n- `tts_speak_results`: `Done.` on success, `Failed: <reason>.` on failure\n- `tts_speak_messages`: a short final assistant reply, only when it fits `tts_message_max_chars`\n\nSpeech is cancelled immediately on a new turn, `:CodyVoiceStop`, an\ninterruption, or a failure, so stale audio never trails the current action.\n\nEnable it and provide a voice:\n\n``` lua\nrequire(\"cody\").setup({\n  tts_enabled = true,\n  tts_voice_id = \"<elevenlabs-voice-id>\",\n  -- tts_model_id = \"eleven_flash_v2_5\", -- optional; this is the default\n  -- tts_request_timeout_ms = 10000,     -- optional; default is 10s\n})\n```\n\n```\nexport ELEVENLABS_API_KEY=\"...\"\n# Optional, can be set here instead of in setup():\nexport ELEVENLABS_VOICE_ID=\"<elevenlabs-voice-id>\"\nexport ELEVENLABS_MODEL_ID=\"eleven_flash_v2_5\"\nexport CODY_TTS_REQUEST_TIMEOUT_MS=\"10000\"\n```\n\nThe API key is read from the shell environment in Node and is never passed from\nLua. The voice and model fall back to `ELEVENLABS_VOICE_ID` /\n`ELEVENLABS_MODEL_ID` when set; otherwise Cody uses `eleven_flash_v2_5`, the\nElevenLabs low-latency model for real-time use. Cody also defaults to the\nsmaller `mp3_22050_32` output format to reduce response payload size. 
Override\nthat with `ELEVENLABS_OUTPUT_FORMAT` if you prefer higher-bitrate audio.\n\nPlayback uses macOS `afplay` on a temporary `mp3` file. On other platforms, set\n`CODY_TTS_PLAYER_COMMAND` to an audio player that accepts a file path argument\n(for example `mpg123` or `ffplay`). If `ELEVENLABS_API_KEY` or the voice id is\nmissing while `tts_enabled` is true, Cody reports it once and continues without\nspoken feedback.\n\nUseful live checks:\n\n```\nnpm run tts:voices\nnpm run tts:smoke -- \"Cody spoken feedback is working. Done.\" \"<elevenlabs-voice-id>\"\n```\n\nInside Neovim, these check the same bridge process used by `:CodyVoiceSession`:\n\n```\n:CodyTtsStatus\n:CodyTtsSmoke Cody spoken feedback is working. Done.\n```\n\nIf the shell smoke test works but `:CodyTtsStatus` says TTS is disabled or the\nAPI key is missing, restart Neovim from the shell that exports the variables, or\nrun `:CodyStop` then `:CodyStart` after changing `require(\"cody\").setup(...)`.\nAn ElevenLabs `402` response means the request reached ElevenLabs but failed due\nto billing, quota, or plan/voice access.\n\nCody can run terminal commands from inside Neovim via `vim.system`, exposed to\nGPT as the `editor_run_command` tool. It is **on by default once setup() runs**\n(set `enable_shell = false` to disable; a bare plugin load with no `setup()` stays\noff). It is gated several ways:\n\n- the bridge only advertises the tool when `enable_shell` is on (which sets `CODY_ENABLE_SHELL=1` for the Node bridge);\n- the Lua handler refuses when `enable_shell = false`, even if the tool is somehow advertised (the bridge env is captured at start, so the two layers can briefly disagree until a restart);\n- every command is checked against an allowlist of executables. 
The per-command\n`vim.fn.confirm` prompt is **off by default** (`shell_skip_confirm = true`); set `shell_skip_confirm = false` to be asked before every command.\n\nCommands run **without a shell** (argv only), so pipes, globs, redirection, and\n`; & |` are rejected; pass an argv array like `[\"git\", \"status\", \"--short\"]` for\nanything with spaces in arguments. Output (stdout+stderr) is capped before being\nsent to the model, and execution is asynchronous, so a slow command never freezes\nthe editor; it is killed at `shell_timeout_ms`.\n\nThe allowlist binds the executable name only. Some allowed tools are general\ninterpreters or build drivers (`node -e`, `python -c`, `make`, `npm run`,\n`cargo`) that can run arbitrary code, so treat the allowlist as a convenience\nfilter, not a sandbox; the per-command confirmation is the real authorization\nboundary. Keep `shell_skip_confirm = true` (the default) only when you trust the\nsession; otherwise set it to `false`.\n\n``` lua\nrequire(\"cody\").setup({\n  enable_shell = true,\n  -- shell_skip_confirm = true,                            -- skip the per-command prompt (use with care)\n  -- shell_allowlist = { \"npm\", \"git\", \"make\", \"cargo\" },  -- replaces the built-in default set\n})\n```\n\nThen ask, for example, `:CodyDo run the tests` or say \"git status\". Changing\n`enable_shell` requires restarting the bridge (`:CodyStop` then `:CodyStart`).\n\nCody can also run **Ex commands** (the kind you type after `:`) as the\n`editor_command` tool, so voice/text like \"open the transcript\" or \"split the\nwindow\" maps to `:CodyTranscript` / `:split`. 
Like the shell tool, it is **on by\ndefault once setup() runs** (set `enable_commands = false` to disable), and only\n**allowlisted** command names run; `:!`, `:lua`, the `!` variant, and `|` chaining\nare rejected.\n\n``` lua\nrequire(\"cody\").setup({\n  enable_commands = true,\n  -- commands_confirm = true,                      -- ask before each command (default off; allowlist is the guard)\n  -- commands_allowlist = { \"CodyTranscript\", \"split\", \"MyCmd\" }, -- replaces the built-in set\n})\n```\n\nThe default allowlist covers safe Cody/display/navigation commands plus the\nbuilt-in **netrw file explorer** (`CodyTranscript`, `CodyFeedbackOpen`,\n`CodyCapabilities`, `split`, `vsplit`, `only`, `close`, `wincmd`, `nohlsearch`,\n`redraw`, `Explore`, `Lexplore`, `Sexplore`, `Vexplore`, …). File-*writing*/buffer-loading\ncommands (`write`, `update`, `edit`, `tabnew`, …) are intentionally **excluded**\nbecause, with a path argument, they write or load arbitrary files; add them via\n`commands_allowlist` only if you want that (ideally with `commands_confirm = true`).\nThen say things like \"open the transcript\", \"open the file tree\", or\n`:CodyDo show the feedback panel`.\n\nTo change a setting by voice, Cody runs `:CodySet <key> <value>` (also usable\ndirectly):\n\n```\n:CodySet feedback_height 30\n:CodySet show_assistant_messages false\n```\n\n`:CodySet` only changes **live-applicable display keys** (`feedback_height`,\n`feedback_width`, `feedback_recent_lines`, `feedback_conversation_items`,\n`context_max_lines`, `context_max_bytes`, `show_assistant_messages`), which take\neffect immediately. 
Env-derived flags (`enable_shell`, `enable_commands`,\n`tts_*`) and the confirm-guard toggles (`commands_confirm`, `shell_skip_confirm`)\nare **not** runtime-settable; set them in `setup()` (and restart the bridge for\nthe env-derived ones).\n\n```\n:CodyStart\n:CodyStop\n:CodyDo go to line 48\n:CodyDo go to file lua/cody/init.lua\n:CodyDo edit this line to handle nil paths\n:CodyCapabilities\n:CodyCapabilities json\n:CodyFeedback\n:CodyFeedbackOpen\n:CodyFeedbackClose\n:CodyFeedbackClear\n:CodyTranscript\n:CodySet feedback_height 30\n:CodyInstall\n:CodyInstall telescope.nvim\n:CodyInstall json\n:CodyStartTsLsp\n:CodyVoiceStart\n:CodyVoiceSession\n:CodyVoicePress\n:CodyVoiceRelease\n:CodyVoiceStop\n:CodyTtsStatus\n:CodyTtsSmoke\n```\n\n`:CodyInstall` explains missing installable providers and renders lazy.nvim\nspecs when lazy.nvim is detected. `:CodyInstall <provider>` asks for explicit\nconfirmation, then copies the suggested spec to a register; it does not edit\nplugin configuration or install anything silently.\n\nFor local TypeScript/JavaScript testing without your own LSP config, Cody includes\nan explicit helper that starts Neovim's built-in LSP client against the repo-local\n`typescript-language-server`:\n\n```\n:e src/realtime-session.ts\n:CodyStartTsLsp\n:CodyCapabilities\n```\n\n`CodyStartTsLsp` is opt-in and only attaches to the current JS/TS buffer. Use\n`:CodyStartTsLsp!` to force it for an unusual filetype.\n\nFeedback panel controls:\n\n```\n:CodyFeedback       \" toggle\n:CodyFeedbackOpen\n:CodyFeedbackClose\n:CodyFeedbackClear\n:CodyTranscript     \" full conversation in a scrollable window (q to close)\n```\n\nThe feedback panel is a compact, non-focusable HUD: it shows only the most recent\nlines that fit `feedback_height` and redraws on every event, so you cannot scroll\nit. 
To read or scroll a long (or streamed) assistant message, open\n`:CodyTranscript`, a focusable, wrapping window with the full conversation\n(`q` to close, normal motions / `<C-d>` / `<C-u>` to scroll). To make the inline\npanel itself taller, raise `feedback_height` (and optionally lower\n`feedback_recent_lines` to give the conversation more room):\n\n``` lua\nrequire(\"cody\").setup({\n  feedback_height = 30,      -- default 16; capped to the editor height\n  feedback_recent_lines = 2, -- default 4; fewer event lines = more message room\n})\n```\n\nSuggested push-to-talk mapping:\n\n```\nvim.keymap.set(\"n\", \"<leader>vs\", \"<cmd>CodyVoiceStart<cr>\")\nvim.keymap.set(\"n\", \"<leader>vl\", \"<cmd>CodyVoiceSession<cr>\")\nvim.keymap.set(\"n\", \"<leader>vp\", \"<cmd>CodyVoicePress<cr>\")\nvim.keymap.set(\"n\", \"<leader>vr\", \"<cmd>CodyVoiceRelease<cr>\")\nvim.keymap.set(\"n\", \"<leader>ve\", \"<cmd>CodyVoiceStop<cr>\") -- cancel/stop fallback\nvim.keymap.set(\"n\", \"<leader>cd\", \":CodyDo \")\n```\n\nVoice uses Realtime server-side VAD. Normal flow is `:CodyVoiceStart`, speak a\nshort command, then stop speaking; Cody stops recording and submits the turn when\nthe server reports speech has ended. For explicit turn boundaries, bind\n`:CodyVoicePress` to key down and `:CodyVoiceRelease` to key up where your\nkeymap layer supports that shape. `:CodyVoiceStop` cancels the current recorder,\nmodel response, and queued tool results.\n\nFor a persistent microphone session:\n\n```\n:CodyVoiceSession\n\" speak commands one at a time\n\" say: stop listening\n```\n\n`CodyVoiceSession` keeps the recorder open across VAD turns. When you say \"stop\nlistening\", the model should call Cody's `cody_stop_voice_session` tool and the\nbridge stops the recorder.\n\nGeneral search uses whichever picker Cody detects as available. 
For example, if Telescope is loaded:\n\n```\n:CodyDo find auth service\n:CodyDo search for auth token in the workspace\n```\n\nThe generated picker tool receives `mode = \"files\"` for file/path search and\n`mode = \"grep\"` for workspace text search.\n\n```\nexport OPENAI_API_KEY=\"sk-...\"\nexport OPENAI_REALTIME_MODEL=\"gpt-realtime-2\"\nexport CODY_AUDIO_DEVICE=\"\" # optional sox device override\nexport CODY_ENABLE_SHELL=\"1\" # shell tool; normally set via enable_shell in setup() (default on)\nexport CODY_ENABLE_COMMANDS=\"1\" # Ex-command tool; normally set via enable_commands in setup() (default on)\n\n# Optional spoken feedback (see \"Spoken Feedback (TTS)\")\nexport ELEVENLABS_API_KEY=\"...\"\nexport ELEVENLABS_VOICE_ID=\"<elevenlabs-voice-id>\"\nexport ELEVENLABS_MODEL_ID=\"eleven_flash_v2_5\" # optional\nexport ELEVENLABS_OUTPUT_FORMAT=\"mp3_22050_32\" # optional\nexport CODY_TTS_REQUEST_TIMEOUT_MS=\"10000\"     # optional\nexport CODY_TTS_PLAYER_COMMAND=\"afplay\"        # optional, non-macOS players\n```\n\n`gpt-realtime-2` is the default because the current OpenAI Realtime docs use it in the WebSocket and session examples.\n\nLocal deterministic checks:\n\n```\nnpm run typecheck\nnpm test\nlua test/adapter_spec.lua\nnvim -l test/tts_env_spec.lua\nnvim -l test/shell_handler_spec.lua\nnvim -l test/command_handler_spec.lua\n```\n\n`npm test` covers the TypeScript bridge, including the TTS feedback-to-speech\nmapping and cancellation. 
`test/tts_env_spec.lua` runs under Neovim's LuaJIT\n(not the system `lua`) and checks the bridge environment built from the TTS\nconfig.\n\nLive router evals use the actual Realtime model with fake editor context/capabilities and check the first selected tool:\n\n```\nexport OPENAI_API_KEY=\"sk-...\"\nnpm run eval:router\n```\n\nIf `OPENAI_API_KEY` is not set, `eval:router` skips without failing.\nSearch evals use fixture files under `test/fixtures/search`.\n\nCody is deliberately narrow:\n\n- It should feel like a modal editor command layer, not a chat sidebar.\n- Voice commands should be short and imperative.\n- Navigation should reuse native editor commands or the user's preferred picker.\n- The model should use tools, not narrate pretend actions.\n- Write tools are limited to the active editor buffers.\n\n- **Command adapter**: detect native/LSP/plugin commands and expose them as Realtime tools.\n- **Provider installer**: install missing command providers through `lazy.nvim` or another detected package manager.\n- **Realtime text loop**: typed commands call the adapter through GPT Realtime.\n- **Push-to-talk voice**: voice commands flow through the same adapter.\n- **Smarter edits**: add Tree-sitter context and stricter write guardrails.\n\nUseful later steps:\n\n- Add a native push-to-talk key listener for press/release instead of two Vim commands.\n- Add a panel focus/scrollback mode and a copy/export command.\n\nDone:\n\n- Optional ElevenLabs spoken feedback driven by the feedback event stream (see \"Spoken Feedback (TTS)\").\n- Tree-sitter context for function/class-aware edits.\n- A small eval suite for command parsing and tool selection.", "url": "https://wpnews.pro/news/show-hn-cody-voice-control-for-neovim-using-its-own-commands-lsp", "canonical_source": "https://github.com/juancgarza/cody", "published_at": "2026-06-20 18:08:07+00:00", "updated_at": "2026-06-20 18:37:41.652049+00:00", "lang": "en", "topics": ["developer-tools", 
"artificial-intelligence", "large-language-models", "ai-agents"], "entities": ["Cody", "Neovim", "OpenAI", "GitHub", "Juan Garza", "LSP", "Telescope", "CodeCompanion"], "alternates": {"html": "https://wpnews.pro/news/show-hn-cody-voice-control-for-neovim-using-its-own-commands-lsp", "markdown": "https://wpnews.pro/news/show-hn-cody-voice-control-for-neovim-using-its-own-commands-lsp.md", "text": "https://wpnews.pro/news/show-hn-cody-voice-control-for-neovim-using-its-own-commands-lsp.txt", "jsonld": "https://wpnews.pro/news/show-hn-cody-voice-control-for-neovim-using-its-own-commands-lsp.jsonld"}}