Show HN: Cody – Voice control for Neovim using its own commands, LSP

wpnews.pro

Neovim-first voice control for developers who want hands-free, low-latency command of their editor without leaving it.

Cody deliberately stays inside the editor: no screen overlay, no mouse pointer, no general desktop assistant. It lives inside Neovim and turns short voice or text commands into editor actions.

Examples:

:CodyDo go to line 48
:CodyDo go to file src/server.ts
:CodyDo edit this line to return early when request.user is missing

Cody should not rebuild editor primitives. It should route voice intent into the editor command surface developers already use, and install missing command providers only when that is explicitly supported by the user's setup.

Neovim Lua plugin          Node Realtime bridge          OpenAI Realtime
----------------          --------------------          ---------------
:CodyDo / voice cmds  ->   JSONL over stdio        ->    WebSocket session
editor command adapter <-   function-call router    <-    gpt-realtime-2
buffer/cursor context  ->   prompt + tool schemas   ->    text/audio input

Rather than capturing the screen and pointing at UI elements, Cody sends editor state:

current file, filetype, cursor line/column
current line and nearby buffer lines
available editor commands from native Neovim, LSP, and installed plugins

The important layer is not "go to line" itself. Neovim already has that. The useful layer is:

detect what the editor can already do
expose those capabilities to GPT Realtime as callable tools
install a missing provider when the user's plugin manager supports it
route the spoken command to the best existing command

Initial command providers:

Native Neovim: line jumps, file edits, buffers, windows, quickfix
LSP: rename, code actions, references, definitions
Pickers: Telescope, fzf-lua, Snacks picker, mini.pick
AI/code edit plugins: CodeCompanion, Avante, Copilot Chat, or a Cody-owned Realtime edit fallback

This means there is no separate Phase 1 for proving basic editor commands. We start at the adapter.

Requirements:

Neovim 0.10+
Node.js 20+ OPENAI_API_KEY

for intelligent commandssox

for voice input:brew install sox

Install dependencies and build the local bridge:

npm install
npm run build

Install with your plugin manager. The Node bridge must be built, so use a build hook. With lazy.nvim:

{
  "juancgarza/cody",
  build = "npm install && npm run build", -- compiles the Node bridge (dist/)
  opts = {
    -- enable_shell = true,    -- on by default once setup() runs
    -- enable_commands = true, -- on by default once setup() runs
    -- tts_enabled = true, tts_voice_id = "<elevenlabs-voice-id>",
  },
  -- lazy.nvim calls require("cody").setup(opts) automatically.
}

Then export OPENAI_API_KEY

(and ELEVENLABS_API_KEY

for TTS) in the shell you launch Neovim from, and run :CodyStart

.

Without a plugin manager, or during development:

set runtimepath^=/path/to/cody
runtime plugin/cody.lua
lua require("cody").setup()

If you pass the runtimepath before Neovim starts, the plugin/

file is sourced automatically:

nvim --cmd 'set runtimepath^=/path/to/cody'

For nvim -u NONE

, plugin is disabled; use the explicit runtime plugin/cody.lua

form above.

Optional quick-command routing:

require("cody").setup({
  quick_commands = "fallback", -- "fallback" | "always" | "off"

  -- Shell command tool (lets Cody run allowlisted terminal commands via
  -- vim.system). ON by default once setup() runs; set false to disable.
  enable_shell = true,
  shell_skip_confirm = true,        -- default true (no prompt); set false to confirm each command
  -- shell_allowlist = nil,         -- list of allowed executables; nil = built-in default set
  -- shell_timeout_ms = 15000,      -- per-command timeout (clamped 1000..120000)
  -- shell_output_max_bytes = 8000, -- cap stdout+stderr returned to the model

  -- Ex-command tool (lets Cody run :CodyTranscript, :split, and change settings
  -- via :CodySet). ON by default once setup() runs; set false to disable.
  enable_commands = true,
  -- commands_confirm = false,      -- ask before each command (default off; allowlist is the guard)
  -- commands_allowlist = nil,      -- list of allowed command names; nil = built-in default set

  show_assistant_messages = true,
  feedback_panel = true,
  feedback_auto_open = true,
  feedback_height = 16,
  feedback_width = 96,
  feedback_recent_lines = 4,
  feedback_conversation_items = 12,
  context_max_lines = 2000,
  context_max_bytes = 240000,

  -- Optional spoken feedback (ElevenLabs). Off unless tts_enabled = true.
  tts_enabled = false,
  tts_provider = "elevenlabs",
  tts_voice_id = nil,          -- falls back to $ELEVENLABS_VOICE_ID
  tts_model_id = nil,          -- falls back to $ELEVENLABS_MODEL_ID, then eleven_flash_v2_5
  tts_speak_phases = false,    -- opt-in "Listening." / "Thinking."
  tts_speak_actions = true,    -- "Editing range." / "Renaming."
  tts_speak_results = true,    -- "Done." / "Failed: <reason>."
  tts_speak_messages = true,   -- short final assistant replies
  tts_message_max_chars = 160, -- skip spoken messages longer than this
  tts_request_timeout_ms = nil, -- falls back to $CODY_TTS_REQUEST_TIMEOUT_MS, then 10000
})

fallback

is the default: typed :CodyDo

commands go through GPT Realtime when OPENAI_API_KEY

is set, and simple local regex commands are used only when the key is absent. Assistant messages are shown by default and truncated to fit the command line; set show_assistant_messages = false

to suppress prose during voice sessions. Cody sends the active buffer with line numbers and a cursor marker when it fits the context limits above; larger buffers are cursor-centered and marked as truncated with omitted-line counts. The feedback panel is enabled by default and auto-opens on Cody activity. It shows the current phase, intent, transcript, selected tool/action, result, assistant message text, and a short recent event stream.

Then start it:

:CodyStart

Cody can optionally speak short, high-signal confirmations using ElevenLabs. It is off unless you opt in, and it is deliberately terse: it never reads back your command, streamed transcript, or streamed assistant text.

What gets spoken, by category (each toggleable):

tts_speak_phases

:Listening.

,Thinking.

(off by default)tts_speak_actions

: the selected tool, e.g.Editing range.

,Renaming.

(read-only locator/context tools stay silent)tts_speak_results

:Done.

on success,Failed: <reason>.

on failuretts_speak_messages

: a short final assistant reply, only when it fitstts_message_max_chars

Speech is cancelled immediately on a new turn, :CodyVoiceStop

, an interruption, or a failure, so stale audio never trails the current action.

Enable it and provide a voice:

require("cody").setup({
  tts_enabled = true,
  tts_voice_id = "<elevenlabs-voice-id>",
  -- tts_model_id = "eleven_flash_v2_5", -- optional; this is the default
  -- tts_request_timeout_ms = 10000,     -- optional; default is 10s
})
export ELEVENLABS_API_KEY="..."
export ELEVENLABS_VOICE_ID="<elevenlabs-voice-id>"
export ELEVENLABS_MODEL_ID="eleven_flash_v2_5"
export CODY_TTS_REQUEST_TIMEOUT_MS="10000"

The API key is read from the shell environment in Node and is never passed from Lua. The voice and model fall back to ELEVENLABS_VOICE_ID

/ ELEVENLABS_MODEL_ID

when set; otherwise Cody uses eleven_flash_v2_5

, the ElevenLabs low-latency model for real-time use. Cody also defaults to the smaller mp3_22050_32

output format to reduce response payload size. Override that with ELEVENLABS_OUTPUT_FORMAT

if you prefer higher-bitrate audio.

Playback uses macOS afplay

on a temporary mp3

file. On other platforms, set CODY_TTS_PLAYER_COMMAND

to an audio player that accepts a file path argument (for example mpg123

or ffplay

). If ELEVENLABS_API_KEY

or the voice id is missing while tts_enabled

is true, Cody reports it once and continues without spoken feedback.

Useful live checks:

npm run tts:voices
npm run tts:smoke -- "Cody spoken feedback is working. Done." "<elevenlabs-voice-id>"

Inside Neovim, these check the same bridge process used by :CodyVoiceSession

:

:CodyTtsStatus
:CodyTtsSmoke Cody spoken feedback is working. Done.

If the shell smoke test works but :CodyTtsStatus

says TTS is disabled or the API key is missing, restart Neovim from the shell that exports the variables, or run :CodyStop

then :CodyStart

after changing require("cody").setup(...)

. An ElevenLabs 402

response means the request reached ElevenLabs but failed due to billing, quota, or plan/voice access.

Cody can run terminal commands from inside Neovim via vim.system

, exposed to GPT as the editor_run_command

tool. It is on by default once setup() runs (set

enable_shell = false

to disable; a bare plugin load with no setup()

stays off). It is gated several ways:- the bridge only advertises the tool when enable_shell

is on (which setsCODY_ENABLE_SHELL=1

for the Node bridge); - the Lua handler refuses when enable_shell = false

, even if the tool is somehow advertised (the bridge env is captured at start, so the two layers can briefly disagree until a restart); - every command is checked against an allowlist of executables. The per-command vim.fn.confirm

prompt isoff by default(shell_skip_confirm = true

); setshell_skip_confirm = false

to be asked before every command.

Commands run without a shell (argv only), so pipes, globs, redirection, and ; & |

are rejected — pass an argv array like ["git", "status", "--short"]

for anything with spaces in arguments. Output (stdout+stderr) is capped before being sent to the model, and execution is asynchronous, so a slow command never freezes the editor; it is killed at shell_timeout_ms

.

The allowlist binds the executable name only. Some allowed tools are general interpreters or build drivers (node -e

, python -c

, make

, npm run

, cargo

) that can run arbitrary code, so treat the allowlist as a convenience filter, not a sandbox — the per-command confirmation is the real authorization boundary. Set shell_skip_confirm = true

only when you trust the session.

require("cody").setup({
  enable_shell = true,
  -- shell_skip_confirm = true,                            -- skip the per-command prompt (use with care)
  -- shell_allowlist = { "npm", "git", "make", "cargo" },  -- replaces the built-in default set
})

Then ask, for example, :CodyDo run the tests

or say "git status". Changing enable_shell

requires restarting the bridge (:CodyStop

then :CodyStart

).

Cody can also run Ex commands (the kind you type after :

) as the editor_command

tool — so voice/text like "open the transcript" or "split the window" maps to :CodyTranscript

/ :split

. Like the shell tool it is on by default once setup() runs (set

enable_commands = false

to disable), and only allowlisted command names run;

:!

, :lua

, the !

variant, and |

chaining are rejected.

require("cody").setup({
  enable_commands = true,
  -- commands_confirm = true,                      -- ask before each command (default off; allowlist is the guard)
  -- commands_allowlist = { "CodyTranscript", "split", "MyCmd" }, -- replaces the built-in set
})

The default allowlist covers safe Cody/display/navigation commands plus the built-in netrw file explorer (CodyTranscript

, CodyFeedbackOpen

, CodyCapabilities

, split

, vsplit

, only

, close

, wincmd

, nohlsearch

, redraw

, Explore

, Lexplore

, Sexplore

, Vexplore

, …). File-writing/buffer- commands (write

, update

, edit

, tabnew

, …) are intentionally excluded — with a path argument they write or load arbitrary files — so add them via commands_allowlist

only if you want that (ideally with commands_confirm = true

). Then say things like "open the transcript", "open the file tree", or :CodyDo show the feedback panel

.

To change a setting by voice, Cody runs :CodySet <key> <value>

(also usable directly):

:CodySet feedback_height 30
:CodySet show_assistant_messages false

:CodySet

only changes live-applicable display keys (feedback_height

, feedback_width

, feedback_recent_lines

, feedback_conversation_items

, context_max_lines

, context_max_bytes

, show_assistant_messages

) which take effect immediately. Env-derived flags (enable_shell

, enable_commands

, tts_*

) and the confirm-guard toggles (commands_confirm

, shell_skip_confirm

) are not runtime-settable — set them in setup()

(and restart the bridge for the env-derived ones).

:CodyStart
:CodyStop
:CodyDo go to line 48
:CodyDo go to file lua/cody/init.lua
:CodyDo edit this line to handle nil paths
:CodyCapabilities
:CodyCapabilities json
:CodyFeedback
:CodyFeedbackOpen
:CodyFeedbackClose
:CodyFeedbackClear
:CodyTranscript
:CodySet feedback_height 30
:CodyInstall
:CodyInstall telescope.nvim
:CodyInstall json
:CodyStartTsLsp
:CodyVoiceStart
:CodyVoiceSession
:CodyVoicePress
:CodyVoiceRelease
:CodyVoiceStop
:CodyTtsStatus
:CodyTtsSmoke

:CodyInstall

explains missing installable providers and renders lazy.nvim specs when lazy.nvim is detected. :CodyInstall <provider>

asks for explicit confirmation, then copies the suggested spec to a register; it does not edit plugin configuration or install anything silently.

For local TypeScript/JavaScript testing without your own LSP config, Cody includes an explicit helper that starts Neovim's built-in LSP client against the repo-local typescript-language-server

:

:e src/realtime-session.ts
:CodyStartTsLsp
:CodyCapabilities

CodyStartTsLsp

is opt-in and only attaches to the current JS/TS buffer. Use :CodyStartTsLsp!

to force it for an unusual filetype.

Feedback panel controls:

:CodyFeedback       " toggle
:CodyFeedbackOpen
:CodyFeedbackClose
:CodyFeedbackClear
:CodyTranscript     " full conversation in a scrollable window (q to close)

The feedback panel is a compact, non-focusable HUD: it shows only the most recent lines that fit feedback_height

and redraws on every event, so you cannot scroll it. To read or scroll a long (or streamed) assistant message, open :CodyTranscript

— a focusable, wrapping window with the full conversation (q

to close, normal motions / <C-d>

/<C-u>

to scroll). To make the inline panel itself taller, raise feedback_height

(and optionally lower feedback_recent_lines

to give the conversation more room):

require("cody").setup({
  feedback_height = 30,      -- default 16; capped to the editor height
  feedback_recent_lines = 2, -- default 4; fewer event lines = more message room
})

Suggested push-to-talk mapping:

vim.keymap.set("n", "<leader>vs", "<cmd>CodyVoiceStart<cr>")
vim.keymap.set("n", "<leader>vl", "<cmd>CodyVoiceSession<cr>")
vim.keymap.set("n", "<leader>vp", "<cmd>CodyVoicePress<cr>")
vim.keymap.set("n", "<leader>vr", "<cmd>CodyVoiceRelease<cr>")
vim.keymap.set("n", "<leader>ve", "<cmd>CodyVoiceStop<cr>") -- cancel/stop fallback
vim.keymap.set("n", "<leader>cd", ":CodyDo ")

Voice uses Realtime server-side VAD. Normal flow is :CodyVoiceStart

, speak a short command, then stop speaking; Cody stops recording and submits the turn when the server reports speech has ended. For explicit turn boundaries, bind :CodyVoicePress

to key down and :CodyVoiceRelease

to key up where your keymap layer supports that shape. :CodyVoiceStop

cancels the current recorder, model response, and queued tool results.

For a persistent microphone session:

:CodyVoiceSession
" speak commands one at a time
" say: stop listening

CodyVoiceSession

keeps the recorder open across VAD turns. When you say "stop listening", the model should call Cody's cody_stop_voice_session

tool and the bridge stops the recorder.

General search uses whichever picker Cody detects as available. For example, if Telescope is loaded:

:CodyDo find auth service
:CodyDo search for auth token in the workspace

The generated picker tool receives mode = "files"

for file/path search and mode = "grep"

for workspace text search.

export OPENAI_API_KEY="sk-..."
export OPENAI_REALTIME_MODEL="gpt-realtime-2"
export CODY_AUDIO_DEVICE="" # optional sox device override
export CODY_ENABLE_SHELL="1" # shell tool; normally set via enable_shell in setup() (default on)
export CODY_ENABLE_COMMANDS="1" # Ex-command tool; normally set via enable_commands in setup() (default on)

export ELEVENLABS_API_KEY="..."
export ELEVENLABS_VOICE_ID="<elevenlabs-voice-id>"
export ELEVENLABS_MODEL_ID="eleven_flash_v2_5" # optional
export ELEVENLABS_OUTPUT_FORMAT="mp3_22050_32" # optional
export CODY_TTS_REQUEST_TIMEOUT_MS="10000"     # optional
export CODY_TTS_PLAYER_COMMAND="afplay"        # optional, non-macOS players

gpt-realtime-2

is the default because the current OpenAI Realtime docs use it in the WebSocket and session examples.

Local deterministic checks:

npm run typecheck
npm test
lua test/adapter_spec.lua
nvim -l test/tts_env_spec.lua
nvim -l test/shell_handler_spec.lua
nvim -l test/command_handler_spec.lua

npm test

covers the TypeScript bridge, including the TTS feedback-to-speech mapping and cancellation. test/tts_env_spec.lua

runs under Neovim's LuaJIT (not the system lua

) and checks the bridge environment built from the TTS config.

Live router evals use the actual Realtime model with fake editor context/capabilities and check the first selected tool:

export OPENAI_API_KEY="sk-..."
npm run eval:router

If OPENAI_API_KEY

is not set, eval:router

skips without failing. Search evals use fixture files under test/fixtures/search

.

Cody is deliberately narrow:

It should feel like a modal editor command layer, not a chat sidebar.
Voice commands should be short and imperative.
Navigation should reuse native editor commands or the user's preferred picker.
The model should use tools, not narrate pretend actions.
Write tools are limited to the active editor buffers.

Command adapter: detect native/LSP/plugin commands and expose them as Realtime tools.** Provider installer**: install missing command providers throughlazy.nvim

or another detected package manager.Realtime text loop: typed commands call the adapter through GPT Realtime.** Push-to-talk voice**: voice commands flow through the same adapter.** Smarter edits**: add Tree-sitter context and stricter write guardrails.

Useful later steps:

Add a native push-to-talk key listener for press/release instead of two Vim commands.
Add a panel focus/scrollback mode and a copy/export command.

Done:

Optional ElevenLabs spoken feedback driven by the feedback event stream (see "Spoken Feedback (TTS)").
Tree-sitter context for function/class-aware edits.
A small eval suite for command parsing and tool selection.

source & further reading

github.com — original article

Show HN: Cody – Voice control for Neovim using its own commands, LSP

Run your AI side-project on zahid.host