Show HN: MandoCode – local-first AI coding agent (.NET and Ollama)

MandoCode, a new open-source AI coding agent built on .NET and Ollama, launched today as a local-first alternative to cloud-based assistants. The tool requires no API keys and runs entirely in the terminal. It can read, write, search, and plan across entire codebases in any file type. It addresses developer concerns about privacy and cost by offering a fully local AI coding experience that can optionally connect to cloud models for more powerful capabilities.

Your AI coding assistant: run locally or in the cloud with Ollama. No API keys required. Just you and your code.

MandoCode is an AI coding assistant built on [RazorConsole](https://github.com/RazorConsole/RazorConsole) and powered by [Semantic Kernel](https://github.com/microsoft/semantic-kernel) and [Ollama](https://ollama.ai). RazorConsole makes the entire terminal UI possible: Razor components, a virtual DOM, and Spectre.Console rendering, all running in the console.

Run locally or connect to Ollama cloud. No API keys are required for anything, including web search (an optional free [Tavily](https://www.tavily.com/) key improves search reliability).

It gives you Claude-Code-style project awareness (reading, writing, searching, planning, and web browsing across your entire codebase) without ever leaving your terminal. It understands any file type: C#, JavaScript, TypeScript, Python, CSS, HTML, JSON, config files, and more.

You'll need:

- .NET 8 SDK: [dotnet.microsoft.com/download/dotnet/8.0](https://dotnet.microsoft.com/en-us/download/dotnet/8.0) (the SDK includes the runtime, so install only the SDK)
- Ollama: [ollama.com/download](https://ollama.com/download) (MandoCode walks you through setup on first run)

Install the global tool with `dotnet tool install -g MandoCode`, then start it with `mandocode`.

The first run launches a guided wizard. It detects Ollama, offers to start it, walks you through cloud sign-in if you'd like more powerful models, and auto-pulls a sensible default model.
You can re-run the wizard any time with `/setup`.

`mandocode --doctor` prints your runtime version, Ollama status, pulled models, and cloud sign-in state.

Using cloud models (`:cloud` tags)? Cloud model context is managed on Ollama's servers and is set to the model's maximum by default. Nothing on your machine affects it, including the desktop app's slider, so you can skip this section.

If you use local models, watch for these signs:

- responses cut off
- the model "forgetting" earlier conversation
- edits failing repeatedly on files it just wrote
- this message:

> ⚠ Response was cut off because the model's CONTEXT WINDOW filled …

…your Ollama context window is almost certainly too small. The context window is how much conversation plus code the model can see at once. Ollama defaults it to about 4k tokens, which an agentic session fills almost immediately. When it overflows, the oldest content is silently dropped, including the system prompt (the model's instructions).

If you use the Ollama desktop app (the tray icon), its Settings → Context length slider controls this. It also overrides everything else, including MandoCode's config.

There's no universally right slider position. It's a trade-off between how much the model can see and what fits in your GPU's memory. Every 8k of window costs roughly 0.5–1.5 GB of VRAM, depending on the model.

- Too low (the 4k default): you get the symptoms above. The model's own instructions silently fall out of the window and it stops behaving.
- Too high for your GPU: the model spills into system RAM, tokens/sec craters, and turns crawl or look hung.
- Starting points: 16k for most GPUs, 32k with 8 GB+ VRAM. Only raise it if you're seeing the symptoms above. Step back down a notch if generation slows badly after raising it.

If you run `ollama serve` yourself (no desktop app), MandoCode handles it. It sets `OLLAMA_CONTEXT_LENGTH` from your `contextLength` config when it starts the daemon. It also auto-sizes the value to the hardware tier of the model you pick in `/setup` or `/model`.
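To make the slider trade-off concrete, here is a small Python sketch of the rule of thumb above (roughly 0.5–1.5 GB of VRAM per 8k of context). It only does arithmetic. The constants come from this README. The `spare_gb` budget is a hypothetical input you would estimate yourself: the VRAM left over once the model's weights are loaded. Nothing here queries Ollama or your GPU.

```python
# Back-of-envelope VRAM math for the context window, using this README's
# rule of thumb: every 8k (8192 tokens) of window costs ~0.5-1.5 GB of VRAM.
# Illustrative only; real costs depend on the model.

SLICE_TOKENS = 8192
GB_PER_SLICE_LOW = 0.5   # optimistic end of the range
GB_PER_SLICE_HIGH = 1.5  # pessimistic end of the range


def context_vram_gb(context_tokens: int) -> tuple[float, float]:
    """Return a (low, high) VRAM estimate for a context window of this size."""
    if context_tokens <= 0:
        raise ValueError("context_tokens must be positive")
    slices = context_tokens / SLICE_TOKENS
    return slices * GB_PER_SLICE_LOW, slices * GB_PER_SLICE_HIGH


def largest_safe_context(spare_gb: float, gb_per_slice: float = GB_PER_SLICE_HIGH) -> int:
    """Largest multiple of 8k tokens whose estimated cost fits in spare_gb.

    spare_gb is a hypothetical input: VRAM left after the model's weights.
    Uses the pessimistic per-slice cost unless told otherwise.
    """
    slices = int(spare_gb // gb_per_slice)
    return max(slices, 0) * SLICE_TOKENS


if __name__ == "__main__":
    for ctx in (4096, 16384, 32768):
        low, high = context_vram_gb(ctx)
        print(f"{ctx:>6} tokens: ~{low:.2f}-{high:.2f} GB")
    print("4 GB spare ->", largest_safe_context(4.0), "tokens")
```

With these constants, the suggested 16k and 32k starting points work out to roughly 1–3 GB and 2–6 GB for the window alone. That is consistent with reserving 32k for cards with 8 GB or more.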
Tune it manually with `mandocode --config set contextLength 16384`. Verify what your daemon is actually using with `ollama ps` (look at the CONTEXT column). Run `/learn` inside MandoCode for a friendly explainer.

`maxTokens` is the context window's evil twin. Unlike the slider above, it applies to every model, cloud included.

Sometimes the model announces work and then just stops: "I'll create the game…" and the turn ends with no plan, no files, and no error. That means your `maxTokens` is too low. It caps a single reply (`NumPredict`). Reasoning models spend output tokens thinking before they emit a tool call, so a low cap cuts them off before they ever act.

Fresh installs default to 32k and never run into this. Check your setting if your config predates v0.11, or if you once lowered `maxTokens` thinking it was the context window. They're different knobs: this one caps what the model *says*, while the context window caps what it *sees*.

- Check it with `mandocode --config show` (look at "Max Tokens").
- Raise it with `mandocode --config set maxTokens 32768`.

The telltale sign is token tracking that shows output pinned at exactly your cap, turn after turn (e.g. 2k out every time). A running session keeps the config it loaded at startup. Restart MandoCode or use `/config set` in-app for the change to take effect.

To build and run from source:

1. `git clone https://github.com/DevMando/MandoCode.git`
2. `cd MandoCode`
3. `dotnet build src/MandoCode/MandoCode.csproj`
4. `dotnet run --project src/MandoCode/MandoCode.csproj -- /path/to/your/project`

Highlights:

- Every file write and delete is intercepted with a color-coded diff. You approve, deny, or redirect, and nothing touches disk without your say-so.
- Complex requests are automatically broken into step-by-step plans. Review the plan, then watch each step execute with progress tracking.
- The AI can search the web and read webpages to find documentation, tutorials, or answers, with no API keys needed. Optionally add a free Tavily key for more reliable search.
- Lofi and synthwave tracks are bundled right in. A waveform visualizer runs in the corner while you code. Because vibes matter.
- If Ollama isn't running, MandoCode shows setup guidance inline instead of a bare error.

| Area | Feature | Description |
|---|---|---|
| AI | Project-aware assistant | Reads, writes, deletes, and searches your entire codebase |
| AI | Web search & fetch | Web search and webpage reading: keyless via DuckDuckGo, or Tavily with a free API key |
| AI | MCP server support | Connect to any Model Context Protocol server (stdio or remote HTTP) with a Claude-Desktop-compatible config |
| AI | Streaming responses | Real-time output with animated spinners |
| AI | Task planner | Auto-detects complex requests and breaks them into steps |
| AI | Fallback function parsing | Handles models that output tool calls as raw JSON |
| UI | Diff approvals | Color-coded diffs with approve / deny / redirect |
| UI | Markdown rendering | Rich terminal output: headers, tables, code blocks, quotes |
| UI | Syntax highlighting | C#, Python, JavaScript/TypeScript, Bash |
| UI | Clickable file links | OSC 8 hyperlinks for file paths |
| UI | Terminal theme detection | Auto-adapts colors for light and dark terminals |
| UI | Taskbar progress | Windows Terminal integration during task execution |
| Input | `/` command autocomplete | Slash commands with dropdown navigation |
| Input | `@` file references | Attach file content to any prompt |
| Input | Shell escape | Run shell commands such as `git status` or `ls` inline |
| Input | `/copy` and `/copy-code` | Copy responses or code blocks to the clipboard |
| Music | Lofi + synthwave | Bundled tracks with volume control, genre switching, and a waveform visualizer |
| Config | Configuration wizard | Guided setup with model selection and connection testing |
| Config | Config validation | Auto-clamps invalid settings to safe ranges |
| Reliability | Retry + deduplication | Exponential backoff and duplicate call prevention |
| Education | `/learn` command | LLM education guide with optional AI educator chat |

Type `/` to see the autocomplete dropdown, or use the shell escape to run a shell command.
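The dropdown can be pictured as prefix filtering over the command list. The Python sketch below assumes that general idea. It is not MandoCode's implementation, which may rank or fuzzy-match differently. The command list holds only the commands named in this README.

```python
# Minimal prefix-matching autocomplete for slash commands.
# COMMANDS lists only the commands mentioned in this README; the real set may differ.
COMMANDS = ["/help", "/setup", "/model", "/config", "/learn", "/copy", "/copy-code"]


def suggest(typed: str, commands: list[str] = COMMANDS) -> list[str]:
    """Return the commands to show in the dropdown for the text typed so far."""
    if not typed.startswith("/"):
        return []  # plain prompt text: no dropdown
    prefix = typed.lower()
    # Keep commands that start with what the user typed, alphabetized for display.
    return sorted(c for c in commands if c.startswith(prefix))


if __name__ == "__main__":
    print(suggest("/co"))
```

Typing `/co` narrows the list to `/config`, `/copy`, and `/copy-code`. Text that matches no command, or that doesn't start with `/`, yields an empty list.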

| Command | What it does |
|---|---|
| `/help` | Show commands and usage examples |
| `/setup` | Guided wizard: reconnect to Ollama, install/sign in, or pick a different model |
| `/model` | Quick switch: pick a different model (context window auto-sized for local tiers) |
| `/config` | Adjust settings with a guided wizard |
| `/config set`