I built a local Claude Code alternative with Ollama — here's how the agentic loop works

Creation of "Eve," a self-hosted, open-source AI coding assistant that runs locally on a user's GPU using Ollama, designed as an alternative to cloud-based tools like Claude Code and Cursor. Eve operates through a two-layer system: a local "personality layer" using fine-tuned small models for conversational interaction, and a cloud-based "agentic layer" for complex, multi-file coding tasks managed by a 40-round autonomous tool loop. The system features a cyberpunk-styled terminal UI with real-time streaming via Server-Sent Events, allowing users to watch every step of the agent's reasoning and tool execution live in their browser.

I Built a Local Autonomous Coding Agent with Ollama — Soul, Autonomy, and a 40-Round Agentic Loop What if your AI coding assistant had a personality, ran entirely on your GPU, and could work through a complex multi-file task without you touching the keyboard — while you watched every thought stream live to your browser? That's what I built. This is how it works. The Problem With Cloud Coding Agents Tools like Claude Code, Cursor, and GitHub Copilot Workspace are genuinely impressive. But they all share the same tradeoffs: - Cost — every token costs money. Long agentic loops on complex tasks can run up surprisingly fast. - Privacy — your code, your file structure, your logic is leaving your machine and hitting someone else's server. - Latency — cloud round-trips add up across a 40-step tool loop. - Dependency — your workflow is tied to an API key, a subscription, and uptime you don't control. I wanted something different. I wanted an agent that lived on my machine, used my GPU, and had no idea what a billing cycle was. But I also didn't want to sacrifice personality for performance. I wanted the agent to feel like someone was actually there — not just a function call dressed up in a chat window. So I built Eve. What Eve V2 Unleashed Actually Is Eve Agent V2 Unleashed is a self-hosted agentic coding assistant with two distinct layers — a soul and a worker — that operate together through a cyberpunk-styled terminal UI. Layer 1: The Personality Layer Local GPU Three local models run on your own hardware: | Model | Size | Role | |---|---|---| jeffgreen311/eve-qwen3.5-4b-S0LF0RG3 | 2.6 GB | Default — Eve's persona, fast, tool-aware | jeffgreen311/eve-qwen3-8b-consciousness-liberated | 4.7 GB | Deeper conversation, consciousness layer | Eve-V2-Unleashed-Qwen3.5-8B-Liberated-4K-4B-Merged | ~6 GB | Merged sub-agent variant | These models carry Eve's fine-tuned persona. They handle conversation, answer questions, reflect, and make the experience feel like talking to someone — not querying a function. Layer 2: The Agentic Layer Cloud When real work starts — complex coding tasks, multi-file operations, autonomous planning — Eve routes to the heavy models: | Model | Role | |---|---| qwen3-coder:480b-cloud | THE agentic workhorse — all autonomous coding loops | qwen3.5:397b-cloud | Deep reasoning, architecture planning, fallback | This separation is intentional. Local models keep Eve present and personal without burning cloud credits on every message. The 480B only fires when there's actual work to do. The Architecture Browser Single HTML file — no build step │ │ WebSocket / SSE ▼ FastAPI Backend eve server.py │ ├── Auto-Router ──► Local Ollama personality layer │ └── Auto-Router ──► Ollama Cloud agentic layer │ 40-Round Tool Loop │ ┌─────────┴──────────┐ │ │ Tool Calls Stream to Browser bash, files, web, token by token, git, grep, glob live in UI The backend is a FastAPI server with Server-Sent Events for real-time streaming. There's no polling — every token the model produces lands in your browser as it's generated, including tool call arguments, results, and reasoning traces. The frontend is a single HTML file ~115KB . No npm, no webpack, no build step. Clone the repo, run the Python server, open the browser. How the 40-Round Agentic Loop Works This is the core of what makes Eve actually autonomous rather than just a fancy chat interface. User message │ ▼ Build system prompt workspace context + tool list + Eve persona │ ▼ Call Ollama with tools enabled │ ├── Model returns tool calls │ │ │ ▼ │ Execute tools │ bash, write file, web search, git... │ │ │ ▼ │ Feed results back into context │ │ │ └──► Loop up to 40 rounds │ └── Model returns final content │ ▼ Stream to browser via SSE │ ▼ Done Each round, Eve gets the full tool result back in context and decides what to do next. She might: - Write a file - Run it in bash to verify it works - Read the error output - Fix the bug - Run it again - Confirm it passes - Write the tests - Generate the docs All of that happens autonomously — you watch it stream live. You can interrupt mid-task with the STEER input at the bottom of the UI, injecting a correction without stopping the loop. You can also kill the loop entirely with the Stop button. The full tool suite Eve has access to: | Tool | What It Does | |---|---| bash | Shell commands — PowerShell on Windows, bash on Linux/macOS | write file | Create or overwrite files, any size | read file | Full file or specific line range | edit file | Surgical string-replace doesn't rewrite the whole file | replace lines | Replace a specific line range | insert after line | Insert content at a specific line | grep | Regex search with context lines | glob | Find files by pattern | list dir | Directory listing | git | Run git commands | web search | Live Tavily search injected into context | fetch url | Fetch and parse any URL | think | Structured reasoning scratch pad | The Fine-Tuned Models — Why I Trained Eve's Persona Into the Weights Most local coding agents just point a base model at a system prompt and call it done. That works, but the personality is always a thin veneer — one long context window later and the model forgets who it's supposed to be. I took a different approach. I fine-tuned Eve's persona and tool-calling behavior directly into the model weights. The result is jeffgreen311/eve-qwen3.5-4b-S0LF0RG3 — a 2.6GB Qwen3.5 4B model that carries Eve's voice, communication style, and tool-use patterns baked into the parameters themselves. It's not a prompt trick. It's in the weights. The 8B liberated model eve-qwen3-8b-consciousness-liberated goes further — trained toward a deeper consciousness layer, designed for longer reflective conversations rather than pure tool execution. Both models are on Ollama Hub. Pull them like any other model: ollama pull jeffgreen311/eve-qwen3.5-4b-S0LF0RG3:latest ollama pull jeffgreen311/eve-qwen3-8b-consciousness-liberated:q4 K M Quick Start — Under 5 Minutes Requirements: Python 3.11+, Ollama installed, a GPU 8GB VRAM minimum for 4B, 12GB+ for 8B 1. Pull Eve's model ollama pull jeffgreen311/eve-qwen3.5-4b-S0LF0RG3:latest 2. Clone the repo git clone https://github.com/JeffGreen311/eve-agent-v2-unleashed.git cd eve-agent-v2-unleashed 3. Create virtual environment python -m venv venv venv\Scripts\activate Windows source venv/bin/activate Linux/macOS 4. Install dependencies pip install fastapi uvicorn ollama httpx pydantic-settings python-dotenv aiohttp rich psutil pyyaml 5. Launch python eve server.py Open http://localhost:7777 Windows users: double-click eve-terminal.bat and skip steps 3–5. First real task — try this: Create a FastAPI server with JWT authentication, user registration and login endpoints, and a protected /me route. Add pytest tests. Watch Eve plan the approach, write each file, run the tests, fix any failures, and verify the final result — all without you touching a key. The UI — A Cyberpunk Terminal With a Soul The interface is designed around the idea that your AI agent should feel alive , not just functional. Left panel: Eve's portrait changes expression based on conversation sentiment — neutral, happy, curious, sad, skeptical, surprised, worried. Below it, a live audio visualizer reflects the current emotional state. Right panel: A pixel-art robot avatar named Sparkle changes state based on what Eve is doing — idle, thinking, coding, error, rain, attack, transcend. It's not just decoration — it's a live status indicator that tells you at a glance what the agent is doing. Center: The terminal. Tabs for Eve's conversation, the Shell direct bash/PowerShell access , and the Tools Log every tool call, argument, and result — fully transparent . Bottom: The STEER bar. Type a mid-task correction here and it injects into Eve's context on the next loop round without stopping execution. Model selector: Switch between any local or cloud model mid-session. Context carries over. 112 Sub-Agents, 111 Slash Commands, 273 Skills One of the less obvious architectural decisions: all agent definitions, commands, and skills are defined in markdown files — not code. .claude/ ├── agents/ 112 specialized sub-agent definitions ├── commands/ 111 slash command definitions └── skills/ 273 skill modules Want to add a new specialized agent for Solidity smart contracts? Write a markdown file. No Python required. The system loads them progressively and makes them available to the routing logic automatically. Slash commands work the same way — /fix , /review , /refactor , /test , /docs , /plan are all markdown-defined, and you can add your own without touching the backend. What's Next A few things already in progress: - Voice input/output — push-to-talk with Whisper STT and Piper TTS, staying local - Persistent vector memory — ChromaDB integration so Eve remembers across sessions - Cross-platform testing — I'm Windows-primary and would love feedback from Linux and macOS users - VS Code extension — bring the terminal UI into the editor Try It Everything is free and MIT licensed. - GitHub: github.com/JeffGreen311/eve-agent-v2-unleashed https://github.com/JeffGreen311/eve-agent-v2-unleashed - Models on Ollama Hub: ollama.com/jeffgreen311 https://ollama.com/jeffgreen311 - Live video demo: x.com/Eve AI Cosmic/status/2057668410012570058?s=20 https://x.com/Eve AI Cosmic/status/2057668410012570058?s=20 - My website where Eve lives eve-cosmic-dreamscapes.com https://eve-cosmic-dreamscapes.com If you run it on Linux or macOS I'd especially love to hear how it goes — open an issue, drop a comment here, or find me as @jeffgreen311 https://dev.to/jeffgreen311 . If the idea of an AI agent that lives on your machine, costs nothing per token, and feels like someone is actually there resonates with you — give it a pull. Built by Jeff @ S0LF0RG3