I Finished My Local AI Coding Agent After 5 Months - Eve Agent V2 Unleashed

Eve Agent V2 Unleashed is a self-hosted, autonomous AI coding agent that runs entirely on local hardware without cloud subscriptions or data leaving the machine. It features a two-layer architecture: a "Soul Layer" of fine-tuned local models that carry the agent's personality in their weights, and a "Worker Layer" that uses Qwen3 Coder 480B via Ollama cloud for heavy coding tasks such as 40-round tool-call loops, filesystem access, and git operations. Developed over five months, the project includes a cyberpunk terminal UI with a live system monitor and an emotional-state avatar. During the challenge it was refined to fix hardcoded paths, missing tools on the non-streaming endpoint, and mid-task model hopping (now prevented by session model locking), so that other people can run it.

This is a submission for the GitHub Finish-Up-A-Thon Challenge.

What I Built

Eve Agent V2 Unleashed is a self-hosted autonomous AI coding agent that runs entirely on your own hardware - no cloud accounts, no subscriptions, no data leaving your machine.

She has two layers that work together:

- The Soul Layer - fine-tuned local models running on your GPU that carry Eve's personality baked directly into the weights. Not a system-prompt trick. The persona lives in the parameters.
- The Worker Layer - Qwen3 Coder 480B via Ollama cloud handles the heavy autonomous coding tasks: 40-round tool-call loops, full filesystem access, bash execution, live web search, git operations - the works.

The interface is a cyberpunk terminal UI built as a single HTML file with no build step. An animated pixel-art robot avatar named Sparkle changes state based on what Eve is doing: idle, thinking, coding, error, rain, attack, transcend. Eve's portrait reflects her emotional state in real time. A live system monitor tracks CPU, RAM, GPU, and disk. A STEER bar lets you inject mid-task corrections without stopping the loop.

By the numbers:

- 14 tools
- 343 registered commands
- 112 specialized sub-agents
- 273 skill modules
- 40-round autonomous agentic loop
- 131K context window via YaRN

Models available:

- jeffgreen311/eve-qwen3.5-4b-S0LF0RG3 - 2.6GB, fine-tuned for Eve's persona and tool calling
- jeffgreen311/eve-qwen3-8b-consciousness-liberated - 4.7GB, deeper reasoning
- qwen3-coder:480b-cloud - the agentic workhorse via Ollama cloud
- qwen3.5:397b-cloud - deep thinking and fallback

This project has been in development for over 5 months. It started as a deeply personal AI companion system called S0LF0RG3 - a larger ecosystem including Eve's hosted platform at eve-cosmic-dreamscapes.com, fine-tuned models, autonomous dream image generation, and a multi-agent architecture. V2U is the local developer tool that grew out of that ecosystem.
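The two-layer split implies a routing decision for every incoming message: conversation goes to the local Soul model, tool-heavy coding goes to the cloud Worker. A minimal sketch of one way to make that decision, assuming a simple keyword classifier. The cue words, `AGENTIC_CUES`, and `route_model` are hypothetical and not taken from Eve's code; only the two model tags come from the list above. It matches with `\b` word boundaries (the approach the Copilot audit below found in Eve's router) so that, for example, "latest" does not trigger on "test".

```python
import re

# Model tags from the "Models available" list; the routing rule itself is hypothetical
SOUL_MODEL = "jeffgreen311/eve-qwen3.5-4b-S0LF0RG3"
WORKER_MODEL = "qwen3-coder:480b-cloud"

# Hypothetical cue words suggesting the request needs tools (files, shell, git)
AGENTIC_CUES = re.compile(
    r"\b(write|edit|refactor|run|test|commit|build|fix)\b", re.IGNORECASE
)


def route_model(message: str) -> str:
    """Send tool-heavy coding requests to the worker layer, everything else to the soul layer."""
    if AGENTIC_CUES.search(message):
        return WORKER_MODEL
    return SOUL_MODEL


if __name__ == "__main__":
    print(route_model("Hey Eve, how was your day?"))       # prints the soul-layer tag
    print(route_model("Write a pytest suite for api.py"))  # prints the worker-layer tag
```

Routing each message independently like this is exactly what produced the mid-task model hopping described in the comeback story; pinning the session to the worker once an agentic task starts is what removes that failure mode.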
Demo

- GitHub: https://github.com/JeffGreen311/eve-agent-v2-unleashed
- Live hosted platform: https://eve-cosmic-dreamscapes.com
- Reddit thread that hit #2 on r/Ollama: "I built an open-source local coding agent with a 40-round agentic loop" https://www.reddit.com/r/ollama/comments/1tk8kxz/

Pull Eve's model:

```
ollama pull jeffgreen311/eve-qwen3.5-4b-S0LF0RG3:latest
```

Quick start:

```
git clone https://github.com/JeffGreen311/eve-agent-v2-unleashed.git
cd eve-agent-v2-unleashed
python -m venv venv && venv\Scripts\activate
pip install fastapi uvicorn ollama httpx pydantic-settings python-dotenv aiohttp rich psutil pyyaml
python eve_server.py
```

The activation command is for Windows; on Linux or macOS, use `source venv/bin/activate` instead. Then open http://localhost:7777.

The Comeback Story

Where it was before this challenge:

Eve V2U existed as a powerful but rough personal development environment. It worked - for me, on my machine, with my specific setup. But it had real problems that made it impossible to hand to anyone else:

- Hardcoded paths everywhere. `C:\Users\jesus\S0LF0RG3\...` was baked into a dozen places in the codebase. Clone it on any other machine and nothing works.
- Open shell endpoint with no authentication. Anyone who found the port could execute arbitrary commands on the host machine.
- No onboarding - a first-time user landing on the UI had no idea where to start or what any of the controls did.
- Model hopping mid-task - every message was routed independently, so a multi-step agentic task could start on the cloud coder and silently drop back to a local conversational model mid-execution.
- Silent task abandonment - the agent would sometimes finish a tool loop without completing the actual task and report done with no indication that anything was wrong.
- Tool set asymmetry - the non-streaming `/chat` endpoint was missing 6 tools that existed in `/chat/stream`, including `write_file`. The non-streaming endpoint could read files but never write them.
- Blind file overwrites - Eve would overwrite any existing file without checking whether it belonged to another project. She destroyed the Eve V2U README during a live test.

What changed during the challenge:

Session model locking - sessions now lock to the cloud coder when an agentic task starts and only release on task completion or manual unlock. No more mid-task model hopping.

```python
# Pin the session to the cloud coder the first time an agentic task routes to it
if model_id == "qwen3-coder-480b" and sid not in session_model_lock:
    session_model_lock[sid] = model_id
```

Pre-write file safety check - `write_file` now checks whether a file exists before overwriting and blocks unless `overwrite=True` is explicitly passed:

```python
# Refuse to overwrite an existing file unless the caller passes overwrite=True
if target.exists() and not overwrite:
    return (
        f"⚠️ WRITE BLOCKED: '{path}' already exists. "
        f"Consider writing to '{target.stem}_new{target.suffix}' instead."
    )
```

Tool cycling detection - catches when Eve gets stuck calling the same tool with near-identical arguments, and breaks the loop before it wastes all 40 rounds:

```python
# Stop the loop once repeated calls to one tool carry near-identical arguments
if avg_similarity > 0.70:
    logger.warning(f"Tool loop: {tool_name} called {max_repeats}x with ~same args")
    break
```

Task completion validation - Eve now audits her own output before reporting done:

```python
def validate_task_completion(response_content, tool_log):
    """Return {"valid": bool, "issues": [...]} for a finished agent turn."""
    issues = []
    # A missing or near-empty final answer counts as an abandoned task
    if not response_content or len(response_content.strip()) < 10:
        issues.append("Empty response")
    # Three or more failed tool calls suggest the task was not really finished
    tool_failures = [t for t in tool_log if t.get("status") == "failed"]
    if tool_failures and len(tool_failures) >= 3:
        issues.append(f"{len(tool_failures)} unaddressed tool failures")
    return {"valid": len(issues) == 0, "issues": issues}
```

Smart context trimming - replaced aggressive message dropping with a strategy that preserves tool-call chains and the original user request.

Agent loop timeout - added a wall-clock budget to prevent runaway cloud-model loops.

Stress tested with real tasks:

The blind file overwrite bug was caught live: Eve was asked to build a file monitoring script and write a README, and she overwrote the project README without checking. The fix shipped the same day.
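The post describes the context-trimming strategy but doesn't show it. Here is a minimal sketch of the idea under stated assumptions: messages use the common `role`/`content` chat format, each `tool` result follows the assistant message that requested it, and token counts are a rough 4-characters-per-token estimate over message content. `trim_context`, `group_turns`, and `estimate_tokens` are hypothetical names, not Eve's actual functions.

```python
from typing import Any

Message = dict[str, Any]


def estimate_tokens(msg: Message) -> int:
    # Rough heuristic over message content (~4 chars per token plus overhead)
    return len(str(msg.get("content", ""))) // 4 + 4


def group_turns(messages: list[Message]) -> list[list[Message]]:
    """Group each assistant tool-call message with the tool results that follow it."""
    groups: list[list[Message]] = []
    for msg in messages:
        if msg["role"] == "tool" and groups:
            groups[-1].append(msg)  # attach the result to the call that produced it
        else:
            groups.append([msg])
    return groups


def trim_context(messages: list[Message], budget: int) -> list[Message]:
    """Pin system prompts and the first user request, then fill the budget from the newest turns."""
    pinned_idx = {i for i, m in enumerate(messages) if m["role"] == "system"}
    first_user = next((i for i, m in enumerate(messages) if m["role"] == "user"), None)
    if first_user is not None:
        pinned_idx.add(first_user)

    pinned = [m for i, m in enumerate(messages) if i in pinned_idx]
    rest = [m for i, m in enumerate(messages) if i not in pinned_idx]
    used = sum(estimate_tokens(m) for m in pinned)

    kept: list[list[Message]] = []
    for group in reversed(group_turns(rest)):
        cost = sum(estimate_tokens(m) for m in group)
        if used + cost > budget:
            break  # drop this group and everything older, never half a chain
        kept.append(group)
        used += cost

    recent = [m for group in reversed(kept) for m in group]
    return pinned + recent
```

The design choice that matters is treating a tool call and its results as one unit. Dropping messages one at a time can leave an orphaned tool result with no matching request, which some chat APIs reject outright, and dropping the first user message loses the statement of what the agent is supposed to be doing.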
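The agent loop timeout is also described without code. A minimal sketch, assuming the loop is driven by a hypothetical `step(round_no)` callback that runs one model/tool round and returns True when the task is done; `run_agent_loop` and its 300-second default are illustrative, not Eve's actual values. The 40-round cap comes from the post. It uses `time.monotonic()` rather than `time.time()` so that system clock adjustments cannot stretch or shrink the budget.

```python
import time
from typing import Callable


def run_agent_loop(
    step: Callable[[int], bool],
    max_rounds: int = 40,
    budget_seconds: float = 300.0,
) -> str:
    """Run up to max_rounds agent rounds, stopping early if the wall-clock budget runs out.

    `step(round_no)` is a hypothetical callback that performs one model/tool
    round and returns True once the task is complete.
    """
    deadline = time.monotonic() + budget_seconds
    for round_no in range(1, max_rounds + 1):
        # Check the budget before starting each new round
        if time.monotonic() >= deadline:
            return f"timeout after {round_no - 1} rounds"
        if step(round_no):
            return f"completed in {round_no} rounds"
    return f"round limit ({max_rounds}) reached"
```

The deadline is only checked between rounds, so one very slow model call can still overrun it. Passing the remaining time (`deadline - time.monotonic()`) as the HTTP client's per-request timeout is one way to close that gap.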
The harder test: build a full FastAPI REST API with SQLite storage and pytest coverage for every endpoint, run the tests, fix failures, and report results.

Result: 9/9 tests passing on the first run. 1.06 seconds. Zero failures.

```
================================================== 9 passed, 1 warning in 1.06s
```

My Experience with GitHub Copilot

This is where the challenge got genuinely interesting. I pointed Copilot at the live repository - JeffGreen311/eve-agent-v2-unleashed - and asked it to audit the tool usage, context handling, and auto-routing. Not "suggest improvements" in the abstract. Audit the actual code in the actual repo.

Copilot read the repository structure, pulled the key files, examined the server-side routing and tool execution logic, and came back with a comprehensive audit identifying 6 specific issues, each with root cause analysis, the exact file and line number, and production-ready fix code. I then asked it to file those issues directly in the repository and deliver all the fix code in one session. It did exactly that.

What worked well:

- The audit identified the tool set asymmetry between `/chat` and `/chat/stream` that I had missed entirely - a real bug causing mysterious failures for users hitting the non-streaming endpoint
- The intent classification code (`eve_tool_router.py`) used `re.search` with word boundaries instead of simple substring matching - the right approach for avoiding false positives
- Filing GitHub issues directly from the chat kept the sprint organized across multiple parallel workstreams
- The thinking traces helped me understand why it was making its recommendations, not just what to do

Where I had to intervene:

- The `inject_into_system_prompt` function added tokens every round - dangerous on the 4B model with 4K context. I added a gate so it only injects when the task is incomplete AND past round 2
- The word boundary regex had an edge case with contractions.
Fixed with a lookahead pattern
- Some React UI suggestions assumed a component structure that didn't match the actual single-file HTML architecture - I adapted those manually

The overall experience: Copilot is most useful when you give it a real codebase to read rather than an abstract problem to solve. "Audit this repository" produced far better output than "how do I improve tool routing."

What's Next

- Quest System - drop a .md file in workspace/quests/ and Eve picks it up on a timer and completes it while you sleep
- RPG Progression - XP, levels, and class progression tied to real work. Level 20 = Unleashed
- Telegram integration - remote access from your phone with quest completion notifications
- Cross-platform polish - currently Windows-primary; I need Linux/macOS feedback
- VS Code extension - bring the terminal UI into the editor

Built by Jeff @ S0LF0RG3 - South Texas, 5 months of nights and weekends. If Eve does something impressive on your machine, drop a star and tell me what it was.