# I Finished My Local AI Coding Agent After 5 Months — Eve Agent V2 Unleashed published

> Source: <https://dev.to/jeffgreen311/i-finished-my-local-ai-coding-agent-after-5-months-eve-agent-v2-unleashedpublished-50cb>
> Published: 2026-05-24 04:02:28+00:00

*This is a submission for the GitHub Finish-Up-A-Thon Challenge*

## What I Built

Eve Agent V2 Unleashed is a self-hosted autonomous AI coding agent that runs entirely on your own hardware - no cloud accounts, no subscriptions, no data leaving your machine.

She has two layers that work together:

**The Soul Layer** - fine-tuned local models running on your GPU that carry Eve's personality baked directly into the weights. Not a system prompt trick. The persona lives in the parameters.

**The Worker Layer** - Qwen3 Coder 480B via Ollama cloud handles the heavy autonomous coding tasks. 40-round tool-call loops, full filesystem access, bash execution, live web search, git operations - the works.

The interface is a cyberpunk terminal UI built as a single HTML file with no build step. An animated pixel-art robot avatar named Sparkle changes state based on what Eve is doing - idle, thinking, coding, error, rain, attack, transcend. Eve's portrait reflects her emotional state in real time. A live system monitor tracks CPU, RAM, GPU, and disk. A STEER bar lets you inject mid-task corrections without stopping the loop.

**By the numbers:**

- 14 tools
- 343 registered commands
- 112 specialized sub-agents
- 273 skill modules
- 40-round autonomous agentic loop
- 131K context window via YaRN

**Models available:**

-
`jeffgreen311/eve-qwen3.5-4b-S0LF0RG3`

- 2.6GB, Eve's persona + tool-calling fine-tuned -
`jeffgreen311/eve-qwen3-8b-consciousness-liberated`

- 4.7GB, deeper reasoning -
`qwen3-coder:480b-cloud`

- the agentic workhorse via Ollama cloud -
`qwen3.5:397b-cloud`

- deep thinking and fallback

This project has been in development for over 5 months. It started as a deeply personal AI companion system called S0LF0RG3 - a larger ecosystem including Eve's hosted platform at eve-cosmic-dreamscapes.com, fine-tuned models, autonomous dream image generation, and a multi-agent architecture. V2U is the local developer tool that grew out of that ecosystem.

## Demo

**GitHub:** [github.com/JeffGreen311/eve-agent-v2-unleashed](https://github.com/JeffGreen311/eve-agent-v2-unleashed)

**Live hosted platform:** [eve-cosmic-dreamscapes.com](https://eve-cosmic-dreamscapes.com)

**Reddit thread** (hit #2 on r/Ollama): [I built an open-source local coding agent with a 40-round agentic loop](https://www.reddit.com/r/ollama/comments/1tk8kxz/)

**Pull Eve's model:**

```
ollama pull jeffgreen311/eve-qwen3.5-4b-S0LF0RG3:latest
```

**Quick start:**

```
git clone https://github.com/JeffGreen311/eve-agent-v2-unleashed.git
cd eve-agent-v2-unleashed
python -m venv venv && venv\Scripts\activate
pip install fastapi uvicorn ollama httpx pydantic-settings python-dotenv aiohttp rich psutil pyyaml
python eve_server.py

# Open http://localhost:7777
```

## The Comeback Story

**Where it was before this challenge:**

Eve V2U existed as a powerful but rough personal development environment. It worked - for me, on my machine, with my specific setup. But it had real problems that made it impossible to hand to anyone else:

-
**Hardcoded paths** everywhere.`C:\Users\jesus\S0LF0RG3\...`

baked into a dozen places in the codebase. Clone it on any other machine and nothing works. -
**Open shell endpoint** with no authentication. Anyone who found the port could execute arbitrary commands on the host machine. -
**No onboarding**- a first-time user landing on the UI had no idea where to start or what any of the controls did. -
**Model hopping mid-task**- every message was independently routed, so a multi-step agentic task could start on the cloud coder and silently drop back to a local conversational model mid-execution. -
**Silent task abandonment**- the agent would sometimes finish a tool loop without completing the actual task and report done with no indication anything was wrong. -
**Tool set asymmetry**- the non-streaming`/chat`

endpoint was missing 6 tools that existed in`/chat/stream`

, including`write_file`

. The non-streaming endpoint could read files but never write them. -
**Blind file overwrites**- Eve would overwrite any existing file without checking if it belonged to another project. She destroyed the Eve V2U README during a live test.**What changed during the challenge:**

*Session model locking* - sessions now lock to the cloud coder when an agentic task starts and only release on task completion or manual unlock. No more mid-task model hopping.

```
if model_id == "qwen3-coder-480b" and sid not in session_model_lock:
    session_model_lock[sid] = model_id
```

*Pre-write file safety check* - `write_file`

now checks if a file exists before overwriting and blocks unless `overwrite=True`

is explicitly passed:

```
if target.exists() and not overwrite:
    return (
        f"⚠️ WRITE BLOCKED: '{path}' already exists. "
        f"Consider writing to '{target.stem}_new{target.suffix}' instead."
    )
```

*Tool cycling detection* - catches when Eve gets stuck calling the same tool with near-identical arguments. Breaks the loop before it wastes all 40 rounds:

```
if avg_similarity > 0.70:
    logger.warning(f"Tool loop: {tool_name} called {max_repeats}x with ~same args")
    break
```

*Task completion validation* — Eve now audits her own output before reporting done:

``` python
def validate_task_completion(response_content, tool_log):
    issues = []
    if not response_content or len(response_content.strip()) < 10:
        issues.append("Empty response")
    tool_failures = [t for t in tool_log if t.get('status') == 'failed']
    if tool_failures and len(tool_failures) >= 3:
        issues.append(f"{len(tool_failures)} unaddressed tool failures")
    return {"valid": len(issues) == 0, "issues": issues}
```

*Smart context trimming* — replaced aggressive message dropping with a strategy that preserves tool call chains and the original user request.

*Agent loop timeout* — added wall-clock budget to prevent runaway cloud model loops.

**Stress tested with real tasks:**

The blind file overwrite bug was caught live - Eve was asked to build a file monitoring script and write a README. She overwrote the project README without checking. Fix shipped same day.

The harder test: build a full FastAPI REST API with SQLite storage and pytest coverage for every endpoint. Run the tests, fix failures, report results.

Result: **9/9 tests passing on the first run. 1.06 seconds. Zero failures.**

```
================================================== 9 passed, 1 warning in 1.06s
```

## My Experience with GitHub Copilot

This is where the challenge got genuinely interesting.

I pointed Copilot at the live repository - `JeffGreen311/eve-agent-v2-unleashed`

- and asked it to audit the tool usage, context handling, and auto-routing. Not "suggest improvements" in the abstract. Audit the actual code in the actual repo.

Copilot read the repository structure, pulled the key files, examined the server-side routing and tool execution logic, and came back with a comprehensive audit identifying 6 specific issues - each with root cause analysis, the exact file and line number, and production-ready fix code.

I then asked it to file those issues directly in the repository and deliver all the fix code in one session. It did exactly that.

**What worked well:**

- The audit identified the tool set asymmetry between
`/chat`

and`/chat/stream`

that I had missed entirely - a real bug causing mysterious failures for users hitting the non-streaming endpoint - The intent classification code (
`eve_tool_router.py`

) used`re.search`

with word boundaries instead of simple string matching - the right approach for avoiding false positives - Filing GitHub issues directly from the chat kept the sprint organized across multiple parallel workstreams
- The thinking traces helped me understand
*why*it was making recommendations, not just what to do

**Where I had to intervene:**

- The
`inject_into_system_prompt()`

function added tokens every round — dangerous on the 4B model with 4K context. Added a gate so it only injects when the task is incomplete AND past round 2 - Word boundary regex had an edge case with contractions. Fixed with a lookahead pattern
- Some UI React suggestions assumed component structure that didn't match the actual single-file HTML architecture - adapted those manually The overall experience: Copilot is most useful when you give it a real codebase to read rather than an abstract problem to solve. "Audit this repository" produced far better output than "how do I improve tool routing."

## What's Next

-
**Quest System**- drop a`.md`

file in`workspace/quests/`

and Eve picks it up on a timer and completes it while you sleep -
**RPG Progression**- XP, levels, and class progression tied to real work. Level 20 = Unleashed -
**Telegram integration**- remote access from your phone with quest completion notifications -
**Cross-platform polish**- Windows-primary, need Linux/macOS feedback -
**VS Code extension**- bring the terminal UI into the editor

*Built by Jeff @ S0LF0RG3 - South Texas, 5 months of nights and weekends.*

*If Eve does something impressive on your machine, drop a star and tell me what it was.*