{"slug": "a-non-coding-coding-agent", "title": "A non-coding coding agent", "summary": "A developer created Socreates, a Socratic coding agent that challenges decisions and catches mistakes but never writes code itself. The agent operates as a rubber duck with brutal opinions, forcing developers to type all code themselves while the tool provides feedback through a control loop that gathers workspace facts, executes tools, and manages context. The project aims to restore the traditional coding experience where developers enjoyed writing their own code.", "body_md": "# A non-coding coding agent\n\nWe love coding agents. They can build a full-featured SaaS and potentially make you a millionaire if you leave them running overnight with the right prompt. They will burn your GPU or your budget, include a few unprompted vulnerabilities, will bloat your code, risking your sanity once you start debugging it, will put emojis in your comments, and will ultimately make you question your life choices.\n\nSo I thought, if they are so cool — it must be interesting to build one myself and steal the fame of Anthropic.\n\nBut as you know, this blog is often on the edge of absurd programming, so the agent we’re building today is probably the first non-coding coding agent.\n\nThe agent is called Socreates (yes, with a typo), and it is a Socratic agent. It will catch your mistakes, challenge your decisions, act as a [rubber duck](https://en.wikipedia.org/wiki/Rubber_duck_debugging) with brutal opinions — but it will never touch your code. You have to type it all yourself. And I’ve heard many developers actually enjoyed writing code in the good old days. Some even called themselves “coders”.\n\n## A gent\n\nBefore diving into code, let’s clarify what a “coding agent” actually is.\n\nWe know that an LLM is just a next-token predictor. A reasoning model is the same LLM trained to spend more time on intermediate steps. 
An agent is a control loop that uses an LLM to decide what to inspect, which tools to call, and when to stop.\n\nThis agentic loop is why Claude Code feels way more capable than the same model in a chat window.\n\nA coding agent in its simplest form has only a few core jobs:\n\n- Gathering facts about the workspace to help the model start doing things for you (file tree, git repo state, etc.)\n- Executing tools (structured actions, like reading files or running commands on *your* machine, like remote procedure calls from the past)\n- Controlling context size – clipping long outputs, avoiding redundant tool calls, compacting context in every possible way, because LLMs have limits\n- Providing memory to persist conversation state if you restart an agent the next day\n\nSome agents do more, like delegating certain tasks to bounded sub-agents, orchestrating them and doing things in parallel, but we keep it simple. One loop, four tools, no dependencies.\n\n## The loop\n\nThe loop itself is almost trivial:\n\n1. User types a message\n2. Agent sends system prompt + conversation history to an LLM\n3. The LLM responds with text and/or requests some tool calls; we stop the loop if it’s a final answer\n4. If not, the agent runs the tools, feeding the results back to the LLM on the next iteration\n5. Go to step 3, forcing the response if it takes too long to iterate\n\nThis is the entire “agent” part. Everything else is plumbing and parsing to make various components work together (but isn’t it the essence of modern programming?)\n\n## LLM\n\nModels are getting better and better. Some people can afford running them locally, some can afford running in the cloud, others can afford having a job that pays for Claude API keys. 
To make swapping an LLM easier we define an interface for all of them, and it’s a rather simple one:\n\n```\ntype LLM interface {\n    Chat(ctx context.Context, req ChatRequest) (*ChatResponse, error)\n}\n\ntype ChatRequest struct {\n    Messages []Message\n    Tools    []Tool\n}\n\ntype ChatResponse struct {\n    Content   string\n    ToolCalls []ToolCall\n    Usage     Usage\n}\n```\n\n`ChatRequest` is a conversation history and available tools schema. `ChatResponse` is text content and/or structured tool calls (if the model wants to do something). `Usage` tracks token consumption so we can print stats after each turn and decide whether it’s worth it. The agent wouldn’t know if it’s talking to a silly 7B model or DeepSeek in the cloud.\n\nBoth Ollama and OpenAI-compatible APIs support “native tool calling” via API. You send a `tools` array describing available functions as a JSON Schema, and the model responds with structured `tool_calls`. The conversation may look approximately like this:\n\n```\n> system: \"You are a coding companion...\"\n> user: \"review my code\"\n< assistant: {tool_calls: [{function: {name: \"list_files\", arguments: \"{}\"}}]}\n> tool: \"[F] main.go\\n[D] pkg/\"    (tool_call_id: \"call_1\")\n< assistant: {tool_calls: [{function: {name: \"read_file\", arguments: \"{\\\"path\\\":\\\"main.go\\\"}\"}}]}\n> tool: \"[main.go: lines 1-1000 of 1000]\\n   1: package main\\n...\"   (tool_call_id: \"call_2\")\n< assistant: \"Why did you put all 1000 lines in one file? Where are all the tests?\"\n```\n\nEach provider needs its own HTTP client because the wire formats slightly differ:\n\n- Ollama (`/api/chat`) returns `arguments` as a JSON object already, so you marshal it with `json.RawMessage`. Token usage comes in every response as `prompt_eval_count` and `eval_count`. 
- OpenAI/DeepSeek (`/v1/chat/completions`) expects `content` to be serialized as JSON `null` (not omitted) for tool-only messages; tool results without a call ID get rejected. Token usage is provided, too, but in a different format – as a standard `usage` object.\n\nThese are boring details and the implementations are boring, too. You can check them on GitHub (link at the end).\n\n## System Prompt\n\nNow you may feel like a real markdown engineer.\n\nIn coding agents the prompt is usually assembled from multiple layers: a stable prefix (instructions + tool schemas + workspace summary), then the changing session state (recent history + user request). That’s how you end up using cached tokens (cheap ones) for the prefix and expensive ones for the rest of it.\n\nSocreates is simple enough that the system prompt (“stable prefix”) is just a brief message with a workspace path injected with a printf:\n\n```\nYou are a coding companion — a sharp, critical reviewer who catches bugs\nand challenges decisions. You NEVER write code. The developer types all code;\nyou ask questions, spot issues, and verify correctness using tools.\n\nWorkspace root: socreates (use \".\" or relative paths like \"main.go\" in all tool calls)\n\n## TOOL USAGE\n- ALL paths are relative to workspace root. NEVER use absolute paths.\n- read_file shows \"[file: lines X-Y of Z]\" — plan reads to cover the file in 1-2 calls max.\n- search returns up to 30 matches. Use specific patterns.\n- RESPOND within 2-3 tool rounds. Do not explore exhaustively.\n\n## BEHAVIOR\n1. No code, no snippets, no pseudocode — ever.\n2. Be concise: 2-4 points per response. No preamble, no filler.\n3. Cite specific lines when pointing out issues.\n4. Challenge assumptions: \"What if X is nil?\", \"Did you handle the error on line N?\"\n5. 
When code looks correct, say so in one line and stop.\n\nContext: task=review my code files=[main.go, tools.go]\n```\n\nThis is the part that LLMs wrote for me after arguing with each other for some time, and I highly doubt it’s a good one. But it does the job. Unfortunately, this prompt is the essence of the product. You change the “personality” or the rules - and you get a lobotomised junior rubber duck instead of an experienced critic.\n\nThe last line (or lines, in practice) is *session memory* – the agent keeps track of the current task description and recently touched files, appending them to the system prompt so the model has continuity across tool rounds without needing to re-read everything. This is the dynamic part of the prompt.\n\nTool schemas are passed via the API’s native `tools` parameter, so at least those are well-structured and not random markdown. Tools have brief descriptions and examples of how to call them.\n\n## Looooop\n\nWe interrogate our LLM in a loop. For a number of iterations we send the accumulated system prompt + conversation so far, until it stops “thinking” and gives us a final answer. We have a cap on the number of iterations, otherwise some witty models would be happy to drain your budget on every simple request. We politely follow the model’s demands and call the necessary tools, ideally in parallel.\n\n```\nfunc (a *Agent) Chat(ctx context.Context, input string) (string, error) {\n    a.messages = append(a.messages, Message{Role: RoleUser, Content: input})\n\n    for step := range a.maxSteps {\n        a.truncateHistory()\n        system := buildSystemPrompt(a.cwd, a.memory)\n        if step >= a.maxSteps-2 { // LLM has been warned, time to wrap it up!\n            system += \"\\n\\n[SYSTEM: Step limit reached. You MUST respond now.]\"\n        }\n        msgs := append([]Message{{Role: RoleSystem, Content: system}}, a.messages...)\n\n        // Too late. 
Don't offer any tools on the final iteration\n        var tools []Tool\n        if step < a.maxSteps-1 {\n            tools = a.tools\n        }\n\n        resp, err := a.llm.Chat(ctx, ChatRequest{Messages: msgs, Tools: tools})\n        if err != nil { return \"\", err }\n\n        if len(resp.ToolCalls) == 0 { // Final answer, we're done!\n            a.messages = append(a.messages, Message{Role: RoleAssistant, Content: resp.Content})\n            return resp.Content, nil\n        }\n\n        // LLM needs more information: call tools\n        a.messages = append(a.messages, Message{Role: RoleAssistant, ToolCalls: resp.ToolCalls})\n        results := executeInParallel(resp.ToolCalls)\n        for _, r := range results {\n            a.messages = append(a.messages, Message{Role: RoleTool, Content: r.Content, ToolCallID: r.ID})\n        }\n    }\n    return \"I'm lost. Try rephrasing?\", nil\n}\n```\n\nThe user message is added once before the loop, not on every iteration. This mistake cost me a few cents on DeepSeek.\n\nThe step limit is rather generous (to my taste) - I stop after 10 iterations. I’ve heard most agents expect only a couple of tool calls per iteration and only 3-5 iterations. But the models I tried were slow thinkers.\n\nWarning the LLM about the iteration limit helped avoid scenarios where it kept thinking and never gave a proper answer at the end.\n\nParallel tool calling is likely not so important for a toy agent – reading files is quick – but it’s Go, so at least some concurrency is expected.\n\n## Context compaction\n\nToken budgets keep me awake. Without compaction, a long conversation not only exceeds the model’s context window, making it dumber, but also starts costing real money. 
Socreates uses a naive two-pass approach:\n\n```\nconst maxHistoryTokens = 16000 // around 64KB of text, which is enough for everyone, right?\n\nfunc (a *Agent) truncateHistory() {\n    // Pass 1: Trim historical tool outputs (keep first 400 chars)\n    for i := range a.messages {\n        // The length check guards the [:400] slice, which would panic on a shorter string\n        if a.messages[i].Role == RoleTool && len(a.messages[i].Content) > 400 && estimateTokens(a.messages[i].Content) > 100 {\n            a.messages[i].Content = a.messages[i].Content[:400] + \"\\n...[truncated]\"\n        }\n        if a.historyTokens() <= maxHistoryTokens { return }\n    }\n    // Pass 2: Drop complete req/res turns from the head of the list\n    for len(a.messages) > 2 && a.historyTokens() > maxHistoryTokens {\n        cut := 1\n        if a.messages[0].Role == RoleAssistant && len(a.messages[0].ToolCalls) > 0 {\n            // we must not break message + tool connections, DeepSeek rejects invalid references in tool calls\n            // so we cut some extra, if we have to\n            for cut < len(a.messages) && a.messages[cut].Role == RoleTool {\n                cut++\n            }\n        }\n        a.messages = a.messages[cut:]\n    }\n}\n```\n\nI was surprised that we can’t drop an assistant message without dropping its tool responses. Orphaned tool responses seem to be invalid in the OpenAI/DeepSeek protocol. But that only makes compaction more aggressive, which might work in our favour cost-wise.\n\n## Tools\n\nA plain LLM can suggest commands in markdown. An agent with tools receives them in a structured way and executes them. This allows us to validate inputs and have at least some safety boundaries for tool calling. At least the model can’t hallucinate arbitrary actions without us noticing.\n\nSince our agent is a non-coding one, it cannot write files. 
I think we could go pretty far with just four tools:\n\n- `list_files`: `fs.WalkDir` file tree listing, within the workspace\n- `read_file`: `bufio.Scanner` text read from a line range\n- `search`: recursive regex search, like grep but without shell injections\n- `run_command`: the most dangerous one, run anything (`go test` or `git diff`), but always ask the user to confirm\n\nEvery tool output is truncated after 16K characters (~4K tokens). We also inform the model about file sizes, so it has a chance to use tools wisely. Our `read_file` tool also returns a continuation hint: `[150 more lines. Use start=501 to continue.]`, telling the model how to proceed. At least with DeepSeek I’ve seen it helping.\n\nOur path resolution is simple, but good enough to avoid reading from `./pkg/../../../etc/passwd`.\n\n```\nfunc resolvePath(root, path string) string {\n    abs := filepath.Clean(filepath.Join(root, path))\n    if rel, err := filepath.Rel(root, abs); err != nil || strings.HasPrefix(rel, \"..\") {\n        return \"\"\n    }\n    return abs\n}\n```\n\nAnd yes, there is an option to auto-approve all commands if you run it in an isolated environment that satisfies your levels of paranoia.\n\n## Memory\n\nA coding agent should survive across turns and restarts. Our session state is a full conversation: user messages, assistant responses, tool calls, and tool results. It is like a “transcript”, stored as JSONL inside `.socreates/session.json`. One session at a time. I’m not good at multi-tasking anyway.\n\n```\ntype Session struct {\n    ID       string    `json:\"id\"`\n    Created  time.Time `json:\"created\"`\n    Memory   Memory    `json:\"memory\"`\n    Messages []Message `json:\"messages\"`\n}\n\ntype Memory struct {\n    Task  string   `json:\"task\"`  // First user message\n    Files []string `json:\"files\"` // Last files touched by tools\n}\n```\n\nOn `/reset`, the current session is archived (renamed) and a fresh one starts. 
On restart, the agent loads its last session and continues where it left off. The `Memory` provides some clues appended to the system prompt, so that the model knows what we were working on without reading the entire history.\n\nIn other words, the stored transcript is complete (every message, in full), but what we send in the context to an LLM is compacted and truncated. A bounded window, I think it’s called.\n\n## REPL\n\nThe CLI is just stdin/stdout, no TUI, no fancy blinking animations. I run it with `rlwrap` for some line editing, so it looks like this:\n\n```\n> review my error handling\n  (thinking...)\n  -> list_files(map[path:.])\n  -> read_file(map[path:main.go])\n\nWhat happens on line 247 if llm.Chat returns an error? You return it immediately,\nbut the user message was already appended to history on line 249. Doesn't that\nmean a failed request still pollutes the conversation state?\n\n  [tokens: 12340 in, 156 out | session: 24680 in, 312 out]\n```\n\nI’ve tested it with Qwen, Llama, Gemma and the DeepSeek API, with mixed results, of course. But it’s not complete rubbish, so I’m happy.\n\nAt least it was a nice experiment: what would a coding agent look like if you treated it as a fellow programmer, never letting it touch your keyboard but happily listening to its grunting and nagging?\n\nIf you want to give it a try – it’s on GitHub: [github.com/zserge/socreates](https://github.com/zserge/socreates). Suggestions, contributions and feedback are always welcome!\n\nI hope you’ve enjoyed this article. 
You can follow – and contribute to – my work on [Github](https://github.com/zserge), [Mastodon](https://mastodon.social/@zserge), [Twitter](https://twitter.com/zsergo) or subscribe via [rss](/rss.xml).\n\n*May 25, 2026*\n\nSee also:\n[The old way to the modern web services](/posts/go-web-services/) and [more](/posts/).", "url": "https://wpnews.pro/news/a-non-coding-coding-agent", "canonical_source": "https://zserge.com/posts/socreates/", "published_at": "2026-05-27 23:41:26+00:00", "updated_at": "2026-05-27 23:56:40.897766+00:00", "lang": "en", "topics": ["ai-agents", "large-language-models", "ai-tools", "generative-ai", "artificial-intelligence"], "entities": ["Anthropic", "Socreates", "Claude Code"], "alternates": {"html": "https://wpnews.pro/news/a-non-coding-coding-agent", "markdown": "https://wpnews.pro/news/a-non-coding-coding-agent.md", "text": "https://wpnews.pro/news/a-non-coding-coding-agent.txt", "jsonld": "https://wpnews.pro/news/a-non-coding-coding-agent.jsonld"}}