{"slug": "building-m31a-a-terminal-native-ai-coding-agent-that-ships-not-just-suggests", "title": "Building M31A: A Terminal-Native AI Coding Agent That Ships, Not Just Suggests", "summary": "A developer built M31A (M31 Autonomous), a terminal-native AI coding agent written in Go that owns a six-phase workflow end-to-end: Initialize, Discuss, Plan, Execute, Verify, and Ship. The agent runs as a single static binary with zero telemetry, works on any POSIX shell, and ends each run with a verified git commit and a learning ledger entry. The architecture features a clean six-layer separation of concerns, including a TUI layer, workflow engine, providers, tools, domain packages, and infrastructure.", "body_md": "Most AI coding assistants are glorified autocomplete on steroids. They suggest code, maybe write a function or two, but leave you holding the bag when it comes to testing, verification, and actually shipping the changes.\n\n**M31A (M31 Autonomous)** takes a different approach. It's a terminal-based AI coding agent written in Go that owns a **six-phase workflow end-to-end**: Initialize → Discuss → Plan → Execute → Verify → Ship. Every run ends with a verified git commit and a learning ledger entry. One static binary, zero telemetry, any POSIX shell.\n\nIn this post, I'll walk you through the architecture, design decisions, and technical highlights of this open-source project.\n\nHere's the typical workflow with most AI coding tools:\n\nThe AI \"helped\" with step 1, but you're still doing 80% of the work. And if something breaks three commits later? Good luck figuring out what the AI actually changed.\n\nM31A flips this model. 
Instead of being a suggestion engine, it's an **autonomous agent** that:\n\nM31A is built with a clean six-layer architecture:\n\n```\n┌─────────────────────────────────────────────────────────────┐\n│  TUI Layer (Bubble Tea)                                     │\n│  29 screens, keyboard/mouse handling, streaming display     │\n└─────────────────────────────────────────────────────────────┘\n                           ↓\n┌─────────────────────────────────────────────────────────────┐\n│  Workflow Engine                                            │\n│  Six-phase orchestration, LLM streaming, plan parsing       │\n└─────────────────────────────────────────────────────────────┘\n                           ↓\n         ┌─────────────────┼─────────────────┐\n         ↓                 ↓                 ↓\n┌──────────────┐  ┌──────────────┐  ┌────────────────────┐\n│  Providers   │  │  Tools       │  │  Domain Packages   │\n│  OpenRouter  │  │  Bash        │  │  session, ledger   │\n│  Zen         │  │  FileRead    │  │  rollback, bisect  │\n│  Fallback    │  │  FileWrite   │  │  taskrunner        │\n│              │  │  Glob, Grep  │  │  keychain          │\n└──────────────┘  └──────────────┘  └────────────────────┘\n         ↓                 ↓                 ↓\n┌─────────────────────────────────────────────────────────────┐\n│  Infrastructure Layer                                       │\n│  git, config, tokens, codeintel, fileutil, logging          │\n└─────────────────────────────────────────────────────────────┘\n```\n\nThe key insight? **Separation of concerns at every level.** The TUI doesn't know about LLM APIs. The workflow engine doesn't know about terminal rendering. The tools don't know about workflow phases.\n\nThe heart of M31A is the workflow engine, implemented in `internal/workflow/engine.go`. 
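\n\nThe post doesn't show `engine.go` itself. Below is a minimal sketch of how a six-phase pipeline can be sequenced: phases run in a fixed order, and the first failure stops the run and reports which phase failed. `Phase`, `PhaseFunc`, and `RunPhases` are hypothetical names used for illustration. They are not M31A's API.\n\n```go\npackage main\n\nimport (\n    \"context\"\n    \"fmt\"\n)\n\n// Phase names the six workflow stages in execution order.\n// Hypothetical type; M31A's real definitions are not shown in the post.\ntype Phase int\n\nconst (\n    Initialize Phase = iota\n    Discuss\n    Plan\n    Execute\n    Verify\n    Ship\n)\n\nfunc (p Phase) String() string {\n    return [...]string{\"initialize\", \"discuss\", \"plan\", \"execute\", \"verify\", \"ship\"}[p]\n}\n\n// PhaseFunc is a hypothetical handler for one phase.\ntype PhaseFunc func(ctx context.Context) error\n\n// RunPhases runs the handlers in phase order, stopping at the first error or\n// when ctx is cancelled. It returns the phases that completed successfully.\nfunc RunPhases(ctx context.Context, handlers map[Phase]PhaseFunc) ([]Phase, error) {\n    var done []Phase\n    for p := Initialize; p <= Ship; p++ {\n        if err := ctx.Err(); err != nil {\n            return done, fmt.Errorf(\"before %s: %w\", p, err)\n        }\n        h, ok := handlers[p]\n        if !ok {\n            return done, fmt.Errorf(\"no handler for phase %s\", p)\n        }\n        if err := h(ctx); err != nil {\n            return done, fmt.Errorf(\"phase %s: %w\", p, err)\n        }\n        done = append(done, p)\n    }\n    return done, nil\n}\n```\n\nStopping at the first failed phase is what lets a run end either in a verified commit or in no commit at all. A real engine would presumably also record progress in `STATE.md`, so that `--resume` can skip phases that already completed.\n\n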
Let's break down each phase.\n\n**Initialize**: The agent detects your project type (Go, Python, Node, etc.), initializes git if needed, and creates a `.m31a/` planning directory containing:\n\n- `PROJECT.md`: project metadata\n- `STATE.md`: current workflow state\n- `TASKS.md`: task list (populated later)\n\n```\n// From internal/workflow/initialize.go (simplified; assumes git.Init and\n// writeProjectState return errors)\nfunc (e *Engine) runInitialize(ctx context.Context) error {\n    // Detect project type, framework, and language\n    project := e.detectProject()\n\n    // Initialize a git repo if the working directory isn't one yet\n    if !e.git.IsRepository() {\n        if err := e.git.Init(); err != nil {\n            return fmt.Errorf(\"git init: %w\", err)\n        }\n    }\n\n    // Create the .m31a/ planning directory\n    if err := os.MkdirAll(e.planningDir, 0755); err != nil {\n        return fmt.Errorf(\"create planning dir: %w\", err)\n    }\n\n    // Write PROJECT.md and STATE.md\n    return e.writeProjectState(project)\n}\n```\n\n**Discuss**: Before jumping into code, the agent asks clarifying questions via LLM streaming. This prevents the classic \"I built exactly what you asked for, but not what you wanted\" problem. The discuss phase uses embedded prompt templates (loaded via `//go:embed prompts/*.md`) to steer the LLM toward useful questions about scope, constraints, and edge cases.\n\n**Plan**: The agent generates a structured implementation plan in markdown. A custom parser (`internal/workflow/plan_parser.go`) extracts it into these types:\n\n```\n// From internal/workflow/plan_parser.go\ntype Plan struct {\n    Title      string\n    Tasks      []Task\n    Questions  []string\n    Notes      string\n}\n\ntype Task struct {\n    ID           int\n    Action       string\n    Description  string\n    Files        []string\n    Dependencies []int\n}\n```\n\nThe plan parser supports refinement with retry logic (max 3 retries, max 5 refinements). It also classifies prompt complexity on a four-level scale: `trivial → simple → moderate → complex`.\n\n**Execute**: This is where the rubber meets the road. 
The task runner (`pkg/taskrunner/runner.go`) uses **Kahn's algorithm** for topological sorting to determine execution order. It groups tasks into levels, where every task's dependencies are satisfied by earlier levels:\n\n```\n// From pkg/taskrunner/runner.go\nfunc (r *Runner) Schedule() ([][]int, error) {\n    // Build the dependents adjacency list and in-degree counts\n    inDegree := make(map[int]int)\n    dependents := make(map[int][]int)\n\n    for _, t := range r.tasks {\n        for _, dep := range t.Dependencies {\n            inDegree[t.ID]++\n            dependents[dep] = append(dependents[dep], t.ID)\n        }\n    }\n\n    // Seed the first level with tasks that have no dependencies\n    var queue []int\n    for _, t := range r.tasks {\n        if inDegree[t.ID] == 0 {\n            queue = append(queue, t.ID)\n        }\n    }\n\n    // Peel off one level at a time; a task joins the next level once all of\n    // its dependencies have been scheduled\n    var groups [][]int\n    scheduled := 0\n    for len(queue) > 0 {\n        groups = append(groups, queue)\n        scheduled += len(queue)\n        var next []int\n        for _, id := range queue {\n            for _, child := range dependents[id] {\n                inDegree[child]--\n                if inDegree[child] == 0 {\n                    next = append(next, child)\n                }\n            }\n        }\n        queue = next\n    }\n\n    // Tasks that never reached in-degree 0 sit on a cycle or depend on a\n    // task that doesn't exist\n    if scheduled != len(r.tasks) {\n        return nil, fmt.Errorf(\"%d of %d tasks are in a dependency cycle or depend on a missing task\", len(r.tasks)-scheduled, len(r.tasks))\n    }\n    return groups, nil\n}\n```\n\nTasks within a group can run with bounded parallelism (default: 4 concurrent tasks via a semaphore). The executor includes a **self-heal loop** that retries recoverable failures up to 2 times.\n\n**Verify**: The agent runs verification checks:\n\nIf verification fails, the agent can roll back the commit chain using git-bisect integration.\n\n**Ship** (the final phase):\n\nM31A supports two LLM providers out of the box: OpenRouter and Zen.\n\nThe provider layer (`internal/provider/`) includes some clever engineering:\n\nWhen a provider degrades (429 rate limit, 503 service unavailable), M31A automatically switches to a healthy provider. 
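\n\nThe post names the triggers (HTTP 429 and 503) but not the code that detects them. Below is a minimal sketch that classifies those two responses as degraded and moves down a provider list. It assumes provider calls return a hypothetical `StatusError` carrying the HTTP status. It also swaps in a simpler sequential retry: M31A itself picks the fallback through the parallel health checks described next.\n\n```go\npackage main\n\nimport (\n    \"errors\"\n    \"fmt\"\n    \"net/http\"\n)\n\n// StatusError is a hypothetical error type carrying a provider's HTTP status.\ntype StatusError struct{ Code int }\n\nfunc (e *StatusError) Error() string {\n    return fmt.Sprintf(\"provider returned HTTP %d\", e.Code)\n}\n\n// isDegraded reports whether err (or anything it wraps) is one of the two\n// degradations the post names: 429 Too Many Requests or 503 Service Unavailable.\nfunc isDegraded(err error) bool {\n    var se *StatusError\n    if !errors.As(err, &se) {\n        return false\n    }\n    return se.Code == http.StatusTooManyRequests || se.Code == http.StatusServiceUnavailable\n}\n\n// callWithFailover tries providers in order. It moves to the next provider\n// only when the current one is degraded; any other error is returned as-is.\nfunc callWithFailover(providers []string, call func(name string) error) (string, error) {\n    if len(providers) == 0 {\n        return \"\", errors.New(\"no providers configured\")\n    }\n    var lastErr error\n    for _, name := range providers {\n        err := call(name)\n        if err == nil {\n            return name, nil\n        }\n        if !isDegraded(err) {\n            return name, err\n        }\n        lastErr = err\n    }\n    return \"\", fmt.Errorf(\"all providers degraded: %w\", lastErr)\n}\n```\n\nTreating only 429 and 503 as degraded keeps genuine failures from silently hopping to another provider. For example, a 401 caused by a bad API key is returned instead of retried.\n\n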
The fallback logic uses **parallel health checks** to minimize latency:\n\n```\n// From internal/provider/fallback.go\nfunc FindFallbackProvider(registry *Registry, current string) (string, *FallbackEvent, error) {\n    // Collect candidate providers\n    candidates := registry.ListAll()\n\n    // Parallel health checks, all bounded by one 10s timeout\n    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)\n    defer cancel()\n\n    // Buffered so slower checks can still send after we return\n    ch := make(chan result, len(candidates))\n    for _, c := range candidates {\n        go func(c candidate) {\n            status := c.provider.HealthCheck(ctx)\n            ch <- result{name: c.name, status: status}\n        }(c)\n    }\n\n    // Return the first healthy provider to respond (\"live\" or \"slow\"); this\n    // favors the fastest responder rather than a fixed priority order\n    for i := 0; i < len(candidates); i++ {\n        r := <-ch\n        if r.status.Status == \"live\" || r.status.Status == \"slow\" {\n            registry.TrySetActive(r.name)\n            return r.name, &FallbackEvent{...}, nil\n        }\n    }\n    return \"\", nil, errors.New(\"no healthy fallback provider\")\n}\n```\n\nM31A includes a **model arbitrage system** (`pkg/arbitrage/`) that automatically switches to the cheapest model that meets the task's capability threshold:\n\n```\n// From pkg/arbitrage/arbitrage.go\nfunc (s *Scorer) Score(task Task) (ComplexityLevel, int) {\n    level := classifyText(task.Action, task.Description)\n\n    // Boost complexity when task touches many files\n    if len(task.Files) > 3 {\n        level = boostLevel(level, 1)\n    }\n\n    // Boost when task has many dependencies\n    if len(task.Dependencies) > 3 {\n        level = boostLevel(level, 1)\n    }\n\n    input, output := s.EstimateTokens(level, task)\n    return level, input + output\n}\n```\n\nThe scorer uses keyword analysis to classify tasks as `simple`, `moderate`, or `complex`. It then recommends the cheapest model that can handle that complexity level. `Score` returns the level together with the estimated total (input plus output) token count.\n\nM31A ships with five core tools: Bash, FileRead, FileWrite, Glob, and Grep.\n\nThe tool surface area is intentionally small. 
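\n\nThe post doesn't spell out the sandbox rules. The following is only an assumption about one common measure for file tools: resolve every requested path against the project's working directory and refuse anything that escapes it. `resolveInWorkDir` and `ErrOutsideWorkDir` are hypothetical names, not M31A's code.\n\n```go\npackage main\n\nimport (\n    \"errors\"\n    \"path/filepath\"\n    \"strings\"\n)\n\n// ErrOutsideWorkDir is returned when a requested path escapes the sandbox root.\nvar ErrOutsideWorkDir = errors.New(\"path escapes working directory\")\n\n// resolveInWorkDir joins a tool-supplied path onto workDir and rejects the\n// result if it lands outside workDir (via \"../\" segments or an absolute path).\nfunc resolveInWorkDir(workDir, requested string) (string, error) {\n    root, err := filepath.Abs(workDir)\n    if err != nil {\n        return \"\", err\n    }\n    target := requested\n    if !filepath.IsAbs(target) {\n        target = filepath.Join(root, target)\n    }\n    target = filepath.Clean(target)\n\n    // A relative path that starts with \"..\" means target is not under root.\n    rel, err := filepath.Rel(root, target)\n    if err != nil {\n        return \"\", err\n    }\n    if rel == \"..\" || strings.HasPrefix(rel, \"..\"+string(filepath.Separator)) {\n        return \"\", ErrOutsideWorkDir\n    }\n    return target, nil\n}\n```\n\nThe check is purely lexical. A production sandbox would also need to resolve symlinks (for example with `filepath.EvalSymlinks`) before trusting the result.\n\n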
Each tool is aggressively sandboxed with:\n\nEvery tool call is gated by a permission modal with a configurable timeout (default 300s):\n\n```\n// From internal/tools/permissions.go\ntype PermissionMode string\n\nconst (\n    ModeAsk      PermissionMode = \"ask\"\n    ModeAllowAll PermissionMode = \"allow_all\"\n    ModeDenyAll  PermissionMode = \"deny_all\"\n)\n\nfunc (d *Dispatcher) RequestPermission(ctx context.Context, tool Tool, input ToolInput) error {\n    switch d.mode {\n    case ModeAllowAll:\n        return nil\n    case ModeDenyAll:\n        return ErrPermissionDenied\n    }\n\n    // Send permission request to TUI; buffered so a late reply never blocks\n    ch := make(chan PermissionResponse, 1)\n    d.emitter.Emit(PermissionRequestMsg{...})\n\n    // Wait for the user's answer, the timeout, or cancellation\n    select {\n    case resp := <-ch:\n        if !resp.Approved {\n            return ErrPermissionDenied\n        }\n        return nil\n    case <-time.After(d.timeout):\n        return ErrPermissionTimeout\n    case <-ctx.Done():\n        return ctx.Err()\n    }\n}\n```\n\nEach tool declares its risk level:\n\n```\ntype RiskLevel string\n\nconst (\n    RiskSafe        RiskLevel = \"safe\"\n    RiskMedium      RiskLevel = \"medium\"\n    RiskDangerous   RiskLevel = \"dangerous\"\n    RiskDestructive RiskLevel = \"destructive\"\n)\n```\n\nBash is `dangerous`, FileWrite is `medium`, and FileRead is `safe`. The permission system uses these levels to decide whether to prompt the user.\n\nOne of M31A's most interesting features is the **cross-session learning ledger** (`pkg/ledger/`). 
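\n\nBecause the ledger is a plain markdown table, recording a session is mostly string formatting, and aggregate statistics are simple folds over the records. Below is a minimal sketch, assuming a hypothetical `SessionRecord` whose fields mirror the columns of the sample table. M31A's real types and file layout are not shown in the post.\n\n```go\npackage main\n\nimport (\n    \"fmt\"\n    \"time\"\n)\n\n// SessionRecord mirrors the ledger columns. Hypothetical type for illustration.\ntype SessionRecord struct {\n    SessionID string\n    Model     string\n    Tasks     int\n    Failed    int\n    CostUSD   float64\n    Duration  time.Duration\n    Framework string\n}\n\n// ledgerRow renders one record as a row under the header\n// | Session | Model | Tasks | Failed | Cost | Duration | Framework |\nfunc ledgerRow(r SessionRecord) string {\n    mins := int(r.Duration.Round(time.Minute).Minutes())\n    return fmt.Sprintf(\"| %s | %s | %d | %d | $%.2f | %dmin | %s |\",\n        r.SessionID, r.Model, r.Tasks, r.Failed, r.CostUSD, mins, r.Framework)\n}\n\n// averageCost is the kind of aggregate a stats query would compute;\n// it returns 0 for an empty ledger.\nfunc averageCost(records []SessionRecord) float64 {\n    if len(records) == 0 {\n        return 0\n    }\n    var sum float64\n    for _, r := range records {\n        sum += r.CostUSD\n    }\n    return sum / float64(len(records))\n}\n```\n\nRounding the duration to whole minutes matches the `8min` style of the sample rows.\n\n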
Every session writes a structured record to a markdown file:\n\n```\n| Session | Model | Tasks | Failed | Cost | Duration | Framework |\n|---------|-------|-------|--------|------|----------|-----------|\n| a1b2c3d4 | claude-3.5-sonnet | 5 | 1 | $0.12 | 8min | react |\n| e5f6g7h8 | gpt-4-turbo | 3 | 0 | $0.08 | 4min | go |\n```\n\nThe ledger tracks:\n\nOver time, the agent can query the ledger to learn from past sessions:\n\n```\n// From pkg/ledger/ledger.go\ntype LedgerStats struct {\n    TotalSessions      int\n    AvgTaskCount       float64\n    AvgCost            float64\n    AvgDurationMinutes float64\n    TotalFailedTasks   int\n    TopFailures        []string\n    TopFrameworks      []string\n    ByProjectType      map[string]int\n}\n```\n\nThis creates a feedback loop where the agent gets sharper over time, learning which frameworks are common, what types of tasks fail, and how long things typically take.\n\nLong conversations overflow the context window. M31A addresses this with **AutoDream** (`pkg/autodream/`), an automatic context consolidation system:\n\n```\n// From pkg/autodream/autodream.go\nfunc (c *Consolidator) Consolidate() (ConsolidationResult, error) {\n    // Protect system prompts and recent messages\n    protected := c.protectedIndices()\n    candidates := c.candidateIndices(protected)\n\n    // Summarize the oldest 50% of non-protected messages\n    midpoint := len(candidates) / 2\n    toCompress := candidates[:midpoint]\n    if len(toCompress) == 0 {\n        return ConsolidationResult{}, nil // nothing old enough to compress\n    }\n\n    // Build summary prompt\n    summary := c.summarize(toCompress)\n\n    // Estimate savings before the replacement invalidates these indices\n    saved := c.estimateTokensSaved(toCompress, summary)\n\n    // Replace old messages with summary\n    c.messages = c.replaceWithSummary(toCompress, summary)\n\n    return ConsolidationResult{\n        MessagesRemoved: len(toCompress),\n        TokensSaved:     saved,\n    }, nil\n}\n```\n\nAutoDream triggers at 60% context usage by default. 
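\n\nThe 60% default reduces to a ratio check before each LLM call. Below is a minimal sketch of that trigger, plus the oldest-half split used in `Consolidate`. The names `shouldConsolidate` and `splitForCompression` are hypothetical, and the token counts are assumed to come from an estimator the post doesn't show.\n\n```go\npackage main\n\n// defaultConsolidateThreshold mirrors the post's stated 60% default.\nconst defaultConsolidateThreshold = 0.60\n\n// shouldConsolidate reports whether usedTokens has reached the given fraction\n// of the model's context window.\nfunc shouldConsolidate(usedTokens, contextWindow int, threshold float64) bool {\n    if contextWindow <= 0 {\n        return false\n    }\n    return float64(usedTokens)/float64(contextWindow) >= threshold\n}\n\n// splitForCompression returns the oldest half of the candidate message indices\n// (to be summarized) and the newer half (kept verbatim). Candidates are assumed\n// to be ordered oldest-first with protected messages already excluded.\nfunc splitForCompression(candidates []int) (compress, keep []int) {\n    mid := len(candidates) / 2\n    return candidates[:mid], candidates[mid:]\n}\n```\n\nKeeping the newer half verbatim is one way AutoDream preserves continuity across a consolidation.\n\n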
It uses role-sampled summarization (system prompts are never compressed) and preserves recent messages for continuity.\n\nThe terminal UI is built with [Bubble Tea](https://github.com/charmbracelet/bubbletea), following the Elm architecture. Screen routing uses an enum-based dispatcher:\n\n```\n// From internal/tui/app_state.go\ntype Screen int\n\nconst (\n    ScreenREPL Screen = iota\n    ScreenGoalInput\n    ScreenPhaseModelPicker\n    ScreenPlan\n    ScreenDiscuss\n    ScreenExecute\n    ScreenVerify\n    ScreenShip\n    ScreenModelSelector\n    ScreenSettings\n    // ... 19 more screens\n)\n\nfunc (m AppState) Update(msg tea.Msg) (tea.Model, tea.Cmd) {\n    // Screen switches are handled centrally\n    switch msg := msg.(type) {\n    case SwitchScreenMsg:\n        m.screen = msg.Screen\n        return m, nil\n    }\n\n    // Everything else goes to the active screen's Update function\n    var cmd tea.Cmd\n    switch m.screen {\n    case ScreenREPL:\n        m.repl, cmd = m.repl.Update(msg)\n    case ScreenPlan:\n        m.plan, cmd = m.plan.Update(msg)\n    // ...\n    }\n    return m, cmd\n}\n```\n\nThe TUI includes some nice touches:\n\nWhen verification fails, M31A can roll back the commit chain using git-bisect integration (`pkg/bisect/`):\n\n```\n// From pkg/rollback/rollback.go (simplified; assumes CreateBranch and Stash return errors)\nfunc (r *Rollback) HardReset(commit string) error {\n    // Create a backup branch before the destructive operation; abort if that fails\n    backupName := fmt.Sprintf(\"m31a/rollback-backup-%d\", time.Now().Unix())\n    if err := r.git.CreateBranch(backupName); err != nil {\n        return fmt.Errorf(\"create backup branch: %w\", err)\n    }\n\n    // Auto-stash uncommitted changes and restore them after the reset\n    if r.git.HasUncommittedChanges() {\n        if err := r.git.Stash(); err != nil {\n            return fmt.Errorf(\"stash changes: %w\", err)\n        }\n        defer r.git.StashPop()\n    }\n\n    // Hard reset to the target commit\n    return r.git.ResetHard(commit)\n}\n```\n\nThe rollback system maintains a commit chain with soft/hard/safe reset options. 
Safe reset creates backup branches before any destructive operation.\n\nAPI keys are stored using OS-native keychain backends (`pkg/keychain/`):\n\n- `pass` CLI fallback\n- `/usr/bin/security` CLI (macOS)\n\n```\n// From pkg/keychain/keychain.go\ntype Keychain interface {\n    Get(service string) (string, error)\n    Set(service, value string) error\n    Delete(service string) error\n}\n```\n\nThe keychain abstraction uses build tags to select the platform-specific implementation at compile time. Service names follow the pattern `m31a/<provider>`, for example `m31a/openrouter` and `m31a/zen`.\n\n**Key resolution order**:\n\n1. `M31A_OPENROUTER_API_KEY` environment variable\n2. `OPENROUTER_API_KEY` environment variable\n3. `m31a/openrouter` keychain entry\n4. `provider.openrouter.api_key` config setting\n\nKeys are never written to disk in plaintext when a keychain is available.\n\nM31A is compiled with `CGO_ENABLED=0`, producing a fully static binary with no C dependencies:\n\n```\n# From Makefile\nbuild:\n\tCGO_ENABLED=0 go build -ldflags \"-s -w \\\n\t\t-X main.Version=$(VERSION) \\\n\t\t-X main.Commit=$(COMMIT) \\\n\t\t-X main.Date=$(DATE)\" \\\n\t\t-o m31a ./cmd/m31a\n```\n\nThe binary is typically 15-20 MB (the `-s -w` ldflags strip the symbol table and DWARF debug info). Cross-compilation targets include linux/darwin/windows × amd64/arm64.\n\n**Zero telemetry**: no analytics, no crash reporting, no usage pings. 
Your code never leaves your machine except when sent to the LLM provider for inference.\n\nSessions persist to `<workDir>/.m31a/session.json`, including:\n\n- `messages.json`\n\nIf you hit `Ctrl+C`, lose network, or your laptop dies, you can resume mid-workflow:\n\n``` bash\n$ m31a --resume\n# Shows session browser with recent sessions\n# Restores workflow state and continues from last checkpoint\n```\n\nM31A uses Go's standard `testing` package with no external mocking frameworks:\n\n- `t.Parallel()`\n\nCoverage targets:\n\n- `pkg/taskrunner` (89.9%), `pkg/bisect` (91.3%), `pkg/rollback` (89.1%)\n\nThe test suite includes some interesting patterns:\n\n```\n// Security test for SSRF protection\nfunc TestWebFetch_BlocksPrivateIPs(t *testing.T) {\n    tests := []struct {\n        url     string\n        wantErr error\n    }{\n        {\"http://127.0.0.1/admin\", ErrPrivateIPBlocked},\n        {\"http://192.168.1.1/config\", ErrPrivateIPBlocked},\n        {\"http://10.0.0.1/secret\", ErrPrivateIPBlocked},\n        {\"http://169.254.169.254/metadata\", ErrPrivateIPBlocked}, // cloud metadata endpoint (e.g. AWS)\n    }\n\n    for _, tt := range tests {\n        tt := tt // capture the range variable for the parallel subtest (needed before Go 1.22)\n        t.Run(tt.url, func(t *testing.T) {\n            t.Parallel()\n            _, err := WebFetch(tt.url)\n            if !errors.Is(err, tt.wantErr) {\n                t.Errorf(\"got %v, want %v\", err, tt.wantErr)\n            }\n        })\n    }\n}\n```\n\nInstallation is a one-liner:\n\n```\n# macOS (Homebrew)\nbrew install eshanized/tap/m31a\n\n# Linux / macOS (curl)\ncurl -fsSL https://raw.githubusercontent.com/eshanized/M31A/main/install.sh | bash\n\n# From source (any OS)\ngit clone https://github.com/eshanized/M31A.git\ncd M31A\nCGO_ENABLED=0 go build -o m31a ./cmd/m31a\n```\n\nOn first launch, M31A prompts for your OpenRouter or Zen API key and stores it in the OS keychain.\n\n**Basic usage**:\n\n``` bash\n$ m31a\n# TUI launches\n# Type your goal: \"refactor the auth middleware to use JWT with RS256\"\n# Agent runs 
through six phases\n# Ends with verified git commit\n```\n\n**Slash commands**:\n\n```\n/help          list all commands\n/workflow      kick off the six-phase flow\n/model         open the model selector (fuzzy search)\n/provider      switch provider\n/ledger stats  show your cross-session ledger\n/rollback      show the commit chain; --hard to reset\n/compress      trigger AutoDream manually\n```\n\nM31A is at v1.0.0 with the core feature set complete. The roadmap includes:\n\nBuilding M31A taught me a few things:\n\n**Workflow ownership matters more than code generation**. The six-phase workflow is more valuable than any single code suggestion.\n\n**Small tool surface area is a feature**. Five well-sandboxed tools are easier to secure than twenty half-baked ones.\n\n**Learning compounds**. The cross-session ledger creates a feedback loop that makes the agent better over time.\n\n**Terminal UIs can be delightful**. Bubble Tea proves that terminal apps don't have to be ugly or hard to use.\n\n**Static binaries are liberating**. No runtime dependencies, no Docker required, just download and run.\n\nM31A is an experiment in what AI coding assistants could be if they owned the entire workflow instead of just the fun part. It's not perfect — the TUI test coverage needs work (38.6%), and there are some known bugs around git status detection — but the architecture is sound and the core workflow is production-ready.\n\nIf you're interested in the intersection of AI, developer tools, and terminal UIs, I'd love your feedback. Star the repo, open an issue, or better yet, try it on your codebase and let me know what breaks.\n\n**Links**:\n\n*Thanks to the Bubble Tea, Lip Gloss, and Glamour teams for making terminal UIs enjoyable to build. 
And thanks to everyone who has tried M31A and reported bugs — your feedback makes it better.*", "url": "https://wpnews.pro/news/building-m31a-a-terminal-native-ai-coding-agent-that-ships-not-just-suggests", "canonical_source": "https://dev.to/eshanized/building-m31a-a-terminal-native-ai-coding-agent-that-ships-not-just-suggests-2p0m", "published_at": "2026-06-15 23:14:24+00:00", "updated_at": "2026-06-15 23:46:59.250570+00:00", "lang": "en", "topics": ["ai-agents", "developer-tools", "large-language-models", "generative-ai", "ai-products"], "entities": ["M31A", "Go", "OpenRouter", "Bubble Tea", "POSIX"], "alternates": {"html": "https://wpnews.pro/news/building-m31a-a-terminal-native-ai-coding-agent-that-ships-not-just-suggests", "markdown": "https://wpnews.pro/news/building-m31a-a-terminal-native-ai-coding-agent-that-ships-not-just-suggests.md", "text": "https://wpnews.pro/news/building-m31a-a-terminal-native-ai-coding-agent-that-ships-not-just-suggests.txt", "jsonld": "https://wpnews.pro/news/building-m31a-a-terminal-native-ai-coding-agent-that-ships-not-just-suggests.jsonld"}}