{"slug": "i-couldn-t-find-a-production-ready-go-framework-for-ai-agents-so-i-built-one", "title": "I Couldn't Find a Production-Ready Go Framework for AI Agents. So I Built One.", "summary": "A developer built **eywa**, an open-source Go framework for conversational AI agents, after finding no production-ready alternatives in the Go ecosystem. The framework, released under an MIT license, implements hexagonal architecture with strict separation between business domain and infrastructure, using interfaces called \"ports\" for components like distributed locks and LLM abstractions. Eywa addresses concurrency and reliability challenges that arise when handling real-world traffic, such as multiple simultaneous webhook events.", "body_md": "*How I built a production-grade Go framework for conversational AI agents — and the architecture decisions that actually matter.*\n\nFifteen years of writing software professionally and I never open sourced a single thing.\n\nNot because I didn't want to. It's just how it works when you build inside companies — the code belongs to them, the problems are specific to their domain, and by the time you could abstract something useful, you've already moved on to the next fire.\n\nTwo months ago I decided to change that.\n\nI was building a conversational AI agent in Go. Needed a framework. Went looking for one and found... Python. More Python. A few Go repos that were abandoned in 2023. And a lot of \"just wrap the OpenAI SDK\" advice that works fine until you have real traffic and your agent starts responding twice to the same message.\n\nNothing production-ready. Nothing with actual architecture. Nothing I could hand to a team and say *this will hold up.*\n\nSo I built it. The result is **eywa** — a Go framework for conversational AI agents, hexagonal architecture, v1.0.0, MIT license, open source.\n\nThe name comes from Avatar — Eywa is the neural network connecting all living things on Pandora. 
The metaphor fit: a system that connects LLMs, channels, memory, and tools into a single organism, where each part perceives and responds to the environment.\n\nHere's what I learned building it.\n\nThe AI ecosystem lives in Python. LangChain, LlamaIndex, CrewAI — all Python. If you're prototyping, exploring, or running notebooks, this makes complete sense.\n\nBut if you're running something in production at scale — where real users are sending real messages and you need observability, concurrency control, and something that doesn't fall over at 3am — Go is a very different story.\n\nAmong other things, Go gives you a built-in race detector (`go test -race`), which will find bugs Python won't even see.\n\nThe Python frameworks assume you'll have one request at a time or handle concurrency via queues outside the framework. When you're dealing with WhatsApp webhooks at scale — multiple events per user arriving milliseconds apart — that assumption breaks.\n\nThe core principle in eywa is that the business domain should have absolutely zero knowledge of infrastructure.\n\nNo OpenAI SDK imports in domain code. No Redis calls. No MongoDB queries. Just interfaces — what eywa calls **ports**.\n\nThe domain defines what it needs. Infrastructure implements it. Wiring happens at startup.\n\nHere's the Bond port — the distributed lock:\n\n```\n// Bond is the distributed-lock port; the domain depends only on this interface.\ntype Bond interface {\n    // AcquireLock reports whether the lock was obtained; (false, nil) means it is already held.\n    AcquireLock(ctx context.Context, key string, ttl time.Duration) (bool, error)\n    // ReleaseLock frees the lock for key.\n    ReleaseLock(ctx context.Context, key string) error\n    // ExtendLock extends the lock on key by ttl.\n    ExtendLock(ctx context.Context, key string, ttl time.Duration) error\n}\n```\n\nThe domain knows it can acquire and release locks. It does not know that the implementation uses Redis Redlock under the hood. In tests, you inject a no-op. 
In production, you inject the Redis adapter.\n\nSame pattern for the Oracle (the LLM abstraction):\n\n```\ntype OracleRequest struct {\n    Model         string\n    SystemPrompt  string\n    Messages      []OracleMessage\n    Temperature   float64\n    MaxTokens     int\n    Tools         []OracleTool\n    UseTools      bool\n    Attachments   []LLMAttachment\n}\n```\n\nThe domain sends an `OracleRequest`. Whether that goes to Anthropic, OpenAI, Gemini, Bedrock, or VertexAI is an infrastructure concern. Swap providers at startup. Run multiple providers simultaneously. The domain doesn't care.\n\nThis is not over-engineering. It's what makes the system testable, maintainable, and survivable when the next LLM provider comes out and everyone wants to switch.\n\nOne thing I invested heavily in: naming. Not just clean variable names — a consistent domain vocabulary that every piece of code uses.\n\nYes, the names are intentional. I wanted a consistent domain vocabulary instead of \"Manager\", \"Service\", \"Handler\", and \"Util\" — names that tell you nothing about what the component actually does in the context of an AI agent.\n\n| Name | What it is |\n|---|---|\n| Weave | The runtime engine — orchestrates everything per event |\n| Spirit | Agent configuration — LLM, tools, system prompt, behavior |\n| Pulse | Inbound event — a message received from a channel |\n| Oracle | LLM abstraction — send prompt, receive response |\n| Bond | Distributed lock — prevents concurrent duplicate responses |\n| Voice | Outbound adapter — sends replies back to the channel |\n| Scout | Context enrichment step — runs before the LLM call |\n| Lore | RAG — retrieval-augmented generation |\n| Imprint | Long-term memory injection |\n| Vigil | Human-in-the-loop takeover |\n| Rite | Approval workflow — gates actions behind human confirmation |\n| Conduit | MCP (Model Context Protocol) client adapter |\n\nWhen your code says `bond.AcquireLock(...)` instead of `redisLock.Lock(...)`, you stop 
thinking about infrastructure and start thinking about the domain. Terminology is design.\n\nHere's a scenario that happens in production and almost no framework handles it:\n\nA user sends a WhatsApp message. The webhook fires. Your agent starts processing — LLM call in progress, 800ms into it.\n\nThe user gets impatient and sends the same message again. Second webhook fires.\n\nNow you have two goroutines processing the same user's context simultaneously. The first finishes, writes the response and updates memory. The second finishes, writes *another* response using stale memory state, overwriting the first update.\n\nThe user gets two responses. Memory is inconsistent. You've introduced a race condition at the application level.\n\nThis is what Bond is for.\n\nBefore the Weave processes any Pulse, it acquires a distributed lock keyed by the user's session ID. If the lock is already held, the event is discarded. Only one active processing per user, ever.\n\nThe contract is precise: `AcquireLock` returns `(false, nil)` when the lock is held (expected case), and `(false, error)` only for infrastructure failures. This distinction matters — the caller handles them differently.\n\nThe pipeline that runs on every Pulse, with Scouts enriching context before the LLM call:\n\n```\nPulse → Scouts → Pathfinder → Spirit → Oracle → Actions → Voice\n```\n\nScouts are sequential context enrichment steps. They read from external systems and inject knowledge into the Pulse before the model sees anything.\n\n```\ntype Scout interface {\n    // GetName returns the Scout's name.\n    GetName() string\n    // Harvest enriches the Pulse with external context; a returned error is logged and the pipeline continues.\n    Harvest(ctx context.Context, event *entities.Pulse) error\n    // IsApplicable reports whether this Scout applies to the event.\n    IsApplicable(event *entities.Pulse) bool\n}\n```\n\nThe critical design decision: **Scouts are fail-open**.\n\nA Scout that returns an error gets logged. The pipeline continues without its data. The LLM call still happens.\n\nWhy? Because if a Scout is hitting a CRM to enrich the user's context, and the CRM is having a slow morning, you don't want your entire agent to stop responding. 
You want it to keep working with less context, gracefully.\n\nThe entire Weave is assembled at startup with a fluent builder:\n\n```\nweave, err := eywa.NewWeaveBuilder(ctx).\n    WithRepositories(spiritRepo, memoryRepo, echoRepo, chronicleRepo).\n    WithBond(bond).\n    WithActionRegistry(eywa.NewActionRegistry()).\n    WithScoutRegistry(eywa.NewScoutRegistry()).\n    AddOracle(eywaopenai.NewOracle(apiKey)).\n    WithConfig(config).\n    Build()\n```\n\nMongoDB for Spirit configuration and conversation history. Redis for distributed locking and in-flight memory. OpenAI as the Oracle. Everything injected — nothing global.\n\nTo add Anthropic as an additional provider:\n\n```\nAddOracle(eywaopenai.NewOracle(openaiKey)).\nAddOracle(eywaanthropic.NewOracle(anthropicKey)).\n```\n\nSpirits define which provider they use. The OracleFactory selects the right one at runtime.\n\nThe entire framework ships as 19 independent Go modules:\n\n```\ngithub.com/wmulabs/eywa                     # core\ngithub.com/wmulabs/eywa/fiber               # HTTP adapter\ngithub.com/wmulabs/eywa/mongo               # MongoDB repositories\ngithub.com/wmulabs/eywa/redis               # Redis Bond + memory\ngithub.com/wmulabs/eywa/mcp                 # MCP client (Conduit)\ngithub.com/wmulabs/eywa/providers/anthropic\ngithub.com/wmulabs/eywa/providers/openai\ngithub.com/wmulabs/eywa/providers/gemini\ngithub.com/wmulabs/eywa/providers/bedrock\ngithub.com/wmulabs/eywa/providers/vertexai\ngithub.com/wmulabs/eywa/providers/weaviate\ngithub.com/wmulabs/eywa/providers/qdrant\ngithub.com/wmulabs/eywa/providers/pgvector\ngithub.com/wmulabs/eywa/providers/pinecone\ngithub.com/wmulabs/eywa/channels/whatsapp\ngithub.com/wmulabs/eywa/gcp/cloudtasks\ngithub.com/wmulabs/eywa/gcp/gcs\ngithub.com/wmulabs/eywa/gcp/gemini\n```\n\nIf you don't use Bedrock, you don't import it. You don't get its dependencies in your `go.sum`. You don't get its security surface. 
Go developers care about this.\n\nBefore calling this v1.0, I went through a proper security review:\n\n- URL schemes such as `file://` and `ftp://` all blocked\n- Reads bounded with `io.LimitReader`\n- Comparisons done with `subtle.ConstantTimeCompare`, so no timing attacks\n- Tests run under `go test -race`\n\nNone of this is exciting. All of it matters.\n\neywa is at v1.0.0. Stable, production-hardened, documented.\n\nIf you're building AI agents in Go — or you've been wanting to but couldn't find something serious enough to base a production system on — I'd genuinely love your feedback.\n\nPull requests welcome. Issues welcome. Blunt criticism welcome.\n\nMaybe the world didn't need another AI framework. But it definitely needed more engineering in the ones it had.", "url": "https://wpnews.pro/news/i-couldn-t-find-a-production-ready-go-framework-for-ai-agents-so-i-built-one", "canonical_source": "https://dev.to/wmoraes/-i-couldnt-find-a-production-ready-go-framework-for-ai-agents-so-i-built-one-3je8", "published_at": "2026-05-29 05:03:27+00:00", "updated_at": "2026-05-29 05:42:56.639699+00:00", "lang": "en", "topics": ["ai-agents", "ai-tools", "ai-infrastructure"], "entities": ["Eywa", "LangChain", "LlamaIndex", "CrewAI", "OpenAI", "Go", "MIT", "Avatar"], "alternates": {"html": "https://wpnews.pro/news/i-couldn-t-find-a-production-ready-go-framework-for-ai-agents-so-i-built-one", "markdown": "https://wpnews.pro/news/i-couldn-t-find-a-production-ready-go-framework-for-ai-agents-so-i-built-one.md", "text": "https://wpnews.pro/news/i-couldn-t-find-a-production-ready-go-framework-for-ai-agents-so-i-built-one.txt", "jsonld": "https://wpnews.pro/news/i-couldn-t-find-a-production-ready-go-framework-for-ai-agents-so-i-built-one.jsonld"}}