KIMI + Agnes: A Real-World Test of Cross-Provider Agent Chain Correctover

NeuralBridge, an open-source SDK for LLM pipelines, introduced 'Correctover'—a mechanism that verifies semantic correctness of LLM outputs before passing them to the next agent. In a real-world test, the SDK orchestrated KIMI (Moonshot) and Agnes AI in a chain, where KIMI planned architecture and Agnes wrote code, with Correctover automatically switching providers when a contract failed. The test demonstrated that traditional API gateways, which only check HTTP status codes, miss semantic failures that Correctover catches.

A few days ago I had an idea: what if one LLM could orchestrate other LLMs as agents, not just calling them, but verifying that each agent's output was actually correct before passing it to the next?

I work on NeuralBridge (an open-source self-healing SDK for LLM pipelines), so I decided to build it and test it with two real providers: KIMI (Moonshot) and Agnes AI.

Most API gateways and LLM routers stop at "HTTP 200": they retry or switch providers, but they never check whether the output is actually correct.

```python
# What everyone else does:
def call_with_fallback(prompt):
    try:
        result = call_llm(prompt)
        return result  # HTTP 200 = success? 🚩
    except Exception:
        result = call_llm_fallback(prompt)
        return result  # Still not verified
```

This is dangerous. A failover from gpt-4o to gpt-4o-mini might silently drop 3 critical fields. A KIMI response that returns "200 OK" might still be missing key entities.

Correctover is the idea that switching providers isn't enough: you must verify semantic equivalence after every switch.

We built a simple DAG-based chain executor with three key capabilities, including validating each node's output against a Contract before passing it to the next node:

```python
from neuralbridge import SelfHealingEngine, ProviderConfig, Contract
from neuralbridge.chain import ChainBuilder

engine = SelfHealingEngine(providers=[])

# Register both providers with the engine.
engine.add_provider(ProviderConfig(
    name="moonshot",
    base_url="https://api.moonshot.cn/v1",
    api_key="...",
    models=["moonshot-v1-8k", "moonshot-v1-32k"],
))
engine.add_provider(ProviderConfig(
    name="agnes",
    base_url="https://apihub.agnes-ai.com/v1",
    api_key="...",
    models=["agnes-2.0-flash"],
))

chain = (
    ChainBuilder(engine)
    .node(
        name="planner",
        system="You are a senior architect.",
        prompt="Design a plan for: {task}",
        # "架构" = architecture, "模块" = module
        contract=Contract(required_entities=["架构", "模块"]),
        model="moonshot-v1-32k",
        timeout=120,
    )
    .node(
        name="coder",
        system="You are a Python developer.",
        prompt="Implement: {planner}",
        contract=Contract(
            required_entities=["import ", "def "],
            # "我不能" = "I can't"
            forbidden_patterns=["我不能", "sorry"],
        ),
        model="agnes-2.0-flash",
        depends_on=["planner"],
        timeout=180,
    )
    .build()
)

result = chain.run(task="A CSV to JSON converter with validation")
```

KIMI plans the architecture, Agnes writes the code:

| Node | Provider | Time | Contract |
|---|---|---|---|
| planner | moonshot-v1-32k | 17.8s | ✅ Architecture + Modules |
| coder | agnes-2.0-flash | 10.8s | ✅ import + def (runnable code) |

Total: 28.5s. The planner's design output was used as context for the coder, and the coder actually implemented the design (not random boilerplate).

This is where it gets interesting. In a separate test, the deep_analysis node was supposed to output an analysis with "优点" (pros) and "缺点" (cons):

```
deep_analysis (agnes-2.0-flash) → Contract failed (missing "优点"/"缺点")
  ↻ Correctover triggered
  ↻ Automatically switched to moonshot-v1-32k → ✅ Validation passed
```

This is Correctover working in production: the first provider returned text, but it didn't satisfy the semantic contract. The engine automatically retried with a different provider, and the second attempt passed validation.

In our test, Agnes AI responses took 18–233 seconds. Without proper timeouts (the default is 8s in most SDKs), every call would fail. We had to set timeout=120 and total_timeout=300 for realistic workloads.

The deep_analysis case above is exactly the kind of failure that traditional gateways miss.

```
Traditional proxy:
  Your App → Gateway → KIMI → 429 → Gateway also 429

SDK (NeuralBridge):
  Your App (embedded) → KIMI → 429 → backoff → ✅
                             → continuous fail → circuit break → switch provider → ✅
```

No extra hop, no data passing through a third party, no infrastructure to maintain.

The 2026 AI market is exploding ($7.6B+ for agentic AI, 40-50% CAGR), but 88% of enterprise AI projects never reach production (IDC/Lenovo). The bottleneck isn't capability; it's reliability.

Even academia agrees: a May 2026 arXiv paper ([2606.01416](https://arxiv.org/abs/2606.01416)) reported that verifier-guided self-healing reduces silent failures to 0.0%, compared to 5.5%+ for retry-only approaches.

We're open-sourcing the chain module as part of NeuralBridge SDK v5.x.
The core engine is Apache 2.0: you can use the self-healing, circuit breakers, and Correctover validation today. Try it:

```bash
pip install neuralbridge
```

```python
from neuralbridge import SelfHealingEngine, Contract
from neuralbridge.chain import ChainBuilder
```

NeuralBridge is an open-source (Apache 2.0) self-healing SDK for LLM pipelines. Correctover (semantic validation after failover) is our core differentiator from every other LLM gateway and router.
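
To make the Correctover idea from this post concrete, here is a minimal standalone sketch of "validate after every switch". It deliberately does not use the NeuralBridge API: `SimpleContract`, `run_with_correctover`, and the two stub providers are hypothetical names invented for illustration, and the contract check is assumed to be a plain case-insensitive substring match, which may well be simpler than what the real SDK does.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A provider is a (name, callable) pair; the callable maps a prompt to text.
Provider = Tuple[str, Callable[[str], str]]


@dataclass
class SimpleContract:
    """Hypothetical stand-in for a semantic contract (not the NeuralBridge class)."""
    required_entities: List[str] = field(default_factory=list)
    forbidden_patterns: List[str] = field(default_factory=list)

    def violations(self, text: str) -> List[str]:
        """Return the list of problems found; an empty list means the contract holds."""
        lowered = text.lower()
        problems = [f"missing {e!r}" for e in self.required_entities
                    if e.lower() not in lowered]
        problems += [f"forbidden {p!r}" for p in self.forbidden_patterns
                     if p.lower() in lowered]
        return problems


def run_with_correctover(prompt: str, providers: List[Provider],
                         contract: SimpleContract):
    """Try providers in order and accept the first output that passes the contract.

    A transport error and a contract violation are treated the same way:
    record the attempt and move on to the next provider.
    """
    attempts = []
    for name, call in providers:
        try:
            output = call(prompt)
        except Exception as exc:  # e.g. timeout or HTTP error in a real client
            attempts.append((name, [f"transport error: {exc}"]))
            continue
        problems = contract.violations(output)
        attempts.append((name, problems))
        if not problems:
            return name, output, attempts
    raise RuntimeError(f"no provider satisfied the contract: {attempts}")


# Stub providers standing in for real API calls (assumed behavior, for the demo only).
def stub_fast(prompt: str) -> str:
    return "This design is fine overall."  # a "200 OK" answer that misses the contract


def stub_thorough(prompt: str) -> str:
    return "Pros: simple to deploy. Cons: slower on large files."


contract = SimpleContract(required_entities=["pros", "cons"],
                          forbidden_patterns=["sorry"])
winner, text, log = run_with_correctover(
    "Analyze the design",
    [("fast", stub_fast), ("thorough", stub_thorough)],
    contract,
)
print(winner, log)
```

In this sketch the first stub returns successfully but fails the contract, so the loop moves on to the second stub, mirroring the deep_analysis trace in the post. A real implementation would presumably add timeouts, backoff, and circuit breaking around each call.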