KIMI + Agnes: A Real-World Test of Cross-Provider Agent Chain Correctover

wpnews.pro


Source: dev.to · Published 2026-06-22 09:57 UTC · Topic: large-language-models


NeuralBridge, an open-source SDK for LLM pipelines, introduced 'Correctover', a mechanism that verifies the semantic correctness of LLM outputs before passing them to the next agent. In a real-world test, the SDK orchestrated KIMI (Moonshot) and Agnes AI in a chain where KIMI planned the architecture and Agnes wrote the code, with Correctover automatically switching providers when a contract failed. The test showed that traditional API gateways, which only check HTTP status codes, miss semantic failures that Correctover catches.


A few days ago I had an idea: what if one LLM could orchestrate other LLMs as agents — not just calling them, but verifying that each agent's output was actually correct before passing it to the next?

I work on NeuralBridge (an open-source self-healing SDK for LLM pipelines), so I decided to build it and test it with two real providers: KIMI (Moonshot) and Agnes AI.

Most API gateways and LLM routers stop at "HTTP 200": they retry or switch providers on errors, but they don't check whether the output is actually correct.

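def call_with_failover(prompt):
    # call_llm / call_llm_fallback are placeholders for real provider calls
    try:
        result = call_llm(prompt)
        return result  # HTTP 200 = success? 🚩
    except Exception:
        result = call_llm_fallback(prompt)
        return result  # Still not verified!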

This is dangerous. A failover from gpt-4o to gpt-4o-mini might silently drop 3 critical fields. A KIMI response that returns "200 OK" might still be missing key entities.

Correctover is the idea that switching providers isn't enough: you must verify semantic equivalence after every switch.
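To make the difference between "HTTP 200" and "correct" concrete, here is a minimal sketch of a semantic contract check in plain Python. It is not NeuralBridge's implementation: the `SimpleContract` class and its methods are hypothetical, and it assumes a contract is a plain substring check on required entities plus a case-insensitive check on forbidden patterns (mirroring the `required_entities` and `forbidden_patterns` arguments used in the chain below).

```python
from dataclasses import dataclass, field


@dataclass
class SimpleContract:
    """Hypothetical stand-in for a semantic contract (not the NeuralBridge API)."""

    required_entities: list[str] = field(default_factory=list)
    forbidden_patterns: list[str] = field(default_factory=list)

    def violations(self, text: str) -> list[str]:
        """Return the reasons the text fails; an empty list means it passes."""
        problems = [f"missing {e!r}" for e in self.required_entities if e not in text]
        lowered = text.lower()  # forbidden patterns are matched case-insensitively
        problems += [
            f"forbidden {p!r}" for p in self.forbidden_patterns if p.lower() in lowered
        ]
        return problems

    def passes(self, text: str) -> bool:
        return not self.violations(text)


contract = SimpleContract(
    required_entities=["import ", "def "],
    forbidden_patterns=["sorry"],
)

# Both responses could arrive with HTTP 200; only one satisfies the contract.
ok_response = "import csv\n\ndef convert(path): ..."
bad_response = "Sorry, I can't help with that."

print(contract.violations(ok_response))   # []
print(contract.violations(bad_response))  # ["missing 'import '", "missing 'def '", "forbidden 'sorry'"]
```

A status-code check would accept both responses; the contract rejects the refusal because it is missing the expected code markers and contains a forbidden pattern.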

We built a simple DAG-based chain executor with three key capabilities. The central one is the Contract: each node's output is validated against it before being passed to the next node.

from neuralbridge import SelfHealingEngine, ProviderConfig, Contract
from neuralbridge.chain import ChainBuilder

engine = SelfHealingEngine(providers=[])
engine.add_provider(ProviderConfig(
    name="moonshot",
    base_url="https://api.moonshot.cn/v1",
    api_key="...",
    models=["moonshot-v1-8k", "moonshot-v1-32k"],
))
engine.add_provider(ProviderConfig(
    name="agnes",
    base_url="https://apihub.agnes-ai.com/v1",
    api_key="...",
    models=["agnes-2.0-flash"],
))

chain = (
    ChainBuilder(engine)
    .node(name="planner",
        system="You are a senior architect.",
        prompt="Design a plan for: {task}",
        # "架构" = architecture, "模块" = modules
        contract=Contract(required_entities=["架构", "模块"]),
        model="moonshot-v1-32k",
        timeout=120)
    .node(name="coder",
        system="You are a Python developer.",
        prompt="Implement: {planner}",  # {planner} receives the planner node's output
        contract=Contract(
            required_entities=["import ", "def "],
            # reject refusals; "我不能" = "I cannot"
            forbidden_patterns=["我不能", "sorry"]),
        model="agnes-2.0-flash",
        depends_on=["planner"],
        timeout=180)
    .build()
)

result = chain.run(
    task="A CSV to JSON converter with validation"
)

KIMI plans the architecture, Agnes writes the code:

Node	Provider	Time	Contract
planner	moonshot-v1-32k	17.8s	✅ Architecture + Modules
coder	agnes-2.0-flash	10.8s	✅ import + def (runnable code)

Total: 28.5s. The planner's design output was used as context for the coder, and the coder actually implemented the design (not random boilerplate).
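The `{planner}` placeholder in the coder's prompt is how the plan reaches the coder. The sketch below shows one way a DAG executor could do this, assuming nodes run in dependency order and each prompt is filled with `str.format` from the task plus upstream outputs. `run_dag` and `fake_model` are hypothetical names; the standard-library `graphlib.TopologicalSorter` (Python 3.9+) supplies the ordering.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_dag(
    nodes: dict[str, dict],
    call_model: Callable[[str, str], str],
    **inputs: str,
) -> dict[str, str]:
    """Run nodes in dependency order, feeding upstream outputs into prompts."""
    # Map each node name to the set of nodes it depends on.
    graph = {name: set(spec.get("depends_on", [])) for name, spec in nodes.items()}
    outputs: dict[str, str] = {}
    for name in TopologicalSorter(graph).static_order():
        spec = nodes[name]
        # Placeholders like {task} or {planner} come from inputs and earlier outputs.
        prompt = spec["prompt"].format(**inputs, **outputs)
        outputs[name] = call_model(spec["model"], prompt)
    return outputs


def fake_model(model: str, prompt: str) -> str:
    # Stand-in for a real provider call; echoes which model saw which prompt.
    return f"[{model}] {prompt}"


nodes = {
    "planner": {"prompt": "Design a plan for: {task}", "model": "moonshot-v1-32k"},
    "coder": {
        "prompt": "Implement: {planner}",
        "model": "agnes-2.0-flash",
        "depends_on": ["planner"],
    },
}

result = run_dag(nodes, fake_model, task="A CSV to JSON converter with validation")
print(result["coder"])
# [agnes-2.0-flash] Implement: [moonshot-v1-32k] Design a plan for: A CSV to JSON converter with validation
```

Topological ordering guarantees a node only runs after everything it depends on, and `TopologicalSorter` raises `graphlib.CycleError` if the dependencies form a cycle.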

This is where it gets interesting. In a separate test, the deep_analysis node was supposed to output an analysis containing "优点" (pros) and "缺点" (cons):

deep_analysis(agnes-2.0-flash) → Contract failed (missing "优点"/"缺点")
    ↻ Correctover triggered!
    ↻ Automatically switched to moonshot-v1-32k
    → ✅ Validation passed

This is Correctover in action: the first provider returned text, but it didn't satisfy the semantic contract. The engine automatically retried with a different provider, and the second attempt passed validation.
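The behaviour in the trace can be sketched as a loop over providers that only returns an answer once it passes the contract. This is an assumption about how such a loop might look, not NeuralBridge's code: `correctover`, `AllProvidersFailed`, and the lambda providers are hypothetical, and the contract is reduced to a required-substring check.

```python
from typing import Callable, Sequence

Provider = tuple[str, Callable[[str], str]]


class AllProvidersFailed(RuntimeError):
    """Raised when no provider produced output that satisfies the contract."""


def correctover(
    prompt: str, providers: Sequence[Provider], required: Sequence[str]
) -> tuple[str, str]:
    """Try providers in order; return (provider_name, text) for the first valid output."""
    failures = []
    for name, call in providers:
        try:
            text = call(prompt)
        except Exception as exc:  # transport errors: timeouts, 429s, 5xx
            failures.append(f"{name}: error {exc!r}")
            continue
        missing = [r for r in required if r not in text]
        if not missing:
            return name, text  # semantically valid: safe to pass downstream
        failures.append(f"{name}: missing {missing}")  # "HTTP 200", but wrong
    raise AllProvidersFailed("; ".join(failures))


providers = [
    ("agnes-2.0-flash", lambda p: "A general overview without the requested sections."),
    ("moonshot-v1-32k", lambda p: "优点: fast to build. 缺点: limited validation."),
]

name, text = correctover("Analyse the design", providers, required=["优点", "缺点"])
print(name)  # moonshot-v1-32k
```

The key design choice is that a successful HTTP call is not a stopping condition; only a passing contract is.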

In our test, Agnes AI responses took 18–233 seconds. With a short client-side timeout such as 8s, every one of those calls would have timed out. We had to set timeout=120 and total_timeout=300 for realistic workloads.
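One way to combine a per-call `timeout` with an overall `total_timeout` is to cap each call by whatever is left of the overall budget. The sketch below assumes that reading of the two settings (the article does not spell out their exact semantics); `Deadline` is a hypothetical helper whose output would be passed as the timeout of each HTTP request.

```python
import time
from typing import Callable


class Deadline:
    """Tracks an overall time budget and hands out per-call timeouts."""

    def __init__(self, total_timeout: float, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._expires_at = clock() + total_timeout

    def remaining(self) -> float:
        """Seconds left in the overall budget (never negative)."""
        return max(0.0, self._expires_at - self._clock())

    def call_timeout(self, per_call_timeout: float) -> float:
        """Per-call timeout, capped so no call can run past the overall deadline."""
        left = self.remaining()
        if left == 0.0:
            raise TimeoutError("total_timeout exhausted")
        return min(per_call_timeout, left)


# Fake clock so the example is deterministic; it starts at 0 s.
now = [0.0]
deadline = Deadline(total_timeout=300, clock=lambda: now[0])

print(deadline.call_timeout(120))  # 120: the full per-call timeout fits in the budget
now[0] = 250.0                     # pretend 250 s have already been spent
print(deadline.call_timeout(120))  # 50.0: capped by what is left of the budget
```

Using `time.monotonic` as the default clock avoids surprises if the system clock is adjusted mid-chain.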

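The deep_analysis case above is exactly the kind of failure that traditional gateways miss. Embedding the logic in the application also changes how transport errors such as HTTP 429 are handled: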

Traditional proxy:  Your App → Gateway → KIMI → 429 → Gateway also 429
SDK (NeuralBridge): Your App(embedded) → KIMI → 429 → backoff → ✅
                                              → continuous fail → circuit break → switch provider → ✅

No extra hop, no data passing through a third party, no infrastructure to maintain.
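The SDK path in the diagram (back off on 429, open a circuit after repeated failures, then switch provider) can be sketched as follows. The `CircuitBreaker` class, `call_with_backoff`, and the thresholds are illustrative assumptions, not NeuralBridge's API or defaults; `RateLimited` stands in for an HTTP 429 response.

```python
import time
from typing import Callable, Optional


class RateLimited(Exception):
    """Stand-in for an HTTP 429 response from a provider."""


class CircuitBreaker:
    """Opens after `threshold` consecutive failures and rejects calls for `cooldown` s.

    Simplified: there is no half-open probing state.
    """

    def __init__(self, threshold: int = 3, cooldown: float = 30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if time.monotonic() - self.opened_at >= self.cooldown:
            self.opened_at, self.failures = None, 0  # cooldown over: close again
            return False
        return True

    def record(self, ok: bool) -> None:
        self.failures = 0 if ok else self.failures + 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()


def call_with_backoff(call: Callable[[], str], breaker: CircuitBreaker,
                      retries: int = 3, base_delay: float = 0.5) -> Optional[str]:
    """Retry one provider with exponential backoff; None means 'switch provider'."""
    for attempt in range(retries):
        if breaker.is_open():
            return None  # circuit open: stop hammering this provider
        try:
            result = call()
        except RateLimited:
            breaker.record(ok=False)
            if attempt < retries - 1:
                time.sleep(base_delay * 2 ** attempt)  # base_delay, 2x, 4x, ...
            continue
        breaker.record(ok=True)
        return result
    return None


def always_429() -> str:
    raise RateLimited("429 Too Many Requests")


breakers = {"kimi": CircuitBreaker(), "agnes": CircuitBreaker()}
providers = [("kimi", always_429), ("agnes", lambda: "ok from agnes")]

answer = None
for name, call in providers:
    answer = call_with_backoff(call, breakers[name], base_delay=0.01)
    if answer is not None:
        break  # first provider that answers wins

print(name, answer)                # agnes ok from agnes
print(breakers["kimi"].is_open())  # True: kimi's circuit opened after 3 failures
```

Because this runs inside the application, the backoff and the provider switch happen without an extra network hop through a gateway.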

The 2026 AI market is exploding ($7.6B+ for agentic AI, 40–50% CAGR), but 88% of enterprise AI projects never reach production (IDC/Lenovo). The bottleneck isn't capability; it's reliability.

Even academia agrees: a 2026 arXiv paper (2606.01416) showed that verifier-guided self-healing reduced silent failures to 0.0%, compared with 5.5% or more for retry-only approaches.

We're open-sourcing the chain module as part of NeuralBridge SDK v5.x. The core engine is Apache 2.0 — you can use the self-healing, circuit breakers, and Correctover validation today.

Try it:

pip install neuralbridge
from neuralbridge import SelfHealingEngine, Contract
from neuralbridge.chain import ChainBuilder

NeuralBridge is an open-source (Apache 2.0) self-healing SDK for LLM pipelines. Correctover, semantic validation after failover, is our core differentiator from other LLM gateways and routers.

source & further reading


Read original on dev.to → dev.to/hhhfs9s7y9code/kimi-agnes-a-real-world-te…
