Show HN: NeuralBridge - Self-Healing SDK for LLM-Powered AI Agents

NeuralBridge is an embedded SDK that makes LLM-powered AI agents self-heal from API failures, with an 84.1% auto-recovery rate. The open-source SDK recognizes over 280 fault patterns and employs 30+ recovery strategies, diagnosing faults with a median (P50) latency of 19 microseconds. NeuralBridge is released under Apache 2.0, with Pro features available for enterprise use.

After months of production experience running LLM calls at scale, we realized something uncomfortable: every AI agent eventually crashes. Not because the code is wrong, but because LLM APIs fail in ways you can't predict. Timeouts. Rate limits. Empty responses. Schema violations. Drift. These aren't edge cases; they're the norm.

So we built NeuralBridge: an embedded SDK that makes LLM calls self-healing.

Try running 100,000 LLM calls through any single provider and you'll see these failures firsthand.

Most teams solve this by building their own retry logic, circuit breakers, and fallback chains. It works, until it doesn't. Because the next failure is always the one you didn't anticipate.

Instead of a gateway (which adds latency and infrastructure), we embedded the reliability logic directly into the SDK:

```python
from neuralbridge import SelfHealingEngine

engine = SelfHealingEngine()
result = engine.call("Write a Python function for binary search")

if result.recovered:
    print(f"Fault: {result.diagnosis}")
    print(f"Recovery: {result.recovery_action}")
```

When a call fails, the engine diagnoses the fault and applies a recovery strategy. Current numbers:

| Metric | Value |
|---|---|
| Auto-recovery rate | 84.1% of faults |
| Fault patterns recognized | 280+ |
| Recovery strategies | 30+ |
| Learned rules (flywheel) | 88+ |
| Diagnosis latency | 19 µs (P50) |
| Install size | 375 KB |

We went Apache 2.0 because reliability infrastructure should be a commodity. The SDK is free and open. Pro features (enterprise SSO, audit logs, priority support) fund continued development.

```
pip install neuralbridge-sdk
```

```python
import neuralbridge as nb

result = nb.run("Explain quantum computing in one sentence")
print(result.text)
```

We'd love your feedback, issues, and contributions. What failure patterns have you seen in production that we should handle?
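For reference, the hand-rolled approach mentioned above (retry logic plus a fallback chain) usually looks something like the sketch below. This is an illustration only, not NeuralBridge code: `call_with_fallback`, `TransientError`, and the two toy providers are hypothetical names, and real provider clients would raise their own exception types.

```python
import time
from typing import Callable, Optional, Sequence


class TransientError(Exception):
    """Hypothetical stand-in for a timeout, rate limit, or empty response."""


def call_with_fallback(
    prompt: str,
    providers: Sequence[Callable[[str], str]],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each provider in order, retrying transient failures with backoff."""
    last_error: Optional[Exception] = None
    for provider in providers:
        for attempt in range(max_retries):
            try:
                text = provider(prompt)
                if not text:
                    # Treat an empty response as a retryable fault.
                    raise TransientError("empty response")
                return text
            except TransientError as exc:
                last_error = exc
                # Back off only if another attempt on this provider remains.
                if attempt < max_retries - 1:
                    sleep(base_delay * 2 ** attempt)
    raise RuntimeError("all providers failed") from last_error


# Toy providers (assumptions for the demo): one always fails, one works.
def flaky(prompt: str) -> str:
    raise TransientError("rate limited")


def stable(prompt: str) -> str:
    return f"echo: {prompt}"


# The no-op sleep keeps the demo instant; drop it to get real backoff.
print(call_with_fallback("hi", [flaky, stable], sleep=lambda _: None))
```

Note that this only handles the failure types someone thought to list in the `except` clause, which is the limitation described above: anything unanticipated falls straight through.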