
Show HN: Find where multi-agent AI systems break before production

A new open-source tool, swarm-test, lets developers find failures in multi-agent AI systems before production by analyzing agent topology statically, without live LLM calls. It detects cascade failures, single points of failure, context leakage, and other issues, outputting a Swarm Score and interactive D3 dashboard. The tool supports CrewAI, LangGraph, AutoGen, and custom systems, and integrates with GitHub Actions for PR annotations.

Published Jun 25, 2026 · 6 min read

Find where your multi-agent AI system breaks, before production does.

Static reliability testing for CrewAI, LangGraph, AutoGen, and custom agent systems. No live LLM calls, no API cost.

Chain 14 agents at 95% reliability each and your system is ~49% reliable end-to-end (0.95^14). The failures aren't inside any single agent; they're in how they connect: silent cascade failures, hidden single points of failure, fragile dependencies. swarm-test finds them by analyzing your agent topology.
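The arithmetic above can be checked in a few lines of Python. This is a minimal sketch that assumes the chain is strictly sequential and that agents fail independently; `chain_reliability` is a hypothetical helper for illustration, not part of swarm-test.

```python
def chain_reliability(per_agent: float, n_agents: int) -> float:
    """End-to-end success probability of a strictly sequential chain,
    assuming every agent fails independently of the others."""
    return per_agent ** n_agents


end_to_end = chain_reliability(0.95, 14)
print(f"{end_to_end:.1%}")
```

Under those assumptions the result lands just below one half, which is where the ~49% figure comes from.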

pip install swarm-test
swarm-test run my_crew.py --open

--open launches an interactive D3 dashboard in your browser the moment the run finishes: Swarm Score, force-directed agent graph with single-points-of-failure pulsing red, sortable health and redundancy tables, and every finding grouped by severity.

No real script handy? Build a synthetic topology straight from the CLI:

swarm-test run -a "Orchestrator,Worker1,Worker2" -e "Orchestrator>Worker1,Orchestrator>Worker2"

  • Cascade failure: one agent fails and silently takes down everything downstream
  • Blast radius / SPOF: a single agent the whole system depends on; remove it and the swarm splits
  • Context leakage: credentials, PII, or other sensitive data leaking across agent boundaries
  • Intent drift: agents drifting from their assigned role; prompt-injection-style goal hijacking
  • Timeout resilience: a slow upstream with no timeout boundary blocking the whole pipeline
  • Collusion detection: dense cliques, echo chambers, and cycles that bypass the orchestrator
  • Trajectory analysis: agents stuck in loops, with runaway step counts and retry storms that burn tokens with no error thrown
  • Contract violation: output schema mismatches across agent edges (opt-in; provide a contracts YAML)

  • 0–100 Swarm Score with a verdict line (EXCELLENT → CRITICAL), one-line output for CI
  • Agent role classification (orchestrator, aggregator, validator, gateway, worker, monitor, router) with confidence scores
  • Role-adjusted severity: a validator leaking context is upgraded; an orchestrator's blast radius is downgraded
  • Historical tracking: trend across runs, diffs new vs. resolved findings
  • Interactive HTML report (--open): D3 force-directed graph, NxN heatmap, filterable findings
  • GitHub Action with PR annotations and job-summary score
  • Graph export to Mermaid, DOT, or PNG (SPOFs red, redundant green)
  • Framework adapters: CrewAI, LangGraph, AutoGen, generic / static graph
  • YAML config (.swarmtest.yml) and entry-point plugin system

on: [pull_request]
jobs:
  swarm-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: surajkumar811/swarm-test@v0.3.0
        with:
          script: my_crew.py
          fail-on-severity: high

Findings appear inline on the PR as ::error::, ::warning::, and ::notice:: annotations; the Swarm Score is posted to the workflow job summary.
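For readers unfamiliar with GitHub's workflow commands, the sketch below shows how a finding could be turned into one of those annotation lines. The severity-to-level mapping and the `to_annotation` helper are assumptions made for illustration; the article does not say how swarm-test maps severities internally.

```python
# Assumed mapping from finding severity to GitHub annotation level.
LEVEL_BY_SEVERITY = {
    "critical": "error",
    "high": "error",
    "medium": "warning",
    "low": "notice",
    "info": "notice",
}


def _escape_data(text: str) -> str:
    # Workflow commands are single-line, so %, CR and LF must be escaped.
    return text.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def _escape_property(text: str) -> str:
    # Property values additionally escape ':' and ','.
    return _escape_data(text).replace(":", "%3A").replace(",", "%2C")


def to_annotation(severity: str, title: str, message: str) -> str:
    """Format one finding as a GitHub Actions workflow command."""
    level = LEVEL_BY_SEVERITY.get(severity, "notice")
    return f"::{level} title={_escape_property(title)}::{_escape_data(message)}"


print(to_annotation("high", "SPOF", "Orchestrator is a single point of failure"))
```

Printing such a line to standard output inside a workflow step is what makes GitHub render it as an annotation.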

from swarm_test import SwarmProbe

probe  = SwarmProbe(crew, swarm_name="my-crew")
report = probe.run_all()
report.print_summary()
report.to_html("report.html")

pip install swarm-test
pip install "swarm-test[crewai]"
pip install "swarm-test[langgraph]"
pip install "swarm-test[autogen]"
pip install "swarm-test[png]"        # for PNG graph export

How it works

swarm-test builds a NetworkX directed graph from your agent system: nodes are agents, edges are interactions extracted by each framework adapter. All tests are static graph analyses; no LLM calls are made, and results are deterministic given the same topology.

  • Cascade failure: simulates each agent failing in turn and measures downstream impact.
  • Blast radius: detects articulation points (graph-theoretic SPOFs) and scores every agent on a 0–100 redundancy scale composed of path redundancy (30%), role uniqueness (25%), tool coverage (20%), betweenness centrality (15%), and degree ratio (10%).
  • Context leakage: scans interaction payloads against a sensitive-data regex set extensible from .swarmtest.yml.
  • Intent drift: flags agents whose observed behavior diverges from their declared role; includes prompt-injection heuristics.
  • Collusion: finds dense cliques, echo chambers, and cycles that bypass the declared orchestrator.
  • Timeout resilience: identifies long synchronous chains with no timeout boundary.
  • Trajectory analysis: flags self-loops, ping-pong pairs, multi-agent feedback cycles, unbounded loops with no exit, repeated parallel calls, and cycles deeper than max_trajectory_depth (default 5).
  • Contract violation: validates agent outputs against JSON schemas declared per edge (opt-in; pass --contracts contracts.yml).
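The first two checks in the list can be sketched with the standard library alone. This is not swarm-test's implementation (the article says it uses NetworkX); it is a minimal illustration on a hypothetical five-agent topology. Cascade impact is approximated as the share of other agents reachable downstream, which is an upper bound, and SPOFs are found by brute force: remove each agent and test whether the rest stays weakly connected.

```python
from collections import deque

# Hypothetical topology: agent -> agents it hands work to.
EDGES = {
    "Orchestrator": ["Researcher", "Coder"],
    "Researcher": ["Writer"],
    "Coder": ["Writer"],
    "Writer": ["Reviewer"],
    "Reviewer": [],
}


def downstream(graph, start):
    """Agents reachable from `start`, i.e. those exposed if it fails."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in graph[queue.popleft()]:
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def stays_connected(graph, removed):
    """True if the graph is still weakly connected without `removed`."""
    nodes = [n for n in graph if n != removed]
    undirected = {n: set() for n in nodes}
    for src, targets in graph.items():
        for dst in targets:
            if removed not in (src, dst):
                undirected[src].add(dst)
                undirected[dst].add(src)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for nxt in undirected[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(nodes)


# Fraction of the other agents affected when each agent fails.
impact = {a: len(downstream(EDGES, a)) / (len(EDGES) - 1) for a in EDGES}
# Articulation points: agents whose removal splits the swarm.
spofs = [a for a in EDGES if not stays_connected(EDGES, a)]
print(impact, spofs)
```

In this toy graph the Orchestrator has the largest cascade impact, yet only the Writer is an articulation point, which shows why the two checks are reported separately.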

Roles are classified from structural metrics (in/out degree, betweenness centrality) plus naming hints, each with a 0–100% confidence score. Severity is then role-adjusted: an orchestrator with high blast radius is expected and gets downgraded; a validator leaking context is a security incident and gets upgraded.
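Role adjustment can be pictured as shifting a finding up or down a severity ladder. The sketch below encodes only the two examples given in the paragraph above; the one-step shift size, the test names, and the `adjust_severity` helper are assumptions, not swarm-test's actual rules.

```python
SEVERITIES = ["info", "low", "medium", "high", "critical"]

# Assumed rules: (role, test) -> number of severity steps to shift.
ADJUSTMENTS = {
    ("orchestrator", "blast_radius"): -1,   # expected for an orchestrator
    ("validator", "context_leakage"): +1,   # treated as a security incident
}


def adjust_severity(base: str, role: str, test: str) -> str:
    """Shift `base` severity according to the agent's role, clamped to the ladder."""
    shift = ADJUSTMENTS.get((role, test), 0)
    index = SEVERITIES.index(base) + shift
    index = max(0, min(index, len(SEVERITIES) - 1))
    return SEVERITIES[index]


print(adjust_severity("high", "orchestrator", "blast_radius"))
print(adjust_severity("high", "validator", "context_leakage"))
```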

Output modes & formats

Flag              Output
--quiet / -q      Headline verdict only (one line). Ideal for if checks in CI scripts.
(default)         Headline + test results + critical/high findings + SPOFs.
--verbose / -V    Every finding, graph metrics, full health and redundancy tables.

Output formats via --output-format: console, json, markdown, html. The same verbosity setting is configurable in .swarmtest.yml.

Graph export

swarm-test graph my_crew.py --format mermaid
swarm-test graph my_crew.py --format dot --output topology.dot
swarm-test graph my_crew.py --format png --output topology.png   # needs the [png] extra

Mermaid renders inline on GitHub, so you can drop the output straight into a README or PR description. Colors: red = SPOF, orange = moderate redundancy, green = fully redundant.

Historical tracking

Every run writes a small JSON snapshot to .swarmtest-history/. Subsequent runs print a trend line below the headline verdict:

Swarm Score: 72/100 - NEEDS IMPROVEMENT (3 critical findings)
Trend: ↑ +18 from last run (was 54) - improving
Recent: 54 → 61 → 58 → 72
✓ 3 findings resolved since last run
⚠ 1 new finding since last run

Browse with swarm-test history show. Disable per-run with --no-history, or globally via history_enabled: false in .swarmtest.yml. The .swarmtest-history/ directory is gitignored by default; commit it if you want the trend to survive across CI machines.
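The trend and the resolved/new counts boil down to comparing the latest snapshot with the previous one. The sketch below assumes each snapshot carries an integer score and a set of finding IDs; the snapshot schema, the label wording, and both helper names are hypothetical, since the article does not describe the JSON layout.

```python
def trend_line(scores):
    """Summarise the change between the two most recent Swarm Scores."""
    if len(scores) < 2:
        return "Trend: no previous run"
    previous, current = scores[-2], scores[-1]
    delta = current - previous
    if delta > 0:
        label = "improving"
    elif delta < 0:
        label = "regressing"
    else:
        label = "unchanged"
    return f"Trend: {delta:+d} from last run (was {previous}), {label}"


def diff_findings(previous: set, current: set):
    """Return (resolved, new) finding IDs between two runs."""
    return sorted(previous - current), sorted(current - previous)


# Hypothetical history: two scores and two sets of finding IDs.
print(trend_line([40, 55]))
print(diff_findings({"spof:Writer", "leak:Fetcher"}, {"leak:Fetcher", "loop:Coder"}))
```

Keying findings by a stable ID is what makes the set difference meaningful across runs; without it, a reworded message would show up as one resolved and one new finding.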

Configuration (.swarmtest.yml)

fail_on_severity: high        # critical | high | medium | low | info | none
max_blast_radius: 0.5         # 0.0 – 1.0
disabled_tests:
  - collusion
sensitive_patterns:
  - "INTERNAL-[A-Z0-9]+"
output_format: html
output_path: ./swarm.html
timeout_seconds: 30
strict: false                 # treat ANY finding as a failure
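To make the sensitive_patterns entry concrete, here is a minimal sketch of regex-based payload scanning using Python's `re` module. The two "built-in" patterns are assumptions chosen for illustration (the article does not list swarm-test's default set); the third is the custom pattern from the config above.

```python
import re

# Assumed pattern set; only "custom" comes from the config shown above.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "custom": re.compile(r"INTERNAL-[A-Z0-9]+"),
}


def scan_payload(payload: str):
    """Return (pattern_name, matched_text) pairs found in one payload."""
    return [
        (name, match.group(0))
        for name, rx in PATTERNS.items()
        for match in rx.finditer(payload)
    ]


hits = scan_payload("Forward ticket INTERNAL-42A to ops@example.com")
print(hits)
```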

Auto-discovers .swarmtest.yml, .swarmtest.yaml, swarmtest.yml, or a [tool.swarmtest] table in pyproject.toml. CLI flags always override config-file values. Exit codes from run: 0 (passed), 1 (findings exceed thresholds), 2 (config or runtime error).
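The interaction between fail_on_severity, strict, and the exit code can be sketched as follows. It assumes a finding at or above the threshold counts as exceeding it, and that strict fails on any finding as the config comment suggests; `exit_code` is a hypothetical helper, and code 2 (config or runtime error) is left out because it does not depend on findings.

```python
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]


def exit_code(finding_severities, fail_on_severity="high", strict=False):
    """Map findings to the documented codes: 0 passed, 1 threshold exceeded."""
    if strict and finding_severities:
        return 1  # strict mode: any finding is a failure
    if fail_on_severity == "none":
        return 0  # thresholding disabled
    threshold = SEVERITY_ORDER.index(fail_on_severity)
    worst = max((SEVERITY_ORDER.index(s) for s in finding_severities), default=-1)
    return 1 if worst >= threshold else 0


print(exit_code(["medium", "low"]))
print(exit_code(["medium", "high"]))
```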

Plugin system

Ship custom tests as installable Python packages. Register under the swarm_test.plugins entry-point group; swarm-test auto-discovers and runs them alongside the built-in tests:

[project.entry-points."swarm_test.plugins"]
my_custom_test = "my_package.plugins:MyPlugin"

swarm-test plugins list

See examples/plugin_template/ for a runnable starter.
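Entry-point discovery itself is a standard-library feature, so you can check what is registered under the group without running swarm-test. The sketch below uses `importlib.metadata`; `discover_plugins` is a hypothetical helper and says nothing about the plugin base class or interface, which the article does not describe.

```python
from importlib.metadata import entry_points


def discover_plugins(group="swarm_test.plugins"):
    """List (name, target) pairs registered under an entry-point group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable entry-point collection.
        selected = eps.select(group=group)
    else:
        # Python 3.8 / 3.9: a dict mapping group name to entry points.
        selected = eps.get(group, [])
    return [(ep.name, ep.value) for ep in selected]


for name, target in discover_plugins():
    print(f"{name} -> {target}")
```

With the pyproject.toml entry above installed, this would be expected to list my_custom_test; in an environment with no plugins it prints nothing.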

Framework examples (CrewAI, LangGraph, AutoGen, static)

# CrewAI: `crew` is your existing Crew instance
from crewai import Crew
from swarm_test import SwarmProbe
SwarmProbe(crew, swarm_name="my-crew").run_all().print_summary()

# LangGraph: `compiled_graph` is your compiled StateGraph
from langgraph.graph import StateGraph
from swarm_test import SwarmProbe
SwarmProbe(compiled_graph, swarm_name="my-langgraph").run_all().to_json("report.json")

# AutoGen: `manager` is your existing GroupChatManager
from autogen import GroupChatManager
from swarm_test import SwarmProbe
SwarmProbe(manager, swarm_name="my-autogen").run_all().print_summary()

# Static graph: declare agents and interactions by hand
from swarm_test import SwarmProbe, AgentNode, InteractionEvent, EventType
a = AgentNode(name="Fetcher", role="researcher")
b = AgentNode(name="Summarizer", role="writer")
SwarmProbe(
    swarm_name="my-swarm",
    agents=[a, b],
    events=[InteractionEvent(source_agent_id=a.id, target_agent_id=b.id, event_type=EventType.TASK_DELEGATE)],
).run_all().print_summary()

PyPI: https://pypi.org/project/swarm-test/ (pip install swarm-test)

Issues: https://github.com/surajkumar811/swarm-test/issues

License: MIT, free and open source

If swarm-test catches a real bug for you, please star the repo; it helps other teams find it.
