{"slug": "we-scored-14800-mcp-servers-on-behavioral-trust-here-s-what-we-found", "title": "We Scored 14,800+ MCP Servers on Behavioral Trust. Here's What We Found.", "summary": "Dominion Observatory, a system that provides behavioral trust scores for over 14,800 MCP servers by analyzing their runtime behavior rather than static source code. Unlike traditional static analysis, which only checks for code vulnerabilities, this system detects real-world issues such as performance degradation over time, inconsistent tool reliability, and anomalous behavior shifts. The tool functions as an MCP server itself, allowing AI agents to query trust scores in real time and even gate agent-to-agent payment settlements based on a server's operational reputation.", "body_md": "The Model Context Protocol ecosystem is growing fast. Thousands of MCP servers now offer tools that AI agents call autonomously — executing code, querying databases, moving money, managing infrastructure. Agents are making decisions on behalf of humans, and those decisions depend on servers they've never met.\nRecently, a well-circulated analysis scanned roughly 1,800 MCP servers and found security issues in a significant percentage of them. That work was valuable. Static analysis catches real bugs: injection vulnerabilities, missing input validation, insecure defaults.\nBut here's the question nobody asked: what happens after deployment?\nA server can pass every static check and still behave terribly in production — dropping requests, responding with garbage after midnight, degrading quietly over weeks until an agent makes a costly mistake. Static analysis is a snapshot. Production is a film.\nWe built Dominion Observatory to watch the film.\nDominion Observatory provides behavioral trust scores for 14,800+ MCP servers — nearly 8x the coverage of the largest published static analysis. But coverage isn't the point. 
The methodology is.\nInstead of reading source code, Dominion scores servers based on how they actually behave at runtime.\nA trust score isn't a binary pass/fail. It's a continuous signal that reflects a server's operational reputation, built from observed behavior, not assumed intent.\nWhen you shift from \"does this code look safe?\" to \"does this server behave reliably?\", you start seeing patterns that static analysis simply cannot detect.\nDegradation over time. A server that worked perfectly three months ago might now be timing out on 30% of requests. No code changed: maybe the underlying infrastructure shifted, maybe a dependency started throttling, maybe the maintainer moved on. Static analysis sees the same clean code. Behavioral scoring sees the decay.\nInconsistent reliability across tools. A single MCP server might expose five tools where four perform well and one is essentially broken. Behavioral scoring operates at the granularity of individual tool interactions, not just the server as a whole.\nAnomalous behavior shifts. A server that suddenly starts returning responses 10x faster than its historical baseline might sound like good news, or it might mean it's returning cached garbage instead of computing real results. Anomaly detection flags deviations in both directions.\nAvailability patterns. Some servers are rock-solid during US business hours and unreachable at other times. For a global agent economy, that's a reliability concern that only shows up through continuous observation.\nThese aren't theoretical scenarios. They're the kinds of signals that emerge when you instrument trust at the behavioral layer.\nDominion Observatory isn't a dashboard you check once. It's infrastructure that agents query in real time, at the moment of decision.\nThe system is itself an MCP server (available via Streamable HTTP at https://dominion-observatory.sgdata.workers.dev/mcp), which means any MCP-capable agent can call it natively. 
The core tools:\n- `get_trust_score`: Retrieve the behavioral trust score for any MCP server before calling it\n- `detect_anomalies`: Check whether a server is currently exhibiting unusual behavior\n- `get_leaderboard`: See which servers rank highest for reliability in a given category\n- `get_ecosystem_stats`: Understand the overall health of the MCP ecosystem\n- `report_tool_outcome`: Contribute your own interaction data back to the scoring engine\nThe most consequential integration point is the `beforeSettle` hook. In agent-to-agent payment flows, where one agent pays another for a service rendered via MCP, the trust score can gate whether settlement proceeds. If a server's behavioral trust has dropped below a threshold, the payment holds. This turns trust from a nice-to-have metric into an economic primitive.\nThink of it as a credit score for MCP servers. Not based on who they say they are, but on what they've actually done.\nTo be clear: static analysis is important. You should absolutely scan MCP servers for injection flaws, validate their input handling, and audit their permission models. Tools that do this well are doing necessary work.\nBut static analysis answers the question: \"Could this server misbehave?\"\nBehavioral scoring answers the question: \"Is this server misbehaving?\"\nThe first is a security audit. The second is an operational reputation system. A mature MCP ecosystem needs both, just as the traditional web needs both code review and uptime monitoring.\nThe difference becomes critical as the agent economy scales. When thousands of agents are autonomously selecting which MCP servers to call, making payments, and chaining tool calls across multiple servers, you need trust signals that operate at runtime speed and reflect current reality. You can't re-audit source code on every request. You can query a behavioral trust score in milliseconds.\nWe're at an inflection point. 
MCP adoption is accelerating, and the servers agents depend on are increasingly operated by unknown third parties. The agent economy will either develop robust trust infrastructure, or it will learn expensive lessons about what happens when autonomous systems make decisions without accountability.\nDominion Observatory is our contribution to the first outcome. It's open source, it's composable, and it's designed to be infrastructure that other systems build on, not a walled garden.\nQuery it directly. Point any MCP client at https://dominion-observatory.sgdata.workers.dev/mcp using Streamable HTTP transport. Call `get_ecosystem_stats` to see the current state of the ecosystem, or `get_trust_score` for any server you're curious about.\nContribute data. The scoring engine gets better with more interaction data. Use `report_tool_outcome` to feed back your own observations about MCP server behavior. More data means more accurate trust signals for everyone.\nStar the repo. The engine is open source at github.com/vdineshk/daee-engine. Issues, PRs, and ideas are welcome.\nBuild on it. If you're building agent infrastructure (orchestration frameworks, payment rails, marketplace platforms), behavioral trust scoring is a building block. Integrate it. Extend it. Make agents smarter about who they trust.\nThe MCP ecosystem is too important to fly blind. 
Let's build the accountability layer together.", "url": "https://wpnews.pro/news/we-scored-14800-mcp-servers-on-behavioral-trust-here-s-what-we-found", "canonical_source": "https://dev.to/dinesh_kumar_576bd94722fd/we-scored-14800-mcp-servers-on-behavioral-trust-heres-what-we-found-o9k", "published_at": "2026-05-20 05:48:02+00:00", "updated_at": "2026-05-20 06:02:19.988892+00:00", "lang": "en", "topics": ["artificial-intelligence", "cybersecurity", "developer-tools", "research", "data"], "entities": ["Model Context Protocol", "Dominion Observatory", "MCP"], "alternates": {"html": "https://wpnews.pro/news/we-scored-14800-mcp-servers-on-behavioral-trust-here-s-what-we-found", "markdown": "https://wpnews.pro/news/we-scored-14800-mcp-servers-on-behavioral-trust-here-s-what-we-found.md", "text": "https://wpnews.pro/news/we-scored-14800-mcp-servers-on-behavioral-trust-here-s-what-we-found.txt", "jsonld": "https://wpnews.pro/news/we-scored-14800-mcp-servers-on-behavioral-trust-here-s-what-we-found.jsonld"}}