for AI agents.
ECP is a vendor-neutral protocol for testing agent outputs, tool calls, and evaluator-visible audit context — across frameworks, models, eval platforms, and CI systems.
$ pip install "ecp-runtime==0.3.3" "ecp-sdk==0.3.3"
$ ecp init
$ ecp run --manifest ecp_eval/manifest.yaml --json
MCP is for tools. ECP is for evals. #
MCP gives agents a common way to use tools. ECP gives evaluators a common way to inspect what an agent returned, what tools it used, and what audit evidence it exposed — independent of the framework that built the agent or the platform that runs the test.
Beyond the final answer. #
Most evals start with the final answer. ECP also checks the behavior behind it.
public_output
tool_calls
evaluation_context
ecp run --manifest
Runs anywhere
Run evals locally or wire ecp run into your CI. Exits non-zero on failure, so a regression breaks the build.
Framework neutral
Wrap agents built with plain Python, LangChain, LlamaIndex, CrewAI, or PydanticAI behind one evaluation contract.
JSON-RPC contract
Implement the protocol in any language: agent/initialize, agent/step, agent/reset over stdio or Streamable HTTP.
Your existing agent stack. #
Start grading agents in five commands. #
Install the runtime, initialize a starter manifest, and run your first eval.