{"slug": "your-ai-agent-is-a-stack-of-files-muster-1-0-0-tests-all-of-them", "title": "Your AI agent is a stack of files. muster 1.0.0 tests all of them.", "summary": "Muster 1.0.0, a new CLI tool from developer Garrison, tests AI agent stacks across seven layers including persona, skills, SOP, tools, memory, heartbeat, and A2A. It performs both static validation and behavioral grading against any OpenAI-compatible model, ensuring files are not only well-formed but also followed by the model in practice. The tool is available on npm and supports local models like Ollama and NVIDIA NIM.", "body_md": "Look at what defines an AI agent now. It is not one file anymore.\n\nThere is a persona file that sets the voice and the safety posture. A skills directory that says what the agent can do and when to reach for it. An `AGENTS.md` that spells out the standard operating procedure. A tools manifest listing the functions it may call. A memory file holding what it should remember about you. A heartbeat checklist for its scheduled work. An agent card that advertises it to other agents. Each of these has its own emerging spec, and each one is a place the agent can quietly go wrong.\n\nHere is the part that kept bothering me: a file that parses is not the same as a file the model follows. You can have a perfectly valid persona spec and a model that ignores half of it under pressure. You can write a rule in your SOP and watch a crafted message talk the model out of it. Validation tells you the file is well-formed. It says nothing about behavior.\n\nmuster is my attempt to test both. Version 1.0.0 is out on npm today.\n\nmuster checks seven layers of that file stack, plus how the layers compose. For each layer it does two things.\n\nThe static check parses the file and validates it against its spec. This runs offline and is byte-for-byte reproducible, so you can drop it into CI as a hard gate and trust the result. 
No network, no flakiness, same bytes every time (RFC 8785 canonical JSON under the hood).\n\nThe behavioral check grades a live model against what the file declares. It runs real multi-turn conversations against any OpenAI-compatible endpoint and scores the transcripts. For a persona that means verbosity, refusals, and state shifts. For an SOP it means compliance probes and adversarial ones. For memory it means recall and privacy leaks. Behavioral grading is probabilistic, so muster runs each case several times and takes a k-of-n majority rather than trusting a single roll.\n\nThe layers, with the command for each:\n\n| Layer | File | Command |\n|---|---|---|\n| Persona | `Soul.md` | `check`, `resolve`, `cts run`, `behave run` |\n| Skills | `SKILL.md` | `skills run` |\n| SOP | `AGENTS.md` | `sop run` |\n| Tools | `TOOLS.md` | `tools run` |\n| Memory | `MEMORY.md` / `USER.md` | `memory run` |\n| Heartbeat | `HEARTBEAT.md` | `heartbeat run` |\n| A2A | Agent Card | `a2a run` |\n| Cross-layer | all of the above | `crosslayer run` |\n\nYou bring your own model. Local Ollama, NVIDIA NIM, OpenAI, anything that speaks the OpenAI chat API. There is no provider baked in, and the API key is read from an environment variable at request time. It never goes in a flag, a manifest, or a file on disk. A test in the repo fails the build if a secret-shaped string is ever committed, which is the kind of guard rail I wish more projects had.\n\n```\nnpm install -g @garrison-hq/muster\n\n# every command ships with a runnable example\nmuster check examples/soul/Soul.md --json\nmuster skills run examples/skills/manifest.yaml\nmuster a2a run examples/a2a/manifest.json\n```\n\nThe static commands need nothing but Node 22. To grade a model, point a layer at an endpoint and set `MUSTER_API_KEY`.\n\nmuster started as one thing: the reference conformance harness for Soul.md RFC-1, a persona format. 
The interesting accident was that the engine underneath did not care about personas at all. Parse, validate, resolve, grade, report. The spec was a plugin. Once that was clear, six more layers followed on the same core, and a 1.0.0 that was supposed to be a single-format tool turned into a test suite for the whole stack.\n\nThe other thing worth admitting: most of this was built by AI agents working through a spec-driven process, and the entire trail is in the repository. Every layer has a specification, a plan, work-package tasks, and a post-merge review, all under `kitty-specs/`. I left it in on purpose. If you want to see how the thing was actually made, it is right there next to the code.\n\nIt is a CLI. There is no stable library API yet, so if you want to write a new adapter you do it inside the repo for now. Behavioral grading is only as good as your endpoint and your thresholds, and it will never be deterministic the way the static checks are. And the seven layers track specs that are themselves young, so expect them to move.\n\nThat is the honest shape of it. If you are building agents from files and you have no way to test those files, muster is for you. 
The code is Apache-2.0 on [GitHub](https://github.com/garrison-hq/muster), the docs are at [garrison-hq.github.io/muster](https://garrison-hq.github.io/muster), and I would genuinely like to know which layer you reach for first.", "url": "https://wpnews.pro/news/your-ai-agent-is-a-stack-of-files-muster-1-0-0-tests-all-of-them", "canonical_source": "https://dev.to/garrison-hq/your-ai-agent-is-a-stack-of-files-muster-100-tests-all-of-them-44pd", "published_at": "2026-06-16 18:43:51+00:00", "updated_at": "2026-06-16 19:17:38.371066+00:00", "lang": "en", "topics": ["ai-agents", "developer-tools", "large-language-models", "ai-safety", "mlops"], "entities": ["Garrison", "Muster", "Ollama", "NVIDIA NIM", "OpenAI", "npm", "Soul.md", "RFC-1"], "alternates": {"html": "https://wpnews.pro/news/your-ai-agent-is-a-stack-of-files-muster-1-0-0-tests-all-of-them", "markdown": "https://wpnews.pro/news/your-ai-agent-is-a-stack-of-files-muster-1-0-0-tests-all-of-them.md", "text": "https://wpnews.pro/news/your-ai-agent-is-a-stack-of-files-muster-1-0-0-tests-all-of-them.txt", "jsonld": "https://wpnews.pro/news/your-ai-agent-is-a-stack-of-files-muster-1-0-0-tests-all-of-them.jsonld"}}