{"slug": "the-contract-is-the-interface-agent-driven-steampipe-stave-in-one-command", "title": "The contract is the interface: agent-driven Steampipe Stave in one command", "summary": "The article describes Stave, a cloud-security tool that replaces the traditional collector-based onboarding model with a \"contract\" approach. Instead of shipping a collector that must be configured for each customer's environment, Stave provides JSON schemas and Steampipe-to-Stave column mappings that allow any agent to ingest data by satisfying the contract. The system uses a declarative YAML format with four operation types (field, static, extract, computed) to transform data, and ships 3,957 controls across 109 asset types with auto-generated JSON schemas.", "body_md": "Consider a typical cloud-security tool's onboarding flow. A customer installs the tool. The tool's collector tries to authenticate to AWS, fails because the role isn't there yet, the customer follows three pages of setup docs, the role gets created, the collector authenticates, the collector runs, the collector finds nothing because the tool only knows about S3 and IAM and the customer's workload is on EKS. End of week one.\n\nWe don't ship a collector. Stave evaluates `obs.v0.1` JSON snapshots, no matter what produces them. That decision sounds extreme until you've watched the same \"the collector doesn't see our environment\" conversation play out three times. So instead of a collector, Stave ships a contract: per-asset JSON Schemas, per-asset Steampipe→Stave column mappings, and one command (`stave contract show`) that emits everything an agent needs to author its own ingest.
The customer's preferred source (Steampipe, AWS Config, Terraform state, an internal inventory API) plugs in by satisfying the contract.\n\nThis post walks through the steps that close the pipeline.\n\n```\n$ stave contract show --asset-type aws_s3_bucket\nContract: aws_s3_bucket\nSchema: schemas/observation/v1/asset-types/aws_s3_bucket.schema.json\nControls: 102 | Chains: 15\nProperty paths (catalog reads these — sorted by chain unlock, then control unlock):\nPATH                                          CONTROLS  CHAINS  SEVERITY  NOTE\n────                                          ────────  ──────  ────────  ────\nstorage.kind                                  91        15      critical\nstorage.tags.data-classification              14        2       critical  intent\nstorage.access.public_read                    8         2       critical\nstorage.controls.public_access_fully_blocked  3         1       critical\n...\nSteampipe mapping: contracts/steampipe/aws_s3_bucket.yaml\n```\n\nThat output names everything the customer's ingest agent needs: the schema to conform to, the property paths the catalog reads (ranked by how many chains and controls each one unlocks), and the Steampipe mapping to start from.\n\nFor the 17 most catalog-impactful asset types, the mapping is committed. For the rest, the customer's agent has the schema and can author its own mapping.\n\nThe Steampipe→Stave mapping is one ordered list of operations per asset type. Four operation kinds cover every transform shape:\n\n- `field`: direct column → property mapping, with optional coerce/default\n- `static`: a fixed value (e.g. `properties.storage.kind: bucket`)\n- `extract`: pull a nested JSON value out of a JSON-shaped column\n- `computed`: derive a value from already-set property paths (`all` / `any` reduction)\n\nOperations run in YAML order; later ops can read paths written by earlier ones. The first mapping we wrote, `contracts/steampipe/aws_s3_bucket.yaml`, replaced a Python function with a declarative file.\n
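To make the ordering semantics concrete, here is a minimal Python sketch of an interpreter for the four operation kinds. It is not Stave's reference loader (`examples/agents/stave_transform.py`). It assumes the following:

- each Steampipe row arrives as a dict with JSON columns already decoded;
- `key_variants` maps a canonical key to an alternate spelling to try next;
- a missing `computed` input counts as false.

Type coercion and other options are omitted, and all function names are illustrative.

```python
from __future__ import annotations

import copy
from typing import Any

# Hypothetical sketch of a mapping interpreter; not Stave's reference loader.


def set_path(obj: dict, path: str, value: Any) -> None:
    # Write value at a dotted path, creating intermediate dicts as needed.
    *parents, leaf = path.split('.')
    for key in parents:
        obj = obj.setdefault(key, {})
    obj[leaf] = value


def get_path(obj: Any, path: str, key_variants: dict | None = None) -> Any:
    # Read a dotted path; numeric segments index into lists.
    # key_variants maps a canonical key to an alternate spelling to try next.
    variants = key_variants or {}
    for key in path.split('.'):
        if isinstance(obj, list) and key.isdigit():
            index = int(key)
            obj = obj[index] if index < len(obj) else None
        elif isinstance(obj, dict):
            obj = obj[key] if key in obj else obj.get(variants.get(key, key))
        else:
            return None
        if obj is None:
            return None
    return obj


def apply_operations(row: dict, operations: list[dict]) -> dict:
    # Apply operations in list (YAML) order so later ops can read earlier output.
    observation: dict = {}
    for op in operations:
        kind = op['kind']
        if kind == 'static':
            value = copy.deepcopy(op['value'])
        elif kind == 'field':
            value = row.get(op['column'])
            if value is None:
                value = copy.deepcopy(op.get('default'))
        elif kind == 'extract':
            value = get_path(row.get(op['column']), op['json_path'],
                             op.get('key_variants'))
            if value is None:
                value = copy.deepcopy(op.get('default'))
        elif kind == 'computed':
            # Missing inputs count as False, which is conservative for 'all'.
            inputs = [bool(get_path(observation, p)) for p in op['inputs']]
            reduction = op['op']
            if reduction == 'all':
                value = all(inputs)
            elif reduction == 'any':
                value = any(inputs)
            else:
                raise ValueError(f'unsupported reduction: {reduction}')
        else:
            raise ValueError(f'unknown operation kind: {kind}')
        set_path(observation, op['path'], value)
    return observation
```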
\nThe loader changes are 100 lines; the resulting observation is byte-identical to what the imperative function produced.\n\n```yaml\noperations:\n  - kind: static\n    path: properties.storage.kind\n    value: bucket\n  - kind: field\n    path: properties.storage.tags\n    column: tags\n    default: {}\n    type: dict\n  - kind: extract\n    path: properties.storage.encryption.algorithm\n    column: server_side_encryption_configuration\n    json_path: \"Rules.0.ApplyServerSideEncryptionByDefault.SSEAlgorithm\"\n    key_variants:\n      Rules: rules\n      SSEAlgorithm: sse_algorithm\n    default: \"none\"\n  - kind: computed\n    path: properties.storage.controls.public_access_fully_blocked\n    op: all\n    inputs:\n      - properties.storage.controls.public_access_block.block_public_acls\n      - properties.storage.controls.public_access_block.block_public_policy\n      - properties.storage.controls.public_access_block.ignore_public_acls\n      - properties.storage.controls.public_access_block.restrict_public_buckets\n```\n\nThe format is the contract. Any agent in any language can parse the YAML and produce conforming observations.\n\nThe catalog ships 3,957 controls; together they declare `applicable_asset_types` for 109 distinct asset types. To validate that a mapping's target paths are real, we needed a JSON Schema per asset type. Hand-authoring 109 schemas is a Tuesday lost. The schema generator already existed (it walks every control's predicate AST and infers the property paths and types), but it defaulted to the three most-used types, so we ran it with a higher limit:\n\n```sh\ngo run ./internal/tools/genassetschemas/... -top 200\nmake sync-schemas\n```\n\nOutput: 109 per-asset schemas under `schemas/observation/v1/asset-types/`. Every level is `additionalProperties: true`: the schemas are discoverability artifacts, not restrictive gates.\n
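Because every level is permissive, checking a mapping against a per-asset schema works as a lint rather than a gate. The useful question is whether each target path is one the schema declares. Here is a minimal sketch under these assumptions:

- the schema is loaded as a dict and nests objects via the standard JSON Schema `properties` keyword;
- the schema root describes the whole asset, so the path `properties.storage.kind` sits under a property literally named `properties`.

It ignores `$ref` and composition keywords such as `allOf`, and the function names are hypothetical.

```python
from __future__ import annotations

# Hypothetical helpers for linting mapping target paths against a per-asset schema.


def declared_paths(schema: dict, prefix: str = '') -> set[str]:
    # Collect every dotted path declared through nested 'properties' keywords.
    paths: set[str] = set()
    for name, subschema in schema.get('properties', {}).items():
        path = f'{prefix}.{name}' if prefix else name
        paths.add(path)
        if isinstance(subschema, dict):
            paths |= declared_paths(subschema, path)
    return paths


def undeclared_targets(schema: dict, mapping: dict) -> list[str]:
    # Target paths the mapping writes that the schema never declares.
    # additionalProperties: true means these would still validate, so treat
    # them as likely typos or stale paths to review, not as hard errors.
    known = declared_paths(schema)
    targets = [op['path'] for op in mapping.get('operations', [])]
    return [path for path in targets if path not in known]
```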
\nA schema that lists one property (`security_hub.enabled` on `aws_securityhub_account`, for example) tells an agent \"this asset type matters to the catalog; here is the one property to populate.\" Thin schemas are still useful.\n\nThe next 10 asset types by control coverage (`aws_iam_role`, `aws_lambda_function`, `aws_cognito_user_pool`, `aws_cloudtrail_trail`, `aws_kms_key`, `aws_ec2_instance`, `aws_sqs_queue`, `aws_iam_user`, `aws_opensearch_domain`, `aws_stepfunctions_state_machine`) got hand-authored mappings. They served two purposes: actual coverage for the most-asked-for types, and a ground-truth corpus to validate Iter 5's auto-generator against.\n\nEvery mapping carries a `derived_properties:` block listing the catalog-read properties that cannot come from a single Steampipe column. Example from `aws_iam_role.yaml`:\n\n```yaml\nderived_properties:\n  - path: properties.identity.role.cross_account_trust_without_external_id\n    source: \"Parse trust_policy — detect external Account in Principal without sts:ExternalId condition\"\n  - path: properties.identity.permission_categories.has_incompatible_categories\n    source: Policy analysis against controldata/taxonomy/permission_categories.yaml\n  - path: properties.identity.access_advisor.available\n    source: iam:GenerateServiceLastAccessedDetails + iam:GetServiceLastAccessedDetails (separate API call per role)\n```\n\nThat block is the agent's TODO list. The failure mode it prevents is an observation silently produced without those derived properties: Stave's controls don't see the property, the catalog finds nothing wrong, and the breach happens anyway.\n\nThe three sources (schema, predicate index, mapping file) already existed, but joining them required three separate file reads.
The new command joins them once:\n\n```\n$ stave contract show --asset-type aws_iam_role --format json\n{\n  \"asset_type\": \"aws_iam_role\",\n  \"has_schema\": true,\n  \"schema_path\": \"schemas/observation/v1/asset-types/aws_iam_role.schema.json\",\n  \"controls_count\": 198,\n  \"chains_count\": 38,\n  \"property_paths\": [\n    {\n      \"path\": \"properties.identity.kind\",\n      \"controls_count\": 196,\n      \"chains_count\": 35,\n      \"max_severity\": \"critical\",\n      \"is_intent_property\": false\n    },\n    ...\n  ],\n  \"steampipe_mapping\": \"contracts/steampipe/aws_iam_role.yaml\"\n}\n```\n\nOr:\n\n```\n$ stave contract show --list\nAsset types with controls: 109 (schema: 109, steampipe mapping: 17)\nTYPE                  SCHEMA  CONTROLS  CHAINS  MAPPING\n────                  ──────  ────────  ──────  ───────\naws_iam_role          yes     198       38      steampipe\naws_s3_bucket         yes     102       15      steampipe\naws_lambda_function   yes     169       12      steampipe\naws_bedrock_agent     yes     24        5       -\n...\n```\n\nThe implementation reuses everything already in the codebase: `compose.LoadControlsFrom`, `compose.LoadChainDefinitions`, `predindex.Build` (the same index the `stave gaps` command uses), and a 50-line helper in `internal/contracts/schema/load.go` to access the embedded per-asset schemas. The command is ~330 lines, and none of it is new data: it is a projection over existing data.\n\nThe remaining ~98 asset types could be hand-authored or auto-generated. We tried auto.\n
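At its core, this kind of auto-generation is name similarity between Steampipe column names and schema property paths. The sketch below is a hypothetical, much-simplified illustration of that idea. It is not `scripts/gen-steampipe-mappings.py`, whose rule priority is more involved. It does two things:

- it scores candidate paths by shared name tokens, preferring paths with fewer unmatched tokens on ties;
- it measures accuracy as the fraction of ground-truth `(column, path)` tuples the generator reproduced.

The tokenizer, scoring, and metric are all assumptions.

```python
from __future__ import annotations

import re

# Hypothetical name-similarity matcher; a simplified stand-in, not the project's generator.


def tokens(name: str) -> set[str]:
    # Lower-case and split on dots, underscores, and hyphens.
    return {t for t in re.split('[._-]+', name.lower()) if t}


def best_path(column: str, candidate_paths: list[str]) -> str | None:
    # Pick the path sharing the most tokens with the column name; on ties,
    # prefer the path with fewer unmatched tokens, then the first in sorted order.
    col = tokens(column)
    best, best_key = None, None
    for path in sorted(candidate_paths):
        path_tokens = tokens(path)
        overlap = len(col & path_tokens)
        if overlap == 0:
            continue
        key = (overlap, -len(path_tokens - col))
        if best_key is None or key > best_key:
            best, best_key = path, key
    return best


def accuracy(generated: set[tuple[str, str]], truth: set[tuple[str, str]]) -> float:
    # Fraction of ground-truth (column, path) tuples the generator reproduced.
    return len(generated & truth) / len(truth) if truth else 0.0
```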
\nThe generator joins the cached Steampipe column catalog with each per-asset schema's property paths. It applies a four-rule matching priority (per-asset overrides, schema-path lookup with multi-token scoring, tags convention, fallback to `properties.<ns>.<col>`) and emits a YAML file in the same operations-list format Iter 1 established.\n\n```sh\nmake gen-steampipe-mappings            # generate, skip existing\nmake gen-steampipe-mappings-validate   # measure accuracy\n```\n\nValidation runs the generator against the 11 hand-authored YAMLs (Iter 1 + Iter 3) and compares the auto-generated `(column, path)` tuples against the ground truth:\n\n```\nOverall: 149/177 = 84% accuracy across 17 type(s)\n```\n\n84%, past the 80% target. The remaining 16% are the multi-target JSON-path extracts the brief flagged as inherently manual: mapping one column to two property paths is not something a name-similarity heuristic can synthesise. Auto-generated YAMLs carry `_auto_generated: true`, `_review_required: N`, and `_unmatched_paths: [...]`, so the reviewer's surface is bounded.\n\nThe detailed story of the heuristic, and how it went from 8% accuracy on the first pass to 84% on the fourth, is its own post. The point here is what's committed: 17 total mappings (11 hand-authored, 6 auto-generated), every one of them an artifact a customer's agent can read in any language.\n\nThe architecture choice that makes this work: extractors are client-owned. Stave does not ship a collector. The `contracts/steampipe/` directory contains instructions, not code. An agent reads the schema and the mapping; the agent produces the observation; Stave evaluates the observation. The collector boundary is a file, not a process.\n\nThis decision has been in our architecture docs since the project started, but until now no single command surfaced the contract to an agent. An agent that wanted to author a Steampipe ingest for a new asset type had to read the schema, the predicate index, and the mapping file separately and join them itself. Now the agent runs one command and gets all three.\n
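On the agent side, the JSON form of the contract output is enough to build a to-do list. Here is a minimal sketch under these assumptions:

- the `stave contract show --format json` output has been parsed into a dict with the field names shown in the `aws_iam_role` example;
- the mapping YAML has been parsed into a dict with `operations` and `derived_properties` lists.

The sketch returns the catalog-read paths that no operation writes, highest impact first (chains, then controls). It splits them into silent gaps and paths the mapping already flags as derived. The function name `ingest_todo` is hypothetical.

```python
from __future__ import annotations


def ingest_todo(contract: dict, mapping: dict) -> tuple[list[str], list[str]]:
    # Hypothetical helper: which catalog-read paths does this mapping not write yet?
    written = {op['path'] for op in mapping.get('operations', [])}
    derived = {d['path'] for d in mapping.get('derived_properties', [])}
    # Highest impact first: chains unlocked, then controls unlocked, then path name.
    ranked = sorted(
        contract.get('property_paths', []),
        key=lambda p: (-p['chains_count'], -p['controls_count'], p['path']),
    )
    unmapped: list[str] = []
    needs_logic: list[str] = []
    for entry in ranked:
        path = entry['path']
        if path in written:
            continue
        if path in derived:
            needs_logic.append(path)  # flagged: needs parsing or an extra API call
        else:
            unmapped.append(path)  # silent gap: nothing writes this path
    return unmapped, needs_logic
```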
\nThe agent then runs `make gen-steampipe-mappings` and gets a starting-point YAML it can refine, so integrating a new source takes far less work.\n\nNothing in the Stave Go binary changed across the five iterations except the new `cmd/contract/` directory (one file, ~330 LOC). The agent infrastructure is:\n\n- `examples/agents/stave_transform.py`: reference loader (Python)\n- `contracts/steampipe/*.yaml`: 17 mappings (committed)\n- `scripts/gen-steampipe-mappings.py`: auto-generator (Python, ~280 LOC)\n- `scripts/steampipe-columns.json`: cached column catalog (refreshable from a live Steampipe install)\n\nThe deterministic policy engine is unchanged. The contract evolves; the engine doesn't.\n\nReplace Steampipe with any external data source (AWS Config, Terraform state, your internal inventory, Salesforce, OpenAPI specs) and the pipeline shape is the same:\n\n1. Define the canonical target contract. For Stave it's `obs.v0.1` JSON with per-asset-type sub-schemas. For your tool, it's whatever shape your engine reads.\n2. Author one mapping per source per asset type. YAML is fine. An operations list with field/static/extract/computed semantics covers most transform shapes.\n3. Ship a discovery command: one CLI that joins the schema, the path list, and the mapping into a single agent-readable output. The agent stops needing your team's docs.\n4. Auto-generate the boring half. Most column→path mappings come down to name similarity, and the exceptions are rare enough to hand-author. Use the hand-authored set as a ground-truth corpus to measure your generator's accuracy.\n5. Mark uncertainty explicitly: `_review_required`, `_unmatched_paths`, `derived_properties:`. Silent gaps are worse than loud ones.\n\nFive points, one functioning pipeline.
The customer who needed three pages of collector setup now needs `make gen-steampipe-mappings` and an agent that can read a YAML file.", "url": "https://wpnews.pro/news/the-contract-is-the-interface-agent-driven-steampipe-stave-in-one-command", "canonical_source": "https://dev.to/bala_paranj_059d338e44e7e/the-contract-is-the-interface-agent-driven-steampipe-stave-in-one-command-17lj", "published_at": "2026-05-23 11:15:39+00:00", "updated_at": "2026-05-23 11:33:11.670002+00:00", "lang": "en", "topics": ["cloud-computing", "cybersecurity", "developer-tools", "open-source", "enterprise-software"], "entities": ["Stave", "Steampipe", "AWS", "S3", "IAM", "EKS", "AWS Config", "Terraform"], "alternates": {"html": "https://wpnews.pro/news/the-contract-is-the-interface-agent-driven-steampipe-stave-in-one-command", "markdown": "https://wpnews.pro/news/the-contract-is-the-interface-agent-driven-steampipe-stave-in-one-command.md", "text": "https://wpnews.pro/news/the-contract-is-the-interface-agent-driven-steampipe-stave-in-one-command.txt", "jsonld": "https://wpnews.pro/news/the-contract-is-the-interface-agent-driven-steampipe-stave-in-one-command.jsonld"}}