# The contract is the interface: agent-driven Steampipe→Stave in one command

The article describes Stave, a cloud-security tool that replaces the traditional collector-based onboarding model with a "contract" approach. Instead of shipping a collector that must be configured for each customer's environment, Stave provides JSON schemas and Steampipe-to-Stave column mappings that allow any agent to ingest data by satisfying the contract. The system uses a declarative YAML format with four operation types (field, static, extract, computed) to transform data, and ships 3,957 controls across 109 asset types with auto-generated JSON schemas.

Consider a typical cloud-security tool's onboarding flow. A customer installs the tool. The tool's collector tries to authenticate to AWS and fails because the role isn't there yet. The customer follows three pages of setup docs, the role gets created, the collector authenticates, the collector runs, and the collector finds nothing, because the tool only knows about S3 and IAM and the customer's workload is on EKS. End of week one.

We don't ship a collector. [Stave](https://github.com/sufield/stave) evaluates `obs.v0.1` JSON snapshots, whatever produces them. That decision sounds extreme until you've watched the same "the collector doesn't see our environment" conversation play out three times.

So instead of a collector, Stave ships a **contract**: per-asset JSON Schemas, per-asset Steampipe→Stave column mappings, and one command, `stave contract show`, that emits everything an agent needs to author its own ingest. The customer's preferred source (Steampipe, AWS Config, Terraform state, an internal inventory API) plugs in by satisfying the contract.

This post walks through the steps that close the pipeline.

## What the customer sees

```bash
$ stave contract show --asset-type aws_s3_bucket
Contract: aws_s3_bucket
  Schema:   schemas/observation/v1/asset-types/aws_s3_bucket.schema.json
  Controls: 102 | Chains: 15

Property paths (catalog reads these — sorted by chain unlock, then control unlock):
  PATH                                          CONTROLS  CHAINS  SEVERITY  NOTE
  ────                                          ────────  ──────  ────────  ────
  storage.kind                                  91        15      critical
  storage.tags.data-classification              14        2       critical  intent
  storage.access.public_read                    8         2       critical
  storage.controls.public_access_fully_blocked  3         1       critical
  ...

Steampipe mapping: contracts/steampipe/aws_s3_bucket.yaml
```

That output names everything the customer's ingest agent needs:

- The schema: the JSON Schema the agent's output must satisfy
- The property paths: what fields the catalog actually reads on this asset type, ranked by how many controls and chains they unlock
- The mapping: a ready-to-run YAML file telling the agent which Steampipe column maps to which Stave property path

For the 17 most catalog-impactful asset types, the mapping is committed. For the rest, the customer's agent has the schema; it can author its own.

## The YAML mapping format

The Steampipe→Stave mapping is one ordered list of operations per asset type. Four operation kinds cover every transform shape:

- `field`: direct column → property mapping with optional coerce/default
- `static`: a fixed value (e.g. `properties.storage.kind: bucket`)
- `extract`: pull a nested JSON value from a JSON-shaped column
- `computed`: derive from already-set property paths (`all` / `any` reduction)

Operations run in YAML order; later ops can read paths written by earlier ones.

The first mapping we wrote, `contracts/steampipe/aws_s3_bucket.yaml`, replaced a Python function with a declarative file. The loader changes are 100 lines; the resulting observation is byte-identical to what the imperative function produced.

```yaml
operations:
  - kind: static
    path: properties.storage.kind
    value: bucket

  - kind: field
    path: properties.storage.tags
    column: tags
    default: {}
    type: dict

  - kind: extract
    path: properties.storage.encryption.algorithm
    column: server_side_encryption_configuration
    json_path: "Rules.0.ApplyServerSideEncryptionByDefault.SSEAlgorithm"
    key_variants:
      Rules: rules
      SSEAlgorithm: sse_algorithm
    default: "none"

  - kind: computed
    path: properties.storage.controls.public_access_fully_blocked
    op: all
    inputs:
      - properties.storage.controls.public_access_block.block_public_acls
      - properties.storage.controls.public_access_block.block_public_policy
      - properties.storage.controls.public_access_block.ignore_public_acls
      - properties.storage.controls.public_access_block.restrict_public_buckets
```

The format is the contract. Any agent in any language can parse the YAML and produce conforming observations.

## Per-asset JSON Schemas

The catalog ships 3,957 controls; together they declare 109 distinct applicable asset types. To validate that a mapping's target paths are real, we needed a JSON Schema per asset type. Hand-authoring 109 schemas is a Tuesday lost; the schema generator already existed (it walks every control's predicate AST and infers the property paths and types), but defaulted to the top-3 most-used types.

```bash
go run ./internal/tools/genassetschemas/... -top 200
make sync-schemas
```

Output: 109 per-asset schemas under `schemas/observation/v1/asset-types/`. Every level is `additionalProperties: true`: the schemas are *discoverability artifacts*, not restrictive gates. A schema that lists one property (`security_hub.enabled` on `aws_securityhub_account`, for example) tells an agent "this asset type matters to the catalog; here is the one property to populate." Thin schemas are still useful.
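
To make the mapping format and the schema check concrete, here is a minimal sketch of how an agent might apply the four operation kinds to one Steampipe row and then confirm that every target path is declared in the per-asset schema. It is not Stave's loader. It assumes the YAML has already been parsed into Python dicts, that `default` applies when a column is null or absent, that `key_variants` lists alternate spellings of a `json_path` segment, and that the generated schemas nest paths under standard JSON Schema `properties` keywords. The function names (`apply_mapping`, `path_in_schema`, and the helpers), the trimmed two-input `computed` op, and the sample row are hypothetical.

```python
from typing import Any


def get_path(doc: Any, dotted: str, default: Any = None) -> Any:
    """Read a dotted path such as 'properties.storage.kind' from nested dicts."""
    for key in dotted.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return default
        doc = doc[key]
    return doc


def set_path(doc: dict, dotted: str, value: Any) -> None:
    """Write a dotted path, creating intermediate dicts as needed."""
    *parents, leaf = dotted.split(".")
    for key in parents:
        doc = doc.setdefault(key, {})
    doc[leaf] = value


def extract(node: Any, json_path: str, variants: dict, default: Any) -> Any:
    """Walk a nested JSON value; numeric segments index into lists."""
    for seg in json_path.split("."):
        if isinstance(node, list) and seg.isdigit() and int(seg) < len(node):
            node = node[int(seg)]
        elif isinstance(node, dict) and seg in node:
            node = node[seg]
        elif isinstance(node, dict) and variants.get(seg) in node:
            node = node[variants[seg]]  # assumed: alternate key spelling
        else:
            return default
    return node


def apply_mapping(operations: list, row: dict) -> dict:
    """Apply operations in order; later ops can read paths written earlier."""
    obs: dict = {}
    for op in operations:
        kind = op["kind"]
        if kind == "static":
            value = op["value"]
        elif kind == "field":
            value = row.get(op["column"])
            if value is None:  # assumed: default covers null and missing
                value = op.get("default")
        elif kind == "extract":
            value = extract(row.get(op["column"]), op["json_path"],
                            op.get("key_variants", {}), op.get("default"))
        elif kind == "computed":
            reducer = all if op["op"] == "all" else any
            value = reducer(bool(get_path(obs, p, False)) for p in op["inputs"])
        else:
            raise ValueError(f"unknown operation kind: {kind}")
        set_path(obs, op["path"], value)
    return obs


def path_in_schema(schema: dict, dotted: str) -> bool:
    """True if each path segment is declared under a nested 'properties' keyword."""
    node = schema
    for key in dotted.split("."):
        node = node.get("properties", {}).get(key)
        if not isinstance(node, dict):
            return False
    return True


PAB = "properties.storage.controls.public_access_block"
OPERATIONS = [
    {"kind": "static", "path": "properties.storage.kind", "value": "bucket"},
    {"kind": "field", "path": "properties.storage.tags", "column": "tags", "default": {}},
    {"kind": "extract", "path": "properties.storage.encryption.algorithm",
     "column": "server_side_encryption_configuration",
     "json_path": "Rules.0.ApplyServerSideEncryptionByDefault.SSEAlgorithm",
     "key_variants": {"Rules": "rules", "SSEAlgorithm": "sse_algorithm"},
     "default": "none"},
    {"kind": "field", "path": f"{PAB}.block_public_acls", "column": "block_public_acls"},
    {"kind": "field", "path": f"{PAB}.block_public_policy", "column": "block_public_policy"},
    {"kind": "computed", "path": "properties.storage.controls.public_access_fully_blocked",
     "op": "all", "inputs": [f"{PAB}.block_public_acls", f"{PAB}.block_public_policy"]},
]

# Hypothetical Steampipe row: null tags, lower-case "rules" key, one flag off.
ROW = {
    "tags": None,
    "server_side_encryption_configuration": {
        "rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms"}}]},
    "block_public_acls": True,
    "block_public_policy": False,
}

# Hypothetical thin schema: the observation's own top-level key is "properties".
SCHEMA = {"type": "object", "additionalProperties": True, "properties": {
    "properties": {"type": "object", "properties": {
        "storage": {"type": "object", "properties": {"kind": {"type": "string"}}}}}}}

if __name__ == "__main__":
    print(apply_mapping(OPERATIONS, ROW))
    print([op["path"] for op in OPERATIONS if not path_in_schema(SCHEMA, op["path"])])
```

Because the schemas are permissive, a path missing from the schema is not a validation error under these assumptions; the second print is only a hint that the catalog does not read that path (or that the mapping has a typo).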

## Ten hand-authored mappings

The next 10 asset types by control coverage (`aws_iam_role`, `aws_lambda_function`, `aws_cognito_user_pool`, `aws_cloudtrail_trail`, `aws_kms_key`, `aws_ec2_instance`, `aws_sqs_queue`, `aws_iam_user`, `aws_opensearch_domain`, `aws_stepfunctions_state_machine`) got hand-authored mappings. They served two purposes: actual coverage for the most-asked-for types, and a ground-truth corpus to validate Iter 5's auto-generator against.

Every mapping carries a `derived_properties:` block listing the catalog-read properties that cannot come from a single Steampipe column. Example from `aws_iam_role.yaml`:

```yaml
derived_properties:
  - path: properties.identity.role.cross_account_trust_without_external_id
    source: "Parse trust policy — detect external Account in Principal without sts:ExternalId condition"
  - path: properties.identity.permission_categories.has_incompatible_categories
    source: Policy analysis against controldata/taxonomy/permission_categories.yaml
  - path: properties.identity.access_advisor.available
    source: iam:GenerateServiceLastAccessedDetails + iam:GetServiceLastAccessedDetails (separate API call per role)
```

That block is the agent's TODO list. Silently producing an observation without those derived properties is the failure mode the `derived_properties:` section prevents: Stave's controls don't see the property, the catalog finds nothing wrong, the breach happens anyway.

## The Contract Show Command

The three sources (schema, predicate index, mapping file) already existed. Joining them required three separate file reads. The new command joins them once:

```bash
stave contract show --asset-type aws_iam_role --format json
```

```json
{
  "asset_type": "aws_iam_role",
  "has_schema": true,
  "schema_path": "schemas/observation/v1/asset-types/aws_iam_role.schema.json",
  "controls_count": 198,
  "chains_count": 38,
  "property_paths": [
    {
      "path": "properties.identity.kind",
      "controls_count": 196,
      "chains_count": 35,
      "max_severity": "critical",
      "is_intent_property": false
    },
    ...
  ],
  "steampipe_mapping": "contracts/steampipe/aws_iam_role.yaml"
}
```

Or:

```bash
stave contract show --list
Asset types with controls: 109 (schema: 109, steampipe mapping: 17)

  TYPE                 SCHEMA  CONTROLS  CHAINS  MAPPING
  ────                 ──────  ────────  ──────  ───────
  aws_iam_role         yes     198       38      steampipe
  aws_s3_bucket        yes     102       15      steampipe
  aws_lambda_function  yes     169       12      steampipe
  aws_bedrock_agent    yes     24        5       -
  ...
```

The implementation reuses everything already in the codebase: `compose.LoadControlsFrom`, `compose.LoadChainDefinitions`, `predindex.Build` (the same index the `stave gaps` command uses), and a 50-line helper in `internal/contracts/schema/load.go` to access the embedded per-asset schemas. The command is ~330 lines; nothing is new data, it's a projection over existing data.

## Auto-generator

The remaining ~98 asset types could be hand-authored or auto-generated. We tried auto. The generator joins the cached Steampipe column catalog with each per-asset schema's property paths, applies a four-rule matching priority: per-asset overrides, schema-path lookup with multi-token scoring, tags convention, fallback to properties.