Form-to-JSON Extraction Agent A new Form-to-JSON Extraction Agent uses an AgentAz governance specification to extract form data into schema-valid JSON, scoring confidence per field and flagging ambiguous or missing fields for human review. The agent operates under a read-only trust level A2, with cost and loop bounds, and never fabricates data. Overview Turns filled-in forms and documents into schema-valid JSON mapped to your fields. Scores confidence per field so low-certainty values are visible, not hidden. Flags illegible, ambiguous, or missing fields for human review instead of guessing. Defensive: extracts only what's present, never fabricates a field, and validates the schema. AgentAz™ specification A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do — and why — and pairs with whatever policy engine you already run. It does not enforce anything at runtime. Machine-readable contract agentaz.json , validated against the open AgentAz™ JSON Schema — bundled for offline use and published at a permanent URL: { "$schema": "./agentaz.schema.json", "version": "2.0.0", "last reviewed": "2026-06-24", "agent id": "form-to-json-agent", "trust level": "A2", "dna pattern": "Extraction", "worst case action": "Returns a wrong/absent field for review; never fabricates. Cannot write data.", "authority boundary": "Extracts only present fields and validates; never guesses; write tools absent.", "tags": "document-processing", "forms", "extraction", "read-only" , "tool boundary": { "allowed tools": "read form", "extract present fields", "validate schema", "mark absent" , "execution tools absent": true }, "output boundary": { "format": "structured json", "never emits": "data write" , "never fabricates": true }, "cost boundary": { "max usd per trace loop": 0.18, "alert threshold usd": 0.12 }, "loop boundary": { "max reasoning turns": 6 }, "human handoff": { "triggers": "ambiguous field", "low confidence" , "destination": "review queue" }, "audit": { "append only": true, "logs": "fields" } } New to this? Read the AgentAz specification guide /agentaz-specifications — Trust Levels, DNA patterns, and how it complements your runtime. AgentAz™ is open source under Apache-2.0 https://www.apache.org/licenses/LICENSE-2.0 — schema frozen v1.0.0 and source on GitHub https://github.com/agent-kits/agentaz . Governance matrix A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship — not new functionality. | Agent goal | Bounded by the authority spec above | |---|---| | Trust Level | A2 — Recommend | | Tool access | Least privilege — execution tools absent read-only | | Context handling | Grounded in provided inputs; cites or flags rather than guessing | | Memory strategy | Task-scoped; no persistent cross-session memory | | Human approval | Required on ambiguous field, low confidence → review queue | | Audit trail | Append-only log fields | | Cost & loop bounds | ≤ $0.18 per loop · ≤ 6 reasoning turns | | Recovery / escalation | Escalates to review queue | Agent component mapping A framework-neutral view of how this blueprint maps to standard agent-architecture components the vocabulary common to ADK-style frameworks . It describes structure for clarity — not an official integration or certified compatibility. | Agent | Primary reasoner — Recommend authority A2 | |---|---| | Tools | read form, extract present fields, validate schema, mark absent — execution tools absent read-only | | Memory | Task-scoped working context; no persistent cross-session memory | | Guardrails | Worst-case classified A2 ; no execution tools; ≤ $0.18/loop · ≤ 6 turns | | Evaluator | Confidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned | | Handoff | Escalates to review queue on ambiguous field, low confidence | Failure modes Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each — the boundaries that make it safe to run, stated plainly. Fabricates a value for a field that is actually blank. - Detection - Blank fields are detected and the agent marks them absent. - Mitigation - It never guesses — missing fields are marked absent, not invented. - Recovery - The human fills the gap from the source. Misreads a field through an OCR error or wrong field mapping. - Detection - Low-confidence fields are flagged and schema validation runs. - Mitigation - Extraction only — output feeds a human or system after review. - Recovery - The reviewer corrects the field. Maps a value to the wrong schema key. - Detection - Schema validation and type checks catch mismatches. - Mitigation - Ambiguous mappings are flagged. - Recovery - The mapping is corrected before downstream use. Evaluation Field-level accuracy and zero fabrication are primary — an invented value for a blank field is the silent failure. | Field accuracy | Share of extracted fields matching ground truth, reported per field type. | |---|---| | Fabrication rate | Frequency of invented values for blank fields — should be near zero. | | Schema validity | Share of outputs that pass schema and type validation. | | Mapping accuracy | Share of values mapped to the correct schema key. | | Latency | Time to extract per form. | Recommended approach. Use a labeled set of forms, including ones with blank fields, with known values; report per-field accuracy and treat any invented value for a blank as a critical fabrication. Validate all output against the schema. When to use Use it when - You process forms or documents into structured data at volume. - You want per-field confidence and a clean schema-valid output. - You want illegible or missing fields flagged for review, not guessed. - You need faithful extraction you can trust downstream. Avoid it when - You want it to fill in missing fields by inference — it won't guess values. - You have no schema for it to map to and validate against. - You can't review flagged low-confidence fields. - Your documents are free-form with no extractable fields. System prompt You are a Form-to-JSON Extraction Agent. You read a filled-in form/document and output schema-valid JSON of the fields, with per-field confidence. You are judged on accurate, faithful extraction and on NEVER guessing, inventing, or inferring a field value that isn't actually on the document. == CORE PRINCIPLES == 1. Extract only what's present. Output a value only if it's actually on the document. If a field is blank, missing, or unreadable, mark it as such — never fill it with a guess or an inferred value. 2. Confidence per field. Attach a confidence to every extracted value. Low-confidence, ambiguous, or poor-OCR fields are flagged for human review. 3. Faithful to the document. Capture what the form says. Don't normalize away meaning, don't "correct" values, and don't merge or split fields in ways that change them. == HARD RULES NON-NEGOTIABLE == - NO FABRICATED FIELDS: Never invent, infer, or guess a value. Not on the document = null + "missing"/"illegible" flag. This is absolute — a confidently wrong value is worse than a flagged blank. - NO SENSITIVE INFERENCE: Never derive a sensitive value ID number, DOB, etc. that isn't explicitly present. - CONFIDENCE + FLAGS: Every field carries a confidence; low-confidence/ambiguous/illegible - review required. - SCHEMA-VALID: Output must validate against the provided schema types, required, enums . Report fields that don't fit rather than coercing silently. - FAITHFUL: Preserve values as written; flag rather than "fix" suspicious values. == METHOD == - Read the document. For each schema field, locate the value, extract it as written, score confidence, and flag missing/illegible/ambiguous. Validate against the schema. Return JSON + a review list. == OUTPUT FORMAT return ONE JSON object == { "document type": "