{"slug": "schemaflow-agentic-database-change-impact-analysis-sql-gen-and-eval-guardrails", "title": "SchemaFlow: Agentic Database Change Impact Analysis, SQL Gen and Eval Guardrails", "summary": "OpenAI released SchemaFlow, an AI-assisted database change workflow using the OpenAI Agents SDK, which converts natural-language requests into structured JSON, performs impact analysis, generates SQL, and validates output with guardrails. The system aims to reduce risks in database changes by providing auditable bundles with traceable analysis and implementation artifacts.", "body_md": "This cookbook walks through an end-to-end **AI-assisted database change workflow** using the OpenAI Agents SDK.\n\nIt demonstrates how OpenAI’s tooling ecosystem can be applied to orchestrate complex, data-intensive workflows across modern enterprise infrastructures. While the current implementation focuses on a retail-oriented schema change and impact-analysis use case, the underlying architectural patterns are domain-agnostic and extensible. 
The same workflow design can be adapted across industries such as manufacturing, pharmaceuticals, healthcare, logistics, finance, and supply chain operations — wherever structured data workflows, operational reasoning, retrieval-augmented analysis, and automated validation are required.\n\nThe running example is a retail loyalty-tier change, but the same pattern applies to many database-change requests where teams need traceable impact analysis and reviewable implementation output.\n\nThe workflow starts from a natural-language database change request, converts it into structured JSON, optionally grounds impact analysis with PDF-based File Search context, generates a safe rollout plan, drafts SQL across data platform layers, validates the output with deterministic guardrails, saves a reusable artifact, and optionally evaluates the flow with Promptfoo.\n\nThe notebook is intentionally self-contained: all core workflow logic, prompts, guardrails, artifact generation, and eval runtime files are created from notebook cells.\n\n## Overview\n\nSchema changes are deceptively simple. A request like “add a nullable column and backfill it” can affect landing tables, staging models, dimensional tables, marts, reporting logic, lineage assumptions, validation checks, rollback procedures, and release sequencing.\n\nThe examples use retail customer data because the dependencies are easy to see, but the same kinds of handoffs show up in many analytics and platform teams.\n\nThis cookbook demonstrates a practical pattern for using agents as a **change-analysis and implementation assistant** for database engineering work. 
Instead of asking one model to produce a final SQL script directly, the workflow breaks the task into explicit stages:\n\n- Parse the natural-language request into structured JSON.\n- Analyze impacted objects and operational risks.\n- Create a rollout plan with prechecks, postchecks, and rollback guidance.\n- Generate SQL across platform layers.\n- Run deterministic sanity checks.\n- Save a machine-readable artifact.\n- Optionally run Promptfoo evals against the current flow.\n\nThe result is not just a generated SQL script. It is an auditable bundle containing the interpreted change request, impact analysis, plan, SQL, validation results, optional RAG evidence summaries, and eval outputs.\n\n## Why This Matters\n\nDatabase change requests often move through several handoffs: product owners describe the need, data engineers interpret it, platform teams assess risk, analytics engineers propagate the field downstream, and reviewers check whether the change is safe. Important context can be lost at each step.\n\nSchemaFlow addresses this by turning a free-form change request into a structured, inspectable workflow.\n\nThis matters because database changes can create hidden failure modes:\n\n- A column added to ODS may not be propagated into staging, core, or marts.\n- A nullable field may accidentally be generated as `NOT NULL`.\n
- Backfill logic may be omitted even though the request asks for historical population.\n- Index requirements may be missed.\n- Downstream reporting dependencies may be unknown unless reference documentation is consulted.\n- Generated SQL may look plausible but fail basic consistency checks.\n\nThis cookbook shows a pattern for reducing those risks with staged agent reasoning, typed outputs, optional retrieval context, deterministic guardrails, saved artifacts, and repeatable evals.\n\n## Key Benefits\n\n- **Structured interpretation** – Converts natural-language database requests into a normalized `change_json` contract.\n- **Separation of responsibilities** – Uses specialized agents for parse, impact analysis, rollout planning, and SQL generation.\n- **Optional RAG grounding** – Lets the impact-analysis agent use File Search over an uploaded PDF, such as an IFD, schema spec, or lineage document.\n- **Typed stage outputs** – Uses Pydantic models and Agents SDK output schemas for parse, impact, and plan stages.\n- **Guardrail-first workflow** – Adds deterministic checks between stages so obvious failures are caught before downstream steps consume bad state.\n- **Traceability** – Emits OpenAI Agents SDK traces and spans for agent runs, guardrails, artifact generation, and eval execution.\n- **Portable artifacts** – Saves the final workflow bundle as JSON under `artifacts/notebook_runs/`.\n- **Eval-ready design** – Generates Promptfoo provider, assertion, config, and result files from the live notebook state.\n- **No database side effects** – Produces draft SQL and validation output without executing against a live database.\n\n## What You’ll Build\n\nBy the end of this notebook, you will have a working SchemaFlow pipeline that produces:\n\n- A parsed database change request:\n  - title\n  - domain\n  - target schema\n  - target table\n  - normalized operations\n  - notes\n- An impact-analysis report:\n  - impacted tables, columns, indexes, views, or relationships\n  - risks\n  - assumptions\n  - optional File 
Search evidence summaries\n- A rollout plan:\n  - implementation steps\n  - prechecks\n  - postchecks\n  - rollback actions\n- A draft SQL script with four required sections:\n  - `LANDING (ODS)`\n  - `STAGING (STG)`\n  - `CORE (DIM/FACT/VIEW)`\n  - `MARTS (SERVING)`\n- A validation result:\n  - expected table checks\n  - expected column checks\n  - required keyword checks such as `ALTER TABLE`, `UPDATE`, or `CREATE INDEX`\n- A saved JSON artifact:\n  - change request\n  - impact analysis\n  - plan\n  - SQL\n  - validation\n  - optional RAG metadata\n- A Promptfoo eval harness:\n  - Python provider\n  - Python assertion file\n  - generated Promptfoo config\n  - parse-only eval case\n  - full-flow eval case\n  - timestamped JSON and HTML eval reports\n\n## Introduction: Use Case and Solution\n\nThis cookbook focuses on a common enterprise data-engineering scenario: a stakeholder requests a database schema change in natural language, and the data team needs to turn that request into an implementation-ready plan.\n\nHere, the retail domain is just a concrete way to make the workflow tangible. 
The same staged approach can be adapted to other source systems, data products, and review processes.\n\nThe default request in this notebook is:\n\n```\nAdd LOYALTY_TIER VARCHAR(20) to ODS.ODS_CUSTOMER_PROFILE as nullable.\nBackfill from CORE.DIM_CUSTOMER on CUSTOMER_ID where IS_CURRENT=true.\nAdd a non-unique index on (CUSTOMER_ID, LOYALTY_TIER).\n```\n\nA human data engineer would typically need to answer several questions before writing production SQL:\n\n- What table and schema are being changed?\n- What exact column, type, and nullability were requested?\n- Is historical backfill required?\n- Does the request imply an index?\n- Which downstream layers need the field propagated?\n- What risks should reviewers look for?\n- What checks should be run before and after deployment?\n- What rollback steps are reasonable?\n- Does the generated SQL include the required elements?\n\nSchemaFlow implements this as a staged agent workflow. Each stage creates a typed intermediate output that the next stage consumes. 
Deterministic checks then validate the outputs before the notebook saves the final bundle and optionally runs evals.\n\n## Workflow Overview\n\nAt a high level, SchemaFlow follows this sequence:\n\n1. Parse the change request into `change_json`.\n2. Analyze impact, optionally with File Search context.\n3. Generate the rollout plan.\n4. Generate SQL across the landing, staging, core, and mart layers.\n5. Run deterministic SQL sanity checks.\n6. Assemble and save the final bundle.\n7. Optionally run Promptfoo evals.\n\nThe notebook is organized so readers can run the core workflow first and then decide whether they want to run the optional Promptfoo evaluation section.\n\n## Table of Contents\n\n### Conceptual Guide\n\n- [Overview](#overview)\n- [Why This Matters](#why-this-matters)\n- [Key Benefits](#key-benefits)\n- [What You’ll Build](#what-youll-build)\n- [Introduction: Use Case and Solution](#introduction-use-case-and-solution)\n- [Workflow Overview](#workflow-overview)\n- [Architecture - Design Patterns](#architecture-design-patterns)\n- [System Design](#system-design)\n- [Execution Workflow](#execution-workflow)\n\n### Notebook Implementation\n\n- [Environment Setup](#environment-setup)\n- [Input](#input)\n- [Optional PDF RAG Context](#optional-pdf-rag-context)\n- [Stages 1-2: Parse Change Request + Impact Analysis](#stages-1-2-parse-change-request--impact-analysis)\n- [Stages 3-4: Execution Plan + SQL Generation](#stages-3-4-execution-plan--sql-generation)\n- [Stage 5: Lightweight SQL Sanity Checks](#stage-5-lightweight-sql-sanity-checks)\n- [Final Bundle](#final-bundle)\n- [Save Artifact](#save-artifact)\n- [Optional Cleanup](#optional-cleanup)\n- [Evaluate the Flow with Promptfoo](#evaluate-the-flow-with-promptfoo)\n\n### Reference\n\n## Architecture - Design Patterns\n\nSchemaFlow uses a staged, contract-driven agent architecture. The goal is to avoid treating the model as a single black-box SQL generator. Instead, each stage has a narrow responsibility and produces an output that can be inspected, validated, traced, and reused.\n\n### 1. 
Agent Specialization\n\nEach agent performs one primary task:\n\n| Agent | Responsibility | Main Output |\n|---|---|---|\n| Parse Agent | Extract structured fields from the natural-language request | `change_json` |\n| Impact Agent | Identify affected objects, assumptions, and risks | `impact_json` |\n| Plan Agent | Convert the change and impact into rollout steps | `plan_json` |\n| SQL Agent | Draft SQL across data platform layers | `sql_text` |\n\nThis specialization makes the workflow easier to debug. If SQL is missing a column, you can inspect whether the issue started in parsing, impact analysis, planning, or SQL generation.\n\n### 2. Typed Output Contracts\n\nThe notebook defines Pydantic models for the structured stages:\n\n- `ChangeRequestModel`\n- `ImpactModel`\n- `PlanModel`\n\nThose models are wrapped with `AgentOutputSchema` so the Agents SDK knows the expected output shape. The workflow also normalizes outputs after model calls to ensure expected keys exist before downstream stages run.\n\n### 3. Retrieval-Augmented Impact Analysis\n\nThe PDF RAG section is optional. When `PDF_PATH` is set, the notebook:\n\n- Creates an OpenAI vector store.\n- Uploads the PDF.\n- Lets OpenAI parse, chunk, embed, and index it.\n- Gives the Impact Agent a `FileSearchTool`.\n- Captures a summary of returned File Search results.\n\nThis is useful when the change request needs grounding in an IFD, schema document, lineage file, data contract, or architecture reference.\n\n### 4. 
Guardrail Gates Between Stages\n\nThe notebook adds deterministic checks after major stages:\n\n- Stages 1-2 guardrails validate parse and impact outputs.\n- Stages 3-4 guardrails validate plan completeness, data type propagation, and nullability handling.\n- Stage 5 SQL checks validate expected table, column, and SQL keyword presence.\n- Post-artifact checks verify the saved JSON artifact exists and round-trips.\n- Pre-Promptfoo checks verify the notebook state is ready for evals.\n\nThese checks do not replace human review, but they catch common silent failures early.\n\n### 5. Artifact-Centered Execution\n\nThe final bundle is the main workflow artifact. It captures the state needed to review or debug the run:\n\n```\nbundle = {\n  \"summary\": ...,\n  \"rag\": ...,\n  \"change_json\": ...,\n  \"impact_json\": ...,\n  \"plan\": ...,\n  \"sql\": ...,\n  \"validation\": ...\n}\n```\n\nThe notebook saves this bundle under `artifacts/notebook_runs/`.\n\n### 6. Eval Runtime Generated from Notebook State\n\nPromptfoo runs in a separate process, so it cannot directly read variables from the active notebook kernel. 
To solve this, Section 10 writes a small reusable Python module and Promptfoo runtime files from the current notebook state.\n\nThis ensures that prompt edits, `CHANGE_TEXT` edits, and model configuration changes are reflected when the eval files are regenerated.\n\n## System Design\n\n### Component Architecture\n\n### Primary Runtime Objects\n\n| Object | Created in | Purpose |\n|---|---|---|\n| `CHANGE_TEXT` | Input section | The natural-language database change request |\n| `change_json` | Stage 1 | Structured interpretation of the request |\n| `rag_vector_store_id` | Optional PDF RAG section | Hosted vector store ID for uploaded PDF context |\n| `rag_file_search_results` | Stage 2 | Summary of File Search results returned to the Impact Agent |\n| `impact_json` | Stage 2 | Impacted objects, risks, and assumptions |\n| `plan_json` | Stage 3 | Rollout plan, checks, and rollback guidance |\n| `sql_text` | Stage 4 | Draft SQL script |\n| `validation` | Stage 5 | Deterministic SQL sanity-check result |\n| `bundle` | Final Bundle section | Consolidated workflow output |\n| `out_path` | Save Artifact section | Saved JSON artifact path |\n| `promptfoo_config` | Promptfoo section | Generated eval configuration |\n\n### Important Boundary\n\nSchemaFlow generates draft implementation artifacts. It does **not** execute SQL against a database, apply migrations, open pull requests, or modify production systems.\n\n## Execution Workflow\n\nRun the notebook in order.\n\n### Core Workflow\n\n- **Environment Setup**\n  - Imports dependencies.\n  - Verifies the OpenAI Agents SDK version.\n  - Reads `OPENAI_API_KEY`.\n  - Configures tracing and model selection.\n- **Input**\n  - Defines `CHANGE_TEXT`.\n  - This is the only required business input for the core workflow.\n- **Optional PDF RAG Context**\n  - Leave `PDF_PATH = None` to run without retrieval. 
\n  - Set `PDF_PATH` to a local PDF to enable File Search context for impact analysis.\n- **Stages 1-2**\n  - Parse the change request.\n  - Analyze impact.\n  - Optionally use File Search during impact analysis.\n- **Stages 1-2 Guardrails**\n  - Confirm parse output is well-formed.\n  - Confirm impact output includes the target.\n  - Confirm impacted objects contain required fields.\n- **Stages 3-4**\n  - Generate an execution plan.\n  - Generate SQL across landing, staging, core, and mart layers.\n- **Stages 3-4 Guardrails**\n  - Confirm plan sections are populated.\n  - Confirm data type propagation.\n  - Confirm nullability behavior matches the request.\n- **Stage 5 SQL Sanity Checks**\n  - Check for empty SQL.\n  - Check expected target table and columns.\n  - Check required SQL actions implied by the request.\n- **Final Bundle and Artifact**\n  - Assemble the full output bundle.\n  - Save it as JSON.\n  - Verify the artifact round-trips successfully.\n\n### Optional Eval Workflow\n\n- **Pre-Promptfoo Checks**\n  - Confirm the notebook state is ready for evals.\n- **Promptfoo Runtime Generation**\n  - Create a reusable SchemaFlow core module.\n  - Write a Promptfoo provider.\n  - Write a Promptfoo assertion file.\n  - Generate Promptfoo test cases and config.\n- **Promptfoo Eval Execution**\n  - Run parse-only and full-flow evals.\n  - Save timestamped JSON and HTML reports.\n  - Refresh `schemaflow_cookbook_eval_latest.*` aliases.\n\n## 1) Environment Setup\n\nThis section prepares the runtime for the SchemaFlow workflow.\n\nThe setup cell does the following:\n\n- Imports standard Python utilities used throughout the notebook.\n- Imports the OpenAI client.\n- Imports the OpenAI Agents SDK primitives:\n  - `Agent`\n  - `Runner`\n  - `RunConfig`\n  - `AgentOutputSchema`\n  - `FileSearchTool`\n  - tracing and span helpers\n- Verifies that the installed `openai-agents` package meets the minimum required version.\n- Reads `OPENAI_API_KEY` from the environment or prompts for it.\n
- Sets the model with `OPENAI_MODEL`, defaulting to `gpt-5.5`.\n- Creates a trace group ID so all related agent runs and guardrail spans can be grouped together.\n\nThe workflow can include sensitive trace payloads so that prompts, outputs, eval bundles, and tool data are visible in traces. The setup cell reads this from `OPENAI_AGENTS_TRACE_INCLUDE_SENSITIVE_DATA`, which defaults to `false`; set it to `true` to include those payloads. For production usage, review this setting before handling private data.\n\n```\n%pip install --quiet -U \"openai\" \"openai-agents>=0.17.0\"\n\n# Notebook cell: imports, SDK version check, and runtime configuration.\nimport os\nimport json\nimport re\nimport uuid\nfrom datetime import datetime, timezone\nfrom getpass import getpass\nfrom importlib.metadata import PackageNotFoundError, version\n\ntry:\n    from openai import OpenAI\nexcept Exception as e:\n    raise RuntimeError(\"Install dependency first: pip install -U openai\") from e\n\nMIN_AGENTS_SDK_VERSION = \"0.17.0\"\ntry:\n    from agents import (\n        Agent,\n        AgentOutputSchema,\n        FileSearchTool,\n        Runner,\n        RunConfig,\n        custom_span,\n        flush_traces,\n        function_span,\n        guardrail_span,\n        trace,\n    )\nexcept Exception as e:\n    raise RuntimeError(\n        'Install or upgrade the OpenAI Agents SDK first: pip install -U \"openai-agents>=0.17.0\"'\n    ) from e\n\ndef _version_tuple(value):\n    match = re.match(r\"^(\\d+)\\.(\\d+)\\.(\\d+)\", str(value or \"\"))\n    return tuple(int(part) for part in match.groups()) if match else (0, 0, 0)\n\ntry:\n    AGENTS_SDK_VERSION = version(\"openai-agents\")\nexcept PackageNotFoundError as e:\n    raise RuntimeError('Install the OpenAI Agents SDK first: pip install -U \"openai-agents>=0.17.0\"') from e\n\nif _version_tuple(AGENTS_SDK_VERSION) < _version_tuple(MIN_AGENTS_SDK_VERSION):\n    raise RuntimeError(\n        f'OpenAI Agents SDK {MIN_AGENTS_SDK_VERSION}+ is required; found {AGENTS_SDK_VERSION}. 
'\n        'Upgrade with: pip install -U \"openai-agents>=0.17.0\"'\n    )\n\ndef _clean_openai_api_key(value):\n    key = (value or \"\").strip()\n    if not key:\n        raise RuntimeError(\"OPENAI_API_KEY is required.\")\n    return key\n\nif not os.getenv(\"OPENAI_API_KEY\", \"\").strip():\n    os.environ[\"OPENAI_API_KEY\"] = getpass(\"Enter your OpenAI API key: \")\nos.environ[\"OPENAI_API_KEY\"] = _clean_openai_api_key(os.getenv(\"OPENAI_API_KEY\"))\nOPENAI_ORG_ID = os.getenv(\"OPENAI_ORG_ID\", \"\").strip()\nif OPENAI_ORG_ID:\n    os.environ[\"OPENAI_ORG_ID\"] = OPENAI_ORG_ID\n\nMODEL = os.getenv(\"OPENAI_MODEL\", \"gpt-5.5\")\nTRACE_INCLUDE_SENSITIVE_DATA = os.getenv(\"OPENAI_AGENTS_TRACE_INCLUDE_SENSITIVE_DATA\", \"false\").lower() in {\"1\", \"true\", \"yes\", \"on\"}\nos.environ[\"OPENAI_AGENTS_TRACE_INCLUDE_SENSITIVE_DATA\"] = \"true\" if TRACE_INCLUDE_SENSITIVE_DATA else \"false\"\nSCHEMAFLOW_TRACE_GROUP_ID = os.getenv(\"SCHEMAFLOW_TRACE_GROUP_ID\") or (\n    \"schemaflow-cookbook-\" + datetime.now(timezone.utc).strftime(\"%Y%m%dT%H%M%SZ\") + \"-\" + uuid.uuid4().hex[:8]\n)\nos.environ[\"SCHEMAFLOW_TRACE_GROUP_ID\"] = SCHEMAFLOW_TRACE_GROUP_ID\nclient = OpenAI(api_key=os.environ[\"OPENAI_API_KEY\"])\nprint(\"Using model:\", MODEL)\nprint(\"OpenAI Agents SDK:\", AGENTS_SDK_VERSION)\nprint(\"OpenAI organization:\", os.getenv(\"OPENAI_ORG_ID\") or \"(default for API key)\")\nprint(\"Trace group:\", SCHEMAFLOW_TRACE_GROUP_ID)\nprint(\"Trace payloads include prompts/outputs:\", TRACE_INCLUDE_SENSITIVE_DATA)\n\n# Notebook cell: typed output models and shared agent helpers.\nfrom concurrent.futures import ThreadPoolExecutor\nfrom pydantic import BaseModel, ConfigDict, Field\n\nclass SchemaFlowBaseModel(BaseModel):\n    model_config = ConfigDict(extra=\"allow\")\n\nclass OperationModel(SchemaFlowBaseModel):\n    op: str\n    details: dict = Field(default_factory=dict)\n\nclass ChangeRequestModel(SchemaFlowBaseModel):\n    title: str | None = None\n    domain: str | None = None\n    target_schema: str | None = 
None\n    target_table: str | None = None\n    operations: list[OperationModel] = Field(default_factory=list)\n    notes: list = Field(default_factory=list)\n\nclass ImpactObjectModel(SchemaFlowBaseModel):\n    type: str\n    name: str\n    reason: str\n    source: str\n\nclass ImpactModel(SchemaFlowBaseModel):\n    impacted_objects: list[ImpactObjectModel] = Field(default_factory=list)\n    risks: list[str] = Field(default_factory=list)\n    assumptions: list[str] = Field(default_factory=list)\n\nclass PlanStepModel(SchemaFlowBaseModel):\n    id: str\n    description: str\n\nclass PlanModel(SchemaFlowBaseModel):\n    plan_steps: list[PlanStepModel] = Field(default_factory=list)\n    prechecks: list[str] = Field(default_factory=list)\n    postchecks: list[str] = Field(default_factory=list)\n    rollback: list[str] = Field(default_factory=list)\n\nCHANGE_OUTPUT_SCHEMA = AgentOutputSchema(ChangeRequestModel, strict_json_schema=False)\nIMPACT_OUTPUT_SCHEMA = AgentOutputSchema(ImpactModel, strict_json_schema=False)\nPLAN_OUTPUT_SCHEMA = AgentOutputSchema(PlanModel, strict_json_schema=False)\n\ndef _parse_json_text(text: str):\n    text = (text or \"{}\").strip()\n    if text.startswith(\"```\"):\n        text = re.sub(r\"^```(?:json)?\\s*\", \"\", text)\n        text = re.sub(r\"\\s*```$\", \"\", text).strip()\n    try:\n        return json.loads(text)\n    except json.JSONDecodeError:\n        match = re.search(r\"\\{.*\\}\", text, flags=re.DOTALL)\n        if not match:\n            raise\n        return json.loads(match.group(0))\n\ndef _model_dump(value):\n    if value is None or isinstance(value, (str, int, float, bool, bytes)):\n        return value\n    if isinstance(value, type):\n        return value\n    if hasattr(value, \"model_dump\"):\n        try:\n            return value.model_dump()\n        except TypeError:\n            pass\n    if hasattr(value, \"to_dict\"):\n        try:\n            return value.to_dict()\n        except TypeError:\n            
pass\n    if hasattr(value, \"__dict__\"):\n        try:\n            return {k: v for k, v in vars(value).items() if not k.startswith(\"_\")}\n        except TypeError:\n            pass\n    return value\n\ndef _agent_output_to_json(value):\n    value = _model_dump(value)\n    if isinstance(value, dict):\n        return value\n    if isinstance(value, str):\n        return _parse_json_text(value)\n    return json.loads(json.dumps(value, default=str))\n\ndef _agent_output_to_text(value):\n    value = _model_dump(value)\n    if isinstance(value, str):\n        return value.strip()\n    return json.dumps(value, ensure_ascii=False)\n\ndef _trace_metadata(metadata: dict | None = None):\n    cleaned = {}\n    for key, value in (metadata or {}).items():\n        if value is None:\n            cleaned[str(key)] = \"\"\n        elif isinstance(value, bool):\n            cleaned[str(key)] = \"true\" if value else \"false\"\n        elif isinstance(value, (dict, list, tuple, set)):\n            cleaned[str(key)] = json.dumps(value, ensure_ascii=False, default=str)\n        else:\n            cleaned[str(key)] = str(value)\n    return cleaned\n\ndef _schemaflow_run_config(workflow_name: str, metadata: dict | None = None):\n    return RunConfig(\n        workflow_name=workflow_name,\n        group_id=SCHEMAFLOW_TRACE_GROUP_ID,\n        trace_include_sensitive_data=TRACE_INCLUDE_SENSITIVE_DATA,\n        trace_metadata=_trace_metadata({\"notebook\": \"schemaflow_cookbook\", **(metadata or {})}),\n    )\n\ndef _runner_run_sync(agent, prompt: str, *, workflow_name: str, metadata: dict | None = None, max_turns: int = 4):\n    kwargs = {\"run_config\": _schemaflow_run_config(workflow_name, metadata), \"max_turns\": max_turns}\n    try:\n        return Runner.run_sync(agent, prompt, **kwargs)\n    except RuntimeError as exc:\n        if \"event loop\" not in str(exc).lower():\n            raise\n        with ThreadPoolExecutor(max_workers=1) as pool:\n            return 
pool.submit(lambda: Runner.run_sync(agent, prompt, **kwargs)).result()\n\ndef run_schemaflow_json_agent(*, name, instructions, prompt, output_schema, model=MODEL, tools=None, workflow_name=None, metadata=None):\n    agent = Agent(name=name, instructions=instructions, model=model, output_type=output_schema, tools=tools or [])\n    result = _runner_run_sync(agent, prompt, workflow_name=workflow_name or name, metadata={\"agent\": name, **(metadata or {})})\n    return _agent_output_to_json(result.final_output), result\n\ndef run_schemaflow_text_agent(*, name, instructions, prompt, model=MODEL, tools=None, workflow_name=None, metadata=None):\n    agent = Agent(name=name, instructions=instructions, model=model, tools=tools or [])\n    result = _runner_run_sync(agent, prompt, workflow_name=workflow_name or name, metadata={\"agent\": name, **(metadata or {})})\n    return _agent_output_to_text(result.final_output), result\n\ndef _collect_file_search_results(value):\n    results = []\n    seen = set()\n\n    def visit(node):\n        if node is None or isinstance(node, (str, int, float, bool, bytes)):\n            return\n        if isinstance(node, type) or callable(node):\n            return\n        node_id = id(node)\n        if node_id in seen:\n            return\n        seen.add(node_id)\n\n        node = _model_dump(node)\n        if node is None or isinstance(node, (str, int, float, bool, bytes)):\n            return\n        if isinstance(node, type) or callable(node):\n            return\n\n        if isinstance(node, dict):\n            if node.get(\"type\") == \"file_search_call\":\n                for result in node.get(\"results\", []) or []:\n                    result = _model_dump(result)\n                    if isinstance(result, dict):\n                        text = result.get(\"text\") or result.get(\"content\") or \"\"\n                        if isinstance(text, list):\n                            text = \"\\n\".join(str(x) for x in text)\n         
               results.append({\"file_id\": result.get(\"file_id\"), \"filename\": result.get(\"filename\") or result.get(\"file_name\") or result.get(\"title\"), \"score\": result.get(\"score\"), \"text_preview\": str(text)[:1200]})\n            for child in node.values():\n                visit(child)\n        elif isinstance(node, (list, tuple, set)):\n            for child in node:\n                visit(child)\n\n    visit(value)\n    return results\n\ndef agent_file_search_results(run_result):\n    return _collect_file_search_results(run_result)\n\ndef trace_function_result(name: str, *, input_obj=None, output_obj=None):\n    with function_span(\n        name,\n        input=json.dumps(input_obj, ensure_ascii=False, default=str) if input_obj is not None else None,\n        output=json.dumps(output_obj, ensure_ascii=False, default=str) if output_obj is not None else None,\n    ):\n        pass\n\ndef pretty(obj):\n    print(json.dumps(obj, indent=2, ensure_ascii=False))\n```\n\n## 2) Input\n\nThis section defines the database change request that SchemaFlow will process. Think of it as the compact ticket, issue, or message a data team might receive before turning the request into implementation details.\n\nThe default request asks the workflow to:\n\n- Add\n`LOYALTY_TIER VARCHAR(20)`\n\nto`ODS.ODS_CUSTOMER_PROFILE`\n\n. - Treat the new column as nullable.\n- Backfill from\n`CORE.DIM_CUSTOMER`\n\n. - Join on\n`CUSTOMER_ID`\n\n. - Filter the source to current records with\n`IS_CURRENT=true`\n\n. 
\n- Add a non-unique index on `(CUSTOMER_ID, LOYALTY_TIER)`.\n\nThis input is intentionally compact but rich enough to exercise the full workflow:\n\n- parsing target schema and table\n- extracting column name, type, and nullability\n- recognizing backfill requirements\n- recognizing index requirements\n- generating multi-layer SQL\n- running validation checks for expected table, column, and SQL actions\n\n```\nCHANGE_TEXT = \"\"\"Add LOYALTY_TIER VARCHAR(20) to ODS.ODS_CUSTOMER_PROFILE as nullable.\nBackfill from CORE.DIM_CUSTOMER on CUSTOMER_ID where IS_CURRENT=true.\nAdd a non-unique index on (CUSTOMER_ID, LOYALTY_TIER).\"\"\"\nprint(CHANGE_TEXT)\n```\n\n## 3) Optional PDF RAG Context\n\nSchemaFlow can run with or without retrieval context, so readers can start with the request alone and add reference docs only when the change needs them.\n\nThe sample PDF path in the code cell below points to a file included in the cookbook folder under `data/`, not to bytes embedded inside the notebook. Leave `PDF_PATH = None` for static article previews or generic runs.\n\nWith the default `PDF_PATH = None`, the notebook uses only the natural-language change request. 
This is enough to demonstrate the core staged workflow.\n\nSet `PDF_PATH` to a local PDF when you want the Impact Agent to ground its analysis in reference material, such as:\n\n- interface design documents\n- schema specifications\n- lineage documentation\n- data contracts\n- platform architecture notes\n- downstream dependency documentation\n\nWhen a PDF is configured, this section:\n\n- Validates that the file exists and is a PDF.\n- Creates an OpenAI vector store with a one-day expiration policy.\n- Uploads the PDF to the vector store.\n- Lets OpenAI handle parsing, chunking, embedding, and retrieval.\n- Stores the vector store ID for the Impact Agent.\n- Later summarizes any File Search results returned during impact analysis.\n\nThis keeps the cookbook lightweight because it does not require local embedding models, Chroma, Neo4j, LangGraph, or project-specific Python modules.\n\n``` python\nfrom pathlib import Path\n\n# Optional PDF RAG example.\n# The GitHub repo includes this sample PDF under schemaflow_cookbook/data.\n# When running from a repo checkout in the cookbook folder, uncomment the path below to upload it to File Search.\n# For meaningful retrieval hits, pair it with the LOYALTY_TIER change request used in this notebook.\nPDF_PATH = None\n# PDF_PATH = \"data/sample_customer_loyalty_ifd.pdf\"\nRAG_MAX_RESULTS = 6\nrag_vector_store = None\nrag_vector_store_id = None\nrag_vector_store_file = None\nrag_file_search_results = []\nimpact_response = None\n\ndef create_pdf_vector_store(pdf_path):\n    pdf_path = Path(pdf_path).expanduser().resolve()\n    if not pdf_path.exists():\n        raise FileNotFoundError(f\"PDF not found: {pdf_path}\")\n    if pdf_path.suffix.lower() != \".pdf\":\n        raise ValueError(f\"Expected a PDF file, got: {pdf_path}\")\n    with trace(\"SchemaFlow PDF Vector Store\", group_id=SCHEMAFLOW_TRACE_GROUP_ID, metadata={\"step\": \"pdf_vector_store\", \"pdf_path\": str(pdf_path)}):\n        with custom_span(\"Create vector 
store\", {\"pdf_path\": str(pdf_path)}):\n            vector_store = client.vector_stores.create(name=f\"schemaflow-cookbook-{datetime.now(timezone.utc).strftime('%Y%m%dT%H%M%SZ')}\", expires_after={\"anchor\": \"last_active_at\", \"days\": 1})\n        with custom_span(\"Upload PDF to vector store\", {\"vector_store_id\": vector_store.id, \"pdf_path\": str(pdf_path)}):\n            with pdf_path.open(\"rb\") as handle:\n                vector_store_file = client.vector_stores.files.upload_and_poll(vector_store_id=vector_store.id, file=handle)\n        trace_function_result(\"PDF vector store ready\", input_obj={\"pdf_path\": str(pdf_path)}, output_obj={\"vector_store_id\": vector_store.id, \"status\": getattr(vector_store_file, \"status\", \"unknown\")})\n        flush_traces()\n    return vector_store, vector_store_file\n\nif PDF_PATH:\n    rag_vector_store, rag_vector_store_file = create_pdf_vector_store(PDF_PATH)\n    rag_vector_store_id = rag_vector_store.id\n    print(\"Created vector store:\", rag_vector_store_id)\n    print(\"Uploaded PDF status:\", getattr(rag_vector_store_file, \"status\", \"unknown\"))\nelse:\n    print(\"No PDF configured. Leave PDF_PATH as None to run without RAG, or set it to a local PDF path.\")\n```\n\n## 4) Stages 1-2 - Parse Change Request + Impact Analysis\n\nThis section runs the first two agent stages back to back. Together, they answer two practical questions: what exactly was requested, and what else could be affected?\n\n### Stage 1: Parse Change Request\n\nThe Parse Agent converts `CHANGE_TEXT` into a structured `change_json` object.\n\nExpected fields include:\n\n- `title`\n- `domain`\n- `target_schema`\n- `target_table`\n- `operations`\n- `notes`\n\nThis stage creates the normalized contract that every downstream stage consumes. If the parse step misses the target table, column, data type, nullability, backfill, or index intent, later stages may produce incomplete output. 
That is why the notebook validates this stage immediately afterward.\n\n### Stage 2: Impact Analysis\n\nThe Impact Agent consumes `change_json` and produces `impact_json`.\n\nExpected fields include:\n\n- `impacted_objects`\n- `risks`\n- `assumptions`\n\nIf `PDF_PATH` was configured earlier, the Impact Agent also receives a `FileSearchTool` connected to the uploaded PDF vector store. This lets the model search reference documentation before returning impact claims.\n\nThe output is intentionally conservative. When the agent is uncertain, it should call out assumptions and risks instead of inventing undocumented certainty.\n\n### Impact Dashboard Preview\n\nThe impact-analysis stage produces structured `impact_json` that can be visualized as a graph of affected objects and relationships.\n\nThe preview below shows the kind of customer loyalty lineage graph built in the optional Neo4j dashboard section later in the notebook. Run that section to generate the local graph UI from the sample knowledge-graph seed and inspect impacted objects interactively.\n\n```\n# =============================================================\n# Stage 1 - Parse Change Request\n# =============================================================\nprint(\"=\" * 60)\nprint(\"Stage 1 - Parse Change Request\")\nprint(\"=\" * 60)\nPARSE_SYSTEM = \"\"\"\nYou are a precise information extraction system for database change requests.\nReturn STRICT JSON only (no prose, no code fences, no comments).\nRequired keys:\n{\n  \"title\": str,\n  \"domain\": str|null,\n  \"target_schema\": str|null,\n  \"target_table\": str|null,\n  \"operations\": [{\"op\": str, \"details\": object}],\n  \"notes\": []\n}\nRules:\n- Use lowercase op names.\n- If schema/table unknown, set null.\n- Keep details explicit and typed where possible.\n\"\"\".strip()\nparse_user = \"Change Request:\\n\\n\" + CHANGE_TEXT\nchange_json, parse_agent_result = run_schemaflow_json_agent(name=\"SchemaFlow Parse Agent\", 
instructions=PARSE_SYSTEM, prompt=parse_user, output_schema=CHANGE_OUTPUT_SCHEMA, workflow_name=\"SchemaFlow Stage 1 Parse\", metadata={\"stage\": \"parse_change_request\"})\nif isinstance(change_json, dict):\n    change_json.setdefault(\"title\", None)\n    change_json.setdefault(\"domain\", None)\n    change_json.setdefault(\"target_schema\", None)\n    change_json.setdefault(\"target_table\", None)\n    if not isinstance(change_json.get(\"operations\"), list):\n        change_json[\"operations\"] = [change_json.get(\"operations\")] if change_json.get(\"operations\") else []\n    if not isinstance(change_json.get(\"notes\"), list):\n        change_json[\"notes\"] = []\npretty(change_json)\n\n# =============================================================\n# Stage 2 - Impact Analysis\n# =============================================================\nprint(\"\\n\" + \"=\" * 60)\nprint(\"Stage 2 - Impact Analysis\")\nprint(\"=\" * 60)\nIMPACT_SYSTEM = \"\"\"\nYou are a cautious impact analysis assistant.\nInputs:\n- change_json: normalized change request.\n- optional File Search context from an uploaded IFD/reference PDF.\nTask:\nReturn JSON exactly as:\n{\n  \"impacted_objects\": [\n    {\"type\":\"table|column|fk|index|view\",\"name\":str,\"reason\":str,\"source\":\"file_search|ifd|inference\"}\n  ],\n  \"risks\": [str],\n  \"assumptions\": [str]\n}\nRules:\n- Be conservative when uncertain.\n- Call out data quality/backfill risks explicitly.\n- If File Search context is available, use it to ground table, column, and downstream-impact claims.\n\"\"\".strip()\nimpact_user_parts = [\"CHANGE_JSON:\\n\" + json.dumps(change_json, ensure_ascii=False)]\nimpact_tools = []\nif rag_vector_store_id:\n    impact_tools.append(FileSearchTool(vector_store_ids=[rag_vector_store_id], max_num_results=RAG_MAX_RESULTS, include_search_results=True))\n    impact_user_parts.append(\"Use the file_search tool against the uploaded PDF to look for relevant IFD, schema, table, column, 
lineage, and downstream dependency context before returning JSON.\")\nimpact_json, impact_agent_result = run_schemaflow_json_agent(name=\"SchemaFlow Impact Agent\", instructions=IMPACT_SYSTEM, prompt=\"\\n\\n\".join(impact_user_parts), output_schema=IMPACT_OUTPUT_SCHEMA, tools=impact_tools, workflow_name=\"SchemaFlow Stage 2 Impact Analysis\", metadata={\"stage\": \"impact_analysis\", \"rag_enabled\": bool(rag_vector_store_id)})\nimpact_response = impact_agent_result\ntry:\n    rag_file_search_results = agent_file_search_results(impact_agent_result)\nexcept Exception as exc:\n    rag_file_search_results = []\n    print(f\"File Search result summary skipped: {type(exc).__name__}: {exc}\")\nif rag_vector_store_id:\n    print(\"File Search results returned:\", len(rag_file_search_results))\n    for i, result in enumerate(rag_file_search_results, start=1):\n        print(f\"{i}. {result.get('filename') or result.get('file_id')} score={result.get('score')}\")\nif isinstance(impact_json, dict):\n    impact_json.setdefault(\"impacted_objects\", [])\n    impact_json.setdefault(\"risks\", [])\n    impact_json.setdefault(\"assumptions\", [])\npretty(impact_json)\nflush_traces()\n```\n\n### Stages 1-2 Output Guardrails\n\nThis guardrail cell performs deterministic checks on the Parse and Impact outputs before the workflow continues.\n\nThe checks verify that:\n\n- `change_json` contains a target schema.\n- `change_json` contains a target table.\n- `change_json.operations` is a non-empty list.\n- `impact_json.impacted_objects` contains at least one object.\n- The impact output references the parsed target table.\n- Each impacted object has basic required fields such as type, name, and reason.\n\nThese checks are deliberately lightweight. 
They do not prove that the analysis is complete, but they catch obvious failure modes before the Plan Agent or SQL Agent consumes malformed or incomplete state.\n\n```\n# Stages 1-2 Output Guardrails - inspects change_json (Parse) and impact_json (Impact).\nstages_1_2_guardrails = []\nwith trace(\"SchemaFlow Stages 1-2 Guardrails\", group_id=SCHEMAFLOW_TRACE_GROUP_ID, metadata={\"stage\": \"stages_1_2_guardrails\"}):\n    def _check(name, ok, detail=\"\"):\n        ok = bool(ok)\n        stages_1_2_guardrails.append({\"name\": name, \"ok\": ok, \"detail\": detail})\n        with guardrail_span(name, triggered=not ok):\n            trace_function_result(name + \" detail\", output_obj={\"ok\": ok, \"detail\": detail})\n\n    _target_schema = (change_json.get(\"target_schema\") or \"\").strip() if isinstance(change_json, dict) else \"\"\n    _target_table = (change_json.get(\"target_table\") or \"\").strip() if isinstance(change_json, dict) else \"\"\n    _ops = change_json.get(\"operations\") if isinstance(change_json, dict) else None\n    _check(\"parse_output_well_formed\", bool(_target_schema) and bool(_target_table) and isinstance(_ops, list) and len(_ops) > 0, f\"target={_target_schema}.{_target_table}, ops={len(_ops or [])}\")\n    _impacted = impact_json.get(\"impacted_objects\") if isinstance(impact_json, dict) else []\n    _target_fqn = f\"{_target_schema}.{_target_table}\" if (_target_schema and _target_table) else \"\"\n    _target_in_impact = any(isinstance(o, dict) and (o.get(\"name\", \"\").upper() == _target_fqn.upper() or (_target_table and _target_table.upper() in o.get(\"name\", \"\").upper())) for o in (_impacted or []))\n    _check(\"impact_includes_target\", bool(_impacted) and _target_in_impact, f\"{len(_impacted or [])} impacted object(s), target_match={_target_in_impact}\")\n    _malformed = [i for i, o in enumerate(_impacted or []) if not (isinstance(o, dict) and o.get(\"type\") and o.get(\"name\") and o.get(\"reason\"))]\n    
_check(\"impacted_objects_well_formed\", not _malformed, \"all populated\" if not _malformed else f\"missing fields at indices {_malformed[:5]}\")\n    stages_1_2_guardrails_passed = all(c[\"ok\"] for c in stages_1_2_guardrails)\n    trace_function_result(\"Stages 1-2 guardrails summary\", output_obj={\"passed\": stages_1_2_guardrails_passed, \"checks\": stages_1_2_guardrails})\n    flush_traces()\nprint(f\"Stages 1-2 Output Guardrails: {'PASS' if stages_1_2_guardrails_passed else 'FAIL'}\")\nfor _c in stages_1_2_guardrails:\n    _flag = \"OK  \" if _c[\"ok\"] else \"FAIL\"\n    print(f\"  [{_flag}] {_c['name']:35s} {_c['detail']}\")\n```\n\n## 5) Stages 3-4 - Execution Plan + SQL Generation\n\nThis section runs the implementation-planning and SQL-generation stages. At this point the workflow shifts from understanding the request to drafting an implementation handoff.\n\n### Stage 3: Execution Plan\n\nThe Plan Agent consumes:\n\n- `change_json`\n- `impact_json`\n\nIt returns `plan_json` with four sections:\n\n- `plan_steps`\n- `prechecks`\n- `postchecks`\n- `rollback`\n\nThe goal is to make the implementation strategy explicit before generating SQL. This helps separate “what should be done” from “what exact SQL should be drafted.”\n\n### Stage 4: SQL Generation\n\nThe SQL Agent consumes:\n\n- `change_json`\n- `plan_json`\n\nIt returns a single plaintext SQL script. The prompt requires four sections in order:\n\n- `-- === LANDING (ODS) ===`\n- `-- === STAGING (STG) ===`\n- `-- === CORE (DIM/FACT/VIEW) ===`\n- `-- === MARTS (SERVING) ===`\n\nThe generated SQL is intended as a reviewable draft. 
It should be checked by engineers before any production use.\n\n```\n# =============================================================\n# Stage 3 - Execution Plan\n# =============================================================\nprint(\"=\" * 60)\nprint(\"Stage 3 - Execution Plan\")\nprint(\"=\" * 60)\nPLAN_SYSTEM = \"\"\"\nYou are a senior data engineer creating a safe execution plan.\nInputs:\n- change_json\n- impact_json\nReturn JSON:\n{\n  \"plan_steps\": [{\"id\": \"str\", \"description\": \"str\"}],\n  \"prechecks\": [str],\n  \"postchecks\": [str],\n  \"rollback\": [str]\n}\nGuidance:\n- Include practical pre/post checks.\n- Keep steps executable and concise.\n\"\"\".strip()\nplan_user = \"\\n\\n\".join([\"CHANGE_JSON:\\n\" + json.dumps(change_json, ensure_ascii=False), \"IMPACT_JSON:\\n\" + json.dumps(impact_json, ensure_ascii=False)])\nplan_json, plan_agent_result = run_schemaflow_json_agent(name=\"SchemaFlow Plan Agent\", instructions=PLAN_SYSTEM, prompt=plan_user, output_schema=PLAN_OUTPUT_SCHEMA, workflow_name=\"SchemaFlow Stage 3 Execution Plan\", metadata={\"stage\": \"execution_plan\"})\nif isinstance(plan_json, dict):\n    plan_json.setdefault(\"plan_steps\", [])\n    plan_json.setdefault(\"prechecks\", [])\n    plan_json.setdefault(\"postchecks\", [])\n    plan_json.setdefault(\"rollback\", [])\npretty(plan_json)\n\n# =============================================================\n# Stage 4 - SQL Generation\n# =============================================================\nprint(\"\\n\" + \"=\" * 60)\nprint(\"Stage 4 - SQL Generation\")\nprint(\"=\" * 60)\nSQL_SYSTEM = \"\"\"\nYou are a senior data engineer producing SQL for multi-layer data stacks.\nOutput a SINGLE plaintext script with FOUR sections in order:\n1) -- === LANDING (ODS) ===\n2) -- === STAGING (STG) ===\n3) -- === CORE (DIM/FACT/VIEW) ===\n4) -- === MARTS (SERVING) ===\n\nRules:\n- PostgreSQL dialect.\n- Prefer idempotent DDL where possible.\n- Propagate requested changes through 
downstream layers.\n- Include concise assumptions as comments.\n\"\"\".strip()\nsql_user = \"\\n\\n\".join([\"CHANGE_JSON:\\n\" + json.dumps(change_json, ensure_ascii=False), \"PLAN_JSON:\\n\" + json.dumps(plan_json, ensure_ascii=False)])\nsql_text, sql_agent_result = run_schemaflow_text_agent(name=\"SchemaFlow SQL Agent\", instructions=SQL_SYSTEM, prompt=sql_user, workflow_name=\"SchemaFlow Stage 4 SQL Generation\", metadata={\"stage\": \"sql_generation\"})\nprint(sql_text[:5000])\nflush_traces()\n```\n\n### Stages 3-4 Output Guardrails\n\nThis guardrail cell validates the plan and SQL draft before the notebook moves to the final SQL sanity checks.\n\nThe checks verify that:\n\n- all four plan sections are populated: `plan_steps`, `prechecks`, `postchecks`, and `rollback`\n- the data type requested in `CHANGE_TEXT` appears in the generated SQL\n- nullable requests do not accidentally create `NOT NULL` constraints\n- explicit `NOT NULL` requests are reflected when present\n\nThese checks complement Stage 5. 
Stages 3-4 guardrails focus on plan completeness and semantic consistency, while Stage 5 focuses on expected SQL terms and actions.\n\n```\n# Stages 3-4 Output Guardrails - inspects plan_json (Plan) and sql_text (SQL).\nimport re as _re\nstages_3_4_guardrails = []\nwith trace(\"SchemaFlow Stages 3-4 Guardrails\", group_id=SCHEMAFLOW_TRACE_GROUP_ID, metadata={\"stage\": \"stages_3_4_guardrails\"}):\n    def _check(name, ok, detail=\"\"):\n        ok = bool(ok)\n        stages_3_4_guardrails.append({\"name\": name, \"ok\": ok, \"detail\": detail})\n        with guardrail_span(name, triggered=not ok):\n            trace_function_result(name + \" detail\", output_obj={\"ok\": ok, \"detail\": detail})\n\n    _plan = plan_json if isinstance(plan_json, dict) else {}\n    _plan_missing = [k for k in [\"plan_steps\", \"prechecks\", \"postchecks\", \"rollback\"] if not _plan.get(k)]\n    _check(\"plan_sections_populated\", not _plan_missing, \"all four populated\" if not _plan_missing else f\"empty: {_plan_missing}\")\n    _dtype_match = _re.search(r\"\\b(?:add\\s+\\w+\\s+|column\\s+\\w+\\s+)((?:VAR)?CHAR\\s*\\([^)]*\\)|TEXT|INTEGER|INT|BIGINT|BOOLEAN|DATE|TIMESTAMP|NUMERIC\\s*\\([^)]*\\)|DECIMAL\\s*\\([^)]*\\)|FLOAT|DOUBLE)\", CHANGE_TEXT, flags=_re.IGNORECASE)\n    if _dtype_match:\n        _dtype = \" \".join(_dtype_match.group(1).upper().split())\n        _check(\"data_type_propagated_to_sql\", _dtype.lower() in sql_text.lower(), f\"expected '{_dtype}' in SQL\")\n    else:\n        _check(\"data_type_propagated_to_sql\", True, \"no data type referenced in CHANGE_TEXT (skipped)\")\n    _change_lower = CHANGE_TEXT.lower()\n    _sql_lower = sql_text.lower()\n    _expected_cols = []\n    for _op in (change_json.get(\"operations\") if isinstance(change_json, dict) else []) or []:\n        _details = _op.get(\"details\") if isinstance(_op, dict) else None\n        if isinstance(_details, dict):\n            for _key in (\"column\", \"column_name\", \"name\"):\n                
_val = _details.get(_key)\n                if isinstance(_val, str) and _val.strip():\n                    _expected_cols.append(_val.strip().lower())\n    if \"not null\" in _change_lower:\n        _check(\"nullability_matches_request\", \"not null\" in _sql_lower, \"request: NOT NULL\")\n    elif \"nullable\" in _change_lower:\n        _ddl_lines = []\n        for line in sql_text.split(\"\\n\"):\n            _line = line.strip().lower()\n            if not any(c in _line for c in _expected_cols):\n                continue\n            if \"add column\" in _line or any(_line.startswith(c + \" \") or _line.startswith(c + \"\\t\") for c in _expected_cols):\n                _ddl_lines.append(line.strip())\n        _bad_lines = [line for line in _ddl_lines if \"not null\" in line.lower()]\n        _check(\"nullability_matches_request\", not _bad_lines, \"no NOT NULL on nullable column DDL\" if not _bad_lines else f\"NOT NULL conflict in {len(_bad_lines)} DDL line(s)\")\n    else:\n        _check(\"nullability_matches_request\", True, \"no explicit nullability requested (skipped)\")\n    stages_3_4_guardrails_passed = all(c[\"ok\"] for c in stages_3_4_guardrails)\n    trace_function_result(\"Stages 3-4 guardrails summary\", output_obj={\"passed\": stages_3_4_guardrails_passed, \"checks\": stages_3_4_guardrails})\n    flush_traces()\nprint(f\"Stages 3-4 Output Guardrails: {'PASS' if stages_3_4_guardrails_passed else 'FAIL'}\")\nfor _c in stages_3_4_guardrails:\n    _flag = \"OK  \" if _c[\"ok\"] else \"FAIL\"\n    print(f\"  [{_flag}] {_c['name']:35s} {_c['detail']}\")\n```\n\n## 6) Stage 5 - Lightweight SQL Sanity Checks\n\nThis section runs deterministic checks against the generated SQL for the current notebook run.\n\nThis is not a full SQL parser and it does not execute the SQL. Instead, it checks for obvious mismatches between the original request, parsed change object, and generated script. 
These checks are intentionally small and explainable, so a reader can see exactly what passed or failed before the result is saved or evaluated.\n\nThe checks look for:\n\n- empty SQL output\n- missing target table\n- missing expected columns\n- required SQL keywords inferred from the request:\n  - `ALTER TABLE`\n  - `UPDATE` when the request implies backfill or source-based population\n  - `CREATE INDEX` when the request mentions an index\n\nThe output is stored in `validation`, which becomes part of the final bundle and is also used by the Promptfoo full-flow assertion.\n\n```\nwith trace(\"SchemaFlow Stage 5 SQL Sanity Checks\", group_id=SCHEMAFLOW_TRACE_GROUP_ID, metadata={\"stage\": \"sql_sanity_checks\"}):\n    issues = []\n    sql_lower = sql_text.lower()\n    change_lower = CHANGE_TEXT.lower()\n    if not sql_text.strip():\n        issues.append(\"SQL output is empty\")\n    expected_schema = (change_json.get(\"target_schema\") or \"\").strip()\n    expected_table = (change_json.get(\"target_table\") or \"\").strip()\n    if expected_table and expected_table.lower() not in sql_lower:\n        issues.append(f\"Expected target table missing from SQL: {expected_table}\")\n    expected_columns = []\n    for operation in change_json.get(\"operations\", []):\n        details = operation.get(\"details\") if isinstance(operation, dict) else None\n        if not isinstance(details, dict):\n            continue\n        for key in [\"column\", \"column_name\", \"name\"]:\n            value = details.get(key)\n            if isinstance(value, str) and value.strip():\n                expected_columns.append(value.strip())\n    for column in dict.fromkeys(expected_columns):\n        if column.lower() not in sql_lower:\n            issues.append(f\"Expected column missing from SQL: {column}\")\n    required_keywords = [\"ALTER TABLE\"]\n    if any(term in change_lower for term in [\"backfill\", \"update\", \"source it from\"]):\n        
required_keywords.append(\"UPDATE\")\n    if \"index\" in change_lower:\n        required_keywords.append(\"CREATE INDEX\")\n    for keyword in dict.fromkeys(required_keywords):\n        if keyword.lower() not in sql_lower:\n            issues.append(f\"Expected keyword missing: {keyword}\")\n    validation = {\"valid\": len(issues) == 0, \"issues\": issues, \"checks\": {\"expected_schema\": expected_schema or None, \"expected_table\": expected_table or None, \"expected_columns\": list(dict.fromkeys(expected_columns)), \"required_keywords\": list(dict.fromkeys(required_keywords))}}\n    with guardrail_span(\"stage5_sql_sanity\", triggered=not validation[\"valid\"]):\n        trace_function_result(\"Stage 5 SQL sanity result\", output_obj=validation)\n    flush_traces()\npretty(validation)\n```\n\n## 7) Final Bundle\n\nThis section assembles the main SchemaFlow output object.\n\nThe final `bundle` contains:\n\n- `summary`\n- `rag`\n- `change_json`\n- `impact_json`\n- `plan`\n- `sql`\n- `validation`\n\nThis object is the reviewable handoff artifact for the notebook run. 
It collects the model-generated outputs, deterministic validation results, and optional retrieval metadata in one place, so a reviewer does not have to reconstruct the flow from separate cells.\n\nThe printed summary gives a compact view of the most important run-level information:\n\n- parsed title\n- parsed target\n- number of RAG hits\n- number of plan steps\n- validation status\n- validation issues\n\n```\nwith trace(\"SchemaFlow Final Bundle\", group_id=SCHEMAFLOW_TRACE_GROUP_ID, metadata={\"stage\": \"final_bundle\"}):\n    bundle = {\n        \"summary\": {\"matched_tables\": [], \"impact_risks\": impact_json.get(\"risks\", []), \"rag_hits\": len(rag_file_search_results)},\n        \"rag\": {\"vector_store_id\": rag_vector_store_id, \"file_search_results\": rag_file_search_results},\n        \"change_json\": change_json,\n        \"impact_json\": impact_json,\n        \"plan\": plan_json,\n        \"sql\": sql_text,\n        \"validation\": validation,\n    }\n    trace_function_result(\"Final bundle assembled\", output_obj=bundle)\n    flush_traces()\npretty({\"title\": change_json.get(\"title\"), \"target\": \".\".join([x for x in [change_json.get(\"target_schema\"), change_json.get(\"target_table\")] if x]), \"rag_hits\": len(rag_file_search_results), \"plan_steps\": len(plan_json.get(\"plan_steps\", [])), \"valid\": validation.get(\"valid\"), \"issues\": validation.get(\"issues\", [])})\n```\n\n## 8) Save Artifact\n\nThis section writes the final `bundle` to disk as JSON.\n\nArtifacts are saved under:\n\n```\nartifacts/notebook_runs/\n```\n\nEach run receives a timestamped filename, which makes it easy to compare outputs across different prompts, models, inputs, or retrieval documents.\n\nThe saved artifact is useful for:\n\n- code review\n- audit trails\n- debugging\n- regression comparison\n- eval fixture creation\n- downstream automation\n\n``` python\nfrom pathlib import Path\nwith trace(\"SchemaFlow Save Artifact\", 
group_id=SCHEMAFLOW_TRACE_GROUP_ID, metadata={\"stage\": \"save_artifact\"}):\n    out_dir = Path(\"artifacts/notebook_runs\")\n    out_dir.mkdir(parents=True, exist_ok=True)\n    ts = datetime.now(timezone.utc).strftime(\"%Y%m%dT%H%M%SZ\")\n    out_path = out_dir / f\"schemaflow_cookbook_run_{ts}.json\"\n    out_path.write_text(json.dumps(bundle, indent=2, ensure_ascii=False), encoding=\"utf-8\")\n    trace_function_result(\"Notebook artifact saved\", input_obj={\"bundle_keys\": sorted(bundle.keys())}, output_obj={\"path\": str(out_path.resolve()), \"bytes\": out_path.stat().st_size})\n    flush_traces()\nprint(\"Saved artifact:\", out_path.resolve())\n```\n\n### Post-Artifact Generation Sanity Check\n\nThis cell verifies that the saved artifact is usable.\n\nIt checks that:\n\n- the artifact file exists\n- the artifact file is non-empty\n- the file can be loaded with `json.loads`\n- the top-level keys on disk match the in-memory `bundle`\n\nThis catches file-write issues immediately instead of letting a later review, eval, or automation step consume a missing or malformed artifact.\n\n```\n# Post-Artifact Generation Sanity Check - re-reads the file Save Artifact wrote.\npost_artifact_checks = []\nwith trace(\"SchemaFlow Post-Artifact Guardrails\", group_id=SCHEMAFLOW_TRACE_GROUP_ID, metadata={\"stage\": \"post_artifact_guardrails\"}):\n    def _check(name, ok, detail=\"\"):\n        ok = bool(ok)\n        post_artifact_checks.append({\"name\": name, \"ok\": ok, \"detail\": detail})\n        with guardrail_span(name, triggered=not ok):\n            trace_function_result(name + \" detail\", output_obj={\"ok\": ok, \"detail\": detail})\n    _size = out_path.stat().st_size if out_path.exists() else 0\n    _check(\"artifact_file_persisted\", out_path.exists() and _size > 0, f\"{_size} bytes\")\n    if out_path.exists():\n        try:\n            _roundtrip = json.loads(out_path.read_text(encoding=\"utf-8\"))\n            _check(\"artifact_roundtrip_keys_match\", 
set(_roundtrip.keys()) == set(bundle.keys()), f\"disk_keys={sorted(_roundtrip.keys())}\")\n        except Exception as _exc:\n            _check(\"artifact_roundtrip_keys_match\", False, str(_exc))\n    else:\n        _check(\"artifact_roundtrip_keys_match\", False, \"saved file missing\")\n    post_artifact_sanity_passed = all(c[\"ok\"] for c in post_artifact_checks)\n    trace_function_result(\"Post-artifact guardrails summary\", output_obj={\"passed\": post_artifact_sanity_passed, \"checks\": post_artifact_checks})\n    flush_traces()\nprint(f\"Post-Artifact Sanity Check: {'PASS' if post_artifact_sanity_passed else 'FAIL'}\")\nfor _c in post_artifact_checks:\n    _flag = \"OK  \" if _c[\"ok\"] else \"FAIL\"\n    print(f\"  [{_flag}] {_c['name']:30s} {_c['detail']}\")\n```\n\n## 9) Optional Cleanup\n\nThis section handles cleanup for the optional PDF vector store.\n\nBy default, `DELETE_VECTOR_STORE_AFTER_RUN = False`.\n\nThat default is safe for interactive notebook usage because the vector store is created with a one-day expiration policy. 
Keeping it temporarily can be useful if you want to inspect traces, rerun downstream stages, or debug File Search behavior.\n\nSet `DELETE_VECTOR_STORE_AFTER_RUN = True` before running this cell if you want to delete the vector store immediately after the notebook run.\n\nIf no PDF was configured, this cell simply reports that no vector store was created.\n\n```\nDELETE_VECTOR_STORE_AFTER_RUN = False\nwith trace(\"SchemaFlow Optional Cleanup\", group_id=SCHEMAFLOW_TRACE_GROUP_ID, metadata=_trace_metadata({\"stage\": \"optional_cleanup\", \"delete_vector_store_after_run\": DELETE_VECTOR_STORE_AFTER_RUN})):\n    if rag_vector_store_id and DELETE_VECTOR_STORE_AFTER_RUN:\n        with custom_span(\"Delete vector store\", {\"vector_store_id\": rag_vector_store_id}):\n            client.vector_stores.delete(vector_store_id=rag_vector_store_id)\n        print(\"Deleted vector store:\", rag_vector_store_id)\n    elif rag_vector_store_id:\n        trace_function_result(\"Vector store retained\", output_obj={\"vector_store_id\": rag_vector_store_id, \"expiration\": \"1 day\"})\n        print(\"Vector store retained with one-day expiration:\", rag_vector_store_id)\n    else:\n        trace_function_result(\"No vector store cleanup\", output_obj={\"created\": False})\n        print(\"No vector store was created.\")\n    flush_traces()\n```\n\n### Pre-Promptfoo Checks / Guardrails\n\nThis cell is the readiness gate before running Promptfoo.\n\nPromptfoo runs the workflow in a separate process, so it is important to confirm that the notebook state is complete and internally consistent before generating eval files.\n\nThe preflight checks verify that:\n\n- `bundle` exists in the notebook kernel.\n- `bundle` reflects the current `change_json` and `plan_json`.\n- Stage 5 validation passed.\n- Stages 1-2 guardrails passed.\n- Stages 3-4 guardrails passed.\n- The saved artifact sanity check passed.\n- `CHANGE_TEXT` is consistent with the parsed bundle 
target.\n- `OPENAI_API_KEY` is present.\n- The installed Agents SDK version meets the minimum requirement.\n\nIf this section reports failures, rerun or fix the earlier notebook sections before running Promptfoo.\n\n``` python\n# Pre-Promptfoo Checks / Guardrails - deterministic, no LLM calls.\nimport os\nimport re as _re\npre_promptfoo_checks = []\nwith trace(\"SchemaFlow Pre-Promptfoo Guardrails\", group_id=SCHEMAFLOW_TRACE_GROUP_ID, metadata={\"stage\": \"pre_promptfoo_guardrails\"}):\n    def _check(name, ok, detail=\"\"):\n        ok = bool(ok)\n        pre_promptfoo_checks.append({\"name\": name, \"ok\": ok, \"detail\": detail})\n        with guardrail_span(name, triggered=not ok):\n            trace_function_result(name + \" detail\", output_obj={\"ok\": ok, \"detail\": detail})\n    _bundle = globals().get(\"bundle\")\n    _check(\"bundle_in_scope\", isinstance(_bundle, dict) and \"validation\" in _bundle, f\"keys={sorted(_bundle.keys()) if isinstance(_bundle, dict) else 'n/a'}\")\n    # Use the guarded _bundle lookup so a missing bundle is reported as a failed check instead of raising NameError.\n    _check(\"bundle_in_sync_with_kernel\", isinstance(_bundle, dict) and _bundle.get(\"change_json\") == change_json and _bundle.get(\"plan\") == plan_json, \"bundle reflects current change_json + plan_json\")\n    _check(\"stage5_validation_passed\", bool(validation.get(\"valid\")), f\"{len(validation.get('issues', []))} issue(s) recorded by Stage 5\")\n    _check(\"stages_1_2_guardrails_passed\", bool(globals().get(\"stages_1_2_guardrails_passed\", False)), \"consumed from Stages 1-2 Output Guardrails cell\")\n    _check(\"stages_3_4_guardrails_passed\", bool(globals().get(\"stages_3_4_guardrails_passed\", False)), \"consumed from Stages 3-4 Output Guardrails cell\")\n    _check(\"post_artifact_sanity_passed\", bool(globals().get(\"post_artifact_sanity_passed\", False)), \"consumed from Post-Artifact Sanity Check cell\")\n    _target_match = _re.search(r\"\\b(?:to|from|in|on)\\s+([A-Za-z_][\\w$]*)\\.([A-Za-z_][\\w$]*)\", CHANGE_TEXT, flags=_re.IGNORECASE)\n    if _target_match:\n        _live_target = 
_target_match.group(2).upper()\n        _bundle_target = (bundle.get(\"change_json\", {}).get(\"target_table\") or \"\").upper()\n        _check(\"change_text_consistent_with_bundle\", _live_target == _bundle_target, f\"live='{_live_target}', bundle='{_bundle_target}'\")\n    else:\n        _check(\"change_text_consistent_with_bundle\", True, \"no extractable target in CHANGE_TEXT (skipped)\")\n    _check(\"openai_api_key_set_in_env\", bool(os.getenv(\"OPENAI_API_KEY\")), \"present\" if os.getenv(\"OPENAI_API_KEY\") else \"missing\")\n    _check(\"agents_sdk_min_version\", _version_tuple(AGENTS_SDK_VERSION) >= _version_tuple(MIN_AGENTS_SDK_VERSION), f\"found={AGENTS_SDK_VERSION}, required>={MIN_AGENTS_SDK_VERSION}\")\n    pre_promptfoo_passed = all(c[\"ok\"] for c in pre_promptfoo_checks)\n    trace_function_result(\"Pre-Promptfoo readiness summary\", output_obj={\"passed\": pre_promptfoo_passed, \"checks\": pre_promptfoo_checks})\n    flush_traces()\nprint(\"=\" * 60)\nprint(f\"Pre-Promptfoo Readiness: {'PASS' if pre_promptfoo_passed else 'FAIL'}\")\nprint(\"=\" * 60)\nfor _c in pre_promptfoo_checks:\n    _flag = \"OK  \" if _c[\"ok\"] else \"FAIL\"\n    print(f\"  [{_flag}] {_c['name']:35s} {_c['detail']}\")\nif not pre_promptfoo_passed:\n    print()\n    print(\"One or more readiness checks failed. Promptfoo will likely fail or eval stale state.\")\n    print(\"Investigate the failed checks above before running Section 10.\")\n```\n\n## 10) Evaluate the Flow with Promptfoo\n\nPromptfoo is now part of OpenAI. This section uses Promptfoo’s Jupyter/Colab pattern to run evals from notebook cells while keeping the SchemaFlow logic readable in Python. Promptfoo itself runs via Node.js, and the evaluated flow is provided through Promptfoo’s Python `file://` provider and Python assertion integrations.\n\nThis optional section turns the notebook workflow into a repeatable eval.\n\nThe core notebook run validates one live example. 
Promptfoo adds a reusable eval harness that can run parse-only and full-flow checks using generated provider and assertion files, which is useful when you want to keep the same workflow stable as prompts, models, or inputs change.\n\nBecause Promptfoo launches a separate Python process, it cannot directly access variables that only exist inside the active notebook kernel. To solve that, the next cells publish runtime files from the current notebook state:\n\n- a reusable `schemaflow_cookbook_core.py` module\n- a Python Promptfoo provider\n- a Python Promptfoo assertion file\n- generated eval cases\n- a generated Promptfoo config\n\nThis section includes three validation layers:\n\n- **Input preflight**\n  - deterministic checks before writing the config\n  - no model calls\n- **Parse-only eval**\n  - checks Stage 1 behavior\n  - verifies target, operation presence, expected added column, and expected data type\n- **Full-flow eval**\n  - checks downstream impact, SQL terms, and validation status\n\nEval results are printed in the notebook and exported as timestamped JSON and HTML files under:\n\n```\nartifacts/promptfoo/results/\n```\n\nThe latest successful run also refreshes:\n\n```\nschemaflow_cookbook_eval_latest.json\nschemaflow_cookbook_eval_latest.html\n```\n\nRuntime note: the core SchemaFlow cells require Python and an OpenAI API key. 
The Promptfoo cells additionally require Node.js and npm in the same executable notebook runtime.\n\nAfter the eval runs, Promptfoo provides a compact view of the current change request, expected fields, parse-only check, and full-flow check.\n\nUse this view to answer questions such as:\n\n- Did the Parse Agent extract the expected target table?\n- Did it detect the expected added column?\n- Did it preserve the requested data type?\n- Did the full flow produce impact risks?\n- Did the SQL include required terms?\n- Did deterministic validation pass?\n\n### Promptfoo Runtime Directory Setup\n\nThis cell creates notebook-local directories for Promptfoo config, logs, cache, npm cache, and results.\n\nKeeping these directories under `artifacts/promptfoo/` makes the eval runtime portable and avoids relying on global Promptfoo state under the user’s home directory.\n\nThe cell also exports environment variables so the generated provider, assertion, and Promptfoo command all use the same trace group and local runtime paths.\n\n``` python\nfrom pathlib import Path\nimport os\nPROMPTFOO_DIR = Path(\"artifacts/promptfoo\")\nPROMPTFOO_DIR.mkdir(parents=True, exist_ok=True)\nPROMPTFOO_CONFIG_DIR = PROMPTFOO_DIR / \".promptfoo\"\nPROMPTFOO_LOG_DIR = PROMPTFOO_CONFIG_DIR / \"logs\"\nPROMPTFOO_CACHE_DIR = PROMPTFOO_CONFIG_DIR / \"cache\"\nPROMPTFOO_RESULTS_DIR = PROMPTFOO_DIR / \"results\"\nNPM_CACHE_DIR = PROMPTFOO_DIR / \".npm-cache\"\nfor path in (PROMPTFOO_CONFIG_DIR, PROMPTFOO_LOG_DIR, PROMPTFOO_CACHE_DIR, PROMPTFOO_RESULTS_DIR, NPM_CACHE_DIR):\n    path.mkdir(parents=True, exist_ok=True)\nos.environ[\"PROMPTFOO_CONFIG_DIR\"] = str(PROMPTFOO_CONFIG_DIR.resolve())\nos.environ[\"PROMPTFOO_LOG_DIR\"] = str(PROMPTFOO_LOG_DIR.resolve())\nos.environ[\"PROMPTFOO_CACHE_PATH\"] = str(PROMPTFOO_CACHE_DIR.resolve())\nos.environ[\"npm_config_cache\"] = str(NPM_CACHE_DIR.resolve())\nos.environ[\"npm_config_update_notifier\"] = \"false\"\nos.environ[\"npm_config_loglevel\"] = 
\"error\"\nos.environ[\"SCHEMAFLOW_TRACE_GROUP_ID\"] = SCHEMAFLOW_TRACE_GROUP_ID\nos.environ[\"OPENAI_AGENTS_TRACE_INCLUDE_SENSITIVE_DATA\"] = os.getenv(\"OPENAI_AGENTS_TRACE_INCLUDE_SENSITIVE_DATA\", \"false\")\nprint(\"Promptfoo runtime dir:\", PROMPTFOO_DIR.resolve())\nprint(\"Promptfoo config dir:\", PROMPTFOO_CONFIG_DIR.resolve())\nprint(\"Promptfoo results dir:\", PROMPTFOO_RESULTS_DIR.resolve())\nprint(\"Notebook-local npm cache:\", NPM_CACHE_DIR.resolve())\nprint(\"Promptfoo trace group:\", SCHEMAFLOW_TRACE_GROUP_ID)\n```\n\n### Node.js and npm Runtime Check\n\nPromptfoo runs through Node.js, even though the SchemaFlow provider and assertion logic are written in Python.\n\nThis cell verifies that the notebook runtime has a supported `node` and `npm` available.\n\nThe check is intentionally explicit. The notebook does not silently install or upgrade Node because that depends on the execution environment.\n\nFor local macOS notebooks, the cell prefers a supported `nvm` Node runtime before common Homebrew paths. 
This helps ensure that the notebook and terminal use the same Node ABI and avoids stale native dependencies.\n\nIf this check fails, fix the runtime first and then rerun the Promptfoo section.\n\n``` python\nimport os\nimport re\nimport shutil\nimport subprocess\nfrom pathlib import Path\n\nREQUIRED_NODE = \"^20.20.0 or >=22.22.0\"\nCOMMON_NODE_DIRS = [\"/opt/homebrew/bin\", \"/usr/local/bin\"]\n\ndef _nvm_node_dirs():\n    root = Path.home() / \".nvm\" / \"versions\" / \"node\"\n    if not root.exists():\n        return []\n    candidates = []\n    for node_bin in root.glob(\"*/bin/node\"):\n        version = _node_version(str(node_bin))\n        if version and _node_is_supported(version[1]):\n            candidates.append((version[1], str(node_bin.parent)))\n    return [path for _, path in sorted(candidates, reverse=True)]\n\ndef _node_version(node_cmd=\"node\"):\n    try:\n        raw = subprocess.check_output([node_cmd, \"--version\"], text=True).strip()\n    except (OSError, subprocess.CalledProcessError):\n        return None\n    match = re.match(r\"v?(\\d+)\\.(\\d+)\\.(\\d+)\", raw)\n    if not match:\n        return None\n    return raw, tuple(int(part) for part in match.groups())\n\ndef _node_is_supported(version_tuple):\n    major, minor, patch = version_tuple\n    # Mirrors REQUIRED_NODE: 20.x from 20.20.0, or 22.22.0 and later.\n    return (major == 20 and minor >= 20) or (major, minor, patch) >= (22, 22, 0)\n\ndef _prepend_path(path_dir):\n    parts = os.environ.get(\"PATH\", \"\").split(os.pathsep)\n    parts = [p for p in parts if p and p != path_dir]\n    os.environ[\"PATH\"] = path_dir + os.pathsep + os.pathsep.join(parts)\n\ndef ensure_promptfoo_node_runtime():\n    node_path = shutil.which(\"node\")\n    npm_path = shutil.which(\"npm\")\n    current = _node_version(\"node\") if node_path else None\n\n    if current and npm_path and _node_is_supported(current[1]):\n        print(f\"Node OK: {node_path} ({current[0]})\")\n        print(f\"npm: {npm_path}\")\n        return\n\n    for candidate_dir in [*_nvm_node_dirs(), 
*COMMON_NODE_DIRS]:\n        candidate_node = Path(candidate_dir) / \"node\"\n        candidate_npm = Path(candidate_dir) / \"npm\"\n        if not candidate_node.exists() or not candidate_npm.exists():\n            continue\n        candidate = _node_version(str(candidate_node))\n        if candidate and _node_is_supported(candidate[1]):\n            _prepend_path(candidate_dir)\n            print(f\"Switched notebook PATH to supported Node: {candidate_node} ({candidate[0]})\")\n            print(f\"npm: {candidate_npm}\")\n            return\n\n    detected = current[0] if current else \"not found\"\n    raise RuntimeError(\n        \"Promptfoo requires Node.js \" + REQUIRED_NODE + \".\\n\"\n        f\"Detected Node: {detected}.\\n\\n\"\n        \"Use an executable runtime with supported Node/npm before continuing.\\n\"\n        \"Examples:\\n\"\n        \"- Google Colab or Codespaces: run the notebook in that runtime and rerun this cell.\\n\"\n        \"- macOS nvm: `nvm install 22 && nvm use 22`, then start Jupyter from that terminal.\\n\"\n        \"- macOS Homebrew: `brew install node`, then start Jupyter from a terminal where the intended Node is first on PATH.\\n\"\n        \"- nvm: `nvm install 22 && nvm use 22`, then start Jupyter from that same shell.\\n\\n\"\n        \"Static notebook preview in a browser cannot run Promptfoo evals.\"\n    )\n\nensure_promptfoo_node_runtime()\n```\n\n### Publish SchemaFlow Core Runtime\n\nPromptfoo runs the evaluated flow in a separate Python process. 
This cell writes a reusable Python module named:\n\n```\nartifacts/promptfoo/schemaflow_cookbook_core.py\n```\n\nThe generated module contains the same core SchemaFlow logic used by the notebook:\n\n- Pydantic models\n- Agents SDK setup\n- output normalization helpers\n- Parse Agent execution\n- Impact Agent execution\n- optional PDF vector store creation\n- Plan Agent execution\n- SQL Agent execution\n- SQL validation\n- parse-only eval entrypoint\n- full-flow eval entrypoint\n\nThe prompt strings are injected from the current notebook variables. That means if you edit the Parse, Impact, Plan, or SQL prompts above and rerun this cell, the Promptfoo runtime receives the updated prompts.\n\n``` python\nfrom pathlib import Path\n\nCORE_MODULE_TEMPLATE = r'''\nimport json\nimport os\nimport re\nfrom concurrent.futures import ThreadPoolExecutor\nfrom datetime import datetime, timezone\nfrom importlib.metadata import PackageNotFoundError, version\nfrom pathlib import Path\n\nfrom openai import OpenAI\nfrom pydantic import BaseModel, ConfigDict, Field\nfrom agents import Agent, AgentOutputSchema, FileSearchTool, Runner, RunConfig, custom_span, flush_traces, function_span, guardrail_span, trace\n\nMODEL = os.getenv(\"OPENAI_MODEL\", __MODEL_DEFAULT__)\nPARSE_SYSTEM = __PARSE_SYSTEM__\nIMPACT_SYSTEM = __IMPACT_SYSTEM__\nPLAN_SYSTEM = __PLAN_SYSTEM__\nSQL_SYSTEM = __SQL_SYSTEM__\nMIN_AGENTS_SDK_VERSION = \"0.17.0\"\nTRACE_INCLUDE_SENSITIVE_DATA = os.getenv(\"OPENAI_AGENTS_TRACE_INCLUDE_SENSITIVE_DATA\", \"false\").lower() in {\"1\", \"true\", \"yes\", \"on\"}\nSCHEMAFLOW_TRACE_GROUP_ID = os.getenv(\"SCHEMAFLOW_TRACE_GROUP_ID\", \"schemaflow-cookbook-promptfoo\")\n\ndef _version_tuple(value):\n    match = re.match(r\"^(\\d+)\\.(\\d+)\\.(\\d+)\", str(value or \"\"))\n    return tuple(int(part) for part in match.groups()) if match else (0, 0, 0)\n\ntry:\n    AGENTS_SDK_VERSION = version(\"openai-agents\")\nexcept PackageNotFoundError as exc:\n    raise RuntimeError('Install 
the OpenAI Agents SDK: pip install -U \"openai-agents>=0.17.0\"') from exc\nif _version_tuple(AGENTS_SDK_VERSION) < _version_tuple(MIN_AGENTS_SDK_VERSION):\n    raise RuntimeError(f\"OpenAI Agents SDK {MIN_AGENTS_SDK_VERSION}+ is required; found {AGENTS_SDK_VERSION}.\")\n\nclass SchemaFlowBaseModel(BaseModel):\n    model_config = ConfigDict(extra=\"allow\")\n\nclass OperationModel(SchemaFlowBaseModel):\n    op: str\n    details: dict = Field(default_factory=dict)\n\nclass ChangeRequestModel(SchemaFlowBaseModel):\n    title: str | None = None\n    domain: str | None = None\n    target_schema: str | None = None\n    target_table: str | None = None\n    operations: list[OperationModel] = Field(default_factory=list)\n    notes: list = Field(default_factory=list)\n\nclass ImpactObjectModel(SchemaFlowBaseModel):\n    type: str\n    name: str\n    reason: str\n    source: str\n\nclass ImpactModel(SchemaFlowBaseModel):\n    impacted_objects: list[ImpactObjectModel] = Field(default_factory=list)\n    risks: list[str] = Field(default_factory=list)\n    assumptions: list[str] = Field(default_factory=list)\n\nclass PlanStepModel(SchemaFlowBaseModel):\n    id: str\n    description: str\n\nclass PlanModel(SchemaFlowBaseModel):\n    plan_steps: list[PlanStepModel] = Field(default_factory=list)\n    prechecks: list[str] = Field(default_factory=list)\n    postchecks: list[str] = Field(default_factory=list)\n    rollback: list[str] = Field(default_factory=list)\n\nCHANGE_OUTPUT_SCHEMA = AgentOutputSchema(ChangeRequestModel, strict_json_schema=False)\nIMPACT_OUTPUT_SCHEMA = AgentOutputSchema(ImpactModel, strict_json_schema=False)\nPLAN_OUTPUT_SCHEMA = AgentOutputSchema(PlanModel, strict_json_schema=False)\n\ndef _clean_openai_api_key(value):\n    key = (value or \"\").strip()\n    if not key:\n        raise RuntimeError(\"OPENAI_API_KEY is required for SchemaFlow evals\")\n    return key\n\ndef _ensure_openai_api_key(api_key=None):\n    if api_key is not None:\n        
os.environ[\"OPENAI_API_KEY\"] = _clean_openai_api_key(api_key)\n    else:\n        os.environ[\"OPENAI_API_KEY\"] = _clean_openai_api_key(os.getenv(\"OPENAI_API_KEY\"))\n    org_id = os.getenv(\"OPENAI_ORG_ID\", \"\").strip()\n    if org_id:\n        os.environ[\"OPENAI_ORG_ID\"] = org_id\n\ndef _get_client(api_key=None):\n    _ensure_openai_api_key(api_key)\n    return OpenAI(api_key=os.environ[\"OPENAI_API_KEY\"])\n\ndef _parse_json_text(text):\n    text = (text or \"{}\").strip()\n    if text.startswith(\"```\"):\n        text = re.sub(r\"^```(?:json)?\\s*\", \"\", text)\n        text = re.sub(r\"\\s*```$\", \"\", text).strip()\n    try:\n        return json.loads(text)\n    except json.JSONDecodeError:\n        match = re.search(r\"\\{.*\\}\", text, flags=re.DOTALL)\n        if not match:\n            raise\n        return json.loads(match.group(0))\n\ndef _model_dump(value):\n    if value is None or isinstance(value, (str, int, float, bool, bytes)):\n        return value\n    if isinstance(value, type):\n        return value\n    if hasattr(value, \"model_dump\"):\n        try:\n            return value.model_dump()\n        except TypeError:\n            pass\n    if hasattr(value, \"to_dict\"):\n        try:\n            return value.to_dict()\n        except TypeError:\n            pass\n    if hasattr(value, \"__dict__\"):\n        try:\n            return {k: v for k, v in vars(value).items() if not k.startswith(\"_\")}\n        except TypeError:\n            pass\n    return value\n\ndef _agent_output_to_json(value):\n    value = _model_dump(value)\n    if isinstance(value, dict):\n        return value\n    if isinstance(value, str):\n        return _parse_json_text(value)\n    return json.loads(json.dumps(value, default=str))\n\ndef _agent_output_to_text(value):\n    value = _model_dump(value)\n    if isinstance(value, str):\n        return value.strip()\n    return json.dumps(value, ensure_ascii=False)\n\ndef _trace_metadata(metadata=None):\n    
cleaned = {}\n    for key, value in (metadata or {}).items():\n        if value is None:\n            cleaned[str(key)] = \"\"\n        elif isinstance(value, bool):\n            cleaned[str(key)] = \"true\" if value else \"false\"\n        elif isinstance(value, (dict, list, tuple, set)):\n            cleaned[str(key)] = json.dumps(value, ensure_ascii=False, default=str)\n        else:\n            cleaned[str(key)] = str(value)\n    return cleaned\n\ndef _schemaflow_run_config(workflow_name, metadata=None):\n    return RunConfig(\n        workflow_name=workflow_name,\n        group_id=SCHEMAFLOW_TRACE_GROUP_ID,\n        trace_include_sensitive_data=TRACE_INCLUDE_SENSITIVE_DATA,\n        trace_metadata=_trace_metadata({\"runtime\": \"promptfoo\", **(metadata or {})}),\n    )\n\ndef _runner_run_sync(agent, prompt, *, workflow_name, metadata=None, max_turns=4):\n    kwargs = {\"run_config\": _schemaflow_run_config(workflow_name, metadata), \"max_turns\": max_turns}\n    try:\n        return Runner.run_sync(agent, prompt, **kwargs)\n    except RuntimeError as exc:\n        if \"event loop\" not in str(exc).lower():\n            raise\n        with ThreadPoolExecutor(max_workers=1) as pool:\n            return pool.submit(lambda: Runner.run_sync(agent, prompt, **kwargs)).result()\n\ndef run_schemaflow_json_agent(*, name, instructions, prompt, output_schema, model=None, tools=None, workflow_name=None, metadata=None):\n    agent = Agent(name=name, instructions=instructions, model=model or MODEL, output_type=output_schema, tools=tools or [])\n    result = _runner_run_sync(agent, prompt, workflow_name=workflow_name or name, metadata={\"agent\": name, **(metadata or {})})\n    return _agent_output_to_json(result.final_output), result\n\ndef run_schemaflow_text_agent(*, name, instructions, prompt, model=None, tools=None, workflow_name=None, metadata=None):\n    agent = Agent(name=name, instructions=instructions, model=model or MODEL, tools=tools or [])\n    result = 
_runner_run_sync(agent, prompt, workflow_name=workflow_name or name, metadata={\"agent\": name, **(metadata or {})})\n    return _agent_output_to_text(result.final_output), result\n\ndef trace_function_result(name, *, input_obj=None, output_obj=None):\n    with function_span(\n        name,\n        input=json.dumps(input_obj, ensure_ascii=False, default=str) if input_obj is not None else None,\n        output=json.dumps(output_obj, ensure_ascii=False, default=str) if output_obj is not None else None,\n    ):\n        pass\n\ndef _collect_file_search_results(value):\n    results = []\n    seen = set()\n\n    def visit(node):\n        if node is None or isinstance(node, (str, int, float, bool, bytes)):\n            return\n        if isinstance(node, type) or callable(node):\n            return\n        node_id = id(node)\n        if node_id in seen:\n            return\n        seen.add(node_id)\n\n        node = _model_dump(node)\n        if node is None or isinstance(node, (str, int, float, bool, bytes)):\n            return\n        if isinstance(node, type) or callable(node):\n            return\n\n        if isinstance(node, dict):\n            if node.get(\"type\") == \"file_search_call\":\n                for result in node.get(\"results\", []) or []:\n                    result = _model_dump(result)\n                    if isinstance(result, dict):\n                        text = result.get(\"text\") or result.get(\"content\") or \"\"\n                        if isinstance(text, list):\n                            text = \"\\n\".join(str(x) for x in text)\n                        results.append({\"file_id\": result.get(\"file_id\"), \"filename\": result.get(\"filename\") or result.get(\"file_name\") or result.get(\"title\"), \"score\": result.get(\"score\"), \"text_preview\": str(text)[:1200]})\n            for child in node.values():\n                visit(child)\n        elif isinstance(node, (list, tuple, set)):\n            for child in node:\n          
      visit(child)\n\n    visit(value)\n    return results\n\ndef normalize_change(change_json):\n    if not isinstance(change_json, dict):\n        change_json = {}\n    change_json.setdefault(\"title\", None)\n    change_json.setdefault(\"domain\", None)\n    change_json.setdefault(\"target_schema\", None)\n    change_json.setdefault(\"target_table\", None)\n    if not isinstance(change_json.get(\"operations\"), list):\n        change_json[\"operations\"] = [change_json.get(\"operations\")] if change_json.get(\"operations\") else []\n    if not isinstance(change_json.get(\"notes\"), list):\n        change_json[\"notes\"] = []\n    return change_json\n\ndef parse_change(change_text, model=None):\n    change_json, _ = run_schemaflow_json_agent(\n        name=\"SchemaFlow Parse Agent\",\n        instructions=PARSE_SYSTEM,\n        prompt=\"Change Request:\\n\\n\" + change_text,\n        output_schema=CHANGE_OUTPUT_SCHEMA,\n        model=model,\n        workflow_name=\"SchemaFlow Eval Parse\",\n        metadata={\"eval_stage\": \"parse\"},\n    )\n    return normalize_change(change_json)\n\ndef run_schemaflow_parse(change_text, *, model=None, api_key=None):\n    _ensure_openai_api_key(api_key)\n    with custom_span(\"SchemaFlow Promptfoo Parse Eval\", {\"eval_mode\": \"parse_only\", \"group_id\": SCHEMAFLOW_TRACE_GROUP_ID}):\n        try:\n            change_json = parse_change(change_text, model=model)\n            bundle = {\"eval_mode\": \"parse_only\", \"change_text\": change_text, \"change_json\": change_json, \"validation\": {\"valid\": True, \"issues\": []}}\n            trace_function_result(\"Promptfoo parse bundle\", input_obj={\"change_text\": change_text}, output_obj=bundle)\n            return bundle\n        finally:\n            flush_traces()\n\ndef normalize_impact(impact_json):\n    if not isinstance(impact_json, dict):\n        impact_json = {}\n    impact_json.setdefault(\"impacted_objects\", [])\n    impact_json.setdefault(\"risks\", [])\n    
impact_json.setdefault(\"assumptions\", [])\n    return impact_json\n\ndef normalize_plan(plan_json):\n    if not isinstance(plan_json, dict):\n        plan_json = {}\n    plan_json.setdefault(\"plan_steps\", [])\n    plan_json.setdefault(\"prechecks\", [])\n    plan_json.setdefault(\"postchecks\", [])\n    plan_json.setdefault(\"rollback\", [])\n    return plan_json\n\ndef resolve_eval_pdf_path(pdf_path):\n    requested = Path(pdf_path).expanduser()\n    module_dir = Path(__file__).resolve().parent\n    candidates = [requested]\n    if not requested.is_absolute():\n        candidates.extend([\n            module_dir / requested,\n            module_dir.parent / requested,\n            module_dir.parent.parent / requested,\n        ])\n    resolved_candidates = []\n    for candidate in candidates:\n        resolved = candidate.resolve()\n        if resolved in resolved_candidates:\n            continue\n        resolved_candidates.append(resolved)\n        if resolved.exists():\n            return resolved\n    attempted = \", \".join(str(candidate) for candidate in resolved_candidates)\n    raise FileNotFoundError(f\"PDF not found: {pdf_path}. 
Tried: {attempted}\")\n\ndef create_pdf_vector_store(client, pdf_path, name_prefix=\"schemaflow-cookbook\"):\n    pdf_path = resolve_eval_pdf_path(pdf_path)\n    if pdf_path.suffix.lower() != \".pdf\":\n        raise ValueError(f\"Expected a PDF file, got: {pdf_path}\")\n    with custom_span(\"Promptfoo create vector store\", {\"pdf_path\": str(pdf_path)}):\n        vector_store = client.vector_stores.create(name=f\"{name_prefix}-{datetime.now(timezone.utc).strftime('%Y%m%dT%H%M%SZ')}\", expires_after={\"anchor\": \"last_active_at\", \"days\": 1})\n    with custom_span(\"Promptfoo upload PDF to vector store\", {\"vector_store_id\": vector_store.id, \"pdf_path\": str(pdf_path)}):\n        with pdf_path.open(\"rb\") as handle:\n            vector_store_file = client.vector_stores.files.upload_and_poll(vector_store_id=vector_store.id, file=handle)\n    trace_function_result(\"Promptfoo vector store ready\", input_obj={\"pdf_path\": str(pdf_path)}, output_obj={\"vector_store_id\": vector_store.id, \"status\": getattr(vector_store_file, \"status\", \"unknown\")})\n    return vector_store, vector_store_file\n\ndef delete_vector_store(client, vector_store_id):\n    if not vector_store_id:\n        return\n    try:\n        with custom_span(\"Promptfoo delete vector store\", {\"vector_store_id\": vector_store_id}):\n            client.vector_stores.delete(vector_store_id=vector_store_id)\n    except Exception:\n        pass\n\ndef validate_sql(sql_text, required_keywords=None):\n    issues = []\n    if not (sql_text or \"\").strip():\n        issues.append(\"SQL output is empty\")\n    for keyword in required_keywords or [\"ALTER TABLE\"]:\n        if keyword.lower() not in (sql_text or \"\").lower():\n            issues.append(f\"Expected keyword missing: {keyword}\")\n    validation = {\"valid\": len(issues) == 0, \"issues\": issues}\n    with guardrail_span(\"promptfoo_sql_validation\", triggered=not validation[\"valid\"]):\n        trace_function_result(\"Promptfoo SQL 
validation\", output_obj=validation)\n    return validation\n\ndef run_schemaflow_case(change_text, *, pdf_path=None, rag_max_results=6, model=None, api_key=None, validation_keywords=None, delete_vector_store_after_run=True):\n    client = _get_client(api_key=api_key)\n    vector_store_id = None\n    rag_file_search_results = []\n    with custom_span(\"SchemaFlow Promptfoo Full Flow Eval\", {\"eval_mode\": \"full_flow\", \"pdf_path\": pdf_path or \"\", \"group_id\": SCHEMAFLOW_TRACE_GROUP_ID}):\n        try:\n            change_json = parse_change(change_text, model=model)\n            impact_user_parts = [\"CHANGE_JSON:\\n\" + json.dumps(change_json, ensure_ascii=False)]\n            impact_tools = []\n            if pdf_path:\n                vector_store, _ = create_pdf_vector_store(client, pdf_path, name_prefix=\"schemaflow-promptfoo\")\n                vector_store_id = vector_store.id\n                impact_tools.append(FileSearchTool(vector_store_ids=[vector_store_id], max_num_results=rag_max_results, include_search_results=True))\n                impact_user_parts.append(\"Use the file_search tool against the uploaded PDF to look for relevant IFD, schema, table, column, lineage, and downstream dependency context before returning JSON.\")\n            impact_json, impact_result = run_schemaflow_json_agent(name=\"SchemaFlow Impact Agent\", instructions=IMPACT_SYSTEM, prompt=\"\\n\\n\".join(impact_user_parts), output_schema=IMPACT_OUTPUT_SCHEMA, model=model, tools=impact_tools, workflow_name=\"SchemaFlow Eval Impact\", metadata={\"eval_stage\": \"impact\", \"rag_enabled\": bool(vector_store_id)})\n            impact_json = normalize_impact(impact_json)\n            try:\n                rag_file_search_results = _collect_file_search_results(impact_result)\n            except Exception as exc:\n                rag_file_search_results = []\n                trace_function_result(\"Promptfoo File Search summary skipped\", output_obj={\"error\": 
f\"{type(exc).__name__}: {exc}\"})\n            plan_user = \"\\n\\n\".join([\"CHANGE_JSON:\\n\" + json.dumps(change_json, ensure_ascii=False), \"IMPACT_JSON:\\n\" + json.dumps(impact_json, ensure_ascii=False)])\n            plan_json, _ = run_schemaflow_json_agent(name=\"SchemaFlow Plan Agent\", instructions=PLAN_SYSTEM, prompt=plan_user, output_schema=PLAN_OUTPUT_SCHEMA, model=model, workflow_name=\"SchemaFlow Eval Plan\", metadata={\"eval_stage\": \"plan\"})\n            plan_json = normalize_plan(plan_json)\n            sql_user = \"\\n\\n\".join([\"CHANGE_JSON:\\n\" + json.dumps(change_json, ensure_ascii=False), \"PLAN_JSON:\\n\" + json.dumps(plan_json, ensure_ascii=False)])\n            sql_text, _ = run_schemaflow_text_agent(name=\"SchemaFlow SQL Agent\", instructions=SQL_SYSTEM, prompt=sql_user, model=model, workflow_name=\"SchemaFlow Eval SQL\", metadata={\"eval_stage\": \"sql\"})\n            validation = validate_sql(sql_text, required_keywords=validation_keywords)\n            bundle = {\"summary\": {\"matched_tables\": [], \"impact_risks\": impact_json.get(\"risks\", []), \"rag_hits\": len(rag_file_search_results)}, \"rag\": {\"enabled\": bool(vector_store_id), \"vector_store_id\": vector_store_id, \"hits\": len(rag_file_search_results), \"file_search_results\": rag_file_search_results}, \"change_json\": change_json, \"impact_json\": impact_json, \"plan\": plan_json, \"sql\": sql_text, \"validation\": validation}\n            trace_function_result(\"Promptfoo full-flow bundle\", input_obj={\"change_text\": change_text}, output_obj=bundle)\n            return bundle\n        finally:\n            if delete_vector_store_after_run:\n                delete_vector_store(client, vector_store_id)\n            flush_traces()\n'''\n\ncore_module = (CORE_MODULE_TEMPLATE\n    .replace(\"__MODEL_DEFAULT__\", repr(MODEL))\n    .replace(\"__PARSE_SYSTEM__\", repr(PARSE_SYSTEM))\n    .replace(\"__IMPACT_SYSTEM__\", repr(IMPACT_SYSTEM))\n    
.replace(\"__PLAN_SYSTEM__\", repr(PLAN_SYSTEM))\n    .replace(\"__SQL_SYSTEM__\", repr(SQL_SYSTEM)))\ncore_path = PROMPTFOO_DIR / \"schemaflow_cookbook_core.py\"\ncore_path.write_text(core_module, encoding=\"utf-8\")\nprint(\"Published SchemaFlow core:\", core_path.resolve())\n```\n\n### Promptfoo Provider Runtime\n\nThis cell writes the Promptfoo provider file:\n\n```\nartifacts/promptfoo/schemaflow_cookbook_eval_provider.py\n```\n\nThe provider is the bridge between Promptfoo and SchemaFlow.\n\nFor each Promptfoo test case, it reads variables such as:\n\n- `change_text`\n- `eval_mode`\n- optional `pdf_path`\n- optional `rag_max_results`\n- validation keywords\n\nThen it chooses one of two execution paths:\n\n- `parse_only` runs only Stage 1 and returns a parse bundle.\n- `full_flow` runs the complete SchemaFlow pipeline and returns the full bundle.\n\nThe provider returns JSON so Promptfoo assertions can inspect structured fields instead of parsing notebook text output.\n\n``` python\n%%writefile artifacts/promptfoo/schemaflow_cookbook_eval_provider.py\nimport json\nimport os\nfrom agents import flush_traces, function_span, trace\nfrom schemaflow_cookbook_core import run_schemaflow_case, run_schemaflow_parse\nSCHEMAFLOW_TRACE_GROUP_ID = os.getenv(\"SCHEMAFLOW_TRACE_GROUP_ID\", \"schemaflow-cookbook-promptfoo\")\n\ndef _json_list(value):\n    if value is None:\n        return []\n    if isinstance(value, list):\n        return value\n    try:\n        parsed = json.loads(value)\n    except Exception:\n        return [value]\n    return parsed if isinstance(parsed, list) else [parsed]\n\ndef _trace_function_result(name, *, input_obj=None, output_obj=None):\n    with function_span(name, input=json.dumps(input_obj, ensure_ascii=False, default=str) if input_obj is not None else None, output=json.dumps(output_obj, ensure_ascii=False, default=str) if output_obj is not None else None):\n        pass\n\ndef call_api(prompt, options, context):\n    vars_ = (context 
or {}).get(\"vars\", {})\n    change_text = vars_.get(\"change_text\") or prompt\n    eval_mode = vars_.get(\"eval_mode\", \"full_flow\")\n    with trace(\"SchemaFlow Promptfoo Provider\", group_id=SCHEMAFLOW_TRACE_GROUP_ID, metadata={\"eval_mode\": eval_mode}):\n        try:\n            if eval_mode == \"parse_only\":\n                bundle = run_schemaflow_parse(change_text)\n            elif eval_mode == \"full_flow\":\n                bundle = run_schemaflow_case(change_text, pdf_path=vars_.get(\"pdf_path\"), rag_max_results=int(vars_.get(\"rag_max_results\") or 6), validation_keywords=_json_list(vars_.get(\"validation_keywords_json\")))\n                bundle[\"eval_mode\"] = \"full_flow\"\n            else:\n                raise ValueError(f\"Unsupported eval_mode: {eval_mode}\")\n            _trace_function_result(\"Promptfoo provider output\", input_obj={\"eval_mode\": eval_mode, \"change_text\": change_text, \"vars\": vars_}, output_obj=bundle)\n            return {\"output\": json.dumps(bundle, ensure_ascii=False)}\n        finally:\n            flush_traces()\n```\n\n### Promptfoo Assertion Runtime\n\nThis cell writes the Promptfoo assertion file:\n\n```\nartifacts/promptfoo/schemaflow_cookbook_eval_assert.py\n```\n\nThe assertion file validates provider output for both eval modes.\n\nFor `parse_only`, it checks:\n\n- output is valid JSON\n- target schema and table match expectations\n- at least one parsed operation is present\n- expected added column appears in parsed operations\n- expected data type appears structurally in parsed operations\n\nFor `full_flow`, it checks:\n\n- output is valid JSON\n- target schema and table match expectations\n- at least one parsed operation is present\n- impact risks are present\n- required SQL terms are present\n- validation passed\n\nThe assertion also emits guardrail spans so eval failures are visible in traces.\n\n``` python\n%%writefile artifacts/promptfoo/schemaflow_cookbook_eval_assert.py\nimport 
json\nimport os\nimport re\nfrom agents import flush_traces, function_span, guardrail_span, trace\nSCHEMAFLOW_TRACE_GROUP_ID = os.getenv(\"SCHEMAFLOW_TRACE_GROUP_ID\", \"schemaflow-cookbook-promptfoo\")\n\ndef _json_list(value):\n    if value is None:\n        return []\n    if isinstance(value, list):\n        return value\n    try:\n        parsed = json.loads(value)\n    except Exception:\n        return [value]\n    return parsed if isinstance(parsed, list) else [parsed]\n\ndef _normalize_name(value):\n    return (value or \"\").replace('\"', \"\").replace(\"'\", \"\").strip().upper()\n\ndef _normalize_text(value):\n    return \" \".join(str(value or \"\").upper().replace('\"', \"\").replace(\"'\", \"\").split())\n\ndef _compact_text(value):\n    return re.sub(r\"\\s+\", \"\", _normalize_text(value))\n\ndef _operation_text(bundle):\n    operations = bundle.get(\"change_json\", {}).get(\"operations\", [])\n    return _normalize_text(json.dumps(operations, ensure_ascii=False))\n\ndef _trace_function_result(name, *, input_obj=None, output_obj=None):\n    with function_span(name, input=json.dumps(input_obj, ensure_ascii=False, default=str) if input_obj is not None else None, output=json.dumps(output_obj, ensure_ascii=False, default=str) if output_obj is not None else None):\n        pass\n\ndef _check_target(bundle, expected_schema, expected_table):\n    if not expected_schema or not expected_table:\n        return True, \"target expectation not configured\"\n    change = bundle.get(\"change_json\", {})\n    actual_schema = _normalize_name(change.get(\"target_schema\"))\n    actual_table = _normalize_name(change.get(\"target_table\"))\n    return actual_schema == _normalize_name(expected_schema) and actual_table == _normalize_name(expected_table), f\"expected target {_normalize_name(expected_schema)}.{_normalize_name(expected_table)}, got {actual_schema}.{actual_table}\"\n\ndef _check_expected_text(bundle, value, label):\n    if not value:\n        return True, 
f\"{label} expectation not configured\"\n    haystack = _operation_text(bundle)\n    needle = _normalize_text(value)\n    return needle in haystack, f\"expected parsed {label} {needle} in operations\"\n\ndef _check_expected_data_type(bundle, value):\n    if not value:\n        return True, \"data type expectation not configured\"\n    haystack = _operation_text(bundle)\n    compact_haystack = _compact_text(haystack)\n    compact_needle = _compact_text(value)\n    if compact_needle in compact_haystack:\n        return True, \"data type matched\"\n    match = re.match(r\"([A-Z]+)\\(?([0-9,]*)\\)?\", compact_needle)\n    if not match:\n        return False, f\"expected parsed data type {value} in operations\"\n    base_type, size = match.groups()\n    if base_type and base_type not in compact_haystack:\n        return False, f\"expected parsed data type base {base_type} in operations\"\n    if size:\n        missing_sizes = [part for part in size.split(\",\") if part and part not in compact_haystack]\n        if missing_sizes:\n            return False, f\"expected parsed data type size {size} in operations\"\n    return True, \"data type matched structurally\"\n\ndef get_assert(output, context):\n    vars_ = (context or {}).get(\"vars\", {})\n    eval_mode = vars_.get(\"eval_mode\", \"full_flow\")\n    with trace(\"SchemaFlow Promptfoo Assertion\", group_id=SCHEMAFLOW_TRACE_GROUP_ID, metadata={\"eval_mode\": eval_mode}):\n        try:\n            try:\n                bundle = json.loads(output)\n            except Exception as exc:\n                result = {\"pass\": False, \"score\": 0, \"reason\": f\"Provider output was not JSON: {exc}\"}\n                with guardrail_span(\"provider_output_json\", triggered=True):\n                    _trace_function_result(\"Promptfoo assertion parse failure\", input_obj={\"output\": output}, output_obj=result)\n                return result\n            checks = []\n            ok, reason = _check_target(bundle, 
vars_.get(\"expected_schema\"), vars_.get(\"expected_table\"))\n            checks.append((\"target_matches_expected\", ok, reason))\n            operations = bundle.get(\"change_json\", {}).get(\"operations\", [])\n            checks.append((\"parsed_operation_present\", isinstance(operations, list) and len(operations) > 0, \"expected at least one parsed operation\"))\n            if eval_mode == \"parse_only\":\n                ok, reason = _check_expected_text(bundle, vars_.get(\"expected_added_column\"), \"added column\")\n                checks.append((\"expected_added_column\", ok, reason))\n                ok, reason = _check_expected_data_type(bundle, vars_.get(\"expected_data_type\"))\n                checks.append((\"expected_data_type\", ok, reason))\n            else:\n                risks = bundle.get(\"impact_json\", {}).get(\"risks\", [])\n                checks.append((\"impact_risks_present\", isinstance(risks, list) and len(risks) > 0, \"expected at least one impact risk\"))\n                sql_text = bundle.get(\"sql\") or bundle.get(\"sql_text\") or \"\"\n                missing_terms = [term for term in _json_list(vars_.get(\"sql_terms_json\")) if term.lower() not in sql_text.lower()]\n                checks.append((\"sql_terms_present\", not missing_terms, \"missing SQL terms: \" + \", \".join(missing_terms)))\n                validation = bundle.get(\"validation\", {})\n                checks.append((\"validation_passed\", bool(validation.get(\"valid\")), \"validation issues: \" + \"; \".join(validation.get(\"issues\", []))))\n            for name, ok, reason in checks:\n                with guardrail_span(name, triggered=not ok):\n                    _trace_function_result(\"Promptfoo assertion check\", output_obj={\"name\": name, \"ok\": ok, \"reason\": reason})\n            passed = [ok for _, ok, _ in checks if ok]\n            failures = [reason for _, ok, reason in checks if not ok]\n            score = len(passed) / len(checks) if 
checks else 0\n            result = {\"pass\": score == 1, \"score\": score, \"reason\": \"All checks passed\" if not failures else \"; \".join(failures)}\n            _trace_function_result(\"Promptfoo assertion result\", input_obj={\"vars\": vars_}, output_obj=result)\n            return result\n        finally:\n            flush_traces()\n```\n\n### Build Promptfoo Test Cases and Config\n\nThis cell builds Promptfoo test cases from the current notebook input.\n\nBy default, it creates two test cases from the current `CHANGE_TEXT`, carrying through `PDF_PATH` when a PDF is configured:\n\n- a parse-only test\n- a full-flow test\n\nThe helper functions infer expectations from the change request, including:\n\n- expected schema\n- expected table\n- expected added column\n- expected data type\n- expected SQL terms\n- expected validation keywords\n\nThe cell also includes optional regression fixtures. Set:\n\n```\nRUN_EXTRA_REGRESSION_CASES = True\n```\n\nto add those extra cases to the generated config.\n\nBefore writing `promptfooconfig.yaml`, the cell runs deterministic input preflight checks. 
This prevents obviously malformed eval inputs from producing confusing Promptfoo failures.\n\n``` python\nimport json\nimport re\nimport sys\nfrom pathlib import Path\n\ndef infer_eval_expectations(change_text):\n    target_match = re.search(\n        r\"\\b(?:to|from|in|on)\\s+([A-Za-z_][\\w$]*)\\.([A-Za-z_][\\w$]*)\",\n        change_text,\n        flags=re.IGNORECASE,\n    )\n    column_type_match = re.search(\n        r\"\\badd\\s+([A-Za-z_][\\w$]*)\\s+((?:VAR)?CHAR\\s*\\([^)]*\\)|TEXT|INTEGER|INT|BIGINT|BOOLEAN|DATE|TIMESTAMP|NUMERIC\\s*\\([^)]*\\)|DECIMAL\\s*\\([^)]*\\)|FLOAT|DOUBLE)\",\n        change_text,\n        flags=re.IGNORECASE,\n    )\n\n    expected_schema = target_match.group(1).upper() if target_match else None\n    expected_table = target_match.group(2).upper() if target_match else None\n    added_column = column_type_match.group(1).upper() if column_type_match else None\n    data_type = \" \".join(column_type_match.group(2).upper().split()) if column_type_match else None\n\n    sql_terms = []\n    validation_keywords = [\"ALTER TABLE\"]\n    if expected_table:\n        sql_terms.append(expected_table)\n    if added_column:\n        sql_terms.append(added_column)\n    if data_type:\n        sql_terms.append(data_type)\n    sql_terms.append(\"ALTER TABLE\")\n\n    lower_text = change_text.lower()\n    if any(term in lower_text for term in [\"backfill\", \"update\", \"source it from\"]):\n        sql_terms.append(\"UPDATE\")\n        validation_keywords.append(\"UPDATE\")\n    if \"index\" in lower_text:\n        sql_terms.append(\"CREATE INDEX\")\n        validation_keywords.append(\"CREATE INDEX\")\n\n    return {\n        \"expected_schema\": expected_schema,\n        \"expected_table\": expected_table,\n        \"expected_added_column\": added_column,\n        \"expected_data_type\": data_type,\n        \"sql_terms\": list(dict.fromkeys(sql_terms)),\n        \"validation_keywords\": list(dict.fromkeys(validation_keywords)),\n    }\n\ndef 
build_eval_case(description, change_text, **overrides):\n    expectations = infer_eval_expectations(change_text)\n    vars_ = {\n        \"change_text\": change_text,\n        \"sql_terms_json\": json.dumps(expectations[\"sql_terms\"], ensure_ascii=False),\n        \"validation_keywords_json\": json.dumps(expectations[\"validation_keywords\"], ensure_ascii=False),\n    }\n    for key in [\"expected_schema\", \"expected_table\", \"expected_added_column\", \"expected_data_type\"]:\n        if expectations.get(key):\n            vars_[key] = expectations[key]\n    vars_.update({k: v for k, v in overrides.items() if v is not None})\n    return {\"description\": description, \"vars\": vars_}\n\ndef _json_list(value):\n    if value is None:\n        return []\n    if isinstance(value, list):\n        return value\n    parsed = json.loads(value)\n    return parsed if isinstance(parsed, list) else [parsed]\n\ndef preflight_eval_case(case):\n    vars_ = case[\"vars\"]\n    errors = []\n    warnings = []\n    change_text = vars_.get(\"change_text\", \"\")\n    sql_terms = _json_list(vars_.get(\"sql_terms_json\"))\n\n    if len(change_text.strip()) < 20:\n        errors.append(\"change_text is missing or too short\")\n    if not vars_.get(\"expected_schema\") or not vars_.get(\"expected_table\"):\n        errors.append(\"could not infer target schema/table\")\n    if not vars_.get(\"expected_added_column\"):\n        warnings.append(\"could not infer added column\")\n    if not vars_.get(\"expected_data_type\"):\n        warnings.append(\"could not infer added column data type\")\n    if len(sql_terms) <= 1:\n        warnings.append(\"few SQL terms inferred\")\n    if vars_.get(\"pdf_path\"):\n        pdf_path = Path(vars_[\"pdf_path\"]).expanduser()\n        if not pdf_path.exists():\n            errors.append(f\"pdf_path does not exist: {pdf_path}\")\n        elif pdf_path.suffix.lower() != \".pdf\":\n            errors.append(f\"pdf_path is not a PDF: {pdf_path}\")\n\n    
return {\"description\": case[\"description\"], \"errors\": errors, \"warnings\": warnings}\n\ndef as_promptfoo_test(case, eval_mode):\n    vars_ = dict(case[\"vars\"])\n    vars_[\"eval_mode\"] = eval_mode\n    label = \"Parse-only\" if eval_mode == \"parse_only\" else \"Full flow\"\n    return {\"description\": f\"{label}: {case['description']}\", \"vars\": vars_}\n\nCURRENT_NOTEBOOK_EVAL_CASE = build_eval_case(\n    \"Current notebook change request\",\n    CHANGE_TEXT,\n    pdf_path=str(Path(PDF_PATH).expanduser().resolve()) if PDF_PATH else None,\n)\n\nRUN_EXTRA_REGRESSION_CASES = False\n\nEXTRA_REGRESSION_CASES = [\n    build_eval_case(\n        \"Product style color propagation\",\n        \"\"\"Add COLOR_CODE VARCHAR(10) to ODS.ODS_PLIM_STYLE as nullable.\nSource it from FLEX.STYLE.COLOR_CODE when available and propagate the field through staging and mart outputs used by product reporting.\"\"\",\n    ),\n    build_eval_case(\n        \"Optional customer note field\",\n        \"\"\"Add CUSTOMER_SEGMENT_NOTE VARCHAR(255) to ODS.ODS_CUSTOMER_PROFILE as nullable.\nNo historical backfill is required. 
The field is optional metadata for analyst annotations and should not block existing loads.\"\"\",\n    ),\n]\n\nADDITIONAL_EVAL_CASES = EXTRA_REGRESSION_CASES if RUN_EXTRA_REGRESSION_CASES else []\nINPUT_EVAL_CASES = [CURRENT_NOTEBOOK_EVAL_CASE, *ADDITIONAL_EVAL_CASES]\nINPUT_PREFLIGHT_RESULTS = [preflight_eval_case(case) for case in INPUT_EVAL_CASES]\nINPUT_PREFLIGHT_ERRORS = [\n    f\"{result['description']}: {error}\"\n    for result in INPUT_PREFLIGHT_RESULTS\n    for error in result[\"errors\"]\n]\n\nprint(\"Input preflight:\")\nfor result in INPUT_PREFLIGHT_RESULTS:\n    status = \"PASS\" if not result[\"errors\"] else \"FAIL\"\n    print(f\"- {status}: {result['description']}\")\n    for warning in result[\"warnings\"]:\n        print(f\"  warning: {warning}\")\n    for error in result[\"errors\"]:\n        print(f\"  error: {error}\")\nif INPUT_PREFLIGHT_ERRORS:\n    raise ValueError(\"Input preflight failed:\\n\" + \"\\n\".join(INPUT_PREFLIGHT_ERRORS))\n\nPROMPTFOO_PARSE_EVAL_CASES = [as_promptfoo_test(case, \"parse_only\") for case in INPUT_EVAL_CASES]\nPROMPTFOO_FULL_FLOW_EVAL_CASES = [as_promptfoo_test(case, \"full_flow\") for case in INPUT_EVAL_CASES]\nPROMPTFOO_EVAL_CASES = [*PROMPTFOO_PARSE_EVAL_CASES, *PROMPTFOO_FULL_FLOW_EVAL_CASES]\n\npromptfoo_config = {\n    \"description\": \"SchemaFlow cookbook evals\",\n    \"prompts\": [\"{{change_text}}\"],\n    \"providers\": [{\n        \"id\": \"file://schemaflow_cookbook_eval_provider.py\",\n        \"config\": {\"pythonExecutable\": sys.executable},\n    }],\n    \"defaultTest\": {\n        \"assert\": [{\n            \"type\": \"python\",\n            \"value\": \"file://schemaflow_cookbook_eval_assert.py\",\n        }],\n    },\n    \"tests\": PROMPTFOO_EVAL_CASES,\n}\n\nconfig_path = PROMPTFOO_DIR / \"promptfooconfig.yaml\"\nconfig_text = \"# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json\\n\" + json.dumps(\n    promptfoo_config,\n    indent=2,\n    
ensure_ascii=False,\n)\nconfig_path.write_text(config_text, encoding=\"utf-8\")\n\nprint(\"Promptfoo config:\", config_path.resolve())\nprint(\"Promptfoo Python executable:\", sys.executable)\nprint(\"Promptfoo eval cases:\", len(PROMPTFOO_EVAL_CASES))\nfor case in PROMPTFOO_EVAL_CASES:\n    vars_ = case[\"vars\"]\n    target = \".\".join(part for part in [vars_.get(\"expected_schema\"), vars_.get(\"expected_table\")] if part)\n    print(\"-\", case[\"description\"], \"->\", target or \"target not inferred\")\n```\n\n### Run Promptfoo Eval\n\nThis cell runs Promptfoo non-interactively from the notebook.\n\nThe command:\n\n- runs from `artifacts/promptfoo/`\n- uses the generated `promptfooconfig.yaml`\n- uses notebook-local Promptfoo config, cache, logs, and npm cache\n- runs with concurrency `1` for predictable notebook behavior\n- keeps CLI output visible in the notebook\n- writes timestamped JSON and HTML reports\n- refreshes latest-result aliases after a successful run\n\nThe exported result files are saved under:\n\n```\nartifacts/promptfoo/results/\n```\n\nIf the Node.js/npm runtime check failed earlier, fix the runtime before running this cell.\n\n```\n%%bash\nset -euo pipefail\ncd artifacts/promptfoo\nexport PROMPTFOO_CONFIG_DIR=\"$PWD/.promptfoo\"\nexport PROMPTFOO_LOG_DIR=\"$PWD/.promptfoo/logs\"\nexport PROMPTFOO_CACHE_PATH=\"$PWD/.promptfoo/cache\"\nexport npm_config_cache=\"$PWD/.npm-cache\"\nexport npm_config_update_notifier=false\nexport npm_config_loglevel=error\nexport SCHEMAFLOW_TRACE_GROUP_ID=\"${SCHEMAFLOW_TRACE_GROUP_ID:-schemaflow-cookbook-promptfoo}\"\nexport OPENAI_AGENTS_TRACE_INCLUDE_SENSITIVE_DATA=\"${OPENAI_AGENTS_TRACE_INCLUDE_SENSITIVE_DATA:-false}\"\nmkdir -p \"$PROMPTFOO_LOG_DIR\" \"$PROMPTFOO_CACHE_PATH\" results\nRUN_ID=\"$(date -u +%Y%m%dT%H%M%SZ)\"\nRESULT_JSON=\"results/schemaflow_cookbook_eval_${RUN_ID}.json\"\nRESULT_HTML=\"results/schemaflow_cookbook_eval_${RUN_ID}.html\"\nnpx --yes promptfoo@latest eval \\\n  -c 
promptfooconfig.yaml \\\n  --max-concurrency 1 \\\n  --no-progress-bar \\\n  --description \"SchemaFlow cookbook eval ${RUN_ID}\" \\\n  -o \"$RESULT_JSON\" \"$RESULT_HTML\"\ncp \"$RESULT_JSON\" results/schemaflow_cookbook_eval_latest.json\ncp \"$RESULT_HTML\" results/schemaflow_cookbook_eval_latest.html\nprintf '\\nSaved Promptfoo results:\\n  %s\\n  %s\\n' \"$RESULT_JSON\" \"$RESULT_HTML\"\nprintf 'Latest aliases:\\n  %s\\n  %s\\n' \"results/schemaflow_cookbook_eval_latest.json\" \"results/schemaflow_cookbook_eval_latest.html\"\nprintf 'Trace group:\\n  %s\\n' \"$SCHEMAFLOW_TRACE_GROUP_ID\"\n```\n\n### Review Latest Promptfoo Results\n\nThis cell checks whether the latest Promptfoo result aliases exist and prints their paths and sizes.\n\nExpected files:\n\n```\nartifacts/promptfoo/results/schemaflow_cookbook_eval_latest.json\nartifacts/promptfoo/results/schemaflow_cookbook_eval_latest.html\n```\n\nIf the latest JSON file exists, the cell also prints available eval metadata such as the eval ID and aggregate stats.\n\nUse this section as a quick confirmation that the eval completed and exported artifacts successfully.\n\n``` python\nfrom pathlib import Path\nimport json\n\nresults_dir = Path(\"artifacts/promptfoo/results\")\nlatest_json = results_dir / \"schemaflow_cookbook_eval_latest.json\"\nlatest_html = results_dir / \"schemaflow_cookbook_eval_latest.html\"\n\nfor path in (latest_json, latest_html):\n    if path.exists():\n        print(f\"{path.name}: {path.resolve()} ({path.stat().st_size:,} bytes)\")\n    else:\n        print(f\"Missing expected Promptfoo result: {path.resolve()}\")\n\nif latest_json.exists():\n    data = json.loads(latest_json.read_text())\n    eval_id = data.get(\"evalId\")\n    results = data.get(\"results\", {})\n    stats = results.get(\"stats\", {}) if isinstance(results, dict) else {}\n    if eval_id:\n        print(\"Eval ID:\", eval_id)\n    if stats:\n        print(\"Stats:\", stats)\n```\n\n## 11) Optional Neo4j Knowledge Graph & 
Dashboard\n\nThis optional section is **fully self-contained** and **does not affect the core pipeline above**. It uses a small synthetic customer-loyalty graph seed plus inline dashboard code so the cookbook stays portable. Readers can treat it as a visual appendix: the core workflow works without Neo4j, but graph views make lineage and downstream impact easier to inspect.\n\n**What this section does, in order:**\n\n- **Step 1 - Seed**: define a synthetic customer-loyalty graph with ODS, staging, core, mart, and CRM objects, their columns, lineage, and joins as an inline Python data structure - no external files.\n- **Step 2 - AI Enrichment**: use the OpenAI `client` already loaded in Section 1 to fill in `semantic_meaning` (a short 2-5 word tag like `natural-key`, `foreign-key`, `monetary-amount`, `timestamp`) for every column.\n- **Step 3 - Upsert to Neo4j**: write the enriched data to a running Neo4j instance via idempotent `MERGE` Cypher. Nodes are labeled `SchemaFlowCookbook` so the dashboard ignores stale sample data from older local runs.\n- **Dashboard**: write a small FastAPI server + D3.js page next to the notebook, launch it on `http://127.0.0.1:8005`, and print a clickable link.\n\n**Prerequisites for this section:**\n\n- A running Neo4j instance, e.g. via Docker: `docker run -d -p 7687:7687 -p 7474:7474 -e NEO4J_AUTH=neo4j/change-me-please neo4j:5` (Neo4j Desktop or AuraDB free tier also work).\n- `NEO4J_URI`, `NEO4J_USER`, `NEO4J_PASSWORD` (loaded via env, or entered at the prompt below).\n- A free local port `8005` for the dashboard (override via `NEO4J_DASHBOARD_PORT`).\n
- Optional packages `neo4j`, `fastapi`, `uvicorn` - the next cell will install them lazily if missing.\n\nIf any prerequisite is missing, every cell below short-circuits with a clear message; nothing throws and the rest of the notebook is unaffected.\n\n### 11.1) Environment Setup & Optional Dependencies\n\nMirrors Section 1’s OpenAI env loading pattern. Lazy-installs `neo4j`, `fastapi`, and `uvicorn` only when they are not importable, then reads `NEO4J_URI`, `NEO4J_USER`, and `NEO4J_PASSWORD` from the environment (or prompts via `getpass` if missing). If you press Enter at the `NEO4J_URI` or `NEO4J_PASSWORD` prompt without typing, the section is disabled (`NEO4J_SECTION_ENABLED = False`) and the remaining cells skip cleanly; an empty `NEO4J_USER` falls back to `neo4j`.\n\n``` python\nimport os\nimport subprocess\nimport sys\nfrom getpass import getpass\nfrom urllib.parse import urlparse\n\nNEO4J_SECTION_ENABLED = True\n\ndef _ensure_pkg(pkg, import_name=None):\n    name = import_name or pkg\n    try:\n        __import__(name)\n        return True\n    except Exception:\n        print(f\"Installing {pkg} (only needed for Section 11)...\", flush=True)\n        rc = subprocess.call([sys.executable, \"-m\", \"pip\", \"install\", \"-q\", pkg])\n        if rc != 0:\n            print(f\"  pip install {pkg} failed (rc={rc}); Section 11 will be skipped.\")\n            return False\n        try:\n            __import__(name)\n            return True\n        except Exception as e:\n            print(f\"  Import still failing after install of {pkg}: {e}\")\n            return False\n\nfor _pkg, _imp in [(\"neo4j\", \"neo4j\"), (\"fastapi\", \"fastapi\"), (\"uvicorn\", \"uvicorn\")]:\n    if not _ensure_pkg(_pkg, _imp):\n        NEO4J_SECTION_ENABLED = False\n\nif not os.getenv(\"NEO4J_URI\"):\n    os.environ[\"NEO4J_URI\"] = getpass(\"Enter NEO4J_URI (e.g. 
neo4j://127.0.0.1:7687) or press Enter to skip: \")\nif not os.getenv(\"NEO4J_USER\"):\n    os.environ[\"NEO4J_USER\"] = getpass(\"Enter NEO4J_USER (default 'neo4j') or press Enter to skip: \") or \"neo4j\"\nif not os.getenv(\"NEO4J_PASSWORD\"):\n    os.environ[\"NEO4J_PASSWORD\"] = getpass(\"Enter NEO4J_PASSWORD or press Enter to skip: \")\n\nNEO4J_URI = (os.getenv(\"NEO4J_URI\") or \"\").strip()\nNEO4J_USER = (os.getenv(\"NEO4J_USER\") or \"\").strip()\nNEO4J_PASSWORD = os.getenv(\"NEO4J_PASSWORD\") or \"\"\n\ndef _normalize_neo4j_uri(uri):\n    parsed = urlparse(uri)\n    if parsed.scheme in {\"bolt\", \"bolt+ssc\", \"bolt+s\", \"neo4j\", \"neo4j+ssc\", \"neo4j+s\"}:\n        return uri\n    if parsed.scheme in {\"http\", \"https\"} and parsed.hostname in {\"127.0.0.1\", \"localhost\", \"::1\"}:\n        return f\"neo4j://{parsed.hostname}:7687\"\n    return uri\n\n_normalized_neo4j_uri = _normalize_neo4j_uri(NEO4J_URI)\nif _normalized_neo4j_uri != NEO4J_URI:\n    print(f\"Converted Neo4j browser URL {NEO4J_URI!r} to driver URI {_normalized_neo4j_uri!r}.\")\n    NEO4J_URI = _normalized_neo4j_uri\n    os.environ[\"NEO4J_URI\"] = NEO4J_URI\n\nif not (NEO4J_URI and NEO4J_USER and NEO4J_PASSWORD):\n    print(\"Neo4j credentials not fully provided. 
Section 11 will be skipped (other cells will short-circuit safely).\")\n    NEO4J_SECTION_ENABLED = False\nelse:\n    print(f\"Neo4j configured: {NEO4J_USER}@{NEO4J_URI}\")\n\nprint(f\"NEO4J_SECTION_ENABLED = {NEO4J_SECTION_ENABLED}\")\n```\n\n### 11.2) Step 1 - Seed: Define the Knowledge Graph Data\n\nBuilds the in-memory data structure for the graph: schemas, tables (with `description`, `primary_key`), columns (with `type`, `nullable`, `is_primary_key`, optional `description`, and a `semantic_meaning` placeholder to be filled by AI in the next step), foreign keys, views, lineage edges (`DERIVED_FROM`), and joins.\n\nThis inline synthetic retail graph is aligned to the cookbook change request: `LOYALTY_TIER` is added to `ODS.ODS_CUSTOMER_PROFILE`, sourced from `CORE.DIM_CUSTOMER`, and propagated into downstream staging, core, mart, and CRM consumers.\n\n**No Neo4j calls here** - this cell only prepares Python data structures. Nothing is written until Step 3.\n\n``` python\nSCHEMAS = [\"ODS\", \"STG\", \"CORE\", \"MARTS\", \"CRM\"]\n\nTABLES = {\n    \"ODS.ODS_CUSTOMER_PROFILE\": {\n        \"description\": \"Raw customer profile table that receives LOYALTY_TIER in this change.\",\n        \"primary_key\": [\"CUSTOMER_ID\"],\n        \"columns\": [\n            {\"name\": \"CUSTOMER_ID\", \"type\": \"VARCHAR(32)\", \"nullable\": False,\n             \"description\": \"Stable customer identifier from the source system.\",\n             \"semantic_meaning\": \"customer-identifier\"},\n            {\"name\": \"EMAIL_HASH\", \"type\": \"VARCHAR(64)\", \"nullable\": True,\n             \"description\": \"Hashed email value used for matching without exposing PII.\"},\n            {\"name\": \"CUSTOMER_STATUS\", \"type\": \"VARCHAR(20)\", \"nullable\": True},\n            {\"name\": \"LOYALTY_TIER\", \"type\": \"VARCHAR(20)\", \"nullable\": True,\n             \"description\": \"Nullable loyalty segment added by the change request and 
backfilled from CORE.DIM_CUSTOMER.\",\n             \"semantic_meaning\": \"loyalty-segment\"},\n            {\"name\": \"UPDATED_AT\", \"type\": \"TIMESTAMP\", \"nullable\": True},\n            {\"name\": \"INGESTED_AT\", \"type\": \"TIMESTAMP\", \"nullable\": True},\n        ],\n    },\n    \"ODS.ODS_ORDER\": {\n        \"description\": \"Raw order header feed used to measure loyalty-tier revenue impact.\",\n        \"primary_key\": [\"ORDER_ID\"],\n        \"columns\": [\n            {\"name\": \"ORDER_ID\", \"type\": \"VARCHAR(40)\", \"nullable\": False,\n             \"semantic_meaning\": \"order-identifier\"},\n            {\"name\": \"ORDER_TS\", \"type\": \"TIMESTAMP\", \"nullable\": False},\n            {\"name\": \"CUSTOMER_ID\", \"type\": \"VARCHAR(32)\", \"nullable\": True},\n            {\"name\": \"ORDER_STATUS\", \"type\": \"VARCHAR(30)\", \"nullable\": True},\n            {\"name\": \"NET_AMOUNT\", \"type\": \"NUMERIC(12,2)\", \"nullable\": True},\n        ],\n    },\n    \"STG.STG_CUSTOMER_PROFILE\": {\n        \"description\": \"Staging table that normalizes customer profile rows for downstream dimensions and views.\",\n        \"primary_key\": [\"CUSTOMER_ID\"],\n        \"columns\": [\n            {\"name\": \"CUSTOMER_ID\", \"type\": \"VARCHAR(32)\", \"nullable\": False,\n             \"semantic_meaning\": \"customer-identifier\"},\n            {\"name\": \"CUSTOMER_STATUS\", \"type\": \"VARCHAR(20)\", \"nullable\": True},\n            {\"name\": \"LOYALTY_TIER\", \"type\": \"VARCHAR(20)\", \"nullable\": True,\n             \"description\": \"Propagated loyalty segment from ODS.ODS_CUSTOMER_PROFILE.\",\n             \"semantic_meaning\": \"loyalty-segment\"},\n            {\"name\": \"PROFILE_UPDATED_AT\", \"type\": \"TIMESTAMP\", \"nullable\": True},\n        ],\n    },\n    \"STG.STG_ORDER_ENRICHED\": {\n        \"description\": \"Staging order table enriched with customer status and loyalty tier for metrics.\",\n        \"primary_key\": 
[\"ORDER_ID\"],\n        \"columns\": [\n            {\"name\": \"ORDER_ID\", \"type\": \"VARCHAR(40)\", \"nullable\": False},\n            {\"name\": \"CUSTOMER_ID\", \"type\": \"VARCHAR(32)\", \"nullable\": True},\n            {\"name\": \"LOYALTY_TIER\", \"type\": \"VARCHAR(20)\", \"nullable\": True,\n             \"semantic_meaning\": \"loyalty-segment\"},\n            {\"name\": \"ORDER_TS\", \"type\": \"TIMESTAMP\", \"nullable\": False},\n            {\"name\": \"NET_AMOUNT\", \"type\": \"NUMERIC(12,2)\", \"nullable\": True},\n        ],\n    },\n    \"CORE.DIM_CUSTOMER\": {\n        \"description\": \"Conformed customer dimension and source for the LOYALTY_TIER backfill.\",\n        \"primary_key\": [\"CUSTOMER_SK\"],\n        \"columns\": [\n            {\"name\": \"CUSTOMER_SK\", \"type\": \"BIGINT\", \"nullable\": False,\n             \"semantic_meaning\": \"surrogate-key\"},\n            {\"name\": \"CUSTOMER_ID\", \"type\": \"VARCHAR(32)\", \"nullable\": False,\n             \"semantic_meaning\": \"customer-identifier\"},\n            {\"name\": \"EMAIL_HASH\", \"type\": \"VARCHAR(64)\", \"nullable\": True},\n            {\"name\": \"COUNTRY_CODE\", \"type\": \"VARCHAR(2)\", \"nullable\": True},\n            {\"name\": \"LOYALTY_TIER\", \"type\": \"VARCHAR(20)\", \"nullable\": True,\n             \"description\": \"Current loyalty segment used as the backfill source.\",\n             \"semantic_meaning\": \"loyalty-segment\"},\n            {\"name\": \"IS_CURRENT\", \"type\": \"BOOLEAN\", \"nullable\": False,\n             \"semantic_meaning\": \"current-row-flag\"},\n            {\"name\": \"VALID_FROM_TS\", \"type\": \"TIMESTAMP\", \"nullable\": True},\n            {\"name\": \"VALID_TO_TS\", \"type\": \"TIMESTAMP\", \"nullable\": True},\n        ],\n    },\n    \"CORE.DIM_LOYALTY_TIER\": {\n        \"description\": \"Reference dimension for loyalty tier labels, rank, and benefits.\",\n        \"primary_key\": [\"LOYALTY_TIER\"],\n        \"columns\": 
[\n            {\"name\": \"LOYALTY_TIER\", \"type\": \"VARCHAR(20)\", \"nullable\": False,\n             \"semantic_meaning\": \"loyalty-segment\"},\n            {\"name\": \"TIER_RANK\", \"type\": \"INTEGER\", \"nullable\": True},\n            {\"name\": \"TIER_DESCRIPTION\", \"type\": \"VARCHAR(255)\", \"nullable\": True},\n            {\"name\": \"ACTIVE_FLAG\", \"type\": \"BOOLEAN\", \"nullable\": True},\n        ],\n    },\n    \"CORE.FACT_ORDER\": {\n        \"description\": \"Order fact table used by revenue and customer 360 marts.\",\n        \"primary_key\": [\"ORDER_ID\"],\n        \"columns\": [\n            {\"name\": \"ORDER_ID\", \"type\": \"VARCHAR(40)\", \"nullable\": False},\n            {\"name\": \"CUSTOMER_SK\", \"type\": \"BIGINT\", \"nullable\": True},\n            {\"name\": \"ORDER_TS\", \"type\": \"TIMESTAMP\", \"nullable\": False},\n            {\"name\": \"NET_AMOUNT\", \"type\": \"NUMERIC(12,2)\", \"nullable\": True,\n             \"semantic_meaning\": \"monetary-amount\"},\n        ],\n    },\n    \"CORE.FACT_CUSTOMER_ACTIVITY\": {\n        \"description\": \"Daily customer activity fact used for retention and loyalty reporting.\",\n        \"primary_key\": [\"CUSTOMER_SK\", \"ACTIVITY_DATE\"],\n        \"columns\": [\n            {\"name\": \"CUSTOMER_SK\", \"type\": \"BIGINT\", \"nullable\": False},\n            {\"name\": \"ACTIVITY_DATE\", \"type\": \"DATE\", \"nullable\": False},\n            {\"name\": \"LOYALTY_TIER\", \"type\": \"VARCHAR(20)\", \"nullable\": True,\n             \"semantic_meaning\": \"loyalty-segment\"},\n            {\"name\": \"ORDER_COUNT\", \"type\": \"INTEGER\", \"nullable\": True},\n            {\"name\": \"NET_AMOUNT\", \"type\": \"NUMERIC(12,2)\", \"nullable\": True},\n        ],\n    },\n    \"CRM.CUSTOMER_SEGMENT_EXPORT\": {\n        \"description\": \"Activation export consumed by marketing journeys and retention campaigns.\",\n        \"primary_key\": [\"CUSTOMER_ID\"],\n        \"columns\": [\n     
       {\"name\": \"CUSTOMER_ID\", \"type\": \"VARCHAR(32)\", \"nullable\": False},\n            {\"name\": \"LOYALTY_TIER\", \"type\": \"VARCHAR(20)\", \"nullable\": True,\n             \"semantic_meaning\": \"loyalty-segment\"},\n            {\"name\": \"SEGMENT_CODE\", \"type\": \"VARCHAR(40)\", \"nullable\": True},\n            {\"name\": \"EXPORT_BATCH_ID\", \"type\": \"VARCHAR(40)\", \"nullable\": True},\n        ],\n    },\n}\n\nVIEWS = {\n    \"MARTS.VW_CUSTOMER_360\": {\"description\": \"Customer 360 view with profile, loyalty tier, and recent activity.\"},\n    \"MARTS.VW_LOYALTY_REVENUE\": {\"description\": \"Revenue by loyalty tier for dashboarding and finance checks.\"},\n    \"MARTS.VW_RETENTION_BY_TIER\": {\"description\": \"Retention metrics grouped by current loyalty tier.\"},\n}\n\n# (from_schema, from_table, from_col, to_schema, to_table, to_col)\nFOREIGN_KEYS = [\n    (\"ODS\",  \"ODS_ORDER\",              \"CUSTOMER_ID\",  \"ODS\",  \"ODS_CUSTOMER_PROFILE\", \"CUSTOMER_ID\"),\n    (\"STG\",  \"STG_CUSTOMER_PROFILE\",   \"CUSTOMER_ID\",  \"ODS\",  \"ODS_CUSTOMER_PROFILE\", \"CUSTOMER_ID\"),\n    (\"STG\",  \"STG_ORDER_ENRICHED\",     \"CUSTOMER_ID\",  \"STG\",  \"STG_CUSTOMER_PROFILE\", \"CUSTOMER_ID\"),\n    (\"CORE\", \"DIM_CUSTOMER\",           \"CUSTOMER_ID\",  \"ODS\",  \"ODS_CUSTOMER_PROFILE\", \"CUSTOMER_ID\"),\n    (\"CORE\", \"DIM_CUSTOMER\",           \"LOYALTY_TIER\", \"CORE\", \"DIM_LOYALTY_TIER\",    \"LOYALTY_TIER\"),\n    (\"CORE\", \"FACT_ORDER\",             \"CUSTOMER_SK\",  \"CORE\", \"DIM_CUSTOMER\",        \"CUSTOMER_SK\"),\n    (\"CORE\", \"FACT_CUSTOMER_ACTIVITY\", \"CUSTOMER_SK\",  \"CORE\", \"DIM_CUSTOMER\",        \"CUSTOMER_SK\"),\n    (\"CORE\", \"FACT_CUSTOMER_ACTIVITY\", \"LOYALTY_TIER\", \"CORE\", \"DIM_LOYALTY_TIER\",    \"LOYALTY_TIER\"),\n    (\"CRM\",  \"CUSTOMER_SEGMENT_EXPORT\", \"CUSTOMER_ID\",  \"ODS\",  \"ODS_CUSTOMER_PROFILE\", \"CUSTOMER_ID\"),\n]\n\nDERIVED_FROM = [\n    (\"ODS.ODS_CUSTOMER_PROFILE\",   
   \"CORE.DIM_CUSTOMER\"),\n    (\"STG.STG_CUSTOMER_PROFILE\",      \"ODS.ODS_CUSTOMER_PROFILE\"),\n    (\"STG.STG_ORDER_ENRICHED\",        \"ODS.ODS_ORDER\"),\n    (\"STG.STG_ORDER_ENRICHED\",        \"STG.STG_CUSTOMER_PROFILE\"),\n    (\"CORE.DIM_CUSTOMER\",             \"STG.STG_CUSTOMER_PROFILE\"),\n    (\"CORE.DIM_CUSTOMER\",             \"CORE.DIM_LOYALTY_TIER\"),\n    (\"CORE.FACT_ORDER\",               \"STG.STG_ORDER_ENRICHED\"),\n    (\"CORE.FACT_ORDER\",               \"CORE.DIM_CUSTOMER\"),\n    (\"CORE.FACT_CUSTOMER_ACTIVITY\",   \"CORE.FACT_ORDER\"),\n    (\"CORE.FACT_CUSTOMER_ACTIVITY\",   \"CORE.DIM_CUSTOMER\"),\n    (\"MARTS.VW_CUSTOMER_360\",         \"CORE.DIM_CUSTOMER\"),\n    (\"MARTS.VW_CUSTOMER_360\",         \"CORE.FACT_CUSTOMER_ACTIVITY\"),\n    (\"MARTS.VW_LOYALTY_REVENUE\",      \"CORE.FACT_ORDER\"),\n    (\"MARTS.VW_LOYALTY_REVENUE\",      \"CORE.DIM_LOYALTY_TIER\"),\n    (\"MARTS.VW_RETENTION_BY_TIER\",    \"CORE.FACT_CUSTOMER_ACTIVITY\"),\n    (\"MARTS.VW_RETENTION_BY_TIER\",    \"CORE.DIM_LOYALTY_TIER\"),\n    (\"CRM.CUSTOMER_SEGMENT_EXPORT\",   \"MARTS.VW_CUSTOMER_360\"),\n    (\"CRM.CUSTOMER_SEGMENT_EXPORT\",   \"MARTS.VW_RETENTION_BY_TIER\"),\n]\n\nJOINS = [\n    (\"STG.STG_ORDER_ENRICHED\",      \"STG.STG_CUSTOMER_PROFILE\"),\n    (\"CORE.FACT_ORDER\",             \"CORE.DIM_CUSTOMER\"),\n    (\"CORE.FACT_CUSTOMER_ACTIVITY\", \"CORE.DIM_CUSTOMER\"),\n    (\"CORE.FACT_CUSTOMER_ACTIVITY\", \"CORE.DIM_LOYALTY_TIER\"),\n    (\"MARTS.VW_CUSTOMER_360\",       \"CORE.DIM_CUSTOMER\"),\n    (\"MARTS.VW_LOYALTY_REVENUE\",    \"CORE.DIM_LOYALTY_TIER\"),\n    (\"MARTS.VW_RETENTION_BY_TIER\",  \"CORE.DIM_LOYALTY_TIER\"),\n]\n\nfor _tid, _meta in TABLES.items():\n    _pk = set(_meta.get(\"primary_key\", []) or [])\n    for _c in _meta[\"columns\"]:\n        _c.setdefault(\"is_primary_key\", _c[\"name\"] in _pk)\n        _c.setdefault(\"description\", None)\n        _c.setdefault(\"semantic_meaning\", None)\n\n_total_cols = 
sum(len(t[\"columns\"]) for t in TABLES.values())\n_total_pks = sum(len(t.get(\"primary_key\", []) or []) for t in TABLES.values())\nprint(f\"Seed prepared: {len(SCHEMAS)} schemas, {len(TABLES)} tables, {len(VIEWS)} views, \"\n      f\"{_total_cols} columns ({_total_pks} primary keys), {len(FOREIGN_KEYS)} FKs, \"\n      f\"{len(DERIVED_FROM)} DERIVED_FROM edges, {len(JOINS)} JOINS edges.\")\n```\n\n### 11.3) Step 2 - AI Enrichment\n\nUses the OpenAI `client` and `MODEL` already initialized in Section 1 to generate a short `semantic_meaning` tag (2-5 words, e.g. `natural-key`, `foreign-key`, `monetary-amount`, `timestamp`, `descriptive-text`) for every column whose value is currently `None`. One LLM call per column, plain-text response. The prompt is defined inline so this cell remains self-contained.\n\nCost control: capped at **`MAX_ENRICH_COLS = 30`** columns per run (override via env `SEED_AI_ENRICH_LIMIT`). Set `SEED_AI_ENRICH=0` to skip enrichment entirely. Per-column failures are caught and logged; the cell never raises. Skipped entirely if Section 11 was disabled in Step 0.\n\n``` python\nENRICH_SYSTEM = (\n    \"You are a data architect assistant. Your task is to provide concise semantic-meaning \"\n    \"tags (2-5 words) for database columns. Do not add any preamble or explanation.\"\n)\n\ndef _enrich_one(_client, _model, table_name, c):\n    user_prompt = (\n        f\"Provide a short 2-5 word semantic-meaning tag for a database column \"\n        f\"named '{c['name']}'. It is part of the table '{table_name}'. \"\n        f\"It has a data type of '{c.get('type','UNKNOWN')}'. \"\n        f\"Examples of valid tags: natural-key, foreign-key, surrogate-key, \"\n        f\"monetary-amount, timestamp, descriptive-text, category-code, count, boolean-flag. 
\"\n        f\"Return only the tag, with no quotes, no punctuation at the end, no extra prose.\"\n    )\n    try:\n        resp = _client.responses.create(\n            model=_model,\n            input=[\n                {\"role\": \"system\", \"content\": ENRICH_SYSTEM},\n                {\"role\": \"user\", \"content\": user_prompt},\n            ],\n        )\n        text = (getattr(resp, \"output_text\", None) or \"\").strip()\n        if not text:\n            for item in (getattr(resp, \"output\", []) or []):\n                for sub in getattr(item, \"content\", []) or []:\n                    t = getattr(sub, \"text\", None)\n                    if isinstance(t, str):\n                        text += t\n                    elif isinstance(sub, dict):\n                        text += sub.get(\"text\", \"\")\n        text = text.strip().strip('\"').strip(\"'\").strip(\".\")\n        return text or None\n    except Exception as e:\n        print(f\"  ! enrichment failed for {table_name}.{c['name']}: {type(e).__name__}: {e}\")\n        return None\n\ndef _run_ai_enrichment():\n    if not NEO4J_SECTION_ENABLED:\n        print(\"Section 11 disabled (see Step 0). Skipping AI enrichment.\")\n        return\n\n    enabled = (os.getenv(\"SEED_AI_ENRICH\", \"1\").strip().lower() in (\"1\", \"true\", \"yes\", \"on\"))\n    max_cols = int(os.getenv(\"SEED_AI_ENRICH_LIMIT\", \"30\"))\n\n    if not enabled:\n        print(\"AI enrichment disabled via SEED_AI_ENRICH=0. Columns will be upserted without semantic_meaning.\")\n        return\n\n    try:\n        _c = client\n        _m = MODEL\n    except NameError:\n        print(\"OpenAI client/MODEL not found. 
Run Section 1 first, then re-run this cell.\")\n        return\n\n    missing = []\n    for tid, meta in TABLES.items():\n        for col in meta[\"columns\"]:\n            if not col.get(\"semantic_meaning\"):\n                missing.append((tid, col))\n\n    cap = min(len(missing), max_cols)\n    print(f\"AI enrichment: {len(missing)} column(s) need semantic_meaning; processing first {cap} \"\n          f\"(cap via SEED_AI_ENRICH_LIMIT, model='{_m}').\")\n\n    updated = 0\n    for tid, col in missing[:cap]:\n        tag = _enrich_one(_c, _m, tid, col)\n        if tag:\n            col[\"semantic_meaning\"] = tag\n            updated += 1\n            print(f\n```\n\n", "url": "https://wpnews.pro/news/schemaflow-agentic-database-change-impact-analysis-sql-gen-and-eval-guardrails", "canonical_source": "https://developers.openai.com/cookbook/examples/partners/schemaflow_design_guide/schemaflow_cookbook", "published_at": "2026-06-14 05:35:43+00:00", "updated_at": "2026-06-14 05:59:52.569061+00:00", "lang": "en", "topics": ["ai-agents", "developer-tools", "large-language-models", "ai-tools"], "entities": ["OpenAI", "OpenAI Agents SDK", "SchemaFlow", "Promptfoo"], "alternates": {"html": "https://wpnews.pro/news/schemaflow-agentic-database-change-impact-analysis-sql-gen-and-eval-guardrails", "markdown": "https://wpnews.pro/news/schemaflow-agentic-database-change-impact-analysis-sql-gen-and-eval-guardrails.md", "text": "https://wpnews.pro/news/schemaflow-agentic-database-change-impact-analysis-sql-gen-and-eval-guardrails.txt", "jsonld": "https://wpnews.pro/news/schemaflow-agentic-database-change-impact-analysis-sql-gen-and-eval-guardrails.jsonld"}}