{"slug": "executable-schema-contracts-from-automatic-ingestion-to-multi-source-retrieval", "title": "Executable Schema Contracts: From Automatic Ingestion to Multi-Source Retrieval", "summary": "Researchers have developed a system that automatically discovers an executable schema from raw multi-source data, using it as a shared contract for knowledge graph construction and query-time retrieval. The system improves over retrieval-only and decomposition-based baselines across four QA benchmarks, with schema-conditioned routing, structural intelligence, and schema-guided construction each contributing to the gains.", "body_md": "arXiv:2606.05415v1\nAbstract: Real-world data spans tables, documents, and semi-structured files with implicit semantics. Querying this data requires integrating evidence across inconsistent schemas and formats, yet existing approaches either demand costly manual engineering or bypass structure entirely. We present a system that automatically discovers an executable schema from raw multi-source data and uses it as a shared contract for knowledge graph construction and query-time retrieval. A closed-world field catalog constrains LLM-based schema discovery to attested fields; deterministic structural analysis infers identity keys, foreign keys, and source hierarchy; and the resulting schema drives extraction, deduplication, and cross-source linking into a provenance-aware knowledge graph. At query time the schema -- optionally extended via a monotonic protocol -- conditions a multi-tool agent routing retrieval across structured lookup, graph traversal, and vector search, returning grounded answers with traceable citations. 
In controlled zero-shot comparisons using the same LLM, data, and evaluation harness, the system improves over retrieval-only and decomposition-based baselines across four QA benchmarks, with ablations showing that schema-conditioned routing, structural intelligence, and schema-guided construction each contribute to the gains.", "url": "https://wpnews.pro/news/executable-schema-contracts-from-automatic-ingestion-to-multi-source-retrieval", "canonical_source": "https://arxiv.org/abs/2606.05415", "published_at": "2026-06-05 04:00:00+00:00", "updated_at": "2026-06-05 04:21:30.542998+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-agents", "natural-language-processing", "ai-research"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/executable-schema-contracts-from-automatic-ingestion-to-multi-source-retrieval", "markdown": "https://wpnews.pro/news/executable-schema-contracts-from-automatic-ingestion-to-multi-source-retrieval.md", "text": "https://wpnews.pro/news/executable-schema-contracts-from-automatic-ingestion-to-multi-source-retrieval.txt", "jsonld": "https://wpnews.pro/news/executable-schema-contracts-from-automatic-ingestion-to-multi-source-retrieval.jsonld"}}