SANA: What Matters for QA Agents over Massive Data Lakes?

Researchers introduced SANA, a diagnostic framework for evaluating LLM agents that perform exploratory question answering over data lakes. Tests on two benchmarks, LakeQA and KramaBench, showed data analysis to be a consistent bottleneck, while search was a major limitation in LakeQA's large data lake but less so in the smaller KramaBench. SANA enables systematic identification of failures in search, planning, data analysis, and agent policy.

arXiv:2606.13904v1. Abstract: Exploratory question answering (EQA) over data lakes requires an LLM agent to discover relevant sources, analyze retrieved data, and adapt its actions based on intermediate results. End-to-end accuracy alone cannot distinguish failures in search, planning, data analysis, or the agent's Action Policy: its decisions about what to do next and when to submit an answer. We present SANA (Search Agent Navigation Ablation framework), a diagnostic ablation framework that transforms EQA tasks into runtime profiles containing a gold source sequence, sanitized subquestions, and execution records. SANA uses these profiles to construct idealized search, planning, and data-analysis tools, allowing each component to be ablated; the residual gap is diagnostic evidence for policy failures. To illustrate SANA as a reusable evaluation framework, we adapted two recent EQA benchmarks, LakeQA and KramaBench, and evaluated lightweight and mid-sized agents under fixed prompts, budgets, data lakes, and runtimes. Across both benchmarks, data analysis is a consistent bottleneck while planning is less so. Search is a major limitation in LakeQA's large data-lake setting, but less so for the smaller-scale KramaBench. SANA thus deconstructs end-to-end task accuracies into a diagnosis of where data-lake agents fail, and allows for systematic comparisons of progress in search, planning, data analysis, and agent design.
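
The ablation logic described in the abstract can be pictured with a small Python sketch. This is not the authors' code: the names `RuntimeProfile` and `diagnose`, the field names, and the exact attribution rule are assumptions made for illustration. The sketch assumes that each component's contribution is read as the accuracy gained when that one tool is replaced by its idealized version, and that whatever gap remains when all three tools are idealized is attributed to the Action Policy.

```python
from dataclasses import dataclass, field


@dataclass
class RuntimeProfile:
    """Hypothetical container for the profile contents named in the abstract."""
    gold_sources: list[str]       # gold source sequence for the task
    subquestions: list[str]       # sanitized subquestions
    execution_records: list[dict] = field(default_factory=list)


def diagnose(baseline: float,
             oracle_runs: dict[str, float],
             all_oracle: float) -> dict[str, float]:
    """Turn end-to-end accuracies into a per-component report.

    baseline    -- accuracy of the unmodified agent
    oracle_runs -- accuracy with exactly one component idealized,
                   keyed by component name (e.g. "search")
    all_oracle  -- accuracy with every tool idealized at once

    Assumed rule: gain per component = oracle accuracy - baseline;
    the policy residual is the gap left to perfect accuracy when
    all tools are idealized.
    """
    report = {name: acc - baseline for name, acc in oracle_runs.items()}
    report["policy_residual"] = 1.0 - all_oracle
    return report


if __name__ == "__main__":
    # Placeholder accuracies chosen only to exercise the function;
    # they are NOT results reported in the paper.
    print(diagnose(
        baseline=0.25,
        oracle_runs={"search": 0.5, "planning": 0.25, "analysis": 0.75},
        all_oracle=0.875,
    ))
```

Under this reading, a large gain for one component points at that component as the bottleneck, while a large residual suggests the agent's own decisions (what to do next, when to submit) are at fault even with ideal tools.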