# DoorDash Builds Open Data Architecture for Agentic AI

> Source: <https://letsdatascience.com/news/doordash-builds-open-data-architecture-for-agentic-ai-072f8fe4>
> Published: 2026-06-03 21:52:21.156352+00:00


SiliconANGLE reports that **DoorDash** has spent roughly **10 years** building an open data architecture based on open storage, open compute and compute-agnostic design to support real-time logistics and emerging machine-led workflows. Jajoo, head of data engineering, data platform and business intelligence at DoorDash, said "the machine user is outpacing the human user in consumption of analytics data." According to SiliconANGLE, Snowflake executives credited **Apache Iceberg** with reducing data movement, lowering latency and cutting infrastructure cost, with Snowflake's Child saying, "It's cheaper. It's faster." Separately, ByteByteGo reports that DoorDash built an LLM testing system to evaluate chatbots after its customer support assistant showed subtle hallucinations; ByteByteGo notes that DoorDash handles "hundreds of thousands of support contacts every day." Editorial analysis: these two threads (an open, Iceberg-backed data estate and purpose-built LLM testing) illustrate how large logistics platforms combine data plumbing and evaluation tooling to scale agentic AI safely and cost-effectively.

### What happened

SiliconANGLE reports that **DoorDash** has spent about **10 years** building an open data architecture founded on open storage, open compute and compute-agnostic design to serve consumers, merchants and delivery workers across its real-time logistics operations. Jajoo said, "The ML features, the feedback loops to production services or the AI agent workflows are outpacing the analytics user." According to SiliconANGLE, Snowflake executives say DoorDash adopted **Apache Iceberg** across its estate to cut data-movement costs and latency; Snowflake's Child said, "It's cheaper. It's faster. They have lower latency on data because there's not these movement steps in between."

### Technical details

ByteByteGo reports that DoorDash built an LLM testing system after observing subtle hallucinations in its customer support chatbot, and shows examples in which the model misread order-status fields and recommended incorrect refund policies. ByteByteGo notes that DoorDash handles "hundreds of thousands of support contacts every day," a volume that motivated investment in automated testing for model deployments.
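To make the testing idea concrete, here is a minimal Python sketch of a scenario-driven check, assuming each test scenario pairs a ground-truth order record with the chatbot's reply. It does not reproduce DoorDash's system: the `OrderState`, `Scenario`, `check_response`, `run_suite` and `fake_bot` names, the status vocabulary and the keyword-based refund rule are all hypothetical, and plain string matching stands in for whatever evaluators a production harness would use. It targets the two failure types described above: a reply that contradicts the order-status field, and a reply that offers a refund the policy does not allow.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical status vocabulary; a real system would use its own order-state enum.
KNOWN_STATUSES = {"delivered", "in_transit", "cancelled"}


@dataclass(frozen=True)
class OrderState:
    """Ground-truth order record the chatbot's reply is checked against (illustrative fields)."""
    order_id: str
    status: str
    refund_eligible: bool


@dataclass(frozen=True)
class Scenario:
    """One test case: a named user message paired with the true order state."""
    name: str
    order: OrderState
    user_message: str


def check_response(scenario: Scenario, response: str) -> list[str]:
    """Return failure reasons for one reply; an empty list means it passed."""
    failures = []
    text = response.lower()
    # Flag any status the reply mentions that contradicts the ground-truth status field.
    for status in sorted(KNOWN_STATUSES - {scenario.order.status}):
        if status.replace("_", " ") in text:
            failures.append(f"mentions wrong status '{status}'")
    # Flag a refund mention on a non-eligible order unless the reply says it is not eligible.
    if not scenario.order.refund_eligible and "refund" in text and "not eligible" not in text:
        failures.append("offers a refund the policy does not allow")
    return failures


def run_suite(bot: Callable[[Scenario], str], scenarios: list[Scenario]) -> dict[str, list[str]]:
    """Run every scenario through `bot` and map each scenario name to its failures."""
    return {s.name: check_response(s, bot(s)) for s in scenarios}


def fake_bot(scenario: Scenario) -> str:
    # Hypothetical stand-in for a real chatbot call; it deliberately misreads the status field.
    return "Your order is in transit and we have issued a refund."


if __name__ == "__main__":
    scenario = Scenario(
        name="delivered_no_refund",
        order=OrderState("o-123", "delivered", refund_eligible=False),
        user_message="Where is my order?",
    )
    for name, failures in run_suite(fake_bot, [scenario]).items():
        print(name, "PASS" if not failures else failures)
```

Keyword matching is brittle (it misses paraphrases and handles only one phrasing of a refusal), so a real harness would likely swap in stronger checks. The useful part of the structure is that each scenario carries its own ground truth, so a regression surfaces as a named, reproducible failure rather than an anecdote.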

### Editorial analysis

At petabyte scale, organizations increasingly prefer open data formats and compute-agnostic stacks so they do not have to copy data repeatedly between systems. The DoorDash reporting fits a broader enterprise move toward table formats such as **Apache Iceberg** to reduce egress, lower latency and let engineers focus on application logic rather than data plumbing.

### Context and significance

Editorial analysis: for practitioners, the two stories together highlight two operational pillars required to scale agentic AI in production: (1) an open, queryable data substrate that minimizes expensive hops and keeps fresh state available to models, and (2) robust, scenario-driven testing and cost telemetry for LLM behavior in production. Both pillars address operational cost and model reliability, but in different parts of the stack.
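The cost-telemetry half of the second pillar can be sketched as a regression check on mean cost per request. This is a minimal Python sketch under stated assumptions: token counts are logged per request, prices are flat per million tokens, and the `Usage`, `request_cost`, `mean_cost` and `cost_regressed` names, the example prices and the 10% tolerance are hypothetical placeholders, not any vendor's actual rates or DoorDash's tooling.

```python
from dataclasses import dataclass

# Hypothetical flat prices in USD per million tokens; replace with your provider's real rates.
PRICE_PER_M_INPUT = 1.00
PRICE_PER_M_OUTPUT = 4.00


@dataclass(frozen=True)
class Usage:
    """Token counts logged for one LLM request."""
    input_tokens: int
    output_tokens: int


def request_cost(u: Usage) -> float:
    """Dollar cost of one request under the flat per-million-token prices above."""
    return (u.input_tokens * PRICE_PER_M_INPUT + u.output_tokens * PRICE_PER_M_OUTPUT) / 1_000_000


def mean_cost(usages: list[Usage]) -> float:
    """Average cost per request; raises on an empty window rather than dividing by zero."""
    if not usages:
        raise ValueError("need at least one request to compute a mean cost")
    return sum(request_cost(u) for u in usages) / len(usages)


def cost_regressed(baseline: list[Usage], candidate: list[Usage], tolerance: float = 0.10) -> bool:
    """True if the candidate's mean cost per request exceeds the baseline's by more than `tolerance`."""
    return mean_cost(candidate) > mean_cost(baseline) * (1 + tolerance)


if __name__ == "__main__":
    baseline = [Usage(1000, 200), Usage(1200, 180)]
    candidate = [Usage(1000, 400), Usage(1200, 390)]  # e.g. a prompt change that doubled reply length
    print("regression" if cost_regressed(baseline, candidate) else "ok")
```

Comparing means over windows of requests, rather than single calls, keeps one unusually long conversation from tripping the check. A fixed fractional tolerance is the simplest threshold; a production monitor might compare percentiles instead.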

### What to watch

Industry context: observers should track (a) adoption of open table formats and compute-agnostic tooling across other logistics and marketplace platforms, and (b) whether more teams pair data-platform investments with systematic LLM testing and spend monitoring to catch subtle hallucinations and cost regressions early. The reporting does not include public statements from DoorDash about its future roadmap or hiring related to these efforts.

## Scoring Rationale

The story shows a notable infrastructure pattern for scaling agentic AI at enterprise scale: open data formats plus model testing. It is relevant to practitioners managing petabyte-scale data and production LLMs, but it is not a frontier-model or industry-changing release.

