arXiv:2605.27571v1 Announce Type: new
Abstract: Modern analytics systems are fundamentally reactive, requiring users to define queries over increasingly complex and continuously evolving data. In real-time streaming environments, this paradigm breaks down, as the space of potential insights becomes too large to enumerate manually. We present a multi-agent architecture for autonomous insight discovery over real-time data streams. The system implements a continuous discovery loop in which agents generate hypotheses, compile them into executable analytics, validate generated artifacts, and produce visualizations and deployable applications. The architecture leverages Apache Kafka for event-driven coordination, Apache Flink for stream processing, and large language models to implement specialized agents. A key contribution is a contract-driven design based on typed intermediate artifacts, enabling modularity, observability, lineage, and safer execution of dynamically generated analytics. Through use cases in retail, finance, and public data, we show how this architecture supports a shift from query-driven analytics to proactive, discovery-driven systems.
Discovery Agents for Real-Time Analytics: Toward Proactive Insight Systems
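The paper presents a multi-agent architecture for autonomous insight discovery over real-time data streams, so that insights are surfaced proactively instead of waiting for users to define queries by hand. Agents built on large language models run a continuous discovery loop: they generate hypotheses, compile them into executable analytics, validate the generated artifacts, and produce visualizations and deployable applications. Apache Kafka coordinates the agents through events, and Apache Flink executes the stream processing. Typed intermediate artifacts act as contracts between agents, supporting modularity, observability, lineage, and safer execution of generated analytics. Use cases in retail, finance, and public data illustrate the shift from query-driven to discovery-driven analytics in streaming environments where the space of potential insights is too large to enumerate manually.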
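The abstract names typed intermediate artifacts as the key contribution but does not publish their schemas. The Python sketch below is therefore only an illustration, under stated assumptions, of how such a contract could gate generated analytics. It shows a hypothesis being compiled into a plan, the plan being validated against a contract before anything runs, and lineage carried through `parent_id` links. Every name in it is hypothetical: `Hypothesis`, `AnalyticsPlan`, `ValidationReport`, `SCHEMAS`, `ALLOWED_AGGREGATES`, `validate`, `render_flink_sql`, and the `orders` stream with its columns. The LLM agents and Kafka are stubbed out. The plan is only rendered as Flink SQL text using the tumbling-window table-valued function syntax and is not executed.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


def _new_id() -> str:
    return uuid.uuid4().hex


# Hypothetical artifact types; the paper's actual schemas are not given in the abstract.
@dataclass(frozen=True)
class Hypothesis:
    text: str
    stream: str
    artifact_id: str = field(default_factory=_new_id)


@dataclass(frozen=True)
class AnalyticsPlan:
    stream: str
    group_by: str
    aggregate: str          # must be one of ALLOWED_AGGREGATES
    column: str
    window_seconds: int
    parent_id: str          # lineage: the Hypothesis this plan was compiled from
    artifact_id: str = field(default_factory=_new_id)


@dataclass(frozen=True)
class ValidationReport:
    ok: bool
    errors: tuple[str, ...]
    parent_id: str          # lineage: the AnalyticsPlan that was checked


# Hypothetical contract: known stream schemas (each with an event-time column "ts")
# and the aggregates a generated plan is allowed to use.
SCHEMAS = {"orders": {"ts", "store_id", "sku", "amount"}}
ALLOWED_AGGREGATES = {"avg", "sum", "count"}


def validate(plan: AnalyticsPlan) -> ValidationReport:
    """Check a generated plan against the contract before anything is executed."""
    errors: list[str] = []
    columns = SCHEMAS.get(plan.stream)
    if columns is None:
        errors.append(f"unknown stream {plan.stream!r}")
    else:
        for col in (plan.group_by, plan.column):
            if col not in columns:
                errors.append(f"unknown column {col!r}")
    if plan.aggregate not in ALLOWED_AGGREGATES:
        errors.append(f"aggregate {plan.aggregate!r} not allowed")
    if not 1 <= plan.window_seconds <= 3600:
        errors.append("window_seconds must be in [1, 3600]")
    return ValidationReport(ok=not errors, errors=tuple(errors), parent_id=plan.artifact_id)


def render_flink_sql(plan: AnalyticsPlan) -> str:
    """Render a validated plan as a Flink SQL tumbling-window query (text only)."""
    return (
        f"SELECT window_start, window_end, {plan.group_by}, "
        f"{plan.aggregate.upper()}({plan.column}) AS metric_value "
        f"FROM TABLE(TUMBLE(TABLE {plan.stream}, DESCRIPTOR(ts), "
        f"INTERVAL '{plan.window_seconds}' SECOND)) "
        f"GROUP BY window_start, window_end, {plan.group_by}"
    )


# Usage: in the real system an LLM agent would produce these plans; here they are hand-written.
hyp = Hypothesis("Average order value may differ by store", stream="orders")
good = AnalyticsPlan("orders", "store_id", "avg", "amount", 60, parent_id=hyp.artifact_id)
bad = AnalyticsPlan("orders", "region", "median", "amount", 60, parent_id=hyp.artifact_id)

for plan in (good, bad):
    report = validate(plan)
    if report.ok:
        print(render_flink_sql(plan))
    else:
        print("rejected:", report.errors)
```

In this sketch, validation runs before rendering. As a result, only stream and column names already present in the contract are interpolated into the query string, which is one way to get the "safer execution" the abstract mentions. The `parent_id` chain from report to plan to hypothesis gives a simple form of lineage.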