CONCORD: Asynchronous Sparse Aggregation for Device-Cloud RAG under Document Isolation

wpnews.pro


Source: arxiv.org · Published: 2026-06-16 04:00 UTC · Topic: large-language-models


Researchers propose CONCORD, an asynchronous sparse aggregation framework for device-cloud retrieval-augmented generation (RAG) under document isolation. It improves end-to-end throughput by up to 2.15× and cuts per-token communication by more than two orders of magnitude while maintaining answer quality.


arXiv:2606.15179v1 (announce type: new)

Abstract: Retrieval-augmented generation (RAG) has emerged as a pivotal technique for improving language models by incorporating external knowledge at inference time. As device-cloud collaborative inference makes it feasible to deploy small language models on edge devices, a new setting arises in which private documents remain on the device and public knowledge resides in the cloud. Privacy and policy constraints often forbid raw document exchange, creating a document-isolated dual-end RAG setting. However, existing methods rely on frequent remote synchronization and dense evidence transfer, limiting throughput under realistic latency and bandwidth conditions. To address this issue, we propose CONCORD, an asynchronous sparse aggregation framework for dual-end RAG under document isolation. CONCORD treats the cloud as an asynchronously arriving evidence source rather than a continuously synchronized co-generator. Specifically, we introduce waiting debt control to decide whether each decoding step should continue waiting for remote participation based on the observed return of waiting. We also design a certificate-guided minimal supplementation mechanism that requests only the remote evidence needed to determine the current greedy decision. Steps that consult the cloud preserve the same greedy token as dense dual-end aggregation, while the remaining steps commit locally without remote evidence. Experiments on Natural Questions and WikiText-2 show that CONCORD improves end-to-end throughput over baselines by 1.66× and 2.15×, respectively, while reducing per-token communication by over two orders of magnitude and maintaining comparable answer quality and perplexity.
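The abstract does not say how a greedy decision is "certified". The sketch below illustrates one way such a certificate could work; it is not CONCORD's actual mechanism. It rests on an assumption of ours: dense dual-end aggregation is taken to be a fixed-weight mixture w * p_local + (1 - w) * p_remote of the device's and the cloud's next-token distributions. Remote probabilities are non-negative and sum to 1, so any remote entry not yet fetched is at most the unseen remote mass. That yields lower and upper bounds on every token's dense score. If the leading token's lower bound beats every rival's upper bound, the step can commit with the same token dense aggregation would pick; otherwise one more remote entry is requested. The names `dense_greedy`, `certified_greedy`, `query_remote`, and the weight `w` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Assumption (not from the paper): dense aggregation is the mixture
# w * p_local + (1 - w) * p_remote over a shared vocabulary.
EPS = 1e-9  # slack so float rounding never makes an upper bound too tight


def dense_greedy(p_local: List[float], p_remote: List[float], w: float) -> int:
    """Reference: greedy token of the dense mixture (ties -> lowest token id)."""
    scores = [w * pl + (1 - w) * pr for pl, pr in zip(p_local, p_remote)]
    return max(range(len(scores)), key=lambda t: (scores[t], -t))


def certified_greedy(
    p_local: List[float], query_remote: Callable[[int], float], w: float
) -> Tuple[int, int]:
    """Return (token, number of remote entries requested).

    Hypothetical scheme: fetch remote probabilities one token at a time,
    highest local probability first, until the dense greedy token is certified.
    """
    vocab = range(len(p_local))
    order = sorted(vocab, key=lambda t: -p_local[t])
    revealed: Dict[int, float] = {}  # token id -> remote probability fetched so far
    while True:
        # Unseen remote mass bounds any single unrevealed remote probability.
        residual = min(1.0, 1.0 - sum(revealed.values()) + EPS)
        lo = [w * p_local[t] + (1 - w) * revealed.get(t, 0.0) for t in vocab]
        hi = [
            w * p_local[t] + (1 - w) * (revealed[t] if t in revealed else residual)
            for t in vocab
        ]
        best = max(vocab, key=lambda t: (lo[t], -t))
        # Certificate: the leader's lower bound beats every rival's upper bound.
        if all(lo[best] > hi[t] for t in vocab if t != best):
            return best, len(revealed)
        if len(revealed) == len(p_local):
            return best, len(revealed)  # all fetched: lo == hi, exact dense answer
        nxt = next(t for t in order if t not in revealed)
        revealed[nxt] = query_remote(nxt)


if __name__ == "__main__":
    p_local = [0.9, 0.05, 0.05]
    remote = [0.1, 0.6, 0.3]  # held by the "cloud", fetched entry by entry
    print(certified_greedy(p_local, lambda t: remote[t], w=0.8))  # (0, 0)
    print(certified_greedy(p_local, lambda t: remote[t], w=0.3))  # (1, 2)
```

With most weight on the device (w = 0.8), the local margin alone certifies token 0 and nothing is sent to the cloud. With more weight on the cloud (w = 0.3), two of the three remote entries are fetched before token 1 is certified, and the result still matches `dense_greedy`. A real system would exchange evidence in a form that respects document isolation; this toy only shows the bounding logic.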

source & further reading

Read original on arxiv.org → arxiv.org/abs/2606.15179



