Stanford introduces decentralized language model framework DeLM

Stanford researchers introduced Decentralized Language Models (DeLM), a multi-agent framework that enables AI agents to coordinate without a central controller, achieving a 10.5 percentage point improvement on SWE-bench Verified while cutting per-task costs by roughly 50% to $0.12.

New multi-agent system lets AI coordinate without a central controller, cutting costs by half while boosting performance on key benchmarks

Stanford researchers have built a framework that lets multiple AI agents work together without anyone calling the shots.

The framework, called Decentralized Language Models (DeLM), was detailed in an arXiv paper published June 9 by researchers Yuzhen Mao and Azalia Mirhoseini. It achieved a 10.5 percentage point improvement on a widely used coding benchmark while cutting per-task costs by roughly 50%, to about $0.12 per task.

How DeLM actually works

Most multi-agent AI systems today rely on a central orchestrator. One main model receives a task, breaks it into pieces, assigns those pieces to worker agents, collects results, and synthesizes a final answer.

DeLM flips this architecture. Instead of a boss handing out assignments, agents asynchronously claim subtasks from a shared task queue. They also have access to a shared verified context, essentially a communal knowledge base that any agent can read from and contribute to.

The system introduces several technical components to make this work reliably. Compression mechanisms keep the shared context from ballooning out of control. Verification gates ensure that information added to the shared pool meets quality thresholds before other agents can use it. And the asynchronous task queues mean agents don’t sit idle waiting for a central controller to tell them what to do next.

The benchmark results

On SWE-bench Verified, a benchmark that tests AI systems on real-world software engineering tasks pulled from GitHub repositories, DeLM posted a 10.5 percentage point increase in Pass@4 scores over leading baselines. Pass@4 measures whether at least one of four generated solutions correctly solves the problem.
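To make the Pass@4 definition concrete, here is a minimal Python sketch. The function name `pass_at_k` and the sample outcomes are made up for illustration; they are not taken from the DeLM paper or the SWE-bench tooling, and the code simply counts a task as solved if any of its first four attempts passed.

```python
def pass_at_k(results: list[list[bool]], k: int = 4) -> float:
    """Fraction of tasks where at least one of the first k attempts passed."""
    if not results:
        raise ValueError("results must contain at least one task")
    solved = sum(1 for attempts in results if any(attempts[:k]))
    return solved / len(results)


# Illustrative data only: four tasks, four attempts each (True = tests passed).
outcomes = [
    [False, True, False, False],   # solved on the second attempt
    [False, False, False, False],  # never solved
    [True, True, False, True],     # solved several times, still counts once
    [False, False, False, True],   # solved on the last attempt
]

score = pass_at_k(outcomes)
print(f"Pass@4 = {score:.2f}")
```

Three of the four example tasks have at least one passing attempt, so the score here is 0.75. A 10.5 percentage point gain means this fraction rises by 0.105 across the benchmark's task set.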
On LongBench-v2, which evaluates performance on tasks requiring extended context processing like multi-document question answering, DeLM delivered the highest average accuracy with gains of up to 5.7 percentage points over existing approaches.

At roughly $0.12 per task, DeLM’s inference costs run about half those of comparable centralized systems.

The project is open-source, with code and documentation available through a public GitHub repository and project page at https://yuzhenmao.github.io/DeLM/.
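For readers who want a feel for the design described under "How DeLM actually works," the following is a rough Python sketch of agents claiming subtasks from a shared queue and writing to a gated, size-limited shared context. It is not DeLM's code: `verify`, `compress`, `agent`, `run`, and the `MAX_CONTEXT_ENTRIES` cap are all hypothetical stand-ins, and the "model call" is simulated.

```python
import asyncio

MAX_CONTEXT_ENTRIES = 3  # hypothetical cap standing in for context compression


def verify(note: str) -> bool:
    """Hypothetical verification gate: accept only non-empty notes."""
    return bool(note.strip())


def compress(context: list[str]) -> list[str]:
    """Hypothetical compression: keep only the most recent entries."""
    return context[-MAX_CONTEXT_ENTRIES:]


async def agent(name: str, queue: asyncio.Queue, context: list[str]) -> None:
    """Claim subtasks until the queue is empty; no orchestrator assigns work."""
    while True:
        try:
            subtask = queue.get_nowait()  # claim the next available subtask
        except asyncio.QueueEmpty:
            return
        await asyncio.sleep(0)  # stand-in for a model call; yields to other agents
        note = f"{name} finished {subtask}"
        if verify(note):  # only verified notes reach the shared context
            context.append(note)
            context[:] = compress(context)


async def run(subtasks: list[str], n_agents: int = 2) -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    for subtask in subtasks:
        queue.put_nowait(subtask)
    context: list[str] = []
    await asyncio.gather(
        *(agent(f"agent-{i}", queue, context) for i in range(n_agents))
    )
    return context


shared = asyncio.run(run(["parse issue", "locate bug", "write patch", "run tests"]))
print(shared)
```

The point of the sketch is the control flow: each agent pulls work when it is free rather than waiting for an assignment, and the shared context only grows through the gate and stays bounded. A real system would replace the simulated call with model inference and use far richer verification and compression than shown here.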