{"slug": "autotts-reduces-token-usage-by-69-5-in-llm-reasoning-strategies", "title": "AutoTTS reduces token usage by 69.5% in LLM reasoning strategies", "summary": "Researchers from Meta, Google, the University of Maryland, University of Virginia, Washington University in St. Louis, and the University of North Carolina developed AutoTTS, a framework that automatically discovers optimal reasoning strategies for large language models. The framework reduced token consumption by 69.5% during inference compared to a strong baseline method while maintaining accuracy, and the entire discovery process cost $39.90 and took 160 minutes.", "body_md": "# AutoTTS reduces token usage by 69.5% in LLM reasoning strategies\n\nResearchers from Meta, Google, and top universities built a framework that automatically discovers cheaper ways for AI models to think, and it cost less than $40 to run.\n\nA coalition of researchers from Meta, Google, and four major universities has found a way to cut the tokens large language models burn while reasoning by nearly 70% compared with a strong baseline. The framework, called AutoTTS, automates the discovery of optimal reasoning strategies instead of relying on humans to hand-tune them.\n\nIn practical terms, that means AI models can maintain their accuracy while using far fewer tokens during inference, the stage where models actually generate answers.\n\n## How AutoTTS works, and why it matters\n\nTest-time scaling, or TTS, is a technique that gives language models extra compute cycles when they’re generating responses. More thinking time generally means better answers, but it also means higher costs.\n\nThe problem is that TTS strategies have typically been handcrafted. Human researchers would manually dictate how a model should allocate its reasoning budget, relying on intuition and trial and error. 
AutoTTS flips that process on its head.\n\nThe framework, detailed in a May 2026 arXiv paper titled “LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling,” reframes TTS design as controller synthesis over pre-collected reasoning trajectories. Instead of humans writing the rules, a coding agent explores a pre-built offline environment to find the most efficient reasoning strategies automatically.\n\nThat coding agent is Claude Code, which navigates what the researchers call an “offline replay environment” using reasoning trajectories and probe signals to identify effective controllers. The framework uses beta parameterization and fine-grained execution trace feedback to make the search process efficient.\n\nThe results are striking. Compared to SC@64, a strong baseline method, AutoTTS cut aggregate token consumption by roughly 69.5% at a beta value of about 0.5. Despite slashing token usage by more than two-thirds, it maintained an average held-out accuracy of 45.3, compared to 45.2 for the baseline.\n\nThe entire discovery process cost $39.90 and took about 160 minutes.\n\n## Who built it and how well it generalizes\n\nThe research is a collaborative effort among the University of Maryland, University of Virginia, Washington University in St. Louis, University of North Carolina, Google, and Meta.\n\nThe strategies AutoTTS discovers transfer across different model scales and to held-out benchmarks including AIME24/25 and HMMT25, which are mathematical reasoning tests commonly used to evaluate advanced LLM capabilities.\n\nThe code and data have been made publicly available on GitHub under the repository zhengkid/AutoTTS.", "url": "https://wpnews.pro/news/autotts-reduces-token-usage-by-69-5-in-llm-reasoning-strategies", "canonical_source": "https://cryptobriefing.com/autotts-reduces-llm-token-usage/", "published_at": "2026-05-28 21:46:00+00:00", "updated_at": "2026-05-28 22:07:42.249615+00:00", "lang": "en", "topics": ["large-language-models", "artificial-intelligence", "machine-learning", "ai-research", "ai-infrastructure"], "entities": ["Meta", "Google", "AutoTTS"], "alternates": {"html": "https://wpnews.pro/news/autotts-reduces-token-usage-by-69-5-in-llm-reasoning-strategies", "markdown": "https://wpnews.pro/news/autotts-reduces-token-usage-by-69-5-in-llm-reasoning-strategies.md", "text": "https://wpnews.pro/news/autotts-reduces-token-usage-by-69-5-in-llm-reasoning-strategies.txt", "jsonld": "https://wpnews.pro/news/autotts-reduces-token-usage-by-69-5-in-llm-reasoning-strategies.jsonld"}}