{"slug": "tokenmaxxing-is-so-over-it-s-all-about-modelmaxxing-now", "title": "Tokenmaxxing is so over. It's all about modelmaxxing now.", "summary": "AI startups and Big Tech companies are shifting from 'tokenmaxxing'—encouraging maximum AI usage—to 'modelmaxxing,' a cost-saving strategy of routing simple tasks to cheaper models and complex ones to premium models. Bold Metrics CTO Morgan Linton and Coinbase CEO Brian Armstrong are among those advocating this approach to cut AI spending as companies impose usage caps.", "body_md": "Twice a week, Morgan Linton tells his 16 engineers which AI models to use and when.\n\nBusiness Insider spoke to Linton, the Lake Tahoe-based chief technology officer of AI startup Bold Metrics, 50 minutes before his engineering team's standup. He planned to tell one team to use [Claude Fable](https://www.businessinsider.com/anthropic-disable-mythos-fable-us-export-control-national-security-2026-6) on low, and another to use GPT-5.5 on high. A third is using Cursor with Composer 2.5 and getting \"totally perfect results,\" he said.\n\nBeing specific about model use means Linton doesn't have to set hard token caps.\n\n\"My team is getting to use the best stuff, but they're using it a lot more efficiently,\" he said.\n\nThe first half of 2026 was characterized by one word in the AI community: tokenmaxxing — referring to companies urging their employees to use AI as much as possible. But after [reviewing the AI bills](https://www.businessinsider.com/cfo-power-brokers-ai-era-2026-6) their employees were racking up, companies from [Uber](https://www.businessinsider.com/uber-coo-andrew-macdonald-ai-token-spending-harder-justify-2026-5) to Microsoft are taking a more considered approach.\n\nFounders, software engineers, UX designers, and even non-technical [vibe-coding enthusiasts](https://www.businessinsider.com/build-app-no-coding-skills-ai-lovable-replit-claude) are catching on to one cost-saving hack: model switching. 
They route their most difficult, intellectually challenging tasks to pricier frontier models and offload easier, repetitive tasks to older and cheaper ones.\n\nAnd as companies cut back on AI budgets and impose usage caps, this token hygiene tactic could help you get more bang for your buck.\n\n**Goodbye, tokenmaxxing**\n\nThere are, of course, good reasons to use the most recent model. OpenAI's Kaylin Voss wrote on LinkedIn that better models \"reduce retries, supervision, and wasted effort.\"\n\nBut some tasks simply don't merit the costs. [Coinbase CEO Brian Armstrong](https://www.businessinsider.com/coinbase-ceo-ai-cost-savings-strategy-token-costs-2026-6) was one of the first to put it into words in an X post on June 7.\n\n\"80% of workloads will be running on 99% cheaper models within 12-18 months,\" he wrote, adding that the other 20% will continue to run on the latest models where \"IQ maxxing is important.\"\n\nChris Maconi was never a fan of tokenmaxxing. The Huntsville-based cofounder of the AI startup Hechura said he runs his company with a \"human-in-the-loop\" attitude, and isn't setting up overnight bots to keep on coding. Model choice is part of this [anti-tokenmaxxing outlook](https://www.businessinsider.com/pylon-ceo-tokenmaxxing-era-coming-to-end-ai-spend-limits-2026-6).\n\nMaconi remembers the OpenClaw hype cycle — a [Mac Mini-encapsulated AI agent](https://www.businessinsider.com/apple-mac-mini-having-a-moment-openclaw-craze-2026-2) that was especially token-burning, given its 24/7 use and broad autonomy. 
When he set up his OpenClaw, Maconi started with cheap Gemini models before switching to Anthropic's Haiku.\n\n\"I'm not afraid to go and try some of these lower-end models to see if they can provide the intelligence that we need,\" Maconi said.\n\n**Stretching their tokens in creative ways**\n\nTanvi Pisal, a 29-year-old Big Tech user-experience designer, said she learned the hard way to use models more efficiently.\n\nPisal uses tools such as [Figma](https://www.businessinsider.com/figma-stock-sinks-google-vibe-design-stitch-ai-tool-2026-3), ChatGPT, and Claude to brainstorm and formulate product requirements documents. She has a company subscription to ChatGPT and pays for the basic $20/month Claude Pro package. At the start, she said she would use Claude to brainstorm the UX from scratch, a process in which she \"wasted months of tokens\" and still didn't finish the task.\n\n\"So now what I do is I design everything in Figma first, then I put those screenshots into Claude. I tell Claude to keep the UI as is and build the entire functionality and flow,\" Pisal added. \"Doing this design-first process really helps me save tokens.\"\n\nShe also chooses to brainstorm ideas with ChatGPT — which is free for her thanks to her enterprise plan — then takes the refined ideas to Claude to create more polished documents.\n\nAlejandra Thomas, a software engineer and tech content creator based in New York City, said she runs tests on every new model released to see what each is good at.\n\n\"I try not to use the most expensive or advanced model just because it's available. For simple tasks, I always use lighter models or none at all,\" Thomas said.\n\nEd Stevens, the CEO of AI sales company Scoot, said that he likes to \"pick a horse and ride it.\" His engineers will land on a model, try it for a few months, and then determine if it's up to snuff. 
If there's a shiny new model — or if they think they can achieve the same for cheaper — they change horses, Stevens said.\n\nThe idea of squeezing the juice out of each token exemplifies the scarcity mindset, according to Dan Ariely, a behavioral economics researcher and professor at Duke University.\n\nAriely said token budgets remind him of cellphones back in the day, when they came with a limited number of minutes of talk time. He said people would try to max out their minutes at the end of the month, even if that meant calling people they didn't really want to.\n\n\"Tokens create a model of scarcity where people can't use as much as they want. It creates a target for use, and it creates a psychology of waste if people don't reach their target,\" he said. He added that because they don't want to go over the limit and pay extra per use, users switch to models from other companies to save cash once they've hit the token ceiling.\n\n**There's a tool for that**\n\nIf AI modelmaxxing sounds exhausting, the good news is you don't have to make these switching decisions on your own.\n\nModel routing startups are all the rage. These companies provide software that assigns tasks to specific models — sometimes including open-source — based on complexity. They're a venture hit, with startups like OpenRouter being [showered with cash](https://www.businessinsider.com/ai-routing-startups-openrouter-concentrate-funding-boom-2026-6).\n\nDavid Gilmore runs one of these companies, Rayline. His tool intercepts requests and determines whether they could go to cheaper, often open-source models. Many of his firm's clients fall prey to the \"FOMO moment,\" he said. Then, they get their API bill and realize they need to scale back.\n\nThe number of firms using a routing platform is inching up, Ramp's lead economist, Ara Kharazian, [told Business Insider](https://www.businessinsider.com/ai-token-economy-spending-workplace-budgets-usage-caps-software-engineer-2026-6). 
Last year, Kharazian found that around 1% of firms used a model router; this year, it's 5%.\n\nThe San Francisco-based investment firm BlockSpaceForce uses OpenRouter, Fireworks, and Together AI. Spencer Yang, its managing partner, also advocated asking a cheaper model first whether a more expensive one would be needed for your task.\n\n\"The models themselves are actually getting really good at assessing their own complexity,\" Yang said.\n\nSome companies continue to default to using the most recent, highest-costing models. Hechura cofounder Maconi pegged it to laziness.\n\n\"People don't want to do the hard work of understanding which models are good at which things,\" he said. \"They just want to ride the hype train.\"", "url": "https://wpnews.pro/news/tokenmaxxing-is-so-over-it-s-all-about-modelmaxxing-now", "canonical_source": "https://www.businessinsider.com/ai-model-routing-modelmaxxing-efficient-token-use-2026-7", "published_at": "2026-07-04 08:30:01+00:00", "updated_at": "2026-07-04 08:58:33.375610+00:00", "lang": "en", "topics": ["artificial-intelligence", "ai-products", "ai-infrastructure", "ai-startups", "ai-tools"], "entities": ["Morgan Linton", "Bold Metrics", "Claude", "GPT-5.5", "Cursor", "Coinbase", "Brian Armstrong", "Hechura"], "alternates": {"html": "https://wpnews.pro/news/tokenmaxxing-is-so-over-it-s-all-about-modelmaxxing-now", "markdown": "https://wpnews.pro/news/tokenmaxxing-is-so-over-it-s-all-about-modelmaxxing-now.md", "text": "https://wpnews.pro/news/tokenmaxxing-is-so-over-it-s-all-about-modelmaxxing-now.txt", "jsonld": "https://wpnews.pro/news/tokenmaxxing-is-so-over-it-s-all-about-modelmaxxing-now.jsonld"}}