{"slug": "xai-retired-8-grok-models-on-may-15-the-slugs-still-resolve-so-your-bill-and", "title": "xAI retired 8 Grok models on May 15 — the slugs still resolve, so your bill and output quality changed silently", "summary": "On May 15, 2026, xAI retired eight Grok API model slugs, including popular ones like `grok-code-fast-1` and `grok-4-fast-reasoning`, but the slugs continue to resolve without breaking code, meaning requests still return a 200 status with no error or warning. However, the retired models silently redirect to `grok-4.3` at higher pricing and with reduced reasoning effort, causing unexpected cost increases and degraded output quality that are not immediately detectable through standard monitoring or dashboards.", "body_md": "On **May 15, 2026 at 12:00 PM PT**, xAI retired eight model slugs from the Grok API:\n\n- `grok-4-1-fast-reasoning`\n- `grok-4-1-fast-non-reasoning`\n- `grok-4-fast-reasoning`\n- `grok-4-fast-non-reasoning`\n- `grok-4-0709`\n- `grok-code-fast-1`\n- `grok-3`\n- `grok-imagine-image-pro`\n\nHere is the line from xAI's migration notice that makes this dangerous:\n\n> The slugs themselves continue to resolve, so you do not need to change your code to avoid breakage.\n\nThat sounds reassuring. It is the opposite of reassuring. \"You do not need to change your code\" is exactly why most teams *didn't*, and a retirement that requires no code change is a retirement that ships no signal. Nothing 404s. No SDK exception. No deploy. The same request you sent on May 14 still returns `200` on May 16. What changed is underneath the slug, and none of the usual alarms are wired to it.\n\nHere is the silent-fail surface we keep seeing on review.\n\n## 1. `grok-code-fast-1` now bills at grok-4.3 rates, and that's your highest-volume slug\n\n`grok-code-fast-1` was xAI's cheap, fast, coding-optimized model. 
Its entire reason to exist was running a lot of tokens for a little money: agentic coding loops, refactor passes, repo-wide edits, autocomplete backends. High call volume, low unit price. That's the slug people deliberately picked *because* it was cheap.\n\nAfter May 15, requests to `grok-code-fast-1` redirect to `grok-4.3`, billed at grok-4.3's rate of **$1.25 per 1M input tokens and $2.50 per 1M output tokens**. That is flagship pricing, not the fast-tier pricing you chose. The redirect is the worst possible combination: it lands hardest on the slug with the highest token throughput, and it produces no error, no warning, no changed status code. The first signal is the invoice, and the invoice arrives weeks late.\n\nIf you run agentic coding on Grok, this is not a \"review next sprint\" item. Your cost per run changed on May 15 and your monitoring almost certainly didn't notice, because cost-per-token isn't something most teams alert on until finance asks a question.\n\n## 2. The reasoning slugs are now answering at `low` effort\n\nThe redirect is not a clean one-to-one swap. xAI maps the retired slugs onto grok-4.3 with a *reduced* reasoning setting:\n\n- Every retired **reasoning** slug (`grok-4-fast-reasoning`, `grok-4-1-fast-reasoning`) → `grok-4.3` with `low` reasoning effort.\n- Every retired **non-reasoning** slug → `grok-4.3` with `none` reasoning effort.\n\nIf you picked `grok-4-fast-reasoning` specifically because a task needed the model to think (structured extraction, multi-step tool planning, anything where you traded latency for correctness), you are now getting `low` effort by default. The model still answers. The answer is still well-formed JSON, still parses, still passes your schema validation. 
It's just measurably worse on the hard cases, and there is no field in the response that says \"I thought less about this than I used to.\" Your eval suite is the only thing that would catch it, and only if you re-ran it after May 15, which nobody schedules, because nothing told them to.\n\nThis is the textbook drift shape: a valid-looking response that is a correct answer to a *different question* than the one your code thinks it asked.\n\n## 3. Cost-attribution dashboards now lie\n\nA lot of teams tag spend by the model slug they send: a `model` dimension on a metrics counter, a column in a usage table, a group-by in the monthly cost rollup. Those dashboards key off *the string you sent*, not the model that actually ran.\n\nPost-May-15, your dashboard still shows a tidy line item for `grok-code-fast-1` at the old unit price in your own math, while xAI bills the account at grok-4.3 rates. Internal cost attribution and the actual bill have silently diverged. Every \"cost per feature\" or \"margin per customer\" number that flows from that slug is now wrong, and it will stay wrong until someone reconciles the xAI invoice against the dashboard by hand and notices the totals don't match.\n\n## 4. `grok-imagine-image-pro` is a different image model now\n\n`grok-imagine-image-pro` redirects to `grok-imagine-image-quality`. That is a different image model, not a renamed one. Anything downstream that made assumptions about the old model's output (dimensions, style, latency budget, cost per image, safety-filter behavior) is now receiving output from a different generator in the same pipeline, with no version bump. Image pipelines are especially exposed here because the output \"looks fine\" to code; only a human comparing before/after notices the model changed.\n\n## 5. 
Fallback chains lost their cheap degraded mode\n\nRouters built during past provider incidents tend to look like this:\n\n```\nprimary: grok-4.3\nfallback:\n  - grok-4-fast-non-reasoning   # cheap degraded mode\n  - grok-3\n```\n\nThe intent was: if the primary is rate-limited or down, drop to a cheaper model and keep serving. After May 15 both fallback entries resolve to `grok-4.3`. The \"cheap degraded mode\" is now full-price grok-4.3, so at the exact moment you fail over under load, your per-request cost stays at flagship rates instead of dropping, with no error and no log line saying the cheap path is gone. Incident plus silent cost blowout, stacked.\n\n## 6. Pinned eval baselines now track a moving target\n\nIf you run regression evals against a fixed model slug, which is standard practice for catching prompt regressions, you have `grok-4-fast-reasoning` or similar hardcoded in the harness. That pin was the whole point: a stable baseline to diff prompt changes against.\n\nAfter May 15 the pin resolves to `grok-4.3` at `low` effort. Your \"stable baseline\" moved. Every prompt-change diff you run against it from now on is measuring two variables at once, your prompt edit *and* a model swap you didn't make, and the harness has no idea, because the slug string in the config is unchanged.\n\n## What to actually do\n\nThe migration itself is small. The detection is the hard part, because there is no schema diff to catch at review time and no error to alert on.\n\n- **Grep every repo, IaC file, notebook, and prompt config** for the retired slugs:\n\n  ```\n  git grep -nE \"grok-(4-1-fast-(reasoning|non-reasoning)|4-fast-(reasoning|non-reasoning)|4-0709|code-fast-1|3|imagine-image-pro)\"\n  ```\n\n  Include eval harnesses, fallback/router configs, and cost-attribution code, not just your main call sites. Those three are where this hides.\n\n- **Pin `grok-4.3` explicitly and choose your reasoning effort.** Don't keep riding the redirect. The redirect picks `low`/`none` for you; only an explicit `grok-4.3` call with an explicit effort level (`none`/`low`/`medium`/`high`) puts the quality/cost tradeoff back in your hands.\n\n- **Re-run your evals after switching**, and treat any pinned-baseline eval as invalidated as of May 15. Capture a fresh baseline against an explicit model+effort you control.\n\n- **Reconcile one xAI invoice line by line** against your internal cost dashboard. If they don't match, your attribution is keying off the sent slug and needs to key off actual billed usage.\n\n- **Add a cost-per-token alert**, not just a request-count alert. This entire class of failure is invisible to availability monitoring and visible only to spend monitoring.\n\nThe reason this one is worth a sprint and not a backlog ticket: every other model retirement this year threw an error eventually. This one is engineered specifically *not* to. \"Your code keeps working\" is the failure mode, not the mitigation.\n\n[FlareCanary](https://flarecanary.com) watches your third-party APIs and SDKs for breaking changes like this one, including model retirements, silent slug redirects, and pricing-tier remaps, and surfaces them before the invoice does. 
Free tier monitors 5 endpoints.", "url": "https://wpnews.pro/news/xai-retired-8-grok-models-on-may-15-the-slugs-still-resolve-so-your-bill-and", "canonical_source": "https://dev.to/flarecanary/xai-retired-8-grok-models-on-may-15-the-slugs-still-resolve-so-your-bill-and-output-quality-26jd", "published_at": "2026-05-20 05:00:38+00:00", "updated_at": "2026-05-20 05:38:48.110301+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "developer-tools", "products", "enterprise-software"], "entities": ["xAI", "Grok", "Grok API"], "alternates": {"html": "https://wpnews.pro/news/xai-retired-8-grok-models-on-may-15-the-slugs-still-resolve-so-your-bill-and", "markdown": "https://wpnews.pro/news/xai-retired-8-grok-models-on-may-15-the-slugs-still-resolve-so-your-bill-and.md", "text": "https://wpnews.pro/news/xai-retired-8-grok-models-on-may-15-the-slugs-still-resolve-so-your-bill-and.txt", "jsonld": "https://wpnews.pro/news/xai-retired-8-grok-models-on-may-15-the-slugs-still-resolve-so-your-bill-and.jsonld"}}