# xAI retired 8 Grok models on May 15 — the slugs still resolve, so your bill and output quality changed silently

> Source: <https://dev.to/flarecanary/xai-retired-8-grok-models-on-may-15-the-slugs-still-resolve-so-your-bill-and-output-quality-26jd>
> Published: 2026-05-20 05:00:38+00:00

On **May 15, 2026 at 12:00 PM PT**, xAI retired eight model slugs from the Grok API:

`grok-4-1-fast-reasoning`

`grok-4-1-fast-non-reasoning`

`grok-4-fast-reasoning`

`grok-4-fast-non-reasoning`

`grok-4-0709`

`grok-code-fast-1`

`grok-3`

`grok-imagine-image-pro`

Here is the line from xAI's migration notice that makes this dangerous:

The slugs themselves continue to resolve, so you do not need to change your code to avoid breakage.

That sounds reassuring. It is the opposite of reassuring. "You do not need to change your code" is exactly why most teams *didn't* — and a retirement that requires no code change is a retirement that ships no signal. Nothing 404s. No SDK exception. No deploy. The same request you sent on May 14 still returns `200`

on May 16. What changed is underneath the slug, and none of the usual alarms are wired to it.

Here is the silent-fail surface we keep seeing on review.

## 1. `grok-code-fast-1`

now bills at grok-4.3 rates — and that's your highest-volume slug

`grok-code-fast-1`

was xAI's cheap, fast, coding-optimized model. Its entire reason to exist was running a lot of tokens for a little money — agentic coding loops, refactor passes, repo-wide edits, autocomplete backends. High call volume, low unit price. That's the slug people deliberately picked *because* it was cheap.

After May 15, requests to `grok-code-fast-1`

redirect to `grok-4.3`

, billed at grok-4.3's rate of **$1.25 per 1M input tokens and $2.50 per 1M output tokens** — flagship pricing, not the fast-tier pricing you chose. The redirect is the worst possible combination: it lands hardest on the slug with the highest token throughput, and it produces no error, no warning, no changed status code. The first signal is the invoice, and the invoice arrives weeks late.

If you run agentic coding on Grok, this is not a "review next sprint" item. Your cost per run changed on May 15 and your monitoring almost certainly didn't notice, because cost-per-token isn't something most teams alert on until finance asks a question.

## 2. The reasoning slugs are now answering at `low`

effort

The redirect is not a clean one-to-one swap. xAI maps the retired slugs onto grok-4.3 with a *reduced* reasoning setting:

- Every retired
**reasoning** slug (`grok-4-fast-reasoning`

,`grok-4-1-fast-reasoning`

) →`grok-4.3`

withreasoning effort.`low`

- Every retired
**non-reasoning** slug →`grok-4.3`

withreasoning effort.`none`

If you picked `grok-4-fast-reasoning`

specifically because a task needed the model to think — structured extraction, multi-step tool planning, anything where you traded latency for correctness — you are now getting `low`

effort by default. The model still answers. The answer is still well-formed JSON, still parses, still passes your schema validation. It's just measurably worse on the hard cases, and there is no field in the response that says "I thought less about this than I used to." Your eval suite is the only thing that would catch it, and only if you re-ran it after May 15 — which nobody schedules, because nothing told them to.

This is the textbook drift shape: a valid-looking response that is a correct answer to a *different question* than the one your code thinks it asked.

## 3. Cost-attribution dashboards now lie

A lot of teams tag spend by the model slug they send: a `model`

dimension on a metrics counter, a column in a usage table, a group-by in the monthly cost rollup. Those dashboards key off *the string you sent*, not the model that actually ran.

Post-May-15, your dashboard still shows a tidy line item for `grok-code-fast-1`

at the old unit price in your own math — while xAI bills the account at grok-4.3 rates. Internal cost attribution and the actual bill have silently diverged. Every "cost per feature" or "margin per customer" number that flows from that slug is now wrong, and it will stay wrong until someone reconciles the xAI invoice against the dashboard by hand and notices the totals don't match.

## 4. `grok-imagine-image-pro`

is a different image model now

`grok-imagine-image-pro`

redirects to `grok-imagine-image-quality`

. That is a different image model, not a renamed one. Anything downstream that made assumptions about the old model's output — dimensions, style, latency budget, cost per image, safety-filter behavior — is now feeding a different generator into the same pipeline with no version bump. Image pipelines are especially exposed here because the output "looks fine" to code; only a human comparing before/after notices the model changed.

## 5. Fallback chains lost their cheap degraded mode

Routers built during past provider incidents tend to look like this:

```
primary: grok-4.3
fallback:
  - grok-4-fast-non-reasoning   # cheap degraded mode
  - grok-3
```

The intent was: if the primary is rate-limited or down, drop to a cheaper model and keep serving. After May 15 both fallback entries resolve to `grok-4.3`

. The "cheap degraded mode" is now full-price grok-4.3 — so the exact moment you fail over under load is the exact moment your per-request cost jumps to flagship rates, with no error and no log line saying the cheap path is gone. Incident plus silent cost blowout, stacked.

## 6. Pinned eval baselines now track a moving target

If you run regression evals against a fixed model slug — standard practice for catching prompt regressions — you have `grok-4-fast-reasoning`

or similar hardcoded in the harness. That pin was the whole point: a stable baseline to diff prompt changes against.

After May 15 the pin resolves to `grok-4.3`

at `low`

effort. Your "stable baseline" moved. Every prompt-change diff you run against it from now on is measuring two variables at once — your prompt edit *and* a model swap you didn't make — and the harness has no idea, because the slug string in the config is unchanged.

## What to actually do

The migration itself is small. The detection is the hard part, because there is no schema diff to catch at review time and no error to alert on.

-
**Grep every repo, IaC file, notebook, and prompt config** for the retired slugs:

```
   git grep -nE "grok-(4-1-fast-(reasoning|non-reasoning)|4-fast-(reasoning|non-reasoning)|4-0709|code-fast-1|3|imagine-image-pro)"
```

Include eval harnesses, fallback/router configs, and cost-attribution code — not just your main call sites. Those three are where this hides.

**Pin** Don't keep riding the redirect. The redirect picks`grok-4.3`

explicitly and choose your reasoning effort.`low`

/`none`

for you; only an explicit`grok-4.3`

call with an explicit effort level (`none`

/`low`

/`medium`

/`high`

) puts the quality/cost tradeoff back in your hands.**Re-run your evals after switching**, and treat any pinned-baseline eval as invalidated as of May 15. Capture a fresh baseline against an explicit model+effort you control.**Reconcile one xAI invoice line by line** against your internal cost dashboard. If they don't match, your attribution is keying off the sent slug and needs to key off actual billed usage.**Add a cost-per-token alert**, not just a request-count alert. This entire class of failure is invisible to availability monitoring and visible only to spend monitoring.

The reason this one is worth a sprint and not a backlog ticket: every other model retirement this year threw an error eventually. This one is engineered specifically *not* to. "Your code keeps working" is the failure mode, not the mitigation.

[FlareCanary](https://flarecanary.com) watches your third-party APIs and SDKs for breaking changes like this one — including model retirements, silent slug redirects, and pricing-tier remaps — and surfaces them before the invoice does. Free tier monitors 5 endpoints.
