[ARTICLE · art-17232] src=letsdatascience.com pub= topic=artificial-intelligence verified=true sentiment=↓ negative

Amazon removes AI leaderboard after tokenmaxxing

Amazon removed an internal AI usage leaderboard called "KiroRank" after employees allegedly engaged in "tokenmaxxing," running unnecessary AI calls to inflate their usage scores. Senior vice-president Dave Treadwell told staff not to use AI for its own sake, according to people familiar with the matter. Amazon has since replaced raw token counts with a "normalised deployments" metric to measure meaningful AI-driven work, a cost-control and adoption-quality adjustment.

3 min read · published May 29, 2026

The Financial Times reports that Amazon has taken down an internal AI usage leaderboard after employees allegedly began "tokenmaxxing", running low-value AI calls to inflate usage scores. India Today reports that the leaderboard was called "KiroRank" and ran on Amazon's internal Kiro developer platform, and that senior vice-president Dave Treadwell told staff, "Please don't use AI just for the sake of using AI," according to people familiar with the matter. India Today also reports that Amazon has moved from raw token counts to a metric it calls "normalised deployments", intended to measure meaningful AI-driven work; the change is framed as a cost-control and adoption-quality adjustment. The Financial Times was the first to report the episode and its internal consequences.

What happened

Financial Times reports Amazon removed an internal leaderboard that ranked employees by AI usage after staff allegedly began "tokenmaxxing," the practice of running unnecessary AI calls to inflate usage statistics. India Today reports the leaderboard was referred to internally as "KiroRank" and that the ranking measured token consumption on the company's Kiro developer platform. India Today attributes a direct quote to senior vice-president Dave Treadwell: "Please don't use AI just for the sake of using AI." India Today reports that Amazon has started using a different metric called "normalised deployments" that focuses on whether AI is being used to produce useful work rather than raw token volume.

Editorial analysis - technical context

The reported shift away from tracking raw token consumption reflects a common measurement problem in AI adoption programs. Companies that measure usage volume rather than output quality often create incentives to maximise the metric itself, a phenomenon known as Goodhart's law: when a measure becomes a target, it ceases to be a good measure. Industry coverage of this episode notes that internal agentic tools such as Amazon's MeshClaw, which Human Resources Director reports automates tasks across systems, increase both the surface area for automation and the potential for low-value calls that inflate token counts.
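
To make the incentive problem concrete, the following is a minimal sketch, not Amazon's method: the definition of "normalised deployments" has not been published, so the `tasks_completed` field, the outcomes-per-thousand-tokens score, and all sample figures below are hypothetical. It shows how the same three users rank very differently once usage is divided by an outcome count.

```python
from dataclasses import dataclass


@dataclass
class UsageRecord:
    user: str
    tokens: int           # raw tokens consumed (the metric that was gamed)
    tasks_completed: int  # hypothetical count of accepted outcomes


def rank_by_tokens(records):
    """Rank users by raw token volume, highest first."""
    ordered = sorted(records, key=lambda r: r.tokens, reverse=True)
    return [r.user for r in ordered]


def rank_by_outcomes_per_kilotoken(records):
    """Rank users by completed tasks per 1,000 tokens, highest first."""
    def score(r):
        return r.tasks_completed / (r.tokens / 1000) if r.tokens else 0.0

    ordered = sorted(records, key=score, reverse=True)
    return [r.user for r in ordered]


# Illustrative, made-up data: alice burns the most tokens for the least output.
records = [
    UsageRecord("alice", tokens=900_000, tasks_completed=3),
    UsageRecord("bob", tokens=120_000, tasks_completed=12),
    UsageRecord("carol", tokens=300_000, tasks_completed=9),
]

print(rank_by_tokens(records))
print(rank_by_outcomes_per_kilotoken(records))
```

An outcome-normalised score is not immune to gaming either: if the outcome count is easy to inflate, the same dynamic reappears one level up.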

Context and significance

Industry context: The episode arrives as commercial AI pricing models increasingly expose organisations to variable, consumption-linked costs. Reporting links wider market trends, including shifts by providers to consumption-based pricing, to growing concern about operational AI spend. For practitioners, the case is a concrete example of how metric design intersects with cost control, developer incentives, and governance when teams deploy agentic automation at scale.

What to watch

Industry context: Observers should track how companies instrument AI adoption metrics after this episode. Signals to watch include the emergence of normalised, output-focused adoption measures (for example, task completion rates per deployment), the rollout of rate-limiting or quota controls at platform level, and changes in internal visibility (team-wide leaderboards versus private dashboards). Also watch vendor billing models and whether organisations increasingly use hybrid pricing or on-premise inference to cap marginal costs.
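
As an illustration of what a platform-level quota control could look like, here is a minimal in-memory sketch. The `TokenQuota` class, its method names, and the limit are hypothetical and are not taken from any reported Amazon system; a production platform would need persistent, concurrency-safe storage.

```python
class TokenQuota:
    """Per-user daily token budget, tracked in memory (illustrative only)."""

    def __init__(self, daily_limit: int):
        self.daily_limit = daily_limit
        self._used = {}  # (user, day) -> tokens consumed so far

    def try_consume(self, user: str, day: str, tokens: int) -> bool:
        """Record the usage and return True if it fits the budget, else False."""
        key = (user, day)
        current = self._used.get(key, 0)
        if current + tokens > self.daily_limit:
            return False  # the request would exceed the daily budget
        self._used[key] = current + tokens
        return True

    def remaining(self, user: str, day: str) -> int:
        """Tokens the user may still spend on the given day."""
        return self.daily_limit - self._used.get((user, day), 0)


quota = TokenQuota(daily_limit=1000)
first = quota.try_consume("alice", "2026-05-29", 600)   # fits the budget
second = quota.try_consume("alice", "2026-05-29", 500)  # would exceed it
```

Rejected requests leave the counter unchanged, so a smaller follow-up call on the same day can still succeed.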

For practitioners

Editorial analysis: Engineering and product teams designing internal AI platforms should expect to face trade-offs between visibility, incentives, and cost. Experience reported in coverage suggests that making usage visible without coupling it to clear value definitions can encourage gaming. Instrumentation that ties AI calls to measurable business outcomes, sampling and auditing of agent actions, and conservative default quotas are common patterns companies employ to reduce low-value consumption.
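
Two of the patterns above, tying calls to outcomes and sampling for audit, can be sketched as follows. This is a hedged illustration under stated assumptions: the `AICall` record, the `outcome_id` link (for example a ticket or merged change), and the hash-based sampling rule are all hypothetical and not drawn from the reporting.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class AICall:
    call_id: str
    team: str
    tokens: int
    outcome_id: Optional[str]  # hypothetical link to a ticket or change; None if unlinked


def selected_for_audit(call_id: str, sample_percent: int) -> bool:
    """Deterministically pick roughly sample_percent% of calls by hashing the ID."""
    digest = hashlib.sha256(call_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < sample_percent


def unlinked_token_share(calls):
    """Per team, the fraction of tokens spent on calls with no linked outcome."""
    total = defaultdict(int)
    unlinked = defaultdict(int)
    for call in calls:
        total[call.team] += call.tokens
        if call.outcome_id is None:
            unlinked[call.team] += call.tokens
    return {team: unlinked[team] / total[team] for team in total if total[team] > 0}


# Made-up sample log for illustration.
calls = [
    AICall("c1", "search", 1000, "TICKET-1"),
    AICall("c2", "search", 3000, None),
    AICall("c3", "ads", 500, "TICKET-2"),
]
share = unlinked_token_share(calls)
```

Hashing the call ID rather than drawing a random number means the same call is always in or out of the audit sample, which makes audits reproducible.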

Limitations of reporting

The Financial Times provided the initial reporting; India Today and Human Resources Director cite the FT and internal sources for additional details. None of the cited coverage includes a public, on-the-record statement from Amazon explaining the rationale beyond the internal quote attributed to Dave Treadwell, and the FT reporting is behind a paywall, which limits fuller verification.

Scoring Rationale

This story is a notable practitioner-level lesson about AI adoption metrics and cost control at scale. It matters for teams building internal AI platforms and governance, but it is not a frontier-model or industry-shaking regulatory event.
