The Financial Times first reported that Amazon has taken down an internal AI usage leaderboard after employees allegedly began "tokenmaxxing", running low-value AI calls to inflate their usage scores. India Today reports the leaderboard was called "KiroRank" on Amazon's internal Kiro developer platform and that senior vice-president Dave Treadwell told staff, "Please don't use AI just for the sake of using AI," according to people familiar with the matter. India Today also reports that Amazon has moved from raw token counts to a metric it calls "normalised deployments" to measure meaningful AI-driven work, a change framed as a cost-control and adoption-quality adjustment.
What happened
The Financial Times reports that Amazon removed an internal leaderboard that ranked employees by AI usage after staff allegedly began "tokenmaxxing", the practice of running unnecessary AI calls to inflate usage statistics. India Today reports the leaderboard was referred to internally as "KiroRank" and that the ranking measured token consumption on the company's Kiro developer platform. India Today attributes a direct quote to senior vice-president Dave Treadwell: "Please don't use AI just for the sake of using AI." India Today also reports that Amazon has started using a different metric called "normalised deployments" that focuses on whether AI is being used to produce useful work rather than on raw token volume.
Editorial analysis - technical context
The reported shift away from tracking raw token consumption reflects a common measurement problem in AI adoption programs. Companies that measure usage volume rather than output quality often create incentives to maximize the metric itself, a phenomenon known as Goodhart's law: when a measure becomes a target, it ceases to be a good measure. Industry coverage of this episode notes that internal agentic tools such as Amazon's MeshClaw, which Human Resources Director reports automates tasks across systems, increase both the surface area for automation and the potential for low-value calls that inflate token counts.
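The measurement problem can be illustrated with a toy example. The sketch below uses invented numbers and hypothetical names (`UsageRecord`, `rank_by_tokens`, `rank_by_output`). Its `deployments` field is only a stand-in for an output-focused signal and is not Amazon's definition of "normalised deployments", which the cited coverage does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    user: str
    tokens: int       # total tokens consumed in the period
    deployments: int  # hypothetical count of AI-assisted changes that shipped


def rank_by_tokens(records):
    """Rank users by raw token consumption, highest first."""
    ordered = sorted(records, key=lambda r: r.tokens, reverse=True)
    return [r.user for r in ordered]


def rank_by_output(records):
    """Rank users by shipped deployments, breaking ties by fewer tokens."""
    ordered = sorted(records, key=lambda r: (-r.deployments, r.tokens))
    return [r.user for r in ordered]


# Invented data for illustration only.
records = [
    UsageRecord("alice", tokens=120_000, deployments=6),
    UsageRecord("bob", tokens=900_000, deployments=1),  # heavy usage, little output
    UsageRecord("carol", tokens=300_000, deployments=4),
]

print(rank_by_tokens(records))  # ['bob', 'carol', 'alice']
print(rank_by_output(records))  # ['alice', 'carol', 'bob']
```

With these numbers the two rankings are exact opposites: the user who burns the most tokens tops the volume leaderboard while shipping the least, which is the incentive a raw-usage ranking creates.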
Context and significance
Industry context: The episode arrives as commercial AI pricing models increasingly expose organisations to variable, consumption-linked costs. Reporting links wider market trends, including shifts by providers to consumption-based pricing, to growing concern about operational AI spend. For practitioners, the case is a concrete example of how metric design intersects with cost control, developer incentives, and governance when teams deploy agentic automation at scale.
What to watch
Industry context: Observers should track how companies instrument AI adoption metrics after this episode. Signals to watch include the emergence of normalised, output-focused adoption metrics (for example, task completion rates per deployment), the rollout of rate-limiting or quota controls at platform level, and changes in internal visibility (team-wide leaderboards versus private dashboards). Also watch vendor billing models and whether organisations increasingly use hybrid pricing or on-premises inference to cap marginal costs.
For practitioners
Editorial analysis: Engineering and product teams designing internal AI platforms should expect to face trade-offs between visibility, incentives, and cost. Experience reported in coverage suggests that making usage visible without coupling it to clear value definitions can encourage gaming. Instrumentation that ties AI calls to measurable business outcomes, sampling and auditing of agent actions, and conservative default quotas are common patterns companies employ to reduce low-value consumption.
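Two of the patterns named above, conservative default quotas and sampling of calls for audit, can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions, not a description of any company's platform: `TokenQuota`, `sample_for_audit`, and the 50,000-token default are all hypothetical.

```python
import random
from collections import defaultdict


class TokenQuota:
    """Per-user token budget for one accounting period (hypothetical design)."""

    def __init__(self, default_limit=50_000):
        self.default_limit = default_limit
        self.used = defaultdict(int)

    def try_consume(self, user, tokens):
        """Record the usage and return True if it fits the budget, else False."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used[user] + tokens > self.default_limit:
            return False  # over budget: reject without recording anything
        self.used[user] += tokens
        return True

    def remaining(self, user):
        """Tokens the user can still spend in this period."""
        return self.default_limit - self.used[user]


def sample_for_audit(call_ids, rate, seed=None):
    """Select roughly a `rate` fraction of calls for manual review."""
    rng = random.Random(seed)
    return [call_id for call_id in call_ids if rng.random() < rate]


quota = TokenQuota(default_limit=1_000)
print(quota.try_consume("alice", 600))  # True
print(quota.try_consume("alice", 500))  # False, would exceed the 1,000 budget
print(quota.remaining("alice"))         # 400
```

A real platform would also need persistence, period resets, and a route for raising limits. The point of the sketch is only that a rejected call leaves the budget untouched, and that audit sampling is independent of who made the call.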
Limitations of reporting
The Financial Times provided the initial reporting; India Today and Human Resources Director cite the FT and internal sources for additional details. None of the cited coverage includes a public, on-the-record statement from Amazon explaining the rationale beyond the internal quote attributed to Dave Treadwell, and the FT articles are behind a paywall, which limits fuller verification.
Scoring Rationale
This story is a notable practitioner-level lesson about AI adoption metrics and cost control at scale. It matters for teams building internal AI platforms and governance, but it is not a frontier-model or industry-shaking regulatory event.