# AI credits are the new lines of code metric

> Source: <https://dev.to/pvgomes/ai-credits-are-the-new-lines-of-code-metric-4pgb>
> Published: 2026-06-21 00:01:45+00:00

GitHub added a tiny field to the Copilot usage metrics API this week that is going to create a lot of very confident spreadsheets.

Enterprise and organization admins can now see `ai_credits_used` in the user-level Copilot usage reports. One field. Per user. Available for single-day and 28-day reports. It is not the invoice, and GitHub is careful to say it is a consumption signal rather than a billed total.

Still, the shape is obvious.

Now AI usage can sit next to adoption, activity, team, department, cost center, and whatever else the company already exports into a dashboard.
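As a sketch of what that join looks like, assume the user-level report has already been exported as a list of records. Only `ai_credits_used` comes from GitHub's announcement; the `user_login` field name and the company directory mapping logins to team and cost center are hypothetical stand-ins for whatever a given org already has.

```python
from collections import defaultdict

# Hypothetical exported rows from a user-level report. Only `ai_credits_used`
# is named in GitHub's announcement; `user_login` is an assumed field name.
report_rows = [
    {"user_login": "alice", "ai_credits_used": 420.0},
    {"user_login": "bob", "ai_credits_used": 35.5},
    {"user_login": "carol", "ai_credits_used": 0.0},
    {"user_login": "dave", "ai_credits_used": 120.0},
]

# Hypothetical company directory; not part of any GitHub API.
directory = {
    "alice": {"team": "payments", "cost_center": "CC-100"},
    "bob": {"team": "payments", "cost_center": "CC-100"},
    "carol": {"team": "platform", "cost_center": "CC-200"},
}


def credits_by(rows, directory, key):
    """Sum ai_credits_used per directory attribute (e.g. 'team').

    Users missing from the directory are grouped under 'unmapped' instead of
    being dropped, so the group totals still add up to the report total.
    """
    totals = defaultdict(float)
    for row in rows:
        entry = directory.get(row["user_login"], {})
        totals[entry.get(key, "unmapped")] += row["ai_credits_used"]
    return dict(totals)


print(credits_by(report_rows, directory, "team"))
# {'payments': 455.5, 'platform': 0.0, 'unmapped': 120.0}
```

The `unmapped` bucket is a deliberate choice: once finance starts reconciling dashboards against the report, silently dropped users are the first thing that breaks trust in the numbers.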

That is useful.

It is also exactly how a tool metric becomes a management metric.

And once that happens, the question is no longer "can we measure AI usage?"

The question is "what weird behavior will this metric create?"

I understand why this field exists.

If a company is paying for Copilot, especially with usage-based pieces attached to more expensive models and premium features, it needs some way to understand consumption. Platform teams need budget signals. Engineering leaders need adoption signals. Procurement needs something more concrete than "people seem to like it." Finance will eventually ask why one org burns through credits much faster than another.

That is normal.

The problem starts when a consumption signal is treated as a productivity signal.

High AI credit usage might mean a developer is doing valuable work with agent mode, code review, test generation, refactoring, or research. It might also mean the developer is stuck, repeatedly asking the model to solve the wrong problem, generating code that gets deleted, or using a heavyweight model where a small one would have been fine.

Low AI credit usage might mean a developer does not need much help. It might mean the work is mostly design, review, debugging, incident response, mentoring, or architecture. It might mean the codebase is small and well understood. It might mean the developer is skeptical. It might also mean the developer has not learned the tool yet.

The number alone does not know.

That is the first trap.

AI credits are not output.

They are input.

Software has a long history of measuring the thing that is easiest to count and then pretending it represents the thing we actually care about.

Lines of code. Commits. Pull requests. Story points. Tickets closed. Test coverage percentage. Build count. Deploy count. Review comments. Meeting hours. Slack messages. Keyboard activity, if you work somewhere especially cursed.

Some of those metrics are useful in context. None of them measures engineering quality.

Lines of code are the classic example because everyone knows they are silly and people still accidentally reinvent them. A developer who deletes 3,000 lines of unnecessary code may have done the most valuable work of the quarter. A developer who adds 3,000 lines may have created six months of maintenance work.

The metric is not evil. The interpretation is.

AI credits have the same smell.

If a team uses them to understand budget, adoption, and tool behavior, good. If a team uses them to ask why a workflow is expensive, also good. If a team uses them to decide whether a department needs training, maybe good.

If a manager starts asking why Alice used 10x more credits than Bob, or why Carol used almost none, without looking at the work, the code, the reviews, and the outcomes, we are back in lines-of-code land with better branding.

The most interesting AI work is not always the most visible AI work.

A senior engineer might use Copilot heavily for one hour to explore three possible designs, then write the final change mostly by hand. Another engineer might spend an afternoon in agent mode producing a large pull request that reviewers reject because it missed a domain constraint. A third might use chat as a rubber duck during a tricky production incident and ship no code at all.

Which one was productive?

The credit number cannot answer that.

The credit number can tell you something was consumed.

It cannot tell you whether the work got better.

This distinction matters because AI tools make activity look very busy. Agents run commands. They edit files. They summarize. They retry. They generate tests. They open diffs. They can burn tokens while looking like they are making progress.

Sometimes they are.

Sometimes they are pacing around the same mistake with a nicer transcript.

If managers only see consumption, they will mistake motion for leverage.

The better question is not "who used the most AI?"

The better question is "where did AI usage change the work in a way we can defend?"

Did review time go down without defects going up? Did boring migrations become cheaper? Did flaky dependency upgrades get less painful? Did junior engineers get better feedback earlier? Did senior engineers spend less time on boilerplate and more time on design? Did incidents resolve faster? Did the team ship maintainable changes with fewer abandoned branches?

Those are harder questions.

That is why they are better.

I do not want to sound like the answer is "never measure this."

Please measure it.

AI cost has to become visible. Otherwise teams will discover the bill after habits have already formed.

If a new coding-agent workflow costs $4 per successful dependency upgrade, that might be wonderful. If it costs $180 because the agent keeps running the full integration suite, calling the largest model, and regenerating the same patch, someone should notice. If one repository burns credits because its build is slow, its tests are noisy, or its instructions are bad, that is useful platform feedback.
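The $4 versus $180 comparison is cost per *successful* outcome, not cost per run, so failed attempts have to be folded in. A minimal sketch, assuming a hypothetical run log where each entry records credits consumed and whether the run produced an accepted result, plus an assumed flat price per credit (real pricing is not modeled here):

```python
from dataclasses import dataclass


@dataclass
class Run:
    workflow: str
    credits: float
    succeeded: bool  # e.g. the upgrade PR was merged; the definition is up to the team


def cost_per_success(runs, workflow, usd_per_credit):
    """Total spend on a workflow divided by its number of successful runs.

    Failed runs still cost money, so they count in the numerator but not in
    the denominator. Returns None when nothing succeeded.
    """
    relevant = [r for r in runs if r.workflow == workflow]
    successes = sum(1 for r in relevant if r.succeeded)
    if successes == 0:
        return None
    spent = sum(r.credits for r in relevant) * usd_per_credit
    return spent / successes


runs = [
    Run("dep-upgrade", 100, True),
    Run("dep-upgrade", 300, False),  # a retry loop that went nowhere
    Run("dep-upgrade", 200, True),
]
# Assumed price of $0.04 per credit, purely for illustration.
print(cost_per_success(runs, "dep-upgrade", 0.04))
```

Tracked per workflow over time, a rising cost per success is the signal that someone should go look for extra retries, a heavier model, or a slow test suite.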

Per-user and per-team metrics can also reveal adoption gaps. Maybe one team is getting real value because it built good repository instructions and narrow workflows. Maybe another team is paying for seats nobody uses. Maybe a third team is using AI constantly but still rejecting most generated work.
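Those patterns can be told apart with a few per-team numbers. A rough sketch, assuming hypothetical inputs per team: seats purchased, seats with any usage, and AI-generated pull requests proposed versus merged (used here as a stand-in for "generated work kept"). The thresholds are arbitrary placeholders, and the label is a prompt for a conversation with the team, not a verdict.

```python
def classify_team(seats, active_seats, ai_prs_proposed, ai_prs_merged,
                  idle_ratio=0.5, low_acceptance=0.3):
    """Label which adoption pattern a team's numbers resemble.

    The thresholds are illustrative placeholders, not recommendations.
    """
    if seats > 0 and active_seats / seats < idle_ratio:
        return "paying for idle seats"
    if ai_prs_proposed > 0 and ai_prs_merged / ai_prs_proposed < low_acceptance:
        return "most AI-generated work rejected"
    return "no obvious gap"


# Hypothetical per-team numbers for one month.
teams = {
    "payments": dict(seats=10, active_seats=9, ai_prs_proposed=40, ai_prs_merged=30),
    "search": dict(seats=20, active_seats=4, ai_prs_proposed=5, ai_prs_merged=4),
    "mobile": dict(seats=8, active_seats=8, ai_prs_proposed=50, ai_prs_merged=9),
}

for name, stats in teams.items():
    print(f"{name}: {classify_team(**stats)}")
```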

All of that is worth knowing.

But the metric needs to stay attached to a workflow, not a moral judgment about the person.

The useful unit is often not "Paulo used 1,200 credits."

It is "the weekly dependency update workflow for service X used 1,200 credits, produced three pull requests, passed tests twice, needed one human rewrite, and saved roughly half a day of maintenance work."

That is an engineering conversation.

"Why did Paulo use 1,200 credits?" is a trap unless you already know what he was doing.

For agentic coding, I would like credit usage to show up next to the rest of the evidence.

Not as a leaderboard.

As a cost line in the work record.

An agent session should have an ID. It should link to the issue, branch, pull request, logs, tool calls, model choices, retries, test runs, and human approvals. Credit usage belongs there. It helps the team understand the actual cost of a workflow and compare it with the outcome.

For example:
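A session record along those lines could be sketched as a Python dataclass. Every field name here is hypothetical, not taken from any GitHub API, and the identifiers and URL in the example are placeholders:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentSession:
    """One agent run, with its credit cost stored next to the evidence."""
    session_id: str
    issue: str
    branch: str
    log_url: str
    credits_used: float
    pull_request: Optional[str] = None
    models: list[str] = field(default_factory=list)     # model choices, in order used
    retries: int = 0
    test_runs: list[bool] = field(default_factory=list)  # pass/fail per test run
    tool_calls: int = 0
    human_approvals: list[str] = field(default_factory=list)

    def summary(self) -> str:
        passed = sum(self.test_runs)  # True counts as 1
        approvals = ", ".join(self.human_approvals) or "nobody yet"
        return (
            f"{self.session_id}: {self.credits_used:g} credits; "
            f"{self.retries} retries; models: {', '.join(self.models)}; "
            f"tests passed {passed}/{len(self.test_runs)}; "
            f"PR: {self.pull_request or 'none'}; approved by: {approvals}"
        )


session = AgentSession(
    session_id="sess-0042",
    issue="ISSUE-123",
    branch="deps/weekly-update",
    log_url="https://logs.example.internal/sess-0042",
    credits_used=85.0,
    pull_request="PR-456",
    models=["small-model", "large-model"],
    retries=2,
    test_runs=[False, True, True],
    tool_calls=37,
    human_approvals=["reviewer-a"],
)
print(session.summary())
```

The design choice is that credits are just one field among many: nobody can read the cost without also seeing the retries, the test results, and who approved the change.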

That kind of measurement changes behavior in a good way. It pushes teams to design better workflows.

The bad version pushes teams to rank developers by how much AI they consumed.

One is platform engineering.

The other is cargo-cult management with an API.

The dangerous thing about metrics is that nobody has to announce the bad incentive.

At first, the dashboard is just informational. Then a leader asks why one team uses less Copilot than another. Then someone adds a target. Then managers start nudging people to "adopt AI more." Then a developer leaves the model running more often because the organization has made usage feel like modernity.

Or the incentive goes the other way.

Finance notices high consumption. A manager starts asking people to justify AI use. Engineers stop using the tool for exploratory work because it looks expensive. The team saves credits and loses leverage.

Both failures come from the same mistake: treating usage as the goal.

Usage is not the goal.

Better software is the goal.

Cheaper maintenance is the goal.

Faster feedback is the goal.

Less boring toil is the goal.

More reliable systems are the goal.

If AI credits help you understand those things, great. If they replace those things, you have built productivity theater with nicer telemetry.

If I were responsible for an engineering org using Copilot broadly, I would still collect AI credit usage. I would just refuse to let it stand alone.

I would join it with workflow outcomes:
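A sketch of that join, assuming two hypothetical exports keyed by a workflow name: credit usage per agent session, and outcomes per pull request (merged or not, rewritten by a human or not). The field names and the notion of a "workflow" key are assumptions; the point is that credits only become interpretable next to outcomes.

```python
from collections import defaultdict

sessions = [  # hypothetical: one row per agent session
    {"workflow": "dep-update", "credits": 400},
    {"workflow": "dep-update", "credits": 800},
    {"workflow": "test-gen", "credits": 150},
]
outcomes = [  # hypothetical: one row per pull request a workflow opened
    {"workflow": "dep-update", "merged": True, "human_rewrite": False},
    {"workflow": "dep-update", "merged": True, "human_rewrite": True},
    {"workflow": "dep-update", "merged": False, "human_rewrite": False},
    {"workflow": "test-gen", "merged": False, "human_rewrite": False},
]


def workflow_report(sessions, outcomes):
    """Per workflow: total credits, PR counts, and credits per merged PR."""
    credits = defaultdict(float)
    for s in sessions:
        credits[s["workflow"]] += s["credits"]

    stats = defaultdict(lambda: {"prs": 0, "merged": 0, "rewritten": 0})
    for o in outcomes:
        st = stats[o["workflow"]]
        st["prs"] += 1
        st["merged"] += o["merged"]          # bools add as 0/1
        st["rewritten"] += o["human_rewrite"]

    report = {}
    for wf in credits.keys() | stats.keys():  # workflows seen in either export
        st = stats[wf]
        report[wf] = {
            "credits": credits[wf],
            **st,
            "credits_per_merged_pr": credits[wf] / st["merged"] if st["merged"] else None,
        }
    return report


for wf, row in sorted(workflow_report(sessions, outcomes).items()):
    print(wf, row)
```

In this made-up data the `test-gen` row is the interesting one: it spent credits and got nothing merged, which is a question about the workflow, not about the person who ran it.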

I would also look for places where high AI usage is a symptom.

Maybe the documentation is bad. Maybe the test suite is too slow. Maybe the service boundaries are unclear. Maybe onboarding is painful. Maybe the agent keeps rereading the same files because the repo has no useful map. Maybe developers are using chat to compensate for architecture nobody understands.
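One of those symptoms, an agent rereading the same files, can be checked directly if the agent's tool calls are logged. A sketch, assuming a hypothetical log of `(tool, path)` pairs per session; real agent logs have their own formats, and the `read_file` tool name is made up for illustration:

```python
from collections import Counter


def reread_hotspots(tool_calls, min_reads=3):
    """Return files read at least `min_reads` times in one session, most-read first.

    Repeated reads can hint that the agent lacks a map of the repository,
    though they can also be legitimate (for example, re-checking after an edit).
    """
    reads = Counter(path for tool, path in tool_calls if tool == "read_file")
    return [(path, n) for path, n in reads.most_common() if n >= min_reads]


calls = [
    ("read_file", "src/billing/config.py"),
    ("run_tests", "tests/"),
    ("read_file", "src/billing/config.py"),
    ("read_file", "README.md"),
    ("read_file", "src/billing/config.py"),
    ("edit_file", "src/billing/invoice.py"),
]
print(reread_hotspots(calls))
# [('src/billing/config.py', 3)]
```

Aggregated across many sessions, files that keep showing up in this list are candidates for better documentation or repository instructions.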

That is the part I find interesting.

AI credit usage may become a weird new observability signal for the developer experience itself.

Not "who is productive?"

"Where is the work expensive to understand?"

That is a much better question.

GitHub exposing `ai_credits_used` is a reasonable product feature. Enterprises need budget visibility. Platform teams need consumption data. AI-assisted development cannot stay a mysterious line item forever.

But we should be honest about what the metric means.

AI credits measure consumption. They do not measure judgment, maintainability, leverage, taste, review quality, incident response, mentoring, or whether the final system got simpler.

So use the number.

Just do not worship it.

The teams that handle this well will treat AI credits like cloud cost: useful when tied to services, workflows, outcomes, and ownership.

The teams that handle it badly will reinvent lines of code, except this time the line goes through a model bill.

To test my projects, I use [Railway](https://railway.com?referralCode=G_jRmP). If you want $20 USD to get started, [use this link](https://railway.com?referralCode=G_jRmP).