You built a skill for your technology. API references, authentication flows, SDK patterns, error handling, version info, all packed into one skill. The agent calls it, gets all that context, and generates code. The kicker? You’ve just wasted a lot of tokens.
It already knows #
Models have ingested your documentation, your Stack Overflow answers, your GitHub repos, your blog posts. The default imports, the standard auth flow, the common CRUD operations: the model already has all of that baked in.
When your skill repeats what the model already knows, you’re not helping, you’re adding weight. Every token your skill returns occupies space in a finite context window, and those tokens aren’t neutral. They push out the stuff the model doesn’t know: workspace files, conversation history, or output from other tools.
Most skills suffer from the same problem. Stuffed with thousands of tokens of documentation, called by the agent, payload returned, and outcomes don’t improve. Sometimes they get worse, because the skill is doing work the model didn’t need help with.
How do you know what the model knows? #
You don’t, unless you measure. And most folks skip this step entirely. They go straight from “we have a technology” to “we need a skill” without checking what the model does on its own.
What if the model already generates correct code for your API 90% of the time? Then you need a lightweight skill that covers the remaining 10%: the auth quirk that trips people up, the breaking change that’s too recent for training data, the configuration pattern that looks like nothing else on the web. You can’t know which 10% to target if you haven’t measured the baseline.
Measure first, build second #
Start by running your scenarios without the skill. Same model, same harness, same prompts. See what the agent gets right and what it gets wrong, because that’s your baseline.
If the model handles CRUD correctly, don’t put CRUD examples in your skill. If auth flows work out of the box, don’t include your auth guide. If it picks the right SDK version, don’t waste tokens telling it which version to use. What’s left after you subtract the baseline? The patterns the model gets wrong or doesn’t know about at all. That’s your skill’s scope. Nothing more.
Every unnecessary token is drag #
Context windows have a fixed budget. A skill that returns 3,000 tokens of documentation the model already knows is burning context that could hold the developer’s workspace files, conversation thread, or output from another tool the model needs.
It gets worse when skills compose. Your developers have other skills installed, and each one claims tokens just by being present. Your oversized skill isn’t just dragging its own scenarios, it’s eating into the budget other skills need. You’re not just hurting your outcomes, you’re contributing to everyone else’s drag.
The lean skill #
- Define scenarios: the tasks developers actually ask agents to do with your technology.
- Run them without your skill, and score the outcomes.
- Identify where the model fails: those failures are your scope.
- Build a skill that addresses onlythe gaps. - Measure again. Confirm you’re producing lift, not drag. Also pay attention to the token count: lift at 3x the tokens is a net loss.
Do this and you’ll end up with skills a fraction of their original size that produce measurably better results. In many cases, models don’t need a textbook. They needed a cheat sheet.