Skill Distillation

A developer has created a system called "skill distillation" that uses frontier AI models like Opus 4.7 and GPT-5.1 to write procedural skill files, which are then executed by smaller local models like Qwen 35B or Gemma 26B running on personal computers. The system, built around a personal agent called Pi, transfers procedural knowledge through markdown files rather than compressing model weights, allowing the smaller model to follow step-by-step instructions without needing to understand the underlying task. This approach creates inspectable, versionable, and hot-swappable skills that can be automatically generated, tested, and refined overnight based on historical logs.

I’ve been using state-of-the-art models to teach small models running on my computer how I work.

My personal agent, based on Pi (https://github.com/earendil-works/pi), runs my inbox, my deal pipeline, my blog publishing, my calendar, & my research. It looks less like a chatbot & more like a small operating system.

The first layer is QMD, a local markdown knowledge base of about eighty workflow files in ~/memories. Before answering any procedural question, the agent searches QMD for the right playbook.

The second layer is Skills, atomic SKILL.md files that describe one job each. The skills are written by a frontier model. So are the evaluations that grade them. The same system writes, tests, & rewrites each skill until accuracy converges. It also checks recall against QMD, so the right keywords always surface the right skill.

The third layer is the Agent Loop, a model running Plan → Tool Call → Observe → Refine, calling out to seventeen Rust APIs & a handful of MCP integrations.

One of the techniques I’ve started to use is skill distillation. A frontier model (Opus 4.7, GPT-5.1, Gemini 3 Pro) authors & refines the skill files. A smaller model, Qwen 35B or Gemma 26B running locally, executes them. The teacher transfers procedural knowledge to the student through markdown. The skill is inspectable, versionable, & hot-swappable.

This is fundamentally different from classical knowledge distillation, which compresses a big model’s soft probability outputs into a smaller model’s weights. It’s different from instruction tuning, which bakes behavior into weights through prompt-response pairs. It’s different from RAG, which retrieves facts. Skill distillation retrieves procedures. The smaller model doesn’t have to know how to evaluate a company. It just has to know how to follow the steps.

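To make "retrieves procedures" concrete, here is a minimal sketch of the lookup step. Everything in it is an assumption for illustration: the post does not specify the SKILL.md layout or how QMD searches, so the `keywords:` header line, the `Skill` class, and the `parse_skill`, `retrieve`, & `build_prompt` helpers are hypothetical, & plain keyword overlap stands in for whatever search QMD really uses.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    keywords: set[str]
    steps: str  # the markdown body the student model will follow


def parse_skill(name: str, text: str) -> Skill:
    """Parse a hypothetical layout: a 'keywords:' first line, then the steps."""
    header, _, body = text.partition("\n")
    words = header.removeprefix("keywords:").split(",")
    keywords = {w.strip().lower() for w in words if w.strip()}
    return Skill(name, keywords, body.strip())


def retrieve(skills: list[Skill], question: str) -> Skill | None:
    """Return the skill whose keywords overlap the question most, or None."""
    tokens = set(question.lower().split())
    best = max(skills, key=lambda s: len(s.keywords & tokens), default=None)
    if best is None or not (best.keywords & tokens):
        return None
    return best


def build_prompt(skill: Skill, question: str) -> str:
    """Hand the student model the procedure, not the expertise."""
    return f"Follow these steps exactly.\n\n{skill.steps}\n\nTask: {question}"


# Two made-up skills standing in for files on disk.
library = [
    parse_skill(
        "evaluate-company",
        "keywords: company, evaluate, deal\n1. Pull revenue.\n2. Score growth.",
    ),
    parse_skill(
        "publish-post",
        "keywords: blog, publish, post\n1. Lint markdown.\n2. Push to site.",
    ),
]

question = "evaluate this company for the pipeline"
match = retrieve(library, question)
prompt = build_prompt(match, question)
```

The prompt that reaches the local model contains the numbered steps & the task, nothing else. Swapping the student model means changing which model receives `prompt`, & swapping a skill means editing one markdown file.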
Every night a system runs through historical logs to understand what new skills should be generated, mirroring the loop that Pete Koomen described at Y Combinator https://www.youtube.com/watch?v=B246K G7mHU earlier this week. The frontier model becomes a teacher. The library becomes the company’s institutional knowledge. The student becomes whichever model happens to be cheapest this quarter.
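The nightly run & the write, test, rewrite loop could be sketched roughly as below. This is an illustration under stated assumptions, not the actual system: the task labels, the `min_count` threshold, the `target` accuracy, & the `stub_write` / `stub_grade` functions (stand-ins for the frontier model & its evaluations) are all hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable


def missing_skills(log_tasks: list[str], existing: set[str], min_count: int = 3) -> list[str]:
    """Task labels that recur in the logs but have no skill yet."""
    counts = Counter(log_tasks)
    return sorted(t for t, n in counts.items() if n >= min_count and t not in existing)


def distill(
    task: str,
    write: Callable[[str, str, str], str],
    grade: Callable[[str], tuple[float, str]],
    target: float = 0.9,
    max_rounds: int = 5,
) -> tuple[str, float]:
    """Write, test, & rewrite one skill until the score reaches target or rounds run out."""
    skill, feedback, score = "", "", 0.0
    for _ in range(max_rounds):
        skill = write(task, skill, feedback)  # teacher drafts or revises
        score, feedback = grade(skill)        # evaluation grades the draft
        if score >= target:
            break
    return skill, score


def stub_write(task: str, previous: str, feedback: str) -> str:
    """Stand-in teacher: appends one numbered step per round."""
    lines = previous.splitlines()
    lines.append(f"{len(lines) + 1}. step for {task}")
    return "\n".join(lines)


def stub_grade(skill: str) -> tuple[float, str]:
    """Stand-in evaluation: full marks once the skill has three steps."""
    score = min(len(skill.splitlines()) / 3, 1.0)
    return score, ("add a step" if score < 1.0 else "")


# Made-up log labels; a real run would derive these from agent history.
logs = ["triage-inbox"] * 4 + ["update-deal"] * 3 + ["book-travel"] + ["publish-post"] * 5
todo = missing_skills(logs, existing={"publish-post"})
results = {task: distill(task, stub_write, stub_grade) for task in todo}
skill, score = results["triage-inbox"]
```

With real models plugged in for `write` & `grade`, the same loop works with any teacher, which is what lets the student be whichever model is cheapest.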