{"slug": "skill-distillation", "title": "Skill Distillation", "summary": "A developer has created a system called \"skill distillation\" that uses frontier AI models like Opus 4.7 and GPT-5.1 to write procedural skill files, which are then executed by smaller local models like Qwen 35B or Gemma 26B running on personal computers. The system, built around a personal agent called Pi, transfers procedural knowledge through markdown files rather than compressing model weights, allowing the smaller model to follow step-by-step instructions without needing to understand the underlying task. This approach creates inspectable, versionable, and hot-swappable skills that can be automatically generated, tested, and refined overnight based on historical logs.", "body_md": "I’ve been using state-of-the-art models to teach small models running on my computer how I work.\n\nMy personal agent, based on [Pi](https://github.com/earendil-works/pi), runs my inbox, my deal pipeline, my blog publishing, my calendar, & my research. It looks less like a chatbot & more like a small operating system.\n\nThe first layer is **QMD**, a local markdown knowledge base of about eighty workflow files in `~/memories`. Before answering any procedural question, the agent searches QMD for the right playbook.\n\nThe second layer is **Skills**, atomic `SKILL.md` files that describe one job each. The skills are written by a frontier model. So are the evaluations that grade them. The same system writes, tests, and rewrites each skill until accuracy converges. It also checks recall against QMD, so the right keywords always surface the right skill.\n\nThe third layer is the **Agent Loop**, a model running Plan → Tool Call → Observe → Refine, calling out to seventeen Rust APIs & a handful of MCP integrations.\n\nOne of the techniques I’ve started to use is **skill distillation**. A frontier model, Opus 4.7, GPT-5.1, Gemini 3 Pro, authors & refines the skill files. 
A smaller model, Qwen 35B or Gemma 26B running locally, executes them. The teacher transfers procedural knowledge to the student through markdown. The skill is inspectable, versionable, & hot-swappable.\n\nThis is fundamentally different from classical knowledge distillation, which compresses a big model’s soft probability outputs into a smaller model’s weights. It’s different from instruction tuning, which bakes behavior into weights through prompt-response pairs. It’s different from RAG, which retrieves facts.\n\nSkill distillation retrieves *procedures*. The smaller model doesn’t have to know how to evaluate a company. It just has to know how to follow the steps.\n\nEvery night a system runs through historical logs to understand what new skills should be generated, mirroring the loop that [Pete Koomen described at Y Combinator](https://www.youtube.com/watch?v=B246K_G7mHU) earlier this week.\n\nThe frontier model becomes a teacher. The library becomes the company’s institutional knowledge. The student becomes whichever model happens to be cheapest this quarter.", "url": "https://wpnews.pro/news/skill-distillation", "canonical_source": "https://www.tomtunguz.com/the-pi-agent-skill-distillation/", "published_at": "2026-05-29 00:00:00+00:00", "updated_at": "2026-05-29 17:35:51.655974+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "large-language-models", "ai-agents", "ai-tools"], "entities": ["Pi", "Opus 4.7", "GPT-5.1", "Gemini 3 Pro", "Qwen 35B", "Gemma 26B", "QMD", "Rust"], "alternates": {"html": "https://wpnews.pro/news/skill-distillation", "markdown": "https://wpnews.pro/news/skill-distillation.md", "text": "https://wpnews.pro/news/skill-distillation.txt", "jsonld": "https://wpnews.pro/news/skill-distillation.jsonld"}}