{"slug": "how-to-write-a-flutter-agent-skill-that-actually-works-the-2026-recipe", "title": "How to Write a Flutter Agent Skill That Actually Works: The 2026 Recipe", "summary": "A developer has published a recipe for writing effective Flutter agent skills, synthesizing official guidance from Flutter, Anthropic, Google, and OpenAI into a single copy-pasteable format. The recipe emphasizes that a great skill is a tightly scoped `SKILL.md` file with a description engineered for discovery, ruthless conciseness, anti-patterns stated upfront, a checklist workflow, and a feedback loop. The format is an open standard that works across Claude Code, OpenAI Codex, Google Antigravity, Gemini CLI, and Cursor.", "body_md": "TL;DR: A great agent skill is not a pile of documentation. It is a tightly scoped `SKILL.md` with a description engineered for discovery, ruthless conciseness, anti-patterns stated up front, a checklist workflow, and a feedback loop. The format is an open standard that works across Claude Code, OpenAI Codex, Google Antigravity, Gemini CLI, and Cursor. This post synthesizes the official authoring guidance from Flutter, Anthropic, Google, and OpenAI into one recipe, hands you a complete copy-pasteable Flutter skill, and shows you how to actually evaluate it instead of guessing.\n\nIn my last article, I wrote about the official Dart and Flutter Agent Skills and why they stop your AI from writing 2022 Flutter. The most common reply I got was some version of the same question:\n\n*\"Cool. How do I write my own?\"*\n\nSo I went and read the actual playbooks. Not the hot takes, the primary sources: Flutter's skill docs and eval framework, Anthropic's skill authoring best practices, Google's Antigravity skill docs, and OpenAI's Codex skill guide. The good news is they agree on almost everything. 
The better news is that the gap between a skill that works and a skill that gets silently ignored comes down to a handful of decisions, and most people get them wrong.\n\nHere is the recipe, Flutter-flavored.\n\nAI agents are generalists. They average across years of Flutter code, much of it deprecated, and hand you the most statistically common answer instead of the currently correct one. The Flutter team named this the **knowledge gap**: the framework ships features faster than language models can update their training data. Skills exist to close that gap by handing the agent a task-specific, expert workflow.\n\nBut here is what nobody tells you. A poorly written skill does not just fail to help. It actively costs you. Every skill's metadata sits in the agent's context budget at all times. A vague skill that never triggers is dead weight. A skill with a fuzzy description that triggers on the *wrong* tasks is worse, because now your agent is following the wrong playbook with full confidence.\n\nThe bar is not \"wrote some Markdown.\" The bar is \"the agent reliably finds it, trusts it, and follows it.\" Everything below is in service of that bar.\n\nA skill is the simplest possible thing: a folder with one required file.\n\n```\nbuilding-riverpod-async-screens/\n├── SKILL.md          # Required: metadata + instructions\n├── references/       # Optional: deep-dive docs loaded on demand\n├── examples/         # Optional: reference implementations\n├── scripts/          # Optional: scripts the agent runs, not reads\n└── assets/           # Optional: templates, images\n```\n\nThe `SKILL.md` itself is YAML frontmatter plus a Markdown body:\n\n```\n---\nname: building-riverpod-async-screens\ndescription: \"Build a Flutter screen that loads async data with Riverpod...\"\n---\n\n# Building Riverpod Async Screens\n\n[instructions go here]\n```\n\nThe magic that makes this scale is **progressive disclosure**. 
At startup the agent loads only the lightweight metadata (name, description, path) of every skill. It reads the full `SKILL.md` only when a task matches, and it reads anything in `references/` or `examples/` only when the body points it there. If you write Flutter, you already know this pattern: it is deferred loading for the context window. OpenAI, Anthropic, and Google all describe the exact same mechanism.\n\nThis is the part that makes writing a skill worth your time. `SKILL.md` is an open standard (published at agentskills.io, originated at Anthropic, since adopted across the ecosystem). One skill works almost everywhere:\n\n| Tool | Vendor | Where skills live |\n|---|---|---|\n| Claude Code | Anthropic | `.claude/skills/` (project), `~/.claude/skills/` (personal) |\n| OpenAI Codex | OpenAI | `.codex/skills/` (project), `~/.codex/skills/` or `~/.agents/skills/` |\n| Antigravity | Google | `.agents/skills/` (workspace), `~/.gemini/antigravity/skills/` (global) |\n| Gemini CLI | Google | `SKILL.md` standard locations |\n| Cursor / Copilot | Various | supported with manual placement |\n\nThe Flutter team's installer targets the cross-tool location directly:\n\n```\nnpx skills add flutter/skills --skill '*' --agent universal\n```\n\nThe `--agent universal` flag drops everything into `.agents/skills`, the folder compatible agents auto-discover. Write a skill once, and your whole team gets the same expertise regardless of which agent they prefer. Codex adds a distribution layer on top (it calls the authoring format a \"skill\" and the installable package a \"plugin\"), but the core file is identical.\n\nEvery official source converges on these. I have ordered them by how much they matter in practice.\n\nIf your skill does not trigger, it is almost never the instructions. It is the description. 
This is the single most important line in the entire file, because it is the only part the agent reads when deciding *whether to load your skill at all*, often choosing from 100+ candidates.\n\nThree rules from the official guidance:\n\n- Write it in the third person.\n- State both what the skill does and when to use it.\n- Front-load the trigger words a developer would type, and add a \"Do not use\" clause so it does not fire on the wrong tasks.\n\nCompare:\n\n```\n# Weak: vague, no triggers, will rarely fire correctly\ndescription: Helps with Flutter screens.\n\n# Strong: what + when + triggers + boundary\ndescription: Build a Flutter screen that loads async data with Riverpod,\n  handling loading, error, and data states with AsyncValue. Use when\n  fetching from a repository or API and rendering spinners, retry UI, and\n  lists. Do not use for purely static screens with no async data.\n```\n\nAnthropic puts it perfectly: the context window is a public good. Your skill shares it with the system prompt, the conversation, every other skill's metadata, and the user's actual request. The default assumption must be that **the agent is already very smart**.\n\nDo not explain what Flutter is. Do not explain what a widget is. Do not define JSON. Challenge every sentence: does the agent really not know this? Keep the `SKILL.md` body under 500 lines. If it grows past that, split it into `references/` files.\n\n```\n<!-- Bad: wastes tokens on what the model already knows -->\nFlutter is Google's UI toolkit. A widget is a building block of the UI.\nTo make a network call, you first need an HTTP client, which is a piece\nof software that...\n\n<!-- Good: assumes competence, gets to the point -->\nUse the `http` package for REST calls. Wrap responses in a typed model.\n```\n\nThis framing from Anthropic is the one most people miss. Think of the agent as a robot walking a path:\n\n\"`dart run build_runner build --delete-conflicting-outputs`. Do not modify the flags.\"\n\nFragile, deterministic Flutter operations (code generation, migrations, platform config) want low freedom. Architectural and design decisions want high freedom. 
Most skills need a mix.\n\nThis is what makes the official Flutter skills so effective, and it is the ingredient that separates a senior skill from a junior one. Do not only say what to do. Ban the wrong instinct explicitly.\n\nThe official `flutter-build-responsive-layout` skill does exactly this. It does not just say \"be responsive.\" It says: do NOT switch layouts on `MediaQuery.orientationOf`, do NOT check for \"phone\" vs \"tablet\", do NOT lock orientation. Those negative rules are what stop the model from reaching for the plausible-but-wrong pattern it learned from a thousand old tutorials.\n\n```\n## Rules\n\n- Use `AsyncValue.when` to render data/loading/error. Never assume data is present.\n- Do NOT use `FutureBuilder` for server state. It re-runs on every rebuild\n  and causes duplicate network calls.\n- Do NOT swallow exceptions or show an infinite spinner on failure.\n```\n\nFor any multi-step task, give the agent a checklist it can copy into its response and tick off. This prevents skipped steps, which is the most common failure mode on complex work. Both Anthropic and Flutter's own skills use this pattern.\n\n```\n## Workflow\n\nCopy this checklist and track progress:\n\n- [ ] Define the immutable data model.\n- [ ] Add the repository method returning `Future<Model>`.\n- [ ] Create the provider that calls the repository.\n- [ ] Build the screen with `ref.watch` + `AsyncValue.when`.\n- [ ] Implement the error branch with a retry action.\n- [ ] Run `dart analyze` and fix everything. Repeat until clean.\n```\n\nThe highest-leverage pattern in the entire playbook: **run validator, fix errors, repeat.** Give the agent an objective check it can run and a rule to keep going until it passes. In Flutter, you have world-class validators for free.\n\n```\nAfter generating code, run `dart analyze`. If it reports issues, fix them\nand run it again. 
Only present the result when analysis is clean and\n`flutter test` passes.\n```\n\nThis single habit improves output quality more than almost anything else, because it converts \"looks right\" into \"provably compiles and lints clean.\"\n\nKeep the main `SKILL.md` as a lean overview and push depth into linked files. Three patterns, named by the Antigravity docs:\n\n- `SKILL.md` only. For focused, single-purpose skills.\n- `SKILL.md` + `references/`. For skills with deep API detail.\n- `SKILL.md` + `examples/`. For skills where output quality depends on seeing worked examples.\n\nTwo rules when you split: keep references **one level deep** from `SKILL.md` (the agent may only partially read nested files), and add a **table of contents** to any reference file longer than 100 lines so the agent can see the full scope even on a partial read.\n\nNever write \"before August 2025, use the old API.\" It rots. Instead, put deprecated guidance in a collapsed \"old patterns\" section so the current path stays clean while history stays available.\n\n```\n## Old patterns\n\n<details>\n<summary>Why not FutureBuilder? (legacy)</summary>\n\n`FutureBuilder` re-runs its future on every rebuild unless cached, causing\nduplicate calls. Providers cache and dedupe by default. Prefer providers.\n</details>\n```\n\nIf output quality depends on style, show a complete input/output example rather than describing it. The model matches patterns far better than it follows prose. One correct, runnable Dart snippet anchors the entire skill.\n\nHere is a full, working skill that bundles every ingredient above. It targets a spot where AI agents reliably write outdated Flutter: loading async data. 
Drop this into `.agents/skills/building-riverpod-async-screens/SKILL.md` and it works in Claude Code, Codex, and Antigravity.\n\n```\n---\nname: building-riverpod-async-screens\ndescription: Build a Flutter screen that loads async data with Riverpod,\n  handling loading, error, and data states with AsyncValue. Use when\n  fetching from a repository, API, or database and rendering spinners,\n  retry UI, and lists. Do not use for static screens with no async data.\n---\n\n# Building Riverpod Async Screens\n\nWire an async data screen the way a senior Flutter dev would: a typed\nprovider, `AsyncValue` state handling, and explicit loading/error/data\nbranches. No raw `FutureBuilder`, no manual `setState` for server state,\nno swallowed errors.\n\n## Rules\n\n- Use a `FutureProvider` for read-only data, or an `AsyncNotifier` when the\n  screen also mutates state. Do NOT use `StatefulWidget` + `setState` for\n  server state.\n- Watch with `ref.watch` inside `build`. Use `ref.read` only inside callbacks.\n- Render all three states with `AsyncValue.when`. Never assume data exists.\n- Always give the error branch a retry path. Do NOT swallow exceptions or\n  show an infinite spinner on failure.\n- Keep shared providers in their own file: one feature, one providers file.\n\n## Workflow\n\nCopy this checklist and track progress:\n\n- [ ] Define the immutable data model.\n- [ ] Add the repository method returning `Future<Model>`.\n- [ ] Create a `FutureProvider` (or `AsyncNotifier`) that calls the repository.\n- [ ] Build the screen: `ref.watch` the provider, render with `AsyncValue.when`.\n- [ ] Implement the error branch with a retry that invalidates the provider.\n- [ ] Run `dart analyze` and fix all issues. 
Repeat until clean.\n\n## Example\n\n~~~dart\n// product_providers.dart\nfinal productProvider = FutureProvider.autoDispose<List<Product>>((ref) async {\n  final repo = ref.watch(productRepositoryProvider);\n  return repo.fetchProducts();\n});\n\n// product_screen.dart\nclass ProductScreen extends ConsumerWidget {\n  const ProductScreen({super.key});\n\n  @override\n  Widget build(BuildContext context, WidgetRef ref) {\n    final products = ref.watch(productProvider);\n    return Scaffold(\n      appBar: AppBar(title: const Text('Products')),\n      body: products.when(\n        data: (items) => ListView.builder(\n          itemCount: items.length,\n          itemBuilder: (_, i) => ListTile(title: Text(items[i].name)),\n        ),\n        loading: () => const Center(child: CircularProgressIndicator()),\n        error: (err, _) => ErrorRetry(\n          message: 'Could not load products',\n          onRetry: () => ref.invalidate(productProvider),\n        ),\n      ),\n    );\n  }\n}\n~~~\n\n## Old patterns\n\n<details>\n<summary>Why not FutureBuilder? (legacy)</summary>\n\n`FutureBuilder` re-runs its future on every rebuild unless cached, causing\nduplicate network calls. Providers cache and dedupe by default. Prefer\nproviders for any server state.\n</details>\n```\n\nNotice how much work the description does, how the rules ban wrong instincts before listing right ones, how the workflow ends in a validator loop, and how the example is complete enough to copy. That is the whole recipe in one file.\n\n## How to actually evaluate your skill\n\nThis is the step almost everyone skips, and it is the one that separates a skill that feels good from one that is good. Both Anthropic and the Flutter team are emphatic: do not trust vibes. Measure.\n\n### Build the evaluation first\n\nAnthropic calls this evaluation-driven development, and the order matters:\n\n1. **Find the gap.** Run the agent on a real task with no skill. Document exactly where it fails or writes outdated code.\n2. 
**Write three eval scenarios** that target those failures.\n3. **Establish a baseline.** Measure performance without the skill.\n4. **Write the minimum instructions** needed to pass.\n5. **Iterate.** Re-run, compare to baseline, refine.\n\nThis guarantees you are solving a real problem instead of documenting an imaginary one. A simple eval is just structured expectations:\n\n```\n{\n  \"skills\": [\"building-riverpod-async-screens\"],\n  \"query\": \"Build a screen that loads the user's order history from OrderRepository and shows it in a list\",\n  \"expected_behavior\": [\n    \"Creates a FutureProvider or AsyncNotifier that calls OrderRepository, not a StatefulWidget with setState\",\n    \"Renders loading, error, and data states using AsyncValue.when\",\n    \"Includes a retry action in the error branch that invalidates the provider\",\n    \"Generated code passes `dart analyze` with no errors\"\n  ]\n}\n```\n\nThe Dart and Flutter teams run an experimental evals framework (open-sourced at the flutter/evals repository) built around **critical user journeys**: realistic developer tasks rather than toy prompts. They score on two axes, which is a great rubric to copy for your own skills:\n\n- Deterministic correctness: does the code compile, pass `dart analyze`, and pass the tests? Objective, machine-checkable.\n- Qualitative quality: is the code meaningfully better than the no-skill baseline?\n\nFor your own skill, that translates to a dead-simple loop: run the task with and without the skill, then ask \"did the deterministic checks pass, and is the code meaningfully better?\" If the skill does not move either axis, it is not earning its context budget.\n\nAnthropic's most practical tip: develop the skill with one instance (call it the author) and test it with a fresh instance that has no memory of the conversation (the tester). The author helps you write and tighten the `SKILL.md`. The tester reveals what the instructions actually communicate to a cold agent. When the tester stumbles, bring the specific failure back to the author and refine. Repeat. 
This observe-refine-test loop is how the official skills were hardened, and it works because the model understands both how to write agent instructions and what an agent needs to receive.\n\nSkills can include scripts and reference external resources. That means an untrusted skill can introduce vulnerabilities or quietly exfiltrate data. Before you install a community skill, read it, the same way you would read a dependency before adding it to `pubspec.yaml`. For any skill that runs terminal commands or touches infrastructure, add an explicit \"Safety\" section documenting exactly what it does. Treat skills as code, because they are.\n\nWhen the official skills dropped, the Flutter corner of X and Reddit reacted the way it always does: screenshots, threads, and declarations that AI coding just changed again. I want to be straight, because the skeptics have a point worth hearing.\n\nMore than one experienced Flutter dev read the actual skill files and came away underwhelmed, noting the initial set is fairly thin and covers ground a competent dev already knows. That is fair. And it is also the wrong frame.\n\nA skill is not a magic file that makes your agent brilliant. It is a discipline. The value is not in any single skill the Flutter team shipped. It is in the workflow the format unlocks: codify a pattern once, evaluate it, refine it on a loop, and every future session inherits it. The teams that win with AI in 2026 are not the ones with the best model. They are the ones who got good at writing down what they already know, then testing that the agent actually follows it.\n\nThat is the real reason to learn this recipe. 
Not to consume the official skills, but to write the ones your team actually needs.\n\nBefore you commit a Flutter skill, verify:\n\n- The `SKILL.md` body is under 500 lines; depth is in `references/` or `examples/`.\n- The workflow ends in a validator loop (`dart analyze` / `flutter test`).\n\n**What is a Flutter agent skill?**\n\nA folder containing a `SKILL.md` file that gives an AI coding agent task-specific, expert instructions for a Flutter or Dart workflow. It loads on demand via progressive disclosure, so it adds expertise without permanently bloating the context window.\n\n**What makes an agent skill good?**\n\nA precise, trigger-rich description (the single biggest factor), ruthless conciseness, explicitly stated anti-patterns, a checklist workflow, a validator feedback loop, and at least one complete example, all verified against evaluations rather than vibes.\n\n**How do I write the description so the skill actually triggers?**\n\nWrite in the third person, state both what the skill does and when to use it, front-load the trigger words a developer would type, and add a \"Do not use\" clause to prevent it firing on the wrong tasks.\n\n**How do I evaluate a skill?**\n\nBuild evals before writing docs. Run the task without the skill to establish a baseline, write three scenarios with expected behaviors, then measure deterministic correctness (compiles, passes `dart analyze` and tests) and qualitative quality against that baseline.\n\n**Does a skill I write for Claude Code work in Codex and Antigravity?**\n\nYes. `SKILL.md` is an open standard. Skills that stick to the core format (frontmatter plus Markdown instructions) work across Claude Code, Codex, Antigravity, Gemini CLI, and Cursor. Only advanced, tool-specific features need adjustment.\n\n**How is a skill different from a rules file or AGENTS.md?**\n\nRules and `AGENTS.md` are always-on, repository-wide instructions (setup commands, standards). A skill is loaded only when its description matches the current task. 
Use always-on files for global rules and short if/then triggers, and skills for specific, repeatable workflows.\n\n**How long should a SKILL.md be?**\n\nKeep the body under 500 lines. If it grows past that, move depth into one-level-deep `references/` files and keep the main file as a lean overview.\n\nThe official Dart and Flutter skills are a starting point, not the destination. The real unlock is the recipe behind them: a discovery-optimized description, concise expert instructions, anti-patterns stated out loud, a checklist, a validator loop, and an evaluation that proves it works. Get those right and you can encode your team's hardest-won Flutter patterns into something every agent on the team follows automatically.\n\nWrite one skill this week. Pick the task where your AI agent annoys you most, encode the correct pattern, and evaluate it against the no-skill baseline. Then tell me what you built. I read every comment, and I want to know which Flutter pattern you taught your agent first. 
🥊\n\n*Sources: Flutter docs: Agent skills and AI Evaluations, flutter/skills, Anthropic: Skill authoring best practices, Google Antigravity skills docs, and OpenAI Codex: Agent skills.*", "url": "https://wpnews.pro/news/how-to-write-a-flutter-agent-skill-that-actually-works-the-2026-recipe", "canonical_source": "https://dev.to/sayed_ali_alkamel/how-to-write-a-flutter-agent-skill-that-actually-works-the-2026-recipe-2joi", "published_at": "2026-06-12 18:13:04+00:00", "updated_at": "2026-06-12 18:44:28.455197+00:00", "lang": "en", "topics": ["ai-agents", "ai-tools", "large-language-models", "artificial-intelligence"], "entities": ["Flutter", "Anthropic", "Google", "OpenAI", "Claude Code", "Codex", "Antigravity", "Gemini CLI"], "alternates": {"html": "https://wpnews.pro/news/how-to-write-a-flutter-agent-skill-that-actually-works-the-2026-recipe", "markdown": "https://wpnews.pro/news/how-to-write-a-flutter-agent-skill-that-actually-works-the-2026-recipe.md", "text": "https://wpnews.pro/news/how-to-write-a-flutter-agent-skill-that-actually-works-the-2026-recipe.txt", "jsonld": "https://wpnews.pro/news/how-to-write-a-flutter-agent-skill-that-actually-works-the-2026-recipe.jsonld"}}