How to Write a Flutter Agent Skill That Actually Works: The 2026 Recipe

wpnews.pro

TL;DR: A great agent skill is not a pile of documentation. It is a tightly scoped `SKILL.md` with a description engineered for discovery, ruthless conciseness, anti-patterns stated up front, a checklist workflow, and a feedback loop. The format is an open standard that works across Claude Code, OpenAI Codex, Google Antigravity, Gemini CLI, and Cursor. This post synthesizes the official authoring guidance from Flutter, Anthropic, Google, and OpenAI into one recipe, hands you a complete copy-pasteable Flutter skill, and shows you how to actually evaluate it instead of guessing.

In my last article, I wrote about the official Dart and Flutter Agent Skills and why they stop your AI from writing 2022 Flutter. The most common reply I got was some version of the same question:

"Cool. How do I write my own?"

So I went and read the actual playbooks. Not the hot takes, the primary sources: Flutter's skill docs and eval framework, Anthropic's skill authoring best practices, Google's Antigravity skill docs, and OpenAI's Codex skill guide. The good news is they agree on almost everything. The better news is that the gap between a skill that works and a skill that gets silently ignored comes down to a handful of decisions, and most people get them wrong.

Here is the recipe, Flutter-flavored.

AI agents are generalists. They average across years of Flutter code, much of it deprecated, and hand you the most statistically common answer instead of the currently correct one. The Flutter team named this the knowledge gap: the framework ships features faster than language models can update their training data. Skills exist to close that gap by handing the agent a task-specific, expert workflow.

But here is what nobody tells you. A poorly written skill does not just fail to help. It actively costs you. Every skill's metadata sits in the agent's context budget at all times. A vague skill that never triggers is dead weight. A skill with a fuzzy description that triggers on the wrong tasks is worse, because now your agent is following the wrong playbook with full confidence.

The bar is not "wrote some Markdown." The bar is "the agent reliably finds it, trusts it, and follows it." Everything below is in service of that bar.

A skill is the simplest possible thing: a folder with one required file.

building-riverpod-async-screens/
├── SKILL.md          # Required: metadata + instructions
├── references/       # Optional: deep-dive docs loaded on demand
├── examples/         # Optional: reference implementations
├── scripts/          # Optional: scripts the agent runs, not reads
└── assets/           # Optional: templates, images

The `SKILL.md` itself is YAML frontmatter plus a Markdown body:

---
name: building-riverpod-async-screens
description: "Build a Flutter screen that loads async data with Riverpod..."
---


[instructions go here]

The magic that makes this scale is progressive disclosure. At startup the agent loads only the lightweight metadata (name, description, path) of every skill. It reads the full `SKILL.md` only when a task matches, and it reads anything in `references/` or `examples/` only when the body points it there. If you write Flutter, you already know this pattern: it is deferred loading for the context window. OpenAI, Anthropic, and Google all describe the exact same mechanism.
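To make the mechanism concrete, here is a minimal Python sketch of progressive disclosure. It is not how any particular agent implements it: the `SkillMeta` class and the `discover` and `load_body` helpers are hypothetical names, and the frontmatter parser only handles the simple `key: value` form used in this article rather than full YAML.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SkillMeta:
    """The only data held in context at startup (hypothetical structure)."""
    name: str
    description: str
    path: Path


def read_frontmatter(skill_md: Path) -> dict:
    """Parse simple `key: value` frontmatter between the two --- lines."""
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta, key = {}, None
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if line[:1].isspace() and key:  # indented line continues the previous value
            meta[key] += " " + line.strip()
        elif ":" in line:
            key, value = line.split(":", 1)
            key = key.strip()
            meta[key] = value.strip().strip('"')
    return meta


def discover(skills_root: Path) -> list[SkillMeta]:
    """Startup: collect metadata for every skill, never the body."""
    found = []
    for skill_md in sorted(skills_root.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_md)
        if "name" in meta and "description" in meta:
            found.append(SkillMeta(meta["name"], meta["description"], skill_md))
    return found


def load_body(skill: SkillMeta) -> str:
    """On a task match: read the instructions that follow the frontmatter."""
    text = skill.path.read_text(encoding="utf-8")
    parts = text.split("---", 2)
    return parts[2].strip() if len(parts) == 3 else text
```

The point of the split is cost: `discover` touches a few lines per skill, while `load_body` is paid only for the one skill whose description matched.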

This is the part that makes writing a skill worth your time. `SKILL.md` is an open standard (published at agentskills.io, originated at Anthropic, since adopted across the ecosystem). One skill works almost everywhere:

| Tool | Vendor | Where skills live |
| --- | --- | --- |
| Claude Code | Anthropic | `.claude/skills/` (project), `~/.claude/skills/` (personal) |
| OpenAI Codex | OpenAI | `.codex/skills/` (project), `~/.codex/skills/` or `~/.agents/skills/` |
| Antigravity | Google | `.agents/skills/` (workspace), `~/.gemini/antigravity/skills/` (global) |
| Gemini CLI | Google | `SKILL.md` standard locations |
| Cursor / Copilot | Various | supported with manual placement |

The Flutter team's installer targets the cross-tool location directly:

npx skills add flutter/skills --skill '*' --agent universal

The `--agent universal` flag drops everything into `.agents/skills`, the folder compatible agents auto-discover. Write a skill once, and your whole team gets the same expertise regardless of which agent they prefer. Codex adds a distribution layer on top (it calls the authoring format a "skill" and the installable package a "plugin"), but the core file is identical.

Every official source converges on these. I have ordered them by how much they matter in practice.

If your skill does not trigger, it is almost never the instructions. It is the description. This is the single most important line in the entire file, because it is the only part the agent reads when deciding whether to load your skill at all, often choosing from 100+ candidates.

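Three rules from the official guidance:

- Write it in the third person.
- State both what the skill does and when to use it, front-loading the trigger words a developer would type.
- Add a "Do not use" clause so it does not fire on the wrong tasks.

Compare: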

description: Helps with Flutter screens.

description: Build a Flutter screen that loads async data with Riverpod,
  handling loading, error, and data states with AsyncValue. Use when
  fetching from a repository or API and rendering spinners, retry UI, and
  lists. Do not use for purely static screens with no async data.
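The description rules are mechanical enough to lint. The Python sketch below is one rough way to do that, not an official checker: `lint_description` is a hypothetical helper, the 12-word minimum is an arbitrary assumption, and the third-person check only flags first-person pronouns.

```python
import re

# Heuristic: first-person wording suggests the description is not third person.
FIRST_PERSON = re.compile(r"\b(I|we|my|our)\b", re.IGNORECASE)


def lint_description(description: str) -> list[str]:
    """Return a list of problems; an empty list means the heuristics pass."""
    text = " ".join(description.split())  # collapse folded YAML lines
    problems = []
    if len(text.split()) < 12:  # assumed threshold, tune to taste
        problems.append("too short to carry trigger words")
    if FIRST_PERSON.search(text):
        problems.append("not written in third person")
    if "use when" not in text.lower():
        problems.append('missing a "Use when" clause')
    if "do not use" not in text.lower():
        problems.append('missing a "Do not use" clause')
    return problems
```

Run against the two descriptions above, the vague one fails three checks (too short, no "Use when", no "Do not use") and the detailed one passes. A clean lint does not prove the skill triggers correctly; only running the agent does.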

Anthropic puts it perfectly: the context window is a public good. Your skill shares it with the system prompt, the conversation, every other skill's metadata, and the user's actual request. The default assumption must be that the agent is already very smart.

Do not explain what Flutter is. Do not explain what a widget is. Do not define JSON. Challenge every sentence: does the agent really not know this? Keep the `SKILL.md` body under 500 lines. If it grows past that, split it into `references/` files.

<!-- Bad: wastes tokens on what the model already knows -->
Flutter is Google's UI toolkit. A widget is a building block of the UI.
To make a network call, you first need an HTTP client, which is a piece
of software that...

<!-- Good: assumes competence, gets to the point -->
Use the `http` package for REST calls. Wrap responses in a typed model.

This framing from Anthropic is the one most people miss. Think of the agent as a robot walking a path: some stretches have exactly one safe way forward, others have many. Where there is only one safe way, spell it out exactly, for example: "Run `dart run build_runner build --delete-conflicting-outputs`. Do not modify the flags."

Fragile, deterministic Flutter operations (code generation, migrations, platform config) want low freedom. Architectural and design decisions want high freedom. Most skills need a mix.

This is what makes the official Flutter skills so effective, and it is the ingredient that separates a senior skill from a junior one. Do not only say what to do. Ban the wrong instinct explicitly.

The official `flutter-build-responsive-layout` skill does exactly this. It does not just say "be responsive." It says: do NOT switch layouts on `MediaQuery.orientationOf`, do NOT check for "phone" vs "tablet", do NOT lock orientation. Those negative rules are what stop the model from reaching for the plausible-but-wrong pattern it learned from a thousand old tutorials.

## Rules

- Use `AsyncValue.when` to render data/loading/error. Never assume data is present.
- Do NOT use `FutureBuilder` for server state. A future created inside
  `build` re-runs on every rebuild and causes duplicate network calls.
- Do NOT swallow exceptions or show an infinite spinner on failure.

For any multi-step task, give the agent a checklist it can copy into its response and tick off. This prevents skipped steps, which is the most common failure mode on complex work. Both Anthropic and Flutter's own skills use this pattern.

## Workflow

Copy this checklist and track progress:

- [ ] Define the immutable data model.
- [ ] Add the repository method returning `Future<Model>`.
- [ ] Create the provider that calls the repository.
- [ ] Build the screen with `ref.watch` + `AsyncValue.when`.
- [ ] Implement the error branch with a retry action.
- [ ] Run `dart analyze` and fix everything. Repeat until clean.

The highest-leverage pattern in the entire playbook: run validator, fix errors, repeat. Give the agent an objective check it can run and a rule to keep going until it passes. In Flutter, you have world-class validators for free.

After generating code, run `dart analyze`. If it reports issues, fix them
and run it again. Only present the result when analysis is clean and
`flutter test` passes.

This single habit improves output quality more than almost anything else, because it converts "looks right" into "provably compiles and lints clean."
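If you would rather ship the loop as a script in `scripts/` than as prose, a minimal Python sketch might look like this. The function names and the `fix` callback are hypothetical, it relies on `dart analyze` and `flutter test` returning a non-zero exit code on failure, and the runner is injectable so the loop can be exercised without a Flutter SDK.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Run a command in the current directory and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def validate(run: Runner = run_command) -> list[str]:
    """Run both checks; return the commands that exited non-zero."""
    checks = (["dart", "analyze"], ["flutter", "test"])
    return [" ".join(cmd) for cmd in checks if run(cmd) != 0]


def fix_until_clean(fix: Callable[[list[str]], None],
                    run: Runner = run_command,
                    max_rounds: int = 5) -> bool:
    """Validate, hand failures to `fix`, and repeat. True once everything passes."""
    for round_no in range(max_rounds + 1):
        failures = validate(run)
        if not failures:
            return True
        if round_no < max_rounds:
            fix(failures)  # e.g. ask the agent to address the reported failures
    return False
```

The cap on rounds is a design choice: an agent that cannot get to clean after a few attempts should stop and report, not loop forever.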

Keep the main `SKILL.md` as a lean overview and push depth into linked files. Three patterns from the Antigravity docs:

- `SKILL.md` only. For focused, single-purpose skills.
- `SKILL.md` plus `references/`. For skills with deep API detail.
- `SKILL.md` plus `examples/`. For skills where output quality depends on seeing worked examples.

Two rules when you split: keep references one level deep from `SKILL.md` (the agent may only partially read nested files), and add a table of contents to any reference file longer than 100 lines so the agent can see the full scope even on a partial read.
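These size and layout rules can be checked before you commit. The Python sketch below is a rough approximation under stated assumptions: `check_layout` is a hypothetical helper, "one level deep" is approximated as "no subfolders under `references/`", and a table of contents is detected simply by the word "contents" appearing in the file.

```python
from pathlib import Path


def body_lines(skill_md: Path) -> list[str]:
    """Lines of SKILL.md after the closing --- of the frontmatter."""
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if lines and lines[0].strip() == "---":
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                return lines[i + 1:]
    return lines


def check_layout(skill_dir: Path, max_body: int = 500, toc_after: int = 100) -> list[str]:
    """Return layout problems; an empty list means the checks pass."""
    problems = []
    body = body_lines(skill_dir / "SKILL.md")
    if len(body) > max_body:
        problems.append(f"SKILL.md body has {len(body)} lines (limit {max_body})")
    refs = skill_dir / "references"
    if refs.is_dir():
        for path in sorted(refs.rglob("*.md")):
            rel = path.relative_to(skill_dir)
            if len(rel.parts) > 2:  # references/sub/file.md counts as nested
                problems.append(f"{rel.as_posix()} is nested more than one level deep")
            text = path.read_text(encoding="utf-8")
            if len(text.splitlines()) > toc_after and "contents" not in text.lower():
                problems.append(f"{rel.as_posix()} is long but has no table of contents")
    return problems
```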

Never write "before August 2025, use the old API." It rots. Instead, put deprecated guidance in a collapsed "old patterns" section so the current path stays clean while history stays available.

## Old patterns

<details>
<summary>Why not FutureBuilder? (legacy)</summary>

`FutureBuilder` re-runs its future on every rebuild unless cached, causing
duplicate calls. Providers cache and dedupe by default. Prefer providers.
</details>

If output quality depends on style, show a complete input/output example rather than describing it. The model matches patterns far better than it follows prose. One correct, runnable Dart snippet anchors the entire skill.

Here is a full, working skill that bundles every ingredient above. It targets a spot where AI agents reliably write outdated Flutter: async data. Drop this into `.agents/skills/building-riverpod-async-screens/SKILL.md` and it works in Claude Code, Codex, and Antigravity.

---
name: building-riverpod-async-screens
description: Build a Flutter screen that loads async data with Riverpod,
  handling loading, error, and data states with AsyncValue. Use when
  fetching from a repository, API, or database and rendering spinners,
  retry UI, and lists. Do not use for static screens with no async data.
---


Wire an async data screen the way a senior Flutter dev would: a typed
provider, `AsyncValue` state handling, and explicit loading/error/data
branches. No raw `FutureBuilder`, no manual `setState` for server state,
no swallowed errors.

## Rules

- Use a `FutureProvider` for read-only data, or an `AsyncNotifier` when the
  screen also mutates state. Do NOT use `StatefulWidget` + `setState` for
  server state.
- Watch with `ref.watch` inside `build`. Use `ref.read` only inside callbacks.
- Render all three states with `AsyncValue.when`. Never assume data exists.
- Always give the error branch a retry path. Do NOT swallow exceptions or
  show an infinite spinner on failure.
- Keep shared providers in their own file: one feature, one providers file.

## Workflow

Copy this checklist and track progress:

- [ ] Define the immutable data model.
- [ ] Add the repository method returning `Future<Model>`.
- [ ] Create a `FutureProvider` (or `AsyncNotifier`) that calls the repository.
- [ ] Build the screen: `ref.watch` the provider, render with `AsyncValue.when`.
- [ ] Implement the error branch with a retry that invalidates the provider.
- [ ] Run `dart analyze` and fix all issues. Repeat until clean.

## Example
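// product_providers.dart
// Assumes `Product` and `productRepositoryProvider` are defined elsewhere.
import 'package:flutter_riverpod/flutter_riverpod.dart';
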
final productProvider = FutureProvider.autoDispose<List<Product>>((ref) async {
  final repo = ref.watch(productRepositoryProvider);
  return repo.fetchProducts();
});

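// product_screen.dart
// Assumes an app-level `ErrorRetry` widget that takes `message` and `onRetry`.
import 'package:flutter/material.dart';
import 'package:flutter_riverpod/flutter_riverpod.dart';
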
class ProductScreen extends ConsumerWidget {
  const ProductScreen({super.key});

  @override
  Widget build(BuildContext context, WidgetRef ref) {
    final products = ref.watch(productProvider);
    return Scaffold(
      appBar: AppBar(title: const Text('Products')),
      body: products.when(
        data: (items) => ListView.builder(
          itemCount: items.length,
          itemBuilder: (_, i) => ListTile(title: Text(items[i].name)),
        ),
        loading: () => const Center(child: CircularProgressIndicator()),
        error: (err, _) => ErrorRetry(
          message: 'Could not load products',
          onRetry: () => ref.invalidate(productProvider),
        ),
      ),
    );
  }
}

## Old patterns

<details>
<summary>Why not FutureBuilder? (legacy)</summary>

`FutureBuilder` re-runs its future on every rebuild unless cached, causing
duplicate network calls. Providers cache and dedupe by default. Prefer
providers for any server state.
</details>

Notice how much work the description does, how the rules ban wrong instincts before listing right ones, how the workflow ends in a validator loop, and how the example is complete enough to copy. That is the whole recipe in one file.

## How to actually evaluate your skill

This is the step almost everyone skips, and it is the one that separates a skill that feels good from one that is good. Both Anthropic and the Flutter team are emphatic: do not trust vibes. Measure.

### Build the evaluation first

Anthropic calls this evaluation-driven development, and the order matters:

1. **Find the gap.** Run the agent on a real task with no skill. Document exactly where it fails or writes outdated code.
2. **Write three eval scenarios** that target those failures.
3. **Establish a baseline.** Measure performance without the skill.
4. **Write the minimum instructions** needed to pass.
5. **Iterate.** Re-run, compare to baseline, refine.

This guarantees you are solving a real problem instead of documenting an imaginary one. A simple eval is just structured expectations:
{
  "skills": ["building-riverpod-async-screens"],
  "query": "Build a screen that loads the user's order history from OrderRepository and shows it in a list",
  "expected_behavior": [
    "Creates a FutureProvider or AsyncNotifier that calls OrderRepository, not a StatefulWidget with setState",
    "Renders loading, error, and data states using AsyncValue.when",
    "Includes a retry action in the error branch that invalidates the provider",
    "Generated code passes `dart analyze` with no errors"
  ]
}

The Dart and Flutter teams run an experimental evals framework (open-sourced at the flutter/evals repository) built around critical user journeys: realistic developer tasks rather than toy prompts. They score on two axes, which is a great rubric to copy for your own skills:

- Deterministic correctness: does the code compile, pass `dart analyze`, and pass the tests? Objective, machine-checkable.
- Qualitative quality: is the code meaningfully better than the no-skill baseline?

For your own skill, that translates to a dead-simple loop: run the task with and without the skill, then ask "did the deterministic checks pass, and is the code meaningfully better?" If the skill does not move either axis, it is not earning its context budget.
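Here is a minimal Python sketch of that with-and-without comparison. It is not the flutter/evals framework: `RunResult`, the 1-5 quality scale, and the rule that a skill must improve one axis without regressing the other are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunResult:
    scenario: str
    checks_passed: bool  # compiled, `dart analyze` clean, tests green
    quality: float       # reviewer score on a hypothetical 1-5 scale


def summarize(results: list[RunResult]) -> tuple[float, float]:
    """Return (deterministic pass rate, mean quality score)."""
    pass_rate = sum(r.checks_passed for r in results) / len(results)
    return pass_rate, mean(r.quality for r in results)


def earns_its_budget(baseline: list[RunResult], with_skill: list[RunResult]) -> bool:
    """True if the skill improves at least one axis and regresses neither."""
    base_pass, base_quality = summarize(baseline)
    skill_pass, skill_quality = summarize(with_skill)
    improved = skill_pass > base_pass or skill_quality > base_quality
    regressed = skill_pass < base_pass or skill_quality < base_quality
    return improved and not regressed
```

With only three scenarios the numbers are noisy, so treat the result as a signal to read the transcripts, not as a verdict.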

Anthropic's most practical tip: develop the skill with one instance (call it the author) and test it with a fresh instance that has no memory of the conversation (the tester). The author helps you write and tighten the `SKILL.md`. The tester reveals what the instructions actually communicate to a cold agent. When the tester stumbles, bring the specific failure back to the author and refine. Repeat. This observe-refine-test loop is how the official skills were hardened, and it works because the model understands both how to write agent instructions and what an agent needs to receive.

Skills can include scripts and reference external resources. That means an untrusted skill can introduce vulnerabilities or quietly exfiltrate data. Before you install a community skill, read it, the same way you would read a dependency before adding it to `pubspec.yaml`. For any skill that runs terminal commands or touches infrastructure, add an explicit "Safety" section documenting exactly what it does. Treat skills as code, because they are.
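A quick scan can tell you where to start reading. The Python sketch below is a crude first pass and no substitute for review: `audit_skill` is a hypothetical helper and the `RISKY` pattern is an illustrative watch-list, so expect both false positives and misses.

```python
import re
from pathlib import Path

# Hypothetical watch-list; extend it for your own threat model.
RISKY = re.compile(r"\b(curl|wget|ssh|scp|nc|eval|sudo)\b|rm\s+-rf|https?://")


def audit_skill(skill_dir: Path) -> list[tuple[str, int, str]]:
    """Return (file, line number, line) for every line worth a human look."""
    hits = []
    for path in sorted(skill_dir.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(skill_dir).as_posix()
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            hits.append((rel, 0, "<binary file>"))  # cannot be read as text, inspect by hand
            continue
        for number, line in enumerate(text.splitlines(), start=1):
            if RISKY.search(line):
                hits.append((rel, number, line.strip()))
    return hits
```

An empty result means only that nothing matched the pattern, not that the skill is safe.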

When the official skills dropped, the Flutter corner of X and Reddit reacted the way it always does: screenshots, threads, and declarations that AI coding just changed again. I want to be straight, because the skeptics have a point worth hearing.

More than one experienced Flutter dev read the actual skill files and came away underwhelmed, noting the initial set is fairly thin and covers ground a competent dev already knows. That is fair. And it is also the wrong frame.

A skill is not a magic file that makes your agent brilliant. It is a discipline. The value is not in any single skill the Flutter team shipped. It is in the workflow the format unlocks: codify a pattern once, evaluate it, refine it on a loop, and every future session inherits it. The teams that win with AI in 2026 are not the ones with the best model. They are the ones who got good at writing down what they already know, then testing that the agent actually follows it.

That is the real reason to learn this recipe. Not to consume the official skills, but to write the ones your team actually needs.

Before you commit a Flutter skill, verify:

- [ ] The `SKILL.md` body is under 500 lines; depth is in `references/` or `examples/`.
- [ ] The workflow ends in a validator loop (`dart analyze` / `flutter test`).

What is a Flutter agent skill?

A folder containing a `SKILL.md` file that gives an AI coding agent task-specific, expert instructions for a Flutter or Dart workflow. It loads on demand via progressive disclosure, so it adds expertise without permanently bloating the context window.

What makes an agent skill good?

A precise, trigger-rich description (the single biggest factor), ruthless conciseness, explicitly stated anti-patterns, a checklist workflow, a validator feedback loop, and at least one complete example, all verified against evaluations rather than vibes.

How do I write the description so the skill actually triggers?

Third person, state both what the skill does and when to use it, front-load the trigger words a developer would type, and add a "Do not use" clause to prevent it firing on the wrong tasks.

How do I evaluate a skill?

Build evals before writing docs. Run the task without the skill to establish a baseline, write three scenarios with expected behaviors, then measure deterministic correctness (compiles, passes `dart analyze` and tests) and qualitative quality against that baseline.

Does a skill I write for Claude Code work in Codex and Antigravity?

Yes. `SKILL.md` is an open standard. Skills that stick to the core format (frontmatter plus Markdown instructions) work across Claude Code, Codex, Antigravity, Gemini CLI, and Cursor. Only advanced, tool-specific features need adjustment.

How is a skill different from a rules file or AGENTS.md?

Rules and `AGENTS.md` are always-on, repository-wide instructions (setup commands, standards). A skill is loaded only when its description matches the current task. Use always-on files for global rules and short if/then triggers, and skills for specific, repeatable workflows.

How long should a SKILL.md be?

Keep the body under 500 lines. If it grows past that, move depth into one-level-deep `references/` files and keep the main file as a lean overview.

The official Dart and Flutter skills are a starting point, not the destination. The real unlock is the recipe behind them: a discovery-optimized description, concise expert instructions, anti-patterns stated out loud, a checklist, a validator loop, and an evaluation that proves it works. Get those right and you can encode your team's hardest-won Flutter patterns into something every agent on the team follows automatically.

Write one skill this week. Pick the task where your AI agent annoys you most, encode the correct pattern, and evaluate it against the no-skill baseline. Then tell me what you built. I read every comment, and I want to know which Flutter pattern you taught your agent first. 🥊

Sources: Flutter docs: Agent skills and AI Evaluations, flutter/skills, Anthropic: Skill authoring best practices, Google Antigravity skills docs, and OpenAI Codex: Agent skills.

Originally published on dev.to.