# AI Research Engineer Open-Sources His Entire Workflow and Prompts

> Source: <https://dev.to/mixture-of-experts/ai-research-engineer-open-sources-his-entire-workflow-and-prompts-20jm>
> Published: 2026-06-17 00:18:59+00:00

Fable 5 came and went. And because it was taken away so quickly, developers wanted it back even more. Scarcity has a way of making things feel more valuable.

Reviews during its short tenure described a very capable model that was great at churning on long-running, ambiguous tasks, but too expensive. It was also intelligent enough that, on large work and overhauls, it tended to overthink, most likely because of its size. For iterative work like implementing a feature or change, Fable 5 was comparable head-to-head with GPT 5.5, except that it would run for 10x as long: a larger model, more overthinking, and more time. The other issue was fallback behavior. If you hit a case where the model needed to call the fallback Opus model, you would not necessarily know it happened, and you would be billed at the higher rate.

Nonetheless, it was a noticeable change compared to existing models. It was good at churning on a specific, goal-oriented problem, for example optimizing a slow path by repeatedly profiling, tracing call sites, tightening hot loops, and validating against the regression budget. For architecture design, it was still not remarkable. So it was good at that goal-oriented push, but even then you needed to run it in sessions, review its code, and steer or compact to get the results you wanted.

It was a good model for planning, research, and review, which is where I adopted it, and I saw real benefits. However, when it came to orchestration or running workflows, I still believe GPT 5.5 is better and more cost-effective on both tokens and time. Personally, I care about token spend, but I care immensely more about my time.

Model capability aside, I still think we are missing a bigger problem, and Fable 5 put a magnifying lens on it because of the nature of its capabilities. AI adoption in organizations is still a challenge for many developers because there are not enough good examples of how power users of coding agents are prompting, running workflows, reviewing outputs, and taking action.

So I am turning my process into a public [workflow playbook](https://github.com/bastani-inc/atomic/blob/main/docs/workflow-playbook.md): how I prompt, how I run workflows, how I steer them, and how I handle the edge cases that show up when agents are doing real work.

Here is the prompt I asked my coding agent to run:

**Workflow usage guide generator**

A privacy-preserving prompt for turning private workflow usage into a public developer guide.

```
<prompt>
You are helping me turn my private Atomic workflow usage into a public, developer-facing guide.

Your job is to analyze my workflow behavior, steering patterns, prompts, and decision-making style without exposing any private information.

<privacy_rules>
- Do NOT quote private session text verbatim unless it is completely generic.
- Do NOT include names, company details, repository names, customer data, file paths, secrets, strategy, internal roadmap details, or private implementation specifics.
- Replace concrete/private details with neutral placeholders like [project], [bug], [workflow], [internal tool], [customer], or [repo].
- Prefer synthesized examples over copied examples.
- If a useful example depends on private context, rewrite it as a safe fictionalized version.
- Flag anything that may be unsafe to publish instead of including it.
</privacy_rules>

<task>
Analyze my workflow usage and produce a practical guide for other developers showing how I use workflows effectively.
Focus on concrete behaviors, reusable prompts, steering moves, and examples developers can copy.
</task>

Look for:
1. The types of workflows I run most often.
2. How I define objectives and done criteria.
3. How I break down complex work into stages.
4. How I steer workflows when they go off track.
5. How I respond to workflow prompts or blocked stages.
6. How I use verification, tests, reviews, or acceptance criteria.
7. How I decide when to interrupt, resume, pause, or rerun.
8. Prompt patterns I reuse.
9. Mistakes or anti-patterns I avoid.
10. Lessons that would help another developer get better results.

<output_format>
Produce the following:

# Workflow Usage Guide

## 1. Executive Summary
A short overview of my workflow style.

## 2. Core Principles
List 5-10 principles I seem to follow when running workflows.

## 3. Common Workflow Patterns
For each pattern:
- Pattern name
- When I use it
- What the workflow usually does
- Why it works
- Safe public example

## 4. Steering Patterns
For each steering behavior:
- Situation
- What I usually say or do
- Why it helps
- Reusable public prompt

## 5. Prompt Templates
Create reusable prompt templates based on my behavior.
Do not copy private prompts directly. Generalize them.

Include templates for:
- Starting a workflow
- Tightening scope
- Adding acceptance criteria
- Redirecting a stage
- Handling a failed validation
- Asking for synthesis
- Turning results into implementation steps

## 6. Concrete Public Examples
Create 3-5 fictionalized but realistic examples showing how a developer could use these patterns.

Each example should include:
- Scenario
- Initial workflow objective
- Steering message
- Validation step
- Final outcome

## 7. Anti-Patterns
List behaviors I avoid or correct, such as vague objectives, missing validation, overbroad prompts, or accepting unverified output.

## 8. Publishability Review
Create a table with:
- Section
- Safe to publish? yes/no
- Risk
- Suggested redaction or rewrite

</output_format>

Important: prioritize usefulness for developers while preserving privacy.
</prompt>
```
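The privacy rules in the prompt rely on the agent's judgment, so it can help to back them with a mechanical pass before anything is published. Below is a minimal sketch of such a pass, not part of the playbook itself. The `PATTERNS` table, the placeholder names, and the `redact` function are all hypothetical, and the regular expressions are deliberately rough illustrations rather than a complete detector.

```python
import re
from typing import Dict, Tuple

# Hypothetical, illustrative patterns only. A real pass would need patterns
# tuned to your own repositories, hosts, and credential formats.
PATTERNS = [
    ("[path]", re.compile(r"(?:/[\w.-]+){2,}")),
    ("[email]", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("[secret]", re.compile(r"\b(?:sk|ghp|AKIA)[A-Za-z0-9_-]{16,}\b")),
]


def redact(text: str) -> Tuple[str, Dict[str, int]]:
    """Replace private-looking spans with neutral placeholders.

    Returns the redacted text plus a count per placeholder, so a human can
    see what was flagged without the private values being echoed back.
    """
    counts: Dict[str, int] = {}
    for placeholder, pattern in PATTERNS:
        text, replaced = pattern.subn(placeholder, text)
        if replaced:
            counts[placeholder] = replaced
    return text, counts
```

A non-empty count dictionary is a signal to stop and review the draft by hand, in the spirit of the "flag instead of include" rule above.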

The final asset is a workflow playbook you can hand to your own coding agent. It open-sources how I run workflows and prompt effectively, including how I define scope, set done criteria, steer blocked stages, verify results, and recover when a workflow goes off track.

The workflows I run are not the dynamic workflows or loops you see in Claude Code, Codex `/goal`, or Hermes Agent. They are literally programmatic automations of the work I already do, with human-in-the-loop checkpoints, review gates, and the ability to steer agents mid-run.

I do not manually prompt much anymore.

A good example: say you are doing a refactor. You probably find yourself running a prompt, then `/compact`, then running the same prompt again. Repeat that three times, compact again, and keep going. You probably do this very frequently.

It turns out that can just become a workflow. You repeat and micromanage less without giving up human autonomy. You also reduce slop because the workflow design handles the piping: what gets passed forward, what gets reviewed, what gets rejected, and where the human needs to make a decision.
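The prompt, compact, repeat loop described above can be sketched as a small function. This is an illustration under stated assumptions, not the actual workflow engine: `run_agent`, `compact`, and `approve` are hypothetical callables standing in for the coding agent, its context compaction, and the human review gate.

```python
from typing import Callable, List, Tuple


def refactor_workflow(
    prompt: str,
    run_agent: Callable[[str, str], str],  # hypothetical: (prompt, context) -> output
    compact: Callable[[str], str],         # hypothetical: context -> shorter context
    approve: Callable[[str], bool],        # hypothetical human review gate
    rounds: int = 2,
    passes_per_round: int = 3,
) -> Tuple[str, List[str]]:
    """Run the same prompt several times, compacting between rounds.

    Only approved outputs are passed forward in the context. Rejected
    outputs are collected separately so they never reach later passes.
    """
    context = ""
    rejected: List[str] = []
    for _ in range(rounds):
        for _ in range(passes_per_round):
            output = run_agent(prompt, context)
            if approve(output):
                context += output + "\n"   # piped forward to the next pass
            else:
                rejected.append(output)    # held back at the review gate
        context = compact(context)         # replaces the manual /compact step
    return context, rejected
```

The design choice worth noticing is that the review gate sits inside the loop, so a rejected pass costs one iteration instead of contaminating everything after it.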

On cost, I spend more than with regular Codex but significantly less than with Claude Code. On wall-clock time, a run takes about the same as Codex at first glance, and again much less than Claude Code.

The quality of the result is where it shines. The workflow approach has a win rate of 75% against both Codex and Claude Code on the exact same issues, which means I actually spend far less total time than I would using Codex alone, because fewer results need rework.

I tested on real problems, not oversaturated benchmarks. I asked each approach to work through different kinds of tasks in a real-world codebase: a migration, a new feature, and a bug fix. The point was not to find one cherry-picked issue where a coding agent looked good. The point was to see whether a workflow-first approach stayed useful across different shapes of software work.

The migration required moving embedded PNG metadata from an older latin1-oriented chunk format to a UTF-8-compatible format while preserving legacy fallback behavior. The new feature required surfacing collaboration connection failures in the UI, which meant tracking transient connection state, wiring lifecycle events, cleaning up listeners, preserving accessibility, and adding tests. The bug fix required correcting arrow-curve behavior inside closed shapes without changing the expected behavior outside those shapes.

Across the migration and new-feature issues, the workflow-generated PRs consistently landed the safest technically correct change compared with the Claude Code and Codex PRs. For the PNG metadata migration, the Workflows PR wrote spec-correct UTF-8 iTXt, selected Excalidraw-keyed metadata, preserved legacy tEXt fallback, and validated emoji and on-disk chunk behavior; the other PRs had subtle compatibility bugs where unrelated or malformed iTXt chunks could shadow valid legacy metadata. For the collaboration-status feature, the Workflows PR had the best transient non-persistent state model, Socket.IO lifecycle handling, listener cleanup, accessibility, and targeted tests, while the alternatives had shared error-indicator state bugs or narrower lifecycle and UI coverage.
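The read path described for the PNG migration (prefer a well-formed iTXt chunk with the right keyword, otherwise fall back to legacy tEXt) can be sketched as follows. This is not the code from any of the PRs. The `KEYWORD` value and all function names are hypothetical, and chunks are assumed to be already split into `(type, data)` pairs. The chunk layouts themselves follow the PNG specification.

```python
import struct
import zlib
from typing import Iterable, Optional, Tuple

KEYWORD = "app/scene"  # hypothetical keyword; a real project defines its own


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def encode_itxt(keyword: str, text: str) -> bytes:
    """Uncompressed iTXt payload: keyword NUL, compression flag and method,
    empty language tag NUL, empty translated keyword NUL, UTF-8 text."""
    return keyword.encode("latin-1") + b"\x00" + b"\x00\x00" + b"\x00" + b"\x00" + text.encode("utf-8")


def decode_itxt(data: bytes) -> Optional[Tuple[str, str]]:
    """Return (keyword, text), or None if the payload is malformed."""
    keyword, sep, rest = data.partition(b"\x00")
    if not sep or len(rest) < 2:
        return None
    flag, method, rest = rest[0], rest[1], rest[2:]
    _lang, sep1, rest = rest.partition(b"\x00")
    _translated, sep2, body = rest.partition(b"\x00")
    if not (sep1 and sep2):
        return None
    try:
        if flag == 1 and method == 0:
            body = zlib.decompress(body)
        elif flag != 0:
            return None
        return keyword.decode("latin-1"), body.decode("utf-8")
    except (zlib.error, UnicodeDecodeError):
        return None


def decode_text(data: bytes) -> Optional[Tuple[str, str]]:
    """Legacy tEXt payload: Latin-1 keyword NUL Latin-1 text."""
    keyword, sep, body = data.partition(b"\x00")
    if not sep:
        return None
    return keyword.decode("latin-1"), body.decode("latin-1")


def read_metadata(chunks: Iterable[Tuple[bytes, bytes]], keyword: str = KEYWORD) -> Optional[str]:
    """Prefer a valid iTXt chunk with our keyword, else the first matching tEXt."""
    legacy = None
    for chunk_type, data in chunks:
        if chunk_type == b"iTXt":
            parsed = decode_itxt(data)
            if parsed and parsed[0] == keyword:
                return parsed[1]
        elif chunk_type == b"tEXt" and legacy is None:
            parsed = decode_text(data)
            if parsed and parsed[0] == keyword:
                legacy = parsed[1]
    return legacy
```

Because malformed or differently keyed iTXt chunks are skipped rather than returned, they cannot shadow valid legacy metadata, which is the compatibility bug described above.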

The bug-fix case showed the same pattern: the Workflows PR solved the actual arrow-curving bug with the narrowest safe behavioral change. It prevented premature auto-finalization while drawing inside the same start-bound shape, preserved normal binding and finalization for other targets, and added meaningful regression coverage. The rejected Claude Code and Codex alternatives either introduced a high-severity regression where simple click-created arrows no longer auto-finalized on bindable targets, or had weaker coverage around binding gaps, different target shapes, and finalization edge cases. Overall, workflows reduced AI slop by producing changes that were tighter in scope, safer for compatibility, better tested, and more careful about edge-case behavior than the competing agent-generated PRs.

That is why I am sharing the workflow playbook instead of only writing about the idea. The goal is for another developer to copy the patterns, adapt the prompts, and run a similar workflow-first process on their own codebase without needing my private context.

Personally, I see reliability and effective model capability exceed expectations when we keep the developer in the loop rather than cutting them out. I live this thesis daily.

I think we need good examples of how to work with coding agents: what each person's workflow looks like, how they prompt, where they intervene, where they trust automation, and where they refuse to give up control.

The playbook is meant to make that concrete. It covers the workflow moves that matter in practice: starting with a tight objective, adding acceptance criteria, redirecting a stage, responding to blocked agents, handling failed validation, deciding when to pause or rerun, and turning the final synthesis into implementation steps.
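As one illustration of "a tight objective plus acceptance criteria", here is a minimal sketch of a starting brief rendered into a prompt. The `WorkflowBrief` class and its field names are hypothetical and do not come from the playbook. Refusing to render without done criteria is an assumption about what "tight" should mean.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkflowBrief:
    """Hypothetical container for the opening message of a workflow."""

    objective: str
    in_scope: List[str] = field(default_factory=list)
    out_of_scope: List[str] = field(default_factory=list)
    acceptance_criteria: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Build the prompt text, refusing to start without done criteria."""
        if not self.acceptance_criteria:
            raise ValueError("add at least one acceptance criterion before starting")
        lines = [f"Objective: {self.objective}"]
        sections = (
            ("In scope", self.in_scope),
            ("Out of scope", self.out_of_scope),
            ("Done when", self.acceptance_criteria),
        )
        for title, items in sections:
            if items:
                lines.append(f"{title}:")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)
```

For example, `WorkflowBrief("Fix [bug] in [project]", acceptance_criteria=["regression test passes"]).render()` produces an objective line followed by a "Done when" list, using the same neutral placeholders the prompt asks for.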

The point is to demystify the work and make it easier for all developers to build. Let's make the bar for getting good results as low as possible.
