Source: github.com · Topic: ai-infrastructure

Durable execution, the hard way

A developer has published a step-by-step guide to building a durable execution engine from scratch using Go and Postgres, inspired by Kelsey Hightower's "Kubernetes the hard way." The guide, which implements a minimal but fully-working workflow engine across seven lessons, is designed to help engineers understand how systems like Hatchet and Temporal operate at a deeper level.

Read time: 3 min · Published May 28, 2026

Inspired by Kelsey Hightower's Kubernetes the hard way, we're going to build a durable execution engine from scratch using Go and Postgres.

Durable execution is a mechanism to incrementally checkpoint the state of a function as it makes progress, so that in the case of unexpected failure, the function can recover from where it left off. It's particularly relevant in newer stacks and projects implementing AI agents, which are long-running and stateful. A system which implements durable execution is often called a "workflow engine."
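To make the checkpointing idea concrete, here is a minimal in-memory sketch in Go. It is not the guide's implementation. The `checkpointStore` map stands in for the Postgres table a real engine would write to, and the names `durableContext`, `step`, and `workflow` are hypothetical. Each step's result is saved under its position in the function, so a second attempt after a simulated crash skips the step that already completed.

```go
package main

import (
	"errors"
	"fmt"
)

// checkpointStore is a hypothetical in-memory stand-in for the Postgres
// table a real engine would use to persist step results.
type checkpointStore struct {
	results map[int]string
}

// durableContext tracks the position of the current step within one attempt.
type durableContext struct {
	store *checkpointStore
	pos   int
}

// step runs fn only if no checkpoint exists for the current position;
// otherwise it returns the saved result without re-executing fn.
func (c *durableContext) step(fn func() (string, error)) (string, error) {
	idx := c.pos
	c.pos++
	if res, ok := c.store.results[idx]; ok {
		return res, nil
	}
	res, err := fn()
	if err != nil {
		return "", err
	}
	c.store.results[idx] = res // checkpoint before moving on
	return res, nil
}

// calls counts how many times each step body actually executed.
var calls = map[string]int{}

// workflow is a hypothetical two-step function; failAtSecond simulates a crash.
func workflow(ctx *durableContext, failAtSecond bool) (string, error) {
	a, err := ctx.step(func() (string, error) {
		calls["fetch"]++
		return "data", nil
	})
	if err != nil {
		return "", err
	}
	return ctx.step(func() (string, error) {
		calls["summarize"]++
		if failAtSecond {
			return "", errors.New("simulated crash")
		}
		return "summary of " + a, nil
	})
}

func main() {
	store := &checkpointStore{results: map[int]string{}}
	// First attempt fails during the second step.
	_, err := workflow(&durableContext{store: store}, true)
	fmt.Println("attempt 1:", err)
	// Second attempt resumes: the first step is served from its checkpoint.
	out, err := workflow(&durableContext{store: store}, false)
	fmt.Println("attempt 2:", out, err)
	fmt.Println("fetch ran", calls["fetch"], "time(s)")
}
```

Keying checkpoints by step position assumes the function calls its steps in the same order on every attempt, which is why workflow code generally needs to be deterministic.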

This guide uses Go, plus SQL queries that sqlc turns into type-safe Go code. The only dependencies are:

  • Go 1.25+
  • Postgres (by default, created via Docker)
  • pgx

If you are interested in contributing support for other languages, please open a GitHub issue. I'll share updates (new lessons, other languages) for this guide on Twitter, if you'd like to follow along.

You will benefit from this guide if you:

  • Want to understand how durable execution engines like Hatchet and Temporal work at a deeper level
  • Are implementing your own workflow engine and would like a simple starting point for your architecture

This guide expects that you understand the foundations of SQL databases, can read code, and are familiar with basic backend engineering concepts, such as queues. More advanced terminology will be introduced in each lesson.

For background on why durable execution matters, see the blog post "How to think about durable execution."

Each directory in /lessons is set up with an identical structure:

  • A README.md file for navigating the lesson
  • A main.go file for running the example code produced by the lesson, which can be run via go run .
  • A sql directory which contains a schema.sql file, a queries.sql file, and some files for generating templated queries via sqlc

By the final lesson, we'll have a minimal but fully-working workflow engine. Note that these lessons are not focused on developer ergonomics: we'll be building the bare minimum to understand the fundamentals, but won't implement the typical niceties you'd see in a client SDK.

The lessons are:

  • Prerequisites
  • Simple task queue
  • Limiting concurrent tasks
  • Task queue improvements
  • Durable event log
  • Tracking non-determinism
  • Durable tasks

This guide is a somewhat opinionated view on durable execution. Specifically, it implements:

  • Durable execution entirely in Postgres.
  • Two types of functions: durable tasks and regular tasks. These map directly to durable tasks and tasks in Hatchet, and are akin to Temporal workflows and activities.
  • Regular tasks that can be invoked as standalone tasks, meaning this guide also implements a simple Postgres-backed task queue in the first few lessons.
  • Multiple types of retries and replays, which are treated as distinct:
      • Retries re-run a durable task without resetting its event history (preserving the execution state of the function).
      • Replays reset a durable task's execution history so it starts from scratch.
      • Forking resets a durable task's execution history at a given point, effectively creating a "fork" of that task. This will be the subject of a future lesson.
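The difference between retries, replays, and forks comes down to what happens to the event history. The Go sketch below models that history as an in-memory slice. The `durableTask` type, its methods, and the choice to implement a fork as a new task that copies a prefix of the original history are all assumptions for illustration: the guide stores this state in Postgres and only covers forking in a future lesson.

```go
package main

import "fmt"

// event is a hypothetical record in a durable task's event history,
// e.g. "step completed with result X".
type event struct {
	Kind   string
	Result string
}

// durableTask is a hypothetical in-memory model of a durable task.
type durableTask struct {
	ID      string
	History []event
	Attempt int
}

// retry re-runs the task but keeps its history, so recorded events
// are reused instead of re-executed.
func (t *durableTask) retry() {
	t.Attempt++
}

// replay discards the history so the task starts from scratch.
func (t *durableTask) replay() {
	t.Attempt++
	t.History = nil
}

// fork creates a new task whose history is a copy of the first n events
// of the original; the original task is left untouched.
func (t *durableTask) fork(newID string, n int) *durableTask {
	if n < 0 {
		n = 0
	}
	if n > len(t.History) {
		n = len(t.History)
	}
	h := make([]event, n)
	copy(h, t.History[:n])
	return &durableTask{ID: newID, History: h}
}

func main() {
	task := &durableTask{ID: "task-1", History: []event{
		{"step_completed", "a"}, {"step_completed", "b"}, {"step_completed", "c"},
	}}
	f := task.fork("task-1-fork", 2)
	task.retry()
	fmt.Println(len(task.History), task.Attempt) // 3 1
	task.replay()
	fmt.Println(len(task.History), task.Attempt) // 0 2
	fmt.Println(f.ID, len(f.History))            // task-1-fork 2
}
```

Copying the prefix rather than sharing the slice keeps the fork independent, so later changes to either task's history do not leak into the other.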

You can modify the schema, queries, and code in each lesson to experiment. To regenerate the sqlc-generated code for a lesson, run the following from that lesson's directory:

go run github.com/sqlc-dev/sqlc/cmd/sqlc generate --file sql/sqlc.yaml

If you discover an error in the core logic of any lesson, please file a GitHub issue. We'd be happy to reward you with a baked good from a bakery near you (yes, we're serious). If a bakery isn't available, we'd be happy to send you a Hatchet tee or hat. If you understandably don't want more vendor swag, you'll have my eternal gratitude.

AI has not been used to write any prose in this guide. All mistakes and turns of phrase are my own. AI has been used to:

  • Verify that each lesson of this guide is independently runnable and instructions are easy to follow
  • Generate mermaid diagrams

If there's sufficient interest, I'd be happy to put together additional lessons, such as:

  • Using Postgres LISTEN/NOTIFY to speed up processing significantly
  • Durable sleep
  • Branching and forking the durable event log