Inspired by Kelsey Hightower's Kubernetes the Hard Way, we're going to build a durable execution engine from scratch using Go and Postgres.
Durable execution is a mechanism to incrementally checkpoint the state of a function as it makes progress, so that in the case of unexpected failure, the function can recover from where it left off. It's particularly relevant in newer stacks and projects implementing AI agents, which are long-running and stateful. A system which implements durable execution is often called a "workflow engine."
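To make the checkpointing idea concrete, here is a minimal sketch. It is not the engine built in the lessons: EventLog, Context, Step, Run, and Demo are hypothetical names, and an in-memory map stands in for Postgres. Each step's result is recorded as it completes, so a second attempt skips finished steps and resumes at the one that failed.

```go
package main

import (
	"errors"
	"fmt"
)

// EventLog is a hypothetical in-memory stand-in for a durable store such as a
// Postgres table. It maps a step index to the result recorded for that step.
type EventLog struct {
	results map[int]string
}

// NewEventLog returns an empty log.
func NewEventLog() *EventLog {
	return &EventLog{results: make(map[int]string)}
}

// Context tracks the position of a single execution attempt within the log.
type Context struct {
	log  *EventLog
	next int // index of the next step this attempt will reach
	Ran  int // number of steps actually executed (not replayed) in this attempt
}

// Step returns the checkpointed result if this step already completed in an
// earlier attempt; otherwise it runs fn and records the result on success.
func (c *Context) Step(fn func() (string, error)) (string, error) {
	idx := c.next
	c.next++
	if res, ok := c.log.results[idx]; ok {
		return res, nil // replayed from the checkpoint, fn is not called
	}
	c.Ran++
	res, err := fn()
	if err != nil {
		return "", err // nothing is recorded for a failed step
	}
	c.log.results[idx] = res
	return res, nil
}

// Run starts one attempt of a durable function against the shared log and
// reports how many steps that attempt actually executed.
func Run(log *EventLog, fn func(*Context) (string, error)) (string, int, error) {
	ctx := &Context{log: log}
	out, err := fn(ctx)
	return out, ctx.Ran, err
}

// Demo runs a two-step function whose second step fails on the first attempt,
// then runs it again against the same log.
func Demo() (firstErr error, out string, ranOnRetry int) {
	log := NewEventLog()
	attempts := 0
	workflow := func(ctx *Context) (string, error) {
		a, err := ctx.Step(func() (string, error) { return "fetched", nil })
		if err != nil {
			return "", err
		}
		b, err := ctx.Step(func() (string, error) {
			attempts++
			if attempts == 1 {
				return "", errors.New("simulated crash")
			}
			return "stored", nil
		})
		if err != nil {
			return "", err
		}
		return a + "," + b, nil
	}
	_, _, firstErr = Run(log, workflow)
	out, ranOnRetry, _ = Run(log, workflow)
	return
}

func main() {
	firstErr, out, ran := Demo()
	fmt.Println(firstErr, out, ran)
}
```

On the second attempt only the failed step executes; the first step's result comes from the log. A real engine would persist the log transactionally and would also need to handle non-deterministic code between steps, which this sketch ignores.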
This guide uses Go and templated SQL generated with sqlc. The only dependencies are:
- Go 1.25+
- Postgres (by default, created via Docker)
- pgx
If you are interested in contributing support for other languages, please create a GitHub issue. I'll be sharing updates (new lessons, other languages) for this guide on Twitter if you'd like to follow along.
You will benefit from this guide if you:
- Want to understand how durable execution engines like Hatchet and Temporal work at a deeper level
- Are implementing your own workflow engine and would like a simple starting point for your architecture
This guide expects that you understand the foundations of SQL databases, can read code, and are familiar with some minimal backend engineering concepts, such as queues. More advanced terminology will be introduced in each lesson.
For a motivating guide on durable execution, see the blog post How to think about durable execution.
Each directory in /lessons is set up with an identical structure:
- A README.md file for navigating the lesson
- A main.go file for running the example code produced by the lesson, which can be run via go run .
- A sql directory which contains a schema.sql file, a queries.sql file, and some files for generating templated queries via sqlc
By the final lesson, we'll have a minimal but fully-working workflow engine. Note that these lessons are not focused on developer ergonomics: we'll be building the bare minimum to understand the fundamentals, but won't implement the typical niceties you'd see in a client SDK.
Lessons:
- Prerequisites
- Simple task queue
- Limiting concurrent tasks
- Task queue improvements
- Durable event log
- Tracking non-determinism
- Durable tasks
This guide is a somewhat opinionated view on durable execution. Specifically, it implements:
- Durable execution entirely in Postgres.
- Two types of functions: durable tasks and regular tasks. These map directly to durable tasks and tasks in Hatchet, and are akin to Temporal workflows and activities.
- Regular tasks invokable as standalone tasks, meaning this guide implements a simple Postgres-backed task queue as well in the first few lessons.
- Multiple types of retries and replays, which are treated as distinct:
  - Retries will retry a durable task without resetting the event history (preserving the execution state of the function)
  - Replays will reset a durable task's execution history to start from scratch
  - Forking will reset a durable task's execution history at a given point in the execution history, effectively creating a "fork" of that task. This will be the subject of a future lesson.
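The distinction between retries, replays, and forks can be sketched as three operations on a task's event history. This is an illustration under assumptions, not the guide's schema: Event, History, Retry, Replay, and Fork are hypothetical names, the history is an in-memory slice, and Fork is assumed to keep the events before the chosen point and drop the rest.

```go
package main

import "fmt"

// Event is a hypothetical entry in a durable task's event history.
type Event struct {
	Step   int
	Result string
}

// History is an in-memory stand-in for the persisted event log of one task.
type History []Event

// Retry keeps the full history, so completed steps are skipped on the next run.
func Retry(h History) History {
	return append(History(nil), h...)
}

// Replay discards the history, so the task starts from scratch.
func Replay(h History) History {
	return History{}
}

// Fork keeps only the first n events (an assumed interpretation), so the new
// task re-executes from that point. n is clamped to the valid range, and the
// original history is left untouched.
func Fork(h History, n int) History {
	if n < 0 {
		n = 0
	}
	if n > len(h) {
		n = len(h)
	}
	return append(History(nil), h[:n]...)
}

func main() {
	h := History{
		{Step: 0, Result: "a"},
		{Step: 1, Result: "b"},
		{Step: 2, Result: "c"},
	}
	// Number of events each operation hands to the next execution.
	fmt.Println(len(Retry(h)), len(Replay(h)), len(Fork(h, 2)))
}
```

In a Postgres-backed engine these would likely be row-level operations on an event table (no change, delete, and copy-with-cutoff respectively), but that mapping is a guess rather than something this guide states.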
You can modify the schema, queries, and code in each lesson to experiment. To regenerate the SQL files in each directory, run the following:
go run github.com/sqlc-dev/sqlc/cmd/sqlc generate --file sql/sqlc.yaml
If you discover an error in the core logic of any lesson, please file a GitHub issue. We'd be happy to reward you with a baked good from a bakery near you (yes, we're serious). If a bakery isn't available, we'd be happy to send you a Hatchet tee or hat. If you understandably don't want more vendor swag, you'll have my eternal gratitude.
AI has not been used to write any prose in this guide. All mistakes and turns of phrase are my own. AI has been used to:
- Verify that each lesson of this guide is independently runnable and instructions are easy to follow
- Generate mermaid diagrams
If there's sufficient interest, I'd be happy to put together additional lessons, such as:
- Using Postgres LISTEN/NOTIFY to speed up processing significantly
- Durable sleep
- Branching and forking the durable event log