{"slug": "passing-dbs-through-continuations", "title": "Passing DBs Through Continuations", "summary": "A database developer implemented relational algebra operators using continuation-passing style (CPS) to fuse operations and eliminate intermediate materialization overhead. The approach, inspired by functional compiler techniques, allows operators like increment and double to compose directly without producing temporary lists or requiring the complex iterator models used by systems like DuckDB and Umbra. This CPS-based method offers a simpler path to achieving compiled-database-level performance without the engineering effort typically required.", "body_md": "*Dedicated to the Minnowbrook\nAnalytic Reasoning Seminar with special thanks to Kris Micinski and Michael Ballantyne*\n\nSuppose you want to write a database. You'd probably start by\nimplementing relational algebra operators: projection, filter, join,\netc. The easy way is to implement them as functions that take in tables\nand return tables, and assemble them into a larger expression. That was\nhow [Prela](https://prela-lang.org) worked in its first\nincarnation. The code was clean, but it was hella slow! Which was not\nsurprising, because every operator materialized every intermediate\nresult. The standard solution to this is the [iterator\nmodel](https://cs-people.bu.edu/mathan/reading-groups/papers-classics/volcano.pdf), where each operator implements an *Iterator* interface\nthat streams intermediate tables row by row instead of materializing\nthem. But implementing the iterator model naively still incurs overhead:\nevery call to `Iterator.next()` triggers a dynamic dispatch,\nwhich costs vtable lookups and destroys cache locality. There are two\nstandard remedies: [vectorization\nand compilation](https://www.vldb.org/pvldb/vol11/p2209-kersten.pdf). 
A vectorized database amortizes the overhead by\nimplementing `Iterator.next_batch()`, which returns a whole\nbatch of data that can be processed together; a compiled database, well,\ncompiles the incoming query directly to fast machine code that runs\nwithout any dynamic dispatch. Either approach takes a lot of very smart\npeople spending their entire working lives to build, and it's why systems\nlike DuckDB and Umbra exist. I'm moderately smart but don't have a lot\nof time, so I was looking for a shortcut. The [shortcut](https://dl.acm.org/doi/10.1145/165180.165214) I\nstumbled upon was so beautiful that I literally cried[1](#fn1)\nwhen I finally understood it, and I hope my explanation below will make\nyou cry too :' )\n\nTo keep things simple, let's suppose we're just dealing with lists of\nnumbers, and we want to do two very simple things to them:\n`inc` adds 1 to every number, and `dbl` doubles\nthem. That's pretty easy to write:[2](#fn2)\n\n```\ninc(xs) = [x + 1 for x in xs]\n\ndbl(xs) = [2 * x for x in xs]\n```\n\nNow, we can chain them together with `dbl(inc(xs))`, which\nwill do the two steps in sequence. Problem is, because each function takes\nin a list and returns a list, our program produces an\n*intermediate*, namely `inc(xs)`. This allocates a new\nlist only to be thrown away by the call to `dbl`. Things only\nget worse when we chain together multiple calls to `inc` and\n`dbl`. A more efficient implementation would *fuse*\ntogether the operations:\n\n```\ninc_n_dbl(xs) = [2 * (x + 1) for x in xs]\n```\n\nOf course, we can't write down every possible combination of operators like this. Is there a way to define each operator modularly, yet still have them compose into tightly fused operations automatically? Yes, if we use a bit of magic from functional compilers: continuation-passing style (CPS).\n\nThe key idea of CPS is to define operators that *do* things\ninstead of *making* things. 
`inc` and `dbl` as defined above each take in a list and *make* a list.\nInstead, the CPS version of each operator takes in a list and an\nadditional input `k`: this `k` is a function that\nthe caller passes in, specifying what it wants to do with each element\nafter the operator's work is done. `k` is called the\n*continuation*. Let's look at some code:\n\n```\nfunction inc(xs, k)\n  for x in xs\n    k(x + 1)\n  end\nend\n```\n\nNow suppose `k` is the `print` function; then\n`inc` as defined above will add 1 to each number, then print\nthe result. Note that nothing is returned, and `inc` only\ndoes its job (adding 1) then performs what it's told to (apply\n`k`). As an exercise, you can try writing\n`dbl` in CPS.\n\nBut currently each of `inc` and `dbl` still\ntakes in a list, and there's no obvious way to compose multiple\noperators. To do that, we replace `xs` with a \"child\"\noperator `op`:\n\n```\ninc(op, k) = op(x -> k(x + 1))\n\ndbl(op, k) = op(x -> k(x * 2))\n\nfunction scan(xs, k)\n  for x in xs\n    k(x)\n  end\nend\n```\n\nIntuitively, `inc` now trusts its child `op` to\ndo its job, namely, that `op` will apply the continuation it\nreceives to each item. So instead of iterating over `xs`,\n`inc` simply tacks the `+ 1` step onto the\ncontinuation and passes it to `op`. I've also defined a\n\"source\" operator `scan` that connects the input list to the\noperators. Let's see the code in action.\n\n1. We start with `inc(scan(xs), print)`.[3](#fn3)\n2. Expanding `inc`, this will call\n   `scan(xs, x -> print(x + 1))`\n3. Expanding `scan`, this gets us\n   `for x in xs; print(x + 1); end`\n\nSo chaining together `inc` and `scan` indeed\ndoes what we want! 
Now let's try a longer chain,\n`dbl(inc(scan(xs)), print)`:\n\n1. Expanding `dbl` gets us\n   `inc(scan(xs), x -> print(x * 2))`\n2. Expanding `inc` gets us\n   `scan(xs, x -> print((x + 1) * 2))`\n3. Expanding `scan` gets us\n   `for x in xs; print((x + 1) * 2); end`\n\nNotice how I used the word `expand`: if we annotate every\noperator definition with `@inline`, the compiler will\nactually unfold the code as we did above, and an operator chain gets\ncompiled down to a fused loop in the end! You can try expanding longer\nchains like `dbl(inc(dbl(inc(scan(xs)))), print)` to get some\npractice thinking about CPS. Julia also has handy tools like\n`@code_typed` that let you inspect the compiled code, or the\naptly named [Cthulhu.jl](https://github.com/JuliaDebug/Cthulhu.jl) that does\nthat interactively. In summary, the example shows that if we define\noperators modularly with CPS, inlining the definitions will\nautomatically produce tightly fused compiled code.\n\nNone of this is really new; it has been known to functional\nprogrammers for decades by the name of [deforestation](https://en.wikipedia.org/wiki/Deforestation_(computer_science)).\nBut when implemented in Prela, something incredible happens: *a clean\nCPS-style interpreter for Prela automagically recovers fast columnar\nexecution when compiled!*\n\nThe central design principle behind Prela is \"everything is a\n*binary* relation\". This means Prela maps cleanly both to a\nlogical Entity/Relationship data model and to columnar\nphysical storage. I won't go into details here, but encourage you to\nplay with the language to get a feel for that. Practically, this means\nwe fully normalize every wide table with m attributes to m binary\nrelations. 
For example, a table `movie` with columns\n`ID, year, title` becomes:\n\n- a relation (`movie`) over\n  `ID` (you can think of this as a unary table over\n  `ID`)\n- a relation from `ID` to `year`\n- a relation from `ID` to `title`\n\nBut now the issue is, even for a simple\n`SELECT * FROM movie` we need to join together 3 different\ntables! Whereas a column store would simply run:\n\n```\nfor i in 1:n\n  print(id_col[i], year_col[i], title_col[i])\nend\n```\n\nIn other words, the column store *co-iterates* the columns in\none pass to compute the query.\n\nLet's first look at how Prela used to run this query. The most\nimportant operator in Prela is the *relation composition* $\\rightarrow$, which generalizes function\ncomposition the same way relations generalize functions. In standard\nrelational algebra: $R \\rightarrow S = \\pi_{x, z}(R \\Join_{R.y = S.y} S)$ where $R$'s schema is over $x$ and $y$, and\n$S$'s schema is over $y$ and $z$. The second most important Prela operator is\nthe *product* $\\times$, which takes\ntwo binary relations and joins them: $R \\times S = R \\Join_{R.x = S.x} S$ where $R$'s schema is over $x$ and $y$, and $S$'s\nschema is over $x$ and $z$.\n\nSo `SELECT * FROM movie` is spelled $\\text{movie} \\rightarrow \\text{year} \\times \\text{title}$ in Prela. Now, this requires first joining\n`year` with `title`, whose result is joined with\n`movie`. We can make this a bit cheaper if we can assume the\nprimary key `ID`s are dense and contiguous, in which case we\ncan just store `year` as an array of integers and\n`title` as an array of strings; and for `ID`s, we\nonly need to store one single number `n`, which says how many\nIDs there are. 
But even with this, we still need to do the work to join\nthe tables.\n\nInstead, let's define *compose* and *product* in\nCPS:\n\n```\ncompose(lhs, rhs, k) = lhs((x, y) -> rhs(y, (z -> k(x, z))))\n\nproduct(lhs, rhs, x, k) = lhs(x, (y -> rhs(x, (z -> k((y, z))))))\n\nscan_id(n, k) = for i in 1:n; k(i, i); end\n\nprobe(col, i, k) = k(col[i])\n```\n\nLet's go over each line carefully. Semantically, `compose`\nis supposed to return a (binary) relation by composing the\n`lhs` and `rhs` relations. In CPS, its job is to\napply `k` to every pair in this composition. Its first\nargument, `lhs`, represents a binary relation and applies the\ngiven continuation to every pair. The `rhs` is a little\ndifferent: it represents a relation that supports *lookup*, i.e.,\n`rhs(key, k)` will look up the values associated with\n`key`, then apply `k` to each such value. Now\ngoing back to `compose`: we're saying that, for each\n`(x, y)` tuple in the LHS, we will look up `y`\nfrom the RHS, then for each matching `z`, we apply\n`k(x, z)`. In loops this will be:\n\n```\nfor (x, y) in lhs\n  for z in rhs[y]\n    k(x, z)\n  end\nend\n```\n\nWhich is exactly a hash join and a projection that throws away\n`y`.\n\nNext, `product` itself is a relation supporting lookup,\nand so are its arguments. To look up `x` in a product, we\nfirst look it up from the `lhs`, which gets us a bunch of\n`y`s. Then for each `y`, we now look up\n`x` from the `rhs`, getting a bunch of\n`z`s. Finally, for each `(y, z)` pair, we apply\nthe continuation `k((y, z))`. In loops:\n\n```\nfor y in lhs[x]\n  for z in rhs[x]\n    k((y, z))\n  end\nend\n```\n\nThis is what will happen if you look up `x`\nin $R \\Join_{R.x = S.x} S$.\n\nFinally, we have the \"source\" operators `scan_id` and\n`probe`. 
`scan_id` is `scan` but\nspecialized for a dense ID relation where we only store `n`:\nit simply increments `i` from 1 to `n` and applies\n`k` to `(i, i)`. `probe` represents an\ninput relation that supports looking up a primary key `i` and\nwhich is backed by a dense vector, so looking up an ID `i`\nsimply indexes `col[i]` and applies `k` to the\nvalue.\n\nWe're now ready to put everything together and pull the trigger: the\nPrela query $\\text{movie} \\rightarrow \\text{year} \\times \\text{title}$ desugars to\n`compose(scan_id(n), product(probe(year), probe(title)))`,\nwhere `n` is the number of movies. Here are the definitions\nagain for reference:\n\n```\ncompose(lhs, rhs, k) = lhs((x, y) -> rhs(y, (z -> k(x, z))))\nproduct(lhs, rhs, x, k) = lhs(x, (y -> rhs(x, (z -> k((y, z))))))\nscan_id(n, k) = for i in 1:n; k(i, i); end\nprobe(col, i, k) = k(col[i])\n```\n\nExpand `compose`:\n\n```\nscan_id(n, (x, y) -> product(probe(year), probe(title), y, (z -> k(x, z))))\n```\n\nExpand `scan_id`:\n\n```\nfor i in 1:n\n  product(probe(year), probe(title), i, (z -> k(i, z)))\nend\n```\n\nExpand `product`:\n\n```\nfor i in 1:n\n  probe(year, i, (y -> probe(title, i, (z -> k(i, (y, z))))))\nend\n```\n\nExpand `probe`:\n\n```\nfor i in 1:n\n  k(i, (year[i], title[i]))\nend\n```\n\nAnd taking `k = print`, we finally have:\n\n```\nfor i in 1:n\n  print(i, (year[i], title[i]))\nend\n```\n\nSpectacular!!\n\nTo keep the examples small, I've made several simplifications. The\nactual Prela implementation defines two methods, `drive` and\n`probe`, for each operator, which fire depending on how the\noperator is accessed: scanned or probed. But that's pretty much it, and\nthe complete source is around 1000 lines of Julia code in a single file,\nsupporting select, project, join, groupby, aggregation, CTEs, UDFs, and\nwith performance matching DuckDB on TPC-H and the Join Order Benchmark. 
I\nshould note that the performance numbers are riding on lots of\nassumptions though, the strongest ones being:\n\nNevertheless, the CPS approach cleanly separates the responsibility of the data engine (which is to map queries to efficient code in some target language) from the responsibility of the compiler (which is to produce fast code quickly), and we see there's a lot of room for the compiler to improve. The assumption that primary keys are dense can also be relaxed with the help of B-trees, bitmap filters, and other data structures.\n\nBut perhaps the biggest strength of the CPS style is that it makes\nPrela *extensible*, as users can write their own operators in a\nnatural way with a few lines of code, with the assurance that they will be\ncompiled and fused with the rest of the query for efficient\nexecution.\n\n1. Long before AI psychosis, there was FP psychosis,\nclinically defined as the intense psychological response to\nunderstanding functional programming concepts like recursion, higher-order\nfunctions, monads, or in this case, continuation-passing style.[↩︎](#fnref1)\n\n2. All code in this post is in Julia.[↩︎](#fnref2)\n\n3. `scan(xs)` stands for\n`k -> scan(xs, k)`, i.e., it is the [curried](https://en.wikipedia.org/wiki/Currying) application of\n`scan` to `xs`. 
Similarly for\n`inc(scan(xs))` below, which curries `inc` with\n`scan(xs)`.[↩︎](#fnref3)", "url": "https://wpnews.pro/news/passing-dbs-through-continuations", "canonical_source": "https://remy.wang/blog/cps.html", "published_at": "2026-06-07 01:17:40+00:00", "updated_at": "2026-06-07 01:46:49.631028+00:00", "lang": "en", "topics": ["ai-research"], "entities": ["Kris Micinski", "Michael Ballantyne", "Prela"], "alternates": {"html": "https://wpnews.pro/news/passing-dbs-through-continuations", "markdown": "https://wpnews.pro/news/passing-dbs-through-continuations.md", "text": "https://wpnews.pro/news/passing-dbs-through-continuations.txt", "jsonld": "https://wpnews.pro/news/passing-dbs-through-continuations.jsonld"}}