Interesting links - May 2026

Apache Kafka 4.3.0 has been released with new features detailed in the release announcement and a video from Sandon Jacobs. Jack Vanlightly introduced Dimster, a new performance benchmarking tool for Apache Kafka, and published several related blog posts. The Confluent Parallel Consumer library was marked as no longer maintained, prompting a LinkedIn discussion and a fork from original author Tony Stubbs.

Welcome to May’s Interesting Links! This month saw the Current conference in London with the usual 5k run https://rmoff.net/categories/5k-run/walk/ , lots of familiar faces and friendly conversations, and plenty of excellent breakout sessions too. It seems live-tweeting conferences isn’t a thing any more, with only Thomas Cooper and me seeming to post anything, but if you want you can review the hashtag feed on Bluesky https://bsky.app/search?q=%23current26 for some highlights of the conference. I got my first Hacker News front page hit with AI Slop is Killing Online Communities https://rmoff.net/2026/05/06/ai-slop-is-killing-online-communities/ (51k views and climbing), and a nice little halo boost for another rant from earlier this year, AI will fsck you up if you’re not on board https://rmoff.net/2026/03/06/ai-will-fuck-you-up-if-youre-not-on-board/ . Oh, and I got involved in some thought leadering over on LinkedIn with my shitposting about fried breakfasts https://www.linkedin.com/posts/robinmoffatt_current26-current26-leadership-activity-7463189069041180672-CFR4?rcm=ACoAAAC2ckIBstmoM1I4uBi9Djg8B7e0JaBvqzQ , which a non-zero number of people thought was serious.
🔥 Apache Kafka 4.3.0 has been released. Check out the release announcement https://kafka.apache.org/blog/2026/05/22/apache-kafka-4.3.0-release-announcement/ , as well as a video from Sandon Jacobs https://www.youtube.com/watch?v=lePgrOiX11U covering the new features.
🔥 After a few quiet months on his blog, Jack Vanlightly is back with a bang. He’s written a new tool, Dimster, a performance benchmarking tool for Apache Kafka https://jack-vanlightly.com/blog/2026/5/20/introducing-dimster-a-performance-benchmarking-tool-for-apache-kafka , and has written several more blog posts off the back of it:
🔥 I had the absolute pleasure to watch Victor Rentea present at Devoxx UK earlier this month.
This guy redefines what it means to be an entertaining, energetic, enthusiastic, and educational presenter. Whilst his specific talk, "Event-Driven Architecture Pitfalls", isn’t online yet, you can find the slides here https://victorrentea.ro/eda-pitfalls , and a recording from Devoxx last year https://www.youtube.com/watch?v=0SnuppAHOlQ of a similar talk.
The Parallel Consumer library from Confluent has been marked as no longer maintained, prompting a discussion https://www.linkedin.com/posts/charles-larrieu-casias_nooooo-the-confluent-parallel-consumer-library-share-7465111475133685760-oBvc/ on LinkedIn of alternatives and the concept itself, as well as a fork https://github.com/astubbs/parallel-consumer from one of the original authors, Tony Stubbs.
Mariano Gonzalez - Benchmarking KPipe against the parallel-Kafka libraries you would actually pick https://mariano-gonzalez.com/posts/post-5/ .
Michel Tricot - Event-Driven vs. Polling Architectures for Agent Triggers https://agentblueprint.substack.com/p/event-driven-vs-polling-architectures .
An interesting idea from Florent Ramiere and colleagues: what if you specify a set of interesting additions to Kafka’s functionality, with strict rules around the implementation, and then have LLMs take their best shot at it? You can see the ideas and results in the branches of this repository https://github.com/conduktor/current-london-2026 .
Viquar Khan - Architecting Cloud-Native Kafka: From Tiered Storage Towards a Diskless Future https://www.infoq.com/articles/architecting-cloud-native-kafka/ .
Elad Eldor - Kafka’s Real Compression Problem Is Batch Depth https://dev.to/eeldor/kafkas-real-compression-problem-is-batch-depth-515k , and Kafka Compute Is Cheap. Network Is Not https://dev.to/eeldor/kafka-compute-is-cheap-network-is-not-2bdh .
Kroxylicious version 0.21.0 has been released https://kroxylicious.io/blog/kroxylicious-proxy/releases/2026/05/15/release-0_21_0.html , and Sam Barker from the Kroxylicious project has been running some benchmarks to look at the impact that the proxy has https://kroxylicious.io/benchmarking/performance/2026/05/28/benchmarking-the-proxy.html , both pass-through and when encrypting records.
Aiven’s Juha Mynttinen explores why they think Apache Kafka Deserves Topic Types https://aiven.io/blog/kafka-deserves-topic-types .
Details of a Coinbase outage https://x.com/rwitoff/status/2052863502424133949 involving their Kafka provider which, based on blogs from 2022 https://www.coinbase.com/en-gb/blog/kafka-infrastructure-renovation and 2023 https://aws.amazon.com/blogs/aws-cloud-financial-management/how-coinbase-built-a-cloud-center-of-excellence-to-optimize-their-cloud-costs-on-aws/ , is MSK.
Andy Muir - Kafka Schema Registry doesn’t guarantee compatibility and what actually does https://muirandy.wordpress.com/2026/04/30/kafka-schema-registry-doesnt-guarantee-compatibility-and-what-actually-does/ .
Bruno Cadonna - OpenData Buffer: HA pipelines without Kafka https://www.opendata.dev/blog/buffer-ha-pipelines-without-kafka .
Jeffrey J. Jennings - Kafka’s quiet observability superpower - Kafka Interceptors https://medium.com/@jeffrey.j.jennings/kafkas-quiet-observability-superpower-kafka-interceptors-aca88c33867e .
Grzegorz Kocur - Do Kafka metrics have to be so difficult? https://monedula.dev/blog/kafka-metrics-opentelemetry-otlp-monedula-metrics-reporter/ .
Flink’s Stateful Functions (StateFun) is not maintained by the project any more, so kzmlabs' Oleksandr Kazimirov forked it https://kzmlabs.github.io/flink-statefun/articles/forking-statefun/ to continue developing it.
Olena Vodzianova - How Chandy-Lamport Inspired Apache Flink Checkpointing https://medium.com/@wizzywooz/how-chandy-lamport-inspired-apache-flink-checkpointing-256db84084ce .
🔥 Two good posts from the team at Grab:
Details of how Smartsheet use Flink https://aws.amazon.com/blogs/big-data/how-smartsheet-built-real-time-dynamic-filtering-on-apache-flink-reducing-40k-month-in-messaging-costs/ for optimising both costs and performance by filtering messages.
flink-state-explorer https://github.com/Eric-D/flink-state-explorer is, as the name suggests, a tool for exploring Apache Flink 1.20 canonical savepoints interactively.
A hands-on GitHub repo from Patrick Neff showing off a stream processing pipeline using dbt and Flink on Confluent Cloud https://github.com/pneff93/dbt-cc-stream-processing .
Shuva Jyoti Kar - Designing stateful serverless Agentic Loop https://medium.com/google-cloud/designing-stateful-serverless-agentic-loop-bb73a63562b4 with Kafka and Flink.
A couple of security issues for Flink to be aware of if you’re running it:
🔥 Tristan Handy - BI’s Second Unbundling https://roundup.getdbt.com/p/bis-second-unbundling .
A good writeup from Cloudflare’s James Morrison and Christian Endres about tracing performance issues in ClickHouse.
Several posts from StarRocks covering new features in 4.1:
Two BigQuery optimisation/cost-saving articles, from Christophe Oudar https://medium.com/teads-engineering/how-we-cut-bigquery-slot-usage-by-90-on-one-of-our-most-resource-hungry-service-after-an-outage-c491af09e77e and Azeem Jalageri https://medium.com/@azeemjalageri/23fc5efc91a5?sk=2d8855c53c8d878b6afa7a839b30ef09 .
Daniel Beach - Spark is Dead. Long Live DuckDB https://www.confessionsofadataguy.com/spark-is-dead-long-live-duckdb/ .
Alibaba added DuckDB into their fork of MySQL, AliSQL, providing storage and query for OLAP workloads https://www.alibabacloud.com/blog/when-mysql-meets-the-columnar-storage-engine-duckdb-in-the-ai-era_603117 .
Simon Aubury - I don’t need an untrusted LLM to tell me I’m spending too much on coffee https://simonaubury.substack.com/p/i-dont-need-an-untrusted-llm-to-tell .
The DuckDB team announced Quack: The DuckDB Client-Server Protocol https://duckdb.org/2026/05/12/quack-remote-protocol.html .
Ben Fleis explores DuckDB’s support for Delta and Unity Catalog https://duckdb.org/2026/05/07/delta-uc-updates .
🔥 I’ve been a fan of Mark Litwintschik’s https://tech.marksblogg.com/ no-nonsense blog posts showing current technologies and exploring interesting data sets for many years. In this one he uses DuckDB to analyse details of 10K+ Satellites in Space https://tech.marksblogg.com/gcat-satellite-database.html .
🔥 Nikola Ilic - Data Modeling for Analytics Engineers: The Complete Primer https://towardsdatascience.com/data-modeling-for-analytics-engineers-the-complete-primer/ .
Airtable’s Matthew Jin details how they optimised their costs https://medium.com/airtable-eng/how-we-reduced-archive-storage-costs-by-100x-and-saved-millions-21754b5a6c8e by moving PBs of cold data from MySQL to S3, and wrote a query engine using DataFusion to serve it.
Brian Brunner and his colleagues at Cloudflare published details of how they built Cloudflare’s data platform and an AI agent on top of it https://blog.cloudflare.com/our-unified-data-platform/ .
Caesario Kisty - A Practical Implementation of Medallion Architecture Using ClickHouse https://blog.dataengineerthings.org/a-practical-implementation-of-medallion-architecture-using-clickhouse-484ec6dd960c .
Xinran Waibel - Data Engineering Open Forum 2026 Recap https://blog.dataengineerthings.org/data-engineering-open-forum-2026-recap-b0154b770315 .
🔥 After doing a bit of fairly naïve experimentation with Claude and dbt earlier this year https://rmoff.net/2026/03/11/claude-code-isnt-going-to-replace-data-engineers-yet/ , I was very interested to read Jason Ganz’s article What data agent benchmarks do and don’t tell us https://roundup.getdbt.com/p/what-data-agent-benchmarks-do-and , and hope to try out the referenced ADE-bench https://github.com/dbt-labs/ade-bench ("a framework for evaluating AI agents on data analyst tasks") soon.
Whilst Thijs Nieuwdorp’s article about Handling Schema Issues in Polars https://pola.rs/posts/schema-evolution/ is specific to Polars, it’s a useful reference for the kind of schema changes one will want to make in a data pipeline, and the challenges they can cause depending on how (or if) your implementation technology of choice supports them.
🔥 Pedram Navid - We need to talk about dbt https://databased.pedramnavid.com/p/we-need-to-talk-about-dbt .
A summary/re-write by Alex Xu (a.k.a. ByteByteGo) looking at How Figma Upgraded Data Pipeline from Multi-Day Latency to Real-Time https://blog.bytebytego.com/p/how-figma-upgraded-data-pipeline , based on the original blog post by Yichao Zhao https://www.figma.com/blog/figmas-data-pipeline-upgrade/ (from last year).
Netflix - The Evolution of Cassandra Data Movement at Netflix https://netflixtechblog.medium.com/the-evolution-of-cassandra-data-movement-at-netflix-6e13329c80a1 .
Alexey Makhotkin has a two-parter on Data Quality (part 1 https://minimalmodeling.substack.com/p/my-take-on-data-quality / part 2 https://minimalmodeling.substack.com/p/my-take-on-data-quality-tier-2 ).
Chris Hillman - Don’t Go Dark: Visibility Is a Data Engineering Skill https://ghostinthedata.info/posts/2026/2026-05-23-dont-go-dark/ .
After the excellent survey and results that Joe Reis published about data engineering earlier this year, he’s now following up with a survey on The Organizational State of Data Engineering https://joereis.substack.com/p/the-organizational-state-of-data (open for submissions until Sunday, June 21).
Mahendran Vasagam - From SSH to REST: A Security-Driven Modernization of Slack’s EMR Data Pipelines https://slack.engineering/from-ssh-to-rest-a-security-driven-modernization-of-slacks-emr-data-pipelines/ .
Dana Rabba - Building Self-Healing Data Pipelines at Halodoc https://blogs.halodoc.io/building-self-healing-data-pipelines-at-halodoc/ .
rocky https://github.com/rocky-data/rocky is a dbt alternative that looks quite interesting. It describes itself as "the trust plane for your warehouse", and targets Databricks users primarily, with Snowflake and BigQuery to follow. There’s a built-in playground feature that’s worth poking around to get a feel for it.
Marc Bowes describes how Aurora DSQL’s CDC feature works https://marc-bowes.com/dsql-coupler.html . If you want more, there are further details of its use from Vijay Karumajji https://aws.amazon.com/blogs/database/getting-started-with-change-data-capture-in-amazon-aurora-dsql/ .
🔥 George Zefko - Building a CDC pipeline, part 2: Debezium Internals https://georgioszefkilis.substack.com/p/building-a-cdc-pipeline-part-2-debezium (I featured part 1 last month; if you missed it, it’s here https://georgioszefkilis.substack.com/p/building-a-cdc-pipeline-part-1-postgresql ).
A couple of good posts from the Debezium team:
Apache Iceberg 1.11 https://iceberg.apache.org/releases/#1110-release has been released (I even got some small contributions https://github.com/apache/iceberg/releases/tag/apache-iceberg-1.11.0 merged 🎉). There are more details of the release in these blog posts:
🔥 The talks from Iceberg Summit 2026 are now online https://www.youtube.com/watch?v=4Bg64WnkfgE&list=PLkifVhhWtccxSA6VskdKdLnIwCJevOqFL .
Alex Merced begins an epic 15-part series about Apache Iceberg by looking at What Are Table Formats and Why Were They Needed? https://medium.com/data-engineering-with-dremio/what-are-table-formats-and-why-were-they-needed-7d5ca69546a1 .
Yelp’s Nick Del Nano looks at How Partition Access Visualizations Reduced Data Lake S3 Cost by 33% https://engineeringblog.yelp.com/2026/05/partition-access-visualizations.html .
Honest words from Fresha’s Samuel Valente as he looks at the use of Iceberg with Snowflake in practice: Snowflake with Iceberg: Lakekeeper, dbt, and some Sparks Flying https://medium.com/fresha-data-engineering/snowflake-with-iceberg-lakekeeper-dbt-and-some-sparks-flying-a6231fcb35a7 .
Daniel Guzman-Burgos describes bintrail, which provides time-travel SQL for MySQL https://blog.dbtrail.com/time-travel-sql-for-mysql-finally/ . Renato Losio has a summary on InfoQ https://www.infoq.com/news/2026/05/bintrail-mysql-timetravel/ .
Teiva Harsanyi - How Linux 7.0 Broke PostgreSQL: The Preemption Regression Explained https://read.thecoder.cafe/p/linux-broke-postgresql .
Radim Marek covers the ORDER BY jungle https://boringsql.com/posts/order-by-jungle/ , as well as PostgreSQL’s TOAST https://boringsql.com/posts/postgresql-toast/ . Markus Winand also looks at ORDER BY and the evolution of support in different RDBMS https://modern-sql.com/blog/2026-05/order-by-history .
🔥 James Blackwood-Sewell writes up details of the benchmarking platform they built https://www.paradedb.com/blog/what-we-think-about-when-we-think-about-benchmarking , whilst Ben Dicken muses on benchmarking at PlanetScale https://planetscale.com/blog/on-benchmarking too.
An opinionated, and fairly concise, set of recommendations for the use of different Open Standards for Modern Data Architecture https://www.data-landscape.com/ .
LinkedIn’s Pratikmohan Srivastav writes about a performance troubleshooting experience - The 58-Million-Key Freeze: What a HashMap Resize Taught Us About Memory Allocation at Scale https://www.linkedin.com/blog/engineering/feed/the-58-million-key-freeze-what-a-hashmap-resize-taught-us-about-memory-allocation-at-scale .
Sem Sinchenko - Same buffers, same instructions, same hardware. Where Is the JVM Tax? https://semyonsinchenko.github.io/ssinchenko/post/jvm-tax/ .
Gergely Orosz shares some excerpts https://newsletter.pragmaticengineer.com/p/designing-data-intensive-applications-book-excerpt from Martin Kleppmann’s second edition of Designing Data-Intensive Applications.
I warned you previously… this AI stuff is here to stay, and it’d be short-sighted to think otherwise.
🔥 Ben Evans - AI Eats the World https://www.ben-evans.com/presentations .
🔥 TikTok is my guilty pleasure, but instead of dogs misbehaving in comical ways, here’s an excellent piece to camera from Scott Hanselman reflecting on the impact of AI in our lives as software developers https://vm.tiktok.com/ZNRW27cR2/ . Pro-tip: yt-dlp works great with TikTok, so you don’t have to actually open the page if you still wanna view the video.
Nate Berkopec - Thoughts on LLMs in 2026 https://www.nateberkopec.com/blog/thoughts-on-llms-in-2026/ .
Julien Hurault - Time for AI Coding to Turn Boring? https://juhache.substack.com/p/time-for-ai-coding-to-turn-boring .
Kate Holterhoff - AI Slop & the Vulnerability Treadmill https://redmonk.com/kholterhoff/2026/05/05/ai-slop-vulnerability-treadmill/ .
Paulo Arruda - What I Learned Building Multi-Agent Systems from Scratch at Shopify https://www.infoq.com/presentations/multi-agent-system-lessons/ .
🔥 Loris Cro - Contributor Poker and Zig’s AI Ban https://kristoff.it/blog/contributor-poker-and-ai/ .
Lucia Cerchie - Why You Need More Than a SKILL.md https://luciacerchie.dev/articles/why-you-need-more-than-a-skill-md/ .
Nothing to do with data, but stuff that I’ve found interesting or has made me smile.
🔥 An oldie (2008) but a goodie: Jeff Atwood - Don’t Go Dark https://blog.codinghorror.com/dont-go-dark/ .
🔥 Lara Hogan - Be a thermostat, not a thermometer https://larahogan.me/blog/be-a-thermostat-not-a-thermometer/ .
As an IC, I endorse this pitch from Elena Verna ;-) - IC work is the new career flex https://www.elenaverna.com/p/ic-work-is-the-new-career-flex .
🔥 Ana Rodrigues - It’s 2026 and women are still asked to teach others to think a little bit and not be a prick https://ohhelloana.blog/woman-in-tech/ .
Leyla Kazim - I did no work for a year and no one noticed https://leylakazim.substack.com/p/i-did-no-work-for-a-year .
🔥 Kevin Powell wrote this article https://www.kevinpowell.co/article/tell-someone-you-appreciate-them/ which resonated hard for me. I think it’s a boiling-frog situation; if I think about my motivation to write today, vs a year ago, vs five years ago, it’s definitely very different. AI noise drowns things out, kinda like SEO marketing 'content factories' did but on a bigger and more destructive scale, so as an author is it even worth writing original material? Is anyone even gonna read it?
An excellent writeup from Vicki Boykis about Tagging my blog posts with BERTopic and LLMs https://vickiboykis.com/2026/05/18/tagging-my-blog-posts-with-bertopic-and-llms/ - definitely need to try this.
Mike McQuaid - Open Source Resistance: Keep OSS alive on company time https://ossresistance.com/ .
Very cool idea for conference badges from Shy Ruparel, with an excellent writeup https://temporal.io/blog/badges-for-replay-and-i-havent-slept-since-december to boot.
I couldn’t think of a good subheading for these:
Some fun nostalgia, with screenshots of various old OSes at typewritten.org http://www.typewritten.org/Media/ and The Virtual OS Museum https://virtualosmuseum.org/more-screenshots/ (the latter even has, IIUC, runnable VMs for download https://virtualosmuseum.org/downloads/ ).
Dan Carlin is probably my favourite podcaster, and as well as his well-known Hardcore History https://www.dancarlin.com/hardcore-history-series/ he has occasional thoughts on more current affairs, including this one: The Water in Which We Swim https://pca.st/episode/5df3b4a2-666d-4a53-9741-6d46fc85d188 .
The Middle Class Museum https://www.ideagames.fun/middle-class-museum (a memorial to affordable living).
Fast16: The Cyberweapon That Predates Stuxnet by Five Years https://hackingpassion.com/fast16-pre-stuxnet-cyber-sabotage/ .