Meeting Data (and Analytics) Engineers Where They Are: Introducing the dbt Adapter for Confluent Cloud


dbt is the tool data engineers most commonly use to define SQL transformations (as models), write tests, generate documentation, and deploy through CI/CD, and now it's available with Confluent Cloud too! The magic of dbt is that it brings engineering rigor to modern data work, regardless of the underlying compute engine: Snowflake, BigQuery, Databricks, Redshift, or Confluent.

The trend today, driven largely by AI and by customer demand for fresh data, is to move processing closer to the source. For many organizations the source is a stream, so batch transformations need to become streaming jobs that deliver fresher data and reduce pipeline latency. As a knock-on effect, this shift tends to reduce architectural complexity. This pattern, which we call "shifting left," positions stream processing as the primary transformation layer rather than purely an ingestion mechanism.

Confluent Cloud for Apache Flink provides the streaming engine with a declarative SQL API and Table API. The engine and APIs alone aren't enough. Data engineers also need the developer experience around them: a way to manage SQL as code, test it before it reaches production, and deploy it through the same review process they use for everything else.

When we set out to build the dbt-confluent adapter, we focused on three things data engineers told us matter most: a familiar interface for managing transformations, reliable and deterministic testing, and seamless CI/CD integration.

We built the adapter from the ground up to suit production workloads on Confluent Cloud. The dbt-confluent adapter is backed by confluent-sql, an open source, purpose-built, DB-API v2 compliant Python driver that communicates directly with the Confluent Cloud REST API. This eliminates the need for a proxy or middleware layer. Statement lifecycle management, result pagination, type conversion, and automatic retry on transient infrastructure issues are built into the driver. The result is an adapter that behaves the way data engineers expect a dbt adapter to behave.
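To make "DB-API v2 compliant" concrete, here is a minimal sketch of the call shape that any PEP 249 driver exposes: a connection, a cursor, execute, and fetch. It uses Python's built-in sqlite3 module purely as a stand-in, because the connection parameters of confluent-sql are not shown in this article. The fetch_rows helper and the orders table are hypothetical, and the parameter placeholder style (here `?`) varies from driver to driver.

```python
import sqlite3  # stand-in only: any PEP 249 (DB-API v2) driver exposes the same calls


def fetch_rows(conn, sql, params=()):
    """Run one statement through the standard DB-API v2 cursor interface."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        return cur.fetchall()
    finally:
        cur.close()


# Hypothetical table and data, used only to exercise the interface.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.execute("INSERT INTO orders VALUES (?, ?)", (1, 29.99))
rows = fetch_rows(conn, "SELECT order_id, amount FROM orders WHERE order_id = ?", (1,))
conn.close()
```

Because dbt adapters generally talk to their engine through this kind of interface, a compliant driver lets the adapter reuse familiar connection and cursor handling rather than a custom protocol.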

Most dbt adapters are built for batch: you run a query, it produces a table, and you're done. Streaming is fundamentally different in that queries run continuously and results keep updating. The adapter needed new materializations to account for that:

view — Standard Flink SQL views (default).

streaming_table — Creates a table with a continuous INSERT INTO...SELECT statement, providing explicit control over changelog semantics.

streaming_source — Creates a Kafka topic-backed source table with schema defined in the model SQL. If you are using connectors, this would be the materialization you use to define the schema of the model for that topic.

Data engineers configure these materializations through dbt's standard config() block. Under the hood, the adapter manages the distinct execution modes, so the complexity of each materialization stays behind the interface: it's only a config option. This also separates the SQL logic from table creation, which is what allows dbt to unit test the models.

Testing has historically been one of the biggest challenges with streaming SQL. When a query runs continuously and results are unbounded, how do you write a test with a deterministic pass/fail?

The adapter solves this by automatically switching to bounded execution mode (snapshot queries) for tests. Rather than setting a timeout and hoping enough data arrives before it expires, which can produce false positives when no data means a silent pass, the adapter runs tests against finite, deterministic result sets. The outcome is reliable and repeatable, and your CI/CD pipeline won't hang waiting on an unbounded stream. This functionality is available via native dbt unit tests: you provide mock input data, run your model logic, and verify that the output matches expectations. For example, a unit test for the stg_orders model can feed in a single mock record (an order for $29.99) and instruct dbt to assert that the exact same row is processed and emitted without data loss or unwanted mutation.
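The given/expect idea behind such a unit test can be sketched in plain Python. This is an illustration of the pattern only, not dbt's or the adapter's implementation: the stg_orders function, the run_unit_test helper, and the field names are all hypothetical. The point is that the input is finite, so the comparison always terminates, and an empty input fails instead of passing silently.

```python
from decimal import Decimal


def stg_orders(rows):
    """Hypothetical staging logic: pass each order through unchanged."""
    return [dict(row) for row in rows]


def run_unit_test(model, given, expect):
    """Run the model on a finite input and compare against the expected rows."""
    actual = model(given)
    return actual == expect, actual


# One mock record in, the exact same record expected out.
given = [{"order_id": 1, "amount": Decimal("29.99")}]
expect = [{"order_id": 1, "amount": Decimal("29.99")}]
passed, actual = run_unit_test(stg_orders, given, expect)

# With no input data the test fails rather than passing vacuously.
empty_passed, _ = run_unit_test(stg_orders, [], expect)
```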

Combined with dbt's standard data quality tests (not_null, unique, custom assertions), teams get a robust testing framework for streaming pipelines.

Here's what a dbt project on Confluent Cloud looks like in practice. Start by configuring your connection to Confluent Cloud in profiles.yml.

A staging model references a Kafka topic, using a watermark to define event-time semantics for downstream windowed operations.
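For readers new to watermarks, the following is a simplified Python model of a bounded-out-of-orderness watermark: it trails the largest event time seen so far by a fixed delay, and an event whose timestamp is at or behind the current watermark counts as late. The class name, the five-second delay, and the sample timestamps are assumptions for illustration. The sketch ignores details of Flink's actual watermark generation, such as periodic emission.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BoundedOutOfOrdernessWatermark:
    """Tracks a watermark that trails the largest event time seen by a fixed delay."""

    max_delay_s: int
    max_event_time_s: Optional[int] = None

    def current(self) -> Optional[int]:
        """Return the current watermark, or None before any event is observed."""
        if self.max_event_time_s is None:
            return None
        return self.max_event_time_s - self.max_delay_s

    def observe(self, event_time_s: int) -> bool:
        """Record an event; return True if it is at or behind the current watermark."""
        watermark = self.current()
        late = watermark is not None and event_time_s <= watermark
        if self.max_event_time_s is None or event_time_s > self.max_event_time_s:
            self.max_event_time_s = event_time_s
        return late


wm = BoundedOutOfOrdernessWatermark(max_delay_s=5)
# Event times in seconds; the third event arrives out of order.
late_flags = [wm.observe(t) for t in (100, 107, 101, 110)]
```

A downstream window can treat the watermark as the signal that no more on-time events are expected for a given time range, which is what lets it close and emit a result.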

Next, we build a downstream revenue model to process this stream. This model (fct_revenue.sql) uses a Flink tumbling window to group the incoming orders into one-minute buckets, continuously calculating the total revenue and order volume.
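The arithmetic of a one-minute tumbling window can be sketched in plain Python. This is not the fct_revenue.sql model itself, and it runs over a finite list rather than continuously as Flink does. The function name, the output field names, and the sample orders are assumptions. It shows only how each event lands in exactly one fixed, non-overlapping bucket.

```python
from collections import defaultdict
from decimal import Decimal

WINDOW_S = 60  # one-minute tumbling window, in seconds


def tumbling_revenue(orders):
    """Group (event_time_s, amount) pairs into fixed, non-overlapping one-minute buckets."""
    buckets = defaultdict(lambda: {"total_revenue": Decimal("0"), "order_count": 0})
    for event_time_s, amount in orders:
        # Align the event time down to the start of its window.
        window_start = event_time_s - (event_time_s % WINDOW_S)
        buckets[window_start]["total_revenue"] += amount
        buckets[window_start]["order_count"] += 1
    return [
        {"window_start": start, "window_end": start + WINDOW_S, **agg}
        for start, agg in sorted(buckets.items())
    ]


# Hypothetical orders: two fall in the first minute, one in the second.
orders = [(5, Decimal("29.99")), (59, Decimal("10.01")), (60, Decimal("5.00"))]
result = tumbling_revenue(orders)
```

Note that an event at second 60 belongs to the second window: each window includes its start and excludes its end.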

Run dbt run and both statements deploy to a Flink compute pool on Confluent Cloud.

The adapter submits each statement, confirms it's running on the compute pool, and returns control to the CLI; your terminal doesn't block waiting on an infinite stream. The revenue model then runs continuously on Flink: as new orders arrive on the Kafka topic, the tumbling window aggregates them into one-minute revenue buckets. This is a live, always-updating pipeline, deployed with the same command a data engineer would use to materialize a batch table on any data warehouse.

The integration doesn't stop at deployment. Because you are using dbt, the rest of your standard workflow naturally extends to your streaming data:

Validation: Running dbt test validates data quality against the live, running pipeline.

Documentation: Running dbt docs generate produces data lineage and documentation across all your models.

Automation: The entire lifecycle of model changes, test runs, and deployments integrates seamlessly with GitHub Actions or any other CI/CD system.

Ultimately, your streaming pipelines move through the exact same pull request, review, and deploy process as any standard software code.

The adapter is available today for teams already on Confluent Cloud:

Install: pip install dbt-confluent

Initialize: dbt init my_project and select confluent

Configure: Set your Confluent Cloud credentials in profiles.yml

Deploy: dbt run

Full documentation is in the Confluent Cloud docs. Get in touch to let us know how you get on and discuss any features you’d like to see!

