Hands-On with Apache Iceberg Using Dremio Cloud

This article, part 14 of a 15-part Apache Iceberg Masterclass, provides a practical walkthrough of using Iceberg with Dremio Cloud, covering table creation, data ingestion, and optimization. It highlights Dremio’s features like automatic metadata management via a Polaris-based catalog, transparent caching with Columnar Cloud Cache (C3), and a semantic layer for governance and AI-powered analytics. The article also details performance acceleration through precomputed "Reflections" and column-level access control for secure data sharing.

This is Part 14 of a 15-part Apache Iceberg Masterclass https://iceberglakehouse.com/posts/ . Part 13 https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-13/ covered streaming approaches. This article is a practical walkthrough of working with Iceberg on Dremio Cloud https://www.dremio.com/get-started/ , covering table creation, data ingestion, optimization, semantic layer construction, and AI-powered analytics. Table of Contents What Are Table Formats and Why Were They Needed? https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-01/ The Metadata Structure of Current Table Formats https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-02/ Performance and Apache Iceberg's Metadata https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-03/ Technical Deep Dive on Partition Evolution https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-04/ Technical Deep Dive on Hidden Partitioning https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-05/ Writing to an Apache Iceberg Table https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-06/ What Are Lakehouse Catalogs? https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-07/ Embedded Catalogs: S3 Tables and MinIO AI Stor https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-08/ How Iceberg Table Storage Degrades Over Time https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-09/ Maintaining Apache Iceberg Tables https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-10/ Apache Iceberg Metadata Tables https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-11/ Using Iceberg with Python and MPP Engines https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-12/ Streaming Data into Apache Iceberg Tables https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-13/ Hands-On with Iceberg Using Dremio Cloud https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-14/ Migrating to Apache Iceberg https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-15/ Getting Started Step 1: Sign Up and Connect Storage - Create a Dremio Cloud account https://www.dremio.com/get-started/ free trial available - Add a cloud storage source S3, ADLS, or GCS through the Sources panel - Configure credentials and target bucket Dremio creates an Open Catalog https://www.dremio.com/platform/open-catalog/ for your Iceberg tables automatically. This Polaris-based catalog handles metadata management, access control, and automatic optimization. Step 2: Create Iceberg Tables CREATE TABLE analytics.orders order id BIGINT, customer id BIGINT, order date DATE, amount DECIMAL 10,2 , status VARCHAR, region VARCHAR PARTITION BY day order date This creates a table with hidden partitioning https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-05/ by day. Users query on order date naturally; the engine handles partition pruning automatically. Step 3: Ingest Data From files in object storage: COPY INTO analytics.orders FROM '@my s3 source/raw/orders/' FILE FORMAT 'parquet' From another table or source: INSERT INTO analytics.orders SELECT FROM postgres source.public.orders WHERE order date = '2024-01-01' Dremio's federation https://www.dremio.com/platform/federation/ can query data in PostgreSQL, MySQL, Oracle, MongoDB, S3 files, and other sources directly. You can migrate data into Iceberg tables with a single INSERT...SELECT statement. The Dremio Platform Columnar Cloud Cache Dremio's Columnar Cloud Cache C3 https://www.dremio.com/blog/dremios-columnar-cloud-cache-c3/ stores frequently accessed Iceberg data on local NVMe SSDs attached to the query engine nodes. When a query accesses data for the first time, Dremio caches the relevant columns locally. Subsequent queries against the same data read from local SSD instead of remote object storage, reducing latency from hundreds of milliseconds to single-digit milliseconds. C3 operates transparently. You do not need to configure which data to cache. Dremio tracks access patterns and caches the most-queried data automatically. Connecting BI Tools Dremio exposes Iceberg data through ODBC, JDBC, and Arrow Flight endpoints. Any BI tool Tableau, Power BI, Looker, Superset can connect to Dremio and query Iceberg tables as if they were a traditional database. The semantic layer ensures consistent governance and naming across all connected tools. Semantic Layer Dremio's semantic layer https://www.dremio.com/platform/semantic-layer/ lets you create governed SQL views that serve as the interface between raw data and consumers: CREATE VIEW analytics.customer orders AS SELECT o.customer id, c.customer name, c.region, SUM o.amount AS total spend, COUNT AS order count FROM analytics.orders o JOIN analytics.customers c ON o.customer id = c.customer id GROUP BY o.customer id, c.customer name, c.region Add wikis and tags to views and tables through the Dremio UI. These descriptions help other users find and understand data, and they power the AI agent's https://www.dremio.com/platform/ai/ ability to generate accurate SQL from natural language. Reflections Query Acceleration Dremio Reflections are precomputed materializations that automatically accelerate queries without requiring changes to your SQL. When you create a reflection on a view or table, Dremio precomputes the results and stores them as optimized Iceberg tables on fast storage: -- Create an aggregation reflection for fast dashboard queries ALTER TABLE analytics.customer orders CREATE AGGREGATE REFLECTION customer orders agg USING DIMENSIONS region, order date MEASURES total spend SUM, order count SUM When a query matches the reflection's definition, Dremio serves it from the precomputed data instead of scanning the full table. Queries that take 30 seconds against raw data can complete in under 1 second with reflections. The query optimizer chooses the reflection transparently, so users and applications do not need to know reflections exist. Data Governance Dremio provides column-level access control and row-level filtering directly in the semantic layer https://www.dremio.com/platform/semantic-layer/ : -- Create a view that masks PII for non-privileged users CREATE VIEW analytics.orders masked AS SELECT order id, CASE WHEN is member 'finance team' THEN customer name ELSE ' MASKED ' END AS customer name, order date, amount FROM analytics.orders Governance policies defined in the semantic layer apply consistently regardless of which tool BI dashboard, Python notebook, AI agent queries the data. This approach is more maintainable than duplicating access policies in every consuming application. Query Federation One of Dremio's unique capabilities is querying Iceberg tables alongside data in other systems: -- Join Iceberg table with a PostgreSQL table SELECT i.order id, i.amount, p.payment status FROM analytics.orders i JOIN postgres source.public.payments p ON i.order id = p.order id This eliminates the need to move all data into Iceberg before you can query it. You can start with federation and migrate incrementally https://www.dremio.com/blog/the-journey-from-scattered-data-to-an-apache-iceberg-lakehouse-with-governed-agentic-analytics/ . Federation is especially useful during migration https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-15/ : query legacy systems and Iceberg tables side by side, then swap the underlying source when you are ready. Essential SQL Operations Table Optimization -- Compact small files OPTIMIZE TABLE analytics.orders REWRITE DATA USING BIN PACK -- Compact with sorting for better file skipping OPTIMIZE TABLE analytics.orders REWRITE DATA USING SORT order date, customer id -- Expire old snapshots ALTER TABLE analytics.orders EXPIRE SNAPSHOTS OLDER THAN = '2024-04-01 00:00:00' For tables managed by Open Catalog https://www.dremio.com/platform/open-catalog/ , Dremio runs automatic table optimization https://www.dremio.com/blog/table-optimization-in-dremio/ in the background, handling compaction, expiry, and orphan cleanup without user intervention. Time Travel -- Query the table as of a specific timestamp SELECT FROM analytics.orders AT TIMESTAMP '2024-03-01 00:00:00' -- Compare current data to a previous snapshot SELECT current data.region, current data.total - old data.total AS growth FROM SELECT region, SUM amount AS total FROM analytics.orders GROUP BY region current data JOIN SELECT region, SUM amount AS total FROM analytics.orders AT TIMESTAMP '2024-01-01' GROUP BY region old data ON current data.region = old data.region Metadata Inspection -- Check table health SELECT AVG file size in bytes /1048576 AS avg mb, COUNT AS files FROM TABLE table files 'analytics.orders' -- Review recent snapshots SELECT committed at, operation, summary FROM TABLE table snapshot 'analytics.orders' ORDER BY committed at DESC LIMIT 5 AI-Powered Analytics Dremio's built-in AI agent https://www.dremio.com/platform/ai/ converts natural language questions into SQL queries using the semantic layer's wikis and tags as context: - "Show me the top 10 customers by total spend this quarter" - "What was the month-over-month revenue growth by region?" - "Which products had the highest return rate last month?" The AI agent generates standard SQL, meaning the results are transparent and auditable. Users can see exactly what SQL was generated, verify it, and refine it. This is different from black-box AI analytics tools that hide the underlying logic. MCP Server for External AI Agents The MCP Server https://www.dremio.com/blog/getting-started-with-the-dremio-mcp-server/ extends Dremio's data access to external AI agents and tools through the Model Context Protocol. LLMs running in Claude, ChatGPT, or custom agent frameworks can query your Iceberg lakehouse through MCP, inheriting all the governance, semantic context, and optimization that Dremio provides. This positions Dremio as the data layer for agentic AI https://www.dremio.com/platform/ai/ workflows: the AI agent asks questions in natural language, MCP translates them into governed SQL, and Dremio returns the results from optimized Iceberg tables. Part 15 https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-15/ covers strategies for migrating existing data into Iceberg. Books to Go Deeper - Architecting the Apache Iceberg Lakehouse https://www.amazon.com/Architecting-Apache-Iceberg-Lakehouse-open-source/dp/1633435105/ by Alex Merced Manning - Lakehouses with Apache Iceberg: Agentic Hands-on https://www.amazon.com/Lakehouses-Apache-Iceberg-Agentic-Hands-ebook/dp/B0GQL4QNRT/ by Alex Merced - Constructing Context: Semantics, Agents, and Embeddings https://www.amazon.com/Constructing-Context-Semantics-Agents-Embeddings/dp/B0GSHRZNZ5/ by Alex Merced - Apache Iceberg & Agentic AI: Connecting Structured Data https://www.amazon.com/Apache-Iceberg-Agentic-Connecting-Structured/dp/B0GW2WF4PX/ by Alex Merced - Open Source Lakehouse: Architecting Analytical Systems https://www.amazon.com/Open-Source-Lakehouse-Architecting-Analytical/dp/B0GW595MVL/ by Alex Merced