{"slug": "hands-on-with-apache-iceberg-using-dremio-cloud", "title": "Hands-On with Apache Iceberg Using Dremio Cloud", "summary": "This article, part 14 of a 15-part Apache Iceberg Masterclass, provides a practical walkthrough of using Iceberg with Dremio Cloud, covering table creation, data ingestion, and optimization. It highlights Dremio’s features like automatic metadata management via a Polaris-based catalog, transparent caching with Columnar Cloud Cache (C3), and a semantic layer for governance and AI-powered analytics. The article also details performance acceleration through precomputed \"Reflections\" and column-level access control for secure data sharing.", "body_md": "This is Part 14 of a 15-part [Apache Iceberg Masterclass](https://iceberglakehouse.com/posts/). [Part 13](https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-13/) covered streaming approaches. This article is a practical walkthrough of working with Iceberg on [Dremio Cloud](https://www.dremio.com/get-started/), covering table creation, data ingestion, optimization, semantic layer construction, and AI-powered analytics.\n\n## Table of Contents\n\n- [What Are Table Formats and Why Were They Needed?](https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-01/)\n- [The Metadata Structure of Current Table Formats](https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-02/)\n- [Performance and Apache Iceberg's Metadata](https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-03/)\n- [Technical Deep Dive on Partition Evolution](https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-04/)\n- [Technical Deep Dive on Hidden Partitioning](https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-05/)\n- [Writing to an Apache Iceberg Table](https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-06/)\n- [What Are Lakehouse Catalogs?](https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-07/)\n- [Embedded Catalogs: S3 Tables and MinIO AI 
Stor](https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-08/)\n- [How Iceberg Table Storage Degrades Over Time](https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-09/)\n- [Maintaining Apache Iceberg Tables](https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-10/)\n- [Apache Iceberg Metadata Tables](https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-11/)\n- [Using Iceberg with Python and MPP Engines](https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-12/)\n- [Streaming Data into Apache Iceberg Tables](https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-13/)\n- [Hands-On with Iceberg Using Dremio Cloud](https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-14/)\n- [Migrating to Apache Iceberg](https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-15/)\n\n## Getting Started\n\n### Step 1: Sign Up and Connect Storage\n\n- [Create a Dremio Cloud account](https://www.dremio.com/get-started/) (free trial available)\n- Add a cloud storage source (S3, ADLS, or GCS) through the Sources panel\n- Configure credentials and target bucket\n\nDremio creates an [Open Catalog](https://www.dremio.com/platform/open-catalog/) for your Iceberg tables automatically. This Polaris-based catalog handles metadata management, access control, and automatic optimization.\n\n### Step 2: Create Iceberg Tables\n\n```\nCREATE TABLE analytics.orders (\n    order_id BIGINT,\n    customer_id BIGINT,\n    order_date DATE,\n    amount DECIMAL(10,2),\n    status VARCHAR,\n    region VARCHAR\n)\nPARTITION BY (day(order_date))\n```\n\nThis creates a table with [hidden partitioning](https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-05/) by day. 
Users query on `order_date` naturally; the engine handles partition pruning automatically.\n\n### Step 3: Ingest Data\n\n**From files in object storage:**\n\n```\nCOPY INTO analytics.orders\nFROM '@my_s3_source/raw/orders/'\nFILE_FORMAT 'parquet'\n```\n\n**From another table or source:**\n\n```\nINSERT INTO analytics.orders\nSELECT * FROM postgres_source.public.orders\nWHERE order_date >= '2024-01-01'\n```\n\n[Dremio's federation](https://www.dremio.com/platform/federation/) can query data in PostgreSQL, MySQL, Oracle, MongoDB, S3 files, and other sources directly. You can migrate data into Iceberg tables with a single INSERT...SELECT statement.\n\n## The Dremio Platform\n\n### Columnar Cloud Cache\n\nDremio's [Columnar Cloud Cache (C3)](https://www.dremio.com/blog/dremios-columnar-cloud-cache-c3/) stores frequently accessed Iceberg data on local NVMe SSDs attached to the query engine nodes. When a query accesses data for the first time, Dremio caches the relevant columns locally. Subsequent queries against the same data read from local SSD instead of remote object storage, reducing latency from hundreds of milliseconds to single-digit milliseconds.\n\nC3 operates transparently. You do not need to configure which data to cache. Dremio tracks access patterns and caches the most-queried data automatically.\n\n### Connecting BI Tools\n\nDremio exposes Iceberg data through ODBC, JDBC, and Arrow Flight endpoints. Any BI tool (Tableau, Power BI, Looker, Superset) can connect to Dremio and query Iceberg tables as if they were a traditional database. 
The semantic layer ensures consistent governance and naming across all connected tools.\n\n### Semantic Layer\n\nDremio's [semantic layer](https://www.dremio.com/platform/semantic-layer/) lets you create governed SQL views that serve as the interface between raw data and consumers:\n\n```\nCREATE VIEW analytics.customer_orders AS\nSELECT\n    o.customer_id,\n    c.customer_name,\n    c.region,\n    SUM(o.amount) AS total_spend,\n    COUNT(*) AS order_count\nFROM analytics.orders o\nJOIN analytics.customers c ON o.customer_id = c.customer_id\nGROUP BY o.customer_id, c.customer_name, c.region\n```\n\nAdd wikis and tags to views and tables through the Dremio UI. These descriptions help other users find and understand data, and they power the [AI agent's](https://www.dremio.com/platform/ai/) ability to generate accurate SQL from natural language.\n\n### Reflections (Query Acceleration)\n\nDremio Reflections are precomputed materializations that automatically accelerate queries without requiring changes to your SQL. When you create a reflection on a view or table, Dremio precomputes the results and stores them as optimized Iceberg tables on fast storage:\n\n```\n-- Create an aggregation reflection for fast dashboard queries\n-- Dimensions and measures must be columns of the customer_orders view\nALTER VIEW analytics.customer_orders\n  CREATE AGGREGATE REFLECTION customer_orders_agg\n  USING DIMENSIONS (region)\n  MEASURES (total_spend (SUM), order_count (SUM))\n```\n\nWhen a query matches the reflection's definition, Dremio serves it from the precomputed data instead of scanning the full table. Queries that take 30 seconds against raw data can complete in under 1 second with reflections. 
The query optimizer chooses the reflection transparently, so users and applications do not need to know reflections exist.\n\n### Data Governance\n\nDremio provides column-level access control and row-level filtering directly in the [semantic layer](https://www.dremio.com/platform/semantic-layer/):\n\n```\n-- Create a view that masks PII for non-privileged users\n-- customer_name lives in analytics.customers, so join it in\nCREATE VIEW analytics.orders_masked AS\nSELECT\n    o.order_id,\n    CASE WHEN is_member('finance_team') THEN c.customer_name\n         ELSE '***MASKED***' END AS customer_name,\n    o.order_date,\n    o.amount\nFROM analytics.orders o\nJOIN analytics.customers c ON o.customer_id = c.customer_id\n```\n\nGovernance policies defined in the semantic layer apply consistently regardless of which tool (BI dashboard, Python notebook, AI agent) queries the data. This approach is more maintainable than duplicating access policies in every consuming application.\n\n### Query Federation\n\nOne of Dremio's unique capabilities is querying Iceberg tables alongside data in other systems:\n\n```\n-- Join Iceberg table with a PostgreSQL table\nSELECT i.order_id, i.amount, p.payment_status\nFROM analytics.orders i\nJOIN postgres_source.public.payments p\nON i.order_id = p.order_id\n```\n\nThis eliminates the need to move all data into Iceberg before you can query it. You can [start with federation and migrate incrementally](https://www.dremio.com/blog/the-journey-from-scattered-data-to-an-apache-iceberg-lakehouse-with-governed-agentic-analytics/). 
Federation is especially useful during [migration](https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-15/): query legacy systems and Iceberg tables side by side, then swap the underlying source when you are ready.\n\n## Essential SQL Operations\n\n### Table Optimization\n\n```\n-- Compact small files\nOPTIMIZE TABLE analytics.orders REWRITE DATA USING BIN_PACK\n\n-- Compact with sorting for better file skipping\nOPTIMIZE TABLE analytics.orders REWRITE DATA USING SORT (order_date, customer_id)\n\n-- Expire old snapshots\nVACUUM TABLE analytics.orders EXPIRE SNAPSHOTS older_than '2024-04-01 00:00:00.000'\n```\n\nFor tables managed by [Open Catalog](https://www.dremio.com/platform/open-catalog/), Dremio runs [automatic table optimization](https://www.dremio.com/blog/table-optimization-in-dremio/) in the background, handling compaction, expiry, and orphan cleanup without user intervention.\n\n### Time Travel\n\n```\n-- Query the table as of a specific timestamp\nSELECT * FROM analytics.orders\nAT TIMESTAMP '2024-03-01 00:00:00'\n\n-- Compare current data to a previous snapshot\nSELECT\n    current_data.region,\n    current_data.total - old_data.total AS growth\nFROM (SELECT region, SUM(amount) AS total FROM analytics.orders GROUP BY region) current_data\nJOIN (\n    SELECT region, SUM(amount) AS total\n    FROM analytics.orders AT TIMESTAMP '2024-01-01 00:00:00'\n    GROUP BY region\n) old_data ON current_data.region = old_data.region\n```\n\n### Metadata Inspection\n\n```\n-- Check table health\nSELECT AVG(file_size_in_bytes)/1048576 AS avg_mb, COUNT(*) AS files\nFROM TABLE(table_files('analytics.orders'))\n\n-- Review recent snapshots\nSELECT committed_at, operation, summary\nFROM TABLE(table_snapshot('analytics.orders'))\nORDER BY committed_at DESC LIMIT 5\n```\n\n## AI-Powered Analytics\n\nDremio's built-in [AI agent](https://www.dremio.com/platform/ai/) converts natural language questions into SQL queries using the semantic layer's wikis and tags as 
context:\n\n- \"Show me the top 10 customers by total spend this quarter\"\n- \"What was the month-over-month revenue growth by region?\"\n- \"Which products had the highest return rate last month?\"\n\nThe AI agent generates standard SQL, meaning the results are transparent and auditable. Users can see exactly what SQL was generated, verify it, and refine it. This is different from black-box AI analytics tools that hide the underlying logic.\n\n### MCP Server for External AI Agents\n\nThe [MCP Server](https://www.dremio.com/blog/getting-started-with-the-dremio-mcp-server/) extends Dremio's data access to external AI agents and tools through the Model Context Protocol. LLMs running in Claude, ChatGPT, or custom agent frameworks can query your Iceberg lakehouse through MCP, inheriting all the governance, semantic context, and optimization that Dremio provides.\n\nThis positions Dremio as the data layer for [agentic AI](https://www.dremio.com/platform/ai/) workflows: the AI agent asks questions in natural language, MCP translates them into governed SQL, and Dremio returns the results from optimized Iceberg tables.\n\n[Part 15](https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-15/) covers strategies for migrating existing data into Iceberg.\n\n### Books to Go Deeper\n\n- [Architecting the Apache Iceberg Lakehouse](https://www.amazon.com/Architecting-Apache-Iceberg-Lakehouse-open-source/dp/1633435105/) by Alex Merced (Manning)\n- [Lakehouses with Apache Iceberg: Agentic Hands-on](https://www.amazon.com/Lakehouses-Apache-Iceberg-Agentic-Hands-ebook/dp/B0GQL4QNRT/) by Alex Merced\n- [Constructing Context: Semantics, Agents, and Embeddings](https://www.amazon.com/Constructing-Context-Semantics-Agents-Embeddings/dp/B0GSHRZNZ5/) by Alex Merced\n- [Apache Iceberg & Agentic AI: Connecting Structured Data](https://www.amazon.com/Apache-Iceberg-Agentic-Connecting-Structured/dp/B0GW2WF4PX/) by Alex Merced\n- [Open Source Lakehouse: Architecting Analytical 
Systems](https://www.amazon.com/Open-Source-Lakehouse-Architecting-Analytical/dp/B0GW595MVL/) by Alex Merced", "url": "https://wpnews.pro/news/hands-on-with-apache-iceberg-using-dremio-cloud", "canonical_source": "https://dev.to/alexmercedcoder/hands-on-with-apache-iceberg-using-dremio-cloud-fa4", "published_at": "2026-05-22 17:19:31+00:00", "updated_at": "2026-05-22 17:34:31.142242+00:00", "lang": "en", "topics": ["data", "cloud-computing", "open-source", "enterprise-software", "developer-tools"], "entities": ["Apache Iceberg", "Dremio Cloud", "Polaris", "PostgreSQL", "MySQL", "Oracle", "MongoDB", "S3"], "alternates": {"html": "https://wpnews.pro/news/hands-on-with-apache-iceberg-using-dremio-cloud", "markdown": "https://wpnews.pro/news/hands-on-with-apache-iceberg-using-dremio-cloud.md", "text": "https://wpnews.pro/news/hands-on-with-apache-iceberg-using-dremio-cloud.txt", "jsonld": "https://wpnews.pro/news/hands-on-with-apache-iceberg-using-dremio-cloud.jsonld"}}