Apache Iceberg Metadata Tables: Querying the Internals

Apache Iceberg provides queryable metadata tables that allow users to inspect table internals using standard SQL, enabling tasks such as health checks, performance debugging, and change auditing. Key metadata tables include `$snapshots`, `$files`, `$manifests`, and `$partitions`, which expose details like snapshot history, file statistics, and partition-level metrics. These tables also support time travel queries and incremental processing by comparing snapshots to identify newly added data files.

This is Part 11 of a 15-part Apache Iceberg Masterclass https://iceberglakehouse.com/posts/ . Part 10 https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-10/ covered maintenance operations. This article covers the metadata tables that let you inspect Iceberg table internals using standard SQL. Iceberg exposes its internal metadata as queryable virtual tables. You can use them to check table health, debug performance issues, audit changes, and build monitoring dashboards. No special tools required, just SQL. Table of Contents What Are Table Formats and Why Were They Needed? https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-01/ The Metadata Structure of Current Table Formats https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-02/ Performance and Apache Iceberg's Metadata https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-03/ Technical Deep Dive on Partition Evolution https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-04/ Technical Deep Dive on Hidden Partitioning https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-05/ Writing to an Apache Iceberg Table https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-06/ What Are Lakehouse Catalogs? https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-07/ Embedded Catalogs: S3 Tables and MinIO AI Stor https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-08/ How Iceberg Table Storage Degrades Over Time https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-09/ Maintaining Apache Iceberg Tables https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-10/ Apache Iceberg Metadata Tables https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-11/ Using Iceberg with Python and MPP Engines https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-12/ Streaming Data into Apache Iceberg Tables https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-13/ Hands-On with Iceberg Using Dremio Cloud https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-14/ Migrating to Apache Iceberg https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-15/ The Seven Metadata Tables Snapshots The $snapshots table lists every snapshot in the table's history. Each row represents a committed transaction. -- Dremio syntax SELECT FROM TABLE table snapshot 'analytics.orders' -- Spark syntax SELECT FROM analytics.orders.snapshots Key columns: snapshot id , committed at , operation append, overwrite, delete , summary files added/removed counts . History The $history table shows the timeline of which snapshot was current at each point in time. SELECT FROM TABLE table history 'analytics.orders' Files The $files table lists every data file in the current snapshot with detailed statistics. SELECT file path, file size in bytes, record count, partition FROM TABLE table files 'analytics.orders' This is the primary diagnostic table for checking file sizes https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-09/ and identifying the small file problem. Manifests The $manifests table lists the manifest files for the current snapshot. SELECT path, length, added data files count, existing data files count FROM TABLE table manifests 'analytics.orders' Partitions The $partitions table provides statistics per partition: row counts, file counts, and size. SELECT partition, record count, file count FROM TABLE table partitions 'analytics.orders' Practical Use Cases Monitoring: Average File Size SELECT AVG file size in bytes / 1048576 AS avg file mb, MIN file size in bytes / 1048576 AS min file mb, COUNT AS total files FROM TABLE table files 'analytics.orders' If avg file mb drops below 64, schedule compaction. Debugging: Files Per Partition SELECT partition, COUNT AS files, SUM record count AS rows FROM TABLE table files 'analytics.orders' GROUP BY partition ORDER BY files DESC LIMIT 20 Partitions with hundreds of files are compaction candidates. Use this query as a daily health check and pipe the results into your monitoring system. Debugging: Sort Order Effectiveness Column statistics in the files table reveal whether your sort order is effective: SELECT file path, lower bounds 'customer id' AS min customer id, upper bounds 'customer id' AS max customer id FROM TABLE table files 'analytics.orders' If the min/max ranges overlap heavily across files, the sort order has decayed and compaction with sorting Part 10 https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-10/ will restore effectiveness. Monitoring: Commit Velocity Track how frequently the table is being written to: SELECT DATE TRUNC 'hour', committed at AS hour, COUNT AS commits, SUM CAST summary 'added-data-files' AS INT AS files added FROM TABLE table snapshot 'analytics.orders' WHERE committed at CURRENT TIMESTAMP - INTERVAL '24' HOUR GROUP BY DATE TRUNC 'hour', committed at ORDER BY hour High commit velocity hundreds of commits per hour indicates a streaming workload https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-13/ that needs aggressive compaction. Auditing: Recent Changes SELECT committed at, operation, summary FROM TABLE table snapshot 'analytics.orders' ORDER BY committed at DESC LIMIT 10 This shows the last 10 operations: how many files were added or removed per commit. Time Travel Metadata tables enable time travel queries. Use the snapshot list to find the snapshot ID for a specific point in time, then query the table at that snapshot: -- Query the table as it existed on February 15 SELECT FROM analytics.orders AT SNAPSHOT '1234567890123456789' -- Or by timestamp SELECT FROM analytics.orders AT TIMESTAMP '2024-02-15 00:00:00' Time travel is useful for debugging data issues "what did this table look like before yesterday's pipeline ran?" , auditing "what was the account balance at end-of-quarter?" , and reproducible analysis "run this report against last month's data" . Incremental Reads Metadata tables also enable incremental processing. By comparing two snapshots, you can identify which files were added between them and process only the new data: -- Find files added in the last snapshot SELECT file path, record count FROM TABLE table files 'analytics.orders' WHERE file path NOT IN SELECT file path FROM TABLE table files 'analytics.orders' AT SNAPSHOT '1234567890' This pattern is the foundation for CDC Change Data Capture on Iceberg tables: read only what changed since the last processing run, rather than re-scanning the entire table. Rollback If a bad write corrupts your table, use the snapshot list to rollback: -- Find the last good snapshot SELECT snapshot id, committed at, operation FROM TABLE table snapshot 'analytics.orders' ORDER BY committed at DESC -- Rollback to it Spark CALL system.rollback to snapshot 'analytics.orders', 1234567890 Rollback does not delete data. It simply changes the current snapshot pointer to an earlier snapshot, making the table appear as it was at that point. The rolled-back data files remain in storage for potential recovery. Dremio https://docs.dremio.com/cloud/sonar/query-manage/querying-metadata/ supports all Iceberg metadata table queries through its TABLE function syntax and provides time travel in both SQL and its semantic layer. Building a Health Dashboard Combine metadata table queries into a scheduled monitoring job: -- Table health summary SELECT SELECT COUNT FROM TABLE table snapshot 'analytics.orders' AS snapshots, SELECT COUNT FROM TABLE table files 'analytics.orders' AS files, SELECT AVG file size in bytes /1048576 FROM TABLE table files 'analytics.orders' AS avg mb, SELECT COUNT FROM TABLE table manifests 'analytics.orders' AS manifests Set alerts when snapshots exceed 1,000, average file size drops below 64 MB, or manifest count exceeds 500. Engine Syntax Variations Different engines use different syntax for metadata tables: The underlying data is identical; only the SQL syntax differs. Regardless of which engine you use, these metadata tables are the key diagnostic tool for understanding and maintaining Iceberg table health. Automating Decisions with Metadata You can use metadata table queries to drive automated maintenance decisions. For example, a scheduler can check whether compaction is needed before running it: -- Only compact if average file size is below threshold SELECT CASE WHEN AVG file size in bytes / 1048576 < 64 THEN 'COMPACT NEEDED' ELSE 'HEALTHY' END AS table status FROM TABLE table files 'analytics.orders' This avoids running compaction on tables that are already well-organized, saving compute costs and preventing unnecessary data rewrites. For production environments, integrate these checks into your orchestration tool Airflow, Dagster, Prefect . Schedule a daily metadata scan across all tables, collect the health metrics, and trigger maintenance jobs only for tables that need them. This approach scales to hundreds of tables without manual oversight. Dremio's autonomous optimization https://www.dremio.com/blog/table-optimization-in-dremio/ automates this entire workflow for tables managed by Open Catalog. Part 12 https://iceberglakehouse.com/posts/2026-04-29-iceberg-masterclass-12/ covers using Iceberg from Python and MPP query engines. Books to Go Deeper - Architecting the Apache Iceberg Lakehouse https://www.amazon.com/Architecting-Apache-Iceberg-Lakehouse-open-source/dp/1633435105/ by Alex Merced Manning - Lakehouses with Apache Iceberg: Agentic Hands-on https://www.amazon.com/Lakehouses-Apache-Iceberg-Agentic-Hands-ebook/dp/B0GQL4QNRT/ by Alex Merced - Constructing Context: Semantics, Agents, and Embeddings https://www.amazon.com/Constructing-Context-Semantics-Agents-Embeddings/dp/B0GSHRZNZ5/ by Alex Merced - Apache Iceberg & Agentic AI: Connecting Structured Data https://www.amazon.com/Apache-Iceberg-Agentic-Connecting-Structured/dp/B0GW2WF4PX/ by Alex Merced - Open Source Lakehouse: Architecting Analytical Systems https://www.amazon.com/Open-Source-Lakehouse-Architecting-Analytical/dp/B0GW595MVL/ by Alex Merced