# Komprise provides lakehouse access to petabytes of unstructured data

> Source: <https://www.blocksandfiles.com/data-management/2026/06/23/komprise-provides-lakehouse-access-to-petabytes-of-unstructured-data/5260267>
> Published: 2026-06-23 13:59:43+00:00

[Komprise](https://www.blocksandfiles.com/data-management/2026/05/05/komprise-patents-dynamic-load-balancing-tech/5219172) has found a way to index an organization’s entire distributed file estate, make it visible to lakehouse users, and then dynamically move the selected files needed for AI or analytics processing to the lakehouse.

Komprise provides a data management software layer above distributed unstructured data sources, both on-premises and in the cloud. The data can then be accessed through this layer, tiered or migrated, without disturbing access to the source files. Its Transparent File Tables (TFT) software now makes the underlying data accessible via Apache Iceberg to lakehouse users, for example those on Databricks and Snowflake. There is no need to make bulk data copies into the lakehouse or migrate raw data there. If the file or object data itself is needed for AI or analytics processing, it’s moved across using Komprise’s existing AI ingest and Transparent Move Technology (TMT) software.

Komprise Co-founder and CEO Kumar Goswami said: “The reason 99 percent of enterprise unstructured data has been dark to AI and analytics is because discovering and generating its schema and moving it is inherently complex and costly. Komprise brings to light the huge petabytes of enterprise unstructured data in a form that data teams can access easily and transparently for analytics. Komprise Transparent File Tables opens a whole new world to AI.”

TFT indexes enterprise unstructured file and object data across datacenters and hybrid cloud storage, with the index entries forming a [Global Metadatabase](https://www.komprise.com/product/global-metadatabase/). This Global Metadatabase can be made available as an [Apache Iceberg](https://iceberg.apache.org/) table or as specific subsets of one.

IT users can add rich context to files with content, header and sensitive data scanning and metadata tagging using [Komprise AI Preparation and Process Automation](https://www.blocksandfiles.com/data-management/2026/02/20/komprise-launches-kappa-to-hunt-metadata-across-enterprise-file-silos/4091645) (KAPPA) data services and [Komprise Smart Data Workflows](https://www.komprise.com/product/smart-data-workflows/). The KAPPA service automates the collection of metadata tags by AI agents from unstructured data distributed across many silos: multi-vendor NAS and public cloud filestores.

IT administrators load the Global Metadatabase or subsets of it into data lakehouses, such as Snowflake and Databricks, by exporting Komprise Transparent File Tables. These tables use the Iceberg format to present Komprise-enriched metadata and a pointer to the data, enabling access to remote data without moving the source files.
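As a rough illustration of the "metadata plus pointer" idea, the sketch below models a table row in plain Python and selects files by tag without opening any of them. The `FileTableRow` fields, the tag strings and the pointer URIs are hypothetical; Komprise has not published the TFT schema in this article, and this is not Iceberg or Komprise code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileTableRow:
    """One hypothetical row: metadata about a file plus a pointer to it."""
    pointer: str       # where the source file lives; the file itself stays put
    size_bytes: int
    owner: str
    tags: frozenset = field(default_factory=frozenset)


def select_pointers(rows, required_tag):
    """Return pointers for rows carrying required_tag, using metadata only."""
    return [row.pointer for row in rows if required_tag in row.tags]


# Made-up sample rows standing in for an exported table.
rows = [
    FileTableRow("nas://lab-a/run-001.raw", 4_200_000, "alice",
                 frozenset({"project:x", "instrument:ms"})),
    FileTableRow("s3://archive/run-002.raw", 3_900_000, "bob",
                 frozenset({"project:y"})),
    FileTableRow("nas://lab-b/notes.docx", 18_000, "alice",
                 frozenset({"project:x"})),
]

# Only metadata is consulted here; no source file is read or moved.
selected = select_pointers(rows, "project:x")
print(selected)
```

The point of the sketch is that selection happens on the small metadata rows, so the bulk data only needs to move once a pointer has actually been chosen.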

Enterprise data experts working in the lakehouse can then query the Apache Iceberg tables using their preferred BI and analytics tools. They do not need to access Komprise services or know about them, and Komprise provides data governance based on user access permissions.

If the full files are required for AI or analytics, Komprise Transparent Move Technology (TMT) uses Komprise Intelligent AI Ingest to move the needed files, Komprise says, at 2X the speed of standard data transfer tools.

[Intelligent AI Ingest](https://www.blocksandfiles.com/ai-ml/2025/09/23/komprise-launches-ai-focused-ingest-tool-to-clean-up-unstructured-data/1612772) uses filters to eliminate low-quality and sensitive data flowing from data sources via connectors during ingest. Komprise claims it doubles ingest performance compared to the AWS DataSync data transfer tool in benchmark tests because it has a massively parallel architecture and minimizes file overhead.
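Filtering at ingest can be pictured as a predicate applied to each candidate file before anything is transferred. The sketch below is a minimal stand-in under stated assumptions: the `Candidate` fields, the `keep` rule (drop files flagged as sensitive, drop empty files as "low quality") and the sample paths are all hypothetical, and it does not reflect how Intelligent AI Ingest is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical file record arriving from a source connector."""
    path: str
    size_bytes: int
    sensitive: bool  # assumed to be set by an earlier sensitive-data scan


def keep(candidate, min_size=1):
    """Assumed rule: reject sensitive files and files smaller than min_size."""
    return not candidate.sensitive and candidate.size_bytes >= min_size


def filter_for_ingest(candidates):
    """Return the paths that pass the filter, in their original order."""
    return [c.path for c in candidates if keep(c)]


candidates = [
    Candidate("scripts/draft-1.txt", 12_000, sensitive=False),
    Candidate("hr/salaries.xlsx", 48_000, sensitive=True),   # filtered: sensitive
    Candidate("scripts/empty.txt", 0, sensitive=False),      # filtered: empty
    Candidate("scripts/final.txt", 15_500, sensitive=False),
]

to_ingest = filter_for_ingest(candidates)
print(to_ingest)
```

Applying the filter before transfer means rejected files never consume bandwidth or land in the target system.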

Komprise supplied an example of how TFT could be used: a pharmaceutical company data analyst can create dashboards in Snowflake or Databricks for their drug research projects by querying a Komprise Transparent File Table for project files generated by each instrument and lab. The analyst can then join the data with financial tables from their ERP systems and instrument information from Benchling, thus combining structured and unstructured data from different sources in a single interface.
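The join in the pharmaceutical example can be sketched with ordinary SQL. The snippet below uses Python's built-in `sqlite3` module as a stand-in for a lakehouse engine; the table names (`file_table`, `erp_budget`), columns and values are invented for illustration and are not taken from Komprise, Snowflake or Databricks.

```python
import sqlite3

# In-memory SQLite stands in for the lakehouse; all data here is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE file_table (pointer TEXT, project TEXT, instrument TEXT, size_bytes INTEGER);
CREATE TABLE erp_budget (project TEXT, budget_usd INTEGER);
INSERT INTO file_table VALUES
  ('nas://lab-a/run-001.raw',  'drug-x', 'ms-01', 4200000),
  ('nas://lab-a/run-002.raw',  'drug-x', 'ms-02', 3900000),
  ('s3://archive/run-003.raw', 'drug-y', 'ms-01', 5100000);
INSERT INTO erp_budget VALUES ('drug-x', 250000), ('drug-y', 120000);
""")

# Join file metadata (unstructured side) with ERP figures (structured side).
summary = conn.execute("""
SELECT f.project, COUNT(*) AS files, SUM(f.size_bytes) AS total_bytes, b.budget_usd
FROM file_table AS f
JOIN erp_budget AS b ON b.project = f.project
GROUP BY f.project, b.budget_usd
ORDER BY f.project
""").fetchall()

print(summary)
conn.close()
```

Because the file table holds only metadata and pointers, a query like this touches no instrument files; those would be fetched separately if the analysis needed their contents.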

A second example features an AI agent in media and entertainment that helps with narrative alignment. The agent can use structured project data to identify relevant media archives, then join this with Komprise Transparent File Tables to narrow down which scripts to ingest for summarization.

The Komprise Transparent File Tables software is now available for early access. Read a Goswami interview [blog](https://www.komprise.com/blog/interview-komprise-transparent-file-tables/) for more information.

**Bootnote**

Komprise TMT moves files by policy to secondary storage of your choice, in this instance a lakehouse. It leaves behind industry-standard symbolic links that are dynamic and resilient, called Komprise Dynamic Links. These links look like the original file and preserve the original file permissions and attributes. Users and applications can open and access the moved files from their original location exactly as before, without any changes.
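The general move-and-link pattern can be shown with a plain POSIX symbolic link. This is only an analogy: the helper `move_and_link` and the directory names are hypothetical, and an ordinary symlink lacks the dynamic, resilient behavior and attribute preservation Komprise attributes to its Dynamic Links. The sketch assumes a platform where creating symlinks is permitted.

```python
import os
import shutil
import tempfile


def move_and_link(src_path, dest_dir):
    """Move src_path into dest_dir and leave a symlink at the original path."""
    dest_path = os.path.join(dest_dir, os.path.basename(src_path))
    shutil.move(src_path, dest_path)
    os.symlink(dest_path, src_path)  # the original path now points at the new home
    return dest_path


with tempfile.TemporaryDirectory() as root:
    primary = os.path.join(root, "primary")
    secondary = os.path.join(root, "secondary")
    os.makedirs(primary)
    os.makedirs(secondary)

    original = os.path.join(primary, "report.txt")
    with open(original, "w") as fh:
        fh.write("quarterly results")

    moved_to = move_and_link(original, secondary)

    # An application opens the original path exactly as before.
    with open(original) as fh:
        content_via_link = fh.read()
    is_link = os.path.islink(original)

print(is_link, content_via_link)
```

The application code that reads `original` is unchanged after the move, which is the property the bootnote describes.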

Download a Transparent Move Technology white paper [here](https://www.komprise.com/wp-content/uploads/Komprise-Transparent-Move-Technology-White-Paper.pdf). It compares TFT to alternatives such as ETL pipelines and built-in lakehouse features.
