pgEdge joins rush to merge OLTP and OLAP storage to support AI


For years, enterprises have maintained separate systems for processing transactional (OLTP) and analytical (OLAP) data, even if that meant moving data between them. However, the rise of autonomous agents and AI applications that need immediate access to data while generating volumes of operational data themselves has exposed the cost and complexity of maintaining those separate systems. The industry’s response has been quick, with data warehouse and database vendors proposing a wave of competing approaches to collapsing those data silos. In the past few weeks, Databricks unveiled LTAP and EDB introduced converged analytics, while late last year Snowflake launched pg_lake. All three offer different blueprints for bringing transactional, analytical and AI workloads closer together.

Now it’s the turn of distributed PostgreSQL provider pgEdge, which has introduced a beta version of ColdFront, a PostgreSQL-native hot-and-cold data tiering architecture that automatically moves older data into Apache Iceberg object storage while keeping PostgreSQL as the only database that applications need to interact with.

In ColdFront’s architecture, hot and cold refer to newer and older data, respectively.

Keeping PostgreSQL as the primary interface is what sets ColdFront apart from the other architectures emerging in this space, which differ in where the center of gravity for data lies, according to analysts.

Databricks’ LTAP keeps operational applications connected to a lakehouse where analytics and AI are performed, EDB keeps PostgreSQL as the operational source of truth while exposing data through Iceberg for analytical engines, and Snowflake’s pg_lake writes PostgreSQL data directly into Iceberg so both PostgreSQL and Snowflake can query the same data, said Ashish Chaturvedi, leader of executive research at HFS Research.

ColdFront, by contrast, treats Iceberg only as a transparent storage tier behind PostgreSQL, automatically moving older data out of the database while keeping applications on the same tables and SQL, Chaturvedi said.

The result, according to pgEdge cofounder Phillip Merrick, is that queries against recent data continue to run on PostgreSQL, while requests for older records are transparently executed using DuckDB’s embedded analytical engine, allowing applications to use the same SQL without introducing ETL pipelines, separate query paths, or application changes.
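The split Merrick describes can be pictured with a toy model. The Python sketch below is an illustration only, not ColdFront's implementation or API: the `TieredTable` class, its `hot_days` window and the two dictionaries standing in for PostgreSQL and Iceberg are all hypothetical. It shows rows aging out of a hot tier while callers keep a single query path.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class TieredTable:
    """Toy stand-in for one table split into a hot and a cold tier (hypothetical)."""
    hot_days: int                              # assumed retention window for the hot tier
    hot: dict = field(default_factory=dict)    # stands in for rows kept in PostgreSQL
    cold: dict = field(default_factory=dict)   # stands in for rows kept in Iceberg

    def insert(self, row_id, created, payload):
        # New rows always land in the hot tier.
        self.hot[row_id] = (created, payload)

    def tier_out(self, today):
        # Move rows older than the retention window into the cold tier.
        cutoff = today - timedelta(days=self.hot_days)
        old_ids = [rid for rid, (created, _) in self.hot.items() if created < cutoff]
        for rid in old_ids:
            self.cold[rid] = self.hot.pop(rid)
        return len(old_ids)

    def select(self, since):
        # One query path for callers: matching rows come back from both tiers.
        rows = [
            (rid, created, payload)
            for tier in (self.hot, self.cold)
            for rid, (created, payload) in tier.items()
            if created >= since
        ]
        return sorted(rows, key=lambda row: row[1])


orders = TieredTable(hot_days=30)
orders.insert(1, date(2024, 1, 5), "order A")
orders.insert(2, date(2024, 3, 1), "order B")
moved = orders.tier_out(today=date(2024, 3, 10))
all_rows = orders.select(since=date(2024, 1, 1))
```

In this sketch the caller never names a tier. That is the property the article attributes to ColdFront, where the same SQL reaches both recent and older records.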

That also means older records stored in Iceberg can be updated through PostgreSQL without requiring application changes, enabling what Merrick described as a “cold writable tier.”

That cold writable tier could resonate with enterprises seeking to balance data residency, sovereignty, regulatory compliance and the growing operational demands of the agentic era, particularly because competing approaches generally require sacrificing at least one of those objectives.

As enterprises retain growing volumes of historical operational data generated by AI applications for audit and regulatory purposes, they increasingly need the ability to correct, delete or modify records even after those records have been moved into lower-cost storage, for example to comply with data protection and privacy laws, said Amit Chandak, chief analytics officer at IT consulting firm Kanerika. Rival approaches complicate that, he said.

ColdFront can simplify those processes, said Chaturvedi: “In most tiering systems, cold (older) data is read-only, so a GDPR deletion request on archived data means restore-delete-rearchive, which is a half day job. ColdFront’s architecture would allow you to UPDATE and DELETE archived rows through one SQL statement.”
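The difference Chaturvedi points to can be sketched in a few lines of Python. This is a toy comparison under stated assumptions, not ColdFront code: both function names are hypothetical, an immutable tuple stands in for a read-only archive, and plain lists stand in for a writable cold tier. In a real deployment the second path would be a SQL DELETE issued through PostgreSQL.

```python
def erase_with_read_only_archive(hot, archive, customer):
    """Read-only cold tier: restore the archive, delete, then re-archive."""
    restored = list(archive)                                         # 1. restore
    kept = [row for row in restored if row["customer"] != customer]  # 2. delete
    new_archive = tuple(kept)                                        # 3. re-archive
    hot[:] = [row for row in hot if row["customer"] != customer]
    return new_archive, ["restore", "delete", "rearchive"]


def erase_with_writable_cold_tier(hot, cold, customer):
    """Writable cold tier: a single delete applied across both tiers."""
    for tier in (hot, cold):
        tier[:] = [row for row in tier if row["customer"] != customer]
    return ["delete"]


hot_rows = [{"id": 3, "customer": "alice"}]
archived = ({"id": 1, "customer": "alice"}, {"id": 2, "customer": "bob"})

# Path A: the archive cannot be changed in place, so it is rebuilt.
new_archive, steps_a = erase_with_read_only_archive(list(hot_rows), archived, "alice")

# Path B: the cold tier accepts the delete directly.
hot_b, cold_b = list(hot_rows), list(archived)
steps_b = erase_with_writable_cold_tier(hot_b, cold_b, "alice")
```

Both paths end with the same surviving rows. What differs is the number of operational steps, which is the cost the analyst describes for erasure requests against archived data.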

The rival architectures make different tradeoffs, with Databricks asking enterprises to adopt a proprietary lakehouse as the operational center of gravity, Snowflake requiring applications to distinguish between PostgreSQL and analytical tables, and EDB still requiring archived data to be brought back into active PostgreSQL before it can be modified, he said.

Those tradeoffs are particularly significant for regulated industries, according to Igor Ikonnikov, advisory fellow at Info-Tech Research Group, who said enterprises in financial services, healthcare and government increasingly want to keep sensitive operational data on customer-controlled infrastructure while preserving the ability to modify historical records to meet evolving regulatory obligations.

The vendors’ architectural differences mask an emerging convergence at another layer of the stack that CIOs should take note of: an increasing dependence on DuckDB.

“ColdFront uses DuckDB to execute queries against data stored in Iceberg. Snowflake’s pg_lake routes Iceberg queries through pgduck_server, and Databricks’ Lakebase also relies on DuckDB internally for parts of its analytical processing. As a result, DuckDB is rapidly becoming the de facto embedded analytics engine for this new generation of PostgreSQL-Iceberg architectures,” Ikonnikov said.

That growing dependence creates what the analyst described as a concentration risk: “If DuckDB faces licensing changes, security vulnerabilities, performance bottlenecks or governance issues, the impact would ripple across multiple products simultaneously.”

As a result, CIOs should understand the maturity and roadmap of the shared components these architectures increasingly depend on.

However, that similarity in shared components will not make evaluation of these competing architectures easier for CIOs.

Most enterprises already have established data architectures, said Michael Leone, principal analyst at Moor Insights & Strategy, arguing that CIOs should evaluate these platforms based on where their data, developers and operational workflows already reside rather than assuming one architecture fits every environment.

For enterprises still defining their long-term data strategy, Leone recommended standardizing on Iceberg first, since all four architectures support the open table format and enterprises will retain the flexibility to replace the front-end database or analytical platform later without migrating the underlying data. Even that portability, however, has limits, Ikonnikov cautioned.

“The issue is Iceberg catalog governance. All four approaches write to Iceberg, but they use different catalogs and their interoperability across vendors remains an open problem. When agents from different systems need to query the same Iceberg tables, catalog federation becomes a real operational challenge.”

source

Original article: infoworld.com
