Data and artificial intelligence (AI) company Databricks has agreed to acquire data management company Tabular, which was founded by Ryan Blue, Daniel Weeks and Jason Reid.
This move aims to bring together the original creators of the open-source lakehouse formats Apache Iceberg and Linux Foundation Delta Lake, Databricks said in a Tuesday (June 4) press release.
By doing so, Databricks aims to lead the way in data compatibility, ensuring that organizations are no longer limited by the format their data is in, according to the release.
Subject to customary closing conditions, the proposed acquisition is expected to close in Databricks’ second fiscal quarter, the release said.
Lakehouse architecture enables the integration of traditional data warehousing workloads with AI workloads on a single, governed copy of data, per the release. To achieve this, data must be stored in an open format, allowing different workloads, applications and engines to access the same data. The lakehouse architecture maximizes enterprise productivity by democratizing data access, in contrast to proprietary data warehouses that often create vendor lock-in.
The foundation of the lakehouse architecture lies in open-source data formats that enable ACID transactions on data stored in object storage, according to the release. These formats, specifically designed for open-source engines such as Apache Spark, Trino and Presto, enhance the reliability and performance of data operations.
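For readers curious how a table format can offer transactional commits on top of plain object storage, one common approach is an ordered commit log combined with optimistic concurrency. The sketch below is a toy illustration only, not the actual implementation of Delta Lake or Iceberg. The `ObjectStore` and `ToyTable` classes, the `_log/` prefix and the file names are all hypothetical, and the example assumes the store offers an atomic "create only if absent" write.

```python
import json


class ObjectStore:
    """In-memory stand-in for an object store (hypothetical, for illustration)."""

    def __init__(self):
        self._objects = {}

    def put_if_absent(self, key, body):
        # Assumed primitive: an atomic write that only succeeds for a new key.
        if key in self._objects:
            return False
        self._objects[key] = body
        return True

    def get(self, key):
        return self._objects[key]

    def keys(self, prefix):
        return sorted(k for k in self._objects if k.startswith(prefix))


class ToyTable:
    """A table whose state is the ordered list of committed log entries."""

    LOG_PREFIX = "_log/"

    def __init__(self, store):
        self.store = store

    def latest_version(self):
        # Highest committed version, or -1 for an empty table.
        return len(self.store.keys(self.LOG_PREFIX)) - 1

    def commit(self, expected_version, added_files):
        # Claim log entry expected_version + 1; fails if another writer won.
        key = f"{self.LOG_PREFIX}{expected_version + 1:020d}.json"
        return self.store.put_if_absent(key, json.dumps({"add": added_files}))

    def snapshot(self):
        # Replay the log in order to list the files that make up the table.
        files = []
        for key in self.store.keys(self.LOG_PREFIX):
            files.extend(json.loads(self.store.get(key))["add"])
        return files


store = ObjectStore()
table = ToyTable(store)

# Two writers read the same starting version, then race to commit.
seen_by_a = table.latest_version()
seen_by_b = table.latest_version()
a_ok = table.commit(seen_by_a, ["part-a.parquet"])
b_ok = table.commit(seen_by_b, ["part-b.parquet"])  # loses the race

# The losing writer re-reads the log and retries on top of the new version.
b_retry_ok = table.commit(table.latest_version(), ["part-b.parquet"])
```

In this sketch, readers only ever see fully committed log entries, and a writer that loses a race must retry against the newer version, which is the basic idea behind atomic, isolated commits on storage that has no built-in transactions.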
To address the challenges posed by format incompatibility, Databricks collaborated with the Linux Foundation to create the Delta Lake project, the release said. Delta Lake has more than 500 code contributors from various organizations, and more than 10,000 companies worldwide use it to process data each day.
Around the same time, the Iceberg project was developed by Blue and Weeks at Netflix and subsequently donated to the Apache Software Foundation, per the release.
Delta Lake and Iceberg have emerged as leading open-source standards for lakehouse formats, according to the release. However, due to independent development, they became incompatible over time. This led to fragmented and siloed enterprise data as different engines and tools adopted only one of the standards, or only parts of them.
To realize the full benefits of the lakehouse architecture, companies need data interoperability. Databricks plans to work closely with the Delta Lake and Iceberg communities to bring interoperability to the formats over time, the release said.
Last year, Databricks introduced Delta Lake UniForm, which provides interoperability across Delta Lake, Iceberg and Hudi, per the release. UniForm tables support the Iceberg REST catalog interface, allowing companies to use their preferred analytics engines and tools across all their data.
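As a rough illustration of what connecting an engine to an Iceberg REST catalog can look like, the sketch below assembles the Spark catalog properties documented by the Apache Iceberg project. The helper name `iceberg_rest_catalog_conf`, the catalog name `lake` and the endpoint URL are placeholders made up for this example. The release does not describe a specific endpoint or setup, so the details would differ in any real deployment.

```python
def iceberg_rest_catalog_conf(catalog_name, uri, token=None):
    """Build Spark properties for an Iceberg catalog served over REST.

    The property names follow the Apache Iceberg Spark configuration;
    the catalog name, URI and token values are caller-supplied placeholders.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    conf = {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
    }
    if token is not None:
        # Optional bearer token for catalogs that require authentication.
        conf[f"{prefix}.token"] = token
    return conf


# Placeholder endpoint, not a real service.
conf = iceberg_rest_catalog_conf("lake", "https://catalog.example.com/iceberg")
for key, value in conf.items():
    print(f"--conf {key}={value}")
```

The printed flags could be passed to a Spark launcher that has the Iceberg runtime on its classpath, after which tables in the catalog would be addressed with the `lake.` prefix in SQL.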
With the addition of the original Iceberg team, Databricks aims to further enhance Delta Lake UniForm and make it the best solution for unifying data for every workload, the release said.
This announcement comes about eight months after Databricks announced it was acquiring enterprise data company Arcion for $100 million.
It also comes about a year after Databricks bought generative AI startup MosaicML in a $1.3 billion deal.