
Apache Hudi

An open source data lake platform with database functionality

Data & Analytics · data-lakehouse, open-source, data-lake, incremental-processing, apache, streaming, etl

About

Apache Hudi is an open-source data lakehouse platform built on a high-performance open table format that brings database functionality to data lakes. It replaces slow batch data processing with an incremental processing framework designed for low-latency, minute-level analytics. Hudi integrates with a wide ecosystem including Kafka, Spark, Flink, S3, BigQuery, and many other data tools.

Problem

Traditional batch processing on data lakes is slow and lacks the database-like functionality needed for near-real-time analytics.

For

data engineers and organizations managing large-scale data lakes

How it works

Hudi provides an open table format and an incremental processing framework that layer database capabilities over cloud and on-prem storage, enabling change data capture (CDC) ingestion, streaming, and interactive queries.
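
The idea of incremental processing can be illustrated with a small conceptual sketch in plain Python. This is not Hudi's API: `ToyTable`, `upsert`, `incremental_read`, and the `_commit` field are hypothetical names invented for this example, and the in-memory dictionary merely stands in for a table on storage. It shows only the general pattern of writing records by key, stamping each write with a commit number, and later reading back just the rows that changed after a given commit.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for a keyed table (not Hudi's API)."""

    rows: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    last_commit: int = 0

    def upsert(self, records: List[Dict[str, Any]], key: str = "id") -> int:
        """Insert new keys, overwrite existing ones, and stamp rows with the commit number."""
        self.last_commit += 1
        for rec in records:
            self.rows[rec[key]] = {**rec, "_commit": self.last_commit}
        return self.last_commit

    def incremental_read(self, since_commit: int) -> List[Dict[str, Any]]:
        """Return only the rows written after `since_commit`."""
        return [r for r in self.rows.values() if r["_commit"] > since_commit]


table = ToyTable()
c1 = table.upsert([{"id": "a", "fare": 10}, {"id": "b", "fare": 20}])
c2 = table.upsert([{"id": "b", "fare": 25}, {"id": "c", "fare": 30}])

# A downstream job that last ran at commit c1 picks up only what changed since then.
changed = table.incremental_read(since_commit=c1)
print(sorted(r["id"] for r in changed))
```

A downstream consumer that remembers the last commit it processed reads only the updated and newly inserted rows rather than rescanning the whole table, which is the contrast with full batch reprocessing that the description draws.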

Business model

open-source

Status

launched

Company

Apache Software Foundation

Similar projects