Lakekeeper Catalog for Apache Iceberg

Please visit https://docs.lakekeeper.io for Documentation!

This is Lakekeeper: a secure, fast and easy-to-use implementation of the Apache Iceberg REST Catalog specification based on apache/iceberg-rust. If you have questions or feature requests, or just want to chat, you can find us on Discord!

Quickstart

A Docker container image is available on quay.io. We have prepared a minimal docker-compose file that demonstrates how to use the Lakekeeper catalog with common query engines.

git clone https://github.com/lakekeeper/lakekeeper.git
cd lakekeeper/examples/minimal
docker compose up

Then open your browser and head to localhost:8888 to load the example Jupyter notebooks or head to localhost:8181 for the Lakekeeper UI.
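To check from a terminal that the catalog is answering, the small helpers below can be used as a rough sketch. They assume that the Iceberg REST API is served under the `/catalog` prefix on port 8181 and that a warehouse named `demo` exists; both are assumptions for illustration, so adjust them to your setup. `GET /v1/config` with a `warehouse` query parameter is the configuration endpoint defined by the Iceberg REST Catalog specification. The function names are hypothetical and not part of Lakekeeper.

```shell
#!/bin/sh
# Base URL of the Lakekeeper container started by the quickstart.
LAKEKEEPER_URL="${LAKEKEEPER_URL:-http://localhost:8181}"

# Build the URL of the Iceberg REST "config" endpoint for one warehouse.
# ASSUMPTION: the REST catalog is mounted under /catalog.
catalog_config_url() {
    printf '%s/catalog/v1/config?warehouse=%s\n' "$LAKEKEEPER_URL" "$1"
}

# Fetch the catalog configuration; returns non-zero on HTTP errors.
fetch_catalog_config() {
    curl --fail --silent --show-error "$(catalog_config_url "$1")"
}

# Usage (requires the containers from `docker compose up` to be running;
# "demo" is an assumed warehouse name):
#   fetch_catalog_config demo
```

If the call fails, the containers may still be starting or the warehouse name may differ from the one assumed here.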

For more information on deployment, please check the Getting Started Guide.

Scope and Features

The Iceberg Catalog REST interface has become the standard for catalogs in open Lakehouses. It natively enables multi-table commits, server-side deconflicting and much more. It is figuratively the (TIP) of the Iceberg.

Written in Rust: Single all-in-one binary - no JVM or Python env required.
Storage Access Management: Lakekeeper secures access to your data using vended credentials and remote signing for S3. All major hyperscalers (AWS, Azure, GCP) as well as on-premises deployments with S3 are supported.
OpenID Provider Integration: Use your own identity provider for authentication; just set LAKEKEEPER__OPENID_PROVIDER_URI and you are good to go.
Native Kubernetes Integration: Use our Helm chart to easily deploy highly available setups and natively authenticate Kubernetes service accounts with Lakekeeper. Kubernetes and OpenID authentication can be used simultaneously. A Kubernetes Operator is currently in development.
Change Events: Built-in support for emitting change events (CloudEvents), which enables you to react to any change that happens to your tables.
Change Approval: Changes can also be prohibited by external systems. This can be used to prohibit changes to tables that would invalidate Data Contracts, Quality SLOs etc. Simply integrate with your own change approval via our ContractVerification trait.
Multi-Tenant capable: A single deployment of Lakekeeper can serve multiple projects - all with a single entrypoint. Each project itself supports multiple Warehouses to which compute engines can connect.
Customizable: Lakekeeper is meant to be extended. We expose the Database implementation (Catalog), SecretsStore, Authorizer, Events (CloudEventBackend) and ContractVerification as interfaces (Traits). This allows you to tap into any access management system of your company or stream change events to any system you like - simply by implementing a handful of methods.
Well-Tested: Integration-tested with Spark, PyIceberg, Trino and StarRocks.
Highly Available & Horizontally Scalable: There is no local state, so the catalog can easily be scaled horizontally.
Fine Grained Access (FGA): Lakekeeper's default Authorization system leverages OpenFGA. If your company already has a different system in place, you can integrate with it by implementing a handful of methods in the Authorizer trait.
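As a sketch of the OpenID Provider Integration item above: the variable LAKEKEEPER__OPENID_PROVIDER_URI is exported into the environment of the Lakekeeper process. The issuer URL below is a placeholder, and the helper names are hypothetical. The check relies only on the OpenID Connect Discovery convention that a provider publishes its metadata at `/.well-known/openid-configuration` below the issuer URL; how Lakekeeper itself consumes the variable is not described here.

```shell
#!/bin/sh
# ASSUMPTION: placeholder issuer URL; replace it with your identity provider.
LAKEKEEPER__OPENID_PROVIDER_URI="${LAKEKEEPER__OPENID_PROVIDER_URI:-https://idp.example.com/realms/lakehouse}"
export LAKEKEEPER__OPENID_PROVIDER_URI

# Build the OpenID Connect discovery URL for an issuer.
# A single trailing slash on the issuer is stripped first.
oidc_discovery_url() {
    printf '%s/.well-known/openid-configuration\n' "${1%/}"
}

# Verify that the provider is reachable before starting Lakekeeper.
check_oidc_provider() {
    curl --fail --silent --show-error \
        "$(oidc_discovery_url "$LAKEKEEPER__OPENID_PROVIDER_URI")" >/dev/null
}

# Usage (needs network access to the provider):
#   check_oidc_provider && echo "provider reachable"
```

Checking the discovery document first helps separate a wrong issuer URL from a catalog misconfiguration.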

If you are missing something, we would love to hear about it in a GitHub issue.

Status

Supported Operations - Iceberg-Rest

Operation	Status	Description
Namespace		All operations implemented
Table		All operations implemented - additional integration tests in development
Views		Remove unused files and log entries
Metrics		Endpoint is available but doesn't store the metrics

Storage Profile Support

Storage	Status	Comment
S3 - AWS		vended-credentials & remote-signing, assume role missing
S3 - Custom		vended-credentials & remote-signing, tested against Minio
Azure ADLS Gen2
Azure Blob
Microsoft OneLake
Google Cloud Storage

Details on how to configure the storage profiles can be found in the Docs.

Supported Catalog Backends

Backend	Status	Comment
Postgres
MongoDB

Supported Secret Stores

Backend	Status	Comment
Postgres
kv2 (hcp-vault)		userpass auth

Supported Event Stores

Backend	Status	Comment
Nats
Kafka		Already available in a branch; we are currently struggling with cross-compilation.

Supported Operations - Management API

Operation	Status	Description
Warehouse Management		Create / Update / Delete a Warehouse
AuthZ		Manage access to warehouses, namespaces and tables
More to come!

Auth(N/Z) Handlers

Operation	Status	Description
OIDC (AuthN)		Secure access to the catalog via OIDC
Custom (AuthZ)		If you are willing to implement a single Rust trait, the `AuthZHandler` can be implemented to connect to your system
OpenFGA (AuthZ)		Internal Authorization management

License

Licensed under the Apache License, Version 2.0

