kurangdoa/lakehouse_iceberg

Iceberg Lakehouse

This project builds a modern data lakehouse that runs locally. Every component of the lakehouse is deployed on Kubernetes.

Pre-requisites

Environment

  • Apple M1 Pro
  • Sonoma 14.5
  • Python 3.10.9

Other Tools

Create Minikube Cluster

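Create one cluster for the whole project. The first command deletes any existing datasaku-cluster profile, so the cluster starts from a clean state.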

minikube delete -p datasaku-cluster 
minikube start -p datasaku-cluster --disk-size 60000mb --driver docker --memory=max --cpus=max

Install add-ons

minikube addons -p datasaku-cluster enable ingress 
minikube addons -p datasaku-cluster enable ingress-dns
minikube addons -p datasaku-cluster enable storage-provisioner
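The three commands above can also be wrapped in a small loop. This is a minimal sketch, not part of the original setup: the function name `enable_addons` and the `MINIKUBE` override variable are hypothetical and exist only so the loop can be dry-run without touching a cluster.

```shell
# Profile and add-ons taken from the commands above.
PROFILE="datasaku-cluster"
ADDONS="ingress ingress-dns storage-provisioner"

# Enable each add-on in turn and stop at the first failure.
# MINIKUBE is a hypothetical override (assumption, not from the README);
# it defaults to the real minikube binary.
enable_addons() {
  for addon in $ADDONS; do
    "${MINIKUBE:-minikube}" addons -p "$PROFILE" enable "$addon" || return 1
  done
}
```

Running `MINIKUBE=echo enable_addons` prints the commands instead of executing them. After a real run, `minikube addons list -p datasaku-cluster` shows which add-ons are enabled.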

Create Tunnel

minikube tunnel -p datasaku-cluster

Lakehouse Components

Building the lakehouse requires several components:

  • minio
  • spark
  • nessie + psql
  • trino
  • jupyterhub + datasaku

Optional

Create NFS Share

echo "$(realpath .)/_data -alldirs -mapall=$(id -u):$(id -g) $(minikube ip -p datasaku-cluster)" | sudo tee -a /etc/exports && sudo nfsd update
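Because `tee -a` appends on every run, repeating the command above adds a duplicate entry to /etc/exports. The sketch below shows one possible way to guard against that. The helper names `nfs_export_line` and `line_missing` are hypothetical and not part of the original project.

```shell
# Build the /etc/exports entry from its parts, in the same layout as the
# command above: <directory> -alldirs -mapall=<uid>:<gid> <client ip>
nfs_export_line() {
  printf '%s -alldirs -mapall=%s:%s %s\n' "$1" "$2" "$3" "$4"
}

# Succeed only if the exact line ($2) is not yet present in the file ($1).
# -F: fixed string, -x: whole-line match, -q: no output.
line_missing() {
  ! grep -qxF -e "$2" "$1" 2>/dev/null
}

# Usage on macOS, mirroring the command above (left commented out because
# it needs sudo and a running cluster):
# line="$(nfs_export_line "$(realpath .)/_data" "$(id -u)" "$(id -g)" "$(minikube ip -p datasaku-cluster)")"
# if line_missing /etc/exports "$line"; then
#   echo "$line" | sudo tee -a /etc/exports && sudo nfsd update
# fi
```

The check compares whole lines, so an entry for the same directory with a different cluster IP is still appended. Stale entries would need to be removed by hand.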
