This project builds a modern data lakehouse locally. Kubernetes is used to deploy every component of the lakehouse.
- Apple M1 Pro
- Sonoma 14.5
- Python 3.10.9
Create a cluster for the whole project (the first command deletes any existing cluster with the same profile name):
minikube delete -p datasaku-cluster
minikube start -p datasaku-cluster --disk-size 60000mb --driver docker --memory=max --cpus=max
minikube addons -p datasaku-cluster enable ingress
minikube addons -p datasaku-cluster enable ingress-dns
minikube addons -p datasaku-cluster enable storage-provisioner
minikube tunnel -p datasaku-cluster
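The commands above assume that `minikube` and Docker (used as the driver) are already installed. A minimal sketch of a pre-flight check follows; the helper name `require_cmd` is hypothetical and not part of this project.

```shell
#!/bin/sh
# Hypothetical helper (not part of the project): check that each named
# command is on PATH. Prints the missing ones to stderr and returns
# non-zero if any are absent.
require_cmd() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, putting `require_cmd minikube docker || exit 1` at the top of a setup script would stop early when either tool is absent, before any cluster is deleted or created.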
Building the lakehouse requires several components:
- minio
- spark
- nessie + psql
- trino
- jupyterhub + datasaku
Export the local _data directory over NFS to the minikube node (macOS):
echo "$(realpath .)/_data -alldirs -mapall=$(id -u):$(id -g) $(minikube ip -p datasaku-cluster)" | sudo tee -a /etc/exports && sudo nfsd update
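Because `tee -a` appends on every run, repeating the export command adds a duplicate entry to `/etc/exports` each time. One possible way to avoid that is to build the entry first and append it only when it is missing. The sketch below assumes the same entry format as the command above; the helper name `exports_line` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: format one macOS /etc/exports entry.
# Arguments: directory, uid, gid, client address.
exports_line() {
  printf '%s -alldirs -mapall=%s:%s %s\n' "$1" "$2" "$3" "$4"
}
```

For example, `line=$(exports_line "$(realpath .)/_data" "$(id -u)" "$(id -g)" "$(minikube ip -p datasaku-cluster)")` followed by `grep -qxF -- "$line" /etc/exports || echo "$line" | sudo tee -a /etc/exports` would add the entry only if an identical line is not already present. `sudo nfsd update` is still needed afterwards to reload the exports.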