A minimalistic and free modern data stack, hence available to all.
- Warehousing: Google BigQuery
- Orchestrating: GitHub Actions
- Ingesting: Meltano
- Transforming: dbt
- Cataloging: dbt docs on GitHub Pages
Other:
- Code-editing: VS Code with:
ELT layer of mds4all
### BigQuery
# Install gcloud from https://cloud.google.com/sdk/docs/install
# Log in and select the project
$ gcloud auth login
$ PROJECT_ID=$(gcloud config get-value project)
# Create a service account and grant it BigQuery permissions
$ gcloud iam service-accounts create bigquery-sa --display-name="BigQuery SA"
$ gcloud projects add-iam-policy-binding $PROJECT_ID --member="serviceAccount:bigquery-sa@${PROJECT_ID}.iam.gserviceaccount.com" --role="roles/bigquery.user"
$ gcloud projects add-iam-policy-binding $PROJECT_ID --member="serviceAccount:bigquery-sa@${PROJECT_ID}.iam.gserviceaccount.com" --role="roles/bigquery.dataEditor"
# Download the service account key to the local machine (keep it out of version control)
$ gcloud iam service-accounts keys create bigquery-sa.json --iam-account=bigquery-sa@${PROJECT_ID}.iam.gserviceaccount.com
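The commands above repeat the service account email and the binding call for each role. A minimal sketch of how that could be factored into helpers is shown here; `SA_NAME`, `sa_email`, and `grant_roles` are illustrative names, not part of gcloud, and the script assumes an already authenticated `gcloud`.

```shell
# Illustrative helpers only; the names below are assumptions, not gcloud features.
SA_NAME="bigquery-sa"

# Build the service account email from its name and the project ID.
sa_email() {
  printf '%s@%s.iam.gserviceaccount.com' "$1" "$2"
}

# Grant every role passed after the project ID to the service account.
grant_roles() {
  local project_id="$1" member role
  shift
  member="serviceAccount:$(sa_email "$SA_NAME" "$project_id")"
  for role in "$@"; do
    gcloud projects add-iam-policy-binding "$project_id" \
      --member="$member" --role="$role"
  done
}

# Usage (requires an authenticated gcloud):
# grant_roles "$(gcloud config get-value project)" roles/bigquery.user roles/bigquery.dataEditor
```

Keeping the role list in one call makes it easier to see which permissions the account actually holds.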
### Meltano
# Test Meltano locally
$ pip install meltano
$ meltano init meltano
$ cd meltano
$ meltano add extractor tap-spreadsheets-anywhere
$ meltano add loader target-bigquery
# Add the tap and target configuration to meltano.yml
$ meltano elt tap-spreadsheets-anywhere target-bigquery
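Before running the pipeline, it can help to fail early if the service account key is missing. The sketch below exports `GOOGLE_APPLICATION_CREDENTIALS`, the standard variable read by Google client libraries. Whether the loader picks it up depends on the target-bigquery variant in use, so treat that as an assumption. `use_bigquery_key` is a hypothetical helper name.

```shell
# Hypothetical helper: export the key path for Google client libraries,
# and stop early if the key file does not exist.
use_bigquery_key() {
  local key_file="$1"
  if [ ! -f "$key_file" ]; then
    echo "key file not found: $key_file" >&2
    return 1
  fi
  # Store an absolute path so the variable stays valid after changing directories.
  GOOGLE_APPLICATION_CREDENTIALS="$(cd "$(dirname "$key_file")" && pwd)/$(basename "$key_file")"
  export GOOGLE_APPLICATION_CREDENTIALS
}

# Usage, from inside the meltano project directory:
# use_bigquery_key ../bigquery-sa.json && meltano elt tap-spreadsheets-anywhere target-bigquery
```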
### dbt
# Test dbt locally
$ pip install dbt-bigquery
$ dbt init
# Set up the project files (for example profiles.yml and the models)
$ dbt debug
$ dbt run
$ dbt test
$ dbt docs generate
$ dbt docs serve
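`dbt docs serve` only serves the catalog locally. For the GitHub Pages catalog, the static files written by `dbt docs generate` (`index.html`, `manifest.json`, `catalog.json`) need to land in a folder that Pages serves. A minimal sketch, assuming dbt's default `target/` output directory and a `docs/` folder as the Pages source; `publish_dbt_docs` and both default paths are illustrative.

```shell
# Copy the static dbt docs site into a folder that GitHub Pages can serve.
# Defaults are assumptions: dbt's target/ directory and a docs/ publish folder.
publish_dbt_docs() {
  local src="${1:-target}" dest="${2:-docs}" f
  mkdir -p "$dest"
  for f in index.html manifest.json catalog.json; do
    if [ ! -f "$src/$f" ]; then
      echo "missing $src/$f; run 'dbt docs generate' first" >&2
      return 1
    fi
    cp "$src/$f" "$dest/$f"
  done
}

# Usage, from the dbt project directory after 'dbt docs generate':
# publish_dbt_docs target ../docs
```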
- Public repository
- Public documentation
- Warehousing with 10 GB of storage and 1 TB of queries per month (BigQuery free tier)