Bare bones DBT project
Structure your data analysis projects DBT style. Useful when dbt
is not available in your organisation but you prefer structure in your projects. Nothing fancy here. Just a fallible bash
script that works if used correctly 😎.
Unfortunately this only works with sqlcmd
(SQLSERVER) but can easily be re-written for other RDBMSs (You'll have to change one line of code 🤠).
If you're not familiar with the data-buid-tool then check it out here. If your organisation allows the use of dbt
then don't waste your time reading further ✌️.
Create a .bashrc
file in your home directory
vi ~/.bashrc
and paste the snippet below.
export PATH=/this/directory/contains/mini-dbt:$PATH
alias mini-dbt=mini-dbt.sh
You can also just use the shell script directly. No need to add to $PATH
.
# chmod if you need to
./mini-dbt.sh --help
mkdir my-awesome-mini-dbt-project && cd my-awesome-mini-dbt-project
mini-dbt init
Start your Analyses 🚀
- Help
mini-dbt --help
- Delete all objects
mini-dbt --cleanup
- Start a
mini-dbt
project
mini-dbt init
- Build the entire project (Run all models
.sql
files). This excludes models in thetests
andcleanup
directories.
mini-dbt --all
- Build models in the staging folder
mini-dbt --build-models-in-folder staging
- Build entire project excluding cleanup and analyses folders
mini-dbt --exclude-folder cleanup/ analyses/
- Build a single model
mini-dbt --build-model stg_res_mod__rabbits.sql
- Build models in a directory excluding selected models
mini-dbt --exclude-models-in-folder staging stg_res_mod__rabbits.sql
This is where you create sql
scripts that generate the base tables (views) of your project.
Filename: stg_schema__rabbits.sql
-- Idempotency is a rabbit's best friend. Be like a rabbit
drop view if exists schema.stg_schema__rabbits;
create view schema.stg_schema__rabbits
as
select *
from schema.rabbits
This allows you to analyse the base tables before you start joining them onto others. Might be useful to add an Id
column to track duplicates after joins, etc. Normally, there'll be no joins here but the choice is yours.
Want to do some pivoting, unpivoting, grouping, etc? Then this is a great folder to place those sql
scripts in.
Filename: pivot_rabbits_by_age_range.sql
drop view if exists pivot_rabbits_by_age_range;
create view pivot_rabbits_by_age_range
as
select
rabbit_name,
sum(case when age between 0 and 10 then 1 end) as YoungRabbits,
sum(case when age between 11 and 20 then 1 end) as OldRabbits
from schema.stg_schema__rabbits
group by
rabbit_name
You might want to join a base view with the results of this view in the serving
folder.
Typically, you'll join all base views to create one big view and export the view to your semantic layer of choice in this folder. If your semantic layer supports normalized relations, then you can export the views generated from the staging/intermediate folders.
Filename: serving_rabbits.sql
-- Yes these rabbits have babies and we're categorising them by age!
with rabbits as (
select *
from schema.stg_schema__rabbits
),
number_of_rabbits_in_diff_age_ranges as (
select
rabbit_name,
YoungRabbits,
OldRabbits
from pivot_rabbits_by_age_range
),
merged as (
select *
from rabbits
inner join number_of_rabbits_in_diff_age_ranges
on rabbits.rabbit_name = number_of_rabbits_in_diff_age_ranges.rabbit_name
)
select * from merged;
The analyses folder should contain .sql
scripts that generate datasets you wish to analyse. For example, say you want to generate a dataset containing rabbits that live in the same shelter, this is where you place such a query. This is useful because you might want to show the results to your manager, teammate, etc. It can be a great extension of the serving
folder.
Filename: rabbits_with_same_shelters.sql
with rabbits as (
select *
from serving_rabbits
),
rabbits_sharing_shelters as (
select
rabbit_name
from rabbits
group by shelters
having count(*) > 1
),
merged as (
select *
from rabbits
where rabbit_name in (select rabbit_name from rabbits_sharing_shelters)
)
select * from merged;
Have photos/any static files? Place them here please.
With DBT the test folder contains scripts used to create singular tests however for mini-dbt
, this is where you run some select
statements on the outputs generated from the analyses
folder. For example, find all rabbits that are white, 6ft tall and called Harry
. Think of it as an extension of the analyses
folder. No relations are created here.
Show the results to your teammates
select *
from rabbits_with_same_shelters
where rabbit_name = 'Harry';
go
select *
from rabbits_with_same_shelters
where rabbit_height = 6ft
order by rabbit_name
offset 0 rows fetch next 10 rows only;
Remember you generated this project because of an adhoc request? Well, now that the manager is satisfied, there's no need to leave all the objects you generated in the database. Create a cleanup.sql
script that contains drop
statements to remove all objects generated from this project.
Filename: cleanup.sql
drop view schema.stg_schema__rabbits;
drop view schema.rabbits_with_same_shelters;
and run
mini-dbt -build-model cleanup.sql
to run the script. Models are supposed to contain DML statements but this is the only exception.