Thank you for your interest in contributing to this project!
We created this package so that we can easily use dbt artifacts in Python. Building on it, we can implement useful tools for dbt users, such as dbterd. We hope this package enhances DataOps practices with dbt, and your contribution will definitely be part of that.
This package is designed to work with dbt artifacts in Python. dbt artifacts are JSON files generated by dbt (data build tool) that contain metadata about your dbt runs, including information about models, tests, and snapshots. These artifacts are essential for understanding the state of your data transformations and can be used for debugging, monitoring, and reporting.
- https://docs.getdbt.com/reference/artifacts/dbt-artifacts
- https://github.com/dbt-labs/dbt-core/tree/main/schemas/dbt
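As an illustration of the metadata these artifacts carry, the sketch below reads the `metadata.dbt_schema_version` URL that dbt writes into artifacts such as manifest.json and splits it into an artifact name and a schema version. The helper name `schema_info` and the trimmed-down sample artifact are hypothetical; a real manifest.json contains many more fields.

```python
import json
import re
from typing import Tuple

# Matches URLs such as "https://schemas.getdbt.com/dbt/manifest/v12.json".
_SCHEMA_URL = re.compile(r"/dbt/(?P<name>[a-z_-]+)/v(?P<version>\d+)\.json$")


def schema_info(artifact: dict) -> Tuple[str, int]:
    """Return (artifact name, schema version) from an artifact's metadata.

    Hypothetical helper: it relies only on the dbt_schema_version URL
    that dbt records under the artifact's "metadata" key.
    """
    url = artifact["metadata"]["dbt_schema_version"]
    match = _SCHEMA_URL.search(url)
    if match is None:
        raise ValueError(f"unrecognized dbt_schema_version: {url}")
    return match.group("name"), int(match.group("version"))


# A heavily trimmed, hypothetical manifest.json payload.
sample = json.loads(
    '{"metadata": {"dbt_schema_version": '
    '"https://schemas.getdbt.com/dbt/manifest/v12.json"}}'
)
print(schema_info(sample))  # ('manifest', 12)
```

Information like this is what a parser needs in order to pick the Pydantic model that matches a given artifact file.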
We can generate Pydantic models from the JSON schemas of dbt artifacts using datamodel-code-generator.
- https://docs.pydantic.dev/latest/integrations/datamodel_code_generator/
- https://github.com/koxudaxi/datamodel-code-generator/
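For reference, a single JSON schema can be turned into a Pydantic module with the `datamodel-codegen` CLI that datamodel-code-generator installs. The sketch below only builds the argument list (and wraps it in a runner that is not called); the file names and the choice of `pydantic_v2.BaseModel` as the output type are assumptions for illustration, not necessarily the options this repository uses.

```python
import subprocess
from pathlib import Path
from typing import List


def codegen_command(schema: Path, output: Path) -> List[str]:
    """Build a datamodel-codegen invocation for one JSON schema file."""
    return [
        "datamodel-codegen",
        "--input", str(schema),
        "--input-file-type", "jsonschema",
        "--output", str(output),
        # Assumed target: Pydantic v2 models.
        "--output-model-type", "pydantic_v2.BaseModel",
    ]


def run_codegen(schema: Path, output: Path) -> None:
    """Run the generator; requires datamodel-code-generator to be installed."""
    subprocess.run(codegen_command(schema, output), check=True)


cmd = codegen_command(Path("manifest_v12.json"), Path("manifest_v12.py"))
print(" ".join(cmd))
```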
Because the Pydantic models in this package are generated from the JSON schemas of dbt artifacts, we face some technical challenges. To mitigate them, we follow these policies:
- We do not manually modify the generated Pydantic models.
- We utilize dbt artifacts from stable versions of dbt.
- We support only those Pydantic models that can be generated from publicly available JSON schemas of dbt artifacts.
First, we don't manually modify the generated Pydantic models, because manual changes to generated code are hard to maintain. For instance, if we upgrade to a new major version of Pydantic, we have to regenerate all the Pydantic models, and reapplying the same manual changes to the newly generated models would be difficult. If something in the Pydantic models needs to change, it is better to work with the dbt community to change the JSON schemas of the dbt artifacts.
Second, we use dbt artifacts from stable versions of dbt. dbt Core publishes alpha and beta versions during its release cycle. Since there is no guarantee that the JSON schemas of dbt artifacts are backward compatible across those pre-release versions, we use only dbt artifacts from stable versions of dbt.
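To make the stable-only policy concrete: dbt Core pre-releases carry suffixes such as `b1` or `rc1` (for example `1.8.0b1`). The hypothetical helper below treats a version as stable only when it is a plain `MAJOR.MINOR.PATCH` string. This regex is a deliberate simplification used in place of a full version parser such as the `packaging` library.

```python
import re

# Plain MAJOR.MINOR.PATCH, e.g. "1.8.0"; anything else (such as "1.8.0b1"
# or "1.8.0rc1") is treated as a pre-release and rejected.
_STABLE = re.compile(r"\d+\.\d+\.\d+")


def is_stable_dbt_version(version: str) -> bool:
    """Return True if `version` looks like a stable dbt Core release."""
    return _STABLE.fullmatch(version.strip()) is not None


for v in ["1.8.0", "1.8.0b1", "1.8.0rc1", "1.7.14"]:
    print(v, is_stable_dbt_version(v))
```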
Third, we support only those Pydantic models that can be generated from publicly available JSON schemas of dbt artifacts.
At the time of writing, the JSON schema of semantic_manifest.json isn't publicly available, as this type of artifact is generated only in dbt Cloud. Since we can't get its JSON schema, we can't generate Pydantic models for semantic_manifest.json.
We use a Makefile to set up the development environment. The following command installs the dependencies and sets up the pre-commit hooks:
make setup
Generating the Pydantic models in this package takes two steps:
- Add or update the JSON schemas of dbt artifacts in the repository
- Generate Pydantic models from the JSON schemas
We take the JSON schemas of the dbt artifacts we want to add or update from the dbt-core repository, and we keep the downloaded JSON schemas in the dbt_artifacts_parser/resources/ directory.
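A minimal sketch of fetching one schema, assuming the dbt-core repository keeps each artifact's schema at `schemas/dbt/<artifact>/v<N>.json` on its `main` branch (as the directory linked earlier suggests) and assuming a `<artifact>/<artifact>_v<N>.json` layout under dbt_artifacts_parser/resources/. Both layouts and all helper names here are assumptions for illustration; check the actual repositories before relying on them.

```python
import urllib.request
from pathlib import Path

# Assumed raw-content location of dbt-core's published JSON schemas.
RAW_BASE = "https://raw.githubusercontent.com/dbt-labs/dbt-core/main/schemas/dbt"
# Directory in this repository where downloaded schemas are kept.
RESOURCES = Path("dbt_artifacts_parser/resources")


def schema_url(artifact: str, version: int) -> str:
    """URL of one schema in dbt-core (assumed layout)."""
    return f"{RAW_BASE}/{artifact}/v{version}.json"


def local_schema_path(artifact: str, version: int) -> Path:
    """Destination under resources/ (hypothetical naming convention)."""
    return RESOURCES / artifact / f"{artifact}_v{version}.json"


def download_schema(artifact: str, version: int) -> Path:
    """Download one schema into resources/ and return its local path."""
    dest = local_schema_path(artifact, version)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(schema_url(artifact, version)) as resp:
        dest.write_bytes(resp.read())
    return dest
```

Calling `download_schema("manifest", 12)` would then place the manifest v12 schema next to the other resources, ready for the generation step.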
dev/generate_parser_classes.sh is the script that generates Pydantic models from the JSON schemas of dbt artifacts. To add a new dbt artifact, we need to modify the script so that it also generates the corresponding Pydantic models.
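The shell script itself is the source of truth. Purely to illustrate why a new artifact needs a change there (every schema must be mapped to an output module), here is a Python sketch that walks a resources directory and pairs each `*.json` schema with a hypothetical output path under a `parsers/` package. The directory names and the hyphen-to-underscore rule are assumptions, not the script's actual behavior.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def generation_plan(resources: Path, parsers: Path) -> List[Tuple[Path, Path]]:
    """Pair each JSON schema under `resources` with an output .py module.

    Hypothetical mapping: resources/<artifact>/<name>.json
    -> parsers/<artifact>/<name>.py, with hyphens replaced by underscores
    so that the resulting module names are importable.
    """
    plan = []
    for schema in sorted(resources.glob("*/*.json")):
        artifact_dir = schema.parent.name.replace("-", "_")
        module = schema.stem.replace("-", "_") + ".py"
        plan.append((schema, parsers / artifact_dir / module))
    return plan


# Build a throwaway resources directory containing one schema file.
with tempfile.TemporaryDirectory() as tmp:
    res = Path(tmp, "resources")
    (res / "run-results").mkdir(parents=True)
    (res / "run-results" / "run-results_v6.json").write_text("{}")
    plan = generation_plan(res, Path("parsers"))

for schema, output in plan:
    print(schema.name, "->", output)
```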