Get RxNorm data models up to speed with dbt best practices #280
base: main
dbt/sagerx/dbt_project.yml
Outdated
```diff
@@ -35,7 +35,7 @@ models:
       +materialized: view
     intermediate:
       +schema: sagerx
-      +materialized: view
+      +materialized: table
```
Most impactful change here.
I think the reason we might not want to make this change on its own is that the materialized tables have no refresh schedule other than when we run the DAGs that rebuild everything. I'm leaving a comment on the original issue discussion with another possible solution.
Also, this change would materialize ALL the models in this layer as tables, which might become somewhat taxing on the DAGs that build this layer (definitely RxNorm, potentially others?).
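A minimal sketch of one way to limit the cost described above, assuming dbt's standard folder-path configuration in `dbt_project.yml`: keep `view` as the default for the intermediate layer and override only the expensive models to `table`. The project key `sagerx` and the model name `int_rxnorm_heavy_aggregation` are hypothetical placeholders, not names confirmed by this PR.

```yaml
models:
  sagerx:                              # hypothetical project key; must match `name:` in dbt_project.yml
    intermediate:
      +schema: sagerx
      +materialized: view              # layer default: no storage, reads upstream data at query time
      int_rxnorm_heavy_aggregation:    # hypothetical model name
        +materialized: table           # only this expensive model is persisted as a table
```

The same override could instead live in the model's own SQL file as `{{ config(materialized='table') }}`; a config set inside the model file takes precedence over `dbt_project.yml`.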
Sorry, I'm not following your concerns.
- The intermediate tables are updated by the final "transform" task of the RxNorm DAG, so they are refreshed every time the DAG runs, which I believe is the correct workflow. Happy to discuss.
- I don't think we should see this as taxing on the DAG. Yes, the DAG would be doing more work, but we are moving the computation upstream so that queries run faster and cheaper.
- The only issue I have is that all of the intermediate tables will clutter the database, but that might be warranted as we work through the data and business logic.
Man, you did tons of work here. Great job. Fundamental question themes:
Another major thing I will look at is taking a 50,000-foot view of the entire RxNorm data model. It's hard to see when zoomed in on a single intermediate model, but I think I rewrote, inside some intermediate models, a lot of SQL that already exists in other intermediate models. So there's an opportunity to ref those intermediate models instead of rewriting that SQL.
ca79a75 to 06065c5
…ient_component" This reverts commit ea04f48.
I'd like to start moving toward styling our SQL according to the dbt style guide: https://docs.getdbt.com/best-practices/how-we-style/2-how-we-style-our-sql
Resolves #270
Explanation
Changes the RxNorm staging and intermediate queries to use dbt's Jinja table references.
Most importantly, it sets intermediate models to materialize as tables instead of views.
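For staging models, dbt's Jinja table references to raw data are `{{ source() }}` calls, which require the raw tables to be declared in a properties file. A minimal sketch, assuming the raw RxNorm load lives in a schema called `datasource`; the source name, schema, and table names below are hypothetical, not taken from this PR.

```yaml
version: 2

sources:
  - name: rxnorm                 # hypothetical source name used in source() calls
    schema: datasource           # hypothetical schema holding the raw RxNorm tables
    tables:
      - name: rxnorm_rxnconso    # hypothetical raw table names
      - name: rxnorm_rxnrel
```

A staging model could then select from `{{ source('rxnorm', 'rxnorm_rxnconso') }}`, and intermediate models select from staging or other intermediate models with `{{ ref('stg_rxnorm__concepts') }}` (also a hypothetical name). dbt uses both `source()` and `ref()` to build the dependency graph and resolve each relation's schema, which is what lets the "transform" task build models in the right order.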
Rationale
dbt mart models are built from complex logic that should be captured in the intermediate models. The issue we already ran into (issue #270) was that performing these aggregations on every query took 10+ minutes. Moving this aggregation into a separate intermediate model that is materialized as a table is therefore a good solution: building these intermediate tables can take a while, but it will significantly speed up downstream queries.
Additional work can be done to optimize the queries that build out intermediate models in the future.
Tests
testing logs