Get RxNorm data models up to speed with dbt best practices #280

lprzychodzien · 2024-04-28T20:39:06Z

Resolves #270

Explanation

Changes the rxnorm staging and intermediate queries to use dbt's Jinja table references (`source` and `ref`).

Most importantly, it sets intermediate models to materialize as tables instead of views.
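As a rough illustration of what switching to Jinja references looks like, a staging model might end up along these lines. The model file name, the source name `rxnorm`, the table name `rxnorm_rxnconso`, and the selected columns are assumptions for this sketch, not taken from the diff.

```sql
-- Hypothetical staging model: stg_rxnorm__ingredients.sql
-- Before: a hard-coded relation name in the from clause.
-- After: dbt resolves the relation and records the dependency in its DAG.

with rxnconso as (

    -- assumed source and table names; both must be declared in a sources .yml file
    select * from {{ source('rxnorm', 'rxnorm_rxnconso') }}

)

select
    rxcui,
    str as name,
    tty
from rxnconso
where tty = 'IN'        -- RxNorm term type for ingredients
    and sab = 'RXNORM'  -- keep only atoms contributed by RxNorm itself
```

Downstream models would then select from `{{ ref('stg_rxnorm__ingredients') }}` rather than going back to the source table.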

Rationale

dbt mart models are made up of complex logic that should be captured in the intermediate models. The issue we already ran into (#270) was that performing these aggregations for each query took 10+ minutes. Moving the aggregation into a separate intermediate model that is materialized as a table is therefore a good solution. Materializing these intermediate tables can take a while, but it will significantly speed up queries.

In the future, additional work can be done to optimize the queries that build the intermediate models.
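A minimal sketch of the idea, assuming a Postgres target and a hypothetical staging model `stg_rxnorm__product_ingredients` with the columns shown: the expensive aggregation lives in one intermediate model, which the folder-level `+materialized: table` setting turns into a physical table.

```sql
-- Hypothetical intermediate model: int_rxnorm__ingredients_per_product.sql
-- With the intermediate folder materialized as tables, this aggregation
-- runs once per dbt run instead of once per downstream query.

with product_ingredients as (

    -- assumed staging model name and columns
    select * from {{ ref('stg_rxnorm__product_ingredients') }}

),

aggregated as (

    select
        product_rxcui,
        count(distinct ingredient_rxcui) as ingredient_count,
        string_agg(ingredient_name, ' / ' order by ingredient_name) as ingredient_names
    from product_ingredients
    group by product_rxcui

)

select * from aggregated
```

A mart model can then read the precomputed rows with `select * from {{ ref('int_rxnorm__ingredients_per_product') }}` and pays only the cost of a table scan.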

Tests

What testing did you do? dbt run --full-refresh
Attach testing logs inside a summary block:

testing logs

lprzychodzien · 2024-04-28T20:40:18Z

dbt/sagerx/dbt_project.yml

@@ -35,7 +35,7 @@ models:
      +materialized: view
    intermediate:
      +schema: sagerx
-      +materialized: view
+      +materialized: table


most impactful change here

I think the reason we might not want to make this change by itself is that the materialized table has no refresh schedule other than when we run the DAGs that rebuild everything. I'm leaving a comment on the original issue discussion about another possible solution.

Also, this particular change would materialize ALL the models in this layer as tables, which might become slightly taxing on the DAGs that build this layer (definitely RxNorm, potentially others?).
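One way to limit the scope, sketched here as an option rather than what this PR does: leave the folder-level default as `view` and opt in individual slow models with dbt's in-model `config` block. The model and column names are hypothetical.

```sql
-- Hypothetical slow intermediate model that opts in to table materialization
-- while the folder default in dbt_project.yml stays as view.
{{ config(materialized='table') }}

with atoms as (

    -- assumed staging model name
    select * from {{ ref('stg_rxnorm__atoms') }}

)

select
    rxcui,
    count(*) as atom_count
from atoms
group by rxcui
```

An in-model `config` takes precedence over the folder-level setting, so only the models that need it are built as tables.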

Sorry, I'm not following your concerns.

The intermediate tables are updated by the final "transform" task of the rxnorm DAG, so the tables should be up to date after running the DAG, which I believe is the correct workflow. Happy to discuss.

I don't think we should see this as being taxing on the DAG. Yes, the DAG would be doing more work, but we are moving the computation upstream so that the queries run faster and cheaper.

The only issue I have with this is that all of the intermediate tables will be cluttering the database, but again, that might be warranted as we work through the data and business logic.

jrlegrand · 2024-05-01T16:08:19Z

Man you did tons of work here. Great job. Fundamental question themes:

Should we title ctes specifically to what they represent even if we are just pulling in rxnconso multiple times? Or should we have one rxnconso cte per model and use aliases and leave everything else the same? Or should we name them what they represent and pull the where clause into the cte? Or should we make like other "base" staging tables with them? Or should we just look for opportunities to reference these staging models from other int/stg models instead of going back to source all the time?
I'm def referencing source tables from a lot of intermediate models - shame on me. I need to fix this.
Minor formatting changes still needed to satisfy my OCD (lowercase SQL commands and tab indent in some places).
Some refs are still used directly in a select's from clause instead of being pulled into a cte first.
Maybe we should build a dbt macro for the active / prescribable columns - prob in a separate issue.
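A macro for the active / prescribable columns could look roughly like the sketch below. The macro name is made up, and the flag logic is an assumption based on RxNorm conventions (`suppress = 'N'` for non-suppressed atoms, `cvf = '4096'` for the Current Prescribable Content subset); it would need to be checked against how the models define these columns today.

```sql
{# hypothetical macro file: macros/active_and_prescribable.sql #}
{% macro active_and_prescribable(relation_alias) %}
    case when {{ relation_alias }}.suppress = 'N' then true else false end as active,
    case when {{ relation_alias }}.cvf = '4096' then true else false end as prescribable
{% endmacro %}

{# usage inside a model; stg_rxnorm__products is an assumed model name #}
select
    product.rxcui,
    product.str as name,
    {{ active_and_prescribable('product') }}
from {{ ref('stg_rxnorm__products') }} as product
```

The macro emits no trailing comma, so in this form it has to be the last item in the select list.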

Another major thing I will look at is taking a 50,000 foot view of the entire RxNorm data model - it's hard to see when zoomed in on a given intermediate model, but I think I re-wrote a lot of the code in other intermediate models inside of intermediate models. So I think there's an opportunity to ref intermediate models instead of re-writing that SQL.
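To make that concrete, here is a small sketch of one intermediate model building on another through import ctes instead of repeating the upstream joins. All model and column names are hypothetical.

```sql
-- Hypothetical model: int_rxnorm__products_with_ingredient_counts.sql

with products as (

    -- assumed staging model
    select * from {{ ref('stg_rxnorm__products') }}

),

ingredient_counts as (

    -- reuse an existing intermediate model rather than rewriting its sql
    select * from {{ ref('int_rxnorm__ingredients_per_product') }}

)

select
    products.rxcui,
    products.name,
    ingredient_counts.ingredient_count
from products
left join ingredient_counts
    on products.rxcui = ingredient_counts.product_rxcui
```

Each cte is named for what it represents, and every `ref` sits in its own cte at the top of the file, which keeps the model's dependencies easy to see.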

jrlegrand · 2024-07-19T02:28:23Z

I'd like to start moving toward styling our SQL like the dbt styleguide.

https://docs.getdbt.com/best-practices/how-we-style/2-how-we-style-our-sql

lprzychodzien added 2 commits May 17, 2024 12:00

rxnorm update and intermediate tables

8b90cd6

query fixes

06065c5

lprzychodzien force-pushed the rxnorm branch 2 times, most recently from ca79a75 to 06065c5 Compare May 17, 2024 16:12

lprzychodzien and others added 6 commits May 17, 2024 12:18

only rxnorm

f38a4c3

format updates and added common cte stg_rxnorm__common_ingredient_com…

ea04f48

…ponent

Revert "format updates and added common cte stg_rxnorm__common_ingred…

dc50716

…ient_component" This reverts commit ea04f48.

added common model to ref ingredient_components models

e89f447

formatting updates

ebf52de

formatted staging rxnorm

58a5ccf

jrlegrand self-requested a review July 16, 2024 19:28

jrlegrand self-assigned this Jul 16, 2024

jrlegrand added 3 commits July 18, 2024 21:32

Back to lowercase

e4c390a

Macro and two models

c606083

Start working on cpc

2cf80ca

jrlegrand changed the title ~~Rxnorm dbt update and intermediate tables~~ Get RxNorm data models up to speed with dbt Jul 20, 2024

jrlegrand changed the title ~~Get RxNorm data models up to speed with dbt~~ Get RxNorm data models up to speed with dbt best practices Jul 20, 2024

Rxnrel work

b9dc191
