Dev/incremental media model #2

agnessnowplow · 2022-03-29T08:44:18Z

This is the Redshift only incremental model currently working together with the web package.

The _interactions_this_run table follows the web model`s incremental logic and takes snowplow_web_base_events_this_run as an input to then add the various contexts to enrich the base table with the additional media related fields.

The _plays_by_pageview_this_run table aggregates the _interactions_this_run'table and serves as a basis for the incrementalised derived table _plays_by_pageview.

The main _media_stats derived table will also be updated incrementally based on the _plays_by_pageview derived table, on a play by pageview basis after a set time window. This is to prevent complex and expensive queries due to metrics which need to take the whole page_view events into calculation. This way the metrics will only be calculated once per pageview/media, after no new events are expected.

The additional _pivot_base table is there to calculate the percent_progress boundaries and weights that are used to calculate the total play time and other related media fields.

Once we are happy about this way of incrementalisation, I will update the schema names (only the media_stats and plays_by_pageview should be in the derived schema, the rest could go to scratch, to follow the conventions)

For testing, I ran dbt test which passed and I also compared the results with the drop and recompute model by taking 18-19-20 Jan data and running the incremental model with 1 day batches. I have adjusted some of the metrics calculations during this exercise so the two models diverged in parts at the moment.

emielver

Just a few suggestions around simplifying and keeping things more consistent but it looks good! I also have a lingering question about the incremental functionality of the media_stats table but for the rest it's looking great!

packages.yml

models/web/snowplow_media_plays_by_pageview_this_run.sql

models/web/snowplow_media_pivot_base.sql

models/web/snowplow_media_media_stats.sql

models/web/snowplow_media_interactions_this_run.sql

models/custom/snowplow_media_user_stats.sql

dbt_project.yml

models/web/snowplow_media_interactions_this_run.sql

models/web/snowplow_media_media_stats.sql

models/web/snowplow_media_plays_by_pageview_this_run.sql

agnessnowplow · 2022-04-22T07:45:59Z

Hi Emiel! I have left comments above for all the recommended changes and questions you or Will had and made modifications accordingly. I also added documentation within snowplow_media_player_overview.md and README.md. As I was going through the guide I also realised I need some slight changes, for instance I added a selectors.yml file. I always use the prod db for testing purposes because that has correct data with enough for sampling. I checked 18-20 Jan period when I have data in consecutive days for testing incremental runs with daily limits but once we are happy with the calculations I will do a full testing with all the data just in case. When testing I use it as intended, in a new dbt project importing it as a package and running it together with the web model. Just noticed I left scratch_agnes as schema name, will make another commit shortly to correct this.

emielver · 2022-05-03T09:38:51Z

Hey Agnes, thanks for this and great job on everything so far!! I have added some small QOL suggestions with regards to the documentation, as well as directory conventions that we seem to be following in the other packages. I also pointed out what I think is an issue with an average calculation in one of the custom models, and I'm still a bit confused by the snowplow_media_player_media_stats model's construction. Maybe we could have a chat about that before I finish the review but everything else should be actionable! And wrt schema stuff, you can also change the value in your dbt profiles.yml of the schema variable which will prefix scratch and derived with whatever you put, e.g. dbt_agnes so then you won't need to change the dbt_project.yml when testing locally!

docs/markdown/snowplow_media_player_overview.md

models/web/scratch/snowplow_media_player_base_this_run.sql

models/custom/snowplow_media_player_session_stats.sql

models/custom/snowplow_media_player_user_stats.sql

models/web/snowplow_media_player_media_stats.sql

emielver · 2022-05-04T10:54:48Z

@agnessnowplow
There's also some issues with building the pages. Can you add a empty .nojekyll file to the docs folder, and can you do a docs generate and copy over the ['catalog.json', 'index.html', 'manifest.json', 'run_results.json'] files from your target folder to your docs folder after the docs have been generated, in the same vein as https://github.com/snowplow/dbt-snowplow-mobile/tree/main/docs
Also you should be admin now so you can check/build the docs yourself as well!

models/web/sources.yml

models/web/schema.yml

agnessnowplow · 2022-05-10T10:22:53Z

I have run the model within dev1 so that I can replace the docs for the site (previously it was all within one schema). I have also added more fields for interactions_this_run that are not used in the model but users might want to take advantage of them in their custom models. I also made a correction within base_this_run because the retention_rate was not taking the correct value in case the percent_progress was calculated from the ended event.

models/web/scratch/snowplow_media_player_base_this_run.sql

emielver

On two dbt run commands and a dbt test I get two failed unique tests for event_id in snowplow_media_player_interactions_this_run and snowplow_web_base_events_this_run. Is this something you encountered? If this is resolved I'm happy to give the go ahead!

emielver

Errors in previous comment were due to duplicate source data, model seems to work well!! Great job on this!

agnessnowplow · 2022-05-12T14:55:20Z

Thanks for your review! I fixed the bugs I came across: as discussed separately, I changed the ephemeral models to tables, including the docs so that the manifest table gets updated and the incremental runs don't get stuck. I also changed the full outer join back to account for the scenario when there is no data for the incremental run, which should preserve any pre-existing percent progress values from previous runs.

emielver

Just add the sort and dist keys and then it should be okay!

models/custom/snowplow_media_player_session_stats.sql

models/custom/snowplow_media_player_user_stats.sql

emielver · 2022-05-12T20:58:07Z

dbt_project.yml

+    custom:
+      +schema: "scratch"
+      +tags: "snowplow_web_incremental"
+      +enabled: false


This enabled should be triggered by a variable, which should be defined in your vars and in the documentation so that users can easily enable/disable these custom tables, right? Or how else would we expect users to build off of them?

This is what I always do for testing, you just overwrite it in your own dbt project's project.yml file and explained in the docs so they should be ok I think (?):

By default these are disabled, but you can enable them in the project's profiles.yml, if needed. yml # dbt_project.yml ... models: snowplow_media_player: custom: enabled: true

Fair enough, although I do think we should change it to being behind a variable once this dbt bug is fixed dbt-labs/dbt-core#3698 to align with the web and mobile packages and their optional modules, but it's good for now!

emielver

LGTM

emielver

LGTM!

agnessnowplow requested review from bill-warner and emielver March 29, 2022 08:44

emielver reviewed Apr 1, 2022

View reviewed changes

bill-warner reviewed Apr 7, 2022

View reviewed changes

models/web/snowplow_media_interactions_this_run.sql Outdated Show resolved Hide resolved

bill-warner reviewed Apr 7, 2022

View reviewed changes

models/web/snowplow_media_media_stats.sql Outdated Show resolved Hide resolved

bill-warner reviewed Apr 7, 2022

View reviewed changes

models/web/snowplow_media_plays_by_pageview_this_run.sql Outdated Show resolved Hide resolved

emielver reviewed May 4, 2022

View reviewed changes

models/web/sources.yml Outdated Show resolved Hide resolved

emielver reviewed May 4, 2022

View reviewed changes

models/web/schema.yml Outdated Show resolved Hide resolved

github-pages bot temporarily deployed to github-pages May 9, 2022 18:40 Inactive

github-pages bot temporarily deployed to github-pages May 10, 2022 10:17 Inactive

emielver reviewed May 12, 2022

View reviewed changes

models/web/scratch/snowplow_media_player_base_this_run.sql Show resolved Hide resolved

emielver reviewed May 12, 2022

View reviewed changes

emielver approved these changes May 12, 2022

View reviewed changes

github-pages bot temporarily deployed to github-pages May 12, 2022 14:49 Inactive

github-pages bot temporarily deployed to github-pages May 12, 2022 15:52 Inactive

emielver suggested changes May 12, 2022

View reviewed changes

models/custom/snowplow_media_player_session_stats.sql Outdated Show resolved Hide resolved

models/custom/snowplow_media_player_user_stats.sql Outdated Show resolved Hide resolved

github-pages bot temporarily deployed to github-pages May 12, 2022 17:04 Inactive

emielver suggested changes May 12, 2022

View reviewed changes

emielver approved these changes May 13, 2022

View reviewed changes

github-pages bot temporarily deployed to github-pages May 13, 2022 09:21 Inactive

github-pages bot temporarily deployed to github-pages May 13, 2022 09:28 Inactive

emielver approved these changes May 13, 2022

View reviewed changes

Add Media Player model with Redshift and Postgres support (Close #3)

21d785b

agnessnowplow force-pushed the dev/incremental_media_model branch from 78c0622 to 21d785b Compare May 13, 2022 10:09

github-pages bot temporarily deployed to github-pages May 13, 2022 10:09 Inactive

agnessnowplow merged commit 2c21ebf into feature/incremental_media_model May 13, 2022

emielver mentioned this pull request May 13, 2022

add drop and recompute media model #1

Closed

emielver deleted the dev/incremental_media_model branch August 26, 2022 14:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dev/incremental media model #2

Dev/incremental media model #2

agnessnowplow commented Mar 29, 2022

emielver left a comment

agnessnowplow commented Apr 22, 2022

emielver commented May 3, 2022

emielver commented May 4, 2022

agnessnowplow commented May 10, 2022

emielver left a comment

emielver left a comment •

edited

Loading

agnessnowplow commented May 12, 2022

emielver left a comment

emielver May 12, 2022

agnessnowplow May 13, 2022

emielver May 13, 2022

emielver left a comment

emielver left a comment

Dev/incremental media model #2

Dev/incremental media model #2

Conversation

agnessnowplow commented Mar 29, 2022

emielver left a comment

Choose a reason for hiding this comment

agnessnowplow commented Apr 22, 2022

emielver commented May 3, 2022

emielver commented May 4, 2022

agnessnowplow commented May 10, 2022

emielver left a comment

Choose a reason for hiding this comment

emielver left a comment • edited Loading

Choose a reason for hiding this comment

agnessnowplow commented May 12, 2022

emielver left a comment

Choose a reason for hiding this comment

emielver May 12, 2022

Choose a reason for hiding this comment

agnessnowplow May 13, 2022

Choose a reason for hiding this comment

emielver May 13, 2022

Choose a reason for hiding this comment

emielver left a comment

Choose a reason for hiding this comment

emielver left a comment

Choose a reason for hiding this comment

emielver left a comment •

edited

Loading