After merging this PR, we will have CEMS files in the format of one file per quarter, rather than one file per state/year. Year and state should both still function as filters for data extraction from the parquet files, however. To process this new data format in PUDL and integrate more recently downloaded data, we will need to do the following:
- Run the CEMS production archive.
- Add the new Zenodo archive DOI values to `pudl/workspace/datastore.py`.
- Run the datastore script to download the new year of data.
- Decide how `etl_full` and `etl_fast` should specify `year` and `quarter`: should we specify only years and include all available quarters by default?
- Add the new years/quarters to `etl_full.yml` and `etl_fast.yml`.
- Add the new years/quarters to the `working_partitions` in `pudl/metadata/sources.py`.
- Update the extractor to ingest `year` and `quarter` partitions rather than `year` and `state`.
- Update `pudl.transform.epacems`.
- Update the CEMS DOI that is written into the unit tests.
- Launch Dagit and refresh the code location.
- Reduce Dagster concurrency on the `epacems` yearly partitions to prevent memory issues.
- Materialize the `epacems` asset with the new data and remap any plants missing from the data.
- Remove the `state` partitions from the Dagster launchpad (either by updating `pudl.output.epacems` or by removing them from the launchpad).
- Update the validation tests if needed.
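One possible answer to the `etl_full`/`etl_fast` question is to let the settings list only years and expand them to every available quarter, while still allowing explicit quarters when needed. The sketch below illustrates that resolution step under stated assumptions: `WORKING_YEAR_QUARTERS`, the `"<year>q<quarter>"` string format, and `expand_year_quarters` are hypothetical names, not PUDL's actual settings API.

```python
from __future__ import annotations

from collections.abc import Iterable

# Hypothetical stand-in for the EPA CEMS working partitions in
# pudl/metadata/sources.py, written as "<year>q<quarter>" strings.
WORKING_YEAR_QUARTERS = ["2022q1", "2022q2", "2022q3", "2022q4", "2023q1", "2023q2"]


def expand_year_quarters(
    years: Iterable[int] | None = None,
    year_quarters: Iterable[str] | None = None,
    available: Iterable[str] = WORKING_YEAR_QUARTERS,
) -> list[str]:
    """Resolve ETL settings into an explicit, sorted list of year-quarters.

    Explicit year_quarters take precedence; otherwise every available quarter
    of each requested year is included, and no filter at all means everything.
    Requests outside the available partitions raise ValueError.
    """
    # Four-digit years and single-digit quarters make these strings sort chronologically.
    available = sorted(set(available))
    if year_quarters is not None:
        requested = sorted(set(year_quarters))
        unknown = [yq for yq in requested if yq not in available]
        if unknown:
            raise ValueError(f"Not in working partitions: {unknown}")
        return requested
    if years is None:
        return available
    wanted = set(years)
    unknown_years = sorted(wanted - {int(yq[:4]) for yq in available})
    if unknown_years:
        raise ValueError(f"No working partitions for years: {unknown_years}")
    # Default: include all available quarters of each requested year.
    return [yq for yq in available if int(yq[:4]) in wanted]
```

With this design, `etl_fast.yml` could list only the most recent year (e.g. `expand_year_quarters(years=[2023])` resolves to `["2023q1", "2023q2"]` here), and validating against the working partitions catches a new quarter that was added to the settings but not yet to `sources.py`.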
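With one file per quarter, a `year` filter now selects up to four files, while `state` becomes a row-level filter applied inside each file. The plain-Python sketch below illustrates that split; the `epacems-<year>q<quarter>.parquet` naming pattern, the in-memory `tables` dict standing in for the parquet files, and the function names are all hypothetical, not the archive's actual layout or PUDL's extractor API.

```python
from __future__ import annotations

import re
from collections.abc import Iterable, Mapping

# Hypothetical naming pattern for the quarterly files, e.g. "epacems-2022q3.parquet".
QUARTER_FILE = re.compile(r"epacems-(?P<year>\d{4})q(?P<quarter>[1-4])\.parquet$")


def files_for_years(filenames: Iterable[str], years: Iterable[int]) -> list[str]:
    """Keep only the quarterly files whose year is among the requested years."""
    wanted = set(years)
    selected = []
    for name in filenames:
        match = QUARTER_FILE.search(name)
        if match and int(match["year"]) in wanted:
            selected.append(name)
    return sorted(selected)


def extract(
    tables: Mapping[str, list[dict]], years: Iterable[int], states: Iterable[str]
) -> list[dict]:
    """Select quarterly files by year, then filter their rows by state.

    `tables` maps filename -> rows and stands in for reading each parquet file.
    """
    wanted_states = {s.upper() for s in states}
    rows = []
    for name in files_for_years(tables, years):
        rows.extend(r for r in tables[name] if r["state"].upper() in wanted_states)
    return rows
```

With real parquet files, the state predicate could instead be pushed down to the reader, for example through the `filters` argument of `pyarrow.parquet.read_table`, so rows for other states are never loaded into memory.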