Make quicker quarterly updates to PUDL EPA CEMS, EIA 923 and EIA 860 data #2902
Closed
18 tasks done
Labels
eia860
Anything having to do with EIA Form 860
eia923
Anything having to do with EIA Form 923
enhancement
Improvements in existing functionality.
epacems
Integration and analysis of the EPA CEMS dataset.
epic
Any issue whose primary purpose is to organize other issues into a group.
github-actions
Pull requests that update GitHub Actions code
new-data
Requests for integration of new data.
rmi
Description
With support from RMI, this project aims to make it possible to integrate quarterly updates of EPA CEMS, EIA 923M and EIA 860M data within 1-2 weeks of each new data release. To do this, we will 1) automate archiving of new data, supported by additional data validation checks in our archiver, 2) redesign PUDL to handle quarterly and monthly data formats, and 3) integrate YTD data to test it all.
Archiver infrastructure updates
Goal: Generate a robust report of archiver results to enable quick (under 15 minutes per dataset) manual review and approval of draft production archives run by a GitHub Action.
Current state: Archive runs check for 1) missing files, 2) valid file types, and 3) empty zip files (in progress), and produce a summary of all changed files. The new default behavior of the ‘auto-publish’ flag produces a draft production archive on Zenodo for manual approval, removing the need for sandbox runs of an archiver that is known to work. @zschira has substantially improved the archiver’s robustness to large file uploads, but this remains a trouble spot and we should budget time to handle issues that come up.
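For illustration only, here is a minimal sketch of the three checks described above (missing files, valid file types, empty zips) rolled into one report. It is not the archiver's actual code: `check_archive`, `needs_review`, the report keys, and the default list of allowed suffixes are all hypothetical names chosen for this example.

```python
import zipfile
from pathlib import Path


def check_archive(directory, expected_names, allowed_suffixes=(".zip", ".csv", ".xlsx")):
    """Return a dict listing problems found in a downloaded archive directory.

    Hypothetical helper: the report keys and default suffixes are assumptions.
    """
    directory = Path(directory)
    present = {p.name: p for p in directory.iterdir() if p.is_file()}
    report = {
        # 1) files we expected to download but did not find
        "missing_files": sorted(set(expected_names) - set(present)),
        # 2) files whose extension is not on the allowed list
        "invalid_file_types": sorted(
            name
            for name, path in present.items()
            if path.suffix.lower() not in allowed_suffixes
        ),
        # 3) zip files that open correctly but contain no members
        "empty_zips": [],
    }
    for name, path in sorted(present.items()):
        if path.suffix.lower() == ".zip" and zipfile.is_zipfile(path):
            with zipfile.ZipFile(path) as zf:
                if not zf.namelist():
                    report["empty_zips"].append(name)
    return report


def needs_review(report):
    """True if any check found a problem, so a human should look closer."""
    return any(report.values())
```

A report like this could be attached to the draft archive so the reviewer only has to read the non-empty lists before approving.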
Tasks
Non-coding tasks required (likely RMI tasks):
Handle monthly data in PUDL
Goal: Design a mechanism to handle monthly data in a system that is designed for annual data. Make structural changes required for each dataset to make this possible, designating new data as YTD data and excluding it from annually aggregated tables.
Current state:
EIA 860M data is ‘annual’ in nature and is already appended to the EIA 860 data. No changes required here.
EIA 923M data has the same format as EIA 923 data, with slightly fewer ‘pages’ (Schedules 2-5 only, which means no emission control table). The column names and layout are the same, with YTD data and blank rows for the months not yet covered.
CEMS is currently downloaded by year-state and will need to be downloaded by quarter instead (one file per quarter, ~2-3 GB per file).
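As a rough sketch of the two structural changes described above, the snippet below enumerates year-quarter partitions for CEMS and separates YTD records from complete-year records so they can be kept out of annual aggregates. Everything here is an assumption made for illustration: the function names, the `(year, quarter)` partition shape, and the record fields `year`, `month` and `value` are hypothetical and are not PUDL's actual API or schema. It also treats a quarter as available once it has ended, whereas the real release date lags the end of the quarter.

```python
from datetime import date


def quarters_available(start_year, as_of):
    """List (year, quarter) partitions whose quarter has ended by `as_of`.

    Hypothetical helper: real availability depends on the publisher's release lag.
    """
    partitions = []
    for year in range(start_year, as_of.year + 1):
        for quarter in (1, 2, 3, 4):
            end_month = quarter * 3
            # First day after the quarter ends.
            if end_month == 12:
                next_start = date(year + 1, 1, 1)
            else:
                next_start = date(year, end_month + 1, 1)
            if next_start <= as_of:
                partitions.append((year, quarter))
    return partitions


def split_ytd(records, last_complete_year):
    """Drop blank months, then split records into (annual, ytd) lists.

    Records are dicts with assumed keys "year", "month" and "value"; a value
    of None stands in for the blank rows of months not yet reported.
    """
    annual, ytd = [], []
    for rec in records:
        if rec["value"] is None:
            continue  # month not yet covered by the YTD release
        if rec["year"] <= last_complete_year:
            annual.append(rec)
        else:
            ytd.append(rec)  # keep out of annually aggregated tables
    return annual, ytd
```

Tagging records this way, rather than dropping the partial year entirely, would let monthly tables include the newest data while annual aggregates only use complete years.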
Tasks
Doc updates
Goal: Make it easier for external contributors to make progress on the annual updates.
Current state: The docs for annual updates are relatively up to date, but a few steps need further elaboration.
Tasks
Integrate YTD data
Goal: Test the new infrastructure on YTD data (anticipated Q3 2023).
Tasks
If time, but not essential to project success
Future projects that could complement this work