Improve and Automate raw data archiving/access #1418
Comments
We had mentioned "Try adding a new dataset and see if our automation picks it up and archives it" as a possible final definition of done. What do you think, @zschira? Or is that just part of catalyst-cooperative/pudl-archiver#2?
I feel like there are 2 ways we could approach this.
Can this be closed?
We should probably carve out the unfinished work in another issue or issues.
I've carved those out, minus the datastore thing, which is a large, persistent thing we've been thinking about. See catalyst-cooperative/pudl-archiver#346. Closing!
Description
This epic tracks updates to the data archiving and access processes. Previously, creating a new archive meant first running the scraper to download new data locally, then using the archiver to upload that data to Zenodo and create a new archive version. This manual process makes updating archives difficult and depends on someone noticing upstream updates, which often leads to stale data. Combining the archiver and scrapers will simplify this process and make automation much easier.
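As a rough illustration of why combining the two steps helps automation, here is a minimal sketch of the change-detection step a combined scraper/archiver could run before deciding whether to publish a new archive version. Everything in it is hypothetical: `ArchiveDiff`, `checksum`, and `diff_against_archive` are names invented for this example, SHA-256 is an assumed checksum choice, and nothing here reflects the actual pudl-archiver code or the Zenodo API.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ArchiveDiff:
    """Hypothetical summary of how a fresh download differs from the last archive."""

    added: list[str] = field(default_factory=list)
    changed: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)

    @property
    def needs_new_version(self) -> bool:
        # Any difference at all means the existing archive is stale.
        return bool(self.added or self.changed or self.removed)


def checksum(content: bytes) -> str:
    """Return a hex digest for a file's bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def diff_against_archive(
    previous: dict[str, str], downloaded: dict[str, bytes]
) -> ArchiveDiff:
    """Compare freshly downloaded files against the previous archive's manifest.

    `previous` maps filename -> checksum recorded for the last archive version.
    `downloaded` maps filename -> raw bytes from the upstream source.
    """
    diff = ArchiveDiff()
    for name, content in sorted(downloaded.items()):
        if name not in previous:
            diff.added.append(name)
        elif previous[name] != checksum(content):
            diff.changed.append(name)
    diff.removed = sorted(set(previous) - set(downloaded))
    return diff


# Example: one file changed upstream and one new file appeared.
previous = {"a.csv": checksum(b"old"), "b.csv": checksum(b"same")}
downloaded = {"a.csv": b"new", "b.csv": b"same", "c.csv": b"extra"}
diff = diff_against_archive(previous, downloaded)
if diff.needs_new_version:
    print("new archive version needed:", diff)
```

A scheduled job could run a check like this for each dataset and only upload when `needs_new_version` is true, which removes the need for someone to notice upstream updates by hand.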
Once new data archives are created, there is still no easy way to access these raw archives outside of PUDL. This is because the Datastore that PUDL uses for accessing these data archives is embedded within PUDL. Making the Datastore a standalone software package would allow accessing these archives from client projects, and by users.
Scope
- How do we know when we are done? This epic is done when dataset archives are updated automatically.
- What is out of scope? Integrating specific datasets.
Tasks
Archiver
PUDL Integration
Create standalone Datastore