Add parquet download link to data dictionary #3984
docs/templates/resource.rst.jinja
Outdated
{%- endif %}
* `Download this table as a Parquet file. <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/stable/{{ resource.name }}.parquet>`__
I can't figure out how to format the URL so that users can download the version of the file associated with the version of the docs the user is viewing. Also, any idea what ref `latest` is pointing to? `stable` or `nightly`?
I wrote up an issue about this readthedocs build failure.
This will be super nice to have! I'm not sure about the refs, or how to point to the corresponding version. I wonder if, for the time being, it would be best to point to the nightly builds and just make it clear in the comment that the link points to the latest version?
@bendnorman Just reran after #3989 merged to see if that's fixed the readthedocs problem. Did you not want to add documentation on Parquet files to the
@bendnorman @zschira For what it's worth, my instinct is to link to the nightly Parquet file because Datasette shows the nightly data (unless I'm totally misreading our ETL script), and it'd be confusing to have links going to two different underlying datasets right next to one another.
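For reference, here is a minimal Python sketch of the two URL shapes being discussed. The `stable` Parquet path comes from the template line above; the `nightly` Parquet path is an assumption inferred from the nightly SQLite links in `docs/data_access.rst`. The helper `parquet_url` and the table name `example_table` are hypothetical and not part of PUDL.

```python
from urllib.parse import quote

# Bucket URL as it appears in the links quoted in this PR.
BASE_URL = "https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop"

# Only the two prefixes mentioned in this thread. How the docs ref
# "latest" maps onto them is the open question, so it is rejected here.
KNOWN_PREFIXES = ("nightly", "stable")


def parquet_url(table_name: str, release: str = "nightly") -> str:
    """Build a direct-download URL for one table (hypothetical helper)."""
    if release not in KNOWN_PREFIXES:
        raise ValueError(
            f"unknown release {release!r}; expected one of {KNOWN_PREFIXES}"
        )
    # ASSUMPTION: nightly Parquet files use the same layout as the stable
    # link in the template, i.e. <prefix>/<table name>.parquet.
    return f"{BASE_URL}/{release}/{quote(table_name, safe='')}.parquet"


# "example_table" is a placeholder, not a real PUDL table name.
print(parquet_url("example_table"))
print(parquet_url("example_table", release="stable"))
```

Defaulting to `nightly` mirrors the suggestion above to keep the download link consistent with the data Datasette shows.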
docs/data_access.rst
Outdated

Fully Processed SQLite Databases
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* `Main PUDL Database <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/nightly/pudl.sqlite.zip>`__
* `US Census DP1 Database (2010) <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/nightly/censusdp1tract.sqlite.zip>`__

Hourly Tables as Parquet
I removed this section because now all tables are available as Parquet.
This is true, but these tables aren't in SQLite and so I think shouting them out here is still helpful - if people are looking for them they won't be able to find them in the full DB.
Thanks for the input y'all! I changed the template to point at the
One small request, but otherwise looks great, thank you!
docs/data_access.rst
Outdated
@@ -106,32 +108,19 @@ resulting outputs pass all of the data validation tests we've defined, the outpu
automatically uploaded to the `AWS Open Data Registry
<https://registry.opendata.aws/catalyst-cooperative-pudl/>`__, and used to deploy a new
version of Datasette (see above). These nightly build outputs can be accessed using the
AWS CLI, or programmatically via the S3 API. They can also be downloaded directly over
HTTPS using the following links:
AWS CLI, or programmatically via the S3 API. If you don't want to mess with the API
AWS CLI, or programmatically via the S3 API. If you don't want to mess with the API
AWS CLI, or programmatically via the S3 API.
If you don't want to mess with the API
…ons back to data access page
Made the changes!
Thanks for this, all looks good to me!
I added a Parquet file download link to each table's entry in the data dictionary so it's easier for people to access the files on S3.
Tasks
Testing
I built the docs locally and was able to download a Parquet file.
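A rough way to script the same manual check, assuming only the Python standard library: fetch the file behind a link and confirm it starts and ends with the 4-byte Parquet magic number `PAR1`. The helpers `download` and `looks_like_parquet` are hypothetical names for illustration, and the magic-number test is only a lightweight sanity check, not full validation of the file.

```python
import os
import shutil
import urllib.request
from pathlib import Path

# Parquet files begin and end with this 4-byte magic number.
PARQUET_MAGIC = b"PAR1"


def download(url: str, dest: Path) -> Path:
    """Stream the resource at `url` to `dest` and return `dest`."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def looks_like_parquet(path: Path) -> bool:
    """Return True if the file has the Parquet magic bytes at both ends."""
    if path.stat().st_size < 2 * len(PARQUET_MAGIC):
        return False
    with open(path, "rb") as f:
        head = f.read(len(PARQUET_MAGIC))
        f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        tail = f.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


# Usage (needs network access; "example_table" is a placeholder name):
# url = ("https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/"
#        "stable/example_table.parquet")
# path = download(url, Path("example_table.parquet"))
# print(looks_like_parquet(path))
```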
To-do list