Merge pull request #3156 from catalyst-cooperative/no-docker-readme

Remove obsolete Docker data access instructions.
catalyst-cooperative · Dec 15, 2023 · 0e36ef9
2 parents 71968a9 + 26de704

Showing 2 changed files with 69 additions and 110 deletions.
diff --git a/README.rst b/README.rst
@@ -47,6 +47,19 @@ it's often difficult to work with. PUDL takes the original spreadsheets, CSV fil
 and databases and turns them into a unified resource. This allows users to spend more
 time on novel analysis and less time on data preparation.
 
+Who is PUDL for?
+----------------
+
+The project is focused on serving researchers, activists, journalists, policy makers,
+and small businesses that might not otherwise be able to afford access to this data from
+commercial sources and who may not have the time or expertise to do all the data
+processing themselves from scratch.
+
+We want to make this data accessible and easy to work with for as wide an audience as
+possible: anyone from a grassroots youth climate organizers working with Google sheets
+to university researchers with access to scalable cloud computing resources and everyone
+in between!
+
 What data is available?
 -----------------------
 
@@ -73,90 +86,37 @@ Program <https://sloan.org/programs/research/energy-and-environment>`__, from
 * `PHMSA Natural Gas Annual Report <https://www.phmsa.dot.gov/data-and-statistics/pipeline/gas-distribution-gas-gathering-gas-transmission-hazardous-liquids>`__
 * Machine Readable Specifications of State Clean Energy Standards
 
-Who is PUDL for?
-----------------
-
-The project is focused on serving researchers, activists, journalists, policy makers,
-and small businesses that might not otherwise be able to afford access to this data
-from commercial sources and who may not have the time or expertise to do all the
-data processing themselves from scratch.
-
-We want to make this data accessible and easy to work with for as wide an audience as
-possible: anyone from a grassroots youth climate organizers working with Google
-sheets to university researchers with access to scalable cloud computing
-resources and everyone in between!
-
 How do I access the data?
 -------------------------
 
-There are several ways to access PUDL outputs. For more details you'll want
-to check out `the complete documentation
-<https://catalystcoop-pudl.readthedocs.io>`__, but here's a quick overview:
-
-Datasette
-^^^^^^^^^
-We publish a lot of the data on https://data.catalyst.coop using a tool called
-`Datasette <https://datasette.io>`__ that lets us wrap our databases in a relatively
-friendly web interface. You can browse and query the data, make simple charts and
-maps, and download portions of the data as CSV files or JSON so you can work with it
-locally. For a quick introduction to what you can do with the Datasette interface,
-check out `this 17 minute video <https://simonwillison.net/2021/Feb/7/video/>`__.
-
-This access mode is good for casual data explorers or anyone who just wants to grab a
-small subset of the data. It also lets you share links to a particular subset of the
-data and provides a REST API for querying the data from other applications.
-
-Docker + Jupyter
-^^^^^^^^^^^^^^^^
-Want access to all the published data in bulk? If you're familiar with Python
-and `Jupyter Notebooks <https://jupyter.org/>`__ and are willing to install Docker you
-can:
-
-* `Download a PUDL data release <https://zenodo.org/record/3653158>`__ from
-  CERN's `Zenodo <https://zenodo.org>`__ archiving service.
-* `Install Docker <https://docs.docker.com/get-docker/>`__
-* Run the archived image using ``docker-compose up``
-* Access the data via the resulting Jupyter Notebook server running on your machine.
-
-If you'd rather work with the PUDL `SQLite <https://sqlite.org>`__ Databases and
-`Apache Parquet <https://parquet.apache.org>`__ files directly, they are accessible
-within the same Zenodo archive.
-
-The `PUDL Examples repository <https://github.com/catalyst-cooperative/pudl-examples>`__
-has more detailed instructions on how to work with the Zenodo data archive and Docker
-image.
-
-The PUDL Development Environment
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-If you're more familiar with the Python data science stack and are comfortable working
-with git, ``conda`` environments, and the Unix command line, then you can set up the
-whole PUDL Development Environment on your own computer. This will allow you to run the
-full data processing pipeline yourself, tweak the underlying source code, and (we hope!)
-make contributions back to the project.
-
-This is by far the most involved way to access the data and isn't recommended for
-most users. You should check out the `Development section <https://catalystcoop-pudl.readthedocs.io/en/latest/dev/dev_setup.html>`__
-of the main `PUDL documentation <https://catalystcoop-pudl.readthedocs.io>`__ for more
-details.
-
-Nightly Data Builds
-^^^^^^^^^^^^^^^^^^^
-If you are less concerned with reproducibility and want the freshest possible data
-we automatically upload the outputs of our nightly builds to public S3 storage buckets
-as part of the `AWS Open Data Registry
-<https://registry.opendata.aws/catalyst-cooperative-pudl/>`__.  This data is based on
-the `dev branch <https://github.com/catalyst-cooperative/pudl/tree/dev>`__, of PUDL, and
-is updated most weekday mornings. It is also the data used to populate Datasette.
-
-The nightly build outputs can be accessed using the AWS CLI, the S3 API, or downloaded
-directly via the web. See `Accessing Nightly Builds <https://catalystcoop-pudl.readthedocs.io/en/latest/data_access.html#access-nightly-builds>`__
-for links to the individual SQLite, JSON, and Apache Parquet outputs.
+For details on how to access PUDL data, see the `data access documentation
+<https://catalystcoop-pudl.readthedocs.io/en/latest/data_access.html>`__. A quick
+summary:
+
+* `Datasette <https://catalystcoop-pudl.readthedocs.io/en/latest/data_access.html#-access-datasette>`__
+  provides browsable and queryable data from our nightly builds on the web:
+  https://data.catalyst.coop
+* `Kaggle <https://catalystcoop-pudl.readthedocs.io/en/latest/data_access.html#access-kaggle>`__
+  provides easy Jupyter notebook access to the PUDL data, updated weekly:
+  https://www.kaggle.com/datasets/catalystcooperative/pudl-project
+* `Zenodo <https://catalystcoop-pudl.readthedocs.io/en/latest/data_access.html#access-zenodo>`__
+  provides stable long-term access to our versioned data releases with a citeable DOI:
+  https://doi.org/10.5281/zenodo.3653158
+* `Nightly Data Builds <https://catalystcoop-pudl.readthedocs.io/en/latest/data_access.html#access-nightly-builds>`__
+  push their outputs to the AWS Open Data Registry:
+  https://registry.opendata.aws/catalyst-cooperative-pudl/
+  See `the nightly build docs <https://catalystcoop-pudl.readthedocs.io/en/latest/data_access.html#access-nightly-builds>`__
+  for direct download links.
+* `The PUDL Development Environment <https://catalystcoop-pudl.readthedocs.io/en/latest/dev/dev_setup.html>`__
+  lets you run the PUDL data processing pipeline locally.
 
 Contributing to PUDL
 --------------------
+
 Find PUDL useful? Want to help make it better? There are lots of ways to help!
 
-* First, be sure to read our `Code of Conduct <https://catalystcoop-pudl.readthedocs.io/en/latest/code_of_conduct.html>`__.
+* Check out our `contribution guide <https://catalystcoop-pudl.readthedocs.io/en/latest/CONTRIBUTING.html>`__
+  including our `Code of Conduct <https://catalystcoop-pudl.readthedocs.io/en/latest/code_of_conduct.html>`__.
 * You can file a bug report, make a feature request, or ask questions in the
   `Github issue tracker <https://github.com/catalyst-cooperative/pudl/issues>`__.
 * Feel free to fork the project and make a pull request with new code, better
@@ -165,8 +125,6 @@ Find PUDL useful? Want to help make it better? There are lots of ways to help!
   to support our work liberating public energy data.
 * `Hire us to do some custom analysis <https://catalyst.coop/hire-catalyst/>`__ and
   allow us to integrate the resulting code into PUDL.
-* For more information check out the Contributing section of the
-  `PUDL Documentation <https://catalystcoop-pudl.readthedocs.io>`__
 
 Licensing
 ---------
@@ -193,10 +151,15 @@ Contact Us
 * Want to schedule a time to chat with us one-on-one about your PUDL use case, ideas
   for improvement, or get some personalized support? Join us for
   `Office Hours <https://calend.ly/catalyst-cooperative/pudl-office-hours>`__
+* `Follow us here on GitHub <https://github.com/catalyst-cooperative/>`__
+* Follow us on Mastodon: `@CatalystCoop@mastodon.energy <https://mastodon.energy/@CatalystCoop>`__
+* Follow us on BlueSky:  `@catalyst.coop <https://bsky.app/profile/catalyst.coop>`__
+* `Follow us on LinkedIn <https://www.linkedin.com/company/catalyst-cooperative/>`__
+* `Follow us on HuggingFace <https://huggingface.co/catalystcooperative>`__
 * Follow us on Twitter: `@CatalystCoop <https://twitter.com/CatalystCoop>`__
+* `Follow us on Kaggle <https://www.kaggle.com/catalystcooperative/>`__
 * More info on our website: https://catalyst.coop
-* To hire us to provide customized data
-  extraction and analysis, you can email the maintainers:
+* Email us if you'd like to hire us to provide customized data extraction and analysis:
   `hello@catalyst.coop <mailto:hello@catalyst.coop>`__
 
 About Catalyst Cooperative

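The data access docs touched by this commit note that the nightly build SQLite outputs are gzip-compressed and suggest ``gunzip`` for decompressing them. For readers working from Python instead of the shell, the same step can be sketched roughly as follows. This is a minimal illustration, not part of the commit: the helper names ``decompress_sqlite`` and ``list_tables`` are hypothetical, and it assumes a ``.sqlite.gz`` file has already been downloaded locally.

```python
import gzip
import shutil
import sqlite3
from pathlib import Path


def decompress_sqlite(gz_path: Path) -> Path:
    """Decompress ``name.sqlite.gz`` alongside itself and return the new path.

    Hypothetical helper: a Python stand-in for ``gunzip``, except that it
    keeps the original ``.gz`` file in place.
    """
    out_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        # Stream in chunks so large databases are never held fully in memory.
        shutil.copyfileobj(src, dst)
    return out_path


def list_tables(db_path: Path) -> list[str]:
    """Return the table names in a SQLite database, sorted alphabetically."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling ``list_tables(decompress_sqlite(Path("pudl.sqlite.gz")))`` would then give a quick inventory of what a downloaded database contains; the file name here is only a placeholder for whichever output was downloaded.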
diff --git a/docs/data_access.rst b/docs/data_access.rst
@@ -30,14 +30,17 @@ which one is right for you and your use case.
        Select data to download as CSVs for local analysis in spreadsheets.
        Create sharable links to a particular selection of data.
        Access PUDL data via a REST API.
+   * - :ref:`access-kaggle`
+     - Data scientist, data analyst, Jupyter notebook user
+     - Easy Jupyter notebook access to all PUDL data products, including example
+       notebooks. Updated weekly based on the nightly builds.
    * - :ref:`access-nightly-builds`
      - Cloud Developer, Database User, Beta Tester
-     - Get the freshest data that has passed all data validations, updated most weekday
-       mornings. Fast downloads from AWS S3 storage buckets.
+     - Get the freshest data that has passed all of our data validations, updated most
+       weekday mornings. Fast, free downloads from AWS S3 storage buckets.
    * - :ref:`access-zenodo`
      - Researcher, Database User, Notebook Analyst
      - Use a stable, citable, fully processed version of the PUDL on your own computer.
-       Use PUDL in Jupyer Notebooks running in a stable, archived Docker container.
        Access the SQLite DB and Parquet files directly using any toolset.
    * - :ref:`access-development`
      - Python Developer, Data Wrangler
@@ -69,6 +72,19 @@ data you've selected.
    SQLite to improve accessibility of the raw inputs, but they should generally not be
    used directly if the data you need has integrated into the PUDL database.
 
+.. _access-kaggle:
+
+---------------------------------------------------------------------------------------
+Kaggle
+---------------------------------------------------------------------------------------
+
+Want to explore the PUDL data interactively in a Jupyter Notebook without needing to do
+any setup? Our nightly build outputs (see below) automatically update `the PUDL Project
+Dataset on Kaggle <https://www.kaggle.com/datasets/catalystcooperative/pudl-project>`__
+once a week. There are `several notebooks <https://www.kaggle.com/datasets/catalystcooperative/pudl-project/code>`__
+associated with the dataset, both curated by Catalyst and contributed by other Kaggle
+users which you can use to get oriented to the PUDL database.
+
 .. _access-nightly-builds:
 
 ---------------------------------------------------------------------------------------
@@ -129,42 +145,22 @@ HTTPS using the following links:
    be quite large when uncompressed. To decompress them locally, you can use the
    ``gunzip`` command.
 
-
    .. code-block:: console
 
       $ gunzip *.sqlite.gz
 
-
 .. _access-zenodo:
 
 ---------------------------------------------------------------------------------------
-Zenodo Archives
+Zenodo
 ---------------------------------------------------------------------------------------
 
-We use Zenodo to archive our fully processed data as SQLite databases and
-Parquet files. We also archive a Docker image that contains the software environment
-required to use PUDL within Jupyter Notebooks. You can find all our archived data
-products in `the Catalyst Cooperative Community on Zenodo
-<https://zenodo.org/communities/catalyst-cooperative/>`__.
-
-* The current version of the archived data and Docker container can be
-  downloaded from `This Zenodo archive <https://doi.org/10.5281/zenodo.3653158>`__
-* Detailed instructions on how to access the archived PUDL data using a Docker
-  container can be found in our `PUDL Examples repository
-  <https://github.com/catalyst-cooperative/pudl-examples/>`__.
-* The SQLite databases and Parquet files containing the PUDL data, the complete FERC 1
-  database, and EPA CEMS hourly data are contained in that same archive, if you want
-  to access them directly without using PUDL.
-
-.. note::
-
-   If you're already familiar with Docker, you can also pull
-   `the image we use <https://hub.docker.com/r/catalystcoop/pudl-jupyter>`__ to run
-   Jupyter directly:
-
-   .. code-block:: console
-
-      $ docker pull catalystcoop/pudl-jupyter:latest
+We use Zenodo to archive and version our raw data inputs, the fully processed outputs,
+and the PUDL software repositories. You can find all of our archives in
+`the Catalyst Cooperative Community <https://zenodo.org/communities/catalyst-cooperative/>`__.
+Zenodo assigns long-lived DOIs to each archive, suitable for citation in academic
+journals and other publications. The most recent versioned PUDL data release can be
+found using this Concept DOI: https://doi.org/10.5281/zenodo.3653158
 
 .. _access-development: