Refactor the documentation into separate pages #202

leouieda · 2020-07-31T14:27:05Z

Break up the usage documentation into separate pages: basic usage
(Training your Pooch with some elements removed), single file downloads,
downloaders, processors, and advanced tricks. Moved the processor and
downloader specifications from the docstring of Pooch.fetch to the
respective pages and link to them from the docstrings. Also did general
updates to the docstrings (for example, to include the new retrieve
function) and tutorials. Separated the side menu into Getting Started,
User Guide, and Reference Documentation like in all other projects.

Fixes #188

See a rendered version of the docs here: https://www.leouieda.com/pooch-docs-refactor/

Reminders:

Run make format and make check to make sure the code follows the style guide.
Add tests for new features or tests that would have caught the bug that you're fixing.
Add new public functions/methods/classes to doc/api/index.rst and the base __init__.py file for the package.
Write detailed docstrings for all functions/classes/methods. It often helps to design better code if you write the docstrings first.
If adding new functionality, add an example to the docstring, gallery, and/or tutorials.
Add your full name, affiliation, and ORCID (optional) to the AUTHORS.md file (if you haven't already) in case you'd like to be listed as an author on the Zenodo archive of the next release.

leouieda · 2020-07-31T14:29:22Z

It would be great to get a couple of eyes on this before proceeding. What do you think of the new layout?

danshapero · 2020-07-31T15:09:06Z

This looks great, thanks Leo!

hugovk · 2020-08-02T09:16:27Z

Just had a cursory check, but looks good!

By the way, it's now possible to enable Read the Docs to autobuild for PRs. See:

leouieda · 2020-08-03T11:03:15Z

Thanks for the comments @danshapero and @hugovk. I didn't know RTD was doing that now, but it looks pretty cool! Might give it a shot.

I made a few more tweaks and divided the Training your Pooch tutorial into 3 sections: beginner, intermediate, and advanced. I made the pooch.retrieve tutorial as a first-contact with the package since it's the simplest and doesn't require any setup. It then points people to the beginner level tutorial to set up Pooch properly.

I'll give this some time for others to see and comment. I'm pretty happy with the layout and will stop messing around with it now (sorry).

MarkWieczorek · 2020-08-03T11:57:14Z

I don't think that the beginner documentation shows what a beginner (i.e., me) would want to know. I find the example of using pooch.create() to be a little unclear. You return GOODBOY but it is not clear to me what a GOODBOY is or does. I would probably rename GOODBOY to something more descriptive.

The most basic feature of pooch (to me) is to download a single file using retrieve. In my datasets module, 99% of my use of pooch looks like this:

from pooch import HTTPDownloader, os_cache, retrieve
from pyshtools import SHGravCoeffs


def GRGM1200B():
    '''
    GRGM1200B is a dataset of spherical harmonic coefficients. This function downloads the file from NASA's PDS and
    then returns a pyshtools.SHGravCoeffs class instance that contains the coefficients.
    '''
    # Download the file (or reuse the cached copy) and get its local path
    fname = retrieve(
        url="https://core2.gsfc.nasa.gov/PGDA/data/MoonRM1/sha.grgm1200b_sigma",  # noqa: E501
        known_hash="sha256:f08a988b43f3eaa5a2089045a9b7e41e02f16542c7912b87ea34366fafa39bc5",  # noqa: E501
        downloader=HTTPDownloader(progressbar=True),
        path=os_cache('pyshtools'),
    )
    # Read the coefficients from the downloaded file
    return SHGravCoeffs.from_file(fname, header_units='m', r0_index=1, gm_index=0, errors=True)

That is: I use retrieve to download the file and return the path of the file, and then I read the file and return a custom data structure (for the docs, I would replace my custom data structure with just a call to np.load() or something simple).

Here is another simple example showing how to use FTP and then unzip the file (FTPDownloader and Unzip are also imported from pooch):

    fname = retrieve(
        url="ftp://swarm-diss.eo.esa.int/Level2longterm/MLI/SW_OPER_MLI_SHA_2D_00000000T000000_99999999T999999_0501.ZIP",  # noqa: E501
        known_hash="sha256:53b92d229ff9416c4cd5663975bdcb23f193f41e7212f2956685dae34dbc6f7f",  # noqa: E501
        downloader=FTPDownloader(progressbar=True),
        processor=Unzip(),
        path=os_cache('pyshtools'),
    )

And here is an example where you decompress a gzip file after downloading it (Decompress is also imported from pooch):

    fname = retrieve(
        url="https://zenodo.org/record/3876495/files/Morschhauser2014.txt.gz?download=1",  # noqa: E501
        known_hash="sha256:a86200b3147a24447ff8bba88ec6047329823275813a9f5e9505bb611e3e86e0",  # noqa: E501
        downloader=HTTPDownloader(progressbar=True),
        path=os_cache('pyshtools'),
        processor=Decompress(),
    )

So, my point is: These three examples show everything that I (a beginner) needed to know to create the pyshtools datasets module. If these examples were on page 1 ("start here"), I wouldn't need to read any further.

In my opinion "beginner" means download a single file, and "intermediate" means deal with more than 1 file and registries.
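The known_hash values in the examples above have the form sha256:<hex digest>. As a minimal sketch of where such a string could come from, the snippet below hashes a local file using only the standard library. The helper name sha256_known_hash and the throwaway demo file are assumptions made for illustration; Pooch also provides pooch.file_hash for the same purpose.

```python
import hashlib
import os
import tempfile


def sha256_known_hash(path, chunk_size=65536):
    """Return 'sha256:<hexdigest>' for the file at path (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large data files are never fully in memory
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


# Demonstrate on a throwaway file standing in for a downloaded data file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"pooch")
    demo_path = tmp.name

demo_hash = sha256_known_hash(demo_path)
os.remove(demo_path)
print(demo_hash)
```

The resulting string could then be passed as known_hash to retrieve, so that a corrupted or changed download is detected.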

leouieda · 2020-08-03T13:19:45Z

@MarkWieczorek thanks for the inputs!

I agree that the first example should be downloading a single file with retrieve, which is why I moved that particular example to the "Getting Started" section right after "Installing" and included a note at the top of the beginner tutorial pointing there as well. The 3 levels under "Training your Pooch" are geared towards creating something that will manage a registry of files (which is the original use case for Pooch). Maybe this could be clearer, though. I wouldn't want to call this the "beginner tutorial" since retrieve is more of a utility than the main way to use Pooch for package developers (see below).

> These three examples show everything that I (a beginner) needed to know to create the pyshtools datasets module.

Those examples for retrieve are actually not the recommended way to use Pooch for package developers, since they give the user no way to control the cache location and offer no support for sandboxing package versions. Both needs are very common, particularly for projects that have already been storing their sample data on GitHub.

> I would probably rename GOODBOY to something more descriptive.

Fair point. Got lost in the puns on that one 🙂

> You return GOODBOY but it is not clear to me what a GOODBOY is or does.

Right after the code sample there is:

> The GOODBOY returned by pooch.create is an instance of the Pooch class, which handles downloading files from the registry using the fetch method. See the documentation for pooch.create and pooch.Pooch for more options.

But I agree that this could be a lot clearer.

I'll make a note about expanding the current "Retrieving a data file" page to include more advanced things like FTP downloads and unzipping. But those could be added later since this PR is already quite big.

andersy005

👍 This looks good to me!

santisoler

Looks great to me! I like how the tutorials are split by levels.

santisoler · 2020-08-06T17:23:09Z

I think we can safely merge this. If anyone finds any typo or something that should be fixed or improved, we can do that on a future PR.

leouieda added 4 commits July 31, 2020 14:11

Slight tweak to the citation file

3b8b809

Avoid bottom margins in quotes

4ab842d

Merge branch 'master' into docs-tweaks

441d8e7

leouieda requested review from danshapero and removed request for danshapero July 31, 2020 14:28

leouieda requested review from andersy005, danshapero, hugovk and santisoler July 31, 2020 14:29

danshapero approved these changes Jul 31, 2020

Separate tutorial into 3 levels

ebb105e

Add an intermediate level tutorial

leouieda added 2 commits August 3, 2020 11:07

Reorganize the tutorial levels

9179cec

Move things between them so that the beginner tutorial is really minimal

Make the retrieve tutorial the first contact

6bafcf7

Start with retrieve and include links to setting up Pooch through the beginner tutorial

leouieda added 2 commits August 3, 2020 14:43

Rename GOODBOY to POOCH and clarify what it is

60af2fa

Added some docs explaining what the return of create is.

Make it clear that retrieve is not meant for packages

61025f4

andersy005 approved these changes Aug 3, 2020

santisoler approved these changes Aug 4, 2020

santisoler merged commit b12c14a into master Aug 6, 2020

santisoler deleted the docs-tweaks branch August 6, 2020 17:24
