Refactor the documentation into separate pages #202

Merged: 9 commits, Aug 6, 2020

leouieda
Member

Break up the usage documentation into separate pages: basic usage
(Training your Pooch with some elements removed), single file downloads,
downloaders, processors, and advanced tricks. Moved the processor and
downloader specifications from the docstring of Pooch.fetch to the
respective pages and link to them from the docstrings. Also did general
updates to the docstrings (for example, to include the new retrieve
function) and tutorials. Separated the side menu into Getting Started,
User Guide, and Reference Documentation like in all other projects.

Fixes #188

See a rendered version of the docs here: https://www.leouieda.com/pooch-docs-refactor/

Reminders:

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst and the base __init__.py file for the package.
  • Write detailed docstrings for all functions/classes/methods. It often helps to design better code if you write the docstrings first.
  • If adding new functionality, add an example to the docstring, gallery, and/or tutorials.
  • Add your full name, affiliation, and ORCID (optional) to the AUTHORS.md file (if you haven't already) in case you'd like to be listed as an author on the Zenodo archive of the next release.

@leouieda leouieda requested review from danshapero and removed request for danshapero July 31, 2020 14:28
@leouieda
Member Author

It would be great to get a couple of eyes on this before proceeding. What do you think of the new layout?

@danshapero
Contributor

This looks great, thanks Leo!

Add an intermediate level tutorial
@hugovk
Member

hugovk commented Aug 2, 2020

Just had a cursory check, but looks good!

By the way, it's now possible to enable Read the Docs to autobuild for PRs. See:

Move things between them so that the beginner tutorial is really minimal
Start with retrieve and include links to setting up Pooch through the
beginner tutorial
@leouieda
Member Author

leouieda commented Aug 3, 2020

Thanks for the comments @danshapero and @hugovk. I didn't know RTD was doing that now but it looks pretty cool! Might give it a shot.

I made a few more tweaks and divided the Training your Pooch tutorial into 3 sections: beginner, intermediate, and advanced. I made the pooch.retrieve tutorial the first contact with the package since it's the simplest and doesn't require any setup. It then points people to the beginner level tutorial to set up Pooch properly.

I'll give this some time for others to see and comment. I'm pretty happy with the layout and will stop messing around with it now (sorry).

@MarkWieczorek

I don't think that the beginner documentation shows what a beginner (i.e., me) would want to know. I find the example of using pooch.create() to be a little unclear. You return GOODBOY but it is not clear to me what a GOODBOY is or does. I would probably rename GOODBOY to something more descriptive.

The most basic feature of pooch (to me) is to download a single file using retrieve. In my datasets module, 99% of my use of pooch looks like this:

from pooch import os_cache
from pooch import retrieve
from pooch import HTTPDownloader
from pyshtools import SHGravCoeffs  # needed for the return statement below

def GRGM1200B():
    '''
    GRGM1200B is a dataset of spherical harmonic coefficients. This function downloads the file from NASA's PDS and
    then returns a pyshtools.SHGravCoeffs class instance that contains the coefficients.
    '''
    fname = retrieve(
        url="https://core2.gsfc.nasa.gov/PGDA/data/MoonRM1/sha.grgm1200b_sigma",  # noqa: E501
        known_hash="sha256:f08a988b43f3eaa5a2089045a9b7e41e02f16542c7912b87ea34366fafa39bc5",  # noqa: E501
        downloader=HTTPDownloader(progressbar=True),
        path=os_cache('pyshtools'),
    )
    return SHGravCoeffs.from_file(fname, header_units='m', r0_index=1, gm_index=0, errors=True)

That is: I use retrieve to download the file and return the path of the file, and then I read the file and return a custom data structure (for the docs, I would replace my custom data structure with just a call to np.load() or something simple).
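The known_hash strings in these examples have the form "sha256:" followed by the file's hex digest. As a minimal sketch of where such a value comes from, the snippet below computes it with the standard library only. The helper name sha256_known_hash and the temporary sample file are made up for illustration; Pooch also ships pooch.file_hash for computing the digest itself.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_known_hash(path, chunk_size=65536):
    """Return a 'sha256:<hexdigest>' string for the file at ``path``."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        # Read in chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


# Demonstrate on a small temporary file so no network access is needed.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes(b"hello pooch\n")
    sample_hash = sha256_known_hash(sample)
    print(sample_hash)
```

The resulting string can be pasted into the known_hash argument of retrieve once the file has been downloaded and checked by hand.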

Here is another simple example showing how to use FTP and then unzip the file (FTPDownloader and Unzip are also imported from pooch):

    fname = retrieve(
        url="ftp://swarm-diss.eo.esa.int/Level2longterm/MLI/SW_OPER_MLI_SHA_2D_00000000T000000_99999999T999999_0501.ZIP",  # noqa: E501
        known_hash="sha256:53b92d229ff9416c4cd5663975bdcb23f193f41e7212f2956685dae34dbc6f7f",  # noqa: E501
        downloader=FTPDownloader(progressbar=True),
        processor=Unzip(),
        path=os_cache('pyshtools'),
    )

And here is an example where you decompress a gzip file before saving it (Decompress is also imported from pooch):

    fname = retrieve(
        url="https://zenodo.org/record/3876495/files/Morschhauser2014.txt.gz?download=1",  # noqa: E501
        known_hash="sha256:a86200b3147a24447ff8bba88ec6047329823275813a9f5e9505bb611e3e86e0",  # noqa: E501
        downloader=HTTPDownloader(progressbar=True),
        path=os_cache('pyshtools'),
        processor=Decompress(),
    )

So, my point is: These three examples show everything that I (a beginner) needed to know to create the pyshtools datasets module. If these examples were on page 1 ("start here"), I wouldn't need to read any further.

In my opinion "beginner" means download a single file, and "intermediate" means deal with more than 1 file and registries.

@leouieda
Member Author

leouieda commented Aug 3, 2020

@MarkWieczorek thanks for the inputs!

I agree that the first example should be downloading a single file with retrieve, which is why I moved that particular example to the "Getting Started" section right after "Installing" and included a note at the top of the beginner tutorial pointing there as well. The 3 levels under "Training your Pooch" are geared towards creating something that will manage a registry of files (which is the original use case for Pooch). Maybe this could be clearer, though. I wouldn't want to call this the "beginner tutorial" since retrieve is more of a utility than the main way to use Pooch for package developers (see below).

> These three examples show everything that I (a beginner) needed to know to create the pyshtools datasets module.

Those examples for retrieve are actually not the recommended way to use Pooch for package developers since there is no way for a user to control the cache location or support for sandboxing package versions. Both are very common, particularly for projects that have already been storing their sample data on GitHub.

> I would probably rename GOODBOY to something more descriptive.

Fair point. Got lost in the puns on that one 🙂

> You return GOODBOY but it is not clear to me what a GOODBOY is or does.

Right after the code sample there is:

> The GOODBOY returned by pooch.create is an instance of the Pooch class, which handles downloading files from the registry using the fetch method. See the documentation for pooch.create and pooch.Pooch for more options.

But I agree that this could be a lot clearer.

I'll make a note about expanding the current "Retrieving a data file" page to include more advanced things like FTP downloads and unzipping. But those could be added later since this PR is already quite big.

Member

@andersy005 andersy005 left a comment

👍 This looks good to me!

Member

@santisoler santisoler left a comment

Looks great to me! I like how the tutorials are split by levels.

@santisoler
Member

I think we can safely merge this. If anyone finds a typo or something that should be fixed or improved, we can do that in a future PR.

@santisoler santisoler merged commit b12c14a into master Aug 6, 2020
@santisoler santisoler deleted the docs-tweaks branch August 6, 2020 17:24
Development

Successfully merging this pull request may close these issues.

Documentation section with tips & tricks / FAQ
6 participants