Add example for fetching mutating datasets #144

Merged
merged 4 commits into from
Mar 1, 2020
46 changes: 46 additions & 0 deletions doc/usage.rst
@@ -138,6 +138,52 @@ Alternative hashing algorithms supported by :mod:`hashlib` can be used if necess
print(pooch.file_hash("data/c137.csv", alg="sha512"))


Bypassing the hash check
------------------------

Sometimes we might not know the hash of the file or it could change on the server
periodically. In these cases, we need a way of bypassing the hash check.
One way of doing that is with Python's ``unittest.mock`` module. It defines the object
``unittest.mock.ANY`` which passes all equality tests made against it. To bypass the
check, we can set the hash value to ``unittest.mock.ANY`` when specifying the
``registry`` argument for :func:`pooch.create`.
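
The behaviour of ``unittest.mock.ANY`` can be checked with the standard library
alone. The sketch below assumes that a hash check boils down to an equality
comparison; ``hash_matches`` and the digest strings are hypothetical stand-ins for
illustration, not part of Pooch:

```python
import unittest.mock


def hash_matches(actual_hash, known_hash):
    """Hypothetical stand-in for a hash check: a plain equality comparison."""
    return actual_hash == known_hash


# Made-up digest string, used only for illustration
real = "digest-of-the-downloaded-file"

print(hash_matches(real, real))                 # same digest: True
print(hash_matches(real, "some-other-digest"))  # different digest: False
print(hash_matches(real, unittest.mock.ANY))    # ANY equals everything: True
```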

In this example, we want to use Pooch to download a list of weather stations around
Australia. The file with the stations is on an FTP server and we want to store it locally
in a separate folder for each day that the code is run. The problem is that the
``stations.zip`` file is updated in place on the server instead of a new file being
created, so the hash check would fail. This is how you can solve this problem:


.. code:: python

    import datetime
    import unittest.mock
    import pooch

    # Get the current date to store the files in separate folders
    CURRENT_DATE = datetime.datetime.now().date()

    GOODBOY = pooch.create(
        # Path objects can only be joined with strings, so convert the date
        path=pooch.os_cache("bom_daily_stations") / str(CURRENT_DATE),
        base_url="ftp://ftp.bom.gov.au/anon2/home/ncc/metadata/sitelists/",
        # Use ANY for the hash value to ignore the checks
        registry={
            "stations.zip": unittest.mock.ANY,
        },
    )

Because the hash check always passes, Pooch will only download the file once per cache
folder. When the code runs again on a different date, the file is downloaded again because
the local cache folder has changed and the file is not yet present in it. If you omit
``CURRENT_DATE`` from the cache path, Pooch will only fetch the file once, unless it is
deleted from the cache.
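
How the dated cache folder changes from one day to the next can be sketched with
``pathlib`` alone. The helper ``daily_cache_dir``, the base folder name, and the dates
below are assumptions made for illustration:

```python
import datetime
from pathlib import Path


def daily_cache_dir(base, day):
    """Return a per-day subfolder of base, named with the ISO date (YYYY-MM-DD)."""
    # Path objects are joined with strings, so the date is converted first
    return Path(base) / day.isoformat()


base = Path("bom_daily_stations")
first_day = daily_cache_dir(base, datetime.date(2020, 3, 1))
next_day = daily_cache_dir(base, datetime.date(2020, 3, 2))

# Different dates give different folders, so a file cached on the first day
# is not found on the next day and would be fetched again.
print(first_day.name)
print(next_day.name)
print(first_day == next_day)
```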

.. note::

    If run over a long period of time, your cache directory will grow in size because
    the files are stored in daily subdirectories.
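
One possible way to keep the cache from growing is to delete old daily folders from
time to time. The sketch below is not a Pooch feature: ``prune_daily_dirs`` is a
hypothetical helper that assumes the subfolders are named ``YYYY-MM-DD``, and it is
demonstrated on a temporary directory rather than a real cache:

```python
import datetime
import shutil
import tempfile
from pathlib import Path


def prune_daily_dirs(cache_root, keep_days, today):
    """Delete dated subfolders (named YYYY-MM-DD) older than keep_days."""
    cutoff = today - datetime.timedelta(days=keep_days)
    removed = []
    for folder in sorted(Path(cache_root).iterdir()):
        if not folder.is_dir():
            continue
        try:
            folder_date = datetime.date.fromisoformat(folder.name)
        except ValueError:
            # Not a dated folder, so leave it alone
            continue
        if folder_date < cutoff:
            shutil.rmtree(folder)
            removed.append(folder.name)
    return removed


# Demonstration on a throwaway directory instead of a real cache
with tempfile.TemporaryDirectory() as tmp:
    for name in ["2020-02-01", "2020-03-01", "not-a-date"]:
        (Path(tmp) / name).mkdir()
    removed = prune_daily_dirs(tmp, keep_days=7, today=datetime.date(2020, 3, 2))
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(removed)
print(remaining)
```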


Versioning
----------
