-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support DOIs with slashes in the filename #337
Conversation
Allow DOI downloaders to work with files uploaded to services like Zenodo, Figshare or Dataverse that have a forward slash in their name.
With the patch in this PR you should be able to download your zip file that have a slash in its name through: import pooch
raw_data_path = pooch.retrieve(
url=r"doi:10.5281/zenodo.7347607/Wild-Minds\/GreatApeDictionary-v1.1.zip",
known_hash="md5:8c40d1bce25619548f4daa16d63f36a3",
) I decided to use a leading backslash before the offending slash in the filename to separate it from the other slashes that are part of the url. In order to use a backslash in a Python string without meaning to use it as a escape character we need to use r-strings with It works locally for me, does it for you? |
Yes, that works for me now - thanks! |
It worth noting that I haven't changed the filename of the cached file. Currently it will contain only the portion of the original filename that follows the last slash (e.g. if the filename is |
Use our own Zenodo repository (10.5281/zenodo.7632643) to test if parse_url works as expected with the file name has a slash.
Add a new test that checks if DOIDownloaders works as expected when the filename contains a slash. Uses our own Zenodo repository.
Well, this is trickier to solve than I expected. This patch works well with For example, if we create a
And we want to load it after creating a import pooch
pup = pooch.create(
path=pooch.os_cache("testing-zenodo"),
base_url="doi:10.5281/zenodo.7632643/",
registry=None,
)
registry_file = "registry.txt"
pup.load_registry(registry_file) Then the following pup.fetch(r"santisoler\/pooch-test-data-v1.zip") And this other call will also fail because the url will not be parse correctly: pup.fetch(r"santisoler/pooch-test-data-v1.zip") Moreover, populating the registry through the repository API (see #319) will also fail at some point. Is there any other way we can solve this considering that these filenames are only present in Zenodo? |
I think this patch has a flawed design: having to introduce the backslash makes it more difficult to use by the end user, but also creates a lot of corner cases that are prone to future errors. I came out with a better solution in #340, where the Therefore, I'm closing this PR. |
Allow DOI downloaders to work with files uploaded to Zenodo that have a forward slash "/" in their name. These special cases are generated by the Zenodo-GitHub integration service, where the first portion of the filename matches the GitHub handle of the repository owned and the last portion matches the repository name. When creating the url to these repositories, the forward slashes that are part of the filename should be led by a backslash. For example:
url = "doi:10.123/zenodo.123123/my-file\/has-a-slash.zip"
.Relevant issues/PRs:
Fixes #336
Todo
In case this patch works, we need to:
DOIDownloader
retrieve