Problem verifying the hash of a downloaded zip file from ICGEM #185
Hi @MarkWieczorek, I suspect that the URL you were given actually redirects to the file URL. Pooch has no way of guessing that, so it is downloading the redirect page instead of the file. To get around that, you need to first make a request to get the actual file URL and then pass that to the downloader. There is an example here: https://www.fatiando.org/pooch/latest/usage.html#custom-downloaders (you can skip the authentication part).
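A minimal sketch of the "resolve the URL first" step, assuming the server answers with an ordinary HTTP redirect. The helper name `resolve_final_url` is made up for illustration and is not part of Pooch; it only uses the standard library.

```python
import urllib.request


def resolve_final_url(url, timeout=30):
    """Follow any HTTP redirects and return the URL that finally answers.

    Hypothetical helper (not a Pooch function). The response body is never
    read, so only the headers are transferred before the connection closes.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.geturl()


# Possible usage (needs network access and pooch, so it is left as a comment):
#
#   import pooch
#   final_url = resolve_final_url("http://icgem.gfz-potsdam.de/getmodel/zip/...")
#   path = pooch.retrieve(url=final_url, known_hash=None)
```

If the server does not redirect at all, the function simply returns the URL it was given, so the call is harmless in that case.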
If I unzip the file
I get the correct file. What is odd, though, is that the hash of the zipped file pooch downloads is
But if I download the zip file from a browser, the hash is different:
I don't know how the hash could be different. I don't think that the filename affects the hash. Lastly: Do you know how my browser knows how to save the file with the name |
Ok. I just made some progress: After downloading the "same" file from the ICGEM website, I realized that the hash of this file was different each time. I suspect that what is happening is that there is some kind of time stamp in the file. Perhaps they are zipping these files whenever they are requested. I'm going to contact them about this, as the hash problem of the zip archive is not something that we can solve. Nevertheless, I have two suggestions:
Ah, that would definitely break things! For now, one way around that would be to make a custom downloader that unzips the file in memory before saving (which might not be ideal), or one that saves it to a temp file and then moves the unzipped version. Then you can store the hash of the unzipped file.
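To illustrate why hashing the unzipped file sidesteps the problem, here is a small standard-library sketch. It builds two archives in memory that hold identical bytes but carry different timestamps, which is only a guess at what the server does on each request. The member name `EGM2008.gfc`, the payload, and both helper functions are placeholders, not real ICGEM data or Pooch API.

```python
import hashlib
import io
import zipfile


def make_zip(payload, date_time):
    """Build an in-memory zip with one member stamped with `date_time`."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Hypothetical member name; the timestamp is stored in the archive.
        info = zipfile.ZipInfo("EGM2008.gfc", date_time=date_time)
        info.compress_type = zipfile.ZIP_DEFLATED
        archive.writestr(info, payload)
    return buffer.getvalue()


def sha256_of_member(zip_bytes, member):
    """Hash the uncompressed bytes of one archive member."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return hashlib.sha256(archive.read(member)).hexdigest()


payload = b"placeholder model contents\n"  # stand-in, not real data
first = make_zip(payload, (2020, 6, 1, 12, 0, 0))
second = make_zip(payload, (2020, 6, 1, 12, 5, 0))

# The archives differ byte for byte because of the embedded timestamps...
archives_match = hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest()
# ...but the extracted contents hash identically.
members_match = sha256_of_member(first, "EGM2008.gfc") == sha256_of_member(second, "EGM2008.gfc")
print(archives_match, members_match)
```

The trade-off mentioned above still applies: the archive has to be opened before anything is verified, so a corrupted download would only be noticed while unzipping or when the member hash fails to match.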
My guess is that the file name is located somewhere in the HTTP GET request. Pooch takes the file name from the URL because I'm ignorant and didn't consider this use case 🙂 If using
Getting the file name from the request would be difficult because right now Pooch functions are transparent to the download method. Doing this would require knowing that this is HTTP. I'll keep this in mind but it might require a lot of refactoring of the code. I'm open to any suggestions, though.
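For reference, browsers usually take a suggested download name from the `Content-Disposition` header of the HTTP response. Whether ICGEM sends that header is an assumption here, and the header value below is made up. A rough sketch of parsing it with the standard library (the function name is hypothetical):

```python
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the file name suggested by a Content-Disposition header value.

    Returns None when the header carries no filename parameter.
    """
    message = Message()
    message["Content-Disposition"] = header_value
    return message.get_filename()


# Made-up header value of the kind a server might send with a download.
suggested_name = filename_from_content_disposition('attachment; filename="EGM2008.zip"')
```

A custom downloader that knows it is speaking HTTP could read this header from the response and rename the file afterwards, which is exactly the protocol-specific knowledge described above.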
This is also tricky because we don't want to touch the file if the hash doesn't match (meaning that it could be corrupted). But it can be done as stated above. I would be hesitant to make this easy, though.
Here is the response I got from ICGEM.
So, the only thing that pooch could do at this point is
I understand why this is not ideal. Given that I am now just going to download the unzipped files and wait until they implement gz, I would be ok with closing this issue if you want to.
@MarkWieczorek yep, these little tricks should be documented somewhere. We don't have a good place for them in the docs at this point. See #188. SHTools + ICGEM is going to be awesome!
I am having a problem verifying the hash of a zip file downloaded from the ICGEM website. I have downloaded zip files with pooch from other repositories, so in principle, I should be doing everything ok.
First, on the ICGEM website, you can download a file in gfc format (which works fine for me with pooch) or a zipped version. If I copy the link from the website for the zipped version of EGM2008, I get
http://icgem.gfz-potsdam.de/getmodel/zip/c50128797a9cb62e936337c890e4425f03f0461d7329b09a8cc8561504465340
Using this link in a browser downloads and saves the file: EGM2008.zip (from which I computed the sha256 hash).
Using pooch, the file is downloaded to the filename
d99404d2e294332575026111bd03dbf3-c50128797a9cb62e936337c890e4425f03f0461d7329b09a8cc8561504465340
Pooch, however, complains that the hash of the file doesn't match:
ValueError: SHA256 hash of downloaded file (d99404d2e294332575026111bd03dbf3-c50128797a9cb62e936337c890e4425f03f0461d7329b09a8cc8561504465340) does not match the known hash: expected sha256:9393a9100a61bab4353d8f8d429cbc3b344153690adfbf5ac678eec92ab9fdef but got 92d03699ad51510b4faf815a9c3c59db8211c9a8d18c576717a90a4ece493153. Deleted download for safety. The downloaded file may have been corrupted or the known hash may be outdated.
I do not want to unzip the file (it will be unzipped on the fly when needed).
Any ideas?
Here is the code