Check file after downloading and redownload if corrupt? #19
Comments
I tried this out of curiosity and the same thing happened to me: most of the files were corrupted. I have written my own script in the meantime, but it's worth mentioning since a lot of people are using this one. Thanks for this though @alexgand, much appreciated!
@emmaKts How does your script differ from this one so that it avoids the corruption issue?
Here all downloads were OK, so perhaps the issue is related to the quality of the connection. I'll leave the issue open. If anyone knows how to do this (check whether a file is corrupted and restart the download), feel free to open a pull request!
@jaintj95 it might be that a lot of those were ePub files which were actually PDF files (as no ePub existed). This has been fixed since PR #25, which may already help. (I'd check whether it is part of your local clone; see also PR #26.) PDF files could be checked with e.g. PyPDF2, and ePub files could be checked with zipfile (an ePub is a zip archive underneath). This would make the download slower, of course. There is also the question of how many times to retry and what to do once all retries are used up.
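To make the suggestion above concrete, here is a minimal sketch of the two checks. It is not the code from any PR in this repository. The ePub check uses the standard-library `zipfile` module as proposed. For PDFs, the sketch deliberately swaps PyPDF2 for a much cheaper stand-in that only looks for the `%PDF-` header and an `%%EOF` marker near the end of the file. This catches truncated downloads and mislabeled files, but it does not parse the document the way PyPDF2 would. The function names `is_valid_epub` and `looks_like_pdf` are made up for illustration.

```python
import zipfile
import zlib


def is_valid_epub(path):
    """Return True if the file is a readable zip archive with intact CRCs."""
    if not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first corrupt member, or None.
            return archive.testzip() is None
    except (zipfile.BadZipFile, zlib.error, OSError):
        return False


def looks_like_pdf(path):
    """Cheap structural check (assumption: header plus EOF marker is enough)."""
    try:
        with open(path, "rb") as f:
            header = f.read(5)
            f.seek(0, 2)                 # jump to the end to learn the size
            size = f.tell()
            f.seek(max(0, size - 1024))  # read only the last 1 KiB
            tail = f.read()
    except OSError:
        return False
    return header == b"%PDF-" and b"%%EOF" in tail
```

A file saved as `.epub` that is really a PDF fails `is_valid_epub` but passes `looks_like_pdf`, which is one way to spot the mislabeled downloads mentioned above.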
- check whether PDFs are valid using PyPDF2
- check whether ePubs (=Zips) are valid using zipfile
- try 3 times and then give up and continue
- print error information
- can be recovered/tried again by running the downloader again
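The retry behaviour described in this list could look roughly like the sketch below. It is only an illustration, not the code from the commit. `download_with_retries`, `download` and `is_valid` are hypothetical names: `download` is any callable that writes the file to `path`, and `is_valid` is any callable that returns True for an intact file. Deleting the bad file after a failed attempt assumes the downloader skips files that already exist, so that a later run fetches the missing ones again.

```python
import os


def download_with_retries(download, is_valid, path, max_tries=3):
    """Call download(path) until is_valid(path) passes or tries run out."""
    for attempt in range(1, max_tries + 1):
        try:
            download(path)
            if is_valid(path):
                return True
            error = "file failed validation"
        except OSError as exc:
            error = str(exc)
        # Print error information, then clean up before the next attempt.
        print(f"attempt {attempt}/{max_tries} failed for {path}: {error}")
        # Remove the bad file so a later run does not mistake it for a
        # finished download (assumes existing files are skipped).
        if os.path.exists(path):
            os.remove(path)
    # Give up on this file; the caller can continue with the next one.
    return False
```

The caller would loop over all books, collect the paths for which this returns False, and move on, so one bad file does not stop the whole run.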
I used the script to download 14GB worth of files, and more than 60% of them turned out to be corrupt due to incomplete downloads.
It would be great if we could somehow check that a downloaded file is not corrupt.
If corrupt: reinitiate download.