Check file after downloading and redownload if corrupt? #19

Open
jaintj95 opened this issue Apr 12, 2020 · 4 comments

Comments

@jaintj95

I used the script to download 14 GB of files, and more than 60% of them turned out to be corrupt due to incomplete downloads.
It would be great if the script could check that each downloaded file is not corrupt.
If it is corrupt: restart the download.

@emmaKts

emmaKts commented Apr 13, 2020

I tried this out of curiosity and the same thing happened to me: most of the files were corrupted. I have since written my own script, but it's worth mentioning because a lot of people are using this one.

Thanks for this though @alexgand, much appreciated!

@VikashKothary
Contributor

@emmaKts How does your script differ from this one so that it avoids the corruption issue?

@alexgand
Owner

All downloads were fine here, so perhaps the issue is related to the quality of the connection.

I'll leave the issue open. If anyone knows how to do this (check whether a file is corrupted and restart the download), feel free to open a pull request!

@pjungermann
Contributor

pjungermann commented Apr 16, 2020

@jaintj95 It may be that many of those were files saved as ePub that were actually PDFs (because no ePub existed). This has been fixed since PR #25, so that may already help. (I'd check whether the fix is part of your local clone; see also PR #26.)

PDF files could be checked with e.g. PyPDF2, and ePub files with zipfile (an ePub is ultimately a zip archive). This would make the download slower, of course. There is also the question of how often to retry and how to react once all retries are used up.
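To make the idea concrete, here is a rough sketch of what such checks might look like. The ePub check uses the standard-library zipfile module as suggested above. For PDFs, this sketch does not use PyPDF2; it swaps in a much cheaper stdlib-only heuristic (header plus `%%EOF` marker near the end) that should catch truncated downloads but will not catch every kind of damage. The function names `is_valid_epub` and `looks_like_complete_pdf` and the `tail_size` parameter are made up for illustration and are not part of the script.

```python
import os
import zipfile


def is_valid_epub(path):
    """Return True if the file opens as a zip and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first corrupt member, or None.
            return archive.testzip() is None
    except (zipfile.BadZipFile, OSError):
        # Not a zip at all (e.g. truncated download) or unreadable file.
        return False


def looks_like_complete_pdf(path, tail_size=1024):
    """Heuristic only: PDF header at the start and an %%EOF marker near the end.

    This is a stand-in for a real parser such as PyPDF2 (assumption: a
    truncated download usually loses the trailing %%EOF marker).
    """
    try:
        size = os.path.getsize(path)
        with open(path, "rb") as f:
            header = f.read(5)
            f.seek(max(0, size - tail_size))
            tail = f.read()
    except OSError:
        return False
    return header == b"%PDF-" and b"%%EOF" in tail
```

A full parse with a PDF library would be stricter but slower, which is the trade-off mentioned above.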

pjungermann added a commit to pjungermann/springer_free_books that referenced this issue Apr 16, 2020
- check whether PDFs are valid using PyPDF2
- check whether ePubs (=Zips) are valid using zipfile
- try 3 times and then give up and continue
    - print error information
    - can be recovered/tried again by running the downloader again
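The retry behaviour described in the commit message could be sketched roughly as follows. This is not the code from the commit, just an illustration of "try 3 times, print the error, give up and continue". The name `download_with_retries` is hypothetical, and `download` and `is_valid` are placeholder callables standing in for the script's real download step and for a validity check.

```python
import os


def download_with_retries(download, is_valid, path, max_attempts=3):
    """Call download(path) until is_valid(path) passes, at most max_attempts times.

    Returns True on success and False after giving up, so the caller can
    simply continue with the next file.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            download(path)
        except Exception as exc:
            print(f"attempt {attempt}/{max_attempts} failed for {path}: {exc}")
        else:
            if is_valid(path):
                return True
            print(f"attempt {attempt}/{max_attempts} produced a corrupt file: {path}")
        # Remove the partial or corrupt file so a later run does not
        # mistake it for a finished download.
        if os.path.exists(path):
            os.remove(path)
    print(f"giving up on {path}; run the downloader again to retry")
    return False
```

Deleting the bad file after each failed attempt is what makes "run the downloader again" work as a recovery step, assuming the script skips files that already exist.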