Check file after downloading and redownload if corrupt? #19

jaintj95 · 2020-04-12T09:28:19Z

I used the script to download 14 GB worth of files, and more than 60% of them turned out to be corrupt due to incomplete downloads.
It would be great if we could somehow check that each downloaded file is not corrupt.
If it is corrupt, restart the download.
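Since the corruption described here comes from incomplete downloads, one cheap first check is to compare the number of bytes written to disk with the `Content-Length` the server announced. This is a minimal sketch, assuming the HTTP client exposes response headers as a mapping (as `requests` does with its case-insensitive `Response.headers`); `is_complete` is a hypothetical helper, not part of the script:

```python
import os
from typing import Mapping, Optional


def is_complete(path: str, headers: Mapping[str, str]) -> Optional[bool]:
    """Hypothetical helper: compare the size on disk with Content-Length.

    Returns True/False when the header is present, or None when the server
    did not announce a length (e.g. chunked responses), in which case this
    check cannot tell anything.
    """
    expected = headers.get("Content-Length")
    if expected is None:
        return None
    return os.path.getsize(path) == int(expected)
```

This only catches truncation, and only when the body is not transfer-compressed (with gzip encoding the header counts compressed bytes). A file that is complete but still invalid needs a format-specific check.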

emmaKts · 2020-04-13T16:50:26Z

I tried this out of curiosity and the same happened to me: most of the files were corrupted. I have written my own script anyway, but this is worth mentioning since a lot of people are using this one.

Thanks for this though @alexgand, much appreciated!

VikashKothary · 2020-04-13T18:26:06Z

@emmaKts How does your script differ from this one in a way that prevents the corruption issue?

alexgand · 2020-04-15T21:40:57Z

All my downloads were fine here; perhaps the issue is related to the quality of the connection.

I'll leave the issue open. If anyone knows how to do this (check whether the file is corrupted and restart the download), feel free to open a pull request!

pjungermann · 2020-04-16T00:47:47Z

@jaintj95 It might be that a lot of those were ePub files that were actually PDF files (because no ePub existed). This has been fixed since PR #25, so maybe that already helps. (I'd check whether the fix is part of your local clone; see also PR #26.)
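To find files that were saved with the wrong extension, the real format can be read from the first bytes instead of trusting the file name: PDFs start with `%PDF-` and ePubs, being ZIP archives, start with the ZIP local-header signature `PK\x03\x04`. A minimal sketch; `sniff_format` and `fix_extension` are hypothetical helper names, not functions from the script or from PR #25:

```python
import os


def sniff_format(path: str) -> str:
    """Guess the real format from the file's magic bytes, ignoring its extension."""
    with open(path, "rb") as f:
        head = f.read(5)
    if head.startswith(b"%PDF-"):
        return "pdf"
    if head.startswith(b"PK\x03\x04"):
        return "zip"  # ePub files are ZIP archives
    return "unknown"


def fix_extension(path: str) -> str:
    """Rename a PDF that was saved under another extension (e.g. .epub) to .pdf.

    Returns the (possibly new) path; other files are left untouched.
    """
    root, ext = os.path.splitext(path)
    if sniff_format(path) == "pdf" and ext.lower() != ".pdf":
        new_path = root + ".pdf"
        os.replace(path, new_path)
        return new_path
    return path
```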

A check for PDF files could be done using e.g. PyPDF2, and ePub files could be checked using zipfile (since an ePub is a ZIP archive underneath). This would make it slower, of course. There is also the question of how often to retry and how to react when all retries are used up.
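The ePub check maps directly onto the standard `zipfile` module: a truncated download usually lacks the central directory at the end of the archive, so opening it raises `zipfile.BadZipFile`, and `ZipFile.testzip()` verifies each member's CRC. For PDFs, the sketch below swaps in a cheap standard-library heuristic instead of PyPDF2 (a full PyPDF2 parse would be stricter but slower): a complete PDF starts with `%PDF-` and has an `%%EOF` marker near its end. The function names are hypothetical:

```python
import zipfile
import zlib


def epub_is_valid(path: str) -> bool:
    """Hypothetical helper: an ePub is a ZIP archive, so validate it as one."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() reads every member and returns the first bad name, or None.
            return zf.testzip() is None
    except (zipfile.BadZipFile, zlib.error, EOFError, OSError):
        return False


def pdf_looks_complete(path: str, tail: int = 1024) -> bool:
    """Hypothetical heuristic (not PyPDF2): check the '%PDF-' header and an
    '%%EOF' marker within the last `tail` bytes of the file."""
    try:
        with open(path, "rb") as f:
            if not f.read(5).startswith(b"%PDF-"):
                return False
            f.seek(0, 2)  # jump to the end to learn the file size
            size = f.tell()
            f.seek(max(0, size - tail))
            return b"%%EOF" in f.read()
    except OSError:
        return False
```

The PDF heuristic catches truncated downloads but not damage in the middle of a file; that is the trade-off for not parsing the whole document.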

- check whether PDFs are valid using PyPDF2
- check whether ePubs (=Zips) are valid using zipfile
- try 3 times and then give up and continue
- print error information
- can be recovered/tried again by running the downloader again
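The retry policy in the list above can be sketched as a small wrapper. Here `download` and `is_valid` are hypothetical callables passed in (for instance the script's own download step and one of the validators for the file's format); they are assumptions, not existing functions of the script. Deleting the file after the last failed attempt assumes the downloader skips files that already exist, so that simply running it again retries only the failures:

```python
import os
from typing import Callable


def download_with_retries(
    url: str,
    path: str,
    download: Callable[[str, str], None],
    is_valid: Callable[[str], bool],
    attempts: int = 3,
) -> bool:
    """Download `url` to `path`, re-downloading until `is_valid(path)` passes.

    Returns False after `attempts` failed tries so the caller can log the
    failure and continue with the next book instead of aborting the run.
    """
    for attempt in range(1, attempts + 1):
        try:
            download(url, path)
            if is_valid(path):
                return True
            reason = "file failed validation"
        except Exception as exc:  # network errors, timeouts, ...
            reason = repr(exc)
        print(f"[{attempt}/{attempts}] {url}: {reason}")
    if os.path.exists(path):
        os.remove(path)  # a later run then sees the file as missing and retries it
    return False
```

Collecting the URLs for which this returns False gives the error summary to print at the end of a run.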
