Warn about size of a downloaded page? #183

parkr · 2015-02-23T08:43:03Z

I have the feeling when checking links, HTML Proofer downloads the entire thing. If you link out to a big archive, therefore, then HTML Proofer has to download the whole archive. Are we checking the Content-Length header to ensure it's not too large? Are we limiting downloading to HTML pages only?

Thanks :)
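To make the question concrete, here is a minimal sketch of the kind of check being asked about: deciding from response headers alone (for example, those returned by a HEAD request) whether a body is worth downloading. The helper name `worth_downloading?`, the 5 MiB limit, and the choice to allow responses that lack a Content-Length are all assumptions for illustration, not anything HTML Proofer is known to do.

```ruby
# Hypothetical size limit, chosen only for illustration.
MAX_BYTES = 5 * 1024 * 1024

# Hypothetical helper: returns true when the headers describe an HTML
# response that is no larger than max_bytes.
def worth_downloading?(headers, max_bytes = MAX_BYTES)
  # Header names are case-insensitive, so normalize them first.
  normalized = headers.each_with_object({}) do |(name, value), acc|
    acc[name.to_s.downcase] = value.to_s
  end

  # Only HTML pages are candidates for a full download.
  return false unless normalized.fetch("content-type", "").start_with?("text/html")

  length = normalized["content-length"]
  # Assumption: with no Content-Length the size is unknown, so allow it.
  return true if length.nil? || length.empty?

  length.to_i <= max_bytes
end
```

With this sketch, `worth_downloading?({ "Content-Type" => "application/zip", "Content-Length" => "1024" })` is false because the type is not HTML, regardless of size.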


parkr · 2015-02-23T09:32:32Z

Typhoeus timeout and connecttimeout options are crucial to happiness. The default is 300 seconds. Why.
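For reference, a minimal sketch of passing explicit timeouts to a Typhoeus request. The values (30 and 10 seconds), the constant name, and the helper `build_request` are assumptions for illustration, not HTML Proofer's actual settings.

```ruby
# Illustrative values only, not defaults from any library.
TYPHOEUS_TIMEOUTS = {
  timeout: 30,        # upper bound, in seconds, on the whole request
  connecttimeout: 10  # upper bound, in seconds, on establishing the connection
}.freeze

# Hypothetical helper: builds (but does not run) a request with the timeouts applied.
def build_request(url, options = TYPHOEUS_TIMEOUTS)
  require "typhoeus" # loaded lazily; needs the typhoeus gem installed
  Typhoeus::Request.new(url, options.dup)
end
```

Calling `build_request("http://example.com").run` would then give up after the configured limits rather than waiting on a slow or unresponsive server.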

gjtorikian · 2015-02-23T16:07:56Z

> I have the feeling when checking links, HTML Proofer downloads the entire thing.

FWIW it actually doesn't do this. First a HEAD request is made to see if a link is valid. If that fails, it moves on to making a GET. Some servers are not properly configured to respond to HEAD, so a HEAD on its own might fail as a false negative:

html-proofer/lib/html/proofer/url_validator.rb

Lines 32 to 35 in 019a410

    
           # Finally, we'll first make a HEAD request, rather than GETing all the contents. 
        
           # If the HEAD fails, we'll fall back to GET, as some servers are not configured 
        
           # for HEAD. If we've decided to check for hashes, we must do a GET--HEAD is 
        
           # not an option.
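The HEAD-then-GET fallback described above can be sketched roughly as follows. This is not the code from `url_validator.rb`: it uses Ruby's standard `net/http` instead of Typhoeus, the names `fetch_status` and `link_ok?` are hypothetical, the timeout values are arbitrary, and redirects are not followed.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: performs one request and returns the status code as an Integer.
def fetch_status(url, verb)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https",
                                      open_timeout: 10, read_timeout: 30) do |http|
    request = verb == :head ? Net::HTTP::Head.new(uri) : Net::HTTP::Get.new(uri)
    http.request(request).code.to_i
  end
end

# Tries a cheap HEAD first and falls back to GET only when HEAD does not succeed.
# The requester is injectable so the logic can be exercised without a network.
def link_ok?(url, requester: method(:fetch_status))
  return true if (200..299).cover?(requester.call(url, :head))

  # Some servers reject or mishandle HEAD, so retry with GET before failing the link.
  (200..299).cover?(requester.call(url, :get))
end
```

A server that answers HEAD with 405 but GET with 200 is still reported as valid, which is the false negative the fallback is meant to avoid.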

parkr closed this as completed Feb 23, 2015
