Proofer falsely reports failure for CGI-escaped URLs #385

Closed
lucperkins opened this issue Feb 18, 2017 · 3 comments

@lucperkins

The following URL returns a 200 but is currently flagged as a failure by html-proofer:

https://reflect-io-packages.s3.amazonaws.com/reflect-agent/reflect-agent_0.4.6-1~bpo8%2B1.290_amd64.deb

I suspect that the issue may have something to do with CGI escaping in the URLs (note the %2B). Here's the error message from html-proofer:

External link https://reflect-io-packages.s3.amazonaws.com/reflect-agent/reflect-agent_0.4.6-1~bpo7+1.290_amd64.deb failed: 403 No error

In the error output, the %2B seems to have been converted back to a +.

Does anyone know of a workaround for this? Is there a way to prevent html-proofer from de-escaping URLs? I'm happy to dig in and potentially submit a PR for this; I just wanted to check first whether this is intended behavior or whether others have encountered it.
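
For reference, here's a minimal Ruby sketch (just an illustration of what I think is happening, not html-proofer's actual code): un-escaping the path turns the %2B back into a literal +, which points at a different S3 object key.

require "cgi"

# The URL as it appears in my HTML (returns 200).
escaped = "https://reflect-io-packages.s3.amazonaws.com/reflect-agent/reflect-agent_0.4.6-1~bpo8%2B1.290_amd64.deb"

# Un-escaping it somewhere along the way turns %2B into a literal +,
# which names a different S3 object and comes back as 403.
unescaped = CGI.unescape(escaped)
puts unescaped
# => https://reflect-io-packages.s3.amazonaws.com/reflect-agent/reflect-agent_0.4.6-1~bpo8+1.290_amd64.deb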

@gjtorikian
Owner

I believe this might actually be the result of the script not having access to the S3 package; the 403 status points to that.

Even when I try to access this link in the browser, I get Amazon's default access-denied response:

<Error>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
<RequestId>C40F353A3530B218</RequestId>
<HostId>
WjuU7TDeSh2LDC9rFF0WQgkKz+pMdlVzYscIRBsloctMTwwp+VTE2B9FhPq6xHgrXB7qoxg+iJc=
</HostId>
</Error>

Is this file configured to be readable?

@lucperkins
Author

lucperkins commented Feb 21, 2017

My apologies for the poor explanation! The issue is that the URL given in the error message returns a 403. Here's the URL that's actually in my HTML (which returns a 200):

https://reflect-io-packages.s3.amazonaws.com/reflect-agent/reflect-agent_0.4.6-1~bpo7%2B1.290_amd64.deb

So it looks like html-proofer may be un-CGI-escaping the URLs before it checks them. As a workaround, I'm ignoring 403 errors and using HTTParty to check those URLs directly in my rake tasks (a rough sketch is below). I'll see if I can work things out and submit a PR if I find a solution.
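
Roughly, the side-check looks like this (a simplified sketch; the task name and URL list are placeholders rather than my actual Rakefile):

require "httparty"

# URLs with CGI escapes that html-proofer currently mangles; checked separately here.
ESCAPED_URLS = [
  "https://reflect-io-packages.s3.amazonaws.com/reflect-agent/reflect-agent_0.4.6-1~bpo7%2B1.290_amd64.deb"
]

desc "Verify CGI-escaped URLs that html-proofer can't check as-is"
task :check_escaped_urls do
  ESCAPED_URLS.each do |url|
    code = HTTParty.head(url).code
    abort "#{url} returned #{code}" unless code == 200
  end
end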

Also, let me use this opportunity to say that this is just an absolutely fantastic tool, 💯 out of 💯. Mad respect from a fellow tech writer and member of the Write the Docs community.

@gjtorikian
Owner

Also, let me use this opportunity to say that this is just an absolutely fantastic tool, 💯 out of 💯. Mad respect from a fellow tech writer and member of the Write the Docs community.

👍 Thanks! I'm glad you find it useful. 😁
