Fix outdated GitHub Docs links #1796

Merged: 6 commits, Apr 14, 2022

@mortenpi mortenpi commented Apr 11, 2022

The old GitHub Docs URLs have been returning 301 redirects. Updating them should hopefully fix linkcheck.
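
To find where a redirecting link now points, a HEAD request that prints the status code and the redirect target is usually enough. This is a minimal sketch and not part of this PR: the function name `redirect_target` and the `CURL` override variable are hypothetical, introduced only for illustration.

```shell
# Hypothetical helper (not from this PR): print "<status> <redirect target>"
# for a URL using a HEAD request, without following the redirect.
# CURL is an assumed override hook so that the command can be stubbed out.
redirect_target() {
    "${CURL:-curl}" -s -o /dev/null -I -w '%{http_code} %{redirect_url}\n' "$1"
}
```

Calling `redirect_target <url>` on an outdated link should print a 3xx code followed by the URL to update the link to. For a link that does not redirect, the second field is empty.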

The period at the end usually gets interpreted as part of the URL in
browsers, making it impossible to click on the links or copy the link
address.
@mortenpi
Member Author

I believe the 403s that linkcheck complains about are an issue on GitHub's end, as I am also seeing them locally (intermittently) along with some 503s. I would not merge this PR until linkcheck actually passes.

@mortenpi
Member Author

It is starting to look like the 403s on docs.github.com are not an intermittent problem. Instead, GitHub appears to have decided to block non-browser requests based on the User-Agent. For example, simply running curl -I (which should be sending a curl/7.68 UA):

$ curl -I https://docs.github.com/en
HTTP/2 403 
x-azure-ref: 0sadXYgAAAADZSTHMRxIAT5rY1kqwPFNHQUtMMzBFREdFMDMxOQA1OTZkNzhhMi1jYTVmLTQ3OWQtYmNkYy0wODM1ODMzMTc0YjI=
accept-ranges: bytes
date: Thu, 14 Apr 2022 04:48:49 GMT
via: 1.1 varnish
x-served-by: cache-akl10327-AKL
x-cache: MISS
x-cache-hits: 0
x-timer: S1649911729.432441,VS0,VE6
strict-transport-security: max-age=31557600

vs. when you spoof a browser UA:

$ curl -I --user-agent "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36" https://docs.github.com/en
HTTP/2 200 
cache-control: private, no-store
content-type: text/html; charset=utf-8
etag: "205a6-GRu78m11eWLYGT3vvB/H5ULI/5A"
set-cookie: _csrf=Yc_DxYKyMzpLByv2ZPRwPhMW; Path=/; HttpOnly; Secure; SameSite=Lax
access-control-allow-origin: *
content-security-policy: default-src 'none';prefetch-src 'self';connect-src 'self';font-src 'self' data: githubdocs.azureedge.net;img-src 'self' data: github.githubassets.com githubdocs.azureedge.net placehold.it *.githubusercontent.com github.com;object-src 'self';script-src 'self';frame-src https://graphql.github.com/ https://www.youtube-nocookie.com;style-src 'self' 'unsafe-inline';child-src 'self'
x-dns-prefetch-control: off
expect-ct: max-age=0
x-frame-options: SAMEORIGIN
x-download-options: noopen
x-content-type-options: nosniff
x-permitted-cross-domain-policies: none
referrer-policy: strict-origin-when-cross-origin
x-xss-protection: 0
x-powered-by: Next.js
x-azure-ref: 0qadXYgAAAACteYID9J/NRqwA1Kgj8QVXQUtMMzBFREdFMDMxMgA1OTZkNzhhMi1jYTVmLTQ3OWQtYmNkYy0wODM1ODMzMTc0YjI=
accept-ranges: bytes
date: Thu, 14 Apr 2022 04:48:42 GMT
via: 1.1 varnish
x-served-by: cache-akl10328-AKL
x-cache: CONFIG_NOCACHE, MISS
x-cache-hits: 0
x-timer: S1649911722.673083,VS0,VE339
vary: Accept-Encoding
strict-transport-security: max-age=31557600
content-length: 132518

To work around this, I changed linkcheck to spoof a realistic browser UA (somewhat arbitrarily picking the example Chrome UA from the Mozilla docs), which makes linkcheck pass again for the docs.github.com links.

I considered using the spoofed UA only as a fallback when the initial linkcheck request fails, but the complexity is probably not worth it. This does mean that, in theory, some linkchecks could now fail because a server considers this UA suspicious. But I think it should mostly have the opposite effect: any server that cares about the UA at this level will hopefully consider this UA more trustworthy.
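
For reference, the fallback variant mentioned above (not what this PR implements) could look roughly like the sketch below. It is an illustration in plain shell, not linkcheck's actual code: `http_status`, `check_link`, `BROWSER_UA` and the `CURL` override variable are hypothetical names, and retrying only on a 403 is an assumption about which failures are worth a second attempt.

```shell
# Assumed browser UA for illustration: the Chrome example string used in the
# curl command earlier in this thread.
BROWSER_UA="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"

# Hypothetical helper: print the HTTP status code of a HEAD request.
# An optional second argument is sent as the User-Agent.
# CURL is an assumed override hook so that the command can be stubbed out.
http_status() {
    if [ -n "${2:-}" ]; then
        "${CURL:-curl}" -s -o /dev/null -I -w '%{http_code}' --user-agent "$2" "$1"
    else
        "${CURL:-curl}" -s -o /dev/null -I -w '%{http_code}' "$1"
    fi
}

# Try the default UA first; retry with the browser UA only on a 403.
# Prints the final status code.
check_link() {
    status=$(http_status "$1")
    if [ "$status" = "403" ]; then
        status=$(http_status "$1" "$BROWSER_UA")
    fi
    printf '%s\n' "$status"
}
```

The cost of this design is a second request for every blocked link plus an extra code path to maintain, which is the complexity that always sending the browser UA avoids.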

@mortenpi mortenpi merged commit 78b6f67 into master Apr 14, 2022
@mortenpi mortenpi deleted the mp/fix-links branch April 14, 2022 23:28