Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sphinx linkcheck and broken/redirect occurrences in Python Docs #103484

Open
rffontenelle opened this issue Apr 12, 2023 · 29 comments
Open

Sphinx linkcheck and broken/redirect occurrences in Python Docs #103484

rffontenelle opened this issue Apr 12, 2023 · 29 comments
Labels
3.11 only security fixes 3.12 bugs and security fixes docs Documentation in the Doc dir

Comments

@rffontenelle
Copy link
Contributor

rffontenelle commented Apr 12, 2023

Running make linkcheck in Doc folder outputs thousands of broken and redirected status. It would be nice to clean this up so we can link-check Python in CI/CD for each commit, but right now it is too polluted.

Some of these occurrences could/should be fixed in the docs itself, others can benefit from Sphinx's linkcheck configs e.g. linkcheck_ignore and linkcheck_allow_redirect.

See linkcheck-log.txt for the full log as of commit f2b7ecb.

Fun stats obtained from the above file:

  • Of its 8327 lines, where 7824 are related to BPO -> GH issues. Of 7824, 5744 lines are from whatsnew/changelog and only 20 are not from whatsnew/
  • 241 lines are redirection of CPython CVS URL, fixing /tree/ to /blob/ in GitHub URL
  • 28 lines are broken links, where one broken link is 'https://' from an example in whatsnet/changelog

The way I see, this steps divide in:

  • Clean BPO to GH redirection messages
  • Fix broken links
  • Clean CPython CVS URL redirections
  • Clean GH issues to GH PR redirections
  • Fix the rest of the occurrences.

linkcheck_allow_redirect and linkcheck_ignore can be very handy in this case. linkcheck_allow_redirect makes 'ok' status a redirect that is being spotted by linkcheck, and we have linkcheck_ignore as the last resource.

Questions I have before implementing the solution:

  • Documentation hosted by Read The Docs may have language enabled so example.com is redirected to example.com/en/latest. To handle occurrences, I could add them to linkcheck_allow_redirect or we can use sphinx-ext-intersphinx to map a keyword to the documentation URL (e.g 'rtd' for read-the-docs docs). The last option allows to map to proper language of the target URL linked, similar on how Weblate did)
  • Is there any restrictions to fix broken/redirect links in old whatsnew/.rst?
  • Is there any restrictions to fix broken/redirect links in old whatsnew/changelog.rst (i.e. Misc/NEWS)?
  • Should I create a single Pull Request for all the fixes?

Linked PRs

@rffontenelle rffontenelle added the docs Documentation in the Doc dir label Apr 12, 2023
@merwok
Copy link
Member

merwok commented Apr 12, 2023

What about ignoring the 7824 bpo redirects? They work fine, we control bpo and can keep the redirector working, we don’t need to change 8000 lines in historical whatsnew docs.

The CVS redirections can be fixed, as well as the broken links.

@hugovk
Copy link
Member

hugovk commented Apr 12, 2023

(For reference, some earlier discussion at https://discuss.python.org/t/sphinx-linkcheck-and-broken-redirect-occurrences-in-python-docs/25687)

  • Of its 8327 lines, where 7824 are related to BPO -> GH issues. Of 7824, 5744 lines are from whatsnew/changelog and only 20 are not from whatsnew/

Yes, let's ignore the BPO redirects. There's so many, I think either linkcheck_allow_redirect or linkcheck_ignore, whichever is quicker for linkcheck to process.

We actually have a linkcheck_ignore for BPO, but the regex doesn't match, I guess it's gone out of sync:

linkcheck_ignore = [r'https://bugs.python.org/(issue)?\d+']

  • 241 lines are redirection of CPython CVS URL, fixing /tree/ to /blob/ in GitHub URL

Please could you give some examples of these?

  • Documentation hosted by Read The Docs may have language enabled so example.com is redirected to example.com/en/latest. To handle occurrences, I could add them to linkcheck_allow_redirect or we can use sphinx-ext-intersphinx to map a keyword to the documentation URL (e.g 'rtd' for read-the-docs docs). The last option allows to map to proper language of the target URL linked, similar on how Weblate did)

If there's not too many, we could add to linkcheck_allow_redirect, similar to the devguide. There may be some other useful rules to copy over:

https://github.com/python/devguide/blob/79e1a1f2961ba0882d5f0274f7c8876e36ca5dff/conf.py#L60-L107

  • Is there any restrictions to fix broken/redirect links in old whatsnew/.rst?
  • Is there any restrictions to fix broken/redirect links in old whatsnew/changelog.rst (i.e. Misc/NEWS)?

Not as such, although effort is usually prioritised for the newer/more active pages. Cleaning them up would mean linkcheck output is more useful in the future.

The way I see, this steps divide in:

  • Clean BPO to GH redirection messages
  • Fix broken links
  • Clean CPython CVS URL redirections
  • Clean GH issues to GH PR redirections
  • Fix the rest of the occurrences.

...

  • Should I create a single Pull Request for all the fixes?

Maybe a PR for each of these steps? And split smaller if you feel they're too big. Smaller PRs are easier to review and get merged.

@rffontenelle
Copy link
Contributor Author

  • 241 lines are redirection of CPython CVS URL, fixing /tree/ to /blob/ in GitHub URL

Please could you give some examples of these?

Sure. The file Doc/about.rst has :source:`Misc/ACKS` , which links to that file in CPython's source code repository. The custom role "source" is defined in Doc/tools/extensions/pyspecific.py to https://github.com/python/cpython/tree/main/ + the suffix.

GitHub URL for browsing files is /blob/ and directories is /tree/. When browsing Misc/ACKS, the /tree/ redirects to /blob/.

From linkcheck stdout:

about: line   33) redirect  https://github.com/python/cpython/tree/main/Misc/ACKS - permanently to https://github.com/python/cpython/blob/main/Misc/ACKS

See in the above log how the redirect was from /tree/ to /blob/.

  • Is there any restrictions to fix broken/redirect links in old whatsnew/.rst?
  • Is there any restrictions to fix broken/redirect links in old whatsnew/changelog.rst (i.e. Misc/NEWS)?

Not as such, although effort is usually prioritised for the newer/more active pages. Cleaning them up would mean linkcheck output is more useful in the future.

Great, I'll focus on the cleanup then.

  • Should I create a single Pull Request for all the fixes?

Maybe a PR for each of these steps? And split smaller if you feel they're too big. Smaller PRs are easier to review and get merged.

Will do, thanks for the suggestion.

@hugovk
Copy link
Member

hugovk commented Apr 13, 2023

Thanks, we can add the tree->blob redirects to linkcheck_allow_redirect like: https://github.com/python/devguide/blob/main/conf.py#L73-L74

@merwok
Copy link
Member

merwok commented Apr 13, 2023

Why not do the one-line fix to pyspecific rather?

@rffontenelle
Copy link
Contributor Author

rffontenelle commented Apr 13, 2023

Because in :source:`something` the URL has /tree/ if something is a directory and /blob/ if it is a file. GitHub redirects both ways. If we put /blob/ in pyspecific.py, then a :source:`Misc/` would result in /blob/ being redirected to /tree/.

EDIT: See some examples:

$ curl -sI https://github.com/python/cpython/blob/main/Misc | grep -E '(^HTTP|^location:)'
HTTP/2 301 
location: https://github.com/python/cpython/tree/main/Misc
$ curl -sI https://github.com/python/cpython/tree/main/Misc/ACKS | grep -E '(^HTTP|^location:)'
HTTP/2 301 
location: https://github.com/python/cpython/blob/main/Misc/ACKS

@rffontenelle
Copy link
Contributor Author

Suggestion please?

bpo-31453: Add TLSVersion constants and SSLContext.maximum_version / minimum_version attributes. The new API wraps OpenSSL 1.1 https://www.openssl.org/docs/man1.1.0/ssl/SSL_CTX_set_min_proto_version.html feature.

The text talks about OpenSSL 1.1.0 and the URL for version 1.1.0 is broken (HTTP 404), but there are valid URLs for the same resource pointing to OpenSSL version 1.1.1 and to 3.1.0. Should I update the URL to either one, or use wayback machine to make sure to match the OpenSSL version?

@merwok
Copy link
Member

merwok commented Apr 13, 2023

If openssl 1.1.1 is a bugfix release of the 1.1 branch, then it would be fine to point there.

@rffontenelle
Copy link
Contributor Author

Created the first PR. I'm planning to create other 2 PRs, with more file changes than this one.

@rffontenelle
Copy link
Contributor Author

I noticed the docs.yml workflow file runs for all branches from main to 3.7. Should I back-port the fixes to all these versions (cases were the bot won't be able to back-port automatically) or is there some code freeze that prevents it?

@merwok merwok added 3.11 only security fixes 3.12 bugs and security fixes labels Apr 16, 2023
@merwok
Copy link
Member

merwok commented Apr 16, 2023

Backporting doc fixes to 3.11 is fine, but older branches only accept security fixes.

@rffontenelle
Copy link
Contributor Author

Got it. It means a linkcheck job in docs.yml will require a condition check to exclude versions < 3.11. Thanks.

@merwok
Copy link
Member

merwok commented Apr 16, 2023

A workflow in the main branch applies to other branches?! What a strange system.

@rffontenelle
Copy link
Contributor Author

rffontenelle commented Apr 16, 2023

Yep. It is a centralized way to control what routines will run in which events (push, pull requests, etc.) and in which branches.

@hugovk
Copy link
Member

hugovk commented Apr 17, 2023

@rffontenelle
Copy link
Contributor Author

I was sure until now. I mean, what's the point of having branch filters with they don't actually filter on which branch a workflow can or cannot be run?

on:
  push:
    branches:
      - main
      - '3.11'
      - '3.10'
    ...

@hugovk
Copy link
Member

hugovk commented Apr 17, 2023

I expect it's to have as few changes as possible between branches to make it easier for the bot to backport (and for humans when the bot can't do it).

@rffontenelle
Copy link
Contributor Author

Well, I stand corrected. That means a linkcheck job will need to be added to both main and 3.11. No problem.

hugovk added a commit that referenced this issue Apr 18, 2023
…es (#103569)

Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Apr 18, 2023
…st cases (pythonGH-103569)

(cherry picked from commit f39e00f)

Co-authored-by: Rafael Fontenelle <rffontenelle@users.noreply.github.com>
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
@hugovk
Copy link
Member

hugovk commented Apr 18, 2023

We probably don't want to run linkcheck on the CI for a couple of reasons:

  • it takes about 50 minutes to run
  • we can get new failures when things change external to a PR, like a website going down or temporary network failures

miss-islington added a commit that referenced this issue Apr 18, 2023
…es (GH-103569)

(cherry picked from commit f39e00f)

Co-authored-by: Rafael Fontenelle <rffontenelle@users.noreply.github.com>
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
@rffontenelle
Copy link
Contributor Author

Fixes to broken links now submitted.

carljm added a commit to carljm/cpython that referenced this issue Apr 20, 2023
* main: (24 commits)
  pythongh-98040: Move the Single-Phase Init Tests Out of test_imp (pythongh-102561)
  pythongh-83861: Fix datetime.astimezone() method (pythonGH-101545)
  pythongh-102856: Clean some of the PEP 701 tokenizer implementation (python#103634)
  pythongh-102856: Skip test_mismatched_parens in WASI builds (python#103633)
  pythongh-102856: Initial implementation of PEP 701 (python#102855)
  pythongh-103583: Add ref. dependency between multibytecodec modules (python#103589)
  pythongh-83004: Harden msvcrt further (python#103420)
  pythonGH-88342: clarify that `asyncio.as_completed` accepts generators yielding tasks (python#103626)
  pythongh-102778: IDLE - make sys.last_exc available in Shell after traceback (python#103314)
  pythongh-103582: Remove last references to `argparse.REMAINDER` from docs (python#103586)
  pythongh-103583: Always pass multibyte codec structs as const (python#103588)
  pythongh-103617: Fix compiler warning in _iomodule.c (python#103618)
  pythongh-103596: [Enum] do not shadow mixed-in methods/attributes (pythonGH-103600)
  pythonGH-100530: Change the error message for non-class class patterns (pythonGH-103576)
  pythongh-95299: Remove lingering setuptools reference in installer scripts (pythonGH-103613)
  [Doc] Fix a typo in optparse.rst (python#103504)
  pythongh-101100: Fix broken reference `__format__` in `string.rst` (python#103531)
  pythongh-95299: Stop installing setuptools as a part of ensurepip and venv (python#101039)
  pythonGH-103484: Docs: add linkcheck allowed redirects entries for most cases (python#103569)
  pythongh-67230: update whatsnew note for csv changes (python#103598)
  ...
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 16, 2024
…ythonGH-123019)

(cherry picked from commit 1054a75)

Co-authored-by: Rafael Fontenelle <rffontenelle@users.noreply.github.com>
AA-Turner pushed a commit that referenced this issue Sep 16, 2024
…H-123019) (#124136)

GH-103484: Tell linkcheck to ignore debian manpage redirects (GH-123019)
(cherry picked from commit 1054a75)

Co-authored-by: Rafael Fontenelle <rffontenelle@users.noreply.github.com>
gpshead pushed a commit that referenced this issue Sep 17, 2024
Fix redirects reported by linkcheck, update docs conf.py checks.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 17, 2024
…honGH-124144)

Fix redirects reported by linkcheck, update docs conf.py checks.
(cherry picked from commit 0a32c69)

Co-authored-by: Rafael Fontenelle <rffontenelle@users.noreply.github.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 17, 2024
…honGH-124144)

Fix redirects reported by linkcheck, update docs conf.py checks.
(cherry picked from commit 0a32c69)

Co-authored-by: Rafael Fontenelle <rffontenelle@users.noreply.github.com>
gpshead pushed a commit that referenced this issue Sep 17, 2024
…-124144) (GH-124152)

Fix redirects reported by linkcheck, update docs conf.py checks.
(cherry picked from commit 0a32c69)

Co-authored-by: Rafael Fontenelle <rffontenelle@users.noreply.github.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 17, 2024
)

(cherry picked from commit ab80c6b)

Co-authored-by: Rafael Fontenelle <rffontenelle@users.noreply.github.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 17, 2024
)

(cherry picked from commit ab80c6b)

Co-authored-by: Rafael Fontenelle <rffontenelle@users.noreply.github.com>
@vstinner
Copy link
Member

Fixed by ab80c6b. Backports will follow.

vstinner pushed a commit that referenced this issue Sep 17, 2024
…124180)

GH-103484: Fix broken links reported by linkcheck (GH-124169)
(cherry picked from commit ab80c6b)

Co-authored-by: Rafael Fontenelle <rffontenelle@users.noreply.github.com>
@rffontenelle
Copy link
Contributor Author

@vstinner I was hoping to include at least another PR for "redirect with found" reported by linkcheck, (see above comment #103484 (comment)), and then we can close this. Can you please keep this open for now?

@AA-Turner AA-Turner reopened this Sep 17, 2024
savannahostrowski pushed a commit to savannahostrowski/cpython that referenced this issue Sep 22, 2024
…honGH-124144)

Fix redirects reported by linkcheck, update docs conf.py checks.
Yhg1s pushed a commit that referenced this issue Sep 23, 2024
…H-123019) (#124137)

GH-103484: Tell linkcheck to ignore debian manpage redirects (GH-123019)
(cherry picked from commit 1054a75)

Co-authored-by: Rafael Fontenelle <rffontenelle@users.noreply.github.com>
Yhg1s pushed a commit that referenced this issue Sep 23, 2024
…124179)

GH-103484: Fix broken links reported by linkcheck (GH-124169)
(cherry picked from commit ab80c6b)

Co-authored-by: Rafael Fontenelle <rffontenelle@users.noreply.github.com>
Yhg1s pushed a commit that referenced this issue Sep 24, 2024
…-124144) (#124151)

GH-103484: Fix permanently redirects reported by linkcheck (GH-124144)

Fix redirects reported by linkcheck, update docs conf.py checks.
(cherry picked from commit 0a32c69)

Co-authored-by: Rafael Fontenelle <rffontenelle@users.noreply.github.com>
willingc pushed a commit that referenced this issue Oct 30, 2024
….rst (GH-124183)

Relink _xxsubinterpretersmodule.c in whatsnew/3.12.rst
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes 3.12 bugs and security fixes docs Documentation in the Doc dir
Projects
None yet
Development

No branches or pull requests

5 participants