Add configuration for running `codespell` locally #2151

DimitriPapadopoulos · 2021-11-15T14:04:03Z

Merely add codespell configuration files, following #2075 (comment):

.codespellrc is the root configuration file
.codespell/ignore-words.txt list of words that are false positives
.codespell/exclude-file.txt list of lines that contain false positives

Unlike the initial PR #2075, this PR focuses on configuration and does not attempt to add a spelling check to the CI.

A few notes:

Codespell checks the current directory for a file named setup.cfg or .codespellrc. I have opened Add tox.ini to the list of default config files codespell-project/codespell#2087, but the request hasn't been accepted yet. In the meantime, in the absence of a setup.cfg file, I have added a new hidden configuration file .codespellrc
One of the caveats is the necessity to maintain a list of false positives. There are not too many of them, but contributors might have to add a new false positive to the list from time to time.
The location of the list of words that are false positives is:
.codespell/ignore_words.txt
The location of the list of entire lines that are false positives is:
.codespell/exclude-file.txt
Some false positives that appear only in URIs are listed after option uri-ignore-words-list in the main configuration file:
.codespellrc
The default dictionaries used by codespell are clear and rare. While rare might catch a few additional typos, it raises a few additional false positives, such as complies, therefor and theses.

DimitriPapadopoulos · 2021-11-15T14:05:43Z

@hugovk This is still a draft, it adds configuration files but no Makefile target - yet.

CAM-Gerlach

Instead of adding it as a hard requirement which gets installed on all platforms, CIs and locally, (regardless if the user actually wants to perform spell checking), and in the same environment as the build (potential for dep conflicts), I strongly suggest instead just adding it as Pre-Commit check with hook-stage: manual, so it only runs on-demand, and run it via pre-commit run codespell (perhaps with an alias in the Makefile, for completeness)?

Pre-commit will then take care of installing it if/when it is needed in its own isolated environment with a consistent pinned version for all devs (with easy updates via pre-commit autoupdate), and it works cross-platform unlike relying solely on the Makefile, while still being entirely optional (and not installed for all users and CIs by default).

hugovk · 2021-11-15T19:44:19Z

hook-stage: manual is new to me, that sounds perfect!

Then the command wouldn't be pre-commit run codespell but pre-commit run --hook-stage manual codespell [docs], and it can go in the Makefile so you can run make spellcheck:

 lint:
        pre-commit --version > /dev/null || python3 -m pip install pre-commit
        pre-commit run --all-files

+spellcheck:
+       pre-commit --version > /dev/null || python3 -m pip install pre-commit
+       pre-commit run --hook-stage manual codespell

Add codespell configuration files: * `.codespellrc` is the root configuration file * `.codespell/ignore-words.txt` list of words that are false positives * `.codespell/exclude-file.txt` list of lines that contain false positives Add a pre-commit hook, confined to the manual stage. Also add a Makefile target that will call the pre-commit hook.

CAM-Gerlach · 2021-11-17T22:18:57Z

Then the command wouldn't be pre-commit run codespell but pre-commit run --hook-stage manual codespell [docs], and it can go in the Makefile so you can run make spellcheck

Oops, forgot that Pre-Commit will only check the default commit hook-stage for a given hook even if is explicitly named, unless the appropriate hook-stage is passed. Thanks for the catch. Other than that, that's exactly what I was thinking of. I just have a few comments on the PR, which I'll leave as a review.

CAM-Gerlach

Just a few minor comments; otherwise LGTM if the maintainers don't mind adding this.

.codespell/exclude-file.txt

.codespellrc

requirements.txt

Co-authored-by: CAM Gerlach <CAM.Gerlach@Gerlach.CAM>

.codespellrc

CAM-Gerlach

LGTM now, for what its worth. Thanks @DimitriPapadopoulos !

CAM-Gerlach · 2022-01-23T02:09:47Z

Seems reasonable to add this, to make it much easier and less false-positive-prone for PEP authors to check their PEPs locally if they choose to do so, using our existing infra (and allow potentially adding it to the CI in the future for new PEPs if we decide to do so, which I'm willing to take responsibility for maintaining if so). @python/pep-editors Any objections?

JelleZijlstra

I'm happy with adding this as a non-blocking, optional step.

AA-Turner · 2022-01-23T02:11:48Z

I see the value to an extent, my concern is having to additionally maintain a list of false positives -- and (at least to me) clarity of expression is much more important than correct spelling. ¹

Also there are a number of different valid dialects for English - American and British, but also Australian, Candadian, South African, various other creoles, etc. I suppose I'm -0.

A

I realised the second clause is basically repeating Guido from https://github.com/python/peps/pull/2211#issuecomment-1005221924, but much less elegantly ↩

AA-Turner · 2022-01-23T02:15:00Z

(footnote to previous comment)

An example: In 676 "focussed" was changed by someone doing a mass spellcheck PR to "focused" -- the double "s" variant is a valid spelling in British English, and was the author's choice of wording. I worry a little that having this in the repo "blesses" that type of mass spellcheck PR, which create quite a bit of noise but, in my opinion, don't add much.

I won't block anything, but just want to be the grumpy old man here, I guess!

A

hugovk · 2022-01-23T05:20:03Z

I'm okay with adding as an optional make command, but against adding to the CI: #2075 (comment)

CAM-Gerlach · 2022-01-23T07:38:55Z

(at least to me) clarity of expression is much more important than correct spelling. 1

Yes, though I don't see why we can't have both, at least if an author chooses to run it or a PEP editor uses it as a tool for editorial suggestions. I'm not condoning mass typo correction PRs; they've already caused enough trouble to my own PEPs.

Also there are a number of different valid dialects for English - American and British, but also Australian, Candadian, South African, various other creoles, etc. I suppose I'm -0.

To note, codespell takes an explicit deny-list (i.e. common typos) rather than allow-list (i.e. valid words) approach, which should avoid the great majority of instances of this; to my knowledge, they don't intentionally add "corrections" for words or spellings that are valid in major English dialects, though there may be isolated instances.

An example: In 676 "focussed" was changed by someone doing a mass spellcheck PR to "focused" -- the double "s" variant is a valid spelling in British English, and was the author's choice of wording.

To note, Oxford, the canonical BrE dictionary, lists the "focussed" spelling as "irregular" and as I gather from various sources, "focused" is still more common and seen as more "correct" in BrE usage than "focussed", which is flatly incorrect in AmE, so that change isn't necessarily wrong per say. I would certainly agree that it wasn't worth changing, though, and do not condone mass spelling correction PRs in general—though I hope use of this tool by writers and/or editors can help avoid them.

I worry a little that having this in the repo "blesses" that type of mass spellcheck PR, which create quite a bit of noise but, in my opinion, don't add much.

I don't think so; for better or for worse those PRs were pretty much already done and accepted, so aside from new PEPs (which we or the authors can manually check) there isn't going to be much to correct (thus less motivation to do so), and just having the infra and config file to make that easy and silence false positives, I don't think, encourages that more—and we can always simply accept or reject such PRs on their merits if and when they are submitted, as we always have. I don't think we should not offer authors and editors the opportunity to use a tool if they choose to, just because of the possibility that it may increase the chance that a few individuals may submit PRs making trivial changes that we can simply decline to accept.

I'm okay with adding as an optional make command, but against adding to the CI: #2075 (comment)

I agree; in addition to the things you mention (particularly usability for authors), there's also the fact that at least currently, some authors still commit directly to the repo without letting our builds and checks run first to ensure they don't break everything, so typos could easily slip through anyway without any review, human or machine, and cause later check runs to fail. I think it would be a good idea to change this, especially if and when we move to continuous deployment via the new build system, but that's a separate discussion. As much as the perfectionist in me would love to have it, it is incompatible with the practical reality of this repo.

DimitriPapadopoulos · 2022-01-24T08:40:36Z

Note that codespell does not (or is not supposed to) enforce American English over British English by default – unless you explicitly instruct it to use dictionary_en-GB_to_en-US.txt. If it does, it should be considered a bug.

Some codespell suggestions might be debatable, for example enforcing usable over useable, or focused over focussed. About the last suggestion, see:

I understand there is “usually” no doubling when the preceding vowel is unstressed, which means that focused is “highly preferred” and focussed is “irregular”, although not entirely wrong. The point is that documentation should be easy to read for anyone, including non-native speakers. I believe authors should use standard English as much as possible and avoid irregular alternatives when writing for an international audience. In this case, focussed, while not wrong, is irregular and disturbing, and should be avoided. Please have a thought for us non-native speakers.

That said, I fully understand that enforcing more standard and readable English from the CI might be a bad idea, at least in some contexts, or as a first step, or as long as spelling tools are not perfect.

AA-Turner

"focussed" was perhaps a poor example, and just something I noticed.

Bigger picture, I think I've ameliorated to Hugo & Jelle's view.

Could you add some text to https://github.com/python/peps/blob/main/CONTRIBUTING.rst, briefly explaining how to run codespell (with and without make), but more importantly saying something along the lines of Guido's post here that we generally don't want mass spellcheck PRs:

I am against fixing typos unless they reduce readability. Most typos don't hamper readability much less than many other "defects" in our documentation, like poor wording, unclear examples, and outright ambiguities. We receive way fewer fixes for those quality issues. To me this is proof that a lot of contributors prefer the quick win of a typo fix over something that requires thinking, and I don't want to encourage that sort of thing.

If you ping me when you've pushed that text I'll review & happy to merge after that.

A

DimitriPapadopoulos · 2022-01-24T10:01:38Z

Thank you, i'll look into CONTRIBUTING.rst.

Otherwise, yes, typos are only the tip of the iceberg. Yes, they are a quick win because fixing typos can be partially automated (current spell checking tools do require a human checks all suggestions), but they're still a win and a first step. Fixing typos is by no way a complete solution, but you have to start somewhere. That said, when fixing typos in code, I usually also fix missing words or poor wording that I notice when reviewing spellchecker suggestions, and I encourage anyone to do the same.

Also, the kind of “negative” feedback on readability issues I see here does not encourage to spend time on other qualities issues like poor wording, unclear examples, and outright ambiguities. Don't misunderstand me, I value and appreciate all feedback. By “negative” I just mean “opposed to fixing typos” and I understand and value all the arguments. However, spending time on improving documentation suddenly looks like a risky business, not worth investing time in it because the risk of rejected changes is too high.

Finally, typos do hamper readability as far as I am concerned – we are all different, including in the way we read.

CAM-Gerlach · 2022-01-24T22:01:45Z

I do think if the typo significantly impairs understanding of the word, including for a non-native speaker, then it motivates a change, but that's on a case by case basis. Typically, the types of typos detected by codespell are, by design, only those that are common misspellings that it can unambiguously detect and usually correct, which typically are therefore those that are usually the least likely to cause ambiguity or confusion on their own (at least, in my experience as a native speaker). On the other hand, I don't think bikeshedding over whether each and every typo fix to an old PEP is necessary is a great use of limited volunteer time either (from our side or yours), compared to more substantive fixes and improvements to draft, active or popular PEPs.

In any case, as I understand you've already corrected most of these typos already, so the work is more or less done there (and the cost paid), and for my part, with your improvements here, I can try to run codespell on newly submitted PRs to catch typos as part of my submission workflow and add suggestions to fix any valid ones, to avoid new ones creeping in; and authors and other PEP editors can be encouraged to do the same—and of course, you can help us with that too on any PRs you see. One potential problem is our current workflow that allows direct commits to main without the checks passing or any other oversight (which I think long-term should be moved away from, as mentioned above), but that would block any CI from being effective regardless.

I don't want to discourage you from contributing; we welcome the more substantive fixes that Guido mentioned, as well as your help reviewing newly submitted and updated PEPs for all manner of issues, large and small. There are other more mechanical things you could help with that would (at least in my view) have much more value—for instance, one of my biggest frustrations as a PEP reader is the PEP's header fields often missing important information, most especially a link to the current/latest mailing list or Discourse discussion of the PEP, which means that I need to manually go off searching for it, if I find the right discussion at all. We'd want to open an issue first and discuss it, but adding the right links to PEPs missing such would be (IMO) a valuable endeavor. Likewise, if you find examples that are unclear, missing, buggy, etc., open an issue, we can all discuss it and assuming it sounds good, you can go ahead with a PR—for such substantive changes, opening an issue helps ensure your time is not wasted if we do happen to take a different view of it.

In any case, I think we all agree that merging this PR with the basic infra for authors and editors to check specific PEPs themselves is a reasonable step. Right now we don't actually formally document any of the linting workflow at all, as far as I'm aware, which is something I'm responsible for. As such, I think it would be best if I write up a concise but sufficiently detailed section of the contributing guide mentioning how to do so (I already have a basic template for such in mind, based on the many related such sections I've added to other contributing guides in the past) as well as a short section clarifying when and what changes generally are and aren't appropriate to propose to Final PEPs along those lines. I can either do this as a commit on your PR here, or (my preference) a separate followup PR.

AA-Turner · 2022-01-24T23:39:44Z

OK, given documentation for this will be added in a followup PR (taking advantage of @CAM-Gerlach's good nature), I'll merge now.

A

AA-Turner · 2022-01-24T23:42:53Z

Bitten by only adding Sphinx runs to CI recently. I'll push a fix.

A

the-knights-who-say-ni added the CLA signed label Nov 15, 2021

DimitriPapadopoulos marked this pull request as draft November 15, 2021 14:04

CAM-Gerlach requested changes Nov 15, 2021

View reviewed changes

DimitriPapadopoulos mentioned this pull request Nov 15, 2021

Add codespell to CI #2075

Closed

DimitriPapadopoulos marked this pull request as ready for review November 16, 2021 08:46

CAM-Gerlach requested changes Nov 17, 2021

View reviewed changes

.codespell/exclude-file.txt Show resolved Hide resolved

.codespellrc Show resolved Hide resolved

requirements.txt Outdated Show resolved Hide resolved

Update requirements.txt

114e8ae

Co-authored-by: CAM Gerlach <CAM.Gerlach@Gerlach.CAM>

DimitriPapadopoulos commented Nov 18, 2021

View reviewed changes

.codespellrc Show resolved Hide resolved

CAM-Gerlach approved these changes Nov 18, 2021

View reviewed changes

This was referenced Jan 4, 2022

Various PEPs: fix typos #2211

Merged

PEP 676: Formatting and prose updates #2207

Merged

JelleZijlstra approved these changes Jan 23, 2022

View reviewed changes

AA-Turner requested changes Jan 24, 2022

View reviewed changes

AA-Turner changed the title ~~Add codespell infastructure~~ Add configuration for running codespell locally Jan 24, 2022

AA-Turner merged commit e5de01a into python:main Jan 24, 2022

AA-Turner mentioned this pull request Jan 24, 2022

Exclude .codespell from Sphinx #2264

Merged

CAM-Gerlach mentioned this pull request Jan 25, 2022

Add contributor documentation for how to run header/reST checks, codespell and other linting locally #2265

Closed

DimitriPapadopoulos deleted the codespell_infrastructure branch February 11, 2022 21:49

CAM-Gerlach mentioned this pull request Mar 4, 2022

Meta: Clearly document current guidance for changing existing PEPs #2378

Merged

5 tasks

erlend-aasland mentioned this pull request May 13, 2022

pep8/greppable exception messages erlend-aasland/peps#1

Closed

erlend-aasland mentioned this pull request Jun 27, 2022

pep 687/mark as accepted erlend-aasland/peps#2

Closed

hugovk mentioned this pull request Apr 30, 2023

Lint: Fix outstanding codespell spelling errors #3129

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add configuration for running `codespell` locally #2151

Add configuration for running `codespell` locally #2151

DimitriPapadopoulos commented Nov 15, 2021 •

edited

Loading

DimitriPapadopoulos commented Nov 15, 2021

CAM-Gerlach left a comment

hugovk commented Nov 15, 2021

CAM-Gerlach commented Nov 17, 2021

CAM-Gerlach left a comment

CAM-Gerlach left a comment

CAM-Gerlach commented Jan 23, 2022

JelleZijlstra left a comment

AA-Turner commented Jan 23, 2022 •

edited

Loading

AA-Turner commented Jan 23, 2022 •

edited

Loading

hugovk commented Jan 23, 2022

CAM-Gerlach commented Jan 23, 2022 •

edited

Loading

DimitriPapadopoulos commented Jan 24, 2022 •

edited

Loading

AA-Turner left a comment •

edited

Loading

DimitriPapadopoulos commented Jan 24, 2022 •

edited

Loading

CAM-Gerlach commented Jan 24, 2022 •

edited

Loading

AA-Turner commented Jan 24, 2022

AA-Turner commented Jan 24, 2022

Add configuration for running codespell locally #2151

Add configuration for running codespell locally #2151

Conversation

DimitriPapadopoulos commented Nov 15, 2021 • edited Loading

DimitriPapadopoulos commented Nov 15, 2021

CAM-Gerlach left a comment

Choose a reason for hiding this comment

hugovk commented Nov 15, 2021

CAM-Gerlach commented Nov 17, 2021

CAM-Gerlach left a comment

Choose a reason for hiding this comment

CAM-Gerlach left a comment

Choose a reason for hiding this comment

CAM-Gerlach commented Jan 23, 2022

JelleZijlstra left a comment

Choose a reason for hiding this comment

AA-Turner commented Jan 23, 2022 • edited Loading

Footnotes

AA-Turner commented Jan 23, 2022 • edited Loading

hugovk commented Jan 23, 2022

CAM-Gerlach commented Jan 23, 2022 • edited Loading

DimitriPapadopoulos commented Jan 24, 2022 • edited Loading

AA-Turner left a comment • edited Loading

Choose a reason for hiding this comment

DimitriPapadopoulos commented Jan 24, 2022 • edited Loading

CAM-Gerlach commented Jan 24, 2022 • edited Loading

AA-Turner commented Jan 24, 2022

AA-Turner commented Jan 24, 2022

Add configuration for running `codespell` locally #2151

Add configuration for running `codespell` locally #2151

DimitriPapadopoulos commented Nov 15, 2021 •

edited

Loading

AA-Turner commented Jan 23, 2022 •

edited

Loading

AA-Turner commented Jan 23, 2022 •

edited

Loading

CAM-Gerlach commented Jan 23, 2022 •

edited

Loading

DimitriPapadopoulos commented Jan 24, 2022 •

edited

Loading

AA-Turner left a comment •

edited

Loading

DimitriPapadopoulos commented Jan 24, 2022 •

edited

Loading

CAM-Gerlach commented Jan 24, 2022 •

edited

Loading