Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add configuration for running codespell locally #2151

Merged
merged 2 commits into from
Jan 24, 2022
Merged

Add configuration for running codespell locally #2151

merged 2 commits into from
Jan 24, 2022

Conversation

DimitriPapadopoulos
Copy link
Contributor

@DimitriPapadopoulos DimitriPapadopoulos commented Nov 15, 2021

Merely add codespell configuration files, following #2075 (comment):

  • .codespellrc is the root configuration file
  • .codespell/ignore-words.txt list of words that are false positives
  • .codespell/exclude-file.txt list of lines that contain false positives

Unlike the initial PR #2075, this PR focuses on configuration and does not attempt to add a spelling check to the CI.

A few notes:

  • Codespell checks the current directory for a file named setup.cfg or .codespellrc. I have opened Add tox.ini to the list of default config files codespell-project/codespell#2087, but the request hasn't been accepted yet. In the meantime, in the absence of a setup.cfg file, I have added a new hidden configuration file .codespellrc

  • One of the caveats is the necessity to maintain a list of false positives. There are not too many of them, but contributors might have to add a new false positive to the list from time to time.

  • The location of the list of words that are false positives is:
    .codespell/ignore_words.txt

  • The location of the list of entire lines that are false positives is:
    .codespell/exclude-file.txt

  • Some false positives that appear only in URIs are listed after option uri-ignore-words-list in the main configuration file:
    .codespellrc

  • The default dictionaries used by codespell are clear and rare. While rare might catch a few additional typos, it raises a few additional false positives, such as complies, therefor and theses.

@DimitriPapadopoulos
Copy link
Contributor Author

@hugovk This is still a draft, it adds configuration files but no Makefile target - yet.

Copy link
Member

@CAM-Gerlach CAM-Gerlach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding it as a hard requirement which gets installed on all platforms, CIs and locally, (regardless if the user actually wants to perform spell checking), and in the same environment as the build (potential for dep conflicts), I strongly suggest instead just adding it as Pre-Commit check with hook-stage: manual, so it only runs on-demand, and run it via pre-commit run codespell (perhaps with an alias in the Makefile, for completeness)?

Pre-commit will then take care of installing it if/when it is needed in its own isolated environment with a consistent pinned version for all devs (with easy updates via pre-commit autoupdate), and it works cross-platform unlike relying solely on the Makefile, while still being entirely optional (and not installed for all users and CIs by default).

@hugovk
Copy link
Member

hugovk commented Nov 15, 2021

hook-stage: manual is new to me, that sounds perfect!

Then the command wouldn't be pre-commit run codespell but pre-commit run --hook-stage manual codespell [docs], and it can go in the Makefile so you can run make spellcheck:

 lint:
        pre-commit --version > /dev/null || python3 -m pip install pre-commit
        pre-commit run --all-files

+spellcheck:
+       pre-commit --version > /dev/null || python3 -m pip install pre-commit
+       pre-commit run --hook-stage manual codespell

Add codespell configuration files:
* `.codespellrc` is the root configuration file
* `.codespell/ignore-words.txt` list of words that are false positives
* `.codespell/exclude-file.txt` list of lines that contain
  false positives

Add a pre-commit hook, confined to the manual stage.

Also add a Makefile target that will call the pre-commit hook.
@DimitriPapadopoulos DimitriPapadopoulos marked this pull request as ready for review November 16, 2021 08:46
@CAM-Gerlach
Copy link
Member

Then the command wouldn't be pre-commit run codespell but pre-commit run --hook-stage manual codespell [docs], and it can go in the Makefile so you can run make spellcheck

Oops, forgot that Pre-Commit will only check the default commit hook-stage for a given hook even if is explicitly named, unless the appropriate hook-stage is passed. Thanks for the catch. Other than that, that's exactly what I was thinking of. I just have a few comments on the PR, which I'll leave as a review.

Copy link
Member

@CAM-Gerlach CAM-Gerlach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor comments; otherwise LGTM if the maintainers don't mind adding this.

.codespell/exclude-file.txt Show resolved Hide resolved
.codespellrc Show resolved Hide resolved
requirements.txt Outdated Show resolved Hide resolved
Co-authored-by: CAM Gerlach <CAM.Gerlach@Gerlach.CAM>
Copy link
Member

@CAM-Gerlach CAM-Gerlach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now, for what its worth. Thanks @DimitriPapadopoulos !

@CAM-Gerlach
Copy link
Member

Seems reasonable to add this, to make it much easier and less false-positive-prone for PEP authors to check their PEPs locally if they choose to do so, using our existing infra (and allow potentially adding it to the CI in the future for new PEPs if we decide to do so, which I'm willing to take responsibility for maintaining if so). @python/pep-editors Any objections?

Copy link
Member

@JelleZijlstra JelleZijlstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with adding this as a non-blocking, optional step.

@AA-Turner
Copy link
Member

AA-Turner commented Jan 23, 2022

I see the value to an extent, my concern is having to additionally maintain a list of false positives -- and (at least to me) clarity of expression is much more important than correct spelling. 1

Also there are a number of different valid dialects for English - American and British, but also Australian, Candadian, South African, various other creoles, etc. I suppose I'm -0.

A

Footnotes

  1. I realised the second clause is basically repeating Guido from https://github.com/python/peps/pull/2211#issuecomment-1005221924, but much less elegantly

@AA-Turner
Copy link
Member

AA-Turner commented Jan 23, 2022

(footnote to previous comment)

An example: In 676 "focussed" was changed by someone doing a mass spellcheck PR to "focused" -- the double "s" variant is a valid spelling in British English, and was the author's choice of wording. I worry a little that having this in the repo "blesses" that type of mass spellcheck PR, which create quite a bit of noise but, in my opinion, don't add much.

I won't block anything, but just want to be the grumpy old man here, I guess!

A

@hugovk
Copy link
Member

hugovk commented Jan 23, 2022

I'm okay with adding as an optional make command, but against adding to the CI: #2075 (comment)

@CAM-Gerlach
Copy link
Member

CAM-Gerlach commented Jan 23, 2022

(at least to me) clarity of expression is much more important than correct spelling. 1

Yes, though I don't see why we can't have both, at least if an author chooses to run it or a PEP editor uses it as a tool for editorial suggestions. I'm not condoning mass typo correction PRs; they've already caused enough trouble to my own PEPs.

Also there are a number of different valid dialects for English - American and British, but also Australian, Candadian, South African, various other creoles, etc. I suppose I'm -0.

To note, codespell takes an explicit deny-list (i.e. common typos) rather than allow-list (i.e. valid words) approach, which should avoid the great majority of instances of this; to my knowledge, they don't intentionally add "corrections" for words or spellings that are valid in major English dialects, though there may be isolated instances.

An example: In 676 "focussed" was changed by someone doing a mass spellcheck PR to "focused" -- the double "s" variant is a valid spelling in British English, and was the author's choice of wording.

To note, Oxford, the canonical BrE dictionary, lists the "focussed" spelling as "irregular" and as I gather from various sources, "focused" is still more common and seen as more "correct" in BrE usage than "focussed", which is flatly incorrect in AmE, so that change isn't necessarily wrong per say. I would certainly agree that it wasn't worth changing, though, and do not condone mass spelling correction PRs in general—though I hope use of this tool by writers and/or editors can help avoid them.

I worry a little that having this in the repo "blesses" that type of mass spellcheck PR, which create quite a bit of noise but, in my opinion, don't add much.

I don't think so; for better or for worse those PRs were pretty much already done and accepted, so aside from new PEPs (which we or the authors can manually check) there isn't going to be much to correct (thus less motivation to do so), and just having the infra and config file to make that easy and silence false positives, I don't think, encourages that more—and we can always simply accept or reject such PRs on their merits if and when they are submitted, as we always have. I don't think we should not offer authors and editors the opportunity to use a tool if they choose to, just because of the possibility that it may increase the chance that a few individuals may submit PRs making trivial changes that we can simply decline to accept.

I'm okay with adding as an optional make command, but against adding to the CI: #2075 (comment)

I agree; in addition to the things you mention (particularly usability for authors), there's also the fact that at least currently, some authors still commit directly to the repo without letting our builds and checks run first to ensure they don't break everything, so typos could easily slip through anyway without any review, human or machine, and cause later check runs to fail. I think it would be a good idea to change this, especially if and when we move to continuous deployment via the new build system, but that's a separate discussion. As much as the perfectionist in me would love to have it, it is incompatible with the practical reality of this repo.

@DimitriPapadopoulos
Copy link
Contributor Author

DimitriPapadopoulos commented Jan 24, 2022

Note that codespell does not (or is not supposed to) enforce American English over British English by default – unless you explicitly instruct it to use dictionary_en-GB_to_en-US.txt. If it does, it should be considered a bug.

Some codespell suggestions might be debatable, for example enforcing usable over useable, or focused over focussed. About the last suggestion, see:

I understand there is “usually” no doubling when the preceding vowel is unstressed, which means that focused is “highly preferred” and focussed is “irregular”, although not entirely wrong. The point is that documentation should be easy to read for anyone, including non-native speakers. I believe authors should use standard English as much as possible and avoid irregular alternatives when writing for an international audience. In this case, focussed, while not wrong, is irregular and disturbing, and should be avoided. Please have a thought for us non-native speakers.

That said, I fully understand that enforcing more standard and readable English from the CI might be a bad idea, at least in some contexts, or as a first step, or as long as spelling tools are not perfect.

Copy link
Member

@AA-Turner AA-Turner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"focussed" was perhaps a poor example, and just something I noticed.

Bigger picture, I think I've ameliorated to Hugo & Jelle's view.

Could you add some text to https://github.com/python/peps/blob/main/CONTRIBUTING.rst, briefly explaining how to run codespell (with and without make), but more importantly saying something along the lines of Guido's post here that we generally don't want mass spellcheck PRs:

I am against fixing typos unless they reduce readability. Most typos don't hamper readability much less than many other "defects" in our documentation, like poor wording, unclear examples, and outright ambiguities. We receive way fewer fixes for those quality issues. To me this is proof that a lot of contributors prefer the quick win of a typo fix over something that requires thinking, and I don't want to encourage that sort of thing.

If you ping me when you've pushed that text I'll review & happy to merge after that.

A

@DimitriPapadopoulos
Copy link
Contributor Author

DimitriPapadopoulos commented Jan 24, 2022

Thank you, i'll look into CONTRIBUTING.rst.

Otherwise, yes, typos are only the tip of the iceberg. Yes, they are a quick win because fixing typos can be partially automated (current spell checking tools do require a human checks all suggestions), but they're still a win and a first step. Fixing typos is by no way a complete solution, but you have to start somewhere. That said, when fixing typos in code, I usually also fix missing words or poor wording that I notice when reviewing spellchecker suggestions, and I encourage anyone to do the same.

Also, the kind of “negative” feedback on readability issues I see here does not encourage to spend time on other qualities issues like poor wording, unclear examples, and outright ambiguities. Don't misunderstand me, I value and appreciate all feedback. By “negative” I just mean “opposed to fixing typos” and I understand and value all the arguments. However, spending time on improving documentation suddenly looks like a risky business, not worth investing time in it because the risk of rejected changes is too high.

Finally, typos do hamper readability as far as I am concerned – we are all different, including in the way we read.

@CAM-Gerlach
Copy link
Member

CAM-Gerlach commented Jan 24, 2022

I do think if the typo significantly impairs understanding of the word, including for a non-native speaker, then it motivates a change, but that's on a case by case basis. Typically, the types of typos detected by codespell are, by design, only those that are common misspellings that it can unambiguously detect and usually correct, which typically are therefore those that are usually the least likely to cause ambiguity or confusion on their own (at least, in my experience as a native speaker). On the other hand, I don't think bikeshedding over whether each and every typo fix to an old PEP is necessary is a great use of limited volunteer time either (from our side or yours), compared to more substantive fixes and improvements to draft, active or popular PEPs.

In any case, as I understand you've already corrected most of these typos already, so the work is more or less done there (and the cost paid), and for my part, with your improvements here, I can try to run codespell on newly submitted PRs to catch typos as part of my submission workflow and add suggestions to fix any valid ones, to avoid new ones creeping in; and authors and other PEP editors can be encouraged to do the same—and of course, you can help us with that too on any PRs you see. One potential problem is our current workflow that allows direct commits to main without the checks passing or any other oversight (which I think long-term should be moved away from, as mentioned above), but that would block any CI from being effective regardless.

I don't want to discourage you from contributing; we welcome the more substantive fixes that Guido mentioned, as well as your help reviewing newly submitted and updated PEPs for all manner of issues, large and small. There are other more mechanical things you could help with that would (at least in my view) have much more value—for instance, one of my biggest frustrations as a PEP reader is the PEP's header fields often missing important information, most especially a link to the current/latest mailing list or Discourse discussion of the PEP, which means that I need to manually go off searching for it, if I find the right discussion at all. We'd want to open an issue first and discuss it, but adding the right links to PEPs missing such would be (IMO) a valuable endeavor. Likewise, if you find examples that are unclear, missing, buggy, etc., open an issue, we can all discuss it and assuming it sounds good, you can go ahead with a PR—for such substantive changes, opening an issue helps ensure your time is not wasted if we do happen to take a different view of it.

In any case, I think we all agree that merging this PR with the basic infra for authors and editors to check specific PEPs themselves is a reasonable step. Right now we don't actually formally document any of the linting workflow at all, as far as I'm aware, which is something I'm responsible for. As such, I think it would be best if I write up a concise but sufficiently detailed section of the contributing guide mentioning how to do so (I already have a basic template for such in mind, based on the many related such sections I've added to other contributing guides in the past) as well as a short section clarifying when and what changes generally are and aren't appropriate to propose to Final PEPs along those lines. I can either do this as a commit on your PR here, or (my preference) a separate followup PR.

@AA-Turner
Copy link
Member

OK, given documentation for this will be added in a followup PR (taking advantage of @CAM-Gerlach's good nature), I'll merge now.

A

@AA-Turner AA-Turner changed the title Add codespell infastructure Add configuration for running codespell locally Jan 24, 2022
@AA-Turner AA-Turner merged commit e5de01a into python:main Jan 24, 2022
@AA-Turner
Copy link
Member

Bitten by only adding Sphinx runs to CI recently. I'll push a fix.

A

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants