ci: spell-check the code as part of linting #1388

tonyandrewmeyer · 2024-09-24T04:14:39Z

Add a codespell run as part of the tox -e lint environment, to pick up any spelling errors that might be missed in review. It's also added to the pre-commit checks, for anyone using those.

The PR also makes 4 corrections, which codespell found. Two are debatable (whether or not to hyphenate) but two are definite errors.

This adds a new (lint-time) dependency on codespell. It's also used in the default charmcraft profile, I've used it before, and it has a good Snyk score (https://snyk.io/advisor/python/codespell), so it seems OK to me. I have lightly skimmed the code, but not read it in depth.

benhoyt

Looks reasonable to me. Out of interest, what does this check? Just code comments and strings?

benhoyt · 2024-09-24T04:40:49Z

.pre-commit-config.yaml

@@ -24,3 +24,10 @@ repos:
        args: [ --preview ]
      - id: ruff-format
        args: [ --preview ]
+  # Spellcheck the code.
+  - repo: https://github.com/codespell-project/codespell


Is this, or can we set this, to British English (Canonical style)? At a cursory look "socio-economic" is more common in British English, so that made me wonder.

My understanding is that it currently accepts both en-GB and en-US spellings, so it doesn't help with that. There is an option to make en-GB spellings an error with the en-US spelling as the suggestion, but there doesn't seem to be the reverse option, which is what we would want.

There's a lot of customisation possible in terms of which dictionaries are used, so maybe it's possible to eliminate en-US? I'm not totally sure, and the docs don't seem to say anything about it.
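As a rough illustration of the "reverse option" idea, the sketch below inverts a list of en-GB to en-US entries so that the American spelling becomes the flagged word and the British spelling the suggestion. It assumes entries take the form `flagged->suggestion`, optionally with several comma-separated suggestions; `invert_dictionary` and the two sample entries are hypothetical, and whether codespell would behave well with such an inverted list as a custom dictionary is untested here. This is not something the PR does.

```python
def invert_dictionary(lines):
    """Swap each 'flagged->suggestion' entry so the suggestion becomes the flagged word."""
    inverted = []
    for raw in lines:
        entry = raw.strip()
        # Skip blank lines and anything that is not in the assumed format.
        if not entry or "->" not in entry:
            continue
        flagged, _, suggestions = entry.partition("->")
        # An entry may list several suggestions; each becomes its own flagged word.
        for suggestion in suggestions.split(","):
            suggestion = suggestion.strip()
            if suggestion:
                inverted.append(f"{suggestion}->{flagged.strip()}")
    return inverted


# Made-up sample entries in the assumed en-GB -> en-US direction.
en_gb_to_en_us = ["colour->color", "customise->customize"]
print(invert_dictionary(en_gb_to_en_us))
# ['color->colour', 'customize->customise']
```

A real inverted list would need manual review, since some words are spelled identically in both variants or are only sometimes interchangeable.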

tonyandrewmeyer · 2024-09-24T04:56:48Z

> Out of interest, what does this check? Just code comments and strings?

I think it's doing a regular expression match for [\w\-'’]+ by default (you can customise this). The doc says:

> Fix common misspellings in text files. It's designed primarily for checking misspelled words in source code (backslash escapes are skipped), but it can be used with other files as well. It does not check for word membership in a complete dictionary, but instead looks for a set of common misspellings. Therefore it should catch errors like "adn", but it will not catch "adnasdfasdf". This also means it shouldn't generate false-positives when you use a niche term it doesn't know about.

I think this 'look for misspellings' approach is how it avoids having to look specifically for words in comments, strings, and so on. If I add a line adn = 42 then it'll flag that as a potential misspelling. There are ways to disable the check for specific lines, but if we start having to do that all over the place then I'd rather get rid of it or use a different tool. Still, I think the fact that it found 2 (to 4) legitimate mistakes with no false positives is a positive sign.
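A minimal sketch of the approach described above, not codespell's actual implementation: split each line into words with the default pattern quoted earlier, then look each word up in a table of known misspellings. The three-entry table and the `find_misspellings` helper are made up for illustration; codespell ships a much larger list.

```python
import re

# The default word pattern mentioned in the comment above.
WORD_RE = re.compile(r"[\w\-'’]+")

# Tiny, made-up stand-in for a list of common misspellings.
COMMON_MISSPELLINGS = {"adn": "and", "teh": "the", "recieve": "receive"}


def find_misspellings(text):
    """Return (line_number, word, suggestion) for each known misspelling."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for word in WORD_RE.findall(line):
            suggestion = COMMON_MISSPELLINGS.get(word.lower())
            if suggestion is not None:
                hits.append((line_number, word, suggestion))
    return hits


source = "adn = 42\nadnasdfasdf = 43  # recieve the value\n"
for hit in find_misspellings(source):
    print(hit)
# (1, 'adn', 'and')
# (2, 'recieve', 'receive')
```

Because the lookup only knows specific wrong spellings, the identifier `adn` is flagged even though it is code rather than prose, while `adnasdfasdf` passes untouched, matching the behaviour the quoted docs describe.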

Spell-check the code as part of linting.

c46cead

tonyandrewmeyer requested review from benhoyt and james-garner-canonical September 24, 2024 04:14

benhoyt approved these changes Sep 24, 2024

dimaqq approved these changes Sep 24, 2024

dimaqq merged commit 6f90333 into canonical:main Sep 25, 2024
29 checks passed
