Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: spell-check the code as part of linting #1388

Merged
merged 1 commit into from
Sep 25, 2024

Conversation

tonyandrewmeyer
Copy link
Contributor

Add a codespell run as part of the tox -e lint environment, to pick up any spelling errors that might be missed in review. It's also added to the pre-commit checks, for anyone using those.

The PR also makes 4 corrections, which codespell found. Two are debatable (whether or not to hyphenate) but two are definite errors.

This adds a new (lint-time) dependency of codespell - it's also used in the default charmcraft profile, I've used it before, and it has a good Snyk score so it seems ok to me. I have lightly skimmed the code, but not read it in depth.

Copy link
Collaborator

@benhoyt benhoyt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me. Out of interest, what does this check? Just code comments and strings?

@@ -24,3 +24,10 @@ repos:
args: [ --preview ]
- id: ruff-format
args: [ --preview ]
# Spellcheck the code.
- repo: https://github.com/codespell-project/codespell
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this, or can we set this, to British English (Canonical style)? At a cursory look "socio-economic" is more common in British English, so that made me wonder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that it will currently accept both en-UK and en-US spellings, so it fails to help with that. There is an option to make en-UK spellings an error with the en-US the suggestion, but there doesn't seem to be the reverse option, which is what we would want.

There's a lot of customisation possible in terms of which dictionaries are used, so maybe it's possible to eliminate en-US? I'm not totally sure, and the docs don't seem to say anything about it.

@tonyandrewmeyer
Copy link
Contributor Author

Out of interest, what does this check? Just code comments and strings?

I think it's doing a regular expression match for [\w\-'’]+ by default (you can customise this). The doc says:

Fix common misspellings in text files. It's designed primarily for checking misspelled words in source code (backslash escapes are skipped), but it can be used with other files as well. It does not check for word membership in a complete dictionary, but instead looks for a set of common misspellings. Therefore it should catch errors like "adn", but it will not catch "adnasdfasdf". This also means it shouldn't generate false-positives when you use a niche term it doesn't know about.

I think this 'look for misspellings' is how it manages to avoid having to look specifically for words in comments, strings, and so on. If I add a line adn = 42 then it'll flag that as a potential misspelling. There are ways to disable the check for specific lines, but if we start having to do that all over the place then I'd rather get rid of it or use a different tool. I think that it found 2 (to 4) legitimate mistakes and had no false positives is a positive sign, though.

@dimaqq dimaqq merged commit 6f90333 into canonical:main Sep 25, 2024
29 checks passed
tonyandrewmeyer added a commit to tonyandrewmeyer/operator that referenced this pull request Oct 4, 2024
Add a `codespell` run as part of the `tox -e lint` environment, to pick
up any spelling errors that might be missed in review. It's also added
to the pre-commit checks, for anyone using those.

The PR also makes 4 corrections, which codespell found. Two are
debatable (whether or not to hyphenate) but two are definite errors.

This adds a new (lint-time) dependency of codespell - it's also used in
the default charmcraft profile, I've used it before, and it has a [good
Snyk score](https://snyk.io/advisor/python/codespell) so it seems ok to
me. I have lightly skimmed the code, but not read it in depth.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants