Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with --ignore-words? #2451

Closed
jeeftor opened this issue Aug 15, 2022 · 15 comments · Fixed by #2466
Closed

Issue with --ignore-words? #2451

jeeftor opened this issue Aug 15, 2022 · 15 comments · Fixed by #2466
Labels

Comments

@jeeftor
Copy link

jeeftor commented Aug 15, 2022

image

I feel like things aren't working as they should

image

@peternewman
Copy link
Collaborator

Did you read our usage doc:
https://github.com/codespell-project/codespell#usage

Specifically:

Important note: The list passed to -I is case-sensitive based on how it is listed in the codespell dictionaries.

@jeeftor
Copy link
Author

jeeftor commented Aug 16, 2022

Um. They are all the same case???

@peternewman
Copy link
Collaborator

peternewman commented Aug 16, 2022

I previously left #2451 which was closed however I feel this was done incorrectly.

My ignore list was:

ignore.txt

FRO
THA
ND
SUR

My file I was spell checking was:

in.txt

FRO FRUM FROM
ND NORTH SOUTH

And the output of CodeSpell is:

codespell --ignore-words=ignore.txt in.txt

in.txt:1: FRO ==> FOR, FROM
in.txt:2: ND ==> AND, 2ND

As per the docs the spelling IS case sensitive... so am I missing something here?

FRO is in the ignore list - and as such shouldn't FRO in the input file not be flagged as an issue?

Grep dictionary.txt and you'll find fro is in lower case in our dictionary. Some clever magic makes it show as upper case because your input string is upper case.

Specifically:

case-sensitive based on how it is listed in the codespell dictionaries

@peternewman peternewman reopened this Aug 16, 2022
@jeeftor
Copy link
Author

jeeftor commented Aug 17, 2022

If I understand correctly:

I put FRO in my ignore list... and I put FRO in my text but the built in fro (lowercase) is being flagged?

@peternewman
Copy link
Collaborator

Yes, put fro in your ignore list (as per the docs) and it should work as you intended.

@andyholmes
Copy link
Contributor

Seems like a good way to close this would be a README.md update including such an example.

I think it's not obvious that "case-sensitive based on how it is listed in the codespell dictionaries", means that your ignore match is case-sensitive, while the dictionary match itself is case-insensitive.

@peternewman
Copy link
Collaborator

Seems like a good way to close this would be a README.md update including such an example.

Do you mean "so if you want to ignore FRO, you'll need to add fro to your ignore list, because the codespell dictionaries feature the correction fro->for?

I think it's not obvious that "case-sensitive based on how it is listed in the codespell dictionaries", means that your ignore match is case-sensitive, while the dictionary match itself is case-insensitive.

Pull requests or comments welcome for how to make that more obvious.

I must admit it's been a while since I've personally looked at that bit of code for why we don't just lower-case the ignore list but I suspect it might be to allow potential future features such as correcting javascript to JavaScript based on a case specific dictionary.

Would a warning/info notification that "you've ignored FRO; it doesn't exist in the dictionary but fro does" be useful?

andyholmes added a commit to andyholmes/codespell that referenced this issue Aug 18, 2022
Update the README to elaborate on case-sensitivity in ignored words
and dictionaries.

close codespell-project#2451
@andyholmes
Copy link
Contributor

Took a stab at updating the README in #2466.

Would a warning/info notification that "you've ignored FRO; it doesn't exist in the dictionary but fro does" be useful?

I think a warning might be a good idea, I expect that's the place people will look first anyways.

@mreidel-godaddy
Copy link

@peternewman I just fell for the same issue: flagged was hasTable->hashable, I put hasTable in the ignore list, but it should have been hastable.

Would a warning/info notification that "you've ignored FRO; it doesn't exist in the dictionary but fro does" be useful?

That would have been very helpful indeed!

@giswqs
Copy link

giswqs commented Oct 6, 2022

The --ignore-words-list option seems to have no effect. The two words included in the ignored words list are still being flagged as typos. Anyone knows why?

Edit: changing the words in the ignored list to all lower case solves the issue. I think it is a bug.

image

nasa/Transform-to-Open-Science#275

@luckman212
Copy link

What if I want aCount to be allowed/ignored but I want acount to be flagged as a misspelling? Impossible?

@peternewman
Copy link
Collaborator

What if I want aCount to be allowed/ignored but I want acount to be flagged as a misspelling? Impossible?

The ignore words are about skipping entries that are normally in our dictionary, so you can't skip a typo from the dictionary (ACOUNT) and expect some variants of it to be flagged and others to be ignored.

While I can see the benefit of this use case, I think it would be confusing for the majority of people where they want to skip a word as it's a special term in their particular domain for example.

I think two things would fix your use case, in the short term, using the ignore regex option to essentially vanish aCount from the source text codespell is checking, alternatively when we add camel-case support, that will fix it properly, with the benefit of also flagging aCont as a typo too (potentially)!

@Sopor
Copy link

Sopor commented Nov 23, 2022

I'm new to codespell, so maybe i do something wrong here, so please bear with me 😊

I wanted to ignore the word clientA
chain\client.lua:12: clientA ==> client

I created a config file and added
ignore-words-list = clientA
to ignore clientA, but it didn't worked.

Readme file:
Important note: The list passed to -I is case-sensitive based on how it is listed in the codespell dictionaries.

When i changed the clientA to clienta it worked.

Shouldn't i use the same case as the word i want to ignore?

@andyholmes
Copy link
Contributor

No, that's what this issue is about clarifying. This is my current proposed change to the README:

Ignoring Words

When ignoring false positives, note that spelling errors are case-insensitive but words to ignore are case-sensitive. For example, the dictionary entry wrod will also match the typo Wrod, but to ignore it you must pass wrod.

The words to ignore can be passed in two ways:

  1. -I: A file with a word per line to ignore::

     codespell -I FILE, --ignore-words=FILE
    
  2. -L: A comma separated list of words to ignore on the command line::

     codespell -L word1,word2,word3,word4
    

@prestoncarman
Copy link

Ignoring words has been a confusing part of codespell for me. @andyholmes I like your explanation on how the ignore feature works.

After reading this threaded, I realize that the ignore list is not for the text or word in my source code, but for the codespell dictionary. This makes more sense as to why it must be lowercase.

I think adding the warning and clearing up the description in the README (using @andyholmes description above) would be helpful to new (and current) users of codespell.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants