Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't get confused by c-escape characters #3

Closed
epage opened this issue Jun 18, 2019 · 9 comments · Fixed by #319
Closed

Don't get confused by c-escape characters #3

epage opened this issue Jun 18, 2019 · 9 comments · Fixed by #319
Labels
enhancement Improve the expected

Comments

@epage
Copy link
Collaborator

epage commented Jun 18, 2019

Scspell will parse the file assumng escape characters exist but let you opt out.
See https://github.com/myint/scspell/blob/master/scspell/__init__.py#L78

Escape character support could instead be a file type setting, like dictionary values

@epage epage added the enhancement Improve the expected label Jun 26, 2019
@epage
Copy link
Collaborator Author

epage commented Jun 26, 2019

Before we support this, we'll probably want a way to define file types and what traits those file types should have (specialized dictionaries, _ / - as identifier characters, and whether escape sequences are supported.

That can then be extended into a config file that works with custom dictionaries (#9) to allow the user to override existing file type definitions or add their own.

@epage
Copy link
Collaborator Author

epage commented Dec 3, 2019

C++'s grammar rules: https://en.cppreference.com/w/cpp/language/escape

scspell's regex:

# Treat anything alphanumeric as a token of interest, as long as it is not
# immediately preceded by a single backslash.  (The string "\ntext" should
# match on "text" rather than "ntext".)
C_ESCAPE_TOKEN_REGEX = re.compile(r'(?<![^\\]\\)\w+')

Challenges:

  • Raw strings (e.g. r"" in python)

@epage
Copy link
Collaborator Author

epage commented Jul 23, 2021

Moved over from #313

Hi, thank you for great tool! I'm trying to use this tool instead of misspell.

I saw one false positive in my project.

Let's say the following file is saved as foo.go.

package main

import "fmt"

func main() {
	fmt.Println("welcome\nto this project")
}

and run typos with the following command:

typos ./foo.go

Expected result is getting no error.

Actual result is:

error: `nto` should be `not`
  --> foo.go:6:22
  |
6 |  fmt.Println("welcome\nto this project")
  |                       ^^^
  |

This is because typos does not consider escaped character \n. Though I don't have good idea about how to handle it well since characters which can be escaped are depending on their syntax, one character just after \ in string literals might work.

@epage
Copy link
Collaborator Author

epage commented Jul 23, 2021

@rhysd does go have a form of raw strings (no escaping performed)? I'm wondering how misspell handles the challenges they provide. For example, in Python its common to write regexes with raw strings so you don't have to double escape (e.g. r"\b" instead of "\\b" for word boundary)

@rhysd
Copy link

rhysd commented Jul 23, 2021

Yes, it has.

e.g.

re := regexp.MustCompile(`\d+\.\d+\.\d+`)

ref: https://pkg.go.dev/regexp

@rhysd

This comment has been minimized.

@epage
Copy link
Collaborator Author

epage commented Jul 30, 2021

Since our primary concern is with having 100% confidence in the changes we make, a simple solution to this problem is to just not check words that start with \. While it means we check less, it means we don't have to worry abut what is or isn't a raw string within each programming language.

Can't believe I hadn't thought of this before.

epage referenced this issue in epage/typos Jul 30, 2021
Since our goal is 100% confidence in the results, its better to not
check words than to correct the wrong words.

With that in mind, we'll ignore words after what might be c-escape
sequences (`\nfoo`) or printf substitutions (`%dfoo`).

Fixes #3
@groengpx
Copy link

The comparison docs still lists this as an open issue: https://github.com/crate-ci/typos/blob/84c8b30e065012c70187818aadba0e711ff5cafc/docs/comparison.md

Is that intended? From what I read here, it is fixed?

@epage
Copy link
Collaborator Author

epage commented Jun 22, 2023

Thanks! I've fixed it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improve the expected
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants