Enable the use of the free ngram data #113

Closed
David-Else opened this issue Oct 25, 2021 · 9 comments
Labels
1-feature-request ✨ (Issue type: Request for a desirable, nice-to-have feature)
3-duplicate (Issue resolution: Issue has been submitted before)

Comments

@David-Else

LanguageTool can be greatly enhanced using the free n-gram data; all the info is here: https://dev.languagetool.org/finding-errors-using-n-gram-data

It would be great if the user could supply the directory that they have downloaded this to and ltex-ls could use it.

Thanks.

@David-Else David-Else added the 1-feature-request ✨ Issue type: Request for a desirable, nice-to-have feature label Oct 25, 2021
@valentjn
Owner

This is already possible with ltex.additionalRules.languageModel.

@valentjn
Owner

Duplicate of valentjn/vscode-ltex#3.

@valentjn valentjn added the 3-duplicate Issue resolution: Issue has been submitted before label Oct 25, 2021
@David-Else
Author

I have my ngrams unzipped in the home directory like so:

en
├── 3grams
│   ├── segments_1
│   ├── _1e4.si
│   ├── _1e4.fnm
│   ├── _1e4.nvm
│   ├── _1e4.nvd
│   ├── _1e4_Lucene50_0.tip
│   ├── _1e4_Lucene50_0.tim
│   ├── _1e4_Lucene50_0.doc
│   ├── _1e4.fdx
│   ├── _1e4.fdt
│   └── write.lock
├── 1grams
│   ├── segments.gen
│   ├── segments_1
│   ├── _1p.si
│   ├── _1p.nvm
│   ├── _1p.nvd
│   ├── _1p_Lucene41_0.tip
│   ├── _1p_Lucene41_0.tim
│   ├── _1p_Lucene41_0.pos
│   ├── _1p_Lucene41_0.doc
│   ├── _1p.fnm
│   ├── _1p.fdx
│   ├── _1p.fdt
│   └── write.lock
└── 2grams
    ├── segments.gen
    ├── segments_3
    ├── _v.si
    ├── _v.nvm
    ├── _v.nvd
    ├── _v_Lucene41_0.tip
    ├── _v_Lucene41_0.tim
    ├── _v_Lucene41_0.pos
    ├── _v_Lucene41_0.doc
    ├── _v.fnm
    ├── _v.fdx
    ├── _v.fdt
    ├── write.lock
    ├── _u.si
    ├── _u.nvm
    ├── _u.nvd
    ├── _u_Lucene41_0.tip
    ├── _u_Lucene41_0.tim
    ├── _u_Lucene41_0.pos
    ├── _u_Lucene41_0.doc
    ├── _u.fnm
    ├── _u.fdx
    ├── _u.fdt
    ├── segments_2
    ├── _t.si
    ├── _t_Lucene41_0.tip
    ├── _t_Lucene41_0.tim
    ├── _t_Lucene41_0.doc
    ├── _t.fnm
    ├── _t.fdx
    └── _t.fdt

and my Neovim config looks like:

local bin_path = '/home/david/bin/ltex-ls-14.0.0/bin/ltex-ls'
require('lspconfig/configs').ltex_ls = {
  default_config = {
    cmd = { bin_path },
    filetypes = { 'tex', 'bib', 'markdown' },
    root_dir = require('lspconfig/util').find_git_ancestor,
    settings = {
      ltex = {
        enabled = { 'latex', 'tex', 'bib', 'markdown' },
        language = 'en',
        diagnosticSeverity = 'information',
        sentenceCacheSize = 2000,
        additionalRules = {
          enablePickyRules = true,
          motherTongue = 'en',
          languageModel = '~/en',
        },
        trace = { server = 'verbose' },
        dictionary = {},
        disabledRules = {},
        hiddenFalsePositives = {},
      },
    },
  },
}

I am using the text they suggest to test for the ngram feature and it is not working:

Don’t forget to put on the breaks < nothing

there last chance < nothing

(below works as normal)
LanguageTool is your intelligent writing assistant for all common browsers and
word processors. Write or paste your text here too have it checked continuously.
Errors will be underlined in different colours: we will mark seplling errors
with red underilnes. Furthermore grammar error's are highlighted in yellow.
LanguageTool also marks style issues in a reliable manner by underlining them in
blue. did you know that you can sea synonyms by double clicking a word? Its a
impressively versatile tool, e.g. if youd like to tell a colleague from over
sea's about what happened at 5 PM in the afternoon on Monday, 27 May 2007.

Is there something wrong with languageModel = '~/en'? Thanks!

@valentjn
Owner

You need to set it to the parent directory of en. (This also enables using n-grams for multiple languages.)

I clarified this in the docs.
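
As a minimal sketch of the fix described above, assume the n-gram data has been moved to a hypothetical `~/ngrams/en`, so that `~/ngrams` is the parent directory. The relevant part of the Neovim settings might then look like this. The directory name `ngrams` and the variable name `ltex_settings` are illustrative only and do not come from the thread; `vim.fn.expand` is used so that the server receives an absolute path.

```lua
-- Assumed (hypothetical) layout on disk:
--   ~/ngrams/en/1grams
--   ~/ngrams/en/2grams
--   ~/ngrams/en/3grams
local ltex_settings = {
  ltex = {
    language = 'en',
    additionalRules = {
      -- Point at the PARENT of `en`, not at `en` itself.
      languageModel = vim.fn.expand('~/ngrams'),
    },
  },
}
```

This table would take the place of the `settings` value in the config shown earlier. Further language directories (for example a hypothetical `~/ngrams/de`) could sit next to `en` under the same parent.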

valentjn added a commit to valentjn/vscode-ltex that referenced this issue Oct 25, 2021
@David-Else
Author

David-Else commented Oct 25, 2021

Brilliant! It now picks them up.

Strangely, "Don’t forget to put on the breaks" gives no code action, but "there last chance" does. I'm not sure why. They are reported as:

Don’t forget to put on the breaks

there last chance

I  'breaks' (interruptions) seems less likely than 'brakes' (mechanical device to stop motion). LTeX(CONFUSION_RULE_BREAKS_BRAKES) [1, 28]
I  'there' (as in 'Is there an answer?') seems less likely than 'their' (as in 'It’s not their fault.'). LTeX(CONFUSION_RULE_THERE_THEIR) [3, 1]
Code actions:
1. Use 'their'
2. Hide false positive
3. Disable rule
Type number and <Enter> or click with the mouse (q or empty cancels):

@valentjn
Owner

valentjn commented Oct 26, 2021

There are code actions, at least in VS Code. I'm pretty sure LTeX LS is working as intended.

Code action request and response for breaks:

Sending request 'textDocument/codeAction - (1)'.
Params: {
    "textDocument": {
        "uri": "untitled:Untitled-1"
    },
    "range": {
        "start": {
            "line": 0,
            "character": 27
        },
        "end": {
            "line": 0,
            "character": 33
        }
    },
    "context": {
        "diagnostics": [
            {
                "range": {
                    "start": {
                        "line": 0,
                        "character": 27
                    },
                    "end": {
                        "line": 0,
                        "character": 33
                    }
                },
                "message": "'breaks' (interruptions) seems less likely than 'brakes' (mechanical device to stop motion).",
                "code": "CONFUSION_RULE_BREAKS_BRAKES",
                "codeDescription": {
                    "href": "https://community.languagetool.org/rule/show/CONFUSION_RULE_BREAKS_BRAKES?lang%3Den-US"
                },
                "severity": 3,
                "source": "LTeX"
            }
        ],
        "only": [
            "quickfix"
        ]
    }
}

Received response 'textDocument/codeAction - (1)' in 32ms.
Result: [
    {
        "title": "Use 'brakes'",
        "kind": "quickfix.ltex.acceptSuggestions",
        "diagnostics": [
            {
                "range": {
                    "start": {
                        "line": 0,
                        "character": 27
                    },
                    "end": {
                        "line": 0,
                        "character": 33
                    }
                },
                "severity": 3,
                "code": "CONFUSION_RULE_BREAKS_BRAKES",
                "codeDescription": {
                    "href": "https://community.languagetool.org/rule/show/CONFUSION_RULE_BREAKS_BRAKES?lang=en-US"
                },
                "source": "LTeX",
                "message": "'breaks' (interruptions) seems less likely than 'brakes' (mechanical device to stop motion)."
            }
        ],
        "edit": {
            "documentChanges": [
                {
                    "textDocument": {
                        "version": 1,
                        "uri": "untitled:Untitled-1"
                    },
                    "edits": [
                        {
                            "range": {
                                "start": {
                                    "line": 0,
                                    "character": 27
                                },
                                "end": {
                                    "line": 0,
                                    "character": 33
                                }
                            },
                            "newText": "brakes"
                        }
                    ]
                }
            ]
        }
    },
    {
        "title": "Hide false positive",
        "kind": "quickfix.ltex.hideFalsePositives",
        "diagnostics": [
            {
                "range": {
                    "start": {
                        "line": 0,
                        "character": 27
                    },
                    "end": {
                        "line": 0,
                        "character": 33
                    }
                },
                "severity": 3,
                "code": "CONFUSION_RULE_BREAKS_BRAKES",
                "codeDescription": {
                    "href": "https://community.languagetool.org/rule/show/CONFUSION_RULE_BREAKS_BRAKES?lang=en-US"
                },
                "source": "LTeX",
                "message": "'breaks' (interruptions) seems less likely than 'brakes' (mechanical device to stop motion)."
            }
        ],
        "command": {
            "title": "Hide false positive",
            "command": "_ltex.hideFalsePositives",
            "arguments": [
                {
                    "uri": "untitled:Untitled-1",
                    "falsePositives": {
                        "en-US": [
                            "{\"rule\":\"CONFUSION_RULE_BREAKS_BRAKES\",\"sentence\":\"^\\\\QDon't forget to put on the breaks\\\\E$\"}"
                        ]
                    }
                }
            ]
        }
    },
    {
        "title": "Disable rule",
        "kind": "quickfix.ltex.disableRules",
        "diagnostics": [
            {
                "range": {
                    "start": {
                        "line": 0,
                        "character": 27
                    },
                    "end": {
                        "line": 0,
                        "character": 33
                    }
                },
                "severity": 3,
                "code": "CONFUSION_RULE_BREAKS_BRAKES",
                "codeDescription": {
                    "href": "https://community.languagetool.org/rule/show/CONFUSION_RULE_BREAKS_BRAKES?lang=en-US"
                },
                "source": "LTeX",
                "message": "'breaks' (interruptions) seems less likely than 'brakes' (mechanical device to stop motion)."
            }
        ],
        "command": {
            "title": "Disable rule",
            "command": "_ltex.disableRules",
            "arguments": [
                {
                    "uri": "untitled:Untitled-1",
                    "ruleIds": {
                        "en-US": [
                            "CONFUSION_RULE_BREAKS_BRAKES"
                        ]
                    }
                }
            ]
        }
    }
]

Code action request and response for there:

Sending request 'textDocument/codeAction - (3)'.
Params: {
    "textDocument": {
        "uri": "untitled:Untitled-1"
    },
    "range": {
        "start": {
            "line": 2,
            "character": 0
        },
        "end": {
            "line": 2,
            "character": 5
        }
    },
    "context": {
        "diagnostics": [
            {
                "range": {
                    "start": {
                        "line": 2,
                        "character": 0
                    },
                    "end": {
                        "line": 2,
                        "character": 5
                    }
                },
                "message": "'there' (as in 'Is there an answer?') seems less likely than 'their' (as in 'It’s not their fault.').",
                "code": "CONFUSION_RULE_THERE_THEIR",
                "codeDescription": {
                    "href": "https://community.languagetool.org/rule/show/CONFUSION_RULE_THERE_THEIR?lang%3Den-US"
                },
                "severity": 3,
                "source": "LTeX"
            }
        ],
        "only": [
            "quickfix"
        ]
    }
}


Received response 'textDocument/codeAction - (3)' in 5ms.
Result: [
    {
        "title": "Use 'their'",
        "kind": "quickfix.ltex.acceptSuggestions",
        "diagnostics": [
            {
                "range": {
                    "start": {
                        "line": 2,
                        "character": 0
                    },
                    "end": {
                        "line": 2,
                        "character": 5
                    }
                },
                "severity": 3,
                "code": "CONFUSION_RULE_THERE_THEIR",
                "codeDescription": {
                    "href": "https://community.languagetool.org/rule/show/CONFUSION_RULE_THERE_THEIR?lang=en-US"
                },
                "source": "LTeX",
                "message": "'there' (as in 'Is there an answer?') seems less likely than 'their' (as in 'It’s not their fault.')."
            }
        ],
        "edit": {
            "documentChanges": [
                {
                    "textDocument": {
                        "version": 1,
                        "uri": "untitled:Untitled-1"
                    },
                    "edits": [
                        {
                            "range": {
                                "start": {
                                    "line": 2,
                                    "character": 0
                                },
                                "end": {
                                    "line": 2,
                                    "character": 5
                                }
                            },
                            "newText": "their"
                        }
                    ]
                }
            ]
        }
    },
    {
        "title": "Hide false positive",
        "kind": "quickfix.ltex.hideFalsePositives",
        "diagnostics": [
            {
                "range": {
                    "start": {
                        "line": 2,
                        "character": 0
                    },
                    "end": {
                        "line": 2,
                        "character": 5
                    }
                },
                "severity": 3,
                "code": "CONFUSION_RULE_THERE_THEIR",
                "codeDescription": {
                    "href": "https://community.languagetool.org/rule/show/CONFUSION_RULE_THERE_THEIR?lang=en-US"
                },
                "source": "LTeX",
                "message": "'there' (as in 'Is there an answer?') seems less likely than 'their' (as in 'It’s not their fault.')."
            }
        ],
        "command": {
            "title": "Hide false positive",
            "command": "_ltex.hideFalsePositives",
            "arguments": [
                {
                    "uri": "untitled:Untitled-1",
                    "falsePositives": {
                        "en-US": [
                            "{\"rule\":\"CONFUSION_RULE_THERE_THEIR\",\"sentence\":\"^\\\\Qthere last chance\\\\E$\"}"
                        ]
                    }
                }
            ]
        }
    },
    {
        "title": "Disable rule",
        "kind": "quickfix.ltex.disableRules",
        "diagnostics": [
            {
                "range": {
                    "start": {
                        "line": 2,
                        "character": 0
                    },
                    "end": {
                        "line": 2,
                        "character": 5
                    }
                },
                "severity": 3,
                "code": "CONFUSION_RULE_THERE_THEIR",
                "codeDescription": {
                    "href": "https://community.languagetool.org/rule/show/CONFUSION_RULE_THERE_THEIR?lang=en-US"
                },
                "source": "LTeX",
                "message": "'there' (as in 'Is there an answer?') seems less likely than 'their' (as in 'It’s not their fault.')."
            }
        ],
        "command": {
            "title": "Disable rule",
            "command": "_ltex.disableRules",
            "arguments": [
                {
                    "uri": "untitled:Untitled-1",
                    "ruleIds": {
                        "en-US": [
                            "CONFUSION_RULE_THERE_THEIR"
                        ]
                    }
                }
            ]
        }
    }
]
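
The requests above each carry the range of the flagged word, which is why the position sent by the client matters. The following is a rough sketch of issuing the same kind of `textDocument/codeAction` request from inside Neovim for the current cursor position and printing the returned titles. It assumes Neovim 0.5.x (the version mentioned in this thread) with a language server attached to the current buffer; later Neovim versions changed or deprecated some of these helpers, and the 1000 ms timeout is an arbitrary choice.

```lua
-- Sketch only: run inside Neovim with the cursor on a flagged word.
-- Builds params with `textDocument` and a `range` at the cursor position.
local params = vim.lsp.util.make_range_params()
params.context = {
  -- Diagnostics on the current line, in LSP format (Neovim 0.5.x helper).
  diagnostics = vim.lsp.diagnostic.get_line_diagnostics(),
  only = { 'quickfix' },
}

-- Blocks for up to 1000 ms; returns a table keyed by client id.
local responses = vim.lsp.buf_request_sync(0, 'textDocument/codeAction', params, 1000)
for _, response in pairs(responses or {}) do
  for _, action in ipairs(response.result or {}) do
    print(action.title)
  end
end
```

With the cursor elsewhere on the line, the server may return a different (or empty) list, because the requested range no longer overlaps the diagnostic.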

@oblitum

oblitum commented Oct 28, 2021

> There are code actions, at least in VS Code

Working fine on nvim 0.5.1 (and most likely on vim) through coc.nvim as well:

[Image: gif-2021-10-28-192930]

@David-Else be careful not to confuse actions at the cursor (just one at a time) with actions for the whole document (two actions). The GIF above illustrates that. I'm not sure whether that's the case in your setup.

@David-Else
Author

@oblitum Thanks! You were right, I needed to move the cursor over the word :)
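
For anyone using Neovim's built-in LSP client, here is a small sketch of a mapping that asks for code actions at the cursor, which matches the behaviour described above: the cursor has to be on the flagged word first. The key `<leader>ca` is an arbitrary, hypothetical choice.

```lua
-- Sketch: request code actions for the position under the cursor.
-- Move the cursor onto the flagged word (e.g. "breaks") before pressing the key.
vim.api.nvim_set_keymap(
  'n',
  '<leader>ca',
  '<cmd>lua vim.lsp.buf.code_action()<CR>',
  { noremap = true, silent = true }
)
```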

@oblitum

oblitum commented Oct 29, 2021

@David-Else nice. In case you're unaware, it's also possible to trigger it for code comments; I find it a great feature. You would have to figure out how to do that on your setup though.
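
One possible way to try this with the Neovim setup from earlier in the thread is sketched below. It rests on assumptions: that the installed LTeX LS version supports checking comments in programming languages, and that adding the language ID (here `python`, chosen only as an example) to both the client's `filetypes` and the `ltex.enabled` setting is enough. The variable name `ltex_overrides` is hypothetical.

```lua
-- Sketch under the assumptions stated above; not confirmed in this thread.
local ltex_overrides = {
  -- Let the client attach LTeX LS to Python buffers as well.
  filetypes = { 'tex', 'bib', 'markdown', 'python' },
  settings = {
    ltex = {
      -- Ask the server to check this language ID too.
      enabled = { 'latex', 'tex', 'bib', 'markdown', 'python' },
    },
  },
}
```

These two fields would replace the corresponding `filetypes` and `enabled` entries in the `default_config` shown earlier.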
