[fr] updated dictionaries to dicollecte-6.4.1 #1963

Closed
wants to merge 647 commits into from


OswaldLuc
Contributor

The LT French grammatical dictionary (french.dict) is built from the dicollecte lexique.
The current version of french.dict in LT-4.7 was built from lexique-dicollecte-fr-v6.1, two years ago.
Since then, the dicollecte team has kept working, but no update has been made for LT.
This PR updates it to the current version of the dicollecte lexique (lexique-dicollecte-fr-v6.4.1).
Some grammatical tags have been corrected and new entries added.
If you dump the two versions of french.dict, you can see that the old version has 617826 entries and the one provided here has 624133.
The scripts given here to build this version of french.dict (CreateDictFromLexiqueWithLT-4.7.sh and DicollecteDataFormatting.pl) are more or less updates of the scripts used to build the old version (create-lexicon.sh and dicollecte-to-lt.pl).
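
The comparison of the two dumps described above could be sketched roughly as follows. This is only an illustration, not part of the PR's scripts: it assumes each dump line is tab-separated as inflected form, lemma, POS tag, and the function names and sample lines are made up for the example.

```python
from typing import Iterable, Set, Tuple

# Assumed dump layout (an assumption, not confirmed by the PR):
# one entry per line, tab-separated as: inflected form, lemma, POS tag.
Entry = Tuple[str, str, str]


def parse_dump(lines: Iterable[str]) -> Set[Entry]:
    """Collect well-formed (form, lemma, tag) entries from dump lines."""
    entries: Set[Entry] = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 3:  # skip blank or malformed lines
            entries.add((fields[0], fields[1], fields[2]))
    return entries


def compare_dumps(old_lines: Iterable[str], new_lines: Iterable[str]):
    """Return (added, removed) entries between an old and a new dump."""
    old, new = parse_dump(old_lines), parse_dump(new_lines)
    return new - old, old - new


# Made-up sample lines for illustration only; the real dumps hold
# several hundred thousand entries each.
old_sample = ["faire\tfaire\tV inf\n"]
new_sample = ["faire\tfaire\tV inf\n", "autour\tautour\tN m s\n"]

added, removed = compare_dumps(old_sample, new_sample)
print(len(parse_dump(old_sample)), len(parse_dump(new_sample)))
print(sorted(added))
```

With real files, the same functions can be given open file handles instead of lists, for example `compare_dumps(open(old_path, encoding="utf-8"), open(new_path, encoding="utf-8"))`, where `old_path` and `new_path` are placeholders for the two dump files.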

danielnaber and others added 30 commits October 13, 2019 09:46
…levant to all languages, but still rely on the language object) can be added without touching all the sub classes
… is tested using ./build.sh (and not in languagetool-standalone)
Although the BBC page is archived, I used it because it is the clearest explanation that I can find.
Fixes  languagetool-org#2017
danielnaber and others added 24 commits October 19, 2019 22:41
…d a test shows the code didn't properly work anyway (offset issues in the errors found), so we can assume it's not used
Verified lexico.com
Verified www.lexico.com
I did not find the spelling 'unshakeable' in the American dictionaries.
@danielnaber
Member

Sorry for the late reply and thanks for the PR. It has been merged now and should be online tonight at languagetool.org.

@vkyfox
Contributor

vkyfox commented Nov 1, 2019

I have an issue with the new POS of a number of words:

  • faire J m s
  • autour N m s
  • Y N m sp

Are those voluntary?

@OswaldLuc
Contributor Author

Hi,
"Voluntary" is perhaps a bit strong since, of course, I didn't check every entry of the dictionary.
So it is a good thing to have a look at these entries.
The purpose of the script is to translate dicollecte-lexique-6.4.1 into a LanguageTool dictionary. If a POS exists in dicollecte (https://grammalecte.net/dictionary.php?prj=fr), we should expect to find it in the LT dict file.

faire J m s is not in dicollecte,
but it is not in the LT dict file either.
This is what I get when looking in the dump of the LT dict file:
$ cat french-6.4.1.dump | grep -e '^faire'$'\t'
faire faire V inf
So since "faire J m s" is not a new POS in the dictionary, I don't understand where it came from.
I would be interested to hear how you ended up with that POS.
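
For anyone who wants to repeat this check on other words, the grep above can be approximated in a few lines of Python. This is a minimal sketch under the same assumption as the grep pattern (the inflected form is the first tab-separated field, followed by lemma and POS tag); the `lookup` helper and the sample lines, including the tag on the "faire-part" line, are hypothetical.

```python
from typing import Iterable, List, Tuple


def lookup(lines: Iterable[str], form: str) -> List[Tuple[str, str]]:
    """Return (lemma, tag) pairs for lines whose first field equals `form`."""
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Exact match on the first field, like grep '^faire<TAB>'.
        if len(fields) == 3 and fields[0] == form:
            hits.append((fields[1], fields[2]))
    return hits


# Hypothetical sample lines; a real run would iterate over the dump file.
sample = [
    "faire\tfaire\tV inf\n",
    "faire-part\tfaire-part\tN m sp\n",  # made-up tag, for illustration only
    "autour\tautour\tN m s\n",
]
print(lookup(sample, "faire"))
```

Matching on the whole first field, rather than on a prefix, keeps longer forms such as "faire-part" out of the result, which is what the trailing tab in the grep pattern achieves.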

The two other POS are in dicollecte, so it is not surprising to find them as new POS.
I think the best way to understand a POS is to give an example:

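autour as a noun:
L'autour est un rapace de moyenne taille. (The goshawk is a medium-sized raptor.)
Un jeune autour de la même couvée est tombé du nid. (A young goshawk from the same brood fell out of the nest.)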

y as a noun (in fact, all the letters of the alphabet):
Le y est-il une voyelle ou une consonne ? (Is y a vowel or a consonant?)
Nous sommes arrivés à un carrefour en y. (We arrived at a Y-shaped junction.)

@vkyfox
Contributor

vkyfox commented Nov 4, 2019

Unfortunately, I no longer have on hand the sentence that triggered "faire" as a J m s. As to how: I was using our analysis tool, https://community.languagetool.org/analysis/analyzeText. Since I can't seem to reproduce the error, it must have been either a hallucination on my part or a mistake that has since been fixed.

As for "autour," I genuinely did not know it could describe a bird.

For the letters of the alphabet, I can see why it makes sense to have them counted as nouns. I will check whether it causes any issue with the way we handle things; if it does, I think it would make more sense to handle it on our end.

Thank you again for your work and thank you for taking the time to answer my concerns!
