Feature/nn and fo language extensions #13116
Thanks for the nice PR, especially for including all the citations and tests! I'm a little concerned about the duplication between the … And do you mind if I push directly to this branch to add these languages to the website docs?
{ORTH: "feb.", NORM: "februar"},
{ORTH: "mar.", NORM: "mars"},
{ORTH: "apr.", NORM: "april"},
{ORTH: "jun.", NORM: "juni"},
Is "juli" missing on purpose?
Yes. The abbreviation "jul(.)" for "juli" also means "Christmas" in Norwegian Nynorsk.
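Each entry in the diff maps an abbreviated surface form (ORTH) to a normalized form (NORM). The plain-Python sketch below, which does not use spaCy itself, illustrates why leaving "jul." out matters: a string without an exception keeps its default norm instead of being forced to "juli". The string keys `"ORTH"`/`"NORM"` stand in for spaCy's `spacy.symbols.ORTH`/`NORM`, the lowercase fallback is a simplification of spaCy's default norm, and `build_exceptions`/`norm_of` are hypothetical helpers. Only the four entries visible in the diff excerpt are included.

```python
# Hypothetical stand-ins for spaCy's ORTH and NORM attribute symbols.
ORTH = "ORTH"
NORM = "NORM"

# The month abbreviations visible in the reviewed diff. "jul." is deliberately
# absent: in Nynorsk "jul" also means "Christmas", so normalizing it to
# "juli" would be wrong in many contexts.
MONTH_ABBREVIATIONS = [
    {ORTH: "feb.", NORM: "februar"},
    {ORTH: "mar.", NORM: "mars"},
    {ORTH: "apr.", NORM: "april"},
    {ORTH: "jun.", NORM: "juni"},
]


def build_exceptions(entries):
    """Key each exception by its surface form; each maps to a single token."""
    exc = {}
    for entry in entries:
        exc[entry[ORTH]] = [entry]
    return exc


def norm_of(text, exceptions):
    """Return the exception's NORM, else fall back to the lowercased text."""
    if text in exceptions:
        return exceptions[text][0][NORM]
    return text.lower()


exceptions = build_exceptions(MONTH_ABBREVIATIONS)
print(norm_of("feb.", exceptions))  # februar
print(norm_of("jul.", exceptions))  # jul.
```

Because "jul." has no entry, it is never rewritten to "juli", which keeps the Christmas reading intact at the cost of not normalizing the month abbreviation.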
The Norwegian Nynorsk tokenizer is a mix of the Norwegian Bokmål and Danish tokenizers with the addition of some language-specific abbreviations. I'm honestly not too certain about the exact differences between the two variations of Norwegian, and I'm sure there's room for improvements by someone who knows Norwegian Nynorsk. And sure, go ahead and push to this branch :)
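Building one tokenizer from others plus extra abbreviations amounts to layering exception tables, with later tables overriding earlier ones. A minimal sketch, assuming plain dicts keyed by surface form; spaCy's own language data combines such tables with `spacy.util.update_exc`, which also checks that the tokens' ORTH values spell out the key, but `merge_exceptions` and the table contents here are hypothetical illustrations, not the PR's actual code or data.

```python
from typing import Dict, List

# One exception maps a surface string to the list of tokens it splits into.
Exceptions = Dict[str, List[Dict[str, str]]]


def merge_exceptions(*tables: Exceptions) -> Exceptions:
    """Merge tables left to right; later tables override earlier keys."""
    merged: Exceptions = {}
    for table in tables:
        for orth, tokens in table.items():
            # The token ORTH values must spell out the key exactly.
            joined = "".join(tok["ORTH"] for tok in tokens)
            if joined != orth:
                raise ValueError(f"{orth!r} does not match token ORTHs {joined!r}")
            # Copy token dicts so the source tables are never mutated.
            merged[orth] = [dict(tok) for tok in tokens]
    return merged


# Illustrative entries only; not taken from the PR.
shared = {"f.eks.": [{"ORTH": "f.eks.", "NORM": "for eksempel"}]}
nynorsk_specific = {"t.d.": [{"ORTH": "t.d.", "NORM": "til dømes"}]}

nynorsk = merge_exceptions(shared, nynorsk_specific)
print(sorted(nynorsk))  # ['f.eks.', 't.d.']
```

Ordering the arguments from most general to most specific lets a language-specific entry replace a shared one with the same key.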
Description
Added language extensions for Faroese and Norwegian Nynorsk.
Types of change
This is an enhancement: it extends spaCy's language support with two new languages.
Checklist