Skip to content

Latest commit

 

History

History
2184 lines (1718 loc) · 78.8 KB

File metadata and controls

2184 lines (1718 loc) · 78.8 KB

LanguageTool Change Log

5.7 (2022-03-28)

Catalan

  • added and improved rules
  • updated dictionary (catalan-pos-dict-2.18)

French

  • added and improved rules
  • updated dictionary (french-pos-dict-0.5)

German

  • added and improved rules
  • extended dictionary

Polish

  • added and improved rules, especially for frequent mistakes
  • added a rule for spelling coherency (menedżer or menadżer but not both in the same document)
  • updated the user interface translation

Portuguese

  • added and improved rules
  • added words and POS data
  • fixed tons of false positives

Russian

  • added and improved rules
  • improved disambiguation

Spanish

  • added and improved rules
  • updated dictionary (spanish-pos-dict-1.6)

Ukrainian

  • new words in the POS dictionary
  • added and improved several rules
  • added pronoun checking for adj/noun agreement
  • improved tagging and disambiguation

5.6 (2021-12-29)

Catalan

  • added and improved rules

Dutch

  • more words in dictionary
  • added many special word groups
  • added rules for accidentally split words

English

French

  • added and improved rules

German

  • added and improved rules

Portuguese

  • added and improved rules
  • added words and POS data
  • fixed tons of false positives

Russian

  • added and improved rules
  • improved disambiguation
  • added words and POS data
  • fix POS data
  • rebuilt POS dictionary
  • improved spell checking

Spanish

  • added and improved rules
  • updated dictionary (spanish-pos-dict-1.5)

Ukrainian

  • many new words in the POS dictionary
  • added and improved rules
  • improved tagging and disambiguation

General

  • The --allow-origin option doesn't require a parameter anymore in order to avoid confusion about whether * needs to be quoted on Windows. Using --allow-origin without a parameter now implies *.
  • Added new value firstupper for case_conversion attribute in grammar.xml (see issue #3241).

5.5 (2021-10-02)

Catalan

  • added and improved rules
  • updated dictionary (catalan-pos-dict-2.15)

Dutch

  • added and improved rules
  • extended spelling dictionary

English

French

  • added and improved rules
  • updated dictionary (french-pos-dict-0.4) with words from added.txt and removed.txt, and fixed lemmas of many adjectives (infinitive->masc. sing.)

German

  • added and improved rules

Portuguese

  • added and improved rules
  • added words and POS data
  • fixed tons of false positives

Russian

  • added and improved rules
  • updated POS and spellchecker dictionary
  • activate some picky rules in picky mode

Spanish

  • added and improved rules
  • updated dictionary (spanish-pos-dict-1.3)

Ukrainian

  • over 5000 new words in the POS dictionary
  • added and improved rules
  • improved tagging and disambiguation

5.4 (2021-06-25)

Catalan

  • added and improved rules
  • updated dictionary (catalan-pos-dict-2.14)

Dutch

  • added and improved rules
  • extended spelling dictionary

English

  • additional tags for personal pronouns, e.g. us[we/PRP,we/PRP_O1P]; mine[mine/PRP$,I/PRP$_P1S]

French

  • added and improved rules

Galician

  • small rule improvements

German

  • added and improved rules

Portuguese

  • the sentence length rule is now active in 'picky' mode
  • added and improved rules
  • added words and POS data
  • fixed tons of false positives

Russian

  • small rule improvements

Spanish

  • added and improved rules
  • updated dictionary (spanish-pos-dict-1.2)

Ukrainian

  • new words in the POS dictionary
  • added and improved rules
  • improved tagging and disambiguation

General

  • The sentence length rule is now a text-level rule and it underlines the whole sentence, not just the position where the threshold is reached.

5.3 (2021-03-29)

Arabic

  • added and improved rules
  • improve tagger and synthesizer to better tag pronouns
  • add ArabicTransVerbRule and Arabic Punctuations Whitespace Rules

Catalan

  • added and improved rules
  • updated dictionary (catalan-pos-dict-2.14)

Dutch

  • added and improved rules

English

French

  • added and improved rules

German

  • added and improved rules

Portuguese

  • added and improved rules
  • added words and POS data
  • fixed tons of false positives

Russian

  • added and improved rules

Spanish

  • added and improved rules
  • updated dictionary (spanish-pos-dict-1.1)

Ukrainian

  • over 6000 new words in the POS dictionary
  • added and improved rules
  • improved tagging and disambiguation

5.2 (released 2020-12-29)

Catalan

  • added and improved rules
  • updated dictionary (catalan-pos-dict-2.13)

Dutch

  • added and improved rules
  • There's now support for Belgian Dutch (nl-BE). "Dutch" (nl) is still the default. nl-BE-specific rules can be added to nl-BE/grammar.xml

English

German

  • added and improved rules

French

Portuguese

  • added and improved rules
  • added words and POS data
  • fixed tons of false positives

Russian

  • added and improved rules
  • added new words and POS data for it
  • improved suggestions algorithms for spellchecking

Spanish

  • added and improved rules
  • updated dictionary (spanish-pos-dict-0.9)

Ukrainian

  • over 7000 new words in the POS dictionary
  • added and improved rules
  • improved tagging and disambiguation

General

  • There's now RegexAntiPatternFilter which can be used to have antipatterns for <regexp> rules. Use like this:
    <regexp>my regex</regexp>
    <filter class="org.languagetool.rules.patterns.RegexAntiPatternFilter" args="antipatterns:regex1|regex2"/>
    
    Note that the regex after antipatterns: cannot contain spaces.
  • German, French, Dutch, and Spanish have ngram-based false friends for some time already, meaning that a German/Dutch/... native speaker will get an error if (probably) using and English word incorrectly in an English text. The change in this version is that for all other language pairs that also have false friends, these rules are now active only in picky mode (--level PICKY on the command line, level=picky with the HTTP API.)

5.1.3 (released 2020-10-15)

LibreOffice / Apache OpenOffice Integration

5.1.2 (released 2020-10-05)

LibreOffice / Apache OpenOffice Integration

5.1.1 (released 2020-09-29)

LibreOffice / Apache OpenOffice Integration

  • fixed a NullPointerException crash in the LibreOffice/OpenOffice add-on

5.1 (released 2020-09-25)

Catalan

  • added and improved rules

Dutch

  • added and improved rules

English

French

  • added and improved rules

German

Italian

  • small rule improvements

Portuguese

  • added and improved rules
  • added words and POS data
  • fixed tons of false positives

Russian

  • added and improved rules

Spanish

  • added and improved rules

Ukrainian

  • dictionary update
  • many new punctuation rules
  • many new styling rules
  • tokenization and tagging improvements
  • disambiguation improvements

General

  • each pair of ConfusionProbabilityRule has its own ID now, so it can be separately turned on/off
  • new XML attribute chunk_re for <token>, which specifies a chunk as a regular expression

5.0.2 (2020-08-28)

5.0 (2020-06-27)

Arabic

  • added and improved rules
  • updated POS dictionary (Arramooz #e33794e)
  • remove the Algerian variant (ar-DZ)
  • add support of ngram data (languagetool-tools-ar)
  • add Darja, Diacritics, Redundancy, WrongWordInContext, Wordiness, Homophones and WordCoherency rules.

Catalan

  • added and improved rules
  • updated dictionary (catalan-pos-dict-2.10)

Dutch

  • added and improved rules

English

Esperanto

  • added and improved rules

French

  • added and improved rules

German

  • added and improved rules
  • rules that apply to de-DE and de-AT (but not de-CH) can now be placed in de/de-DE-AT/grammar.xml
  • Updated the German part-of-speech dictionary (https://github.com/languagetool-org/german-pos-dict) to version 1.2.1.
  • Special chars _ and / can now be escaped in spelling.txt and spelling_custom.txt using the backslash. For example, foo\/s will add foo/s to the dictionary.

Persian

  • commented out rules that caused many false alarms

Portuguese

  • added and improved rules
  • added words and POS data
  • fixed tons of false positives

Russian

  • added and improved rules
  • added new Java rules
  • rebuilt and improved main spellchecker dictionary, added many new words
  • new variant (only yo "ё") spellchecker dictionary and new java rule for it (set off by default)
  • new filter arguments: prefix and suffix to be used for matching the part-of-speech of parts of words with prefix and suffix added to original token, e.g.:
       <filter class="org.languagetool.rules.ru.RussianPartialPosTagFilter"
                args="no:2 regexp:(.*) postag_regexp:(ADV) prefix:не suffix:  "/>

Slovak

  • commented out rules that caused many false alarms

Spanish

Ukrainian

  • dictionary update, including many rare and slang words
  • new rules
  • tokenization and tagging improvements
  • disambiguation improvements

General

  • added replace_custom.txt for several languages so users can have their own very simple replace rules without worrying about updates (they still need to copy the file to the new LT version, though).
  • Updated dependency com.gitlab.dumonts:hunspell to 1.1.1 to make spell checking work on older Linux distributions like RHEL 7.

4.9 (2020-03-24)

Arabic

  • Added initial support for Arabic, contributed by Sohaib Afifi (#2219)

Catalan

  • added and improved rules
  • updated dictionary (catalan-pos-dict-2.7)

Dutch

  • added and improved rules

English

French

  • improved rules

German

  • added and improved rules
  • compounds.txt now automatically expands ß to ss when using German (Switzerland)
  • German spelling.txt now supports prefix_verb syntax like vorüber_eilen so the speller will accept all forms of "eilen" prefixed by "vorüber"

Irish

  • Added initial support for Irish, contributed by Jim Regan (#2260)

Portuguese

  • added and improved rules
  • added words and POS data

Russian

  • small improvements

Ukrainian

  • dictionary update
  • new rules
  • tokenization and tagging improvements

4.8 (released 2019-12-27)

Catalan

  • added and improved rules
  • updated dictionary (catalan-pos-dict-2.6)

Chinese

Danish

Dutch

  • added and improved rules

English

Esperanto

  • corrections are now offered for spell check errors

French

German

  • added and improved rules

Greek

  • updated spelling dictionary to el_GR 0.9 (14/03/2019), by George Zougianos

Khmer

Portuguese

  • added and improved rules
  • added words and POS data

Russian

  • added new words
  • improve java rule

Swedish

Ukrainian

  • dictionary update
  • new rules
  • tokenization improvements

General

  • The unmaintained code from package org.languagetool.dev.wikipedia.atom has been removed. It hadn't been maintained for years and didn't work properly anymore.
  • spelling_global.txt has been added. Words or phrases added here will be accepted for all languages.
  • prohibit_custom.txt and spelling_custom.txt can be used to make your own additions to spelling.txt and prohibit.txt without having to edit those files after a LanguageTool update (you will still need to manually copy those files). Paths to these files (xx = language code): ./org/languagetool/resource/xx/hunspell/prohibit_custom.txt ./org/languagetool/resource/xx/hunspell/spelling_custom.txt Note that you can simply create these files if they don't exist for your language yet.

HTTP API / LT server

  • The dynamic languages feature (lang-xx=... and lang-xx-dictPath=...) now also supports hunspell dictionaries. Just let lang-xx-dictPath point to the absolute path of the .dic file. Note that hunspell is quite slow when it comes to offering suggestions for misspelled words.

Java API

  • AbstractSimpleReplaceRule2 has been fixed so that it's now case-insensitive. If you implement a sub class of it and you want the old behavior, please implement isCaseSensitive() and have it return true. (Issue #2051)

Internal

  • The internal hunspell has been updated from 1.3 to 1.7, now using https://gitlab.com/dumonts/hunspell-java as the project providing the bindings. For Portuguese, this speeds up generating suggestions for misspellings by a factor of about 3 (but it's still slow compared to Morfologik). 32-bit systems are not supported anymore (only affects languages like German and French).
  • Experimental: the new default="temp_off" attribute in grammar.xml files will turn off a rule/rulegroup, but keep it activated for our nightly regression tests.
  • Many external dependencies have been updated to new versions.

4.7 (2019-09-28)

Catalan

  • added and improved rules
  • updated dictionary (catalan-pos-dict-2.5)

Dutch

  • added and improved rules

English

French

  • improved rules

Galician

  • improved rules

German

  • added and improved rules

Italian

  • small rule improvements

Portuguese

  • added rules and significantly improved accuracy
  • disambiguation improvements
  • POS and spelling improvements

Russian

  • improved rules
  • added new words to spellchecker dictionary

Spanish

  • added and improved rules

Ukrainian

  • 2k of new words in the dictionary
  • improved tokenization
  • improved dynamic tagging
  • added and improved rules

General

  • Spell suggestion improvements: for many cases of a misplaced space, the suggestions are now better. For example, "thef eedback" can now be corrected to "the feedback" in one step. (#1729)
  • The synthesizer now considers entries in added.txt and removed.txt (except for Catalan and Polish; for German removing compounds in removed.txt might not work) (#884)

4.6 (2019-06-26)

Catalan

  • added and improved rules
  • updated dictionary (catalan-pos-dict-2.4) with more health terminology

Dutch

  • added and improved rules

English

Esperanto

  • added and improved rules

French

  • added and improved rules
  • Rule FRENCH_WHITESPACE has been split into FRENCH_WHITESPACE (on by default) and FRENCH_WHITESPACE_STRICT (off by default). FRENCH_WHITESPACE only complains if there's no space at all before ?, !, ;, :, or ». FRENCH_WHITESPACE_STRICT complains if there's no space or a common space instead of a non-breaking space before these characters.
  • added some popular names to dictionary

Galician

  • added verbal agreement rules

German

  • added and improved rules
  • The false friend rule has been modified to use ngrams: Now false friends cause error messages if they are used in a wrong context, according to ngram statistics. Note that some pairs from false-friends.xml are not supported anymore because their precision isn't good enough. See confusion_sets_l2_de.txt for active DE/EN pairs. Use My handy is broken. to test the rule. As before, this will only create an error if motherTongue is set to a German language code.
  • prohibit.txt: lines starting with .* will prohibit all words ending with the subsequent string (e.g., .*artigel will prohibit Versandartigel)

Greek

  • added rules

Italian

  • added popular names to dictionary

Portuguese

  • POS and spelling improvements

Russian

  • added and improved rules
  • added new words to spell dictionary

Spanish

  • updated spell dictionary from 2.1 to 2.4

Ukrainian

  • support for new spelling rules from 2019
  • thousands of new words in the dictionary
  • many rule improvements
  • tokenization and tagging improvements

HTTP API / LT server

  • altLanguages will only be considered for words with >= 3 characters
  • Cleaned up error handling: invalid parameters will now return an HTTP error 400 instead of 500.

4.5.1 (2019-03-28)

LibreOffice / Apache OpenOffice Integration

  • Fixed a bug that caused the rules in the options dialog to not appear in the text language

4.5 (2019-03-26)

Catalan

  • added and improved rules
  • updated dictionary (catalan-pos-dict-2.3) with health terminology

English

  • resource/en/en-US-GB.txt contains a mapping from US to British English and vice versa. It's not used to detect correct or incorrect spellings, but only to improve error messages so that they explicitly explain that the incorrect word is actually a different variant (like 'colour' in an en-US text).
  • updated en_GB spellchecker dictionary from https://github.com/marcoagpinto/aoo-mozilla-en-dict (Version 2.70 - 2019-03-01)
  • spell check ignores single characters (e.g., 'α')

Galician

  • added and improved rules
  • disambiguation improvements
  • foreign names recognition

German

  • added and improved rules
  • Simple German: added and improved rules
  • improved suggestions for typos that end with a dot (typically at the end of the sentence) - the dot is not included anymore
  • spell check ignores single characters (e.g., 'α') and hyphenated compounds (e.g., 'α-Strahler')

Portuguese

  • added and significantly improved rules accuracy
  • disambiguation improvements
    • Chinese common names are now detected
  • POS and spelling improvements
  • updated Hunspell dictionaries to:
    • [pt-PT pos-AO] Dicionários Portugueses Complementares 3.1

Russian

  • added and improved rules
  • disambiguation improvements
  • added many words without "yo" letter to POS dictionary
  • added new words to spell dictionary

Ukrainian

  • dictionary update
  • added and improved rules
  • improvements to tokenization, tagging, and disambiguation

General

  • URLs written like mydomain.org/ are now detected as domains and not considered spelling errors anymore. Note that the slash is still needed to avoid missing real errors.
  • JSON output: The replacements list now has an optional new item shortDescription for each value. It can contain a short definition/hint about the word. Currently, the only words that have a short description are ones that have a description in confusion_sets.txt (i.e. a text after the | symbol).

General

  • bug fix: don't make interpretAs part of getTextWithMarkup() (#1393)
  • Experimental new attribute raw_pos for the <pattern> element in grammar.xml. If set to yes, the postag will refer to the part-of-speech tags before disambiguation.
  • Experimental support for <antipattern> in disambiguation.xml

HTTP API / LT server

  • Experimental new parameter preferredLanguages: up to a certain limit (currently 50 characters), only these languages will be considered for language detection. This has to be a comma-delimited list of language codes without variants (e.g. use 'en', not 'en-US'). This only works with fasttext configured as the language detector.
  • Spellcheck-only languages can now be added dynamically from the configuration using lang-xx=languagename and lang-xx-dictPath=/path/to/morfologik.dict. xx needs to be the language code. The JSON result will contain spellCheckOnly: true for these languages.

4.4.1 (2019-01-14)

  • Fixed a bug that prevented opening the Options dialog in LibreOffice/OpenOffice

4.4 (2018-12-27)

Catalan

  • added and improved rules
  • updated dictionary

Dutch

  • added and improved rules, including more confusion rules for dyslectic people
  • added large amount of family names to reduce false alarms in spelling

English

French
  • small rule improvements

German

  • added and improved rules
  • Swiss German: improved POS tagging of words that contain 'ß' in de-DE German (e.g., 'gross' is tagged as 'gross[groß/ADJ:PRD:GRU]'); (#1147)
  • Simple German: added and improved rules; restructured grammar.xml

Portuguese

  • added and improved rules
  • disambiguation improvements
  • POS and spelling improvements

Russian

  • added and improved rules
  • disambiguation improvements
  • POS and spelling dictionary improvements

Serbian

  • Serbian never moved beyond its "initial support" state with a tiny number of rules, and it has no active maintainer, so we have deactivated it for now. If you'd like to maintain support for Serbian, let us know in the forum (https://forum.languagetool.org). Once it's clear that a new active long-term maintainer has been found, we'll activate support for Serbian again.

Ukrainian

  • dictionary update (about 7k of new words)
  • added and improved rules
  • improvements to tokenization, tagging, and disambiguation

HTTP API / LT server

  • Experimental support for altLanguages parameter: takes a list of language codes. Unknown words of the main languages (as specified by the language parameter) will cause errors of type "Hint" if accepted by one of these languages. We expect clients to interpret this like style issues, e.g. these words should be underlined with a light blue instead of red. Support for this is experimental, i.e. it might be removed again or implemented in a different way.
  • Experimental support for noopLanguages parameter: takes a list of language codes of languages that are not supported by LT but that will be detected and mapped to a no-op language without rules. Useful for clients that rely on language auto-detection and whose users might use languages not supported by LT. NOTE 1: only works with fastText configured NOTE 2: setting languages here will worsen language detection quality on average
  • Change to language detection behavior: Removed fallback to English when confidence of detection algorithm is low, instead now always returning highest scoring detected language. Added a field confidence to detectedLanguage object in the JSON response that contains the probability score for the detected language as computed by the detection algorithm.

4.3 (2018-09-26)

Catalan

  • added and improved rules

Dutch

  • added and improved rules

English

  • added and improved rules
Esperanto
  • added and improved rules
French
  • small rule improvements

Galician

  • added and improved rules

German

  • added and improved rules

German (simple)

  • added and improved rules

Portuguese

  • added and improved rules
  • improvements to disambiguation, and segmentation
  • updated Hunspell dictionaries to:
    • [pt-PT pos-AO] Dicionários Portugueses Complementares 3.0

Russian

  • added and improved rules

Ukrainian

  • added and improved rules

General

HTTP API / LT server

  • The server can now accept JSON as the data parameter that describes markup. For example:
    {"annotation":[
      {"text": "A "},
      {"markup": "<b>"},
      {"text": "test"},
      {"markup": "</b>"}
    ]}
    
    With this input, LT will ignore the markup parts and run the check only on the text parts. The error offset positions will still refer to the original input including the markup, so that suggestions can easily be applied. You can optionally use interpretAs to have markup interpreted as whitespace, like this:
    {"markup": "<p>", "interpretAs": "\n\n"}
    
    Note that HTML entities (including &nbsp;) still need to be converted to Unicode characters before feeding them into LT. (Issue: #757)
  • The blockedReferrers setting now also considers the Origin header
  • A blockedReferrers setting of foobar.org will now automatically match http://foobar.org, http://www.foobar.org, https://foobar.org, and https://www.foobar.org
  • New setting fasttextModel (see https://fasttext.cc/docs/en/language-identification.html) and fasttextBinary (see https://fasttext.cc/docs/en/support.html). With these options set, the automatic language detection is much better than the built-in one.
  • Experimental new mode parameter with all, textLevelOnly, or allButTextLevelOnly as value: Will check only text-level rules or all other rules. As there are fewer text-level rules, this is usually much faster and the access limit for characters per minute that can be checked is more generous for this mode.
  • Improved spellchecker suggestions (not yet enabled by default). See https://forum.languagetool.org/t/gsoc-reports-spellchecker-server-side-framework-and-build-tool-tasks/2926/43
  • Experimental new type in JSON. This is supposed to help clients choose the color with which they underline/mark errors. Please do not rely on this yet, it might change or even be removed.

4.2 (2018-06-26)

Breton

  • made many messages shorter
  • updated FSA spelling dictionary from An Drouizig Breton Spellchecker 0.15

Catalan

  • added and improved rules
  • rules and updated dictionary for new diacritics rules (IEC 2017)

Dutch

  • added and improved rules

English

Esperanto

  • added and improved rules

German

  • added and improved rules
  • updated jwordsplitter to 4.4 to prevent excessively long processing times for artificially long compounds
  • prohibit.txt: lines ending with ".*" will prohibit all words starting with the previous string

German (simple)

  • added and improved rules

Greek

  • added rules

Portuguese

  • added and improved rules

Russian

  • added and improved grammar and punctuation rules
  • upgraded the tagging and synthesizer dictionaries from AOT.ru rev.269 (extend tags, add missing tags)
  • spelling dictionary update

Spanish

  • added and improved a few rules

Ukrainian

  • dictionary update (more than 15k of new words)
  • added and improved rules
  • some improvements to tokenization, tagging and disambiguation

HTTP API / LT server

  • The JSON contains a new section detectedLanguage (under language) that contains information about the automatically detected language. This way clients can suggest switching to that language, e.g. in cases where the user had selected the wrong language.
  • New optional configuration setting blockedReferrers: a comma-separated list of HTTP referrers that are blocked and will not be served
  • BETA: New optional configuration settings dbDriver, dbUrl, dbUsername, dbPassword to allow user-specific dictionaries

Java API

  • The parameters of the *SpellerRule classes (e.g. MorfologikRussianSpellerRule) have changed
  • LanguageIdentifier will now only consider the first 1000 characters when identifying the language of a text. This improves performance for long texts.

4.1 (2018-03-27)

Catalan

  • added and improved rules

Chinese

  • added some rules

Dutch

  • added and improved rules
  • added new Java rule NL_PREFERRED_WORD_RULE that suggests preferred words (e.g., 'fiets' for 'rijwiel')

English

  • all-uppercase words are now also spellchecked
  • added and improved rules
  • added remaining collocation rules (~130) contributed by Nicholas Walker (Bokomaru)

Esperanto

  • words written with x-sistemo now get proper POS tag so grammar mistakes can now be found in: ambaux virino (->ambaux virinoj), mi farigxis maljunan (-> mi farigxis maljuna), etc.
  • added and improved rules
  • added many <url> to rules

French

  • improved suggestion for spelling mistakes (#912)

Galician

  • added a couple of rules

German

  • added and improved rules
  • New rule that checks coherent use of Du/du, Dich/dich etc. Assumes that the first use has 'correct' capitalization and suggests the same capitalization for subsequent uses.
  • New line extension -* for ignore.txt: entries ending with -* are ignored only if they are part of a hyphenated compound (e.g, Fair-Trade-* allows Fair-Trade-Kakao)
  • Added a new rule that tries to find compounds that are probably not correct, like Lehrzeile instead of Leerzeile, requires ngram data (rule id DE_PROHIBITED_COMPOUNDS)

German (simple)

  • added and improved rules

Portuguese

  • added and improved rules

Russian

  • sentence segmentation improvements
  • added and improved rules
  • upgraded the tagging and synthesizer dictionaries with extended POS tags from AOT.ru rev.269

Spanish

  • update to the part-of-speech dictionary

Ukrainian

  • dictionary update (~5K new lemmas)
  • compound word tagging improvements
  • many new disambiguation rules
  • several new barbarism and grammar rules

HTTP API / LT server

  • The server now returns HTTP error code 500 in case of a timeout (it used to return 503)

Java API

  • Constructors that take a ResultCache have been removed from MultiThreadedJLanguageTool as using them caused incorrect results. (#897)

4.0 (2017-12-29)

Catalan

  • added and improved rules
  • updated and renamed dictionary: ca-ES.dict (external dependency: catalan-pos-dict 1.6)
  • added new dictionary for Valencian including most words from Diccionari Normatiu Valencià (AVL): ca-ES-valencia.dict (external dependency: catalan-pos-dict 1.6)

Dutch

  • added and improved rules

English

  • added and improved rules
  • removed the category MISC and moved the rules to more specific categories
  • added WordCoherencyRule, to detect cases where two different variants of a word are used in the same text (e.g. archaeology and archeology)
  • added approximately 70 collocation rules contributed by Nicholas Walker (Bokomaru)
  • added support for locale-specific spelling suggestions (locale-specific spelling_en-XY.txt files)
  • updated en_GB spellchecker dictionary from https://github.com/marcoagpinto/aoo-mozilla-en-dict
  • updated en_US spellchecker dictionary from http://wordlist.aspell.net (Version 2017.08.24)
  • updated en_CA spellchecker dictionary from http://wordlist.aspell.net (Version 2017.08.24)

French

  • LT now offers suggestions for spelling errors

Galician

  • added and improved rules, including:
    • grammar: agreement rules added (only number and gender agreement)
    • common normative errors: includes Castilianisms, Lusitanianisms, Hipergalicisms, archaisms and Anglicisms correction
    • style: barbarism, redundant expressions, and wordy expressions detection added
    • typography: spacing and number formatting improvements; chemical formulas; degree signs; dashes; punctuation; international system standards; and mathematical symbol formatting
  • development, punctuation and repetition rules categories added
  • multiword disambiguation added
  • disambiguation improvements
  • new word tokenizer
  • significant POS tagging and synthesizing improvements
  • spellchecking exceptions for:
    • abbreviations;
    • variables in formulas, units, and related statistical vocabulary;
    • common Latin, English and French expressions;
    • species scientific names;
    • famous personalities
  • updated Hunspell dictionaries to:
    • [gl-ES] Version 12.10 "Xoán Manuel Pintos"

German

  • added and improved rules
  • New rule that checks coherency of hyphen usage in compounds, e.g. it complains when "Ärzteverband" and "Ärzte-Verband" are both used in the same text. While both spellings are correct, it's probably a good idea to stick to one spelling.
  • improved POS tagging of hyphenated compounds (e.g., "CO2-arm" is recognized as a variant of "arm")

Polish

  • added rules
  • disambiguation improvements

Portuguese

  • added and improved rules
  • LibreOffice category rules moved to other categories
  • disambiguation improvements
  • updated Hunspell dictionaries to:
    • [pt-PT pos-AO] Dicionários Portugueses Complementares 2.2
    • [pt-AO pre-AO] Dicionários Portugueses Complementares 2.2
    • [pt-MZ pre-AO] Dicionários Natura 14.08.2017

Russian

  • added and improved grammar and punctuation rules
  • spelling dictionary update
  • new Russian-English false friends added (thanks to ZakShaker)

Serbian

  • initial support for Serbian by Zoltán Csala

Ukrainian

  • big dictionary update (~10K new lemmas)
  • improvements in tokenization
  • compound word tagging improvements
  • more than 350 new disambiguation rules
  • several new barbarism and grammar rules

General

  • Now runs with Java 9 (compilation with Maven still has issues with Java9)
  • The spell checker tries harder to find suggestion for misspellings that have a Levenshtein distance of larger than 2. The maximum Levenshtein distance is now 3. This way you now get a suggestion for e.g. algortherm (algorithm) or theromator (thermometer). In the worst case (every single word of a text misspelled), this has a performance penalty of about 30%.
  • Better support for Unicode codepoints greater than 0xFFFF

word2vec

GUI (stand-alone version)

Java API

  • A RuleMatch can now have a URL, too. The URL usually points to a page that describes the error or grammar rule in more detail. Before, only the Rule could have a URL. A RuleMatch URL will overwrite the Rule URL in the JSON output.
  • A RuleMatch now also has information about the sentence the error occurred in (it used to have only position information and the caller was expected to find the error context and/or sentence position in the original text).

HTTP API / LT server

  • change in configuration: requestLimit and requestLimitPeriodInSeconds now both need to be set for the limit to work
  • new property key timeoutRequestLimit: similar to requestLimit, but this one limits not all requests but blocks once this many timeouts have been caused by the IP in the time span set by requestLimitPeriodInSeconds
  • new property key requestLimitInBytes: similar to requestLimit, but this one limits the aggregated size of requests caused by an IP in the time span set by requestLimitPeriodInSeconds
  • New property key maxErrorsPerWordRate: set the maximum allowed errors per word, e.g. 0.3 if the maximum is about one error per three words. More errors will stop the check with an exception. This is useful so no processing time gets wasted for texts with a huge amount of errors that are only caused by the wrong language being selected (leading to most words being detected as spelling errors).
  • The JSON output now contains a sentence property with the text of the sentence the error occurred in.

3.9 (2017-09-26)

Breton

  • small rule improvements

Catalan

  • added and improved rules

Dutch

  • added and improved rules

English

  • added and improved rules

Esperanto

  • added and improved rules

French

  • added and improved rules
  • upgraded dictionaries to Dicollecte-6.1

German

  • added and improved rules
  • spell checker suggestions have been improved a lot by considering more words, especially compounds (de-DE only so far, not yet active for de-AT and de-CH) (#725)
  • added special dictionary extension files spelling-de-AT.txt and spelling-de-CH.txt for de-AT and de-CH that will be considered in addition to spelling.txt
  • updates according to "Amtliches Regelwerk der deutschen Rechtschreibung aktualisiert", 6/2017 (http://www.rechtschreibrat.com/DOX/rfdr_PM_2017-06-29_Aktualisierung_Regelwerk.pdf)
  • added POS tagging of alternative imperative forms such as "Geh" or "küss" (in addition to "Gehe"/"küsse")
  • introduced two new line endings ('?' and '$') for the data-file compounds.txt; these endings indicate that the mid-word parts of the compound need to be lower-cased (e.g., 'Geräte Wahl' -> 'Gerätewahl')

Portuguese

  • added and improved grammar and style rules, including:
    • grammar: general agreement rules, pronominal collocations, paronyms and homophones detection improvements; time agreement rules added
    • punctuation: greetings and farewell punctuation
    • style: puffery, weasel words, weak expressions, and biased opinion words detection added (disabled by default)
    • syntax: new category; fragment detection improvements
    • typography: spacing, number, and mathematical symbol formatting improvements
  • disambiguation improvements
  • false friends added
    • Portuguese to Galician (16 new pairs)
  • significant POS tagging and synthesizing improvements
  • spellchecking exceptions for abbreviations, variables in formulas, units, and related statistical vocabulary
  • updated Hunspell dictionaries to:
    • [pt-PT pos-AO] Dicionários Portugueses Complementares 2.0
    • [pt-AO pre-AO] Dicionários Portugueses Complementares 2.0
    • [pt-MZ pre-AO] Dicionários Natura 15.06.2017

Russian

  • spelling dictionary update
  • added and improved some rules

Spanish

  • added and improved some rules

Ukrainian

  • significant dictionary update:
    • more than 60K of new words
    • some inflection adjustments
  • improved dynamic tagging for compound words
  • many new rules (barbarism, grammar, and spelling)
  • inflection agreement rule updates

Java API

  • AnnotatedText (built via AnnotatedTextBuilder) can now contain document-level meta data. This might be used by rules in the future.

3.8 (2017-06-27)

Catalan

  • added and improved rules
  • updated dictionary and rules for official names of Valencian municipalities

Chinese

  • added one rule

Dutch

  • added many rules (by Ruud Baars)
  • spelling dictionary update

English

  • added and improved rules

German

  • added and improved rules
  • improved messages for old spelling variants, e.g. Kuß now suggests only Kuss and also has a message explaining the user that Kuß is an old spelling

Polish

  • added rules
  • added some common typos

Portuguese

  • added and improved grammar and style rules, including:
    • grammar: general agreement rules, contractions, pronominal collocations, compounding, and paronyms detection
    • style: wordy expressions detection added and significant redundant expressions detection improvements
    • punctuation: significant improvements
    • formal speech: archaims, cacophonies, childish language and slang detection added
    • typography: international system standards, number and mathematical symbol formatting
    • misspellings: foreign famous personalities common misspelings
    • AO90: identify words with changed spelling
  • disambiguation improvements
  • false friends support added
    • Portuguese to Catalan (26 new pairs)
    • Portuguese to Spanish (7 new pairs)
  • spell checking exceptions for common Latin, English, and French expressions, species scientific names, and famous personalities
  • updated Hunspell dictionaries to:
    • [pt-PT pos-AO] Dicionários Portugueses Complementares 1.4
    • [pt-BR] VERO version 2.1.4

Russian

  • added and improved rules

Slovak

  • major rule updates by Matúš Matula

Ukrainian

  • Significant dictionary update:
    • thousands of new words
    • some inflection adjustments
  • Improved dynamic tagging for compound words
  • Many new rules (barbarism, grammar, and spelling)
  • New noun-verb agreement rule

HTTP API

  • The deprecated AfterTheDeadline mode has been removed
  • The apiVersion property of the JSON output is now a number instead of a string (issue #712)

Java API

  • Some deprecated methods and classes have been removed.

Internal

  • spelling.txt allows multi-word entries: the words/tokens (separated by " ") of one line are converted to a DisambiguationPatternRule in which each word is a case-sensitive and non-inflected PatternToken (result: the entire multi-word entry is ignored by the spell checker)

LT server

  • When running a LT server, the enabled/disabled rules loaded from a configuration file at the startup time will be the new default rules. Previously these rules were "forgotten" when a server query used the parameters for enabling and disabling rules. Now the rules from the query will be added to the rules from the configuration file.

3.7 (2017-03-27)

Breton

  • small rule improvements

Catalan

  • added and improved rules
  • updated dictionary

English

  • added and improved rules

French

  • improved rules
  • upgraded dictionaries to Dicollecte-6.0.2

German

Greek

  • added and improved rules

Italian

  • added one rule

Lithuanian, Malayalam, and Icelandic

  • Lithuanian, Malayalam, and Icelandic are not part of this release anymore. They still exist in the git repository and can be re-activated as soon as a new maintainer takes care of them.

Portuguese

  • added and improved grammar and style rules, including:
    • grammar: general agreement rules, 'crase', pronomial colocations, impersonal verbs, fragment, and paronyms detection improvements
    • capitalization: AO90 and AO45 rules
    • style: repetitions and barbarism detection
    • typography: number formatting, chemical formulas, degrees signs, dash signs, and punctuation
    • semantics: wrong words in the context (22 confusion pairs), url validator and date checker improvements
    • registered brands category added
    • translation errors category added
  • false friends support added:
    • Portuguese to Spanish (186 new pairs)
    • Portuguese to English (156 new pairs)
    • Portuguese to French (78 new pairs)
    • Portuguese to German (16 new pairs)
    • Portuguese to Galician (9 new pairs)
  • spellchecking suggestions activated
  • updated Hunspell dictionary to:
    • [pt-PT pos-AO] Dicionários Portugueses Complementares 1.2
    • [pt-AO pre-AO] Dicionários Portugueses Complementares 1.2
    • [pt-MZ pre-AO] Dicionários Natura 18.02.2017

Russian

  • added and improved rules
  • updated tagger dictionary from AOT.ru rev.269 with extended POS tags

Ukrainian

  • Significant dictionary update:
    • many new words
    • some inflection adjustments
  • Many new rules (barbarism, punctuations, and grammar)
  • Improved dynamic tagging for compound words

LibreOffice / Apache OpenOffice Integration

  • Options dialog now uses system theme instead of Nimbus.

Command-line

  • Added a --languageModel option to the embedded server, thanks to Michał Janik (issue #404)

HTTP API

  • The 'AfterTheDeadline' mode has been deprecated and will be removed in the next version, unless users complain and present a valid use case.
  • The old XML-based API has been removed. The migration to the new JSON-based API is documented at https://languagetool.org/http-api/migration.php
  • Speed up with a cache for cases where the same sentences get checked again (e.g. due to a correction in a text that doesn't affect all sentences but causes the whole text to be re-checked)

Java API

  • Some deprecated methods have been removed.
  • A new class ResultCache has been added to speed up the LT server
  • EnglishRule, GermanRule, CatalanRule, and FrenchRuleare now deprecated. These are empty abstract classes that never had any real use. Rules that extend these classes will directly extend Rule or TextLevelRule in a future release.
  • All rules that work on the text level instead of the sentence level (e.g. word coherency) now extend TextLevelRule instead of Rule

Internal

  • OpenNLP has been updated from 1.6.0 to 1.7.2 (only used for English)

3.6 (2016-12-28)

Breton

  • small rule improvements

Catalan

  • added and improved rules

English

French

  • added and improved rules
  • upgraded dictionaries to Dicollecte-5.7

German

Lithuanian

  • The Lithuanian class has been deprecated. Lithuanian in LT hasn't been maintained for years and there's no new maintainer in sight. It has also very low usage on languagetool.org and very few error detection rules anyway, so we'll remove its support from LT in the next release.

Malayalam

  • The Malayalam class has been deprecated. Malayalam in LT hasn't been maintained for years and there's no new maintainer in sight. It has also very low usage on languagetool.org and very few error detection rules anyway, so we'll remove its support from LT in the next release.

Portuguese

  • general agreement rules added
    • number and gender words agreement
    • general subject-verb agreement
    • accentuated form confusion, 'dequeísmos' and many more
  • new compound form detection (pt-PT recognizes all compound verbal derivations)
  • duplications, redundancies, typography and semantics categories added
  • style category rules added
    • new word repetitions rules, fragment detection, verbosity checks, passive voice and many other
  • new sentence disambiguator and new word tokenizer
  • sentence segmentation improvements
  • former rules and messages revision, improvement and classification
  • post-reform agreement support added and pre-reform components updated
    • European Portuguese specific rule group added
      • post-reform agreement by default
      • compound verbs, possessive pronouns, reflexive forms placement, gerund and more
    • pre-reform agreement locales support added
      • Angola, Cape Verde, East Timor, Guinea Bissau, Macau, Mozambique and São Tomé e Principe
    • base spelling dictionary and tagger update
    • variants dictionaries added and many part-of-speech fixes
  • Portuguese has been prepared to use ngram data, that means it has a confusion_sets.txt file where word pairs could be added. See http://wiki.languagetool.org/finding-errors-using-n-gram-data for more information but note that we cannot offer the required ngram data yet for Portuguese, as we rely on the Google ngram data and Portuguese isn't part of that.

Russian

  • added and improved many rules
  • added new rules with java filter
  • added new Java rule RussianWordCoherencyRule
  • added words suggested by users
  • improved disambiguation rules
  • updated tagger dictionary from AOT rev.268 with extended POS tags
  • improved SRX sentences segmentation
  • added removed.txt for words that need to be removed from the dictionary

Spanish

  • added and improved rules

Ukrainian

  • significant dictionary update
  • new adj/noun inflection rule
  • dynamic tagging improvements
  • disambiguation improvements
  • some improvements to existing rules
  • experimental noun/verb agreement rule

HTTP API

Java API

  • A new method for removing overlapping errors has been implemented. By default, it is enabled for the HTTP API and LibreOffice outputs, and disabled for the command-line output. If necessary, priorities for rules and categories can bet set in Language.getPriorityForId(String id). Default value is 0, positive integers have higher priority and negative integers have lower priority.

  • Language.getShortName() has been deprecated, use Language.getShortCode() instead

  • Language.getShortNameWithCountryAndVariant() has been deprecated, use Language.getShortCodeWithCountryAndVariant() instead

  • Languages.getLanguageForShortName() has been deprecated, use Languages.getLanguageForShortCode() instead

  • The following languages have been unmaintained for a long time. A warning has been shown for some time on languagetool.org and in the stand-alone GUI for these languages. This warning has now been extended to Java in the form of a deprecation, i.e. the constructors of the following languages have been deprecated. That does not mean they are going to be removed in the next version, but it's a warning that we cannot offer support for them or guarantee they will be included in the future:

    • Belarusian
    • Swedish
    • Icelandic
    • Tagalog
    • Asturian
    • Danish
    • Slovenian

    If you're interested in contributing to one of these languages, please post to our forum at http://forum.languagetool.org.

  • The uppercase sentence start rule (id UPPERCASE_SENTENCE_START) now ignores immunized tokens - this way users can add lowercase words to disambiguation.xml so the rule won't complain about these lowercase words at the beginning of a sentence.

Command-line

Internal

  • Apache commons-lang has been updated from 2.6 to commons-lang3 3.5
  • Updated lucene-gosen-ipadic to 6.2.1 (#376)

3.5 (2016-09-30)

Catalan

  • added and improved rules
  • added words suggested by users

English

Esperanto

  • added and improved rules

French

  • added and improved rules

German

  • added rules
  • fixed several false alarms

Polish

  • added and improved rules

Portuguese (European)

  • added and improved rules

Portuguese (Brazilian)

  • added rules

Russian

  • now possible checking the texts with the signs of stress
  • added and improved many new grammar and style rules
  • added words suggested by users
  • improved disambiguation rules
  • for review, test and improve rules, feedback in bugtracker thanks to Konstantin Ladutenko

Spanish

  • added and improved rules

Ukrainian

  • added ~6k new words
  • added many new grammar and styling rules
  • added many new barbarism replacement suggestions
  • improved dynamic word tagging

General

  • Bugfix: avoid repeating the same suggestion
  • Enhancement: ignore e-mail addresses

Java API

  • Rule.getCorrectExamples() now returns a list of CorrectExamples instead of a list of Strings.

GUI (stand-alone version)

  • speed up for long texts with many errors (#530)
  • add new menu item for showing/hiding the result area

Command-line

3.4 (2016-06-27)

Catalan

  • added and improved rules
  • added words suggested by users

English

French

German

  • added some rules
  • improved handling of hyphenated compound words

Greek

  • added some rules

Polish

  • added and improved rules
  • removed some false alarms

Portuguese

  • added and improved rules

Spanish

Russian

  • rebuilt spellchecker dictionary
  • added words suggested by users
  • added and improved rules

Ukrainian

  • big dictionary update (thousands of new words and many fixes)
  • compound tagger improvements
  • several new rules and many improvements to existing ones
  • new token inflection agreement rule (still work-in-progress so turned off by default)
  • new replacement suggestions for barbarisms

Java API

  • some formerly deprecated code has been removed
  • all rules now have a category ("Misc" if the rule doesn't specify a category)
  • a new module languagetool-http-client has been added with a class RemoteLanguageTool that you can use to query a remote LanguageTool server via HTTP or HTTPS
  • removed the public modifier from LanguageComboBox

Embedded HTTPS server

  • The existing HTTP/HTTPS API will be replaced by a new one that returns JSON. This version of LanguageTool supports both APIs. The new API is prefixed with /v2/. It is documented at https://languagetool.org/http-api/swagger-ui/#/default. Please do not use the old XML-based HTTP API anymore. Information about migrating from the old to the new API can be found at https://languagetool.org/http-api/migration.php
  • Changed behaviour for OutOfMemory situations: the server process now stops instead of being in an unstable state
  • Missing parameters (like text) now cause a 400 Bad Request response (it used to produce 500 Internal Server Error)
  • New parameter preferredVariants to specify which variant is preferred when the language is auto-detected: Example: language=auto&preferredVariants=en-GB,de-AT - if English text is detected, British English will be used, if German text is detected, German (Austria) will be used.
  • Code refactorings: methods have been removed without being deprecated first, e.g. in LanguageToolHttpHandler

Rule Syntax

  • groups of rules and categories are now required to have non-empty names to avoid user confusion

GUI (stand-alone version)

  • detect encoding of files with BOM header
  • add new menu to open recent files
  • add new configuration option to allow user to select the GUI language
  • preserve GUI state between program restarts

Command-line

  • detect encoding of files with BOM header when there is no encoding parameter

3.3 (2016-03-28)

Breton

  • small rule improvements

Catalan

  • added and improved rules
  • added words suggested by users
  • minor change in the format of the binary dictionary: POS tag and frequency data are no longer separated by a separator character.

Dutch

  • small rule improvements and URL updates, thanks to Koen Vervloesem

English

Esperanto

  • improved several rules

French

  • added and improved rules

German

Greek

  • new rule for checking correct spell of ordinal numerals
  • added new XML rules

Polish

  • added and improved a large number of rules, largely improved disambiguation
  • upgraded the tagging and synthesis dictionaries to Morfologik Polimorf 2.1
  • improved tokenization of number ranges (such as 1-1234 or 1--10)
  • added checks on date ranges

Portuguese

  • added and improved rules, improved categorization of rules

Russian

  • added and improved rules, improved categorization of rules
  • added words suggested by users

Spanish

  • added German false friends

Ukrainian

  • big dictionary update:
    • more than 202K lemmas
    • homonyms have been properly split
    • vocative case for inanimates has be added
    • list of barbarism has been updated
  • improved some rules
  • improved sentence tokenization
  • improved dynamic tagging for compounds
  • some improvements for disambiguation

Java API

  • some formerly deprecated code has been removed
  • added acceptPhrases(List<String> phrases) to SpellingCheckRule so you can avoid false alarms on names and technical terms that consist of more than one word.

Embedded HTTPS server

  • Speed up for input with short sentences
  • Added new parameters enabledCategories and disabledCategories that take a comma-separated list of categories to enable/disable. Fixes #326.
  • The output now contains a shortmsg attribute if available, which is a short version of the msg attribute.
  • The output now contains a categoryid attribute if available. It's supposed not to change in future versions (while category might change).

Command-line

  • new parameters --enablecategories and --disablecategories to activate/deactivate all rules in a category (#66)
  • Bugfix: for files >= 64,000 bytes, the position information (fromx and tox) could be wrong. Also, rules that work across paragraphs like the German word coherency rule wouldn't work. Both bugs have been fixed but with the side-effect that large files will now be loaded into memory completely. If you're using LanguageTool on large files (several MB) you might need to split these files now before you check them. If you need the old behavior, use the --line-by-line switch. #254

Wikipedia

  • Indexing: fixed an IllegalArgumentException for long sentences (#364)

Core code

  • Fixed a bug while sentence and paragraph end tags were removed during disambiguation.
  • Fixed a bug with a possible NullPointerException for tokens containing soft hyphens that might be disambiguated.

Morfologik binary dictionaries

  • Updated Morfologik library to version 2.1.0. The tools for building dictionaries (languagetool-tools) have been adapted to the new version. The format of the dictionaries has not changed, except for a minor change only in Catalan.

3.2 (2015-12-29)

  • LanguageTool requires Java 8 now

Belarusian

Catalan

  • fixed false alarms
  • added new rules
  • added words suggested by users

Danish

  • updated hunspell dictionary to Version 2.3 (2015-11-15):
    • Corrections made regarding new spelling of 2012
    • General cleanup
    • A lot of compound flags added
  • fixed bug where Hunspell flags wrongly was in the tagger-dictionary. For example: vintrenes+F+sub:bes:plu:utr:gen/115,70,85,976,941,947 vinåndstermometrenes+F+sub:bes:plu:neu:gen/70,118,85,976
  • added new tags
  • updated and made adjustment for the new things introduced by the new spelling of 2012 and Hunspell-da 2.3

English

Esperanto

  • added/improved several rules

French

  • upgraded dictionaries to Dicollecte-5.5
  • added/improved several rules

German

  • added/improved a few rules
  • improved agreement rule to detect errors like Ich gebe dir ein kleine Kaninchen. where the determiner is indefinite but the adjective fits only for a definite determiner
  • added de/removed.txt so incorrect readings of the POS tagger can be avoided without rebuilding the binary dictionary

Italian

  • added an agreement rule

Portuguese

  • added/improved several rules

Russian

  • added/improved several rules
  • added words suggested by users to spellchecker dictionary

Ukrainian

  • big dictionary update: more than 10k new words, many fixes (the dictionary source is now available at https://github.com/arysin/dict_uk)
  • many new rules
  • improvements for euphony rules
  • improvements in dynamic compound tagger
  • new disambiguation rules

Rule Syntax

  • New rule syntax <regexp>...<regexp> as a simple alternative to <pattern><token>...</token></pattern>. Note that this is limited: E.g. it's not possible to address POS tags and the <suggestion> cannot change the case of the match. Available attributes: type with value smart (treats space in the regular expression as \s+ or a non-breaking space) or exact (smart is the default), mark to specify which part of the match gets underlined (everything by default, use 1 to only underline the first group etc.)
  • Non-breaking spaces (\u00A0) are now treated like regular spaces. Before, using a non-breaking space could cause a rule not to match.
  • <filter> can now also be used in disambiguation.xml

Embedded HTTPS server

  • Speed up for testing short sentences for de-DE, de-AT, and de-CH

Java API

  • GeneralCatalan has been removed, use Catalan instead
  • SuggestionExtractorTool and SuggestionExtractor have been removed
  • ConfusionProbabilityRule has been moved to package org.languagetool.rules.ngrams
  • ConfusionProbabilityRule.getWordTokenizer() is now called ConfusionProbabilityRule.getGoogleStyleWordTokenizer()
  • RuleAsXmlSerializer has been renamed to RuleMatchAsXmlSerializer
  • some formerly deprecated code has been removed
  • some code has been deprecated
  • StringTools.isWhitespace() now returns true for a token that is a non-breaking space or a narrow non-breaking space
  • RuleFilter is not an interface anymore but an abstract class
  • the LanguageModel interface has been redesigned, see BaseLanguageModel for a class similar to the previous implementation
  • Class BerkeleyLanguageModel was added to support BerkeleyLM language models. See https://github.com/adampauls/berkeleylm for the software and e.g. http://tomato.banatao.berkeley.edu:8080/berkeleylm_binaries/ for pre-built models. To use the new models your language class needs to overwrite the getLanguageModel(File) method. For now, we recommend you continue using the Lucene-based models at http://languagetool.org/download/ngram-data/.

LibreOffice / Apache OpenOffice Integration

  • fix: disabling rules that are disabled by default and had been enabled didn't work

Internal

3.1 (2015-09-28)

Catalan

  • added new rules
  • fixed false alarms
  • added words suggested by users

English

French

  • upgraded Hunspell dictionary to Dicollecte-5.4.1
  • upgraded POS tag and Synthesizer dictionaries to Dicollecte-5.4
  • added/improved several rules
  • new filter to be used for matching the part-of-speech of parts of words, e.g.:
       <filter class="org.languagetool.rules.fr.FrenchPartialPosTagFilter"
               args="no:1 regexp:(.*)-tu postag_regexp:V.*(ind|con|sub).*2\ss negate_pos:yes"/>

German

Japanese

  • major rule update with 700+ new rules, thanks to Shugyousha

Polish

  • added some compound prepositions to avoid false alarms (thanks to Sławek Borewicz)

Portuguese

  • added/improved several rules

Russian

  • added and improved a few rules
  • added a few false friends rules (Russian/English)

Ukrainian

  • significant dictionary update (fixes, lot of new adjectives and last names)

LibreOffice / Apache OpenOffice Integration

ngrams

Embedded server

  • new property file key rulesFile to use a .languagetool.cfg file to configure which options should be enabled/disabled in a server (#281)

API

  • several deprecated methods and classes have been removed
  • Rules can now overwrite getAntiPatterns() with patterns to be ignored. See the javadoc for details of what needs to be considered to make this work. See org.languagetool.rules.de.CaseRule for an example.

Internal

  • updated to Lucene 5.2.1
  • updated to Apache OpenNLP 1.6.0

3.0 (2015-06-29)

Breton

  • updated FSA spelling dictionary from An Drouizig Breton Spellchecker 0.13
  • updated POS dictionary from Apertium (svn r61079)

Catalan

  • added new rules
  • fixed false alarms
  • added words suggested by users

English

  • added a few new rules
  • ConfusionProbabilityRule (only enabled with the --languagemodel option) has been rewritten and homophones.txt has been renamed to confusion_sets.txt and now only has few items enabled by default, the rest is commented out to improve quality (less false alarms). Also see http://wiki.languagetool.org/finding-errors-using-big-data

German

  • fixed some false alarms
  • updated to jwordsplitter 4.1 for better compound splitting
  • the spell checker offers correct suggestions now for incorrect past tense forms like "gehte" -> "ging" (useful mostly for non-native speakers)
  • added word frequency information to improve spelling suggestions (but this won't help for compounds which are not in the dictionary)

Polish

  • added new rules
  • fixed dozens of false alarms

Portuguese

  • added/improved several rules (started adding morphologic rules)

Russian

  • improved rules
  • updated spellchecker

Slovak

  • dictionary update and several new rules

Ukrainian

  • big dictionary update (thousands of new words, new tagging for pronouns)
  • improved sentence and word tokenization
  • improved tokenization and tagging of lowercase abbreviations
  • new grammar and styling rules
  • new spelling rules, especially for lowercase abbreviations with dots
  • improved compound word tagging
  • improved some rules coverage
  • many new barbarism replacement suggestions

Bug Fixes

  • UppercaseSentenceStartRule didn't properly reset its state so that different errors could be found when e.g. JLanguageTool.check() got called twice with the same text.
  • Authenticator.setDefault() is now only called if it's allowed by the Java security manager. In rare cases, this might affect using external XML rule files as documented at http://wiki.languagetool.org/tips-and-tricks#toc9 (Github issue #255)

GUI (stand-alone version)

Embedded server

  • performance improvements for checking small texts for the use case that creates a new JLanguageTool object for every check, as done by the embedded server (or multithreaded LT users in general)

Command-line

  • Fixed an error with the --api option that printed invalid XML for large documents or when the input was STDIN (Github issue #251)
  • Print some information to STDERR instead of STDOUT so the --api option makes more sense

API

  • added MultiThreadedJLanguageTool.shutdown() to clean up the thread pool
  • several deprecated methods and classes have been removed, e.g.
    • Language.REAL_LANGUAGES is now Languages.get()
    • Language.LANGUAGES is now Languages.getWithDemoLanguage() - but you will probably want to use Languages.get()
  • Other static methods from class Language have also been moved to Languages
  • Language.addExternalRuleFile() and Language.getExternalRuleFiles() have been removed. To add rules, load them with PatternRuleLoader and call JLanguageTool.addRule().
  • getAllRules(), getAllActiveRules(), and getPatternRulesByIdAndSubId() in class JLanguageTool used to call reset() for all rules. This is not the case anymore. reset() is now called when one of the check() methods is called. This shouldn't make a difference for all common use-cases.
  • Language.setName() has been removed. If you need to set the name, overwrite the getName() method instead.
  • Rule.getCorrectExamples()/getIncorrectExamples(), PatternToken.getOrGroup()/getAndGroup() and RuleMatch.getSuggestedReplacements() now return an unmodifiable list
  • AbstractSimpleReplaceRule.getFileName() and AbstractWordCoherencyRule.getFileName() have been removed, the sub classes are now themselves responsible for loading their data
  • Sub classes of AbstractCompoundRule are now responsible for loading the compound data themselves using CompoundRuleData
  • AbstractCompoundRule.setShort(String) has been removed and added as a constructor parameter instead.

Internal

  • updated to language-detector 0.5

2.9.1 (2015-05-14)

LibreOffice / Apache OpenOffice Integration

Older versions

See CHANGES.txt for changes before 2.9.1.