Changelog

Added

  • Benchmark scripts are added under benchmark/ (#235)

Changed

  • The behavior of the dictionary printer and builder is changed (#234)
    • DictionaryPrinter now prints word references in (surface, pos, reading) triple format.
    • DictionaryBuilder now allows the dictionary form to be given in triple format.

Fixed

  • Tutorial is updated (#237)
  • The byte order of a ByteBuffer returned by Config.Resource.asByteBuffer is now always little endian (#239)
    • Also, the byte order of StringUtil.readAllBytes is now little endian.
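
To illustrate why the byte order matters, here is a minimal sketch using only JDK classes. It does not call Sudachi; `ByteOrderDemo` and `readInt` are hypothetical names introduced for this example. A `ByteBuffer` is big-endian by default in the JDK, so the same four bytes decode to different integers depending on the order set on the buffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteOrderDemo {
    /** Reads the first four bytes of the array as an int in the given byte order. */
    static int readInt(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] raw = {0x01, 0x00, 0x00, 0x00};
        // A freshly wrapped buffer is big-endian unless order(...) is called.
        System.out.println(ByteBuffer.wrap(raw).order());          // BIG_ENDIAN
        System.out.println(readInt(raw, ByteOrder.BIG_ENDIAN));    // 16777216
        System.out.println(readInt(raw, ByteOrder.LITTLE_ENDIAN)); // 1
    }
}
```

Code that reads multi-byte values from a buffer and relied on the JDK default may therefore need to check `buffer.order()` rather than assume it.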

Added

  • Update tutorial.md (#226)
  • Lazy sentence split and tokenization (#231)
    • Add Tokenizer.lazyTokenizeSentences(SplitMode mode, Readable input), which performs analysis lazily and reduces memory usage.
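
The general idea of lazy processing over a `Readable` can be sketched with plain JDK classes. This is not Sudachi's implementation: `LazySentences` is a hypothetical class, and splitting on `.` or `。` is a deliberately naive stand-in for real sentence detection. The point is only that input is pulled in small chunks and each sentence is produced on demand, so the whole text never has to sit in memory at once.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.CharBuffer;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical illustration: yields sentences one at a time from a Readable. */
public class LazySentences implements Iterator<String> {
    private final Readable input;
    private final CharBuffer chunk = CharBuffer.allocate(16); // small on purpose
    private final StringBuilder pending = new StringBuilder();
    private boolean eof = false;

    LazySentences(Readable input) {
        this.input = input;
    }

    /** Index just past the first sentence delimiter in the pending text, or -1. */
    private int boundary() {
        for (int i = 0; i < pending.length(); i++) {
            char c = pending.charAt(i);
            if (c == '.' || c == '\u3002') {
                return i + 1;
            }
        }
        return -1;
    }

    @Override
    public boolean hasNext() {
        // Read more input only while no complete sentence is buffered.
        while (boundary() < 0 && !eof) {
            chunk.clear();
            try {
                if (input.read(chunk) < 0) {
                    eof = true;
                } else {
                    chunk.flip();
                    pending.append(chunk);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return pending.length() > 0;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        int end = boundary();
        if (end < 0) {
            end = pending.length(); // trailing text without a delimiter
        }
        String sentence = pending.substring(0, end);
        pending.delete(0, end);
        return sentence;
    }

    /** Convenience helper that drains the iterator into a list. */
    static List<String> collect(Readable input) {
        List<String> out = new ArrayList<>();
        new LazySentences(input).forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        Iterator<String> it = new LazySentences(new StringReader("A.B.C"));
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

Since `java.io.Reader` implements `Readable`, any existing `Reader` can be passed where a `Readable` is expected.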

Fixed

  • Do not segfault when tokenizing with a closed dictionary (#217)
  • The default config sudachi.json sets non-existent property joinKanjiNumeric in JoinNumericPlugin (#221)
  • Fix incorrect size calculation when expanding (#227)

Deprecated

  • Tokenizer.tokenizeSentences(SplitMode mode, Reader input) is marked as deprecated (#231)