- Some benchmark scripts are added under benchmark/ (#235)
- The behavior of the dictionary printer and builder has changed (#234)
  - DictionaryPrinter now prints word references in (surface, pos, reading) triple format.
  - DictionaryBuilder now allows dictionary-form to be given in triple format.
- Tutorial is updated (#237)
- The byte order of a ByteBuffer returned by Config.Resource.asByteBuffer is now always little endian (#239)
  - Also, the byte order of StringUtil.readAllBytes is now little endian.
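
The byte-order entry above matters to any caller that reads multi-byte values from the returned buffer. The sketch below uses only the standard `java.nio` API to show how the same four bytes decode under each order. It does not call Sudachi; `ByteOrderDemo` and `firstInt` are hypothetical names chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderDemo {
    // Reads the first four bytes as an int using the buffer's current byte order.
    // Uses an absolute get, so the buffer position is not moved.
    static int firstInt(ByteBuffer buffer) {
        return buffer.getInt(0);
    }

    public static void main(String[] args) {
        byte[] raw = {0x01, 0x00, 0x00, 0x00};

        // A freshly wrapped ByteBuffer is big endian by default.
        ByteBuffer big = ByteBuffer.wrap(raw);
        // The order has to be set explicitly to read the bytes as little endian.
        ByteBuffer little = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);

        System.out.println(big.order() + " -> " + firstInt(big));       // BIG_ENDIAN -> 16777216
        System.out.println(little.order() + " -> " + firstInt(little)); // LITTLE_ENDIAN -> 1
    }
}
```

Code that previously called `order(ByteOrder.LITTLE_ENDIAN)` on such a buffer itself should keep working, since setting the same order again is a no-op.
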
- Update tutorial.md (#226)
- Lazy sentence split and tokenization (#231)
  - Add Tokenizer.lazyTokenizeSentences(SplitMode mode, Readable input), which performs analysis lazily and reduces memory usage.
- Do not segfault on tokenizing with closed dictionary (#217)
- The default config sudachi.json sets non-existent property joinKanjiNumeric in JoinNumericPlugin (#221)
- Fix incorrect size calculation on expand (#227)
- Tokenizer.tokenizeSentences(SplitMode mode, Reader input) is marked as deprecated (#231)
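
To illustrate why reading lazily from a `Readable` can reduce memory usage, here is a minimal stand-alone sketch of the general idea: input is pulled in fixed-size chunks and each sentence is handed out as soon as its terminator is seen, so the whole input is never held at once. This is not Sudachi's implementation and does not tokenize anything. The class `LazySentences`, the 4096-character chunk size, and the terminator set are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.CharBuffer;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical helper: yields sentences one at a time from a Readable.
class LazySentences implements Iterator<String> {
    private final Readable input;
    private final CharBuffer chunk = CharBuffer.allocate(4096); // assumed chunk size
    private final StringBuilder pending = new StringBuilder();  // text read but not yet emitted
    private boolean exhausted = false;
    private String next;

    LazySentences(Readable input) {
        this.input = input;
    }

    // Assumed terminators: ideographic full stop, fullwidth '!' and '?', newline.
    private static boolean isTerminator(char c) {
        return c == '\u3002' || c == '\uff01' || c == '\uff1f' || c == '\n';
    }

    // Pulls one more chunk from the input into the pending text.
    private void fill() {
        try {
            chunk.clear();
            int n = input.read(chunk);
            if (n < 0) {
                exhausted = true;
            } else {
                chunk.flip();
                pending.append(chunk);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Finds the next sentence, reading more input only when needed.
    private void advance() {
        int scanFrom = 0;
        while (true) {
            for (int i = scanFrom; i < pending.length(); i++) {
                if (isTerminator(pending.charAt(i))) {
                    next = pending.substring(0, i + 1);
                    pending.delete(0, i + 1);
                    return;
                }
            }
            scanFrom = pending.length();
            if (exhausted) break;
            fill();
        }
        // End of input: emit whatever is left as the final sentence.
        if (pending.length() > 0) {
            next = pending.toString();
            pending.setLength(0);
        }
    }

    @Override
    public boolean hasNext() {
        if (next == null) advance();
        return next != null;
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        String sentence = next;
        next = null;
        return sentence;
    }

    public static void main(String[] args) {
        Iterator<String> it = new LazySentences(new StringReader("first line\nsecond line\nrest"));
        while (it.hasNext()) {
            System.out.print("[" + it.next().trim() + "]");
        }
        System.out.println();
    }
}
```

Accepting `Readable` rather than `Reader` keeps such a helper usable with any character source, including `Reader` subclasses and `CharBuffer`.
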