[tokenizers] Fixes memory leak when there is overflowing tokens #3317

baldersheim · 2024-07-10T11:40:17Z

If you call TokenizersLibrary.LIB.getOverflowing, you must also clean up all of the overflow encodings it returns.

If withOverflowingTokens was false, no Encodings were generated, leaving JNI Encoding handles that would never be properly deleted.

This PR introduces a new native method that reports the number of overflow tokens without using any JNI resources. TokenizersLibrary.LIB.getOverflowing(encoding) is now called only if withOverflowingTokens is true.
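The leak and the fix can be illustrated with a small self-contained simulation. This is only a sketch under stated assumptions: `FakeNativeLib` is a hypothetical stand-in for the native library, its method names (including `overflowCount`) and the fixed count of two overflow encodings are invented for illustration, and none of it is the actual DJL or JNI code touched by this PR.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the native library; NOT the real TokenizersLibrary API. */
final class FakeNativeLib {
    private static final Set<Long> LIVE = new HashSet<>();
    private static long next = 1;

    private static long alloc() {
        long handle = next++;
        LIVE.add(handle);
        return handle;
    }

    /** Allocates a handle for the main encoding. */
    static long encode() {
        return alloc();
    }

    /** Allocates one new handle per overflow encoding (assumed to be 2 here). */
    static long[] getOverflowing(long encoding) {
        return new long[] {alloc(), alloc()};
    }

    /** Hypothetical count-only query: allocates no handles. */
    static int overflowCount(long encoding) {
        return 2;
    }

    static void deleteEncoding(long handle) {
        LIVE.remove(handle);
    }

    /** Number of handles that were allocated but not yet deleted. */
    static int liveHandles() {
        return LIVE.size();
    }
}

final class OverflowLeakSketch {
    /** Pre-fix pattern: overflow handles are always created, but freed only when requested. */
    static int leaky(boolean withOverflowingTokens) {
        long encoding = FakeNativeLib.encode();
        long[] overflow = FakeNativeLib.getOverflowing(encoding);
        if (withOverflowingTokens) {
            for (long handle : overflow) {
                FakeNativeLib.deleteEncoding(handle); // stands in for wrapping and later closing
            }
        }
        FakeNativeLib.deleteEncoding(encoding);
        return overflow.length;
    }

    /** Post-fix pattern: ask for the count first, create handles only when they will be used. */
    static int fixed(boolean withOverflowingTokens) {
        long encoding = FakeNativeLib.encode();
        int count = FakeNativeLib.overflowCount(encoding);
        if (withOverflowingTokens) {
            for (long handle : FakeNativeLib.getOverflowing(encoding)) {
                FakeNativeLib.deleteEncoding(handle);
            }
        }
        FakeNativeLib.deleteEncoding(encoding);
        return count;
    }

    public static void main(String[] args) {
        fixed(false);
        System.out.println("live handles after fixed(false): " + FakeNativeLib.liveHandles());
        leaky(false);
        System.out.println("live handles after leaky(false): " + FakeNativeLib.liveHandles());
    }
}
```

In this sketch both paths still report how many overflow encodings exist, but only the pre-fix path leaves handles behind when withOverflowingTokens is false.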



baldersheim · 2024-07-10T11:45:11Z

@frankfliu This PR suggests a solution to issue #3316

baldersheim requested review from zachgk, frankfliu and a team as code owners July 10, 2024 11:40

Add the actual encoding handle as argument.

7fbf625

frankfliu changed the title ~~If you call TokenizersLibrary.LIB.getOverflowing you must also clean …~~ [tokenizers] Fixes memory leak when there is overflowing tokens Jul 10, 2024

frankfliu approved these changes Jul 10, 2024


frankfliu merged commit f5c9a82 into deepjavalibrary:master Jul 10, 2024
5 checks passed

baldersheim deleted the baldersheim/prevent-leaking-memory-when-withOverflowingTokens-is-false branch July 10, 2024 18:00
