Fixing a multithreading bug in WordpieceTokenizer #382

Merged 1 commit on Oct 25, 2024
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2015-2021, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 2015, 2024, Oracle and/or its affiliates. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -48,7 +48,7 @@
* and Chinese characters. The resulting tokens are then applied to the
* wordpiece algorithm implemented in {@link Wordpiece} which is driven by an
* input vocabulary which matches tokens and token suffixes as it can. Any
- * tokens that are not found in the input vocbulary are marked as "unknown".
+ * tokens that are not found in the input vocabulary are marked as "unknown".
*/
public class WordpieceTokenizer implements Tokenizer {

@@ -133,7 +133,7 @@ public boolean advance() {
currentToken = this.whitespaceTokenizer.getToken();
getWordpieceTokens();
currentWordpieceIndex = 0;
- if (currentWordpieceTokens.size() == 0) {
+ if (currentWordpieceTokens.isEmpty()) {
return advance();
}
return true;
@@ -181,7 +181,7 @@ private void getWordpieceTokens() {

List<String> wordpieces = wordpiece.wordpiece(text);

- if (wordpieces.size() == 0) {
+ if (wordpieces.isEmpty()) {
return;
} else if (wordpieces.size() == 1) {
String wp = wordpieces.get(0);
@@ -245,7 +245,7 @@ public WordpieceTokenizer clone() {
copy.basicTokenizer = basicTokenizer.clone();
copy.reset = false;
copy.currentToken = null;
- copy.currentWordpieceTokens.clear();
+ copy.currentWordpieceTokens = new ArrayList<>();
copy.currentWordpieceIndex = -1;
return copy;
} catch (CloneNotSupportedException e) {
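The last hunk is the one the title refers to. The start of `clone()` is not shown, but the `CloneNotSupportedException` catch suggests the copy comes from `super.clone()`, which is a shallow copy. Under that assumption, the copy's `currentWordpieceTokens` field would point at the same list object as the original, so `clear()` would empty the original's list too, and clones handed to different threads would keep mutating one shared buffer. A minimal sketch of the aliasing follows; `CloneSharingDemo` and `Buffer` are hypothetical stand-ins, not classes from this project.

```java
import java.util.ArrayList;
import java.util.List;

public class CloneSharingDemo {

    /** Hypothetical stand-in for a tokenizer holding a mutable token buffer. */
    public static class Buffer implements Cloneable {
        public List<String> tokens = new ArrayList<>();

        /** Old pattern: shallow copy, then clear() on the list both objects share. */
        public Buffer cloneWithClear() {
            try {
                Buffer copy = (Buffer) super.clone(); // copies the reference, not the list
                copy.tokens.clear();                  // also empties this.tokens
                return copy;
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }

        /** New pattern: shallow copy, then give the copy its own list. */
        public Buffer cloneWithFreshList() {
            try {
                Buffer copy = (Buffer) super.clone();
                copy.tokens = new ArrayList<>();      // original's list is untouched
                return copy;
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    public static void main(String[] args) {
        Buffer original = new Buffer();
        original.tokens.add("un");
        original.tokens.add("##able");

        Buffer fresh = original.cloneWithFreshList();
        fresh.tokens.add("other");
        System.out.println("fresh shares list: " + (fresh.tokens == original.tokens));
        System.out.println("original size after fresh clone: " + original.tokens.size());

        Buffer shared = original.cloneWithClear();
        System.out.println("shared shares list: " + (shared.tokens == original.tokens));
        System.out.println("original size after clear clone: " + original.tokens.size());
    }
}
```

With the fresh list, each clone owns its buffer, so per-thread clones no longer need to coordinate access to it.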