Commit

TIKA-4278: colon delimiter detection is unreliable, use the next one if same confidence
THausherr committed Jul 16, 2024
1 parent ca0ceab, commit 6bc7389
Showing 1 changed file with 5 additions and 0 deletions.
```diff
@@ -93,6 +93,11 @@ CSVResult getBest(Reader reader, Metadata metadata) throws IOException {
         if (bestResult.getConfidence() < minConfidence) {
             return CSVResult.TEXT;
         }
+        // TIKA-4278: colon isn't reliable, e.g. govdocs1/242/242970.txt
+        if (results.size() > 1 && bestResult.getDelimiter().equals(':') &&
+                results.get(1).getConfidence() == bestResult.getConfidence()) {
+            return results.get(1);
+        }
         return bestResult;
     }
```
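To see the tie-break in isolation, here is a minimal, self-contained sketch. It assumes the candidate list is ordered by descending confidence before the check runs, which the `results.get(1)` lookup appears to rely on. `Candidate`, `TEXT`, and `pickBest` are hypothetical stand-ins for Tika's `CSVResult` list and `getBest`, not Tika's actual API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of the TIKA-4278 tie-break; not Tika's real classes. */
public class ColonTieBreakDemo {

    // Hypothetical stand-in for a sniffed delimiter and its confidence score.
    record Candidate(char delimiter, double confidence) {}

    // Hypothetical "not CSV" marker, playing the role of CSVResult.TEXT.
    static final Candidate TEXT = new Candidate('\0', 0.0);

    static Candidate pickBest(List<Candidate> results, double minConfidence) {
        if (results.isEmpty()) {
            return TEXT;
        }
        // Assumption: highest confidence first; List.sort is stable, so ties keep input order.
        List<Candidate> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparingDouble(Candidate::confidence).reversed());
        Candidate best = sorted.get(0);
        if (best.confidence() < minConfidence) {
            return TEXT;
        }
        // TIKA-4278 rule: a colon that only ties with the runner-up is not trusted,
        // so the runner-up wins. Exact == mirrors the comparison in the commit.
        if (sorted.size() > 1 && best.delimiter() == ':'
                && sorted.get(1).confidence() == best.confidence()) {
            return sorted.get(1);
        }
        return best;
    }

    public static void main(String[] args) {
        List<Candidate> tie = List.of(new Candidate(':', 0.8), new Candidate(',', 0.8));
        List<Candidate> clearWin = List.of(new Candidate(':', 0.9), new Candidate(',', 0.8));
        System.out.println(pickBest(tie, 0.5).delimiter());      // prints ,
        System.out.println(pickBest(clearWin, 0.5).delimiter()); // prints :
    }
}
```

In this sketch the fallback fires only on an exact tie and only when the colon is ranked first; a colon with strictly higher confidence still wins. Which of two tied candidates ranks first depends here on the stable sort and the input order; how Tika orders ties is not shown in this diff.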

