Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collapse overlapped highlighted ranges #1473

Merged
merged 7 commits into from
Aug 26, 2022

Conversation

akr4
Copy link
Contributor

@akr4 akr4 commented Aug 24, 2022

Hi. This fixes a problem that occurs when calling Snippet.to_html() and the snippet.highlighted are overlapped ranges like this:

[0..2, 1..3]

This type of ranges is generated by NgramTokenizer, for example.

Error

In that case, there will be an error on calling Snippet.to_html(), which is caused by slicing String with invalid indices.

begin <= end (2 <= 1) when slicing `abc`
thread 'snippet::tests::test_snippet_with_overlapped_highlighted_ranges' panicked at 'begin <= end (2 <= 1) when slicing `abc`', library/core/src/str/mod.rs:111:5
stack backtrace:
...
   9: tantivy::snippet::Snippet::to_html
             at ./src/snippet/mod.rs:84:44

Fix

This PR collapses overlapped ranges into non-overlapped ones to fix the problem like this:

[0..2, 1..3] -> [0..3]

}

let mut result = Vec::new();
let mut current = ranges[0].clone();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure but maybe it would be nicer to iterate only once, e.g.

let mut result = Vec::new();

let mut ranges = ranges.iter();

let mut current = match ranges.next() {
  Some(range) => range.clone(),
  None => return result,
};

for range in ranges {
  ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I changed so. Thanks.

@@ -186,6 +186,29 @@ fn select_best_fragment_combination(fragments: &[FragmentCandidate], text: &str)
}
}

/// Returns ranges that are collapsed into non-overlapped ranges.
Copy link
Collaborator

@fulmicoton fulmicoton Aug 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add your example in the comment.

@@ -186,6 +186,29 @@ fn select_best_fragment_combination(fragments: &[FragmentCandidate], text: &str)
}
}

/// Returns ranges that are collapsed into non-overlapped ranges.
fn collapse_overlapped_ranges(ranges: &Vec<Range<usize>>) -> Vec<Range<usize>> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn collapse_overlapped_ranges(ranges: &Vec<Range<usize>>) -> Vec<Range<usize>> {
fn collapse_overlapped_ranges(ranges: &[Range<usize>]) -> Vec<Range<usize>> {

&Vec is almost never a good idea.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another solution would be for you to pass an iterator directly.

impl Iterator<Item=Range<usize>>


for range in ranges {
if current.end > range.start {
current = current.start..range.end;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick... I feel the following is easier but that's your call.

Suggested change
current = current.start..range.end;
current.end = range.end;

/// Returns ranges that are collapsed into non-overlapped ranges.
fn collapse_overlapped_ranges(ranges: &Vec<Range<usize>>) -> Vec<Range<usize>> {
let mut result = Vec::new();
let mut ranges = ranges.iter();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick... I'm not fond of shadowing.
It is sometime useful to prevent the usage of a variable.

Here I would either call it ranges_it, or pass an Iterator as argument.

@fulmicoton
Copy link
Collaborator

@akr4 Awesome job spotting the job and fixing it! I had a bunch of rather minor comments.

@codecov-commenter
Copy link

codecov-commenter commented Aug 24, 2022

Codecov Report

Merging #1473 (b6d5aa6) into main (82209c5) will increase coverage by 0.01%.
The diff coverage is 98.18%.

@@            Coverage Diff             @@
##             main    #1473      +/-   ##
==========================================
+ Coverage   94.07%   94.09%   +0.01%     
==========================================
  Files         239      239              
  Lines       44627    44833     +206     
==========================================
+ Hits        41983    42185     +202     
- Misses       2644     2648       +4     
Impacted Files Coverage Δ
src/snippet/mod.rs 91.08% <98.18%> (+1.05%) ⬆️
src/fastfield/serializer/mod.rs 88.94% <0.00%> (-1.54%) ⬇️
bitpacker/src/bitpacker.rs 96.03% <0.00%> (-1.00%) ⬇️
src/reader/mod.rs 92.94% <0.00%> (-0.05%) ⬇️
src/fastfield/mod.rs 94.76% <0.00%> (-0.01%) ⬇️
src/fastfield/gcd.rs 100.00% <0.00%> (ø)
fastfield_codecs/src/bitpacked.rs 100.00% <0.00%> (ø)
fastfield_codecs/src/multilinearinterpol.rs
fastfield_codecs/src/linearinterpol.rs
fastfield_codecs/src/blockwise_linear.rs 100.00% <0.00%> (ø)
... and 7 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

NgramTokenizer with different min and max generates such ranges.
@akr4
Copy link
Contributor Author

akr4 commented Aug 24, 2022

@fulmicoton Thank you very much for your review. I will make changes tomorrow.

@akr4
Copy link
Contributor Author

akr4 commented Aug 25, 2022

Committed some changes. The main ones are changing the argument from &Vec to &[ ] and adding a comment. What do you think of this?

Copy link
Collaborator

@fulmicoton fulmicoton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved! Thank you @akr4

@fulmicoton
Copy link
Collaborator

@akr4 Can you run make fmt? (It requires rust nightly)

@akr4
Copy link
Contributor Author

akr4 commented Aug 25, 2022

@fulmicoton Done.

@fulmicoton fulmicoton merged commit 17093e8 into quickwit-oss:main Aug 26, 2022
This was referenced Jan 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants