Add slop to quoted bare words #1401

Closed
saroh wants to merge 1 commit into main from add-slop-on-quoted-words

@saroh (Contributor) commented Jun 28, 2022

Follow-up to #1393.
Because phrase queries are tokenized, this is not ideal, TBH. It would probably be better for the query language to have delimiters for a non-tokenized string, with a slop parameter for it.
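
To make the idea of a delimited, non-tokenized string with a slop parameter concrete, here is a minimal sketch. It is not tantivy's parser: the function name `split_quoted_slop` is hypothetical, and the `"text"~N` suffix form is an assumption made for illustration. The text between the quotes is returned untouched, so the caller decides whether to tokenize it.

```rust
/// Hypothetical helper, not part of tantivy: splits `"some text"~N` into
/// the raw text between the quotes and an optional slop value.
/// The `~N` suffix syntax is an assumption made for this sketch.
fn split_quoted_slop(input: &str) -> Option<(&str, Option<u32>)> {
    // The input must start with a double quote.
    let rest = input.strip_prefix('"')?;
    // Find the closing quote; escaped quotes are ignored to keep the sketch short.
    let end = rest.find('"')?;
    let (text, tail) = rest.split_at(end);
    // Skip the closing quote itself (one byte).
    let tail = &tail[1..];
    if tail.is_empty() {
        return Some((text, None));
    }
    // Anything after the closing quote must be `~` followed only by ASCII digits.
    let digits = tail.strip_prefix('~')?;
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // Values that overflow u32 are rejected rather than truncated.
    let slop = digits.parse::<u32>().ok()?;
    Some((text, Some(slop)))
}
```

Calling `split_quoted_slop("\"big wolf\"~2")` yields the untokenized text `big wolf` together with a slop of 2, while an unquoted input such as `wolf~2` is rejected.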

@codecov-commenter commented

Codecov Report

Merging #1401 (bcea7ce) into main (db18366) will increase coverage by 0.00%.
The diff coverage is 93.33%.

@@           Coverage Diff           @@
##             main    #1401   +/-   ##
=======================================
  Coverage   94.32%   94.32%           
=======================================
  Files         236      236           
  Lines       43640    43668   +28     
=======================================
+ Hits        41165    41192   +27     
- Misses       2475     2476    +1     

Impacted Files                           Coverage Δ
src/query/query_parser/logical_ast.rs    87.50% <80.00%> (-0.88%) ⬇️
src/query/query_parser/query_parser.rs   94.99% <96.00%> (+0.01%) ⬆️
src/postings/stacker/expull.rs           98.57% <0.00%> (-0.48%) ⬇️
src/query/boolean_query/block_wand.rs    97.06% <0.00%> (+0.20%) ⬆️
src/fieldnorm/writer.rs                  98.48% <0.00%> (+1.51%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@fulmicoton fulmicoton requested review from PSeitz and removed request for PSeitz July 4, 2022 03:13
@fulmicoton (Collaborator) commented

It is too dangerous to add unless it is something we can disable via configuration.

It is very easy to craft a fuzzy query that is very expensive, so any service running tantivy in production would be vulnerable.
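
A minimal sketch of what such a configuration switch could look like. The names `FuzzyPolicy`, `FuzzyError`, and `check_fuzzy` are hypothetical and not part of tantivy's API; the point is only that the parser would consult a setting before accepting a fuzzy suffix from user input.

```rust
/// Hypothetical parser settings; the names are illustrative, not tantivy's API.
#[derive(Debug, Clone, Copy)]
struct FuzzyPolicy {
    /// When false, any fuzzy suffix in a user query is rejected.
    enabled: bool,
    /// Largest edit distance accepted when fuzzy matching is enabled.
    max_distance: u8,
}

/// Reasons a requested fuzzy distance can be refused.
#[derive(Debug, PartialEq)]
enum FuzzyError {
    Disabled,
    DistanceTooLarge { requested: u8, max: u8 },
}

/// Returns the requested distance if the policy allows it, otherwise an error
/// that the caller can surface instead of running an expensive query.
fn check_fuzzy(policy: FuzzyPolicy, requested: u8) -> Result<u8, FuzzyError> {
    if !policy.enabled {
        return Err(FuzzyError::Disabled);
    }
    if requested > policy.max_distance {
        return Err(FuzzyError::DistanceTooLarge {
            requested,
            max: policy.max_distance,
        });
    }
    Ok(requested)
}
```

With `enabled: false` as the default, a service exposing the query language to untrusted users would reject fuzzy suffixes unless it opts in, and the distance cap bounds the cost when it does.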

@saroh (Contributor, Author) commented Jul 4, 2022

@fulmicoton Yeah, I'm closing this. IMO it requires a bit of design: we can't both tokenize and not tokenize within "".

@saroh saroh closed this Jul 4, 2022
@saroh saroh deleted the add-slop-on-quoted-words branch July 4, 2022 16:24