Accessing QueryParser #201
Comments
I agree, the article is odd. The code can't work with our current tantivy release:
from tantivy import Collector, Index, QueryParser, SchemaBuilder, Term
# Create a schema
schema_builder = SchemaBuilder()
title_field = schema_builder.add_text_field("title", stored=True)
body_field = schema_builder.add_text_field("body", stored=True)
schema = schema_builder.build()
# Create an index with the schema
index = Index(schema)
# Add documents to the index
with index.writer() as writer:
    writer.add_document({"title": "First document", "body": "This is the first document."})
    writer.add_document({"title": "Second document", "body": "This is the second document."})
    writer.commit()
# Create a query parser
query_parser = QueryParser(schema, ["title", "body"])
# Basic search
query = query_parser.parse_query("first")
collector = Collector.top_docs(10)
search_result = index.searcher().search(query, collector)
print("Basic search results:")
for doc in search_result.docs():
    print(doc)
# Fuzzy search
fuzzy_query = query_parser.parse_query("frst~1") # Allows one edit distance
fuzzy_collector = Collector.top_docs(10)
fuzzy_search_result = index.searcher().search(fuzzy_query, fuzzy_collector)
print("Fuzzy search results:")
for doc in fuzzy_search_result.docs():
    print(doc)
# Filtered search
title_term = Term(title_field, "first")
body_term = Term(body_field, "first")
filter_query = schema.new_boolean_query().add_term(title_term).add_term(body_term)
filtered_collector = Collector.top_docs(10)
filtered_search_result = index.searcher().search(filter_query, filtered_collector)
print("Filtered search results:")
for doc in filtered_search_result.docs():
    print(doc)
|
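For readers unsure what the "one edit distance" in the quoted `frst~1` query means, here is a minimal pure-Python sketch of Levenshtein distance. It is only an illustration of the concept and not how tantivy implements fuzzy matching internally; the helper names `edit_distance` and `fuzzy_matches` and the sample word list are made up for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn `a` into `b`."""
    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute (or keep) the character
                )
            )
        previous = current
    return previous[-1]


def fuzzy_matches(term: str, vocabulary: list[str], max_distance: int = 1) -> list[str]:
    """Return the words within `max_distance` edits of `term` (hypothetical helper)."""
    return [word for word in vocabulary if edit_distance(term, word) <= max_distance]


# Hypothetical vocabulary, loosely based on the two sample documents above.
words = ["first", "second", "document", "fist"]
print(fuzzy_matches("frst", words))
```

With a maximum distance of 1, the misspelled term `frst` reaches `first` (one insertion) and `fist` (one substitution), which is the kind of tolerance the fuzzy query in the article is asking for.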
Boosting is already requested in #50 (and it mentions fuzzy search also) |
The strange thing is that the article uses tantivy-py, whereas we maintain tantivy. Tantivy-py stopped at 0.11. No idea how that ever worked, looking back at that tag. |
So what are my options? I can't boost fields right now? EDIT: from what I'm reading, it seems like just "plugging in". That sounds easy - is it? If you could provide some link on how to do that, I'll submit a PR. But IDK Rust, so I'd really appreciate it if you could do that in the next couple weeks. |
@safwansamsudeen I understand your frustration. Typically in most open-source projects, including this one, these are your options:
Fortunately, this feature is not very complex. It just needs someone to actually do the work :) |
If you want to take a quick stab at trying an implementation yourself, time-box it to a couple hours, then I can have a look at your code. Maybe that is enough. You can look at the other classes and how they are currently wrapped in tantivy-py, and then just try to copy that for QueryParser and the boosting. This is the sequence:
And then you basically keep repeating that cycle, fixing bugs, adding more features, and testing them in the python test. |
Hi @cjrh, thank you for your detailed and kind reply. I think I might have sounded a little angry. Not at all, and thank you for your generous work. We're all in the same boat, and I realize that it's hard to work on OSS ;). Yeah, I think I'll give it a stab. BTW, how do I remove a document with Tantivy Py? Is there a way to directly remove a document? It seems that |
@safwansamsudeen Fortune smiles upon you, @adamreichold jumped in to add boosts for you in #202. Would you be able to test out the PR to check if it works for what you need? You will need to check out the PR branch and build a wheel. Then you can use that Python wheel file and install into your own virtualenv and try out the new features. For example:
(venv) ~/Documents/repos/tantivy-py ±field-boost-fuzzy|✔︎ [venv://h/c/D/r/t/v:3.10.6]
$ maturin build --release
📦 Including license file "/home/caleb/Documents/repos/tantivy-py/LICENSE"
🍹 Building a mixed python/rust project
🔗 Found pyo3 bindings
🐍 Found CPython 3.10 at /home/caleb/Documents/repos/tantivy-py/venv/bin/python3
📡 Using build options bindings from pyproject.toml
Compiling tantivy v0.21.0 (/home/caleb/Documents/repos/tantivy-py)
Finished release [optimized] target(s) in 40.04s
📦 Built wheel for CPython 3.10 to /home/caleb/Documents/repos/tantivy-py/target/wheels/tantivy-0.21.0-cp310-cp310-manylinux_2_34_x86_64.whl
Produces this wheel (Python 3.10): |
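The "install the wheel into your own virtualenv" step could be scripted roughly as below. This is a sketch under assumptions, not part of the project's tooling: the function names are hypothetical, the `target/wheels` directory is taken from the build log above, and `--force-reinstall` is used so that an already installed tantivy of the same version gets replaced by the freshly built one.

```python
import subprocess
import sys
from pathlib import Path


def newest_wheel(wheel_dir: str) -> Path:
    """Return the most recently modified .whl file in `wheel_dir`."""
    wheels = sorted(Path(wheel_dir).glob("*.whl"), key=lambda p: p.stat().st_mtime)
    if not wheels:
        raise FileNotFoundError(f"no wheel found in {wheel_dir}")
    return wheels[-1]


def pip_install_command(wheel: Path) -> list[str]:
    """Build a pip command that installs `wheel` into the running interpreter's environment."""
    # Using sys.executable targets whichever virtualenv is currently active.
    return [sys.executable, "-m", "pip", "install", "--force-reinstall", str(wheel)]


def install_newest_wheel(wheel_dir: str = "target/wheels") -> None:
    """Install the newest wheel from the (assumed) maturin output directory."""
    subprocess.run(pip_install_command(newest_wheel(wheel_dir)), check=True)
```

To use it, activate the virtualenv you want to test in and call `install_newest_wheel()` from the root of the tantivy-py checkout after `maturin build --release` has finished.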
WOOT! That is brilliant! Thank you so much, @adamreichold and @cjrh. Plus, I should probably learn Rust, interesting language. I'll test it tomorrow and let you know. |
For fuzzy search or boosting fields, I need to access QueryParser. Is this possible with Tantivy Py? This article seems to think so, but it doesn't work (ImportError; I also checked and see that QueryParser isn't available at the top level anyway).
If not, how can I do fuzzy searching?