Accessing QueryParser #201

safwansamsudeen · 2024-01-30T07:05:00Z

For fuzzy search or boosting fields, I need to access QueryParser.

Is this possible with Tantivy Py? This article seems to think so, but it doesn't work (ImportError; I also checked and saw that QueryParser isn't available at the top level anyway).

If not, how can I do fuzzy searching?


cjrh · 2024-01-30T11:10:39Z

I agree, the article is odd. The code can't work with our current tantivy release:

from tantivy import Collector, Index, QueryParser, SchemaBuilder, Term

# Create a schema
schema_builder = SchemaBuilder()
title_field = schema_builder.add_text_field("title", stored=True)
body_field = schema_builder.add_text_field("body", stored=True)
schema = schema_builder.build()

# Create an index with the schema
index = Index(schema)

# Add documents to the index
with index.writer() as writer:
    writer.add_document({"title": "First document", "body": "This is the first document."})
    writer.add_document({"title": "Second document", "body": "This is the second document."})
    writer.commit()

# Create a query parser
query_parser = QueryParser(schema, ["title", "body"])

# Basic search
query = query_parser.parse_query("first")
collector = Collector.top_docs(10)
search_result = index.searcher().search(query, collector)

print("Basic search results:")
for doc in search_result.docs():
    print(doc)

# Fuzzy search
fuzzy_query = query_parser.parse_query("frst~1")  # Allows one edit distance
fuzzy_collector = Collector.top_docs(10)
fuzzy_search_result = index.searcher().search(fuzzy_query, fuzzy_collector)

print("Fuzzy search results:")
for doc in fuzzy_search_result.docs():
    print(doc)

# Filtered search
title_term = Term(title_field, "first")
body_term = Term(body_field, "first")
filter_query = schema.new_boolean_query().add_term(title_term).add_term(body_term)
filtered_collector = Collector.top_docs(10)
filtered_search_result = index.searcher().search(filter_query, filtered_collector)

print("Filtered search results:")
for doc in filtered_search_result.docs():
    print(doc)

Collector and QueryParser aren't exposed yet.
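
For readers unfamiliar with the `frst~1` syntax in the article's snippet: it asks for terms within one edit (insertion, deletion, or substitution) of `frst`. The sketch below illustrates that matching rule in plain Python. It is only an illustration of the concept, not how tantivy evaluates fuzzy queries internally, and the helper names `edit_distance` and `fuzzy_matches` are made up for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def fuzzy_matches(query_term, indexed_terms, max_distance=1):
    """Return the indexed terms within max_distance edits of query_term."""
    return [t for t in indexed_terms if edit_distance(query_term, t) <= max_distance]


# Lower-cased terms from the two sample documents above.
terms = ["first", "second", "document", "this", "is", "the"]
print(fuzzy_matches("frst", terms))  # ['first']
```

Only `first` is one edit away from `frst` (a single inserted `i`), which is why the article expected the fuzzy query to find the first document.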

cjrh · 2024-01-30T11:11:12Z

Boosting is already requested in #50 (and it mentions fuzzy search also)

wallies · 2024-01-30T11:23:05Z

Strange thing is that the article uses tantivy-py, whereas we maintain tantivy. Tantivy-py stopped at 0.11. No idea how that ever worked, looking back at that tag.

safwansamsudeen · 2024-01-31T05:51:25Z

So what are my options? I can't boost fields right now?

EDIT: from what I'm reading, it seems like just "plugging in". That sounds easy - is it? If you could provide some link on how to do that, I'll submit a PR. But IDK Rust, so I'd really appreciate it if you could do that in the next couple weeks.

cjrh · 2024-02-01T09:33:59Z

@safwansamsudeen I understand your frustration. Typically in most open-source projects, including this one, these are your options:

1. If you need this feature because your employer needs it, i.e., there is a company behind your request, your fastest option is to convince your employer to pay someone to add the feature. It'll cost a couple hundred dollars (very rough estimate) but it can get done quickly. It shouldn't be hard to find someone with Rust experience who will do the work. You don't have to work through us to find someone; you can look around yourself and hire someone on a short-term contract to do the work.
2. You could try to just ask someone to do it, for example on Reddit or the Rust channel in LinkedIn. It's basically the same as option 1, but with no money for the contributor.
3. You could try to do it yourself. This is easier when you know a bit more of the stack, and more difficult when you don't. This option is a little tricky because if you need a lot of help getting it done, the help needs to come from somewhere, which will also consume someone else's time, just like in options 1 and 2. I guess it would depend on how much time is involved for the helping party.
4. Wait for someone else to implement it. On a very active project, this can happen quickly. Unfortunately tantivy-py is not super active, so new features wait until one of the maintainers or someone else has some free time. For example, I am doing my contributions over weekends, and I don't have many weekends free.

Fortunately, this feature is not very complex. It just needs someone to actually do the work :)

cjrh · 2024-02-01T09:38:47Z

If you want to take a quick stab at trying an implementation yourself, time-box it to a couple hours, then I can have a look at your code. Maybe that is enough. You can look at the other classes and how they are currently wrapped in tantivy-py, and then just try to copy that for QueryParser and the boosting.

This is the sequence:

1. Add the rust code to wrap QueryParser
2. Add a python tests file with a test to create an instance of a QueryParser (from Python), and try to call methods on it
3. Run the python tests with something like $ nox -s test-3.11 to run using Python 3.11, if that's the version of your virtualenv.

And then you basically keep repeating that cycle, fixing bugs, adding more features, and testing them in the python test.

safwansamsudeen · 2024-02-01T11:03:02Z

Hi @cjrh,

Thank you for your detailed and kind reply. I think I might have sounded a little angry - not at all, thank you for your generous work. We're all in the same boat, I realize that it's hard to work on OSS ;).

Yeah, I think I'll give it a stab.

BTW, how do I remove a document with Tantivy Py? Is there a way to directly remove a document? It seems that writer.delete_documents kinda performs a search and just deletes it. If so, that's alright - could you explain how to use it? The help message isn't enough.

cjrh · 2024-02-01T14:37:58Z

@safwansamsudeen Fortune smiles upon you, @adamreichold jumped in to add boosts for you in #202. Would you be able to test out the PR to check if it works for what you need? You will need to check out the PR branch and build a wheel. Then you can use that Python wheel file and install into your own virtualenv and try out the new features.

For example:

(venv) ~/Documents/repos/tantivy-py  ±field-boost-fuzzy|✔︎ [venv://h/c/D/r/t/v:3.10.6] 
$ maturin build --release
📦 Including license file "/home/caleb/Documents/repos/tantivy-py/LICENSE"
🍹 Building a mixed python/rust project
🔗 Found pyo3 bindings
🐍 Found CPython 3.10 at /home/caleb/Documents/repos/tantivy-py/venv/bin/python3
📡 Using build options bindings from pyproject.toml
   Compiling tantivy v0.21.0 (/home/caleb/Documents/repos/tantivy-py)
    Finished release [optimized] target(s) in 40.04s
📦 Built wheel for CPython 3.10 to /home/caleb/Documents/repos/tantivy-py/target/wheels/tantivy-0.21.0-cp310-cp310-manylinux_2_34_x86_64.whl

Produces this wheel (Python 3.10):
tantivy-0.21.0-cp310-cp310-manylinux_2_34_x86_64.zip

safwansamsudeen · 2024-02-01T15:10:26Z

WOOT! That is brilliant! Thank you so much, @adamreichold and @cjrh.

Plus, I should probably learn Rust, interesting language.

I'll test it tomorrow and let you know.


cjrh added the help wanted Extra attention is needed label Jan 30, 2024

cjrh added the feature-parity Feature parity with upstream tantivy label Jan 30, 2024

adamreichold mentioned this issue Feb 1, 2024

Add field_boosts and fuzzy_fields optional parameters to Index::parse_query #202

Merged

cjrh closed this as completed in #202 Feb 5, 2024
