Search: not the same results showed #111

Open
bunekcca opened this issue Sep 18, 2018 · 5 comments

@bunekcca
Collaborator

bunekcca commented Sep 18, 2018

For the file name "icosahedron"

Searching for file path names: “icosahedron”

In the folder: http://167.99.186.133/folder/35/?page=2
“icosahedron”: http://167.99.186.133/search/?query=icosahedron
“*icosahedron”: http://167.99.186.133/search/?query=*icosahedron
“icosahedron*”: http://167.99.186.133/search/?query=icosahedron*
“*icosahedron*”: http://167.99.186.133/search/?query=*icosahedron*

I don't get the same results.

@jraddaoui
Collaborator

jraddaoui commented Sep 19, 2018

Hi @bunekcca,

The Elasticsearch query is made using a simple query string query (simple_query_string), which allows special searches while preventing user errors. As noted in the syntax section of that page, the * special character is only used for prefix queries. Elsewhere in their documentation they recommend against using it for suffix queries (a leading *) to avoid really slow searches (wildcard query docs), so it's probably being ignored in *icosahedron, and that is probably also the reason *icosahedron* returns no results.

The only term that brings different results between that folder page and the search page is icosahedron*, which matches one more digital file (from another folder) on the search page.
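
To make the explanation above concrete, here is a minimal sketch of the kind of request body a simple_query_string search might send for the four variants tried in this issue. The helper name `build_search_body` and the field name `filename` are assumptions for illustration only; they are not taken from the project's code or index mapping.

```python
import json


def build_search_body(user_query, fields=("filename",)):
    """Wrap raw user input in a simple_query_string query.

    `fields` is a hypothetical default; the real index mapping may differ.
    """
    return {
        "query": {
            "simple_query_string": {
                "query": user_query,
                "fields": list(fields),
            }
        }
    }


# The four variants from the report. Only a trailing * is documented
# as a prefix operator; a leading * is not a documented operator.
for q in ("icosahedron", "*icosahedron", "icosahedron*", "*icosahedron*"):
    print(json.dumps(build_search_body(q)))
```

Because the user's text is passed through unchanged, any difference between the four searches comes from how Elasticsearch parses the * character, not from the application.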

@bunekcca
Collaborator Author

Hi @jraddaoui, thanks for your answer.
@stefanabreitwieser do you think we can add the syntax for the query to the FAQ page?

The simple_query_string supports the following special characters:

  • + signifies AND operation
  • | signifies OR operation
  • - negates a single token
  • " wraps a number of tokens to signify a phrase for searching
  • * at the end of a term signifies a prefix query
  • ( and ) signify precedence
  • ~N after a word signifies edit distance (fuzziness)
  • ~N after a phrase signifies slop amount
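
For an FAQ entry, a few worked query strings may be easier to follow than the operator list alone. The sketch below builds example strings for some of the operators; the helper names and the search terms other than icosahedron are made up for illustration.

```python
def prefix(term):
    """Trailing * turns the term into a prefix query."""
    return f"{term}*"


def fuzzy(term, distance):
    """~N after a word sets the allowed edit distance."""
    return f"{term}~{distance}"


def phrase(*tokens):
    """Double quotes group tokens into a phrase."""
    return '"' + " ".join(tokens) + '"'


def negate(term):
    """A leading - excludes the term."""
    return f"-{term}"


# Example strings a researcher could type into the search box.
examples = [
    prefix("icosa"),                   # icosa*
    fuzzy("icosahedron", 2),           # icosahedron~2
    phrase("regular", "icosahedron"),  # "regular icosahedron"
    "icosahedron + " + negate("draft"),  # icosahedron + -draft
]
for example in examples:
    print(example)
```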

@stefanabreitwieser
Collaborator

@bunekcca -- I do think it's a good idea to add it to the FAQ page. That said, I think the original concern in raising this issue is that if you look at http://167.99.186.133/folder/35/?page=2 none of the icosahedron results there appear in any of the searches. After tinkering with the simple string query options, searching "icosahedron~2" finally brings up those results.

I did a quick poll of some of my colleagues (all experienced catalog researchers) to see if they were able to come up with that answer quickly or were familiar with edit distances -- they were not.

My question for you @jraddaoui is whether defaulting to a certain edit distance in every search is both possible and desirable? I think defaulting to a certain amount of fuzziness would improve search results without making a big ask of the researchers -- but of course, I have no idea how this would be implemented on your end. Let me know what you think -- thanks!
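
One way to picture a default edit distance without changing the query type is to rewrite the user's input before it is sent, appending ~N to each plain word. This is only a sketch under stated assumptions: the function name `add_default_fuzziness` is hypothetical, and it deliberately leaves any query that already contains operators, quotes, or wildcards untouched so the researcher's explicit syntax wins.

```python
def add_default_fuzziness(query, distance=1):
    """Append ~distance to every word of a plain query.

    Queries containing anything other than letters and digits
    (operators, quotes, wildcards, existing ~N) are returned unchanged.
    """
    # Elasticsearch fuzzy matching allows an edit distance of at most 2.
    if distance not in (0, 1, 2):
        raise ValueError("distance must be 0, 1 or 2")
    tokens = query.split()
    if not all(token.isalnum() for token in tokens):
        return query
    return " ".join(f"{token}~{distance}" for token in tokens)


print(add_default_fuzziness("icosahedron", 2))
print(add_default_fuzziness("icosahedron*", 2))
```

With this rewrite, typing icosahedron would behave like the icosahedron~2 search mentioned above, while icosahedron* would still run as a prefix query.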

@jraddaoui
Collaborator

jraddaoui commented Sep 19, 2018

@stefanabreitwieser,

I wouldn't suggest adding default fuzziness to the queries, for the following reasons:

  • It will increase the workload of the searches. It could be configured to avoid huge performance issues, but it may be noticeable as the data grows.
  • Results will no longer be limited to exact matches, which may require adding a relevance sort option to make the exact matches appear on top.
  • It may require using a different query type and losing some of the cool features of the simple string query.
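
To illustrate the last point, here is a minimal sketch of what a different query type with built-in fuzziness could look like: a match query with the fuzziness option. The helper name `build_fuzzy_match_body` and the field name `filename` are assumptions for illustration, not the project's code. In this form the user's text is treated as plain words, so operators such as |, - or a trailing * would no longer have their special meaning.

```python
def build_fuzzy_match_body(text, field="filename", fuzziness="AUTO"):
    """Build a match query that tolerates small spelling differences.

    `field` is a hypothetical field name; "AUTO" lets Elasticsearch
    choose the edit distance based on the length of each term.
    """
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "fuzziness": fuzziness,
                }
            }
        }
    }


print(build_fuzzy_match_body("icosahedron"))
```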

Of course, we can give it a try in the second phase, so feel free to leave this issue open.

A few more links from the Elasticsearch documentation if you want to dig in:

https://www.elastic.co/guide/en/elasticsearch/guide/master/fuzzy-query.html
https://www.elastic.co/guide/en/elasticsearch/reference/master/query-dsl-fuzzy-query.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#fuzziness

@stefanabreitwieser
Collaborator

Sounds good! I would like to continue looking into this, but I agree that we should leave it until Phase 2. Thanks!

@sallain sallain added ready and removed backlog labels Mar 27, 2019
@sallain sallain added this to the phase2 milestone Mar 27, 2019