
Document similarity workflow #247

Open

saggu opened this issue Aug 3, 2018 · 2 comments

saggu commented Aug 3, 2018

Pipeline should work as follows:

Process each incoming document: create sentence vectors and build indices over them

Persist the indices so that they can be re-created if the process dies

For each `query`: compute its vector, find the k nearest matches irrespective of any threshold, and return the ranked result, a list of document IDs with similarity scores

Fetch the documents from ES and return them to the DIG UI

If the user chooses a facet, apply it as a filter to the query's list of documents, re-rank the results, and return them to the DIG UI. Because facets only filter, a query that originally returned k documents will always return <= k documents once a facet is added
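The query and facet steps above can be sketched in plain Python. This is a minimal, non-authoritative illustration: brute-force cosine similarity stands in for the FAISS search, scoring a document by its best-matching sentence is an assumption, and the function names (`top_k`, `apply_facet`), the facet predicate, and the toy vectors are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Ranked = List[Tuple[str, float]]  # (document id, similarity score), best first


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Vector, sentences: Sequence[Tuple[str, Vector]], k: int) -> Ranked:
    """Return the k best documents for a query, with no similarity threshold.

    Each document is scored by its best-matching sentence (an assumption).
    """
    best: Dict[str, float] = {}
    for doc_id, vec in sentences:
        score = cosine(query_vec, vec)
        if doc_id not in best or score > best[doc_id]:
            best[doc_id] = score
    ranked = sorted(best.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def apply_facet(ranked: Ranked, keep: Callable[[str], bool]) -> Ranked:
    """Filter an existing ranked list by a facet predicate, then re-rank.

    Documents are only ever removed, so the result has at most len(ranked) entries.
    """
    survivors = [(doc_id, score) for doc_id, score in ranked if keep(doc_id)]
    return sorted(survivors, key=lambda pair: pair[1], reverse=True)


# Toy data: (document id, sentence vector) pairs.
sentences = [("d1", [1.0, 0.0]), ("d1", [0.0, 1.0]), ("d2", [0.7, 0.7]), ("d3", [-1.0, 0.0])]
ranked = top_k([1.0, 0.0], sentences, k=2)
filtered = apply_facet(ranked, lambda doc_id: doc_id != "d1")  # hypothetical facet
print(ranked)
print(filtered)
```

Because `apply_facet` only removes entries from an existing ranked list, it can never return more than k documents, which matches the facet behaviour described in the issue. With FAISS, the equivalent exact search would be an inner-product index over L2-normalized vectors, where the inner product equals cosine similarity.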

saggu commented Aug 6, 2018

Updated Pipeline

saggu assigned saggu and Ljferrer

saggu commented Aug 28, 2018

The following tasks are done:

Vectorize each sentence using TensorFlow
Index the vectors in a FAISS index and store the link in HBase
Query a string and return the k most similar documents
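The indexing and persistence tasks can be sketched with the standard library only. This is a hypothetical stand-in, not the project's code: in the real pipeline FAISS holds the vectors (FAISS itself provides `faiss.write_index` and `faiss.read_index` for saving and loading) and HBase holds the link from each vector's position to its document, whereas here both are kept in one JSON file. The `SentenceIndex` class, its methods, and the file name are assumptions for illustration.

```python
import json
import os
import tempfile
from typing import Dict, List


class SentenceIndex:
    """Toy stand-in for a FAISS index plus the position -> document link table.

    FAISS assigns each added vector a sequential integer position; a side table
    (kept in HBase in the issue) maps that position back to a document. Here both
    are stored in a single JSON file (assumption).
    """

    def __init__(self) -> None:
        self.vectors: List[List[float]] = []
        self.links: Dict[int, str] = {}  # vector position -> document id

    def add(self, doc_id: str, sentence_vecs: List[List[float]]) -> None:
        # Record the link before appending so the position matches the new vector.
        for vec in sentence_vecs:
            self.links[len(self.vectors)] = doc_id
            self.vectors.append(list(vec))

    def save(self, path: str) -> None:
        # Write to a temporary file, then os.replace() it into place, so a crash
        # mid-write does not leave a half-written file at `path`.
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"vectors": self.vectors, "links": self.links}, f)
        os.replace(tmp, path)

    @classmethod
    def load(cls, path: str) -> "SentenceIndex":
        with open(path) as f:
            data = json.load(f)
        index = cls()
        index.vectors = data["vectors"]
        # JSON object keys are always strings; convert positions back to int.
        index.links = {int(pos): doc_id for pos, doc_id in data["links"].items()}
        return index


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sentences.json")
    index = SentenceIndex()
    index.add("doc-1", [[1.0, 0.0], [0.0, 1.0]])
    index.add("doc-2", [[0.6, 0.8]])
    index.save(path)
    restored = SentenceIndex.load(path)  # e.g. after a process restart
    print(restored.links)  # {0: 'doc-1', 1: 'doc-1', 2: 'doc-2'}
```

Keeping the position-to-document link outside the vector index is what lets a search result (a vector position) be turned back into a document ID that can then be fetched from ES.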
