Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make it possible to use Jelinek-Mercer QL scoring model #465

Closed
wants to merge 4 commits into from
Closed

Make it possible to use Jelinek-Mercer QL scoring model #465

wants to merge 4 commits into from

Conversation

tteofili
Copy link
Collaborator

@tteofili tteofili commented Nov 2, 2018

No description provided.

@Option(name = "-qlmj", usage = "use Jelinek-Mercer query likelihood scoring model")
public boolean qlmj = false;

@Option(name = "-lambda", metaVar = "[value]", usage = "Jelinek Mercer smoothing parameter")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename it to qlmj.lambda?
I know for BM25 and QL the parameters were b and mu and we should have changed them too.
But those two were there from the very beginning of Anserini and breaking them will cause problem of regression tests.
If you look at other ranking functions like pl2 you will see its parameter as -pl2.c.

Also, since qlmj is a base ranking model could you please move these two options right under ql?

Thanks

Copy link
Collaborator Author

@tteofili tteofili Nov 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, will do.
actually it should be qljm (Query Likelihood Jelinek Mercer) instead of qlmj.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and technically -ql should be -qld for (Query Likelihood Dirichlet). Such a change will likely break lots of regression testing scripts... :(

But perhaps worth filing an issue?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe -qld can be added, which would work exactly like -ql for backward compatibility.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea.

@lintool
Copy link
Member

lintool commented Nov 2, 2018

Hi @tteofili thanks for your contributions! We'd love to know what you're using Anserini for?

@tteofili
Copy link
Collaborator Author

tteofili commented Nov 2, 2018

Hi @tteofili thanks for your contributions! We'd love to know what you're using Anserini for?

research / evaluations around usage of embeddings in IR.
so QL JM model is good to have as additional baseline.

@lintool
Copy link
Member

lintool commented Nov 2, 2018

@tteofili We've long discussed integrating word embeddings into Anserini directly... i.e., use a Lucene index as a simple key-value store for lookup up embedding vectors. Is this a feature you'd need? If so, interested in helping us build it out?

@tteofili
Copy link
Collaborator Author

tteofili commented Nov 2, 2018

@lintool sure, I would be very much interested.
I have a prototype Reranker I'm currently using, that can fetch existing stored embeddings or calculate them on the fly.
The not so nice part is having to choose one linear algebra / DL library unless we work with hundred dimensional extensions of Lucene PointValues.

@lintool
Copy link
Member

lintool commented Nov 2, 2018

@tteofili opening this thread for discussion #467

@lintool
Copy link
Member

lintool commented Nov 21, 2018

@tteofili were you planning on updating the PR per review comments, or should I close for now?

@tteofili
Copy link
Collaborator Author

yes, sure, I'll adjust the PR as per above comments.

@lintool
Copy link
Member

lintool commented Nov 28, 2018

hi @tteofili I'm going to close this PR for now... this conflicts with a recent change made by @Peilin-Yang that allows much more flexible parameter sweeping. If you still want to contribute code, I think it'll be easier to start from scratch off the current master.

@lintool lintool closed this Nov 28, 2018
@tteofili
Copy link
Collaborator Author

@lintool sure, thanks, it makes sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants