Make it possible to use Jelinek-Mercer QL scoring model #465
@Option(name = "-qlmj", usage = "use Jelinek-Mercer query likelihood scoring model")
public boolean qlmj = false;

@Option(name = "-lambda", metaVar = "[value]", usage = "Jelinek Mercer smoothing parameter")
Can we rename it to qlmj.lambda?
I know that for BM25 and QL the parameters were b and mu, and we should have changed them too. But those two have been there since the very beginning of Anserini, and changing them would break the regression tests.
If you look at other ranking functions like pl2, you will see that its parameter is -pl2.c.
Also, since qlmj is a base ranking model, could you please move these two options right under ql?
Thanks
sure, will do.
actually it should be qljm (Query Likelihood Jelinek Mercer) instead of qlmj.
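For readers unfamiliar with the model under discussion, here is a minimal sketch of the textbook Jelinek-Mercer query likelihood score, in which the smoothing parameter lambda interpolates between the document language model and the collection language model. The class name, method signature, and the map-based statistics are hypothetical and chosen for illustration; this is not the code from the PR, and Lucene's own similarity implementation uses a rank-equivalent but differently normalized form.

```java
import java.util.List;
import java.util.Map;

public class JelinekMercerSketch {
    /**
     * Textbook JM-smoothed log query likelihood:
     *   sum over query terms of log((1 - lambda) * tf/|d| + lambda * cf/|C|).
     * All names and parameters are illustrative assumptions.
     */
    static double score(List<String> query,
                        Map<String, Integer> docTf, long docLen,
                        Map<String, Long> collTf, long collLen,
                        double lambda) {
        double logLikelihood = 0.0;
        for (String term : query) {
            // Maximum-likelihood estimate of the term in the document.
            double pDoc = docLen == 0 ? 0.0 : docTf.getOrDefault(term, 0) / (double) docLen;
            // Background probability of the term in the whole collection.
            double pColl = collTf.getOrDefault(term, 0L) / (double) collLen;
            double p = (1.0 - lambda) * pDoc + lambda * pColl;
            if (p == 0.0) {
                continue; // term unseen in the collection: skip rather than add log(0)
            }
            logLikelihood += Math.log(p);
        }
        return logLikelihood;
    }
}
```

A larger lambda puts more weight on the collection model, which is why the parameter is worth exposing as its own option.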
Yes, and technically -ql should be -qld (for Query Likelihood Dirichlet). Such a change will likely break lots of regression testing scripts... :(
But perhaps worth filing an issue?
maybe -qld can be added, which would work exactly like -ql for backward compatibility.
This is a good idea.
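The backward-compatible alias suggested above could be sketched as follows. This is a library-independent illustration that rewrites the new flag name to the existing one before the arguments reach the option parser; the class name and alias table are hypothetical. If the parser in use supports declaring aliases directly on an option (args4j's @Option appears to offer an aliases attribute for this), that would likely be the simpler route.

```java
import java.util.Map;

public class OptionAliasSketch {
    // Hypothetical alias table: the new, more precise name maps to the existing flag,
    // so scripts that already pass -ql keep working unchanged.
    private static final Map<String, String> ALIASES = Map.of("-qld", "-ql");

    /** Returns a copy of args with every known alias replaced by its canonical flag. */
    static String[] normalize(String[] args) {
        String[] out = new String[args.length];
        for (int i = 0; i < args.length; i++) {
            out[i] = ALIASES.getOrDefault(args[i], args[i]);
        }
        return out;
    }
}
```

With this in place, passing -qld -mu 1000 would be parsed the same way as -ql -mu 1000, so existing regression scripts would not need to change.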
Hi @tteofili thanks for your contributions! We'd love to know what you're using Anserini for?
research / evaluations around usage of embeddings in IR.
@tteofili We've long discussed integrating word embeddings into Anserini directly... i.e., use a Lucene index as a simple key-value store for looking up embedding vectors. Is this a feature you'd need? If so, interested in helping us build it out?
@lintool sure, I would be very much interested.
@tteofili were you planning on updating the PR per review comments, or should I close for now?
yes, sure, I'll adjust the PR as per above comments.
hi @tteofili I'm going to close this PR for now... this conflicts with a recent change made by @Peilin-Yang that allows much more flexible parameter sweeping. If you still want to contribute code, I think it'll be easier to start from scratch off the current master.
@lintool sure, thanks, it makes sense.
No description provided.