Useless files created when running BertQA.predict() #119
Comments
Just realized we need to keep `predictions.json`.
But not in the structure it currently has: the evaluation script for SQuAD compares only one answer per question. By the way, the json we will create for evaluation with the annotator will only have one answer per question. We cannot compare (evaluate) predictions that contain several answers per question.
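For reference, here is a minimal sketch of the kind of comparison the SQuAD-style evaluation performs, assuming the standard exact-match metric over one predicted answer per question id. The function names are illustrative only, not part of cdQA or of the official SQuAD script:

```python
import re
import string


def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace (SQuAD-style normalization)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match_score(predictions, references):
    """predictions: {question_id: answer_str}; references: {question_id: [gold answer strings]}."""
    correct = 0
    for qid, gold_answers in references.items():
        pred = normalize(predictions.get(qid, ""))
        # A prediction counts as correct if it matches any of the gold answers exactly.
        correct += any(pred == normalize(gold) for gold in gold_answers)
    return 100.0 * correct / len(references)
```

This only works if the predictions file maps each question id to a single answer string, which is the structure discussed above.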
By the way, we should make sure that our sklearn version of the model can also receive a list of questions and generate a list of answers, as well as a json file with the questions and answers. We will need it for proper evaluation. Our current sklearn wrapper does not work with a list of questions.
I thought you already checked the sklearn version? See #70
I am talking about the full pipeline. It still only works for one question: it sends a list of question-paragraph pairs for that single question, with different paragraphs, to the reader. We still have to make it able to apply the retriever to several questions, send those questions with their retrieved paragraphs to the reader, and obtain an answer for each of them.
I think I got it. Is it an easy change? Like a for loop over the list of questions?
Yes, it is. But we still have to handle the predictions file. For us and our needs, it should be generated by the whole pipeline.
Actually there are 2 kinds of evaluations: "Reader only" and "QAPipeline" (= Retriever + Reader). I think the "reader only" evaluation is already working because you can do a "multiple predict". But we probably cannot evaluate with "QAPipeline" since it can only do a "single predict". Is that correct?
Yes exactly, there are 2 kinds of evaluations. I always thought that what interests us and our work is the evaluation of the whole pipeline, which measures the effectiveness of the app. I was also aware that this evaluation is not comparable to the evaluation of the model on SQuAD. Now that you mention the reader-only evaluation (which is comparable to the evaluation on SQuAD), I think we can do both.
So I propose to implement "multiple predict" for `QAPipeline`. Then we'll report both evaluations in the paper?
I agree.
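A minimal sketch of the "multiple predict" loop discussed above, assuming a fitted pipeline whose `predict()` answers a single question (as described in this thread); the wrapper function and its names are hypothetical, not the actual `QAPipeline` API:

```python
def predict_many(pipeline, questions):
    """Return {question: answer} by looping the single-question predict.

    `pipeline` stands for a fitted QAPipeline; only the single-question
    pipeline.predict(question) call is assumed from the discussion above,
    the rest is illustrative glue code.
    """
    predictions = {}
    for question in questions:
        predictions[question] = pipeline.predict(question)
    return predictions
```

The resulting dictionary (one answer per question) is exactly the structure the SQuAD-style comparison sketched earlier expects.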
Would you like to add a boolean parameter that equals `False` by default?
Yes, I think it is a good solution for this issue.
When we run the `predict()` method of `BertQA`, two files are also created in the directory where we run the code:

- `nbest_predictions.json`: an empty json file
- `predictions.json`: a json file with predictions for the paragraphs in the SQuAD-like dictionary fed as input to `predict()`

I suggest taking out the creation of these files, as they are useless. Or maybe create a boolean parameter to keep the option to save `predictions.json`, but keep it as `False` by default.