
Useless files created when running BertQA.predict() #119

Closed
andrelmfarias opened this issue May 2, 2019 · 13 comments

@andrelmfarias
Collaborator

When we run the predict() method of BertQA, two files are also created in the directory where we run the code:

  1. nbest_predictions.json: an empty json file
  2. predictions.json: a json file with predictions for the paragraphs in the SQuAD-like dictionary fed as input to predict()

I suggest removing the creation of these files, as they are not needed. Alternatively, we could add a boolean parameter to keep the option of saving predictions.json, but set it to False by default.
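As an illustration, here is a minimal sketch of what such an opt-in flag could look like; the parameter name output_prediction_files and the surrounding code are hypothetical, not the current BertQA API:

```python
import json
import os

def predict(model, X, output_prediction_files=False, output_dir="."):
    """Hypothetical sketch, not the actual cdQA API: return predictions and only
    write predictions.json when the caller explicitly asks for it."""
    predictions = model.predict(X)  # assumed to return a dict of {question_id: answer}
    if output_prediction_files:
        with open(os.path.join(output_dir, "predictions.json"), "w") as f:
            json.dump(predictions, f, indent=2)
    return predictions
```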

andrelmfarias self-assigned this May 6, 2019
@fmikaelian
Collaborator

Just realized we need to keep predictions.json for evaluation (see #104)

@andrelmfarias
Collaborator Author

But not in the structure in which the current predictions.json is created. When we run predict() with BertQA (or QAPipeline) on the list of paragraphs sent by the retriever, it generates a json file with several answers for the same question.

The evaluation script for SQuAD compares only one answer per question.

By the way, the json file we will create for evaluation with the annotator will have only one answer per question. We therefore cannot compare (evaluate) the predictions in predictions.json against the json file of the annotated BNP dataset.
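For reference, the official SQuAD evaluation script expects predictions.json to map each question id to a single answer string, so a file with several candidate answers per question would have to be collapsed first. A minimal sketch, assuming the current file stores a list of scored candidates per question id (the "text" and "score" keys are assumptions about that format):

```python
import json

# Collapse multiple candidate answers per question into the SQuAD-style
# {question_id: single_answer_string} format expected by the evaluation script.
with open("predictions.json") as f:
    multi = json.load(f)  # assumed shape: {qid: [{"text": ..., "score": ...}, ...]}

collapsed = {qid: max(candidates, key=lambda c: c["score"])["text"]
             for qid, candidates in multi.items()}

with open("predictions_single.json", "w") as f:
    json.dump(collapsed, f, indent=2)
```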

@andrelmfarias
Collaborator Author

predictions.json should have only one answer per question sent to the model.

By the way, we should make sure that our sklearn version of the model can also receive a list of questions and generate a list of answers, as well as a json file with the questions and answers. We will need this for proper evaluation. Our current sklearn wrapper does not work with a list of questions.
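A rough sketch of what a list-friendly sklearn-style predict() could look like; the class, method, and attribute names here are illustrative, not the current wrapper:

```python
# Illustrative sketch only; not the actual cdQA sklearn wrapper.
class ListFriendlyQA:
    def __init__(self, retriever, reader):
        self.retriever = retriever
        self.reader = reader

    def predict(self, questions):
        """Accept a single question or a list of questions; return one answer per question."""
        if isinstance(questions, str):
            questions = [questions]
        answers = []
        for question in questions:
            paragraphs = self.retriever.retrieve(question)          # assumed retriever API
            candidates = self.reader.predict(question, paragraphs)  # assumed: one candidate per paragraph
            answers.append(max(candidates, key=lambda c: c["score"])["text"])
        return answers
```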

@fmikaelian
Collaborator

> Our current sklearn wrapper does not work with a list of questions.

I thought you had already checked the sklearn version? See #70

@andrelmfarias
Collaborator Author

I am talking about QAPipeline.predict().

It still works for only one question: it sends BertQA a list of question-paragraph pairs (the same question paired with different paragraphs), and BertQA makes a prediction for each of these pairs.

We still have to make it able to apply the retriever to several questions, send those questions with their paragraphs to the reader, and obtain a predictions.json with only one prediction per question.

@fmikaelian
Collaborator

> I am talking about QAPipeline.predict().
>
> It still works for only one question: it sends BertQA a list of question-paragraph pairs (the same question paired with different paragraphs), and BertQA makes a prediction for each of these pairs.
>
> We still have to make it able to apply the retriever to several questions, send those questions with their paragraphs to the reader, and obtain a predictions.json with only one prediction per question.

I think I got it. Is it an easy change, like a for loop over X in predict()?

@andrelmfarias
Collaborator Author

Yes, it is.

But we still have to handle predictions.json. It is currently generated by BertQA.predict().

For our needs, it should be generated by QAPipeline.predict().
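Along those lines, a hedged sketch of how the pipeline, rather than the reader, could own the predictions.json export; the function and the retriever/reader calls below are assumptions, not the current API:

```python
import json

# Sketch: loop over questions in the pipeline and write one answer per question
# to predictions.json from the pipeline itself, not from BertQA.
def pipeline_predict(retriever, reader, questions, output_file=None):
    predictions = {}
    for qid, question in enumerate(questions):
        paragraphs = retriever.retrieve(question)          # top-k paragraphs for this question
        candidates = reader.predict(question, paragraphs)  # one candidate answer per paragraph
        predictions[str(qid)] = max(candidates, key=lambda c: c["score"])["text"]
    if output_file is not None:
        with open(output_file, "w") as f:
            json.dump(predictions, f, indent=2)
    return predictions
```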

@fmikaelian
Collaborator

fmikaelian commented May 6, 2019

Actually there are 2 kinds of evaluations:

  1. "Reader only"
  2. "QAPipeline" (= Retriever + Reader)

I think the "reader only" evaluation is already working, because you can do a "multiple predict". But we probably cannot evaluate with "QAPipeline", since it can only do a "single predict".

Is that correct?

@andrelmfarias
Collaborator Author

Yes, exactly, there are 2 kinds of evaluations.

I always thought that what interests us in our work is the evaluation of the whole pipeline, which measures the effectiveness of the app. I was also aware that this evaluation is not comparable to the evaluation of the model on SQuAD.

Now that you mention the reader-only evaluation (which is comparable to the evaluation on SQuAD), I think we can do both.

@fmikaelian
Collaborator

So I propose implementing "multiple predict" for QAPipeline(), so that we can do the "QAPipeline evaluation" as well?

Then we'll report both evaluations in the paper?

@andrelmfarias
Collaborator Author

I agree.

@fmikaelian
Collaborator

fmikaelian commented May 15, 2019

@andrelmfarias

Would you like to add a boolean parameter, equal to False by default, for the export of these files?

@andrelmfarias
Collaborator Author

Yes, I think that is a good solution for this issue.
