Add TriviaQA task #204

ssatia · 2017-07-12T18:37:59Z

Add TriviaQA task with the RC version of the dataset: http://nlp.cs.washington.edu/triviaqa/

alexholdenmiller

Thanks for the work so far, it looks good overall! I'm going to hold off on merging for the moment, since I'm not sure exactly how I want to handle the evidence.

I'm going to take a closer look at it, but in the meantime can you add a test function for this to the test_downloads file in the tests directory?

alexholdenmiller · 2017-07-12T21:48:14Z

parlai/tasks/triviaqa/agents.py

+                                                  evidence_filename)
+                with open(evidence_file_path) as evidence_file:
+                    evidence = evidence_file.read()
+                    yield (evidence + '\n' + question, answers), True


Hm, I'm going to talk to Jason about how we want to set up the evidence. I don't think selecting a single random evidence document (fixed at launch time) is how we'll want to do this.

alexholdenmiller · 2017-07-12T21:49:16Z

parlai/tasks/triviaqa/agents.py

+
+        for dataset in datasets:
+            dataset_path = os.path.join(path,
+                                        dataset + '-' + self.suffix + '.json')


I may want to split these into different teachers (i.e., a WikiTeacher and a WebTeacher).

Split into different teachers: WikipediaTeacher and WebTeacher. The DefaultTeacher now inherits from MultiTaskTeacher.
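To make the split concrete, here is a minimal sketch of one shared base with a per-dataset subclass for each source, plus a default that covers both. Every class name ending in `Sketch` is hypothetical, and the simple chaining in `DefaultTeacherSketch` is only a stand-in: it does not reproduce how ParlAI's `MultiTaskTeacher` actually schedules its sub-tasks.

```python
import itertools


class DatasetTeacherSketch:
    """Hypothetical stand-in for a teacher bound to one TriviaQA dataset."""

    dataset = None  # overridden by subclasses

    def __init__(self, data):
        # `data` maps a dataset name to a list of examples (assumption).
        self.data = data

    def setup_data(self):
        # Yield only the examples that belong to this teacher's dataset.
        yield from self.data.get(self.dataset, [])


class WikipediaTeacherSketch(DatasetTeacherSketch):
    dataset = 'wikipedia'


class WebTeacherSketch(DatasetTeacherSketch):
    dataset = 'web'


class DefaultTeacherSketch:
    """Stand-in default that simply chains both per-dataset teachers."""

    def __init__(self, data):
        self.teachers = [WikipediaTeacherSketch(data), WebTeacherSketch(data)]

    def setup_data(self):
        return itertools.chain.from_iterable(
            teacher.setup_data() for teacher in self.teachers
        )
```

With this layout a user could pick either dataset on its own or fall back to the combined default.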

ssatia · 2017-07-13T16:38:38Z

Split the existing teacher into separate teachers and changed DefaultTeacher to inherit from MultiTaskTeacher to allow training on both datasets simultaneously.
Added a download test.

alexholdenmiller · 2017-07-14T21:46:43Z

parlai/tasks/triviaqa/agents.py

+    def setup_data(self, path):
+        return setup_data_common('wikipedia', path, self.suffix,
+                                 self.evidence_dir)
+


Can you also add a "verified teacher" for the verified data? It will always use the dev datatype (say, print a warning but don't error if datatype != 'valid'). This won't affect the default teacher.

Added a VerifiedWikipediaTeacher and VerifiedWebTeacher inheriting from WikipediaTeacher and WebTeacher, respectively. Also added a VerifiedTeacher inheriting from MultiTaskTeacher.
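The "warn but don't error" behaviour requested for the verified teachers could look roughly like the sketch below. The helper name `verified_datatype` is hypothetical and not taken from the PR, and the use of the `warnings` module is an assumption; the actual teachers may simply print a message.

```python
import warnings


def verified_datatype(datatype):
    """Return the datatype a verified teacher should load.

    Verified data is treated as dev-only here, so any other requested
    datatype triggers a warning (not an error) and falls back to 'valid'.
    """
    if datatype != 'valid':
        warnings.warn(
            "Verified data only supports the 'valid' datatype; "
            "ignoring requested datatype %r." % datatype
        )
    return 'valid'
```

A verified teacher's constructor could call this helper before choosing which file to load, so training scripts that pass another datatype keep running.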

alexholdenmiller · 2017-07-14T21:48:05Z

parlai/tasks/triviaqa/agents.py

+        if len(evidence_list) == 0:
+            continue
+
+        evidence_num = random.randrange(len(evidence_list))


Adjust this as we discussed on Messenger, thanks! (A separate example for each web evidence document; concatenate all wiki evidence.)
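As a rough illustration of the handling requested here, the sketch below yields one example per web evidence document and a single example with all Wikipedia evidence concatenated. The function name `build_examples`, its arguments, and the `'\n'` separator used to join Wikipedia evidence are assumptions; only the `(evidence + '\n' + question, answers), True` tuple shape mirrors the diff earlier in this thread.

```python
def build_examples(dataset, question, answers, evidence_list):
    """Yield ((text, answers), new_episode) tuples for one question.

    `dataset` is 'web' or 'wikipedia'; `evidence_list` holds evidence
    documents that have already been read into strings.
    """
    if len(evidence_list) == 0:
        # Skip questions that have no evidence, as the diff above does.
        return
    if dataset == 'web':
        # Web: one separate example per evidence document.
        for evidence in evidence_list:
            yield (evidence + '\n' + question, answers), True
    else:
        # Wikipedia: concatenate all evidence into a single example.
        evidence = '\n'.join(evidence_list)
        yield (evidence + '\n' + question, answers), True
```

Compared with picking one random evidence document, this keeps every evidence document reachable and makes the data independent of the random seed.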

ssatia · 2017-07-15T19:53:11Z

Changed evidence handling as discussed and added verified teachers

alexholdenmiller

just one more change and we're there, thank you!!

alexholdenmiller · 2017-07-17T16:01:45Z

parlai/tasks/triviaqa/agents.py

+        dataset = 'web'
+        dataset_path = os.path.join(path,
+                                    self.prefix + dataset + '-' + self.suffix +
+                                    '.json')


Actually, can you move this to the init so that opt['datafile'] points to it? Then the user can inspect the exact datafile being loaded by looking at the opt file.

Moved the initialization of the datafile option to the constructor.
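A minimal sketch of what "setting the datafile in the constructor" can mean is shown below. The class name `WebDatafileSketch`, the `'path'` key, the `prefix` argument, and the mapping from datatype to the `train`/`dev` suffix are all assumptions made for illustration; only the `prefix + dataset + '-' + suffix + '.json'` file name pattern comes from the diff above.

```python
import copy
import os


class WebDatafileSketch:
    """Hypothetical teacher that resolves its datafile at construction."""

    def __init__(self, opt, prefix=''):
        # Copy so the caller's options are not modified in place.
        opt = copy.deepcopy(opt)
        self.prefix = prefix
        # Assumed mapping: training uses 'train', everything else 'dev'.
        if opt['datatype'].startswith('train'):
            self.suffix = 'train'
        else:
            self.suffix = 'dev'
        dataset = 'web'
        # Record the resolved path so it is visible in the options.
        opt['datafile'] = os.path.join(
            opt['path'],
            self.prefix + dataset + '-' + self.suffix + '.json',
        )
        self.opt = opt
```

Because the path is stored in the options before any data is read, printing the options is enough to see which file a run will load.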

alexholdenmiller · 2017-07-18T14:54:29Z

Awesome, thank you!!

Add TriviaQA task

a139c10

facebook-github-bot added the CLA Signed label Jul 12, 2017

Update agents.py

08923b9

alexholdenmiller reviewed Jul 12, 2017

Sanyam Satia added 3 commits July 12, 2017 22:06

Implement MultiTaskTeacher to allow WebTeacher and WikipediaTeacher t…

ac3d990

…o run simultaneously

Add download test for TriviaQA task

76df8a7

Modify test_downloads.py

f393ec2

alexholdenmiller suggested changes Jul 14, 2017

Change evidence handling and add verified teachers

6b472ff

alexholdenmiller suggested changes Jul 17, 2017

Set datafile option in constructor

80ae6f6

alexholdenmiller approved these changes Jul 18, 2017

alexholdenmiller merged commit 838a042 into facebookresearch:master Jul 18, 2017
