Implemented methods to send reader to GPU or CPU inside QAPipeline #143
Merged
Conversation
fmikaelian added a commit that referenced this pull request on Jun 25, 2019:
- initial structure 🏗
- Update requirements.txt
- Add config for packaging, CI and tests #11
- removing pytest to debug CI
- quiet pip install
- add pytest
- add dummy test
- Add structure for samples and examples #14
- Add Travis CI badge #16
- Implement download.py script (SQuAD fetch) #9
- Create LICENSE
- Add document retriever script #8
- fix typo in title column name
- Move retriever example into /examples (currently in /samples) #24
- Add utils script to convert a pandas df (title, content) to SQuAD #13
- fix typo
- Find a name for our QA software #23
- Find a name for our QA software #23
- Upload weights and metrics and update download.py script #21
- Add run_squad.py in cdqa/reader #34
- Add scrapper folder under /cdqa #30
- Upload weights and metrics and update download.py script #21
- Find a robust method to get article paragraphs #1
- Update converter.py
- Update run_converter.py
- Find a robust method to get article paragraphs #1
- Find a robust method to get article paragraphs #1
- Initiate README structure #32
- add contributing guidelines
- add precisions on repo structure and workflow
- small typo fixes
- continue fixes
- Add fetch of BNP Paribas Newsroom dataset v.1.0 to download.py #46
- small fixes in download/converter
- Changed URL for squad/evaluate-v1.1.py script #35
- Adapt retriever.py to BNP Paribas Newsroom dataset v.1.0 #45
- Adapt retriever.py to BNP Paribas Newsroom dataset v.1.0 #45
- Split run_squad.py into processing/train/predict #47
- Split run_squad.py into processing/train/predict #47
- add sklearn wrapper to run_squad.py to be able to call the model from cdqa/pipeline
- add sklearn wrapper to run_squad.py to be able to call the model from cdqa/pipeline
- Wrong dataset filename in examples #60
- small fixes
- add BertQA.fit() code
- add custom_weigths params in BertQA.fit()
- typo fixes
- remove unnecessary args in BertQA() class
- Add scikit-learn wrapper interface for BertForQuestionAnswering
- update imports in train/predict
- JSON file or object for input + combine doc retriever + reader in pipeline
- fix bad indents
- small fixes
- fix examples/features casting
- read_squad_examples() does not work with our custom input object #61
- small fixes in estimator/transformer classes
- Update run_squad with latest commits #64
- Split run_squad.py into processing/train/predict (#66)
- small fixes and updates
- Adapt or disable verbose during model fit() #63
- Update run_squad.py
- Update run_squad with latest commits #64
- NameError: name 'device' is not defined in predict() method #68 (#69)
- #71 #65 (#73) small fixes
- #75 (#76)
- fix #74 #36 #33 (#78)
- continue fixing best answer across paragraphs (#80)
- fix #74 #36 #33
- update predict.py results
- Be compliant with the GitHub open source guide #81
- start new docs structure
- synchronise run_squad.py #82 (#83)
- Disable logger info for BertProcessor() #77 (#84)
- Disable logger info for BertProcessor() #77
- Adapt or disable verbose during model fit() #63
- Add comments + docstrings + changelog #79 (#86)
- Add comments + docstrings + changelog #79
- Add comments + docstrings + changelog #79
- Add comments + docstrings + changelog #79
- small typo fixes
- small typo in download.py
- fix typo in README (#88)
- Add comments + docstrings + changelog #79 (#89)
- Add example (#93)
- Add comments + docstrings + changelog #79
- add example notebook for prediction + small changes
- add example notebook for prediction + small changes
- add codecov
- add codecov badge
- sync with HF example (#94)
- Sync HF (#98)
- sync HF
- update docstring
- fix typo
- Added download of CPU version of model to download.py (#100)
- update example notebook and docstrings (#92, #90, #79) (#102)
- update example notebook and docstrings (#92, #90, #79)
- update docstrings #79
- continue #79
- add flake8 to pytest in CI
- start integrating REST API #35
- add info to README
- basic API #35
- update reqs
- add refs and badges #87 (#105)
- add refs and badges #87
- sync HF
- first version of paper
- Add sklearn wrapper for retriever as well #95
- Add sklearn wrapper for retriever as well #95
- update README and clean repo
- update evaluation section in README
- debug-minor-updates (#106)
- Add GitHub badges #87
- Disable verbose during predictions #103
- fix typos and tests #95
- Rename variables and scripts #108
- adapt notebook to new retriever class (#109)
- adapt notebook to new retriever class
- remove samples dir
- clean up repo and rename #108
- Fix predict berqa (#113)
- Rename variables and scripts #108
- Rename variables and scripts #108
- BertQA().predict() should return only 1 final predictions object #110
- Created a sklearn wrapper for the QA Pipeline (#101)
- Implemented QAPipeline object that does the whole question-answering process
- Added option to attribute model: path (string) or joblib object
- corrected typo
- Created example Jupyter notebook for use of qa_pipeline
- Update notebook example
- Added description of QAPipeline class
- Added descriptions to all methods of QAPipeline class
- Corrected typo
- Changed code from qa_pipeline.py to cdqa_sklearn.py
- separated kwargs for declaration of different classes within QAPipeline
- removed qa_pipeline.py
- Implemented predict() and the retriever part of fit()
- Implemented reader training in fit() and completed documentation
- Modified documentation for predict() method
- Deleted useless tutorial
- Created notebook example for pipeline
- Modified converter.py to correct for the creation of repeated articles… (#116)
- Modified converter.py to correct for the creation of repeated articles in generate_squad_examples
- included options for min and max length in filter_paragraphs()
- Implement automatic PyPI upload on master release #107
- Debug, small fixes and doc updates (#117)
- Update CONTRIBUTING.md with new tree structure #112
- Build REST API using QAPipeline() #118
- start updating README with cdqa pipeline method
- Allow for model evaluation directly from cdqa #104
- add filter script in utils for all data cleaning tasks
- update PyPI badges
- Allow for model evaluation directly from cdqa #104
- API style fixing + update demo notebook
- Build REST API using QAPipeline() #118
- update README and naming
- update README
- update tree structure
- corrected typo related to sync with HF (#126)
- Updated BertQA to enable multiple trainings and handled some errors (#130)
- modified BertQA class to enable multiple calls to fit()
- corrected typo
- Deleted tokenizer saving inside BertQA.fit
- handled problem with self.output_dir
- Implemented fit_reader() method and fixed fit() method (#131)
- replaced self.model by self.reader
- Implemented fit_reader(), fixed fit() and updated doc
- sync HF + auto export JSON in scrapper + move filters (#129)
- sync HF + auto export JSON in scrapper + move filters
- change wording converter => converters
- fix typo in API
- update version of dataset
- update version of dataset (2)
- predict() method should also give back index of document + paragraph #91 (#132)
- update API and notebook
- Implemented multiple predictions in QAPipeline.predict() (#135)
- replaced self.model by self.reader
- Implemented fit_reader(), fixed fit() and updated doc
- Implemented multiple predictions in qa_pipeline
- removed unused import
- Improved doc of predict method
- Handled error for predictions on GPU
- filter paragraphs script (#140)
- implemented methods to send reader to GPU or CPU inside QAPipeline (#143)
- debug and update filters (#141)
- small fixes
- Fixed some errors (#145)
- fixed typo
- added other needed changes when sending to different devices
- add instructions for reader training on SQuAD
- Put all message display under the verbose condition (#147)
- Update issue templates (#148)
- Update issue templates
- remove old issue template method
- Be compliant with the GitHub open source guide #81
- Deleted unused folders and included option to save logs with default False (#150)
- Deleted useless folders for GitHub repo
- Added option in BertQA to save logs, default False
- Implemented a better way to have the option to save logs
- Updated documentation
- Removed useless parameter in BertQA
- Updated explanations on how to train and evaluate the reader in the README (#151)
- Updated explanation to train reader in README file
- Updated explanation to evaluate reader in README file
- Added explanation to evaluate pipeline
- Implemented function to evaluate pipeline (#152)
- Implemented function to evaluate pipeline
- modified name of module from metrics.py to evaluate.py
- Changed evaluate_from_files to evaluate_reader / renamed the module (evaluate.py to evaluation.py)
- README updates
- update run_squad.py
- create content column inside qa_pipeline
- update refs
- clean up docstrings after content column removal
- fix typos and start writing unit tests #136 (#155)
- + pdf_converter (#149)
- + pdf_converter
- README updates
- Use sys.argv and save the data to a CSV
- change '\n'.join() to ' '.join() in order to correct the CSV
- update run_squad.py
- create content column inside qa_pipeline
- update refs
- clean up docstrings after content column removal
- minor bugs
- fix typos and start writing unit tests #136 (#155)
- + pdf_converter
- Use sys.argv and save the data to a CSV
- change '\n'.join() to ' '.join() in order to correct the CSV
- minor bugs
- update README
- change param name to something meaningful
- fix typos and start migration to org
- update URLs
- corrected bug when predicting with log and set do_lower_case to False as default (the default BERT model we use is uncased) (#160)
- change LICENSE + fix typo in README
- Included the whole team in the author parameter in setup.py (#161)
- Prepare repo for release #159 (#162)
- Prepare repo for release #159
- add GPU saving to example of reader training on SQuAD
- remove useless dep
- update README
- Updated download.py and removed docs repo (#163)
- updated download.py
- deleted docs repo
- fix LICENSE badge bug (#166)
- Last updates on download / setup / notebook example before official release (#168)
- moved download.py to root and updated it to download models and the BNP Paribas dataset
- changed version in setup.py to 1.0.0
- updated tutorial example with changes in repository
- made some minor updates to download.py
- removed PyGithub from requirements as we do not use it anymore
As proposed in #137.
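A minimal sketch of the pattern this PR describes: the pipeline exposes device-placement methods that delegate to its reader, per the commit messages above ("replaced self.model by self.reader", "implemented methods to send reader to GPU or CPU inside QAPipeline"). The `Reader` stand-in below replaces the real PyTorch-backed `BertQA` (whose weights would move via `torch.nn.Module.to`); its internals are an assumption for illustration, not cdQA's actual implementation.

```python
class Reader:
    """Stand-in for the PyTorch-backed BertQA reader (hypothetical internals)."""

    def __init__(self):
        self.device = "cpu"

    def to(self, device):
        # A real reader would move its model weights here,
        # e.g. self.model.to(torch.device(device)).
        self.device = device
        return self


class QAPipeline:
    """Sketch: the pipeline delegates device placement to its reader."""

    def __init__(self, reader=None):
        self.reader = reader if reader is not None else Reader()

    def cuda(self):
        # Send the reader to the GPU so predictions run there.
        self.reader.to("cuda")
        return self

    def cpu(self):
        # Send the reader back to the CPU.
        self.reader.to("cpu")
        return self
```

Returning `self` keeps the calls chainable (`QAPipeline().cuda().predict(...)`-style), mirroring how PyTorch's own `.cuda()`/`.cpu()` behave.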