Merge branch 'master' of github.com:ncbi-nlp/NCBI_BERT

Showing 53 changed files with 5,544 additions and 6 deletions.
@@ -0,0 +1,23 @@
PUBLIC DOMAIN NOTICE
National Center for Biotechnology Information

This software/database is a "United States Government Work" under the terms of
the United States Copyright Act. It was written as part of the author's
official duties as a United States Government employee and thus cannot be
copyrighted. This software/database is freely available to the public for use.
The National Library of Medicine and the U.S. Government have not placed any
restriction on its use or reproduction.

Although all reasonable efforts have been taken to ensure the accuracy and
reliability of the software and data, the NLM and the U.S. Government do not and
cannot warrant the performance or results that may be obtained by using this
software or data. The NLM and the U.S. Government disclaim all warranties,
express or implied, including warranties of performance, merchantability or
fitness for any particular purpose.

Please cite the author in any work or product based on this material:

Peng Y, Yan S, Lu Z. Transfer Learning in Biomedical Natural Language
Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets.
In Proceedings of the 2019 Workshop on Biomedical Natural Language Processing
(BioNLP 2019). 2019:58-65.
@@ -0,0 +1,142 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
mtdnn_env
venv_windows
mtdnn_env_apex

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
.DS_Store

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/

# IDE pycharm
.idea/

log/
model/
submission/
save/
book_corpus_test/
book_corpus_train/
checkpoints/
.pt_description_history
.git-credentials
pt_bert/philly
.vs
*.pyproj
pt_bert/checkpoint
*/aml_experiments
screenlog.*
data
pt_bert/scripts
pt_bert/model_data
screen*
checkpoint
*.sln
dt_mtl
philly
bert_models
run_baseline*
mt_dnn_models
*pyc
run_test/
experiments/superglue
@@ -0,0 +1,22 @@
PUBLIC DOMAIN NOTICE
National Center for Biotechnology Information

This software/database is a "United States Government Work" under the terms of
the United States Copyright Act. It was written as part of the author's
official duties as a United States Government employee and thus cannot be
copyrighted. This software/database is freely available to the public for use.
The National Library of Medicine and the U.S. Government have not placed any
restriction on its use or reproduction.

Although all reasonable efforts have been taken to ensure the accuracy and
reliability of the software and data, the NLM and the U.S. Government do not and
cannot warrant the performance or results that may be obtained by using this
software or data. The NLM and the U.S. Government disclaim all warranties,
express or implied, including warranties of performance, merchantability or
fitness for any particular purpose.

Please cite the author in any work or product based on this material:

Peng Y, Chen Q, Lu Z. An Empirical Study of Multi-Task Learning on BERT
for Biomedical Text Mining. In Proceedings of the 2020 Workshop on Biomedical
Natural Language Processing (BioNLP 2020). 2020.
@@ -0,0 +1,77 @@
# Multi-Task Learning on BERT for Biomedical Text Mining

This repository provides the code and models for Multi-Task Learning on BERT for Biomedical Text Mining.
The package is based on [`mt-dnn`](https://github.com/namisan/mt-dnn).

## Pre-trained models

The pre-trained MT-BlueBERT weights, vocab, and config files can be downloaded from:

* [mt-bluebert-biomedical](https://github.com/yfpeng/mt-bluebert/releases/download/0.1/mt-bluebert-biomedical.pt)
* [mt-bluebert-clinical](https://github.com/yfpeng/mt-bluebert/releases/download/0.1/mt-bluebert-clinical.pt)

The benchmark datasets can be downloaded from [https://github.com/ncbi-nlp/BLUE_Benchmark](https://github.com/ncbi-nlp/BLUE_Benchmark).
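The release assets listed above can also be fetched programmatically. A minimal sketch follows; the `mt_dnn_models/` destination directory and the `download_checkpoints` helper are assumptions for illustration, not part of this package:

```python
import os
import urllib.request

# Release assets listed above (v0.1 of yfpeng/mt-bluebert).
RELEASE_URLS = [
    "https://github.com/yfpeng/mt-bluebert/releases/download/0.1/mt-bluebert-biomedical.pt",
    "https://github.com/yfpeng/mt-bluebert/releases/download/0.1/mt-bluebert-clinical.pt",
]


def download_checkpoints(dest_dir="mt_dnn_models"):
    """Fetch each release asset into dest_dir, skipping files already present."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for url in RELEASE_URLS:
        # Local filename is the last path segment of the release URL.
        path = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
        if not os.path.exists(path):
            urllib.request.urlretrieve(url, path)
        paths.append(path)
    return paths
```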
## Quick start

### Setup environment
1. Python 3.6
2. Install the requirements
```bash
pip install -r requirements.txt
```
### Download data
Please refer to the BLUE_Benchmark repository to download the data: https://github.com/ncbi-nlp/BLUE_Benchmark

### Preprocess data
```bash
bash ncbi_scripts/blue_prepro.sh
```
### Train an MT-DNN model
```bash
bash ncbi_scripts/run_blue_mt_dnn.sh
```

### Fine-tune a model
```bash
bash ncbi_scripts/run_blue_fine_tune.sh
```

### Convert a TensorFlow BERT model to the MT-DNN format
```bash
python ncbi_scripts/convert_tf_to_pt.py --tf_checkpoint_root $SRC_ROOT --pytorch_checkpoint_path $DEST --encoder_type 1
```
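At its core, a TF-to-PyTorch conversion renames checkpoint variables into `state_dict` keys (TF dense `kernel`s must also be transposed into PyTorch `Linear` weights). The helper below is a hypothetical illustration of that renaming, not the actual logic of `convert_tf_to_pt.py`:

```python
def tf_to_pt_name(tf_name: str) -> str:
    """Map a TF BERT variable name to a PyTorch-style state_dict key.

    Illustrative only: the real script also transposes dense kernels and
    handles encoder-type-specific prefixes.
    """
    name = tf_name.replace("/", ".")
    name = name.replace("layer_", "layer.")    # layer_0 -> layer.0
    name = name.replace(".kernel", ".weight")  # dense kernel -> weight
    name = name.replace("gamma", "weight")     # LayerNorm scale
    name = name.replace("beta", "bias")        # LayerNorm shift
    return name
```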
## Citing MT-BLUE

Peng Y, Chen Q, Lu Z. An Empirical Study of Multi-Task Learning on BERT
for Biomedical Text Mining. In Proceedings of the 2020 Workshop on Biomedical
Natural Language Processing (BioNLP 2020). 2020.

```
@InProceedings{peng2020empirical,
  author    = {Yifan Peng and Qingyu Chen and Zhiyong Lu},
  title     = {An Empirical Study of Multi-Task Learning on BERT for Biomedical Text Mining},
  booktitle = {Proceedings of the 2020 Workshop on Biomedical Natural Language Processing (BioNLP 2020)},
  year      = {2020},
}
```
## Acknowledgments
This work was supported by the Intramural Research Programs of the National Institutes of Health, National Library of
Medicine, and by the National Library of Medicine of the National Institutes of Health under award number K99LM013001-01.
We are also grateful to the authors of BERT and mt-dnn for making their data and code publicly available.

## Disclaimer
This tool shows the results of research conducted in the Computational Biology Branch, NLM/NCBI. The information produced
on this website is not intended for direct diagnostic use or medical decision-making without review and oversight
by a clinical professional. Individuals should not change their health behavior solely on the basis of information
produced on this website. NIH does not independently verify the validity or utility of the information produced
by this tool. If you have questions about the information produced on this website, please see a health care
professional. More information about NLM/NCBI's disclaimer policy is available.