PhoNER_COVID19 is a dataset for recognizing COVID-19 related named entities in Vietnamese, consisting of 35K entities over 10K sentences. We define 10 entity types with the aim of extracting key information related to COVID-19 patients, which are especially useful in downstream applications. In general, these entity types can be used in the context of not only the COVID-19 pandemic but also in other future epidemics:
The construction of PhoNER_COVID19 is detailed in our NAACL 2021 paper:
@inproceedings{PhoNER_COVID19,
title = {{COVID-19 Named Entity Recognition for Vietnamese}},
author = {Thinh Hung Truong and Mai Hoang Dao and Dat Quoc Nguyen},
booktitle = {Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
year = {2021}
}
By downloading the PhoNER_COVID19 dataset, USER agrees:
- to use PhoNER_COVID19 for research or educational purposes only.
- to not distribute PhoNER_COVID19 or part of PhoNER_COVID19 in any original or modified form.
- and to cite our NAACL paper above whenever PhoNER_COVID19 is employed to help produce published results.
THE DATA IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE DATA OR THE USE OR OTHER DEALINGS IN THE
DATA.