Evaluation Dataset for the following manuscript:
Yiping Jin, Vishakha Kadam and Dittaya Wanvarie, Bootstrapping Large-Scale Fine-Grained Contextual Advertising Classifier from Wikipedia. (TextGraphs-15 Workshop@NAACL 2021)
The files are organized in the folder [coarse category ID]/[fine-grained category name].
Mapping of the coarse-grained category IDs:
IAB1 arts-entertainment
IAB2 automotive
IAB3 business
IAB4 careers
IAB5 education
IAB6 family-parenting
IAB7 health-fitness
IAB8 food-drink
IAB9 hobbies-interests
IAB10 home-garden
IAB11 law-gov-t-politics
IAB12 news
IAB13 personal-finance
IAB14 society
IAB15 science
IAB16 pets
IAB17 sports
IAB18 style-fashion
IAB19 technology-computing
IAB20 travel
IAB21 real-estate
IAB22 shopping
IAB23 religion-spirituality
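The folder layout and the ID mapping above can be combined when loading the data. The following is a minimal sketch, not part of the original release: the function name `iter_examples` and the `root` argument are hypothetical, and it assumes that each fine-grained category folder directly contains the evaluation files.

```python
from pathlib import Path

# Coarse-grained category IDs, copied from the mapping above.
COARSE_NAMES = {
    "IAB1": "arts-entertainment",
    "IAB2": "automotive",
    "IAB3": "business",
    "IAB4": "careers",
    "IAB5": "education",
    "IAB6": "family-parenting",
    "IAB7": "health-fitness",
    "IAB8": "food-drink",
    "IAB9": "hobbies-interests",
    "IAB10": "home-garden",
    "IAB11": "law-gov-t-politics",
    "IAB12": "news",
    "IAB13": "personal-finance",
    "IAB14": "society",
    "IAB15": "science",
    "IAB16": "pets",
    "IAB17": "sports",
    "IAB18": "style-fashion",
    "IAB19": "technology-computing",
    "IAB20": "travel",
    "IAB21": "real-estate",
    "IAB22": "shopping",
    "IAB23": "religion-spirituality",
}


def iter_examples(root):
    """Yield (coarse_id, coarse_name, fine_name, file_path) for each file.

    Assumes the layout root/[coarse category ID]/[fine-grained category name]/file.
    """
    for path in sorted(Path(root).glob("*/*/*")):
        if not path.is_file():
            continue
        coarse_id = path.parent.parent.name
        fine_name = path.parent.name
        if coarse_id not in COARSE_NAMES:
            continue  # skip anything that is not a known category folder
        yield coarse_id, COARSE_NAMES[coarse_id], fine_name, path
```

Calling `list(iter_examples("path/to/dataset"))` (the path is a placeholder) would return one tuple per file, with the coarse-grained ID resolved to its readable name.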
You can download the full training dataset here (2.2GB).
If you make use of this dataset for your research, please cite the following paper:
@inproceedings{jin-2021-bootstrapping,
title = "Bootstrapping Large-Scale Fine-Grained Contextual Advertising Classifier from Wikipedia",
author = "Jin, Yiping and Kadam, Vishakha and Wanvarie, Dittaya",
booktitle = "Proceedings of the Graph-based Methods for Natural Language Processing (TextGraphs)",
year = "2021",
publisher = "Association for Computational Linguistics",
}