Version: 0.1.0 (pre-release)
Bengali Dataset is the largest open-source Bengali dataset for NLP. Solving NLP for Bengali comes with a broad set of challenges, and this dataset is our first step toward addressing them. In the future, it will be integrated with the HuggingFace datasets library.
The dataset will contain 1M annotated samples.
The dataset is still in development. We need more contributors and developers to reach the initial goal of 1M annotated Bengali samples.
See the how-to-contribute guide.
Contact the maintainers of the dataset.
Join our Discord community for further discussion.
LivingThings Community