NLPineers@ NLU of Devanagari Script Languages 2025: Hate Speech Detection using Ensembling of BERT-based Models
Accepted at COLING 2025, part of CHiPSAL: Challenges in Processing South Asian Languages
This repository hosts the code for "Hate Speech Detection using Ensembling of BERT-based Models", a project targeting Devanagari script languages (Hindi and Nepali). The aim is to apply BERT-based models to hate speech detection in South Asian languages.
This project develops an ensemble-based model for hate speech detection using BERT-based architectures, tailored to languages written in the Devanagari script, such as Hindi and Nepali. The goal is to improve the detection of hate speech and offensive content in social media posts, comments, and other online content in these languages.
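As a rough illustration of what ensembling several classifiers can look like, the sketch below averages per-class probabilities across models (soft voting) and picks the highest-scoring class. The source does not state which ensembling scheme the project uses, so soft voting is an assumption here, and `soft_vote`, the probability values, and the `[P(non-hate), P(hate)]` layout are hypothetical.

```python
from typing import List, Sequence


def soft_vote(prob_sets: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average class probabilities across models; return the argmax label per example.

    prob_sets[m][i][c] is model m's probability of class c for example i.
    """
    n_models = len(prob_sets)
    n_examples = len(prob_sets[0])
    predictions = []
    for i in range(n_examples):
        n_classes = len(prob_sets[0][i])
        # Mean probability of each class over all models for example i.
        avg = [
            sum(model[i][c] for model in prob_sets) / n_models
            for c in range(n_classes)
        ]
        # Index of the class with the highest averaged probability.
        predictions.append(max(range(n_classes), key=avg.__getitem__))
    return predictions


# Hypothetical outputs of three fine-tuned models for two posts,
# each row laid out as [P(non-hate), P(hate)].
m1_probs = [[0.9, 0.1], [0.4, 0.6]]
m2_probs = [[0.6, 0.4], [0.3, 0.7]]
m3_probs = [[0.7, 0.3], [0.6, 0.4]]

ensemble_labels = soft_vote([m1_probs, m2_probs, m3_probs])
print(ensemble_labels)
```

Averaging probabilities lets a confident model outweigh two uncertain ones, which hard majority voting would not; either scheme may differ from what the paper reports.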
- Data Augmentation: Augmentation techniques are applied to both the Hindi and Nepali datasets to address class imbalance and improve model performance.
- Multiple Models: The repository includes several models (named m1, m2, m3, etc.), each covering a different configuration or technique. Please refer to the table in the paper for a detailed description of each model.
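To make the class-imbalance point concrete, here is a minimal sketch of random oversampling, which duplicates minority-class examples until every class matches the largest one. This is a simple stand-in and not the augmentation method used in this repository, which the README does not specify; `oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Example = Tuple[str, int]  # (text, label)


def oversample(examples: List[Example], seed: int = 0) -> List[Example]:
    """Duplicate randomly chosen minority-class examples until classes are balanced."""
    rng = random.Random(seed)
    by_label: Dict[int, List[Example]] = {}
    for text, label in examples:
        by_label.setdefault(label, []).append((text, label))

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_label.values())
    balanced: List[Example] = []
    for items in by_label.values():
        balanced.extend(items)
        balanced.extend(rng.choice(items) for _ in range(target - len(items)))

    rng.shuffle(balanced)
    return balanced


# Toy data: three non-hate posts (label 0) and one hate post (label 1).
toy = [("post a", 0), ("post b", 0), ("post c", 0), ("post d", 1)]
balanced = oversample(toy)
counts = Counter(label for _, label in balanced)
print(counts)
```

Oversampling should be applied to the training split only, so that duplicated examples do not leak into validation or test data.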
- Install Dependencies:
pip install -r requirements.txt
- Dataset Configuration: Change the dataset location to match your setup, and make sure the dataset path is set correctly in the script files.
- Running the Models:
cd models
python m1_chipsal.py
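For the dataset configuration step, one option is to read the dataset location from an environment variable instead of editing each script. The sketch below is an assumption about how that could be done, not how the scripts in this repository currently work; the variable name `CHIPSAL_DATA_DIR`, the helper `resolve_dataset_dir`, and the file name `train.csv` are all hypothetical.

```python
import os
from pathlib import Path


def resolve_dataset_dir(default: str = "data") -> Path:
    """Return the dataset directory from CHIPSAL_DATA_DIR, or a default path."""
    # CHIPSAL_DATA_DIR is a hypothetical variable name chosen for this sketch.
    return Path(os.environ.get("CHIPSAL_DATA_DIR", default)).expanduser()


dataset_dir = resolve_dataset_dir()
train_file = dataset_dir / "train.csv"  # hypothetical file name
print(train_file)
```

With this pattern, the same script can run on different machines by exporting the variable before launching it.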
Will be released later.
For any queries, feel free to reach out via email 📧: