
Accepted at COLING 2025, part of CHiPSAL: Challenges in Processing South Asian Languages




This repository hosts the code for the project "Hate Speech Detection using Ensembling of BERT-based Models" for Devanagari-script languages (Hindi and Nepali). The aim is to apply BERT-based models to hate speech detection in South Asian languages.

About the Project

This project develops an ensemble of BERT-based models for hate speech detection, tailored to languages written in the Devanagari script, such as Hindi and Nepali. The goal is to improve the detection of hate speech and offensive content in social media posts, comments, and other online text in these languages.
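To make the idea of ensembling concrete, here is a minimal sketch of soft voting, where the class probabilities of several classifiers are averaged and the highest-scoring class is chosen. This README does not say which ensembling strategy the project uses, so soft voting is an assumption made for illustration; the function name `soft_vote` and the probability values are hypothetical.

```python
from typing import List, Sequence


def soft_vote(model_probs: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average class probabilities across models and pick the top class.

    model_probs[m][i][c] is the probability that model m assigns to class c
    for example i. All models must score the same examples and classes.
    """
    if not model_probs:
        raise ValueError("need predictions from at least one model")
    n_models = len(model_probs)
    n_examples = len(model_probs[0])
    predictions = []
    for i in range(n_examples):
        n_classes = len(model_probs[0][i])
        # Mean probability per class over all models.
        avg = [
            sum(model_probs[m][i][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        # Index of the highest averaged probability (ties go to the lower index).
        predictions.append(max(range(n_classes), key=avg.__getitem__))
    return predictions


if __name__ == "__main__":
    # Hypothetical outputs of three models on two posts.
    # Assumed label convention: 0 = non-hate, 1 = hate.
    m1 = [[0.9, 0.1], [0.4, 0.6]]
    m2 = [[0.6, 0.4], [0.3, 0.7]]
    m3 = [[0.3, 0.7], [0.55, 0.45]]
    print(soft_vote([m1, m2, m3]))  # [0, 1]
```

In practice the probabilities would come from the softmax outputs of the fine-tuned models. Averaging probabilities rather than counting hard votes lets a confident model outweigh two uncertain ones.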

Scripts for:

  • Data Augmentation: Augmentation techniques are applied to both the Hindi and Nepali datasets to address class imbalance and improve model performance.
  • Multiple Models: The repository includes several models (referred to as m1, m2, m3, etc.) that cover different configurations and techniques. Please refer to the table in the paper for a detailed description of each model.
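As an illustration of how augmentation can address class imbalance, the sketch below oversamples minority classes by adding copies of their texts with two tokens swapped. The README does not name the augmentation techniques actually used, so random token swapping is an assumed stand-in, and `random_swap`, `oversample`, and the sample sentences are hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple


def random_swap(text: str, rng: random.Random) -> str:
    """Return a copy of text with two randomly chosen tokens swapped."""
    tokens = text.split()
    if len(tokens) < 2:
        return text
    i, j = rng.sample(range(len(tokens)), 2)
    tokens[i], tokens[j] = tokens[j], tokens[i]
    return " ".join(tokens)


def oversample(data: List[Tuple[str, int]], seed: int = 0) -> List[Tuple[str, int]]:
    """Add augmented copies of minority-class texts until all classes match the largest."""
    if not data:
        return []
    rng = random.Random(seed)
    counts = Counter(label for _, label in data)
    target = max(counts.values())
    balanced = list(data)  # original examples are kept unchanged
    for label, count in counts.items():
        pool = [text for text, lab in data if lab == label]
        for _ in range(target - count):
            balanced.append((random_swap(rng.choice(pool), rng), label))
    return balanced


if __name__ == "__main__":
    # Placeholder Hindi sentences; label 1 is the under-represented class.
    data = [
        ("आज मौसम बहुत अच्छा है", 0),
        ("यह फिल्म मुझे पसंद आई", 0),
        ("कल हम बाजार जाएंगे", 0),
        ("यह एक उदाहरण वाक्य है", 1),
    ]
    balanced = oversample(data)
    print(Counter(label for _, label in balanced))
```

Splitting on whitespace keeps each Devanagari word intact. Augmentation like this should be applied to the training split only, so that evaluation data stays untouched.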

Setup

  1. Install Dependencies:

    pip install -r requirements.txt
  2. Dataset Configuration:

Update the dataset path in the script files to match where the data is stored on your machine.

  3. Running the Models:
    cd models
    
    python m1_chipsal.py
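
For the dataset configuration step, one option is to read the dataset directory from an environment variable instead of editing each script. This is only a sketch of that approach, not how the repository's scripts currently work: the variable name `CHIPSAL_DATA_DIR`, the default directory `data`, and the helper `resolve_dataset_path` are all assumptions.

```python
import os
from pathlib import Path


def resolve_dataset_path(filename: str, default_dir: str = "data") -> Path:
    """Build a dataset path from CHIPSAL_DATA_DIR (hypothetical) and fail early if missing."""
    # Fall back to default_dir when the environment variable is not set.
    base = Path(os.environ.get("CHIPSAL_DATA_DIR", default_dir)).expanduser()
    path = base / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"Dataset not found: {path}. Set CHIPSAL_DATA_DIR or update the path in the script."
        )
    return path
```

A script could then call `resolve_dataset_path("train.csv")` (the file name here is a placeholder) before loading data, so a wrong path is reported immediately rather than partway through training.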
    

For citation

Citation details will be released later.


Contact

For any queries, feel free to reach out via email 📧: