coarl-counterspeech

Getting Started

Make sure you have git, Python (>=3.8, <3.10), and Poetry installed, preferably within a virtual environment. Poetry can be installed with pip:
```
pip install poetry
```
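You can check these prerequisites from the shell before installing anything. The sketch below is not part of the repository. `require_cmds` and `version_in_range` are hypothetical helper names, and the bounds mirror the >=3.8, <3.10 range stated above.

```bash
# Hypothetical prerequisite checks (not part of the repository).

# Succeeds if every named command is on PATH; reports each missing one on stderr.
require_cmds() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Succeeds if a "MAJOR.MINOR[.PATCH]" version string satisfies >=3.8, <3.10.
# Only the major and minor components are compared.
version_in_range() {
  local major minor
  major=$(printf '%s\n' "$1" | cut -d. -f1)
  minor=$(printf '%s\n' "$1" | cut -d. -f2)
  [ "$major" -eq 3 ] && [ "$minor" -ge 8 ] && [ "$minor" -lt 10 ]
}
```

For example: `require_cmds git python3 poetry && version_in_range "$(python3 -c 'import platform; print(platform.python_version())')" && echo "prerequisites OK"`.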

Install dependencies

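Enter the project directory and install the dependencies with Poetry:
```
cd coarl-counterspeech
poetry install
```
If the directory is not already a Git repository, initialize one and make an initial commit:
```
git init
git add .
git commit -m "add: initial commit."
```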

Data Setup

To set up the dataset and prepare the environment for preprocessing and other pipelines, please follow the steps below.

Prerequisites

Ensure you have a Hugging Face account with access to the IntentCONANv2 dataset (Aswini123/IntentCONANv2).
The Hugging Face datasets library and its dependencies must also be installed. You can install them with the following command:

```
pip install datasets
```
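You can confirm that the library is importable from the interpreter you plan to use with a small check like the one below. It is a sketch and is not part of the repository. `py_module_available` is a hypothetical helper. It calls Python's `importlib.util.find_spec`, which locates a module without importing it. The interpreter defaults to `python3`, and you can override it through the `PYTHON` variable.

```bash
# Hypothetical helper (not part of the repository): succeeds if the named
# module can be found by the interpreter in $PYTHON (default: python3).
py_module_available() {
  "${PYTHON:-python3}" -c \
    'import importlib.util, sys; sys.exit(importlib.util.find_spec(sys.argv[1]) is None)' \
    "$1" >/dev/null 2>&1
}
```

For example, with the project's virtual environment activated: `py_module_available datasets || pip install datasets`.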

Steps

Run the setup script:

The setup script will:
- Prompt you to log in to your Hugging Face account.
- Download the IntentCONANv2 dataset from Hugging Face.
- Run the data preprocessing and other prompt-related pipelines required by the project.

To run the setup script, use the following command:
```
bash setup.sh
```
Log in to Hugging Face:

When the script runs, it prompts you to log in to your Hugging Face account. Make sure your account has access to the dataset: https://huggingface.co/datasets/Aswini123/IntentCONANv2

Example:

```
huggingface-cli login
```
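On a headless machine or in CI, an interactive prompt can be inconvenient, so you may prefer to log in before running setup.sh. The wrapper below is a hypothetical sketch and is not part of the repository. When the `HF_TOKEN` environment variable is set, it passes that token through `huggingface-cli login --token`. Otherwise it falls back to the interactive prompt. `HF_CLI` is an assumed override for the CLI name, included only to make the function testable. This README does not say whether setup.sh skips its own prompt once you are already logged in.

```bash
# Hypothetical wrapper (not part of the repository).
# HF_CLI defaults to huggingface-cli; HF_TOKEN, if set, is passed non-interactively.
hf_login() {
  local cli="${HF_CLI:-huggingface-cli}"
  if [ -n "${HF_TOKEN:-}" ]; then
    "$cli" login --token "$HF_TOKEN"
  else
    # No token available: show the usual interactive prompt.
    "$cli" login
  fi
}
```

Passing a token as a command-line argument can expose it in process listings and shell history. The interactive prompt avoids this.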

Dataset Download and Preprocessing:

After a successful login, the script automatically downloads the IntentCONANv2 dataset and runs the required data preprocessing.

Training Pipelines

Once the project/data folder is populated with the necessary data, you can train the following pipelines by running the respective scripts:

Multitask Pipeline: Run the multitask training pipeline by executing:
```
bash train_multitask.sh
```
PEFT (Parameter-Efficient Fine-Tuning) Pipeline: Run the PEFT training pipeline by executing:
```
bash train_peft.sh
```
PPO (Proximal Policy Optimization) Pipeline: Run the PPO training pipeline by executing:
```
bash train_ppo.sh
```

Make sure that the project/data folder is fully populated before running any of these scripts.
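Because the training scripts expect project/data to be populated, a guard like the one below can fail fast when setup.sh has not been run. It is a hypothetical sketch and is not part of the repository. `data_ready` only checks that the directory exists and is non-empty, which is weaker than "fully populated". `run_pipeline` assumes it is called from the repository root and that the scripts are named `train_<name>.sh`.

```bash
# Hypothetical helpers (not part of the repository).

# Succeeds if the given directory exists and contains at least one entry.
data_ready() {
  [ -d "$1" ] && [ -n "$(ls -A "$1")" ]
}

# Runs train_<name>.sh only if project/data is non-empty and the script exists.
run_pipeline() {
  if ! data_ready project/data; then
    echo "project/data is missing or empty; run 'bash setup.sh' first" >&2
    return 1
  fi
  if [ ! -f "train_$1.sh" ]; then
    echo "train_$1.sh not found (expected multitask, peft or ppo)" >&2
    return 1
  fi
  bash "train_$1.sh"
}
```

For example, `run_pipeline multitask` runs train_multitask.sh after the check. `run_pipeline peft` and `run_pipeline ppo` work the same way.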

Directory Structure

| File | Description |
| --- | --- |
| project | Main directory containing all the code |
| project/creds | Directory containing all API access credentials (Project Debater / OpenAI / AWS) |
| project/runs | Directory that tracks all model runs (train / eval). For each run, we store the best_model, classification args, eval results, metrics, etc. |
| project/utils | Module containing utility functions |
| project/constants | Module for accessing constants, shared variables, and default configs |
| CHANGELOG.md | Tracks changes in the code, datasets, etc. |
| LICENSE | License file (needs to be updated) |
| pyproject.toml | Project dependencies, managed with Poetry |
| README.md | Project overview and usage instructions (this file) |

Citation

If you find this repository useful in your research, please cite the following paper:

```
@misc{hengle2024intentconditioned,
  title={Intent-conditioned and Non-toxic Counterspeech Generation using Multi-Task Instruction Tuning with RLAIF},
  author={Amey Hengle and Aswini Kumar and Sahajpreet Singh and Anil Bandhakavi and Md Shad Akhtar and Tanmoy Chakraborty},
  year={2024},
  eprint={2403.10088},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
