Prompting uses the ability of large language models (LLMs) to "fill in the blank" in order to classify the meaning of text. I conducted research on how a single model can use prompting to learn multiple tasks simultaneously; RoBERTa was the primary model I experimented with, and the results are presented below. This repository contains the code I used to train and evaluate models during my experiments.
Here is a diagram from this paper that explains the difference between prompting and head-based fine-tuning for language models.
The paper *How Many Data Points is a Prompt Worth?* runs some cool experiments demonstrating the power of prompts in low-resource settings.
RobertaPrompt wraps HuggingFace's RobertaForMaskedLM class and lets developers train and test a RoBERTa model using prompting, based on a prompt definition.
Suppose we have two tasks: given an argument and a topic, we must detect whether the argument supports or opposes the topic (stance detection), and whether the argument contains a fallacy and, if so, which one (fallacy detection).
A prompt definition would contain:
- A template for each task. A template is a consistent text pattern associated with a task so the model recognizes which task needs to be completed. For example,
"Stance detection task. Topic: {insert topic here} and Argument: {insert argument here}. The stance is: <mask>"
- A policy function for each task. The policy function maps the token the model fills in the blank with to a predicted label (see the sketch after this list).
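The exact format of a prompt definition is determined by this repository's code; the sketch below is only illustrative. It assumes each task maps to a template string and a policy function, and it reuses the `argument_prompt` name from the quick-start below. The dict-and-lambda structure is an assumption, not the documented format.

```python
# Illustrative sketch only: the real prompt-definition format is defined by
# this repo's code. Assumption: each task pairs a template with a policy.
argument_prompt = {
    "stance": {
        # <mask> is the blank RoBERTa fills in; {topic}/{argument} are slots
        # filled from each example.
        "template": "Stance detection task. Topic: {topic} and Argument: {argument}. The stance is: <mask>",
        # Policy function: map the predicted token to a task label.
        "policy": lambda token: {"support": "SUPPORT", "against": "AGAINST"}.get(token.strip().lower()),
    },
    "fallacy": {
        "template": "Fallacy detection task. Topic: {topic} and Argument: {argument}. The fallacy is: <mask>",
        # Policy function: the predicted token itself names the fallacy
        # (or "none" when the argument is sound).
        "policy": lambda token: token.strip().lower(),
    },
}
```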
My experiments trained a RoBERTa model to accomplish exactly the tasks mentioned above; you can take a look at some example predictions in the prompting_example.ipynb notebook.
First, load a base model. Using a GPU as the device is highly recommended.
import torch
pmodel = RobertaPrompt(model='roberta-large', device=torch.device('cuda'), prompt=argument_prompt)
Start training immediately by specifying the paths to a training set and a validation set. Training statistics are printed to stdout.
pmodel.train("sample_train_set.tsv", "sample_val_set.tsv", output_dir="sample_model", epochs=10)
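Presumably, the trained model weights are saved under output_dir (sample_model in this example); treat that as an assumption about this repo's behavior rather than documented fact.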
After training is finished, evaluate the model on a test set using the following function, which saves the test results:
pmodel.test("sample_test_set.tsv", save_path='stats.txt')
The file specified by save_path will contain overall F1 scores, along with more fine-grained statistics on model performance for each label.
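The exact layout of the stats file comes from this repository's evaluation code; the sketch below only indicates the kind of content to expect, and every value shown is a placeholder.

```
Overall F1: <overall f1>
Label SUPPORT:     precision <p>   recall <r>   f1 <f>
Label AGAINST:     precision <p>   recall <r>   f1 <f>
Label NO FALLACY:  precision <p>   recall <r>   f1 <f>
Label <fallacy X>: precision <p>   recall <r>   f1 <f>
```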
One can then use this model and fine-tune it on other tasks with different prompts.
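As a hypothetical sketch of such a continuation: the snippet below assumes that RobertaPrompt(model=...) accepts the local directory saved above and that new_task_prompt is another prompt definition like argument_prompt; neither is a documented feature of this API.

```python
# Hypothetical: reuse the weights trained above for a new task.
pmodel2 = RobertaPrompt(model='sample_model', device=torch.device('cuda'), prompt=new_task_prompt)
pmodel2.train("new_train_set.tsv", "new_val_set.tsv", output_dir="new_model", epochs=10)
```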
Sample data for fallacious argument and stance detection is from Argotario.