This repository contains the code accompanying my master's thesis, LegSum: Legal Document Summarization.
Notebook | Model checkpoint | Colab |
---|---|---|
T5 | Frederick0291/t5-small-finetuned-billsum | |
BART billsum | murali-admin/bart-billsum-1 | |
BART xsum | sshleifer/distilbart-xsum-12-6 | |
Pegasus Legal | nsi319/legal-pegasus | |
Pegasus billsum | google/pegasus-billsum | |
BigBird | google/bigbird-pegasus-large-bigpatent | |
LED | allenai/led-large-16384-arxiv | |

Notebook | Colab |
---|---|
Extractive | |
Kmeans Bertsum | |
Luhn's algorithm | |
TF-IDF | |
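
The model checkpoints listed above are Hugging Face Hub IDs. As a minimal sketch (not the notebooks' exact code), one of them could be loaded with the `transformers` library roughly as follows. The helper name `summarize`, the dictionary keys, the beam-search settings, and the 1024-token input limit are assumptions made for illustration; `transformers` and `torch` are assumed to be installed.

```python
# Hub IDs copied from the checkpoint table; the dictionary keys are illustrative labels.
CHECKPOINTS = {
    "t5": "Frederick0291/t5-small-finetuned-billsum",
    "bart-billsum": "murali-admin/bart-billsum-1",
    "bart-xsum": "sshleifer/distilbart-xsum-12-6",
    "pegasus-legal": "nsi319/legal-pegasus",
    "pegasus-billsum": "google/pegasus-billsum",
    "bigbird": "google/bigbird-pegasus-large-bigpatent",
    "led": "allenai/led-large-16384-arxiv",
}


def summarize(text, checkpoint=CHECKPOINTS["pegasus-billsum"],
              max_input_tokens=1024, max_new_tokens=128):
    """Generate an abstractive summary with a seq2seq checkpoint (hypothetical helper)."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

    # Truncate long bills to the assumed input limit (an assumption, not a tuned value).
    inputs = tokenizer(text, return_tensors="pt", truncation=True,
                       max_length=max_input_tokens)
    output_ids = model.generate(**inputs, num_beams=4,
                                max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(bill_text)` downloads the checkpoint on first use. T5-style checkpoints may expect a task prefix such as `"summarize: "` in front of the input, and LED accepts much longer inputs than the 1024 tokens assumed here, so the defaults are worth adjusting per model.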
- BillSum
  - Official GitHub repository and 🤗 dataset loader
  - A processed and cleaned version of the data can be found here
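
For orientation, the `ca_test` split could be fetched through the 🤗 `datasets` library along these lines. This is a sketch under assumptions: the Hub ID `FiscalNote/billsum` and the record fields `text` and `summary` reflect my understanding of the public loader, and both helper names are hypothetical.

```python
def load_billsum_ca_test(dataset_id="FiscalNote/billsum"):
    """Load the California test split of BillSum (needs network access).

    The dataset ID is an assumption; adjust it if the loader lives elsewhere.
    """
    # Imported lazily so the helpers below work without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(dataset_id, split="ca_test")


def to_pairs(records):
    """Turn BillSum-style records into (document, reference summary) pairs."""
    return [(record["text"], record["summary"]) for record in records]
```

`to_pairs(load_billsum_ca_test())` would then give the document and reference pairs that a summarizer can be evaluated on.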
The following results were obtained on the BillSum dataset (`ca_test` split) using pre-trained models and extractive methods.
Algorithm / model | ROUGE-1 | ROUGE-2 | ROUGE-L |
---|---|---|---|
**Extractive** | | | |
KL | 24.44 | 9.74 | 21.98 |
LSA | 30.85 | 12.45 | 27.64 |
SumBasic | 31.01 | 12.61 | 27.83 |
BERT | 33.29 | 15.17 | 29.67 |
TF-IDF | 33.97 | 15.98 | 29.92 |
LexRank | 36.83 | 18.98 | 32.95 |
TextRank | 36.57 | 19.10 | 32.35 |
Luhn's algorithm | 37.48 | 19.93 | 33.35 |
**Abstractive** | | | |
BART | 26.02 | 11.87 | 22.02 |
Pegasus (small) | 28.61 | 12.19 | 25.88 |
T5 (small) | 32.99 | 15.52 | 30.21 |
BillPegasus | 34.25 | 16.63 | 30.22 |
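
To make the metrics in the table concrete, here is a simplified sketch of ROUGE-N and ROUGE-L as F1 scores. It assumes lowercased whitespace tokenization with no stemming, so it is not the scoring code behind the numbers above and will not reproduce them exactly; it returns values in the range 0 to 1, whereas the table is on a 0 to 100 scale.

```python
from collections import Counter


def _f1(overlap, cand_total, ref_total):
    """Combine precision and recall into F1, returning 0.0 when nothing overlaps."""
    if overlap == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def _ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """ROUGE-N F1: clipped n-gram overlap between candidate and reference."""
    cand = _ngrams(candidate.lower().split(), n)
    ref = _ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter intersection clips counts
    return _f1(overlap, sum(cand.values()), sum(ref.values()))


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Row-by-row dynamic programme for the LCS length.
    prev = [0] * (len(ref) + 1)
    for tok in cand:
        curr = [0]
        for j, ref_tok in enumerate(ref, start=1):
            if tok == ref_tok:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return _f1(prev[-1], len(cand), len(ref))
```

For example, `rouge_n("the bill amends the act", "the bill amends the tax act", n=2)` shares three of the candidate's four bigrams and three of the reference's five, giving an F1 of about 0.67.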