This repository has been archived by the owner on Apr 5, 2023. It is now read-only.

deepvk/headline_gen_baselines

Baselines for Headline Generation on "Rossiya Segodnya" dataset

Data preparation for the first-sentence and OpenNMT baselines. Place processed-ria.json in the data folder; this file is the "Rossiya Segodnya" corpus with HTML tags removed. You need to download the corpus yourself. Then run python main.py to prepare the data.
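To make the two preprocessing ideas mentioned above concrete (removing HTML tags and taking the first sentence of an article as the baseline input), here is a minimal sketch. It is not the repository's main.py: the function names `strip_html` and `first_sentence` are hypothetical, and the regex-based tag removal and sentence splitting are naive assumptions that a real pipeline would likely replace with a proper HTML parser and sentence tokenizer.

```python
import html
import re

# Naive patterns (assumptions for illustration, not the repository's actual logic).
TAG_RE = re.compile(r"<[^>]+>")            # anything that looks like an HTML tag
SENT_END_RE = re.compile(r"(?<=[.!?])\s+")  # whitespace following ., ! or ?


def strip_html(raw):
    """Remove HTML tags, decode entities, and collapse whitespace."""
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)
    return " ".join(text.split())


def first_sentence(text):
    """Return the text up to the first sentence-ending punctuation mark."""
    parts = SENT_END_RE.split(text.strip(), maxsplit=1)
    return parts[0]


if __name__ == "__main__":
    article = "<p>Tom &amp; Jerry. <b>More</b> text.</p>"
    clean = strip_html(article)
    print(clean)
    print(first_sentence(clean))
```

The sentence splitter will break on abbreviations and initials, so treat it only as an illustration of what a first-sentence baseline takes as input.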

To test the first-sentence baseline, run:

python get_rouge.py data/test_first_sents.bpe data/test_headlines.bpe
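The command above compares a file of candidate lines against a file of reference headlines, line by line. As a rough illustration of what such a comparison measures, the sketch below computes a simplified ROUGE-1 F1 (unigram overlap) over whitespace-separated tokens. It is an assumption-laden stand-in, not the repository's get_rouge.py: the function names are hypothetical, and the real script may report other ROUGE variants or apply different tokenization.

```python
from collections import Counter


def rouge1_f1(candidate_tokens, reference_tokens):
    """Unigram-overlap F1 between two token lists (simplified ROUGE-1)."""
    if not candidate_tokens or not reference_tokens:
        return 0.0
    # Clipped overlap: each token counts at most as often as it occurs in both.
    overlap = sum((Counter(candidate_tokens) & Counter(reference_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)


def corpus_rouge1_f1(candidate_lines, reference_lines):
    """Average the per-line ROUGE-1 F1 over two parallel lists of lines."""
    if len(candidate_lines) != len(reference_lines):
        raise ValueError("candidate and reference files must have the same number of lines")
    scores = [
        rouge1_f1(cand.split(), ref.split())
        for cand, ref in zip(candidate_lines, reference_lines)
    ]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    candidates = ["a b c", "x"]
    references = ["a b d", "y"]
    print(corpus_rouge1_f1(candidates, references))
```

Because the files passed on the command line are BPE-encoded, a score computed this way would be over subword units rather than whole words.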

To train the OpenNMT baseline, follow the instructions from the official OpenNMT repository:

python preprocess.py -train_src data/train_first_sents.bpe -train_tgt data/train_headlines.bpe -valid_src data/valid_first_sents.bpe -valid_tgt data/valid_headlines.bpe -save_data data/ria
python train.py -data data/ria -save_model ria-model

To test the trained OpenNMT model:

python translate.py -model ria-model_XXX.pt -src data/test_first_sents.bpe -output pred.bpe -replace_unk
python get_rouge.py pred.bpe data/test_headlines.bpe
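The predictions in pred.bpe are written as BPE subword units. If you want to read them as ordinary words, the subword pieces have to be joined back together. The sketch below assumes the "@@ " continuation marker used by the subword-nmt tool; the source does not state which BPE implementation produced the .bpe files, so both the marker and the hypothetical `merge_bpe` helper are assumptions to check against your data.

```python
import re


def merge_bpe(line, separator="@@"):
    """Join subword pieces marked with a trailing separator (assumed '@@')."""
    # Remove the separator together with the space that follows it,
    # or the separator alone when it ends the line.
    pattern = re.escape(separator) + r"( |$)"
    return re.sub(pattern, "", line)


if __name__ == "__main__":
    print(merge_bpe("head@@ line gen@@ er@@ ation"))
```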
