gsp-enc/generator_data at adj-mm · lisjin/gsp-enc

Data Preprocessing for AMR-to-Text Generation

Assuming that you're working on AMR 2.0 (LDC2017T10), unzip the corpus to data/AMR/LDC2017T10, and make sure it has the following structure:

data/AMR/LDC2017T10
├── data
│   ├── alignments
│   ├── amrs
│   └── frames
├── docs
│   ├── AMR-alignment-format.txt
│   ├── amr-guidelines-v1.2.pdf
│   ├── file.tbl
│   ├── frameset.dtd
│   ├── PropBank-unification-notes.txt
│   └── README.txt
└── index.html
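
Before running the scripts below, it can help to confirm that the unzipped corpus matches this layout. The following is a minimal sketch, not part of this repository: the helper name `missing_entries` is hypothetical, and the list of expected entries is copied from the tree above.

```python
from pathlib import Path

# Expected entries, copied from the tree above (relative to the corpus root).
EXPECTED = [
    "data/alignments",
    "data/amrs",
    "data/frames",
    "docs/AMR-alignment-format.txt",
    "docs/amr-guidelines-v1.2.pdf",
    "docs/file.tbl",
    "docs/frameset.dtd",
    "docs/PropBank-unification-notes.txt",
    "docs/README.txt",
    "index.html",
]


def missing_entries(root):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries("data/AMR/LDC2017T10")
    if missing:
        print("Missing entries:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Corpus layout looks complete.")
```

Run it from the directory that contains `data/`. An empty result only means the listed paths exist; it does not validate the corpus contents.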

Download artifacts:

./scripts/download_artifacts.sh

Prepare training/dev/test data:

./scripts/prepare_data.sh -v 2 -p data/AMR/LDC2017T10

We use Stanford CoreNLP (version 3.9.2) for tokenization. First, start a CoreNLP server with run_standford_corenlp_server.sh. Then, once the server is up, annotate the AMR sentences:

sh run_standford_corenlp_server.sh
./scripts/annotate_features.sh data/AMR/amr_2.0
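
The annotation step needs the server to be accepting connections, and CoreNLP can take a while to load. Here is a minimal sketch of a readiness check that could be run between the two commands. It assumes the server listens on localhost port 9000, which is the CoreNLP server default; the port that run_standford_corenlp_server.sh actually uses is not stated here, so adjust it as needed. The function name `wait_for_server` is hypothetical.

```python
import socket
import time


def wait_for_server(host="localhost", port=9000, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    The default port 9000 is an assumption (the CoreNLP server default).
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    if wait_for_server(timeout=5.0):
        print("Server is accepting connections.")
    else:
        print("Server did not come up in time.")
```

This only checks that the port is open, not that the CoreNLP models have finished loading, so a short extra wait may still be needed before annotating.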

Run data preprocessing:

./scripts/preprocess_2.0.sh

Acknowledgements: Much of the AMR preprocessing code is from sheng-z/stog.
