
Neural network dependency parser based on Chen and Manning '14 implemented in Torch and Scala.


strubell/nn-depparse


nn-depparse

A Torch/Scala reimplementation of the neural network dependency parser described in Chen and Manning '14.

Requirements

Setup

  1. Set up environment variables. For example, from the root directory of this project on blake:
export NNDEPPARSE_ROOT=`pwd`
export DATA_DIR=/iesl/canvas/strubell/data/
  2. Put a word embeddings file in $NNDEPPARSE_ROOT/data/embeddings. The file must contain one embedding per line: the first field is the token and the remaining fields are the embedding's values, with fields separated by a single space. The Collobert et al. embeddings are one option.
  3. Compile: sbt compile
  4. Perform all data preprocessing for a given configuration (this also compiles the project). For example:
./bin/all-data-processing.sh config/chen-ptb.conf
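The embeddings file format described above is easy to sanity-check before running preprocessing. As a rough sketch (the helper `parse_embedding_line` is hypothetical and not part of this repository), the C function below reads one line into its token and vector. It assumes the caller knows an upper bound on the embedding dimension, and it rejects lines with no token, non-numeric fields, or more values than the buffer holds.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of nn-depparse): parses one embeddings line,
 * "<token> <v1> <v2> ... <vd>", with fields separated by single spaces.
 * Copies the token into `token` (truncated to token_cap - 1 chars) and up to
 * max_dim values into `values`. Returns the number of values d, or -1 if the
 * line has no token, a field is not a number, or d would exceed max_dim. */
int parse_embedding_line(const char *line, char *token, size_t token_cap,
                         float *values, int max_dim) {
    assert(line != NULL && token != NULL && token_cap > 0 && values != NULL);

    const char *sp = strchr(line, ' ');
    if (sp == NULL || sp == line) return -1;      /* need "<token> " first */
    size_t len = (size_t)(sp - line);
    if (len >= token_cap) len = token_cap - 1;    /* truncate long tokens */
    memcpy(token, line, len);
    token[len] = '\0';

    int dim = 0;
    const char *p = sp + 1;
    while (*p != '\0' && *p != '\n' && *p != '\r') {
        char *end;
        float v = strtof(p, &end);                /* parse the next field */
        if (end == p) return -1;                  /* not a number */
        if (dim == max_dim) return -1;            /* more values than fit */
        values[dim++] = v;
        p = end;
        if (*p == ' ') p++;                       /* skip the separator */
    }
    return dim;
}
```

For example, `parse_embedding_line("the 0.5 -1.25 2", tok, sizeof tok, v, 4)` stores `"the"` in `tok` and returns 3. Running it over every line and checking that the return value never changes catches a malformed or mixed-dimension file early.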

Running

  1. Train the parser:
./bin/train-parser.sh config/chen-ptb.conf
  2. Evaluate the parser (accuracy and speed):
./bin/parse-fast.sh config/chen-ptb.conf
  3. Tune hyperparameters (this assumes a GPU machine and uses all of its GPUs):
./bin/tune-hyperparams.sh config/chen-ptb.conf
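This README does not say which accuracy metric the evaluation script reports. Dependency parsers, including Chen and Manning '14, are usually scored by unlabeled and labeled attachment score (UAS and LAS). As a reference for interpreting those numbers, here is a small C sketch of both metrics. The function name `attachment_scores` and the optional `skip` mask are assumptions, not this project's evaluation code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Unlabeled / labeled attachment score over n tokens (hypothetical helper).
 * gold_head[i], pred_head[i]: head index of token i (any consistent
 * convention, e.g. 0 = ROOT). gold_label[i], pred_label[i]: arc labels.
 * skip (may be NULL): tokens with skip[i] != 0 are not scored. */
typedef struct {
    double uas;  /* correct heads / scored tokens */
    double las;  /* correct heads with correct labels / scored tokens */
} AttachmentScores;

AttachmentScores attachment_scores(size_t n,
                                   const int *gold_head, const char *const *gold_label,
                                   const int *pred_head, const char *const *pred_label,
                                   const int *skip) {
    assert(gold_head != NULL && pred_head != NULL);
    assert(gold_label != NULL && pred_label != NULL);
    size_t scored = 0, head_ok = 0, both_ok = 0;
    for (size_t i = 0; i < n; i++) {
        if (skip != NULL && skip[i]) continue;   /* e.g. punctuation */
        scored++;
        if (gold_head[i] == pred_head[i]) {
            head_ok++;
            if (strcmp(gold_label[i], pred_label[i]) == 0) both_ok++;
        }
    }
    AttachmentScores s = {0.0, 0.0};             /* stays 0 if nothing was scored */
    if (scored > 0) {
        s.uas = (double)head_ok / (double)scored;
        s.las = (double)both_ok / (double)scored;
    }
    return s;
}
```

PTB evaluations commonly exclude punctuation; passing a punctuation mask as `skip` reproduces that convention, while `NULL` scores every token. Whether the evaluation script here excludes punctuation is not stated.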

[optional detail] Generating training data

  1. Generate parse decisions + features for training from PTB: ./bin/get-parse-decisions-ptb.sh
  2. Generate intmaps from parse decisions + features: ./bin/convert-ptb-feats-to-ints.sh
  3. Generate Torch tensors from intmaps: ./bin/convert-ptb-feats-to-torch.sh
  4. If word intmaps changed, generate Torch embedding tensors: ./bin/convert-collobert-embeddings-to-torch.sh
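The first training step turns gold PTB trees into parse decisions. Chen and Manning '14 use the arc-standard transition system, so these decisions are presumably arc-standard SHIFT / LEFT-ARC / RIGHT-ARC sequences produced by a static oracle. The C sketch below shows such an oracle under that assumption. It is not the code behind `get-parse-decisions-ptb.sh`, it ignores arc labels and feature extraction, and `arc_standard_oracle` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { SHIFT, LEFT_ARC, RIGHT_ARC } Transition;

/* Static oracle for the unlabeled arc-standard system (hypothetical helper).
 * Tokens are 1..n and 0 is the artificial ROOT; head[i] is the gold head of
 * token i (head[0] is ignored). `out` must have room for 2n transitions.
 * Returns the number of transitions written, or -1 if the gold tree cannot
 * be derived (e.g. it is non-projective) or memory allocation fails. */
int arc_standard_oracle(int n, const int *head, Transition *out) {
    int *stack = malloc((size_t)(n + 1) * sizeof *stack);
    int *pending = calloc((size_t)(n + 1), sizeof *pending); /* dependents not yet attached */
    if (stack == NULL || pending == NULL) { free(stack); free(pending); return -1; }
    for (int i = 1; i <= n; i++) {
        assert(head[i] >= 0 && head[i] <= n && head[i] != i);
        pending[head[i]]++;
    }

    int top = 0, next = 1, count = 0;   /* stack[0..top]; buffer is next..n */
    stack[0] = 0;                        /* ROOT starts on the stack */
    while (next <= n || top > 0) {
        if (top >= 1) {
            int s1 = stack[top], s2 = stack[top - 1];
            if (s2 != 0 && head[s2] == s1) {            /* LEFT-ARC: s1 heads s2 */
                out[count++] = LEFT_ARC;
                pending[s1]--;
                stack[--top] = s1;                      /* pop s2, keep s1 on top */
                continue;
            }
            if (head[s1] == s2 && pending[s1] == 0) {   /* RIGHT-ARC: s2 heads s1 */
                out[count++] = RIGHT_ARC;
                pending[s2]--;
                top--;                                  /* pop s1 */
                continue;
            }
        }
        if (next > n) { count = -1; break; }            /* stuck: no valid move */
        out[count++] = SHIFT;                           /* move next buffer word to stack */
        stack[++top] = next++;
    }
    free(stack);
    free(pending);
    return count;
}
```

For *He ate fish*, where the head of *He* and *fish* is *ate* and the head of *ate* is ROOT (`head = {0, 2, 0, 2}`), the oracle returns 6 and emits SHIFT, SHIFT, LEFT-ARC, SHIFT, RIGHT-ARC, RIGHT-ARC. The `pending[s1] == 0` check before RIGHT-ARC is what stops the oracle from popping a word before its right dependents are attached.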

[optional detail] Generating test data

  1. Generate dev/test intmaps for each sentence in PTB: ./bin/convert-ptb-sents-to-ints.sh
  2. Generate Torch tensors from sentence intmaps: ./bin/convert-ptb-sents-to-torch.sh
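The README does not define an intmap; presumably it maps each word, POS tag, or arc label string to an integer id so that features can be stored as Torch tensors. A minimal C sketch of that idea follows, assuming ids are assigned in first-seen order from the training data and that dev/test strings never seen in training fall back to a reserved unknown id (0 here). `IntMap`, `intmap_add`, `intmap_get`, `intmap_free`, and the fixed capacity are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define INTMAP_CAP 1024   /* fixed capacity; must exceed the vocabulary size */

/* Hypothetical string -> int map ("intmap"): open addressing with linear
 * probing. Ids start at 1 in first-seen order; 0 means "unknown". */
typedef struct {
    char *keys[INTMAP_CAP];   /* NULL marks an empty slot */
    int ids[INTMAP_CAP];
    int size;
} IntMap;

static uint32_t hash_str(const char *s) {        /* 32-bit FNV-1a */
    uint32_t h = 2166136261u;
    for (; *s != '\0'; s++) { h ^= (unsigned char)*s; h *= 16777619u; }
    return h;
}

/* Slot holding `key`, or the empty slot where it would be inserted. */
static size_t find_slot(const IntMap *m, const char *key) {
    size_t i = hash_str(key) % INTMAP_CAP;
    while (m->keys[i] != NULL && strcmp(m->keys[i], key) != 0)
        i = (i + 1) % INTMAP_CAP;
    return i;
}

/* Training data: return the id of `key`, assigning the next id if it is new.
 * Returns -1 if the map is full or memory allocation fails. */
int intmap_add(IntMap *m, const char *key) {
    assert(m != NULL && key != NULL);
    size_t i = find_slot(m, key);
    if (m->keys[i] == NULL) {
        if (m->size >= INTMAP_CAP - 1) return -1;  /* keep one slot empty so probing ends */
        size_t len = strlen(key) + 1;
        char *copy = malloc(len);
        if (copy == NULL) return -1;
        memcpy(copy, key, len);
        m->keys[i] = copy;
        m->ids[i] = ++m->size;
    }
    return m->ids[i];
}

/* Dev/test data: look up without adding; unseen strings map to 0. */
int intmap_get(const IntMap *m, const char *key) {
    assert(m != NULL && key != NULL);
    size_t i = find_slot(m, key);
    return m->keys[i] == NULL ? 0 : m->ids[i];
}

void intmap_free(IntMap *m) {
    for (size_t i = 0; i < INTMAP_CAP; i++) { free(m->keys[i]); m->keys[i] = NULL; }
    m->size = 0;
}
```

Keeping lookups read-only at dev/test time means word ids never shift, which matters if, as the training steps suggest, word ids index rows of the Torch embedding tensors.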

[optional] LuaJIT hack to allow creating pre-sized tables

In torch-distro/exe/luajit-rocks/luajit-2.1/src/luajit.c, add the function:

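/* Lua-callable sized_table(narr, nrec): returns a new, empty table with
   space preallocated for narr array elements and nrec hash entries.
   Missing or non-numeric arguments are read as 0. */
static int new_sized_table( lua_State *L )
{
    int asize = (int)lua_tointeger( L, 1 );  /* array-part size */
    int hsize = (int)lua_tointeger( L, 2 );  /* hash-part size */
    lua_createtable( L, asize, hsize );      /* pushes the new table */
    return( 1 );                             /* one return value: the table */
}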

In main(), after L is initialized, add the lines:

lua_pushcfunction( L, new_sized_table );
lua_setglobal( L, "sized_table" );

Reinstall Torch.
