cnn_topic_classification

Intro

I wrote this classifier to sort Quora questions by topic (Technology, Business, Design, Food, Books). It's inspired by this. I tweaked his code a bit to use pretrained word vectors. You can find the topic-wise questions in the data directory. Every run is associated with a graph that plots accuracy against training steps. To download Quora questions, I used this.

Requirements

Python 2
TensorFlow > 0.8
NumPy

Loading embedding vectors

$ cd data

$ wget http://nlp.stanford.edu/data/glove.6B.zip

$ unzip glove.6B.zip

python embedding_loader.py --help

optional arguments:
  -h, --help           show this help message and exit
  --data_dir DATA_DIR  data directory containing glove vectors
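For orientation, the unzipped GloVe files are plain text, with one word per line followed by its space-separated vector components. The following is a minimal sketch of reading that format. It is not the code in embedding_loader.py, and the function names `load_glove` and `load_glove_file` are hypothetical.

```python
import io


def load_glove(lines):
    """Parse GloVe text lines ("word v1 v2 ... vN") into a dict of float lists.

    Hypothetical helper for illustration; not taken from embedding_loader.py.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def load_glove_file(path):
    """Read one GloVe text file from disk (io.open works on Python 2 and 3)."""
    with io.open(path, encoding="utf-8") as f:
        return load_glove(f)
```

The archive contains one file per vector size (for example glove.6B.100d.txt), so a call might look like `load_glove_file("data/glove.6B.100d.txt")`. Which size this repository expects is not stated here, so treat that file name as an assumption.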

Training

Print parameters:

python train.py --help

optional arguments:
  -h, --help            show this help message and exit
  --embedding_dim EMBEDDING_DIM
                        Dimensionality of character embedding (default: 128)
  --filter_sizes FILTER_SIZES
                        Comma-separated filter sizes (default: '3,4,5')
  --num_filters NUM_FILTERS
                        Number of filters per filter size (default: 128)
  --dropout_keep_prob DROPOUT_KEEP_PROB
                        Dropout keep probability (default: 0.5)
  --l2_reg_lambda L2_REG_LAMBDA
                        L2 regularizaion lambda (default: 0.0)
  --batch_size BATCH_SIZE
                        Batch Size (default: 64)
  --num_epochs NUM_EPOCHS
                        Number of training epochs (default: 200)
  --evaluate_every EVALUATE_EVERY
                        Evaluate model on dev set after this many steps
                        (default: 100)
  --checkpoint_every CHECKPOINT_EVERY
                        Save model after this many steps (default: 100)
  --allow_soft_placement [ALLOW_SOFT_PLACEMENT]
                        Allow device soft device placement
  --noallow_soft_placement
  --log_device_placement [LOG_DEVICE_PLACEMENT]
                        Log placement of ops on devices
  --nolog_device_placement
  --data_dir DATA_DIR   Provide directory location where glove vectors are
                        unzipped
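To make the flags above a little more concrete, here is a rough sketch of how a few of them relate to each other. It assumes the common design in which every filter size gets `num_filters` filters and the pooled outputs are concatenated, and it assumes a partial final batch still counts as a step. Neither assumption is confirmed by train.py, and the helper names and the 1000-example dataset size are made up for illustration.

```python
def parse_filter_sizes(flag_value):
    """Turn the comma-separated --filter_sizes string into a list of ints."""
    return [int(s) for s in flag_value.split(",") if s.strip()]


def pooled_feature_count(filter_sizes, num_filters):
    """Features after concatenating the pooled output of every filter size
    (assumed architecture, see the note above)."""
    return num_filters * len(filter_sizes)


def total_training_steps(num_examples, batch_size, num_epochs):
    """Steps taken if each epoch covers the data once, rounding the last
    partial batch up to a full step (an assumption)."""
    steps_per_epoch = (num_examples + batch_size - 1) // batch_size
    return steps_per_epoch * num_epochs


sizes = parse_filter_sizes("3,4,5")        # default --filter_sizes
features = pooled_feature_count(sizes, 128)  # default --num_filters
steps = total_training_steps(1000, 64, 200)  # 1000 examples is hypothetical
```

Under these assumptions, the default `--evaluate_every 100` and `--checkpoint_every 100` would trigger an evaluation and a checkpoint once every 100 of those steps.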

Train:

python train.py

Evaluating

python eval.py --eval_train --checkpoint_dir="./runs/1459637919/checkpoints/"

Replace the checkpoint dir with the output from the training. To use your own data, change the eval.py script to load your data.
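If you do adapt the evaluation script to your own data, one possible starting point is sketched below. It assumes a hypothetical layout with one text file per topic and one question per line. The actual file layout in the data directory and the loading functions in this repository may differ, and `load_labeled_questions` is an illustrative name, not an existing function.

```python
import io


def load_labeled_questions(paths):
    """Read one file per topic and return (questions, one-hot labels).

    paths: list of text files, one per topic, one question per line
    (hypothetical layout). The position of a file in the list is its
    class index.
    """
    questions, labels = [], []
    num_topics = len(paths)
    for index, path in enumerate(paths):
        with io.open(path, encoding="utf-8") as f:
            for line in f:
                text = line.strip()
                if not text:
                    continue  # ignore empty lines
                one_hot = [0] * num_topics
                one_hot[index] = 1
                questions.append(text)
                labels.append(one_hot)
    return questions, labels
```

The order of `paths` fixes the label order, so it needs to match the order used when the model was trained.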
