Skip to content

CNN topic classification helps to classify Quora questions based on topics.

License

Notifications You must be signed in to change notification settings

vishaljain3991/cnn_topic_classification

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

cnn_topic_classification

Intro

I wrote this classifier to classify Quora questions based on topics (Techonlogy, Business, Design, Food, Books). Its inspired by this. I tweaked his code a bit to use pretrained wordvec. Also you can find the topicwise questions in data directory. Every run is assosiated with a graph that plots accuracies vs steps taken. To download Quora questions, I used this.

Requirements

  • Python 2
  • Tensorflow > 0.8
  • Numpy

Loading embedding vectors

$ cd data
$ wget http://nlp.stanford.edu/data/glove.6B.zip
$ unzip glove.6B.zip
python embedding_loader.py --help
optional arguments:
  -h, --help           show this help message and exit
  --data_dir DATA_DIR  data directory containing glove vectors

Training

Print parameters:

python train.py --help
optional arguments:
  -h, --help            show this help message and exit
  --embedding_dim EMBEDDING_DIM
                        Dimensionality of character embedding (default: 128)
  --filter_sizes FILTER_SIZES
                        Comma-separated filter sizes (default: '3,4,5')
  --num_filters NUM_FILTERS
                        Number of filters per filter size (default: 128)
  --dropout_keep_prob DROPOUT_KEEP_PROB
                        Dropout keep probability (default: 0.5)
  --l2_reg_lambda L2_REG_LAMBDA
                        L2 regularizaion lambda (default: 0.0)
  --batch_size BATCH_SIZE
                        Batch Size (default: 64)
  --num_epochs NUM_EPOCHS
                        Number of training epochs (default: 200)
  --evaluate_every EVALUATE_EVERY
                        Evaluate model on dev set after this many steps
                        (default: 100)
  --checkpoint_every CHECKPOINT_EVERY
                        Save model after this many steps (default: 100)
  --allow_soft_placement [ALLOW_SOFT_PLACEMENT]
                        Allow device soft device placement
  --noallow_soft_placement
  --log_device_placement [LOG_DEVICE_PLACEMENT]
                        Log placement of ops on devices
  --nolog_device_placement
  --data_dir DATA_DIR   Provide directory location where glove vectors are
                        unzipped


Train:

python train.py

Evaluating

python eval.py --eval_train --checkpoint_dir="./runs/1459637919/checkpoints/"

Replace the checkpoint dir with the output from the training. To use your own data, change the eval.py script to load your data.

References

About

CNN topic classification helps to classify Quora questions based on topics.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages