This is an unofficial PyTorch implementation of *Semantics-Assisted Video Captioning Model Trained with Scheduled Sampling Strategy*. You can find the official TensorFlow implementation here.
- Python 3.6
- PyTorch
- pycocoevalcap
```
pip install -r requirements.txt
```
Please download the data from the official repo and put it all in `./data`.
If you want to train on MSVD with default parameters:

```
python train.py --cfg configs/msvd_default.yml --savedir saved_results --exp_name anonymous_run
```

Remember to create `savedir` with `mkdir` first. `--exp_name` is the name you give to this run.

For training on MSR-VTT, just change `--cfg` to `configs/msrvtt_default.yml`. Training takes about 90 minutes on MSVD and about 5 hours on MSR-VTT (on a GTX 1080 Ti).
For more details about the configs, please see `opts.py` and the YAML files in `./configs`.
You can monitor the training process with TensorBoard:

```
tensorboard --logdir saved_results --port my_port --host 0.0.0.0
```
To evaluate a trained model:

```
python eval.py --savedir saved_results --exp_name anonymous_run --max_sent_len 20 --model_path path_of_model_to_eval
```

If you don't specify `--model_path`, the best model will be evaluated.
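As an illustration of what `--max_sent_len` controls: a decoded caption is typically cut at the end-of-sentence token or at the length cap, whichever comes first. This is a hypothetical sketch; the helper name, `eos_id`, and `id2word` are my own, not code from this repo:

```python
def finalize_caption(token_ids, id2word, eos_id, max_sent_len=20):
    """Cut a decoded id sequence at EOS or at max_sent_len tokens,
    then map the remaining ids to words. Hypothetical helper, not
    the repo's actual code."""
    words = []
    for tid in token_ids[:max_sent_len]:
        if tid == eos_id:  # stop at the end-of-sentence marker
            break
        words.append(id2word[tid])
    return " ".join(words)
```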
The results below are not cherry-picked; I ran training once for each dataset. My implementation is comparable to the officially reported numbers.
MSVD:

| Model | B-4 | R | M | C |
|---|---|---|---|---|
| official | 61.8 | 76.8 | 37.8 | 103.0 |
| mine | 61.2 | 76.6 | 38.5 | 106.5 |
MSR-VTT:

| Model | B-4 | R | M | C |
|---|---|---|---|---|
| official | 43.8 | 62.4 | 28.9 | 51.4 |
| mine | 44.4 | 62.7 | 28.8 | 50.7 |
TensorFlow and PyTorch implement Adam slightly differently, so I also offer a TensorFlow-style Adam in `optim.py`. However, I found the two perform comparably, so PyTorch's Adam is used by default. See the references for more details.
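The difference is where the epsilon term enters the update: PyTorch adds `eps` to the bias-corrected `sqrt(v_hat)`, while (pre-2.0) TensorFlow folds the bias correction into the learning rate and adds `eps` to the uncorrected `sqrt(v)`. A minimal single-parameter sketch of both rules, not the actual code in `optim.py`:

```python
import math

def adam_step_pytorch(theta, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # PyTorch-style: eps is added to the bias-corrected sqrt(v_hat).
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

def adam_step_tensorflow(theta, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # TF1-style: bias correction is folded into the step size lr_t, and
    # eps is added to the *uncorrected* sqrt(v), so its effective
    # magnitude changes over the course of training.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    lr_t = lr * math.sqrt(1 - b2 ** t) / (1 - b1 ** t)
    theta -= lr_t * m / (math.sqrt(v) + eps)
    return theta, m, v
```

The two rules coincide as `eps` goes to zero; with the default `eps` the updates differ only slightly, which matches the observation that they perform comparably in practice.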
The official implementation chooses the best model by a weighted sum of all scores; I simply choose the model with the best CIDEr on the validation set.
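Picking the checkpoint this way is just an argmax over validation CIDEr. A sketch under an assumed data layout (the pair structure and key names are mine, not the repo's):

```python
def best_checkpoint(val_results):
    """val_results: list of (checkpoint_path, metrics_dict) pairs,
    where each metrics_dict holds a 'CIDEr' entry from validation.
    Returns the path with the highest validation CIDEr."""
    path, _ = max(val_results, key=lambda item: item[1]["CIDEr"])
    return path
```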
The official implementation applies dropout after scheduled sampling; I apply it before.
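For context, scheduled sampling decides at each decoding step whether to feed back the ground-truth token or the model's own previous prediction. A minimal sketch of that per-step decision (the names are mine; a Bernoulli draw per step is assumed):

```python
import random

def choose_next_input(gt_token, pred_token, sample_prob, rng=random):
    """Scheduled sampling: with probability sample_prob feed the model's
    own previous prediction back in; otherwise feed the ground-truth
    token (teacher forcing). sample_prob is annealed upward during
    training so the model gradually relies on its own outputs."""
    return pred_token if rng.random() < sample_prob else gt_token
```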
- beam search
- reinforcement learning
Thanks to the authors of the original TensorFlow implementation.