Instructions for object-centric caption generation.
Our model is implemented in Torch and depends on the following packages: torch, nn, image, lua-cjson, stnbhwd, and torch-rnn.
After installing Torch, you can install or update these dependencies by running the following:
luarocks install torch
luarocks install nn
luarocks install image
luarocks install lua-cjson
luarocks install https://raw.githubusercontent.com/qassemoquab/stnbhwd/master/stnbhwd-scm-1.rockspec
luarocks install https://raw.githubusercontent.com/jcjohnson/torch-rnn/master/torch-rnn-scm-1.rockspec
To run the model on new images, use the script run_model.lua. To run the model on a single test image, use the following command:
th run_model.lua -input_image /path/to/my/image/file -output_vis_dir /path/to/the/output/folder
If you have an entire directory of images on which you want to run the model, use the -input_dir flag instead:
th run_model.lua -input_dir /path/to/my/image/folder -output_vis_dir /path/to/the/output/folder
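If you prefer to drive the script from Python, the two invocations above can be assembled by a small helper. This is only a sketch: `build_command` and `run_model` are hypothetical names that are not part of this repository, and the code assumes that `th` is on your PATH and that it is run from the directory containing run_model.lua. It uses only the flags shown above.

```python
import os
import subprocess


def build_command(input_path, output_vis_dir, script="run_model.lua"):
    """Build the `th` command line for one image or a directory of images."""
    # A directory is passed with -input_dir; anything else is treated
    # as a single image and passed with -input_image.
    flag = "-input_dir" if os.path.isdir(input_path) else "-input_image"
    return ["th", script, flag, input_path, "-output_vis_dir", output_vis_dir]


def run_model(input_path, output_vis_dir):
    """Run the model (assumption: `th` is on PATH, cwd holds run_model.lua)."""
    subprocess.run(build_command(input_path, output_vis_dir), check=True)
```

Calling `run_model("/path/to/my/image/folder", "/path/to/the/output/folder")` would then be equivalent to the second command above.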
The resulting output file format is as follows:
[
  {
    "boxes": [
      [9.4456, 46.8276, 569.0354, 368.3203],
      [183.6740, 77.7138, 185.4196, 332.1285],
      [403.1037, 77.593994, 323.3377, 334.4553],
      ...
    ],
    "captions": [
      "the man wearing black shirt",
      "the man has head",
      "the man wearing a white shirt",
      ...
    ]
  },
  ...
]
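To consume the output, each box can be paired with its caption. The following is a minimal sketch, not part of this repository. It assumes the output file is valid JSON in the layout above and that `boxes` and `captions` are parallel lists, so the i-th caption describes the i-th box. The names `load_results` and `pair_captions` are hypothetical, and the meaning of the four box coordinates is not interpreted here because the format description does not state it.

```python
import json


def load_results(path):
    """Load the output file (assumption: valid JSON in the layout above)."""
    with open(path) as f:
        return json.load(f)


def pair_captions(records):
    """Return, for each image record, a list of (caption, box) tuples."""
    paired = []
    for record in records:
        boxes = record["boxes"]
        captions = record["captions"]
        # Assumption: boxes and captions are listed in parallel,
        # so they can be matched by index.
        paired.append(list(zip(captions, boxes)))
    return paired


# Example using values copied from the sample output above.
sample = [{
    "boxes": [[9.4456, 46.8276, 569.0354, 368.3203],
              [183.6740, 77.7138, 185.4196, 332.1285]],
    "captions": ["the man wearing black shirt", "the man has head"],
}]
for caption, box in pair_captions(sample)[0]:
    print(caption, box)
```

For a real run, replace `sample` with `load_results(path)`, where `path` points to the output file written by the script.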
This work was supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (2017-0-01780, The technology development for event recognition/relational reasoning and learning knowledge based system for video understanding).