This is the official GitHub page for the paper:
Eric Müller-Budack, Matthias Springstein, Sherzod Hakimov, Kevin Mrutzek, and Ralph Ewerth: "Ontology-driven Event Type Classification in Images". In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 2928-2938, IEEE, Virtual Conference, 2021.
The paper is available on:
- Computer Vision Foundation (CVF): https://openaccess.thecvf.com/content/WACV2021/html/Muller-Budack_Ontology-Driven_Event_Type_Classification_in_Images_WACV_2021_paper.html
- arXiv: https://arxiv.org/pdf/2011.04714.pdf
Further information can be found on the EventKG website: http://eventkg.l3s.uni-hannover.de/VisE
We provide three different ways to set up the project. The results reported in the paper can be reproduced with the Singularity setup: the Singularity image is built on Arch Linux with an optimized PyTorch implementation, which we used for training and testing.
While the other two setups, using a virtual environment or Docker, produce the same results on our test sets, these differ slightly (deviation around 0.1%) from the results reported in the paper.
To install Singularity, please follow the instructions at: https://sylabs.io/guides/3.6/admin-guide/installation.html
Download our Singularity image from: link (file size: 5 GB)
To run code using Singularity, please run:
singularity exec \
-B </PATH/TO/REPOSITORY>:/src \
--nv </PATH/TO/SINGULARITY/IMAGE>.sif \
bash
cd /src
Please run the following command to set up the project in your (virtual) environment:
pip install -r requirements.txt
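If you prefer an isolated environment, one possible way to create and activate a virtual environment before installing the requirements (a minimal sketch, assuming Python 3 and a Unix-like shell; the folder name .venv is just an example) is:
# create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate
# install the project dependencies
pip install -r requirements.txt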
NOTE: This setup produces slightly different results (deviation around 0.1%) during testing. To fully reproduce our results, we provide a Singularity image, which is a copy of our training and testing environment and uses a highly optimized PyTorch implementation.
We provide a Docker setup to execute our code. You can build the Docker image with:
docker build <PATH/TO/REPOSITORY> -t <DOCKER_NAME>
To run the container please use:
docker run \
--volume <PATH/TO/REPOSITORY>:/src \
--shm-size=256m \
-u $(id -u):$(id -g) \
-it <DOCKER_NAME> bash
cd /src
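If you need GPU support inside the container, and assuming the NVIDIA Container Toolkit is installed on the host (this is not part of the original instructions), you can typically add the --gpus flag:
docker run \
--gpus all \
--volume <PATH/TO/REPOSITORY>:/src \
--shm-size=256m \
-u $(id -u):$(id -g) \
-it <DOCKER_NAME> bash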
NOTE: This setup produces slightly different results (deviation around 0.1%) during testing. To fully reproduce our results, we provide a Singularity image, which is a copy of our training and testing environment and uses a highly optimized PyTorch implementation.
You can automatically download the files (ontologies, models, etc.) that are required for inference and testing with the following command:
python download_resources.py
The files will be stored in a folder called `resources/` relative to the repository path.
We provide the trained models for the following approaches:
- Classification baseline (denoted as `C`): link
- Best ontology-driven approach using the cross-entropy loss (denoted as `CO_cel`): link
- Best ontology-driven approach using the cosine similarity loss (denoted as `CO_cos`): link
The performance of these models in terms of top-k accuracy, Jaccard similarity coefficient (JSC), and cosine similarity (CS) on the VisE-Bing and VisE-Wiki test sets, obtained with the provided Singularity image, is listed below (all values in %):
VisE-Bing
Model | Top-1 | Top-3 | Top-5 | JSC | CS |
---|---|---|---|---|---|
C | 77.4 | 89.8 | 93.6 | 84.7 | 87.7 |
CO_cel | 81.5 | 91.8 | 94.3 | 87.5 | 90.0 |
CO_cos | 81.9 | 90.8 | 93.2 | 87.9 | 90.4 |
VisE-Wiki
Model | Top-1 | Top-3 | Top-5 | JSC | CS |
---|---|---|---|---|---|
C | 61.7 | 74.6 | 79.2 | 72.7 | 77.8 |
CO_cel | 63.4 | 74.7 | 78.8 | 73.9 | 78.7 |
CO_cos | 63.5 | 74.3 | 78.8 | 74.1 | 79.0 |
To apply our models to an image or a list of images, please execute the following command:
python infer.py -c </path/to/model.yml> -i </path/to/image(s)>
If you followed the instructions in Download Ontology, Dataset and Models, the model config is placed in `resources/VisE-D/models/<modelname>.yml` relative to the repository path.
Optional parameters: By default, the product of the leaf node probability and the subgraph cosine similarity is used to convert the subgraph vector to a leaf node vector (details are presented in Section 4.2.3 of the paper). The following options are available:
- `--batch_size <int>` specifies the batch size (default: `16`)
- `--num_predictions <int>` sets the number of top predictions printed to the console (default: `3`)
- `--s2l_strategy [leafprob, cossim, leafprob*cossim]` specifies the strategy to retrieve the leaf node vector from a subgraph vector (default: `leafprob*cossim`)
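For example, a full invocation that combines these options could look as follows (paths are placeholders, the values simply repeat the defaults listed above, and `leafprob*cossim` is quoted to prevent shell globbing):
python infer.py \
-c resources/VisE-D/models/<modelname>.yml \
-i </path/to/image(s)> \
--batch_size 16 \
--num_predictions 3 \
--s2l_strategy "leafprob*cossim"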
This step requires downloading the test images of the VisE-Bing or VisE-Wiki dataset. You can run the following command to download the images automatically:
python download_images.py -d </path/to/dataset.jsonl> -o </path/to/output/root_directory/>
If you followed the instructions in Download Ontology, Dataset and Models, the dataset is placed in `resources/VisE-D/<datasetname>.jsonl` and the model config in `resources/VisE-D/models/<modelname>.yml` relative to the repository path.
Optional parameters:
- `-t <int>` sets the number of parallel download threads (default: `32`)
- `-r <int>` sets the number of retries for downloading an image (default: `5`)
- `--max_img_dim <int>` sets the maximum size of the longer image dimension (default: `512`)
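For example, a full invocation with the optional parameters set to their defaults could look as follows (paths are placeholders):
python download_images.py \
-d resources/VisE-D/<datasetname>.jsonl \
-o </path/to/output/root_directory/> \
-t 32 \
-r 5 \
--max_img_dim 512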
NOTE: This step also allows you to download the training and validation images in case you want to train your own models.
After downloading the test images, you can compute the results using the following command:
python test.py \
-c </path/to/model.yml> \
-i </path/to/image/root_directory> \
-t </path/to/testset.jsonl> \
-o </path/to/output.json>
Optional parameters: By default, the product of the leaf node probability and the subgraph cosine similarity is used to convert the subgraph vector to a leaf node vector (details are presented in Section 4.2.3 of the paper). The following options are available:
- `--batch_size <int>` specifies the batch size (default: `16`)
- `--s2l_strategy [leafprob, cossim, leafprob*cossim]` specifies the strategy to retrieve the leaf node vector from a subgraph vector (default: `leafprob*cossim`)
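For example, a full invocation could look as follows (paths are placeholders, the values repeat the defaults listed above, and `leafprob*cossim` is quoted to prevent shell globbing):
python test.py \
-c resources/VisE-D/models/<modelname>.yml \
-i </path/to/image/root_directory> \
-t resources/VisE-D/<datasetname>.jsonl \
-o </path/to/output.json> \
--batch_size 16 \
--s2l_strategy "leafprob*cossim"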
The Visual Event Classification Dataset (VisE-D) is available on: https://data.uni-hannover.de/de/dataset/vise
You can automatically download the dataset by following the instructions in Download Ontology, Dataset and Models. To download the images from the provided URLs, please run the following command:
python download_images.py -d </path/to/dataset.jsonl> -o </path/to/output/root_directory/>
Optional parameters:
- `-t <int>` sets the number of parallel download threads (default: `32`)
- `-r <int>` sets the number of retries for downloading an image (default: `5`)
- `--max_img_dim <int>` sets the maximum size of the longer image dimension (default: `512`)
In Section 3.2 of the paper, we have presented several methods to create an Ontology for newsworthy event types. Statistics are presented in Table 1 of the paper.
Different versions of the Visual Event Ontology (VisE-O) can be downloaded here: link
Furthermore, you can explore the Ontologies using the following links:
- Initial Ontology (result of Section 3.2.2): explore
- Disambiguated Ontology (result of Section 3.2.3): explore
- Refined Ontology (result of Section 3.2.4): explore
USAGE: After opening an Ontology, the Leaf Event Nodes (blue), Branch Event Nodes (orange), and Root Node (yellow) as well as their Relations are displayed. By clicking on a specific Event Node, additional information such as the Wikidata ID and the related child (Incoming) and parent (Outgoing) nodes is shown. In addition, the search bar can be used to directly access a specific Event Node.
In order to evaluate the presented ontology-driven approach on other benchmark datasets, we have manually linked the classes of the Web Images for Event Recognition (WIDER), Social Event Dataset (SocEID), and Rare Event Dataset (RED) to the Wikidata knowledge base as described in Section 5.3.3. The resulting Ontologies for these datasets can be downloaded and explored here:
- WIDER Ontology: download | explore
- SocEID Ontology: download | explore
- RED Ontology: download | explore
Detailed information on the sampling strategy to gather event images, statistics for the training and testing datasets presented in Section 3.3, and results using different inference strategies (Section 4.2.3) are available in the vise_supplemental.pdf.
This work is published under the GNU GENERAL PUBLIC LICENSE Version 3, 29 June 2007. For details please check the LICENSE file in the repository.