
[ECCV 2024 oral] C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition


C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition

Rongchang Li, Zhenhua Feng, Tianyang Xu, Linze Li, Xiaojun Wu†, Muhammad Awais, Sara Atito, Josef Kittler
ECCV, 2024


(Teaser figure) Zero-Shot Compositional Action Recognition (ZS-CAR): from seen compositions such as "Open a door" and "Close a book" to the unseen composition "Close a door".

🛠️ Prepare Something-composition (Sth-com)

(Figure) Some samples in Something-composition.

  1. Download Something-Something V2 (Sth-V2). Our proposed Something-composition (Sth-com) is built on Sth-V2. Please refer to the official website to download the videos into video_path.
  2. Extract frames. To speed up data loading during training, we extract the frames of each video and save them to frame_path (see the sketches after this list). The command is:
    python tools/extract_frames.py --video_root video_path --frame_root frame_path
  3. Download the dataset annotations. We provide the Sth-com annotation files in the data_split dir; the format is as follows (a loading sketch is given after this list):
      [
          {
            "id": "54463", # the sample id
            "action": "opening a book", # the composition
            "verb": "Opening [something]", # the verb component
            "object": "book" # the object component
          },
          {
            ...
          },
          {
            ...
          },
      ]
    Please download these files to annotation_path.
  4. After these steps, the dataset is ready. The directory structure should look like this:
    • annotation_path
      • data_split
        • generalized
          • train_pairs.json
          • val_pairs.json
          • test_pairs.json
    • frame_path
      • 0
        • 000001.jpg
        • 000002.jpg
        • ......
      • 1
        • 000001.jpg
        • 000002.jpg
        • ......
      • ......
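
For reference, the sketch below illustrates the kind of per-video frame extraction that step 2 performs. It is a minimal sketch assuming OpenCV (cv2) is installed; the actual tools/extract_frames.py may use a different backend or options, and the example call at the end uses a hypothetical video id.

    # Minimal sketch of per-video frame extraction (assumes OpenCV; the real
    # tools/extract_frames.py may differ in backend and options).
    import os
    import cv2

    def extract_frames(video_file, out_dir):
        os.makedirs(out_dir, exist_ok=True)
        cap = cv2.VideoCapture(video_file)
        index = 1
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # Frame names follow the 000001.jpg pattern shown in the layout above.
            cv2.imwrite(os.path.join(out_dir, f"{index:06d}.jpg"), frame)
            index += 1
        cap.release()

    # Hypothetical example: extract_frames('video_path/54463.webm', 'frame_path/54463')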
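
The annotation files from step 3 are plain JSON lists, so a few lines of Python can verify that a split file and the extracted frames line up. This is only a sketch using the placeholder paths above (annotation_path, frame_path) and the train_pairs.json split named in the layout.

    # Sketch: load a Sth-com split and resolve each sample's frame directory.
    # annotation_path / frame_path are the placeholder paths used above.
    import json
    import os

    annotation_path = 'annotation_path'
    frame_path = 'frame_path'

    split_file = os.path.join(annotation_path, 'data_split', 'generalized', 'train_pairs.json')
    with open(split_file) as f:
        samples = json.load(f)

    for sample in samples[:3]:
        frames_dir = os.path.join(frame_path, sample['id'])
        print(sample['action'], '=', sample['verb'], '+', sample['object'], '->', frames_dir)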

🚀 Train and test

🔔 From here on, use the codes directory as the project root.

Before running

  1. Prepare the word embedding models. We recommend following Compcos to download them.

  2. Modify the following paths (for example, when running C2C_vanilla with TSM-18 as the backbone):

    1. dataset_path in ./config/c2c_vanilla_tsm.yml
    2. save_path in ./config/c2c_vanilla_tsm.yml
    3. The line t = fasttext.load_model('YOUR_PATH/cc.en.300.bin') in models/vm_models/word_embedding.py (a quick loading check is given after this list)
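
As a quick sanity check that the word embedding model is in place, the cc.en.300.bin file referenced in item 3 can be loaded directly with the fasttext package. This is a standalone check only, not part of the training pipeline.

    # Quick check that the fastText vectors referenced above load correctly.
    import fasttext

    ft = fasttext.load_model('YOUR_PATH/cc.en.300.bin')  # same path as in word_embedding.py
    vec = ft.get_word_vector('book')  # 300-dimensional vector for an object word
    print(vec.shape)  # expected: (300,)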

Train

  1. Train a model with the command:
CUDA_VISIBLE_DEVICES=YOUR_GPU_INDEXES python train.py --config config/c2c_vm/c2c_vanilla_tsm.yml

Test

  1. Suppose you have trained a model and set its log dir to YOUR_LOG_PATH.

    Then, you can test it using:

CUDA_VISIBLE_DEVICES=YOUR_GPU_INDEXES python test_for_models.py --logpath YOUR_LOG_PATH

📝 TODO List

  • Add training code for the VM + word embedding paradigm.
  • Add training code for the VLM paradigm.
