LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model

[arXiv][Paper]

Dongkai Wang, Shiyu Xuan, Shiliang Zhang
CVPR 2024 Highlight

Installation

1. Clone code

    git clone https://github.com/kennethwdk/LocLLM
    cd ./LocLLM

2. Create a conda environment for this repo

    conda create -n LocLLM python=3.10
    conda activate LocLLM

3. Install CUDA 11.7 (other versions may not work)

    conda install -c conda-forge cudatoolkit-dev

4. Install PyTorch following the official instructions (the build must match the CUDA version)

    conda install pytorch==2.0.1 torchvision==0.15.2 pytorch-cuda=11.7 -c pytorch -c nvidia
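
To confirm that the installed PyTorch build matches the CUDA version, a quick check along the following lines may help. This is a minimal sketch, not part of the repository: the helper name `cuda_build_matches` is hypothetical, and it only compares the major and minor version reported by `torch.version.cuda` against 11.7.

```python
def cuda_build_matches(torch_cuda_version, expected="11.7"):
    """Return True if the major.minor CUDA version of the PyTorch build equals `expected`.

    `torch_cuda_version` is the value of `torch.version.cuda`, which is None
    for CPU-only builds. The helper name is hypothetical, not from the repo.
    """
    if torch_cuda_version is None:
        return False
    return torch_cuda_version.split(".")[:2] == expected.split(".")[:2]


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda)
        print("matches CUDA 11.7:", cuda_build_matches(torch.version.cuda))
```

Run it inside the activated `LocLLM` conda environment; a CPU-only build reports no CUDA version and is treated as a mismatch.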

5. Install the other Python dependencies (do not change the package versions)

    pip install pycocotools
    pip install opencv-python
    pip install accelerate==0.21.0
    pip install sentencepiece==0.1.99
    pip install transformers==4.31.0
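
Because the pinned versions matter, it can be useful to verify them after installation. The sketch below is an illustration rather than a script shipped with this repo: `PINNED` and `find_mismatches` are hypothetical names, and only the three packages pinned above are checked.

```python
from importlib import metadata

# Pins copied from the pip commands above.
PINNED = {
    "accelerate": "0.21.0",
    "sentencepiece": "0.1.99",
    "transformers": "4.31.0",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for packages that are missing or differ."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINNED)
    if not problems:
        print("All pinned packages match")
    for name, (expected, found) in problems.items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty result means every pinned package is installed at the expected version.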

6. Prepare datasets

Download COCO, MPII, and Human-Art from their official websites, extract the archives, and organize the files as shown below. Names in parentheses (xxx.json) are the original file names.

./data
├── coco
│   ├── annotations
│   │   ├── coco_train.json (person_keypoints_train2017.json)
│   │   └── coco_val.json (person_keypoints_val2017.json)
│   └── images
│       ├── train2017
│       │   └── 000000000009.jpg
│       └── val2017
│           └── 000000000139.jpg
├── HumanArt
│   ├── annotations
│   │   └── validation_humanart.json
│   └── images
│       └── 2D_virtual_human
└── mpii
    ├── annot
    │   ├── valid.json
    │   └── gt_valid.mat
    └── images
        └── 000001163.jpg
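
Before running evaluation or training, the layout above can be sanity-checked with a few lines of Python. This is a minimal sketch under the assumption that the data lives in `./data`; the names `EXPECTED` and `missing_entries` are hypothetical, and the check only tests that the listed paths exist, not that the files are valid.

```python
from pathlib import Path

# Relative paths taken from the directory tree above (renamed COCO files included).
EXPECTED = [
    "coco/annotations/coco_train.json",
    "coco/annotations/coco_val.json",
    "coco/images/train2017",
    "coco/images/val2017",
    "HumanArt/annotations/validation_humanart.json",
    "HumanArt/images/2D_virtual_human",
    "mpii/annot/valid.json",
    "mpii/annot/gt_valid.mat",
    "mpii/images",
]


def missing_entries(data_root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries("./data")
    if missing:
        print("Missing entries:")
        for rel in missing:
            print("  ", rel)
    else:
        print("Dataset layout looks complete")
```

Any path printed as missing usually points to an archive that was not extracted or a COCO annotation file that was not renamed.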

Usage

1. Download trained model

    git lfs install

    git clone https://huggingface.co/d0ntcare/LocLLM

    mkdir checkpoints
    mkdir checkpoints/ckpts
    mv LocLLM/coco checkpoints/ckpts
    mv LocLLM/h36m checkpoints/ckpts
    # for training
    mkdir checkpoints/model_weights
    mv LocLLM/pretrained/dinov2_vitl14_pretrain.pth checkpoints/model_weights
    # clone vicuna1.5
    cd checkpoints/model_weights
    git clone https://huggingface.co/lmsys/vicuna-7b-v1.5

2. Evaluate Model

Change the IDX option in the script to specify the GPU IDs used for evaluation; listing multiple IDs enables multi-GPU evaluation.

    # evaluate on coco val set
    bash scripts/valid_coco.sh
    # evaluate on h36m set
    bash scripts/valid_h36m.sh
    # evaluate on humanart set
    bash scripts/valid_humanart.sh
    # evaluate on mpii set
    bash scripts/valid_mpii.sh

3. Train Model

    # train on coco
    bash scripts/train_coco.sh
    # train on h36m and mpii
    bash scripts/train_h36m.sh

Note that each GPU should have at least 24 GB of memory. Training on 4 RTX 3090 GPUs takes about 3 days.
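
The memory requirement can be checked before launching a training run. The sketch below is illustrative only: `gpus_below_threshold` is a hypothetical helper, and comparing against decimal gigabytes (10^9 bytes) is an assumption, chosen because the total memory a driver reports is often slightly below the nominal card size.

```python
def gpus_below_threshold(total_bytes_per_gpu, min_gb=24.0):
    """Return indices of GPUs whose total memory is under min_gb (decimal GB)."""
    threshold = min_gb * 1e9
    return [i for i, total in enumerate(total_bytes_per_gpu) if total < threshold]


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        # total_memory is reported in bytes for each visible device.
        totals = [
            torch.cuda.get_device_properties(i).total_memory
            for i in range(torch.cuda.device_count())
        ]
        print("GPUs below 24 GB:", gpus_below_threshold(totals))
    else:
        print("No CUDA device visible to PyTorch")
```

An empty list means every visible GPU meets the threshold.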

Citations

If you find this code useful for your research, please cite our paper:

@inproceedings{wang2024locllm,
    author    = {Wang, Dongkai and Xuan, Shiyu and Zhang, Shiliang},
    title     = {LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2024}
}

Contact me

If you have any questions about this code or paper, feel free to contact me at dongkai.wang@pku.edu.cn.

Acknowledgement

The code is mainly inspired by Pink and LLaVA. You can also refer to Pink, our group's work on referential comprehension for general multi-modal LLMs.
