A simple, effective and unified baseline for human fashion segmentation and recognition (ECCV 2022)
Shilin Xu*, Xiangtai Li*, Jingbo Wang, Guangliang Cheng, Yunhai Tong, Dacheng Tao.
We present a simple, effective, and unified baseline for fashion segmentation and attribute recognition. As the figure below shows, the overall architecture follows an encoder-decoder framework, similar to DETR.
This codebase also contains the implementation of MaskAttribute-RCNN.
Fashionformer achieves new state-of-the-art results on three fashion segmentation datasets.
We build on the OpenMMLab codebase and depend on specific versions of mmdetection and mmcv. To run this code, make sure both mmcv and mmdet are installed in your environment.
- Python=3.8.13, CUDA=11.1
- PyTorch=1.9.0, torchvision=0.10.0
- mmcv==1.3.18 (full version, requires the CUDA extensions)
- mmdet==2.18.0
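A quick way to confirm that an environment matches the versions listed above is a small check script. The following is a minimal sketch, not part of this repository: the helper names `find_mismatches` and `installed_versions` are hypothetical, and it assumes the full mmcv build is installed under the distribution name `mmcv-full`. Local build suffixes such as `+cu111` are ignored when comparing.

```python
from importlib import metadata

# Version pins copied from the requirements list above.
# Assumption: the "full version" of mmcv is installed as the `mmcv-full` distribution.
REQUIRED = {
    "torch": "1.9.0",
    "torchvision": "0.10.0",
    "mmcv-full": "1.3.18",
    "mmdet": "2.18.0",
}


def normalize(version):
    """Drop a local build suffix such as '+cu111'."""
    return version.split("+", 1)[0]


def find_mismatches(installed, required=REQUIRED):
    """Return {package: (required, found)} for each missing or mismatched package."""
    problems = {}
    for name, wanted in required.items():
        found = installed.get(name)
        if found is None or normalize(found) != wanted:
            problems[name] = (wanted, found)
    return problems


def installed_versions(names):
    """Look up installed versions; packages that are not installed map to None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(installed_versions(REQUIRED)).items():
        print(f"{name}: expected {wanted}, found {found}")
```

Running the script prints one line per package that is missing or pinned to a different version, and prints nothing when the environment matches.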
Detection: apparel object instance segmentation with localized attribute prediction:
Global attribute prediction:
- attributes_train2020
- attributes_val2020
- test_images_info2020: same as detection task (Not used)
path/to/Fashionpedia/
├── annotations/ # annotation json files
│ ├── attributes_train2020.json
│ ├── attributes_val2020.json
│ ├── instances-attributes_train2020.json
│ ├── instances-attributes_val2020.json
└── train/
└── test/
│ ├── train2017/ # train images
│ ├── val2017/ # val images
│ └── test2017/ # test images
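Before training, it can help to check that the annotation files from the layout above are in place. The sketch below is an illustration under stated assumptions: `missing_annotations` is a hypothetical helper, and it only checks the four JSON files named in the `annotations/` folder, not the image directories.

```python
from pathlib import Path

# Annotation file names taken from the directory layout above.
ANNOTATION_FILES = (
    "attributes_train2020.json",
    "attributes_val2020.json",
    "instances-attributes_train2020.json",
    "instances-attributes_val2020.json",
)


def missing_annotations(root):
    """Return the annotation files that are absent from <root>/annotations."""
    ann_dir = Path(root) / "annotations"
    return [name for name in ANNOTATION_FILES if not (ann_dir / name).is_file()]


if __name__ == "__main__":
    # Replace with the actual dataset root on your machine.
    missing = missing_annotations("path/to/Fashionpedia")
    if missing:
        print("Missing annotation files:", ", ".join(missing))
    else:
        print("All annotation files found.")
```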
Please see this link for details.
Please use the default mmdetection settings.
# for single machine
./tools/dist_train.sh $config $num_gpu
# for multi machine with slurm
./tools/slurm_train.sh $partition $job_name $config $work_dir
# for single machine
./tools/dist_test.sh $config $checkpoint $num_gpu --eval segm
# for multi machine with slurm
./tools/slurm_test.sh $partition $job_name $config $checkpoint --eval segm
python demo/image_demo.py $img $config $checkpoint
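To evaluate several checkpoints in a row, the single-machine test command above can be driven from Python. This is a minimal sketch, assuming it is run from the repository root so that `./tools/dist_test.sh` resolves; `dist_test_command` and `run_evaluations` are hypothetical helpers, and the config and checkpoint paths are placeholders.

```python
import subprocess


def dist_test_command(config, checkpoint, num_gpu, metric="segm"):
    """Build the argument list for the single-machine test command shown above."""
    return [
        "./tools/dist_test.sh",
        str(config),
        str(checkpoint),
        str(num_gpu),
        "--eval",
        metric,
    ]


def run_evaluations(config, checkpoints, num_gpu):
    """Run the test script once per checkpoint and stop at the first failure."""
    for checkpoint in checkpoints:
        subprocess.run(dist_test_command(config, checkpoint, num_gpu), check=True)
```

For example, `run_evaluations("my_config.py", ["epoch_1.pth", "epoch_2.pth"], 8)` would launch two evaluation runs on 8 GPUs, where the file names are placeholders for your own config and checkpoints.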
We provide the configs to reproduce Fashionformer and Mask-Attributes Mask-RCNN.
Fashionformer checkpoints: OneDrive and Baidu Yun (access code: uvlc).
Our codebase is built on K-Net and mmdetection. Many thanks to the authors for open-sourcing their code. In particular, we modify the kernel prediction head of K-Net by adding an extra attribute query prediction, which yields a two-stream query (kernel) prediction framework.
If you find this repo useful for your research, please consider citing our paper:
@article{xu2022fashionformer,
title={Fashionformer: A simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition},
author={Xu, Shilin and Li, Xiangtai and Wang, Jingbo and Cheng, Guangliang and Tong, Yunhai and Tao, Dacheng},
journal={ECCV},
year={2022}
}