This repository contains the implementation of the following paper:
Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes [Project Page] [Paper] [Code] [Data] [Video]
Xuan Ju∗12, Ailing Zeng∗1, Jianan Wang1, Qiang Xu2, Lei Zhang1
∗ Equal contribution 1International Digital Economy Academy 2The Chinese University of Hong Kong
This paper proposes a large-scale dataset, Human-Art, that targets multi-scenario human-centric tasks to bridge the gap between natural and artificial scenes. It covers 20 high-quality human scenarios, including natural and artificial humans in both 2D and 3D representations.
Contents of Human-Art:
- 50,000 images including human figures in 20 scenarios (5 natural scenarios, 3 3D artificial scenarios, and 12 2D artificial scenarios)
- human-centric annotations including human bounding box, 21 2D human keypoints, human self-contact keypoints, and description text
- baseline human detectors and human pose estimators trained jointly on MSCOCO and Human-Art
Tasks that Human-Art targets:
- multi-scenario human detection, 2D human pose estimation, and 3D human mesh recovery
  - Notably, after training with ED-Pose, results on MSCOCO improve by 0.8, indicating that multi-scenario images may benefit feature extraction and human understanding of real scenes.
- multi-scenario human image generation (especially controllable human image generation, e.g. with conditions such as pose and text)
- out-of-domain human detection and human pose estimation
Human-Art is available for download under a CC license. Please fill out this form to request authorization to use Human-Art for non-commercial purposes. After you submit the form, an email containing the dataset will be delivered to you automatically. Please do not share or transfer the data privately.
For ease of use, Human-Art is processed in the same format as MSCOCO. Please save the dataset with the following file structure after downloading (we also include the file structure of COCO because we use it for joint training of COCO and Human-Art; a quick layout check is sketched after the tree):

    |-- data
        |-- HumanArt
            |-- annotations
                |-- training_coco.json
                |-- training_humanart.json
                |-- training_humanart_coco.json
                |-- training_humanart_cartoon.json
                |-- ...
                |-- validation_coco.json
                |-- validation_humanart.json
                |-- validation_humanart_coco.json
                |-- validation_humanart_cartoon.json
                |-- ...
            |-- images
                |-- 2D_virtual_human
                    |-- ...
                |-- 3D_virtual_human
                    |-- ...
                |-- real_human
                    |-- ...
        |-- coco
            |-- annotations
            |-- train2017
            |-- val2017

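As a quick sanity check after downloading, the sketch below (using only the Python standard library, and assuming the data/ root shown above) verifies that the expected folders and annotation files are in place; the file list is illustrative, not exhaustive.

```python
from pathlib import Path

# Assumed repository-relative data root, matching the tree above.
DATA_ROOT = Path("data")

def check_humanart_layout(root: Path = DATA_ROOT) -> None:
    """Print which of the expected Human-Art/COCO paths are present."""
    expected = [
        root / "HumanArt" / "annotations" / "training_humanart_coco.json",
        root / "HumanArt" / "annotations" / "validation_humanart_coco.json",
        root / "HumanArt" / "images" / "2D_virtual_human",
        root / "HumanArt" / "images" / "3D_virtual_human",
        root / "HumanArt" / "images" / "real_human",
        root / "coco" / "annotations",
        root / "coco" / "train2017",
        root / "coco" / "val2017",
    ]
    for path in expected:
        status = "ok" if path.exists() else "MISSING"
        print(f"[{status}] {path}")

if __name__ == "__main__":
    check_humanart_layout()
```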
Note that there are several different JSON settings (a loading example follows this list):
- files ending with _coco (e.g. training_coco.json) are reprocessed COCO annotation files (e.g. person_keypoints_train2017.json), which can be used in the same format as Human-Art
- files ending with _humanart (e.g. training_humanart.json) are the annotation files of Human-Art
- files ending with _humanart_coco (e.g. training_humanart_coco.json) are the annotation files of the combination of COCO and Human-Art
- files ending with _humanart_[scenario] (e.g. training_humanart_cartoon.json) are the annotation files of one specific scenario of Human-Art
- HumanArt_validation_detections_AP_H_56_person.json contains detection results with an AP of 56 for evaluating top-down pose estimation models (similar to COCO_val2017_detections_AP_H_56_person.json in MSCOCO)
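Because these files follow the COCO format, they can be opened directly with the standard pycocotools API. A minimal sketch, assuming pycocotools is installed and the data/ layout shown above:

```python
from pycocotools.coco import COCO

# Assumed path, following the file structure above.
ANN_FILE = "data/HumanArt/annotations/validation_humanart.json"

coco = COCO(ANN_FILE)

# Human-Art uses the single "person" category, as in COCO.
person_ids = coco.getCatIds(catNms=["person"])
img_ids = coco.getImgIds(catIds=person_ids)
print(f"{len(img_ids)} images with person annotations")

# Inspect the first image and its person instances.
img_info = coco.loadImgs(img_ids[0])[0]
ann_ids = coco.getAnnIds(imgIds=img_info["id"], catIds=person_ids)
anns = coco.loadAnns(ann_ids)
print(img_info["file_name"], img_info.get("category"), f"{len(anns)} instances")
```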
The annotation JSON files of Human-Art are structured as follows (a small parsing sketch follows the schema):

    {
        "info": {xxx},                  # some basic information of Human-Art
        "images": [
            {
                "file_name": "xxx",     # the path of the image (same definition as COCO)
                "height": xxx,          # the image height (same definition as COCO)
                "width": xxx,           # the image width (same definition as COCO)
                "id": xxx,              # the image id (same definition as COCO)
                "page_url": "xxx",      # the web link of the page containing the image
                "image_url": "xxx",     # the web link of the image
                "picture_name": "xxx",  # the name of the image
                "author": "xxx",        # the author of the image
                "description": "xxx",   # the text description of the image
                "category": "xxx"       # the scenario of the image (e.g. cartoon)
            },
            ...
        ],
        "annotations": [
            {
                "keypoints": [xxx],       # positions of the 17 COCO keypoints (same definition as COCO)
                "keypoints_21": [xxx],    # positions of the 21 Human-Art keypoints
                "self_contact": [xxx],    # self-contact keypoints, x1,y1,x2,y2,...
                "num_keypoints": xxx,     # number of annotated (not invisible) keypoints among the 17 COCO-format keypoints (same definition as COCO)
                "num_keypoints_21": xxx,  # number of annotated (not invisible) keypoints among the 21 Human-Art-format keypoints
                "iscrowd": xxx,           # annotated or not (same definition as COCO)
                "image_id": xxx,          # the image id (same definition as COCO)
                "area": xxx,              # the human area (same definition as COCO)
                "bbox": [xxx],            # the human bounding box (same definition as COCO)
                "category_id": 1,         # category id=1 means the person category (same definition as COCO)
                "id": xxx,                # annotation id (same definition as COCO)
                "annotator": xxx          # annotator id
            }
        ],
        "categories": []                  # category information (same definition as COCO)
    }

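For reference, here is a small sketch of how one might unpack these fields from a single annotation entry. It assumes numpy is available and that keypoints_21 uses the same flat (x, y, v) layout as the COCO-style keypoints field; the annotation path follows the file structure above.

```python
import json
import numpy as np

# Assumed path, following the file structure above.
ANN_FILE = "data/HumanArt/annotations/validation_humanart.json"

with open(ANN_FILE) as f:
    data = json.load(f)

ann = data["annotations"][0]

# COCO-style keypoints are flat [x1, y1, v1, x2, y2, v2, ...] lists; keypoints_21
# is assumed to use the same layout with 21 points instead of 17.
kpts_17 = np.asarray(ann["keypoints"], dtype=float).reshape(-1, 3)
kpts_21 = np.asarray(ann["keypoints_21"], dtype=float).reshape(-1, 3)

# self_contact is documented above as [x1, y1, x2, y2, ...] coordinate pairs.
contact = np.asarray(ann["self_contact"], dtype=float).reshape(-1, 2)

print("bbox:", ann["bbox"])
print("labeled COCO keypoints:", int((kpts_17[:, 2] > 0).sum()))
print("self-contact points:", len(contact))
```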
Human pose estimators trained on Human-Art are now supported in MMPose in this PR. The detailed usage and Model Zoo can be found in MMPose's documentation: (1) ViTPose, (2) HRNet, and (3) RTMPose.
To train and evaluate human pose estimators, please refer to MMPose. Because MMPose is updated frequently, we do not maintain a codebase in this repo. Since Human-Art is compatible with MSCOCO, you can train and evaluate any model in MMPose using its dataloader.
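Purely as an illustration, the dataset section of an MMPose-style training config could point at the Human-Art annotations roughly as follows. The field names mirror typical MMPose 1.x COCO-format examples and may differ between versions; the PR referenced above may also register a dedicated Human-Art dataset type, so please check the MMPose documentation for the exact schema.

```python
# Illustrative sketch only: field names follow typical MMPose 1.x COCO-format
# configs; paths assume the data/ layout shown earlier in this README.
train_dataloader = dict(
    batch_size=64,
    dataset=dict(
        type='CocoDataset',          # used here only because Human-Art annotations are COCO-compatible
        data_root='data/',
        data_mode='topdown',
        ann_file='HumanArt/annotations/training_humanart_coco.json',
        data_prefix=dict(img=''),    # adjust so data_root + img prefix + file_name resolves to each image
        pipeline=[],                 # reuse the train pipeline from your chosen base config
    ),
)
```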
The supported models include (xx-coco means trained on MSCOCO only; xx-humanart-coco means trained on both Human-Art and MSCOCO):
Results of ViTPose on Human-Art validation dataset with ground-truth bounding-box
With classic decoder
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
ViTPose-S-coco | 256x192 | 0.507 | 0.758 | 0.531 | 0.551 | 0.780 | ckpt | log |
ViTPose-S-humanart-coco | 256x192 | 0.738 | 0.905 | 0.802 | 0.768 | 0.911 | ckpt | log |
ViTPose-B-coco | 256x192 | 0.555 | 0.782 | 0.590 | 0.599 | 0.809 | ckpt | log |
ViTPose-B-humanart-coco | 256x192 | 0.759 | 0.905 | 0.823 | 0.790 | 0.917 | ckpt | log |
ViTPose-L-coco | 256x192 | 0.637 | 0.838 | 0.689 | 0.677 | 0.859 | ckpt | log |
ViTPose-L-humanart-coco | 256x192 | 0.789 | 0.916 | 0.845 | 0.819 | 0.929 | ckpt | log |
ViTPose-H-coco | 256x192 | 0.665 | 0.860 | 0.715 | 0.701 | 0.871 | ckpt | log |
ViTPose-H-humanart-coco | 256x192 | 0.800 | 0.926 | 0.855 | 0.828 | 0.933 | ckpt | log |
Results of HRNet on Human-Art validation dataset with ground-truth bounding-box
With classic decoder
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
pose_hrnet_w32-coco | 256x192 | 0.533 | 0.771 | 0.562 | 0.574 | 0.792 | ckpt | log |
pose_hrnet_w32-humanart-coco | 256x192 | 0.754 | 0.906 | 0.812 | 0.783 | 0.916 | ckpt | log |
pose_hrnet_w48-coco | 256x192 | 0.557 | 0.782 | 0.593 | 0.595 | 0.804 | ckpt | log |
pose_hrnet_w48-humanart-coco | 256x192 | 0.769 | 0.906 | 0.825 | 0.796 | 0.919 | ckpt | log |
Results of RTMPose on Human-Art validation dataset with ground-truth bounding-box
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
rtmpose-t-coco | 256x192 | 0.444 | 0.725 | 0.453 | 0.488 | 0.750 | ckpt | log |
rtmpose-t-humanart-coco | 256x192 | 0.655 | 0.872 | 0.720 | 0.693 | 0.890 | ckpt | log |
rtmpose-s-coco | 256x192 | 0.480 | 0.739 | 0.498 | 0.521 | 0.763 | ckpt | log |
rtmpose-s-humanart-coco | 256x192 | 0.698 | 0.893 | 0.768 | 0.732 | 0.903 | ckpt | log |
rtmpose-m-coco | 256x192 | 0.532 | 0.765 | 0.563 | 0.571 | 0.789 | ckpt | log |
rtmpose-m-humanart-coco | 256x192 | 0.728 | 0.895 | 0.791 | 0.759 | 0.906 | ckpt | log |
rtmpose-l-coco | 256x192 | 0.564 | 0.789 | 0.602 | 0.599 | 0.808 | ckpt | log |
rtmpose-l-humanart-coco | 256x192 | 0.753 | 0.905 | 0.812 | 0.783 | 0.915 | ckpt | log |
Human detectors trained on Human-Art are now supported in MMPose in this PR. The detailed usage and Model Zoo can be found here.
To train and evaluate human detectors, please refer to MMDetection, an open-source object detection toolbox based on PyTorch that supports diverse detection frameworks with high efficiency and accuracy. Because MMDetection is updated frequently, we do not maintain a codebase in this repo. Since Human-Art is compatible with MSCOCO, you can train and evaluate any model in MMDetection using its dataloader.
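Because the annotations are COCO-format, detector outputs can also be scored directly with the standard pycocotools evaluation. A minimal sketch, where the result file is a placeholder for your own detector's COCO-format output (a list of {"image_id", "category_id", "bbox", "score"} records):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Ground truth follows the file structure above; the results path is a placeholder.
gt = COCO("data/HumanArt/annotations/validation_humanart.json")
dt = gt.loadRes("my_detector_results_humanart_val.json")

evaluator = COCOeval(gt, dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints the standard COCO AP/AR summary
```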
The supported models include:
Detection Config | Model AP | Download |
---|---|---|
RTMDet-tiny | 46.6 | Det Model |
RTMDet-s | 50.6 | Det Model |
YOLOX-nano | 38.9 | Det Model |
YOLOX-tiny | 47.7 | Det Model |
YOLOX-s | 54.6 | Det Model |
YOLOX-m | 59.1 | Det Model |
YOLOX-l | 60.2 | Det Model |
YOLOX-x | 61.3 | Det Model |
If you find this repository useful for your work, please consider citing it as follows:

    @inproceedings{ju2023human,
        title={Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes},
        author={Ju, Xuan and Zeng, Ailing and Wang, Jianan and Xu, Qiang and Zhang, Lei},
        booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
        year={2023},
    }