Dataset and source code for Detecting Human Artifacts from Text-to-Image Models.
We set up the environment following EVA-02-det.
conda create --name hadm python=3.8 -y
conda activate hadm
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
pip install cryptography
pip install -r requirements.txt
pip install -v -U git+https://github.com/facebookresearch/xformers.git@v0.0.18#egg=xformers
pip install mmcv==1.7.1 openmim
mim install mmcv-full
python -m pip install -e .
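To verify that the environment is set up correctly, you can run a quick sanity check (a minimal sketch in Python; it only assumes the packages above were installed successfully):

# Quick sanity check for the environment installed above.
import torch
import torchvision
import detectron2
import xformers
import mmcv

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("torchvision:", torchvision.__version__)
print("detectron2:", detectron2.__version__)
print("xformers:", xformers.__version__)
print("mmcv:", mmcv.__version__)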
We provide the dataset used in the paper. The dataset is available at the following link: HADM Dataset.
The structure of the dataset should look like:
|-- annotations
| |-- train_ALL
| |-- val_ALL
| |-- val_dalle2
| |-- val_dalle3
| |-- val_mj
| `-- val_sdxl
|-- images
| |-- train_ALL
| |-- val_ALL
| |-- val_dalle2
| |-- val_dalle3
| |-- val_mj
| `-- val_sdxl
`-- info.pkl
Note that we provide a separate validation set for each domain for convenience of evaluation; val_ALL is the union of all per-domain validation sets. The info.pkl file contains metadata for the dataset, including each image's filename and the prompt used to generate it.
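For example, info.pkl can be inspected with a few lines of Python (a minimal sketch; the exact schema of the pickled object is an assumption here, so it only prints the container type and one sample entry):

# Inspect info.pkl: print the container type and one sample entry.
# Path assumes the dataset is placed under datasets/ as described below.
import pickle

with open("datasets/human_artifact_dataset/info.pkl", "rb") as f:
    info = pickle.load(f)

print(type(info))
if isinstance(info, dict):
    # Assumed: a mapping such as image filename -> prompt; adjust once the
    # actual structure is known.
    key = next(iter(info))
    print(key, "->", info[key])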
Finally, set the environment variable for the dataset path:
export DETECTRON2_DATASETS=datasets
After downloading our Human Artifact Dataset, please place it under the datasets directory. Then, download the training images from the following real datasets: LV-MHP-v1, OCHuman, CrowdHuman, HCD, Facial Descriptors. We also filtered COCO with ViTPose to find the images with human presence; the filtered COCO images are available here.
After downloading these datasets, please place them under the datasets/human_artifact_dataset/images directory. The structure of the dataset should look like:
datasets/human_artifact_dataset/images/
|-- coco_train2017_human
|-- CrowdHuman
|-- facial_descriptors_dataset_images
|-- HCDDataset_images
|-- LV-MHP-v1-images
|-- OCHuman
|-- train_ALL
|-- val_ALL
|-- val_dalle2
|-- val_dalle3
|-- val_mj
`-- val_sdxl
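A quick way to confirm that everything is in place is to check for each of the directories listed above (a minimal sketch):

# Verify that every image directory listed above exists.
from pathlib import Path

IMAGE_ROOT = Path("datasets/human_artifact_dataset/images")
EXPECTED_DIRS = [
    "coco_train2017_human", "CrowdHuman", "facial_descriptors_dataset_images",
    "HCDDataset_images", "LV-MHP-v1-images", "OCHuman",
    "train_ALL", "val_ALL", "val_dalle2", "val_dalle3", "val_mj", "val_sdxl",
]

for name in EXPECTED_DIRS:
    path = IMAGE_ROOT / name
    print(("ok      " if path.is_dir() else "MISSING ") + str(path))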
Also, generate the corresponding empty annotation files under datasets/human_artifact_dataset/annotations for the training images from the real datasets by running the following commands:
python datasets/generate_empty_anno.py --data_root datasets/human_artifact_dataset/images/coco_train2017_human
python datasets/generate_empty_anno.py --data_root datasets/human_artifact_dataset/images/CrowdHuman
python datasets/generate_empty_anno.py --data_root datasets/human_artifact_dataset/images/facial_descriptors_dataset_images
python datasets/generate_empty_anno.py --data_root datasets/human_artifact_dataset/images/HCDDataset_images
python datasets/generate_empty_anno.py --data_root datasets/human_artifact_dataset/images/LV-MHP-v1-images
python datasets/generate_empty_anno.py --data_root datasets/human_artifact_dataset/images/OCHuman
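Equivalently, the six commands above can be issued from a short loop (a minimal sketch that simply calls the same script for each real-image directory):

# Run datasets/generate_empty_anno.py once per real-image directory.
import subprocess

REAL_IMAGE_DIRS = [
    "coco_train2017_human", "CrowdHuman", "facial_descriptors_dataset_images",
    "HCDDataset_images", "LV-MHP-v1-images", "OCHuman",
]

for name in REAL_IMAGE_DIRS:
    subprocess.run(
        ["python", "datasets/generate_empty_anno.py",
         "--data_root", f"datasets/human_artifact_dataset/images/{name}"],
        check=True,
    )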
Make sure to download the pretrained weights for EVA-02-L from EVA-02-det and place them under the pretrained_models directory. The pretrained weights can be downloaded from here.
We provide the pretrained weights for the Local Human Artifact Detection Model (HADM-L) and the Global Human Artifact Detection Model (HADM-G) to reproduce the results presented in the paper. The pretrained weights can be downloaded from the following links:
Also make sure to place these pretrained weights under the pretrained_models directory.
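To confirm the downloads are intact, each checkpoint can be loaded once on CPU (a minimal sketch; it assumes only that the files under pretrained_models are standard PyTorch checkpoints):

# Try loading every checkpoint under pretrained_models/ on CPU.
from pathlib import Path
import torch

for ckpt in sorted(Path("pretrained_models").glob("*.pth")):
    state = torch.load(ckpt, map_location="cpu")
    # Checkpoints are typically dicts (e.g., with a "model" state dict inside).
    top_keys = list(state.keys())[:5] if isinstance(state, dict) else []
    print(f"{ckpt.name}: loaded, top-level keys = {top_keys}")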
Run HADM-L inference on arbitrary input images under demo/images:
python tools/lazyconfig_train_net.py --num-gpus 1 --inference \
--config-file projects/ViTDet/configs/eva2_o365_to_coco/demo_local.py \
train.output_dir=./outputs/demo_local \
train.init_checkpoint=pretrained_models/HADM-L_0249999.pth \
dataloader.train.total_batch_size=1 \
train.model_ema.enabled=True \
train.model_ema.use_ema_weights_for_eval_only=True \
inference.input_dir=demo/images \
inference.output_dir=demo/outputs/result_local
Results will be saved under demo/outputs/result_local.
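What exactly gets written depends on the demo config; here is a minimal sketch to list the produced files (making no assumption about their format):

# List whatever the demo wrote under demo/outputs/result_local.
from pathlib import Path

out_dir = Path("demo/outputs/result_local")
for path in sorted(out_dir.rglob("*")):
    if path.is_file():
        print(f"{path} ({path.stat().st_size} bytes)")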
Run HADM-G inference on arbitrary input images under demo/images:
python tools/lazyconfig_train_net.py --num-gpus 1 --inference \
--config-file projects/ViTDet/configs/eva2_o365_to_coco/demo_global.py \
train.output_dir=./outputs/demo_global \
train.init_checkpoint=pretrained_models/HADM-G_0249999.pth \
dataloader.train.total_batch_size=1 \
train.model_ema.enabled=True \
train.model_ema.use_ema_weights_for_eval_only=True \
inference.input_dir=demo/images \
inference.output_dir=demo/outputs/result_global
Results will be saved under demo/outputs/result_global.
Evaluate HADM-L on all domains (SDXL, DALLE-2, DALLE-3, Midjourney).
python tools/lazyconfig_train_net.py --num-gpus 1 --eval-only \
--config-file projects/ViTDet/configs/eva2_o365_to_coco/eva02_large_local.py \
train.output_dir=./outputs/eva02_large_local/250k_on_all_val \
train.init_checkpoint=pretrained_models/HADM-L_0249999.pth \
dataloader.evaluator.output_dir=cache/large_local_human_artifact_ALL_val/250k_on_all_val \
dataloader.evaluator.dataset_name=local_human_artifact_val_ALL \
dataloader.test.dataset.names=local_human_artifact_val_ALL \
dataloader.train.total_batch_size=1 \
train.model_ema.enabled=True \
train.model_ema.use_ema_weights_for_eval_only=True
Expected results:
Task: bbox
AP,AP50,AP75,APs,APm,APl
24.907,43.307,25.990,18.322,25.382,32.773
Evaluate HADM-L on a specific domain (SDXL in this example).
python tools/lazyconfig_train_net.py --num-gpus 1 --eval-only \
--config-file projects/ViTDet/configs/eva2_o365_to_coco/eva02_large_local.py \
train.output_dir=./outputs/eva02_large_local/250k_on_sdxl_val \
train.init_checkpoint=pretrained_models/HADM-L_0249999.pth \
dataloader.evaluator.output_dir=cache/large_local_human_artifact_sdxl_val/250k_on_sdxl_val \
dataloader.evaluator.dataset_name=local_human_artifact_val_sdxl \
dataloader.test.dataset.names=local_human_artifact_val_sdxl \
dataloader.train.total_batch_size=1 \
train.model_ema.enabled=True \
train.model_ema.use_ema_weights_for_eval_only=True
Expected results:
Task: bbox
AP,AP50,AP75,APs,APm,APl
21.141,39.529,21.372,17.813,22.557,26.149
To evaluate on other domains, you may also replace dataloader.evaluator.dataset_name and dataloader.test.dataset.names with local_human_artifact_val_<DOMAIN>, where <DOMAIN> is one of sdxl, mj, dalle2, or dalle3.
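For instance, the following small driver evaluates HADM-L on each domain in turn (a minimal sketch that simply re-issues the command above with the dataset name substituted):

# Evaluate HADM-L on every per-domain validation set by substituting
# local_human_artifact_val_<DOMAIN> into the command shown above.
import subprocess

for domain in ["sdxl", "mj", "dalle2", "dalle3"]:
    dataset = f"local_human_artifact_val_{domain}"
    subprocess.run(
        [
            "python", "tools/lazyconfig_train_net.py", "--num-gpus", "1", "--eval-only",
            "--config-file", "projects/ViTDet/configs/eva2_o365_to_coco/eva02_large_local.py",
            f"train.output_dir=./outputs/eva02_large_local/250k_on_{domain}_val",
            "train.init_checkpoint=pretrained_models/HADM-L_0249999.pth",
            f"dataloader.evaluator.output_dir=cache/large_local_human_artifact_{domain}_val/250k_on_{domain}_val",
            f"dataloader.evaluator.dataset_name={dataset}",
            f"dataloader.test.dataset.names={dataset}",
            "dataloader.train.total_batch_size=1",
            "train.model_ema.enabled=True",
            "train.model_ema.use_ema_weights_for_eval_only=True",
        ],
        check=True,
    )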
Evaluate HADM-G on all domains (SDXL, DALLE-2, DALLE-3, Midjourney).
python tools/lazyconfig_train_net.py --num-gpus 1 --eval-only \
--config-file projects/ViTDet/configs/eva2_o365_to_coco/eva02_large_global.py \
train.output_dir=./outputs/eva02_large_global/250k_on_all_val \
train.init_checkpoint=pretrained_models/HADM-G_0249999.pth \
dataloader.evaluator.output_dir=cache/large_global_human_artifact_ALL_val/250k_on_all_val \
dataloader.evaluator.dataset_name=global_human_artifact_val_ALL \
dataloader.test.dataset.names=global_human_artifact_val_ALL \
dataloader.train.total_batch_size=1 \
train.model_ema.enabled=True \
train.model_ema.use_ema_weights_for_eval_only=True
Expected results:
Task: bbox
AP,AP50,AP75,APs,APm,APl
22.083,25.539,23.993,nan,0.000,22.332
Evaluate HADM-G on a specific domain (SDXL in this example).
python tools/lazyconfig_train_net.py --num-gpus 1 --eval-only \
--config-file projects/ViTDet/configs/eva2_o365_to_coco/eva02_large_global.py \
train.output_dir=./outputs/eva02_large_global/250k_on_sdxl_val \
train.init_checkpoint=pretrained_models/HADM-G_0249999.pth \
dataloader.evaluator.output_dir=cache/large_global_human_artifact_sdxl_val/250k_on_sdxl_val \
dataloader.evaluator.dataset_name=global_human_artifact_val_sdxl \
dataloader.test.dataset.names=global_human_artifact_val_sdxl \
dataloader.train.total_batch_size=1 \
train.model_ema.enabled=True \
train.model_ema.use_ema_weights_for_eval_only=True
Expected results:
Task: bbox
AP,AP50,AP75,APs,APm,APl
23.674,27.393,25.681,nan,0.000,23.891
Similarly, to evaluate on other domains, you may also replace dataloader.evaluator.dataset_name and dataloader.test.dataset.names with global_human_artifact_val_<DOMAIN>, where <DOMAIN> is one of sdxl, mj, dalle2, or dalle3.
To train the Local Human Artifact Detection Model (HADM-L):
python tools/lazyconfig_train_net.py \
--config-file projects/ViTDet/configs/eva2_o365_to_coco/eva02_large_local.py \
--num-gpus=1 train.eval_period=10000 train.log_period=500 \
train.output_dir=./outputs/eva02_large_local \
dataloader.evaluator.output_dir=cache/large_local_human_artifact_ALL_val \
dataloader.train.total_batch_size=4
To train the Global Human Artifact Detection Model (HADM-G):
python tools/lazyconfig_train_net.py \
--config-file projects/ViTDet/configs/eva2_o365_to_coco/eva02_large_global.py \
--num-gpus=1 train.eval_period=10000 train.log_period=500 \
train.output_dir=./outputs/eva02_large_global \
dataloader.evaluator.output_dir=cache/large_global_human_artifact_ALL_val \
dataloader.train.total_batch_size=4
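Training progress can be monitored from the logs in train.output_dir. Assuming the standard detectron2 writers are active in this codebase (an assumption), a metrics.json file with one JSON record per logging step is written there and can be read as follows:

# Read a detectron2-style metrics.json (one JSON object per line) and
# print the most recent iteration and total loss, if present.
import json

with open("outputs/eva02_large_local/metrics.json") as f:
    records = [json.loads(line) for line in f if line.strip()]

latest = records[-1]
print("iteration:", latest.get("iteration"))
print("total_loss:", latest.get("total_loss"))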
If you find this work useful, please consider citing:
@article{Wang2024HADM,
title={Detecting Human Artifacts from Text-to-Image Models},
author={Wang, Kaihong and Zhang, Lingzhi and Zhang, Jianming},
journal={arXiv preprint arXiv:2411.13842},
year={2024}
}
Our codebase borrows heavily from EVA-02-det and Detectron2.