
Detecting Human Artifacts from Text-to-Image Models

Dataset and source code for Detecting Human Artifacts from Text-to-Image Models.


Setup

Environment

We set up the environment following EVA-02-det.

conda create --name hadm python=3.8 -y
conda activate hadm

pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
pip install cryptography
pip install -r requirements.txt
pip install -v -U git+https://github.com/facebookresearch/xformers.git@v0.0.18#egg=xformers
pip install mmcv==1.7.1 openmim
mim install mmcv-full

python -m pip install -e .
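
Before moving on, it can help to sanity-check the installation from Python (a minimal check, assuming the editable install above registered the detectron2 package shipped with this repository):

import torch
import torchvision

print("torch:", torch.__version__)              # expect 1.12.1+cu116
print("torchvision:", torchvision.__version__)  # expect 0.13.1+cu116
print("CUDA available:", torch.cuda.is_available())

import detectron2  # provided by `python -m pip install -e .` above
print("detectron2:", detectron2.__version__)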

Dataset: Human Artifact Dataset (HAD)

We provide the dataset used in the paper; it is available at the following link: HADM Dataset.

The structure of the dataset should look like:

|-- annotations
|   |-- train_ALL
|   |-- val_ALL
|   |-- val_dalle2
|   |-- val_dalle3
|   |-- val_mj
|   `-- val_sdxl
|-- images
|   |-- train_ALL
|   |-- val_ALL
|   |-- val_dalle2
|   |-- val_dalle3
|   |-- val_mj
|   `-- val_sdxl
`-- info.pkl

Note that we provide a validation set for each domain for convenient evaluation; val_ALL is the combination of all of them. The info.pkl file contains metadata about the dataset, including each image's filename and the prompt used to generate it.
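
The exact layout of info.pkl is easiest to confirm by loading it directly; a minimal sketch (the dict-keyed-by-filename structure in the comment is an assumption -- print a sample to verify):

import pickle

with open("datasets/human_artifact_dataset/info.pkl", "rb") as f:
    info = pickle.load(f)

print(type(info))
# Assuming a mapping from image filename to prompt (verify against the printout):
if isinstance(info, dict):
    first_key = next(iter(info))
    print(first_key, "->", info[first_key])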

Finally, set the environment variable for the dataset path:

export DETECTRON2_DATASETS=datasets
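
Detectron2 reads DETECTRON2_DATASETS when resolving dataset roots, so if you launch from Python (e.g., a notebook) rather than a shell, the variable can equally be set in-process:

import os

# Must run before the dataset registration code is imported.
os.environ["DETECTRON2_DATASETS"] = "datasets"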

Additional Real Human Images for Training

After downloading our Human Artifact Dataset, please place it under the datasets directory. Then, download the training images from the following real datasets: LV-MHP-v1, OCHuman, CrowdHuman, HCD, Facial Descriptors. We also filtered COCO train2017 with ViTPose to find images containing humans; the filtered COCO images are available here.

After downloading these datasets, please place them under the datasets/human_artifact_dataset/images directory. The structure of the dataset should look like:

datasets/human_artifact_dataset/images/
|-- coco_train2017_human
|-- CrowdHuman
|-- facial_descriptors_dataset_images
|-- HCDDataset_images
|-- LV-MHP-v1-images
|-- OCHuman
|-- train_ALL
|-- val_ALL
|-- val_dalle2
|-- val_dalle3
|-- val_mj
`-- val_sdxl

Also, generate the corresponding empty annotation files under datasets/human_artifact_dataset/annotations for the training images from the real datasets by running the following commands:

python datasets/generate_empty_anno.py --data_root datasets/human_artifact_dataset/images/coco_train2017_human
python datasets/generate_empty_anno.py --data_root datasets/human_artifact_dataset/images/CrowdHuman
python datasets/generate_empty_anno.py --data_root datasets/human_artifact_dataset/images/facial_descriptors_dataset_images
python datasets/generate_empty_anno.py --data_root datasets/human_artifact_dataset/images/HCDDataset_images
python datasets/generate_empty_anno.py --data_root datasets/human_artifact_dataset/images/LV-MHP-v1-images
python datasets/generate_empty_anno.py --data_root datasets/human_artifact_dataset/images/OCHuman
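
For reference, a minimal sketch of what such empty-annotation generation amounts to, assuming COCO-style JSON output; datasets/generate_empty_anno.py in the repository is the authoritative implementation, and the output filename below is hypothetical:

import argparse
import json
import os

from PIL import Image

parser = argparse.ArgumentParser()
parser.add_argument("--data_root", required=True)
args = parser.parse_args()

images = []
for i, name in enumerate(sorted(os.listdir(args.data_root))):
    if not name.lower().endswith((".jpg", ".jpeg", ".png")):
        continue
    with Image.open(os.path.join(args.data_root, name)) as im:
        w, h = im.size
    images.append({"id": i, "file_name": name, "width": w, "height": h})

# A COCO-style file listing the images with no boxes: the real photos serve
# as artifact-free negatives during training.
out_name = os.path.basename(os.path.normpath(args.data_root)) + ".json"  # hypothetical name
with open(out_name, "w") as f:
    json.dump({"images": images, "annotations": [], "categories": []}, f)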

HADM Pre-trained Weights

Make sure to download the pretrained EVA-02-L weights from EVA-02-det and place them under the pretrained_models directory; they can be downloaded from here.

We provide pretrained weights for the Local Human Artifact Detection Model (HADM-L) and the Global Human Artifact Detection Model (HADM-G) to reproduce the results presented in the paper. The weights can be downloaded from the following links:

Place these weights under the pretrained_models directory as well.
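
A quick way to confirm the checkpoints downloaded intact is to deserialize them (the top-level key layout is an assumption; detectron2 checkpoints commonly keep weights under a "model" key):

import torch

for name in ["HADM-L_0249999.pth", "HADM-G_0249999.pth"]:
    ckpt = torch.load(f"pretrained_models/{name}", map_location="cpu")
    keys = list(ckpt)[:5] if isinstance(ckpt, dict) else type(ckpt)
    print(name, "->", keys)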

Demo

Local Human Artifact Detection Model (HADM-L)

Run inference with HADM-L on arbitrary input images under demo/images:

python tools/lazyconfig_train_net.py --num-gpus 1 --inference \
    --config-file projects/ViTDet/configs/eva2_o365_to_coco/demo_local.py \
    train.output_dir=./outputs/demo_local \
    train.init_checkpoint=pretrained_models/HADM-L_0249999.pth \
    dataloader.train.total_batch_size=1 \
    train.model_ema.enabled=True \
    train.model_ema.use_ema_weights_for_eval_only=True \
    inference.input_dir=demo/images \
    inference.output_dir=demo/outputs/result_local

Results will be saved under demo/outputs/result_local.

Global Human Artifact Detection Model (HADM-G)

Run inference with HADM-G on arbitrary input images under demo/images:

python tools/lazyconfig_train_net.py --num-gpus 1 --inference \
    --config-file projects/ViTDet/configs/eva2_o365_to_coco/demo_global.py \
    train.output_dir=./outputs/demo_global \
    train.init_checkpoint=pretrained_models/HADM-G_0249999.pth \
    dataloader.train.total_batch_size=1 \
    train.model_ema.enabled=True \
    train.model_ema.use_ema_weights_for_eval_only=True \
    inference.input_dir=demo/images \
    inference.output_dir=demo/outputs/result_global

Results will be saved under demo/outputs/result_global.
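
The two demo commands differ only in the config file, checkpoint, and output directory, so both detectors can be driven from one small script; a sketch using subprocess with the exact arguments shown above:

import subprocess

RUNS = {
    "local": ("projects/ViTDet/configs/eva2_o365_to_coco/demo_local.py",
              "pretrained_models/HADM-L_0249999.pth"),
    "global": ("projects/ViTDet/configs/eva2_o365_to_coco/demo_global.py",
               "pretrained_models/HADM-G_0249999.pth"),
}

for name, (config, checkpoint) in RUNS.items():
    subprocess.run([
        "python", "tools/lazyconfig_train_net.py", "--num-gpus", "1", "--inference",
        "--config-file", config,
        f"train.output_dir=./outputs/demo_{name}",
        f"train.init_checkpoint={checkpoint}",
        "dataloader.train.total_batch_size=1",
        "train.model_ema.enabled=True",
        "train.model_ema.use_ema_weights_for_eval_only=True",
        "inference.input_dir=demo/images",
        f"inference.output_dir=demo/outputs/result_{name}",
    ], check=True)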

Evaluation

Local Human Artifact Detection Model (HADM-L)

Evaluate HADM-L on all domains (SDXL, DALLE-2, DALLE-3, Midjourney):

python tools/lazyconfig_train_net.py --num-gpus 1 --eval-only \
    --config-file projects/ViTDet/configs/eva2_o365_to_coco/eva02_large_local.py \
    train.output_dir=./outputs/eva02_large_local/250k_on_all_val \
    train.init_checkpoint=pretrained_models/HADM-L_0249999.pth \
    dataloader.evaluator.output_dir=cache/large_local_human_artifact_ALL_val/250k_on_all_val \
    dataloader.evaluator.dataset_name=local_human_artifact_val_ALL \
    dataloader.test.dataset.names=local_human_artifact_val_ALL \
    dataloader.train.total_batch_size=1 \
    train.model_ema.enabled=True \
    train.model_ema.use_ema_weights_for_eval_only=True 

Expected results:

Task: bbox

AP      AP50    AP75    APs     APm     APl
24.907  43.307  25.990  18.322  25.382  32.773

Evaluate HADM-L on a specific domain (SDXL in this example):

python tools/lazyconfig_train_net.py --num-gpus 1 --eval-only \
    --config-file projects/ViTDet/configs/eva2_o365_to_coco/eva02_large_local.py \
    train.output_dir=./outputs/eva02_large_local/250k_on_sdxl_val \
    train.init_checkpoint=pretrained_models/HADM-L_0249999.pth \
    dataloader.evaluator.output_dir=cache/large_local_human_artifact_sdxl_val/250k_on_sdxl_val \
    dataloader.evaluator.dataset_name=local_human_artifact_val_sdxl \
    dataloader.test.dataset.names=local_human_artifact_val_sdxl \
    dataloader.train.total_batch_size=1 \
    train.model_ema.enabled=True \
    train.model_ema.use_ema_weights_for_eval_only=True 

Expected results:

Task: bbox

AP      AP50    AP75    APs     APm     APl
21.141  39.529  21.372  17.813  22.557  26.149

To evaluate on other domains, replace dataloader.evaluator.dataset_name and dataloader.test.dataset.names with local_human_artifact_val_<DOMAIN>, where <DOMAIN> is one of sdxl, mj, dalle2, or dalle3, as in the sketch below.
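
A sketch that sweeps HADM-L over all four validation domains using the same arguments as above:

import subprocess

for domain in ["sdxl", "mj", "dalle2", "dalle3"]:
    dataset = f"local_human_artifact_val_{domain}"
    subprocess.run([
        "python", "tools/lazyconfig_train_net.py", "--num-gpus", "1", "--eval-only",
        "--config-file", "projects/ViTDet/configs/eva2_o365_to_coco/eva02_large_local.py",
        f"train.output_dir=./outputs/eva02_large_local/250k_on_{domain}_val",
        "train.init_checkpoint=pretrained_models/HADM-L_0249999.pth",
        f"dataloader.evaluator.output_dir=cache/large_local_human_artifact_{domain}_val/250k_on_{domain}_val",
        f"dataloader.evaluator.dataset_name={dataset}",
        f"dataloader.test.dataset.names={dataset}",
        "dataloader.train.total_batch_size=1",
        "train.model_ema.enabled=True",
        "train.model_ema.use_ema_weights_for_eval_only=True",
    ], check=True)

Substituting "global" for "local" in the config file, checkpoint, and dataset names gives the corresponding sweep for HADM-G below.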

Global Human Artifact Detection Model (HADM-G)

Evaluate HADM-G on all domains (SDXL, DALLE-2, DALLE-3, Midjourney):

python tools/lazyconfig_train_net.py --num-gpus 1 --eval-only \
    --config-file projects/ViTDet/configs/eva2_o365_to_coco/eva02_large_global.py \
    train.output_dir=./outputs/eva02_large_global/250k_on_all_val \
    train.init_checkpoint=pretrained_models/HADM-G_0249999.pth \
    dataloader.evaluator.output_dir=cache/large_global_human_artifact_ALL_val/250k_on_all_val \
    dataloader.evaluator.dataset_name=global_human_artifact_val_ALL \
    dataloader.test.dataset.names=global_human_artifact_val_ALL \
    dataloader.train.total_batch_size=1 \
    train.model_ema.enabled=True \
    train.model_ema.use_ema_weights_for_eval_only=True 

Expected results:

Task: bbox

AP      AP50    AP75    APs     APm     APl
22.083  25.539  23.993  nan     0.000   22.332

Evaluate HADM-G on a specific domain (SDXL in this example):

python tools/lazyconfig_train_net.py --num-gpus 1 --eval-only \
    --config-file projects/ViTDet/configs/eva2_o365_to_coco/eva02_large_global.py \
    train.output_dir=./outputs/eva02_large_global/250k_on_sdxl_val \
    train.init_checkpoint=pretrained_models/HADM-G_0249999.pth \
    dataloader.evaluator.output_dir=cache/large_global_human_artifact_sdxl_val/250k_on_sdxl_val \
    dataloader.evaluator.dataset_name=global_human_artifact_val_sdxl \
    dataloader.test.dataset.names=global_human_artifact_val_sdxl \
    dataloader.train.total_batch_size=1 \
    train.model_ema.enabled=True \
    train.model_ema.use_ema_weights_for_eval_only=True 

Expected results:

Task: bbox

AP      AP50    AP75    APs     APm     APl
23.674  27.393  25.681  nan     0.000   23.891

Similarly, to evaluate on other domains, replace dataloader.evaluator.dataset_name and dataloader.test.dataset.names with global_human_artifact_val_<DOMAIN>, where <DOMAIN> is one of sdxl, mj, dalle2, or dalle3; the sweep sketch above applies with "local" replaced by "global".

Training

To train the Local Human Artifact Detection Model (HADM-L):

python tools/lazyconfig_train_net.py \
    --config-file projects/ViTDet/configs/eva2_o365_to_coco/eva02_large_local.py \
    --num-gpus=1 train.eval_period=10000 train.log_period=500  \
    train.output_dir=./outputs/eva02_large_local \
    dataloader.evaluator.output_dir=cache/large_local_human_artifact_ALL_val \
    dataloader.train.total_batch_size=4

To train the Global Human Artifact Detection Model (HADM-G):

python tools/lazyconfig_train_net.py \
    --config-file projects/ViTDet/configs/eva2_o365_to_coco/eva02_large_global.py \
    --num-gpus=1 train.eval_period=10000 train.log_period=500  \
    train.output_dir=./outputs/eva02_large_global \
    dataloader.evaluator.output_dir=cache/large_global_human_artifact_ALL_val \
    dataloader.train.total_batch_size=4
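
The commands above use total_batch_size=4 on a single GPU; if you scale the batch across more GPUs, a common heuristic (not necessarily what the shipped configs apply automatically -- check them before relying on it) is to scale the learning rate linearly with the batch size:

def scale_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear LR scaling rule (Goyal et al., 2017)."""
    return base_lr * new_batch / base_batch

# Hypothetical numbers, purely for illustration:
print(scale_lr(base_lr=1e-4, base_batch=4, new_batch=16))  # 0.0004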

Citation

If you find this work useful, please consider citing:

@article{Wang2024HADM,
  title={Detecting Human Artifacts from Text-to-Image Models},
  author={Wang, Kaihong and Zhang, Lingzhi and Zhang, Jianming},
  journal={arXiv preprint arXiv:2411.13842},
  year={2024}
}

Acknowledgment

Our codebase borrows heavily from EVA-02-det and Detectron2.
