
On Memorization in Diffusion Models

Code for the paper On Memorization in Diffusion Models.

Environments

  • We run all our experiments on A100 GPUs.

  • Python 3.8, PyTorch 1.13, and CUDA 11.8.

  • Run the following command to install the required Python libraries:

    • pip install -r requirements.txt

Datasets

We run our experiments on the CIFAR-10 and ImageNet datasets.

CIFAR-10 can be downloaded and saved to datasets/cifar10 by the following commands:

mkdir datasets
mkdir datasets/cifar10
wget -P datasets/cifar10 https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz

Prepare the full training dataset of CIFAR-10 with $|\mathcal{D}|=50\text{k}$:

python dataset_tool.py --source=datasets/cifar10/cifar-10-python.tar.gz --dest=datasets/cifar10/cifar10-train.zip

To download ImageNet, please refer to the ImageNet Object Localization Challenge and save it to datasets/imagenet.

Optimal Diffusion Model

First, we compare images generated by the theoretical optimum with images generated by a state-of-the-art diffusion model (EDM). These experiments run on a single A100 GPU.

We include the implementation of the theoretical optimum in training/optim.py. We use the following command to generate images with this theoretical optimum:

torchrun --standalone --nproc_per_node=1 generate_optim.py --outdir=fid-tmp-optim --seeds=0-49999 --subdirs --network=datasets/cifar10/cifar10-train.zip
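
The theoretical optimum here is the closed-form minimizer of the denoising objective over the empirical training distribution: at noise level $\sigma$, the optimal denoiser is the posterior mean of the training images, i.e., a softmax-weighted average whose weights depend on the squared distances between the noisy input and each training image. The repository's implementation (with EDM preconditioning) is in training/optim.py; the snippet below is only a minimal PyTorch sketch of this formula, and optimal_denoiser is an illustrative name, not a function from the repository.

import torch

def optimal_denoiser(x, sigma, train_data):
    """Posterior mean E[y | x, sigma] for an empirical dataset {y_i}.

    x          : (B, C, H, W) noisy images at noise level sigma
    sigma      : scalar noise level
    train_data : (N, C, H, W) training images

    The optimal denoiser is a softmax-weighted average of the training
    images, with weights proportional to exp(-||x - y_i||^2 / (2 sigma^2)).
    """
    x_flat = x.reshape(x.shape[0], -1)                     # (B, D)
    y_flat = train_data.reshape(train_data.shape[0], -1)   # (N, D)
    dist2 = torch.cdist(x_flat, y_flat) ** 2               # (B, N) squared l2 distances
    weights = torch.softmax(-dist2 / (2 * sigma ** 2), dim=1)
    return (weights @ y_flat).reshape_as(x)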

We use the following command to generate images with EDM:

torchrun --standalone --nproc_per_node=1 generate.py --outdir=fid-tmp-edm --seeds=0-49999 --subdirs --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/edm-cifar10-32x32-uncond-vp.pkl

Empirical Study

The basic procedure to evaluate the contribution of a factor to memorization in diffusion models is as follows:

Step I: Sample a training dataset of size $|\mathcal{D}|$. The code is in dataset_utils, which is introduced later. The sampled dataset is saved to $data_path.
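
The repository's own sampling utilities live in dataset_utils; the snippet below is only a hypothetical illustration of the idea, drawing a fixed-seed random subset of $|\mathcal{D}|$ indices from the 50k CIFAR-10 training images (sample_subset is not a function from this repository).

import numpy as np

def sample_subset(num_total, subset_size, seed=0):
    # Hypothetical helper: draw |D| training indices uniformly at random,
    # with a fixed seed so the subset is reproducible across experiments.
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(num_total, size=subset_size, replace=False))

# e.g. a |D| = 2k subset of the 50k CIFAR-10 training images
indices = sample_subset(num_total=50_000, subset_size=2_000, seed=0)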

Step II: Train a diffusion model on the training data.

All of our model-training experiments run on 8 A100 GPUs using DDP with multi-node training. The basic command is

torchrun --nproc_per_node 1 \
         --nnodes $WORLD_SIZE \
         --node_rank $RANK \
         --master_addr $MASTER_ADDR \
         --master_port $MASTER_PORT \
         train.py --outdir=$savedir --argument=$argument

Alternatively, you can use the following command for DDP with single-node training:

torchrun --standalone --nproc_per_node=8 train.py --outdir=$savedir --argument=$argument

We suggest providing a unique $savedir for each experiment; $argument includes all hyper-parameters.

Step III: Evaluate the snapshots of this trained diffusion model and report the highest memorization ratio.

torchrun --standalone --nproc_per_node=$num_gpu mem_ratio.py --expdir=$outdir --knn-ref=$data_path --log=$outdir/mem_traj.log --seeds=0-9999 --subdirs --batch=512

$outdir refers to the folder containing all model snapshots.
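
mem_ratio.py reports the fraction of generated images that are judged to replicate training images; the exact criterion is defined in the paper and in mem_ratio.py. As a rough illustration only, a common nearest-neighbor rule counts a generated image as memorized when its nearest training image is much closer (in $\ell_2$ distance) than its second-nearest one; memorization_ratio and the 1/3 threshold below are assumptions of this sketch.

import torch

def memorization_ratio(generated, train_data, threshold=1/3):
    # Sketch of a nearest-neighbor memorization criterion: a generated image
    # counts as memorized if its nearest training image is at most `threshold`
    # times as far away as its second-nearest one (l2 distance in pixel space).
    g = generated.reshape(generated.shape[0], -1)
    t = train_data.reshape(train_data.shape[0], -1)
    dists = torch.cdist(g, t)                        # (num_generated, |D|)
    top2 = dists.topk(k=2, largest=False).values     # two smallest distances, ascending
    memorized = top2[:, 0] <= threshold * top2[:, 1]
    return memorized.float().mean().item()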

Step IV: Gradually increase the training dataset size $|\mathcal{D}|$, repeat Step I to Step III, and find the Effective Model Memorization (EMM), i.e., the maximum $|\mathcal{D}|$ at which the learned model still approximates the (memorizing) theoretical optimum.
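
Assuming the memorization ratios have already been measured on a grid of dataset sizes, one minimal sketch of locating the EMM is to find where the ratio drops below a chosen level and interpolate between the two neighboring sizes; estimate_emm and the 0.9 threshold below are hypothetical, and the paper defines the exact criterion.

import numpy as np

def estimate_emm(sizes, mem_ratios, threshold=0.9):
    # Hypothetical helper: find the largest dataset size |D| at which the
    # measured memorization ratio still reaches `threshold`, interpolating
    # linearly between the two neighboring measurements.
    sizes = np.asarray(sizes, dtype=float)
    ratios = np.asarray(mem_ratios, dtype=float)
    for i in range(len(sizes) - 1):
        if ratios[i] >= threshold > ratios[i + 1]:
            frac = (ratios[i] - threshold) / (ratios[i] - ratios[i + 1])
            return sizes[i] + frac * (sizes[i + 1] - sizes[i])
    # All ratios above (or all below) the threshold: return the boundary size.
    return sizes[-1] if ratios[-1] >= threshold else sizes[0]

# e.g. ratios measured at |D| in {1k, 2k, 5k, 10k}
print(estimate_emm([1000, 2000, 5000, 10000], [1.00, 0.98, 0.55, 0.05]))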

Step V: Modify the value of the evaluated factor, and then repeat Step I to Step IV to observe the effect of this factor on memorization.

We provide all the scripts to reproduce our experimental results in the paper.

  • Data distribution $P$: refer to scripts/data_distribution.md. Here we highlight that data dimensionality contributes significantly to memorization in diffusion models.

  • Model configuration $\mathcal{M}$: refer to scripts/model_config.md. Here we highlight that skip connections at higher resolutions play an important role in memorization.

  • Training procedure $\mathcal{T}$: refer to scripts/train_procedure.md.

  • Unconditional vs. conditional generation: refer to scripts/conditional.md. Here we highlight that random labels as conditions can trigger memorization in diffusion models.

Finally, we highlight that conditional EDM with unique labels as conditions can largely memorize the training data even at $|\mathcal{D}|=50\text{k}$, in contrast to unconditional EDM.

References

If you find the code useful for your research, please consider citing our paper.

@article{gu2023memorization,
      title={On Memorization in Diffusion Models}, 
      author={Xiangming Gu and Chao Du and Tianyu Pang and Chongxuan Li and Min Lin and Ye Wang},
      journal={arXiv preprint arXiv:2310.02664},
      year={2023}
}

Acknowledgements

Our code is modified from the official implementation of EDM.
