Code for the paper:
- Heljakka, A., Solin, A., Kannala, J. (2020). Towards Photographic Image Manipulation with Balanced Growing of Generative Autoencoders. In: IEEE Winter Conference on Applications of Computer Vision. [arXiv]
Implementation by Ari Heljakka, based on [2] (which used [3]-[4] and the h5tool from [5]). The master branch of this repository may continue to be updated, and is not guaranteed to be compatible with the pre-trained models of [1].
Tested with:
- Python 3.6.5
- CUDA v9.0.176
- PyTorch v0.5.0a0
The batch sizes have been selected to enable running on 12 GB of GPU memory. Note that due to PyTorch issue #12671, a single-GPU setup is the most reliable option for training. When training, please ensure you have enough disk space for checkpoints.
For detailed Python package configuration used, see requirements.txt.
Pre-trained models are available for each dataset.
You can run them on the command line with the usage examples below, inserting the proper dataset name (e.g. `-d celebaHQ`), path name (e.g. `--save_dir CelebAHQ`) and checkpoint ID (e.g. `--start_iteration=36200000`).
To reconstruct & interpolate external face images, or produce random samples, you only need the pre-trained models. Please note that the input face images must be cropped and aligned as in CelebA-HQ.
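The exact CelebA-HQ alignment is landmark-based (see the Progressive GAN repository referenced below for the full procedure). As a rough, hypothetical starting point for already near-frontal portraits only, a minimal center-crop-and-resize sketch (all paths are placeholders):

```python
# Rough approximation only: the real CelebA-HQ pipeline aligns faces
# using facial landmarks. This sketch just center-crops and resizes.
from PIL import Image

def rough_align(in_path, out_path, size=256):
    img = Image.open(in_path).convert('RGB')
    w, h = img.size
    s = min(w, h)
    # Center crop to a square, then resize to the model resolution.
    left, top = (w - s) // 2, (h - s) // 2
    img = img.crop((left, top, left + s, top + s))
    img.resize((size, size), Image.LANCZOS).save(out_path)

rough_align('input.jpg', 'my-image-path/input/input.png')
```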
You need the datasets only for training or for reproducing the reconstruction results shown in the paper.
Supported datasets are:
- LSUN bedrooms
  - Note that the first run of any LSUN usage takes a very long time just to build the index (for training, this can take hours).
- CelebA-HQ
  - You need to put the dataset in H5 format for PyTorch, separately for training and testing, as follows:
    - Download the CelebA dataset (the original images, NOT the aligned & cropped version).
    - Download the delta files from the author's repository at https://github.com/tkarras/progressive_growing_of_gans (see "Data files needed to reconstruct the CelebA-HQ dataset").
    - Run the `dataset-tools/h5tool.py` we provide (which, unlike the original CelebA-HQ script, contains the train/test split). For the syntax, run `python h5tool.py -h`. Run it first with `data_split=train`, then with `data_split=test`. This will create a separate 27k/3k split for training and testing. A sketch for sanity-checking the resulting files follows this list.
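To sanity-check the resulting H5 files, you can inspect them with h5py. This is only a sketch: the dataset keys depend on the h5tool output layout, so print them to see what your files actually contain.

```python
# Sketch: list the contents of the generated H5 files.
# The exact per-resolution dataset keys depend on the h5tool output.
import h5py

for path in ['/data/celeba_train.h5', '/data/celeba_test.h5']:
    with h5py.File(path, 'r') as f:
        print(path)
        f.visititems(lambda name, obj: print(' ', name, getattr(obj, 'shape', '')))
```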
Install the dependencies with:
```
pip install -r requirements.txt
```
If you use TensorboardX, please run `pip install tensorboardx`. Otherwise, you need to provide the command-line argument `--no_TB`.
For all command-line arguments, run:
```
python -m pioneer.train -h
```
Below, examples for typical use cases are given. For other arguments, the defaults should be fine.
To resume training from a checkpoint, it is sufficient to add `--start_iteration=N`, where N is the step number of your latest state file (e.g. for `checkpoint/256000_state`, N=256000), or `--start_iteration=-1` to use the latest checkpoint in your `save_dir`.
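As an illustration (not the repository's own code) of what resolving the latest checkpoint amounts to, assuming state files named `<step>_state` as in the example above:

```python
# Sketch: resolve the latest checkpoint step in save_dir/checkpoint,
# assuming files are named '<step>_state' as in the example above.
import os, re

def latest_iteration(save_dir):
    ckpt_dir = os.path.join(save_dir, 'checkpoint')
    steps = [int(m.group(1))
             for name in os.listdir(ckpt_dir)
             for m in [re.match(r'^(\d+)_state$', name)] if m]
    return max(steps) if steps else None

print(latest_iteration('celebaHQ256'))  # e.g. 25480000
```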
Note that you need to place both the checkpoint (`_state` file) and the U matrix snapshot (`_SNU` file) in the checkpoint directory (both are found in the pre-trained model .zip archives). This exactly replicates the U matrix used by the spectral norm; otherwise, the matrix needs to be re-calculated [6], which can be attempted by starting with a dry run.
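For context, the U matrix is the persistent power-iteration state that spectral normalization [6] uses to estimate each layer's largest singular value, which is why it must be restored (or re-estimated) together with the weights. A generic sketch of that estimate, not this repository's implementation:

```python
# Sketch of the power iteration behind spectral normalization [6].
# Generic illustration only, not the code used in this repository.
import torch

def spectral_norm_estimate(W, u, n_iters=1, eps=1e-12):
    # W: (out, in) weight matrix; u: persistent (out,) vector.
    for _ in range(n_iters):
        v = torch.nn.functional.normalize(W.t() @ u, dim=0, eps=eps)
        u = torch.nn.functional.normalize(W @ v, dim=0, eps=eps)
    sigma = torch.dot(u, W @ v)  # estimated largest singular value
    return sigma, u  # u must persist between steps, hence the _SNU file

W = torch.randn(64, 128)
u = torch.randn(64)
sigma, u = spectral_norm_estimate(W, u, n_iters=5)
```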
You can test trained models by giving their directory as `save_dir`. The checkpoints are saved under `[save_dir]/checkpoint`.
All examples show a sample checkpoint step count in the `--start_iteration` argument.
When using the pre-trained models (or replicating the earlier training runs), use `--e_last_relu --n_generator=2`. Otherwise, for training, we recommend `--n_generator=1`.
- CelebA-HQ, 256x256 (reconstruct and interpolate test-set images):
```
python -m pioneer.train -d celebaHQ --start_iteration=25480000 --save_dir celebaHQ256 --test_path /data/celeba_test.h5 --sample_N=256 --reconstructions_N=10 --interpolate_N=3 --max_phase=6 --e_last_relu --testonly
```
- CelebA-HQ, 256x256 (random samples):
```
python -m pioneer.train -d celebaHQ --start_iteration=25480000 --save_dir celebaHQ256 --sample_N=16 --max_phase=6 --e_last_relu --testonly
```
- LSUN, 256x256 (random samples):
```
python -m pioneer.train -d lsun --start_iteration=25400000 --save_dir lsun256 --sample_N=128 --max_phase=6 --e_last_relu --testonly
```
- CelebA-HQ, 256x256 (reconstruct and interpolate your own input images):
```
python -m pioneer.train -d celebaHQ --start_iteration=25480000 --save_dir celebaHQ256 --reconstructions_N=10 --interpolate_N=6 --max_phase=6 --testonly --aux_inpath [my-image-path] --e_last_relu --aux_outpath celebaHQ/out256
```
Note that, for legacy reasons, images must be found at `my-image-path/sub-directory-name/*.png`, where `sub-directory-name` can be anything. `reconstructions_N` must not exceed the number of input files available.
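A small sketch for staging input images into that layout; all paths here are placeholders:

```python
# Sketch: stage aligned .png images into the expected layout,
# i.e. <aux_inpath>/<anything>/*.png. All paths are placeholders.
import shutil
from pathlib import Path

src = Path('my_aligned_faces')          # your aligned .png files
dst = Path('my-image-path') / 'input'   # sub-directory name can be anything
dst.mkdir(parents=True, exist_ok=True)
for p in sorted(src.glob('*.png')):
    shutil.copy(p, dst / p.name)
```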
You can dump a subset of a training (or testing) set as separate images, e.g. for computing the Fréchet Inception Distance (FID) and other metrics; a sketch of the final FID computation follows the examples below. You can vary the exact phase and alpha to see the fade-in effects.
- LSUN, 256x256:
```
python -m pioneer.train -d lsun --dump_trainingset_N=20 --dump_trainingset_dir=refLSUN_20 --start_phase=6 --max_phase=6 --force_alpha=1.0 --train_path /data/lsun
```
- CelebA-HQ, 256x256:
```
python -m pioneer.train -d celebaHQ --dump_trainingset_N=20 --dump_trainingset_dir=refCAHQ_20 --start_phase=6 --max_phase=6 --force_alpha=1.0 --train_path /data/celeba_train.h5
```
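For reference, FID is the Fréchet distance between Gaussians fitted to Inception features of the real (dumped) and generated images. A minimal sketch of the final computation, assuming you have already extracted (N, 2048) feature arrays with a standard Inception-v3 feature extractor:

```python
# Sketch: Fréchet distance between Gaussians fitted to Inception
# features of real vs. generated images. 'feats_*' are assumed to be
# (N, 2048) NumPy arrays from a standard Inception feature extractor.
import numpy as np
from scipy import linalg

def fid(feats_real, feats_fake):
    mu1, mu2 = feats_real.mean(0), feats_fake.mean(0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(s1 @ s2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard numerical imaginary parts
    diff = mu1 - mu2
    return diff @ diff + np.trace(s1 + s2 - 2.0 * covmean)
```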
For an example of how to create the attribute vectors, and how to use the pre-provided vectors, please see the Jupyter Notebook.
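The notebook is the authoritative reference for this repository. As a generic sketch of the underlying idea (with placeholder latents and dimensions), an attribute vector is typically the difference between the mean latent codes of images with and without the attribute:

```python
# Generic sketch of latent attribute arithmetic; see the notebook for
# how this repository actually builds and applies its vectors.
import torch

def attribute_vector(z_pos, z_neg):
    # z_pos/z_neg: (N, dim) codes of images with/without the attribute
    return z_pos.mean(0) - z_neg.mean(0)

def apply_attribute(z, v, strength=1.0):
    return z + strength * v  # decode the result to see the edit

z_smiling = torch.randn(100, 512)  # placeholder encoded latents
z_neutral = torch.randn(100, 512)
v = attribute_vector(z_smiling, z_neutral)
z_edited = apply_attribute(torch.randn(512), v, strength=0.5)
```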
Training uses the pre-configured scheduler. For new datasets, you can configure the schedule of phases, margin values, and learning rates in the `makeTS(...)` method of `train.py`. See the existing code for how to set them up.
The resolution of each phase is defined in powers of 2, as follows: 0 = 4x4, ..., 3 = 32x32, 4 = 64x64, 5 = 128x128, 6 = 256x256.
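Equivalently, the resolution at phase p is 4·2^p:

```python
# Phase-to-resolution mapping: 0 -> 4x4, ..., 6 -> 256x256.
def phase_resolution(phase):
    return 4 * 2 ** phase

assert [phase_resolution(p) for p in (0, 3, 6)] == [4, 32, 256]
```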
Note that all loss function hyper-parameters in the code base are scaled by a factor of 0.1 in comparison to the paper.
Training on CelebA-HQ is very stable, but with other datasets, unstable outcomes are possible (requiring a restart). In all cases, you should use the last checkpoint with stable-looking samples (and the best FID).
- CelebA-HQ, 256x256:
```
python -m pioneer.train -d celebaHQ --save_dir celebaHQ_quicktest --train_path /data/celebaHQ_train.h5 --test_path /data/celebaHQ_test.h5 --sample_N=16 --reconstructions_N=8 --interpolate_N=0 --max_phase=6 --n_generator=1 --total_kimg=30000
```
- LSUN Bedrooms, 256x256:
```
python -m pioneer.train -d lsun --save_dir lsun_quicktest --train_path /data/lsun --sample_N=16 --reconstructions_N=0 --interpolate_N=0 --max_phase=6 --n_generator=1 --total_kimg=30000
```
In order to resume training from the existing (latest) checkpoint, remember to use `--start_iteration=-1`.
For all correspondence, please contact ari.heljakka@aalto.fi. Support and email replies are not always guaranteed, but we appreciate and will evaluate all feedback.
[1] Heljakka, A., Solin, A., Kannala, J. (2020). Towards Photographic Image Manipulation with Balanced Growing of Generative Autoencoders. In: IEEE Winter Conference on Applications of Computer Vision (WACV). arXiv

[6] Miyato, T., Kataoka, T., Koyama, M., and Yoshida, Y. (2018). Spectral Normalization for Generative Adversarial Networks. In: Proceedings of the International Conference on Learning Representations (ICLR).
This software is distributed under the MIT License; please refer to the file LICENSE, included with the software, for details.
Please cite our work as follows:
```
@inproceedings{Heljakka+Solin+Kannala:2020,
      title = {Towards Photographic Image Manipulation with Balanced Growing of Generative Autoencoders},
     author = {Heljakka, Ari and Solin, Arno and Kannala, Juho},
  booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
       year = {2020}
}
```