GAN Inverter is a GAN inversion toolbox based on the PyTorch library.
We collect SOTA inversion methods and construct a unified pipeline with additional features. Different methods and training strategies are easy to compose and add. We hope this toolbox proves helpful in every use case.
- Implementations of SOTA inversion methods.
- Unified training/evaluation/inference/editing process.
- Modular and flexible configuration. Options are easy to set via config file (yaml) or command line in every phase.
- Additional training features.
- Distributed training.
- Weight & bias (wandb).
- Automatically resume training.
- Evaluation system and inversion benchmark.
- More editing methods.
We are working on supporting inference for more methods and extending the benchmark.
2023.05
: Benchmark v1.0 is released. We now support: pSp, e4e, LSAP, ReStyle, HyperStyle, PTI, SAM, HFGI, DHR (some of them only for inference). Have fun!
2023.02
: SAM is supported.
2023.02
: v1.1. Reorganized code: method classes and the inference pipeline. Added our new work DHR, "What Decreases Editing Capability? Domain-Specific Hybrid Refinement for Improved GAN Inversion".
2022.11
: Added more optimizers; PTI is now supported.
2022.10
: GAN Inverter v1.0 is released. Supported methods: pSp, e4e, LSAP.
2022.09
: LSAP is published on arXiv.
Although previous works categorize inversion methods as "encoder-based", "optimization-based", or "hybrid", these division criteria are no longer adequate. According to the purpose of each method, we divide the inversion process into two steps: Image Embedding and Result Refinement:
- Image Embedding aims to embed images into latent codes by an encoder or by optimization.
- Result Refinement aims to refine the initial inversion and editing results from the first step by various strategies (e.g., adjusting generator weights or intermediate features).
| | Method | Type | Repo | Paper | Source |
|---|---|---|---|---|---|
| ☑️ | pSp | E | code | paper | CVPR2021 |
| ☑️ | e4e | E | code | paper | SIGGRAPH2021 |
| ☑️ | LSAP | E | code | paper | arXiv2022 |
| ☑️ | ReStyle | E | code | paper | ICCV2021 |
| ◻️ | E2Style | E | code | paper | TIP2022 |
| ◻️ | Style Transformer | E | code | paper | CVPR2022 |
| ☑️ | StyleGAN2(LPIPS) | O | code | paper | CVPR2020 |

Note: E and O denote encoder-based and optimization-based methods, respectively.
| | Method | Repo | Paper | Source |
|---|---|---|---|---|
| ☑️ | HyperStyle | code | paper | CVPR2022 |
| ☑️ | HFGI | code | paper | CVPR2022 |
| ☑️ | SAM | code | paper | CVPR2022 |
| ☑️ | PTI | code | paper | TOG2022 |
| ◻️ | FeatureStyleEncoder | code | paper | ECCV2022 |
| ☑️ | Domain-Specific Hybrid Refinement (DHR) | code | paper | arXiv2023 |
| | Method | Repo | Paper | Source |
|---|---|---|---|---|
| ☑️ | InterFaceGAN | code | paper | CVPR2020 |
| ☑️ | GANSpace | code | paper | NeurIPS2020 |
| ◻️ | StyleClip | code | paper | ICCV2021 |
As evaluation settings differ across previous inversion works, we conduct a benchmark based on our unified pipeline to evaluate inversion methods more consistently. See Evaluation for more details.
Evaluation Settings:
- Dataset: CelebA-HQ test split (2,824 images);
- ID: identity similarity measured by a face recognition model;
- LPIPS version: VGG;
- Images are generated and converted to uint8 for evaluation, except for ID and FID, which are evaluated on saved images (png format);
- See `scripts/test.py` for more details.
| Refinement | Embedding | PSNR | MSE | LPIPS | ID | FID | Latency |
|---|---|---|---|---|---|---|---|
| - | Optimization-W | 15.8460 | 0.0670 | 0.1967 | 0.32 | 25.7447 | - |
| - | Optimization-W+ | 20.5940 | 0.0242 | 0.1092 | 0.77 | 17.0812 | - |
| - | pSp | 18.0348 | 0.0345 | 0.1591 | 0.56 | 25.2540 | 41ms |
| - | e4e | 16.6616 | 0.0472 | 0.1974 | 0.50 | 28.4952 | 42ms |
| - | LSAP | 17.4958 | 0.0391 | 0.1765 | 0.53 | 29.3118 | 42ms |
| - | ReStyle-e4e | 17.0903 | 0.0428 | 0.1904 | 0.51 | 25.5141 | 150ms |
| HyperStyle | W-encoder | 20.0864 | 0.0219 | 0.0985 | 0.74 | 21.6660 | 162ms |
| PTI$^\dagger$ | W-pivot | 24.6131 | 0.0082 | 0.0817 | 0.85 | 14.2792 | 59s |
| HFGI | e4e | 20.1402 | 0.0210 | 0.1166 | 0.68 | 16.0659 | 67ms |
| SAM | e4e | 20.5933 | 0.0193 | 0.1442 | 0.57 | 17.3631 | 67s |
| SAM | LSAP | 21.6179 | 0.0152 | 0.1205 | 0.60 | 15.2710 | 67s |
| DHR | e4e | 28.1661 | 0.0035 | 0.0438 | 0.87 | 5.9960 | 12s |
| DHR | LSAP | 28.2786 | 0.0034 | 0.0422 | 0.88 | 6.0594 | 12s |
Note:
- $\dagger$: we do not apply regularization in PTI, following the issue and the official config.
- We recommend using this benchmark and these evaluation settings to evaluate inversion methods in future work.
- Latency is tested on a single RTX 3090 with batch size 1 (TODO).
- The results may be inconsistent with those reported in our paper because of different implementations.
- Some values show very small numerical differences across repeated measurements.
We adopt a unified config system for training/inference/testing/editing. All options are saved in the config file, so they can be conveniently specified for any use.
All options are defined in `options/`, and `options/base_options.py` contains the options shared by every phase.
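For instance, options from a config file can be overridden directly on the command line. A minimal sketch (the override values below are illustrative, and the `--exp_dir` flag name is assumed from the `{exp_dir}` convention used in this README):

```
# Load defaults from the yaml config, then override individual options
# on the command line (values are illustrative; --exp_dir is assumed
# from the {exp_dir} convention used elsewhere in this README).
python scripts/train.py \
    -c configs/e4e/e4e_ffhq_r50.yaml \
    --batch_size 8 \
    --exp_dir ./experiments/e4e_ffhq
```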
We follow two-stage inference in this repository. The base inference class `TwoStageInference` is defined in `./inference/two_stage_inference.py` and follows the image embedding -> result refinement pipeline.
This unified inversion process makes it easy to combine two methods. Users can try any combination, not limited to those employed by the original authors. For example, GANInverter makes it possible to connect ReStyle with HyperStyle via `--embed_mode restyle --refine_mode hyperstyle`, or PTI + e4e via `--embed_mode e4e --refine_mode pti`.
You can now run any method combination by specifying the corresponding config files; a full command sketch follows the list below. See Inference for more details.
For example:
- e4e: `--configs configs/e4e/e4e_ffhq_r50.yaml`
- PTI + e4e: `--configs configs/e4e/e4e_ffhq_r50.yaml configs/pti/pti.yaml`
- DHR + saved latent codes: `--embed_mode code --refine_mode dhr --code_path /path/to/code/xxxx.pt`
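Putting it together, a full inference command for one of these combinations might look like this (dataset and output paths are placeholders):

```
# PTI refinement on top of an e4e embedding; paths are placeholders.
python scripts/infer.py \
    -c configs/e4e/e4e_ffhq_r50.yaml configs/pti/pti.yaml \
    --test_dataset_path /path/to/images \
    --output_dir /path/to/output
```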
You can load a checkpoint in two ways (see the sketch after this list):
- `--checkpoint_path xxxx.pt`: manually set the checkpoint path to load. Although the model architecture differs slightly from previous repositories (e.g., pSp, e4e), the weights are automatically converted to fit our architecture, so you can use their original weight files.
- `--auto_resume True`: automatically load `{exp_dir}/checkpoints/last.pt` in the training phase, or `{exp_dir}/checkpoints/best_model.pt` in other phases.
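For example, to run inference with an original e4e weight file (the checkpoint filename below is hypothetical):

```
# The checkpoint filename is hypothetical; original pSp/e4e weights
# are converted automatically to fit this repository's architecture.
python scripts/infer.py \
    -c configs/e4e/e4e_ffhq_r50.yaml \
    --checkpoint_path pretrained_models/e4e_ffhq_encode.pt
```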
- `--train_dataset_path`, `--batch_size`, and `--workers` are used only in training.
- `--test_dataset_path`, `--test_batch_size`, and `--test_workers` are the default test/inference options in every use; an example follows below.
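For example (the dataset path and values below are illustrative):

```
# Override the default test/inference options; values are illustrative.
python scripts/infer.py \
    -c configs/lsap/lsap_ffhq_r50.yaml \
    --test_dataset_path /path/to/images \
    --test_batch_size 4 \
    --test_workers 4
```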
Please refer to Installation Instructions for the details of installation.
Please refer to Dataset Instructions for the details of datasets.
Use `download_weight.sh` to easily download the pre-trained weights of pSp, e4e, ReStyle, PTI, HFGI, SAM, and DHR.
# To download all models (including all methods and auxiliary models).
sh download_weight.sh all
# To download a specific model
sh download_weight.sh e4e
sh download_weight.sh lsap
# To download auxiliary training weights (e.g., face recognition model for id_loss)
sh download_weight.sh train
# To download auxiliary evaluation weights (e.g., face recognition model for the ID metric)
sh download_weight.sh evaluation
Example: Train LSAP on FFHQ
python scripts/train.py -c configs/lsap/lsap_ffhq_r50.yaml
python -m torch.distributed.launch --nproc_per_node=8 --master_port=12345 scripts/train.py -c configs/lsap/lsap_ffhq_r50.yaml --gpu_num 8
Notes:
- Set `--auto_resume True` to resume training automatically.
- Batch size means the total batch size across all GPUs and must be a multiple of the number of GPUs.
- In our experiments, distributed training with a batch size of 8 may be much slower or accelerate only marginally. For example, one iteration of e4e costs about 50 seconds on either one or two cards. However, distributed training can enlarge the total batch size (batch size 8 costs 21 GB of GPU memory) and may achieve faster convergence through a large batch size and learning rate. If the batch size is increased to 16 on two cards (8 samples per card), the cost per iteration increases only slightly (from 50 to under 60 seconds). We are glad to receive any suggestions for improving distributed training performance. A sketch of this two-card setting follows below.
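A sketch of the two-card, batch-size-16 setting discussed above (the master port is arbitrary):

```
# Two cards, total batch size 16 (8 samples per card); port is arbitrary.
python -m torch.distributed.launch --nproc_per_node=2 --master_port=12345 \
    scripts/train.py -c configs/lsap/lsap_ffhq_r50.yaml --gpu_num 2 --batch_size 16
```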
Example: Train HFGI with LSAP on FFHQ.
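Assuming the multi-config pattern used for inference also applies to training (the HFGI config path `configs/hfgi/hfgi.yaml` below is hypothetical), the invocation might look like:

```
# Hypothetical HFGI config path; configs are combined as in inference.
python scripts/train.py -c configs/lsap/lsap_ffhq_r50.yaml configs/hfgi/hfgi.yaml
```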
You can infer images by:
python scripts/infer.py -c /path/to/config1 /path/to/config2
or
python scripts/infer.py \
--embed_mode [embed_mode] \
--refine_mode [refine_mode] \
--test_dataset_path [/path/to/dataset] \
--output_dir [/path/to/output] \
--save_code [true/false] \
--checkpoint_path [/path/to/checkpoint]
- `--save_code`: whether to save latent codes.
- `--test_dataset_path`: image file or folder.
- `--output_dir`: path to save inversion results. Inverted images will be saved in `{output_dir}/inversion/` and latent codes in `{output_dir}/code/`. If not set, `{exp_dir}/inference/` is used by default.
- `--checkpoint_path`: model weight.
Example 1: LSAP on CelebA-HQ.
python scripts/infer.py -c configs/lsap/lsap_ffhq_r50.yaml
Example 2: Optimization on CelebA-HQ.
python scripts/infer.py -c configs/optim/optim_celeba-hq.yaml
- `--stylegan_weights`: path to the pre-trained StyleGAN weights.
Example 3: PTI + e4e.
python scripts/infer.py -c configs/e4e/e4e_ffhq_r50.yaml configs/pti/pti.yaml
We have three `embed_mode` options in editing: `encoder`, `optim`, and `code`.
python scripts/edit.py -c configs/lsap/lsap_ffhq_r50.yaml --edit_mode interfacegan --edit_path editing/interfacegan_directions/age.pt --edit_factor 1.0
If you have already inferred images and saved the latent codes, you can edit those latent codes without re-running inversion. We recommend the "inference -> edit" pipeline, since editing with various attributes and factors then incurs no extra inversion time.
python scripts/edit.py -c configs/optim/optim_celeba-hq.yaml --embed_mode code --test_dataset_path /path/to/latent/codes/ --edit_mode interfacegan --edit_path editing/interfacegan_directions/age.pt --edit_factor 1.0
Three metrics (PSNR, MSE, and LPIPS) can be evaluated during testing:
python scripts/test.py -c configs/e4e/e4e_ffhq_r50.yaml
You need to save inference results to evaluate FID with pytorch-fid:
python -m pytorch_fid /path/to/dataset/ /path/to/inference/results/
You need to save inference results to evaluate ID similarity with a face recognition model:
python scripts/calc_id_loss.py --gt_path /path/to/dataset/ --data_path /path/to/inference/results/
If you use this toolbox for your research, please cite our repo.
@misc{cao2022ganinverter,
author = {Pu Cao and Dongxu Liu and Lu Yang and Qing Song},
title = {GAN Inverter},
howpublished = {\url{https://github.com/caopulan/GANInverter}},
year = {2022},
}
@article{cao2022lsap,
title={LSAP: Rethinking Inversion Fidelity, Perception and Editability in GAN Latent Space},
author={Cao, Pu and Yang, Lu and Liu, Dongxu and Liu, Zhiwei and Wang, Wenguan and Li, Shan and Song, Qing},
journal={arXiv preprint arXiv:2209.12746},
year={2022}
}