diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md (+71)
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md (+11)
diff --git a/LICENSE b/LICENSE (+39)
diff --git a/README.md b/README.md (+114)
diff --git a/evaluate_fid.py b/evaluate_fid.py (+144)
diff --git a/guided_samples.jpeg b/guided_samples.jpeg (binary, 1.26 MB)
@@ -0,0 +1,71 @@
+# Code of Conduct
+
+## Our Pledge
+
+In the interest of fostering an open and welcoming environment, we as
+contributors and maintainers pledge to make participation in our project and
+our community a harassment-free experience for everyone, regardless of age, body
+size, disability, ethnicity, sex characteristics, gender identity and expression,
+level of experience, education, socio-economic status, nationality, personal
+appearance, race, religion, or sexual identity and orientation.
+
+## Our Standards
+
+Examples of behavior that contributes to creating a positive environment
+include:
+
+* Using welcoming and inclusive language
+* Being respectful of differing viewpoints and experiences
+* Gracefully accepting constructive criticism
+* Focusing on what is best for the community
+* Showing empathy towards other community members
+
+Examples of unacceptable behavior by participants include:
+
+* The use of sexualized language or imagery and unwelcome sexual attention or
+  advances
+* Trolling, insulting/derogatory comments, and personal or political attacks
+* Public or private harassment
+* Publishing others' private information, such as a physical or electronic
+  address, without explicit permission
+* Other conduct which could reasonably be considered inappropriate in a
+  professional setting
+
+## Our Responsibilities
+
+Project maintainers are responsible for clarifying the standards of acceptable
+behavior and are expected to take appropriate and fair corrective action in
+response to any instances of unacceptable behavior.
+
+Project maintainers have the right and responsibility to remove, edit, or
+reject comments, commits, code, wiki edits, issues, and other contributions
+that are not aligned to this Code of Conduct, or to ban temporarily or
+permanently any contributor for other behaviors that they deem inappropriate,
+threatening, offensive, or harmful.
+
+## Scope
+
+This Code of Conduct applies within all project spaces, and it also applies when
+an individual is representing the project or its community in public spaces.
+Examples of representing a project or community include using an official
+project e-mail address, posting via an official social media account, or acting
+as an appointed representative at an online or offline event. Representation of
+a project may be further defined and clarified by project maintainers.
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be
+reported by contacting the open source team at [[email protected]](mailto:[email protected]). All
+complaints will be reviewed and investigated and will result in a response that
+is deemed necessary and appropriate to the circumstances. The project team is
+obligated to maintain confidentiality with regard to the reporter of an incident.
+Further details of specific enforcement policies may be posted separately.
+
+Project maintainers who do not follow or enforce the Code of Conduct in good
+faith may face temporary or permanent repercussions as determined by other
+members of the project's leadership.
+
+## Attribution
+
+This Code of Conduct is adapted from the [Contributor Covenant](https://www.contributor-covenant.org), version 1.4,
+available at [https://www.contributor-covenant.org/version/1/4/code-of-conduct.html](https://www.contributor-covenant.org/version/1/4/code-of-conduct.html)
@@ -0,0 +1,11 @@
+# Contribution Guide
+
+Thanks for your interest in contributing. This project was released to accompany a research paper for purposes of reproducibility, and beyond its publication there are limited plans for future development of the repository.
+
+While we welcome new pull requests and issues, please note that our response may be limited. Forks and out-of-tree improvements are strongly encouraged.
+
+## Before you get started
+
+By submitting a pull request, you represent that you have the right to license your contribution to Apple and the community, and agree by submitting the patch that your contributions are licensed under the [LICENSE](LICENSE).
+
+We ask that all community members read and observe our [Code of Conduct](CODE_OF_CONDUCT.md).
@@ -0,0 +1,39 @@
+Copyright (C) 2024 Apple Inc. All Rights Reserved.
+
+IMPORTANT:  This Apple software is supplied to you by Apple
+Inc. ("Apple") in consideration of your agreement to the following
+terms, and your use, installation, modification or redistribution of
+this Apple software constitutes acceptance of these terms.  If you do
+not agree with these terms, please do not use, install, modify or
+redistribute this Apple software.
+
+In consideration of your agreement to abide by the following terms, and
+subject to these terms, Apple grants you a personal, non-exclusive
+license, under Apple's copyrights in this original Apple software (the
+"Apple Software"), to use, reproduce, modify and redistribute the Apple
+Software, with or without modifications, in source and/or binary forms;
+provided that if you redistribute the Apple Software in its entirety and
+without modifications, you must retain this notice and the following
+text and disclaimers in all such redistributions of the Apple Software.
+Neither the name, trademarks, service marks or logos of Apple Inc. may
+be used to endorse or promote products derived from the Apple Software
+without specific prior written permission from Apple.  Except as
+expressly stated in this notice, no other rights or licenses, express or
+implied, are granted by Apple herein, including but not limited to any
+patent rights that may be infringed by your derivative works or by other
+works in which the Apple Software may be incorporated.
+
+The Apple Software is provided by Apple on an "AS IS" basis.  APPLE
+MAKES NO WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION
+THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY AND FITNESS
+FOR A PARTICULAR PURPOSE, REGARDING THE APPLE SOFTWARE OR ITS USE AND
+OPERATION ALONE OR IN COMBINATION WITH YOUR PRODUCTS.
+
+IN NO EVENT SHALL APPLE BE LIABLE FOR ANY SPECIAL, INDIRECT, INCIDENTAL
+OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+INTERRUPTION) ARISING IN ANY WAY OUT OF THE USE, REPRODUCTION,
+MODIFICATION AND/OR DISTRIBUTION OF THE APPLE SOFTWARE, HOWEVER CAUSED
+AND WHETHER UNDER THEORY OF CONTRACT, TORT (INCLUDING NEGLIGENCE),
+STRICT LIABILITY OR OTHERWISE, EVEN IF APPLE HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGE.
@@ -0,0 +1,114 @@
+# Normalizing Flows are Capable Generative Models
+
+This repo contains code that accompanies the research paper [Normalizing Flows are Capable Generative Models](https://arxiv.org/abs/2412.06329).
+
+![Teaser image](guided_samples.jpeg) 
+
+# Setup
+
+```bash
+pip install -r requirements.txt
+```
+
+# Preparing datasets
+
+Download the datasets you want to experiment with:
+- [ImageNet](https://www.image-net.org/download.php)
+- [ImageNet64](https://arxiv.org/abs/1601.06759)
+- [AFHQ](https://www.kaggle.com/datasets/dimensi0n/afhq-512)
+
+Save only the training files, as `data/<dataset>/<category>/<filename>`; the code does not use the validation or test files.
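The expected layout is one subdirectory per category under `data/<dataset>/`. As a quick pre-flight check before computing statistics or training, a sketch like the one below counts image files per category. The helper `count_images_per_category` and the accepted extensions are assumptions for illustration, not part of this repository.

```python
import pathlib
import tempfile

# Assumed set of image extensions; adjust to match your downloaded files.
IMAGE_SUFFIXES = {'.jpg', '.jpeg', '.png'}


def count_images_per_category(root: pathlib.Path) -> dict[str, int]:
    """Count image files in each <category>/ subdirectory of data/<dataset>/."""
    counts = {}
    for category_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[category_dir.name] = sum(
            1 for f in category_dir.iterdir() if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


# Demo on a throwaway tree shaped like data/afhq/<category>/<filename>.
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp) / 'afhq'
    for category, names in {'cat': ['a.jpg', 'b.jpg'], 'dog': ['c.png']}.items():
        (root / category).mkdir(parents=True)
        for name in names:
            (root / category / name).touch()
    print(count_images_per_category(root))  # {'cat': 2, 'dog': 1}
```

Loose files directly under `data/<dataset>/` are ignored, since only subdirectories are treated as categories.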
+
+Compute and save FID statistics for the real data distribution; `evaluate_fid.py` loads them from `data/`:
+```bash
+# Files are saved in ./data
+torchrun --standalone --nproc_per_node=8 prepare_fid_stats.py --dataset=imagenet64 --img_size=64  # Unconditional
+torchrun --standalone --nproc_per_node=8 prepare_fid_stats.py --dataset=imagenet --img_size=64    # Conditional
+torchrun --standalone --nproc_per_node=8 prepare_fid_stats.py --dataset=imagenet --img_size=128   # Conditional
+torchrun --standalone --nproc_per_node=8 prepare_fid_stats.py --dataset=afhq --img_size=256       # Conditional
+```
+
+Note: To run on a single GPU, replace `torchrun --standalone --nproc_per_node=8` with `python`, as in:
+```bash
+python prepare_fid_stats.py --dataset=imagenet --img_size=64    # Conditional
+```
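The saved statistics summarize the real images as a Gaussian over Inception features (a mean vector and a covariance matrix), and FID is the Fréchet distance between that Gaussian and the one fitted to generated samples: $\|\mu_r-\mu_g\|^2 + \operatorname{Tr}(\Sigma_r+\Sigma_g-2(\Sigma_r\Sigma_g)^{1/2})$. The NumPy sketch below computes that distance from two sets of statistics. It is illustrative only: `evaluate_fid.py` delegates the computation to `utils.FID`, whose constructor arguments (`reset_real_features`, `normalize`) resemble torchmetrics' `FrechetInceptionDistance`, but that resemblance is an assumption about its implementation.

```python
import numpy as np


def frechet_distance(mu1: np.ndarray, sigma1: np.ndarray, mu2: np.ndarray, sigma2: np.ndarray) -> float:
    """Fréchet distance between the Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # Tr((sigma1 @ sigma2)^{1/2}) is the sum of square roots of the eigenvalues of
    # sigma1 @ sigma2, which are real and non-negative for PSD inputs (up to round-off).
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2 * tr_sqrt)


rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 4))
mu, sigma = feats.mean(axis=0), np.cov(feats, rowvar=False)
print(frechet_distance(mu, sigma, mu, sigma))        # ~0 for identical statistics
print(frechet_distance(mu, sigma, mu + 1.0, sigma))  # ~4: only the mean term changes
```

Using eigenvalues instead of an explicit matrix square root keeps the sketch NumPy-only; production FID code typically uses `scipy.linalg.sqrtm` on 2048-dimensional Inception features.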
+
+# Training
+
+To reproduce the results from the paper:
+```bash
+# Unconditional ImageNet64 (8 GPUs)
+torchrun --standalone --nproc_per_node=8 train.py --dataset=imagenet64 --img_size=64 --channel_size=3\
+  --patch_size=2 --channels=768 --blocks=8 --layers_per_block=8\
+  --noise_std=0.05 --batch_size=256 --epochs=200 --lr=1e-4 --nvp\
+  --sample_freq=5 --logdir=runs/imagenet64-uncond
+
+# Conditional ImageNet64 (8 GPUs)
+torchrun --standalone --nproc_per_node=8 train.py --dataset=imagenet --img_size=64 --channel_size=3\
+  --patch_size=2 --channels=768 --blocks=8 --layers_per_block=8\
+  --noise_std=0.05 --batch_size=256 --epochs=200 --lr=1e-4 --nvp --cfg=0 --drop_label=0.1\
+  --sample_freq=5 --logdir=runs/imagenet64-cond
+
+# Conditional ImageNet128 (needs 4 nodes, 32 GPUs total; --standalone launches a single node, so adapt with torchrun's --nnodes and rendezvous options)
+torchrun --standalone --nproc_per_node=8 train.py --dataset=imagenet --img_size=128 --channel_size=3\
+  --patch_size=4 --channels=1024 --blocks=8 --layers_per_block=8\
+  --noise_std=0.15 --batch_size=768 --epochs=320 --lr=1e-4 --nvp --cfg=0 --drop_label=0.1\
+  --sample_freq=20 --logdir=runs/imagenet128-cond
+
+# AFHQ (8 GPUs)
+torchrun --standalone --nproc_per_node=8 train.py --dataset=afhq --img_size=256 --channel_size=3\
+  --patch_size=8 --channels=768 --blocks=8 --layers_per_block=8\
+  --noise_std=0.07 --batch_size=256 --epochs=4000 --lr=1e-4 --nvp --cfg=0 --drop_label=0.1\
+  --sample_freq=200 --logdir=runs/afhq256
+```
+
+
+For a single GPU:
+```bash
+python train.py --dataset=imagenet64 --img_size=64 --channel_size=3\
+  --patch_size=2 --channels=768 --blocks=8 --layers_per_block=8\
+  --noise_std=0.05 --batch_size=32 --epochs=200 --lr=1e-4 --nvp\
+  --sample_freq=5 --logdir=runs/imagenet64-uncond
+# etc...
+```
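The `--img_size`, `--patch_size` and `--channel_size` flags together fix the sequence the flow operates on. In `evaluate_fid.py`, the sampling noise has shape `(batch, (img_size // patch_size) ** 2, channel_size * patch_size ** 2)`, so every configuration above works on 1024 tokens, with a per-token dimension of 12 (ImageNet64), 48 (ImageNet128) or 192 (AFHQ256). The NumPy sketch below shows one way to patchify an image into that shape and back; the ordering of values inside each token is an assumption and may differ from `transformer_flow.Model`.

```python
import numpy as np


def patchify(x: np.ndarray, p: int) -> np.ndarray:
    """(B, C, H, W) -> (B, (H // p) * (W // p), C * p * p). Token layout is an assumption."""
    b, c, h, w = x.shape
    x = x.reshape(b, c, h // p, p, w // p, p)
    x = x.transpose(0, 2, 4, 1, 3, 5)  # (B, H/p, W/p, C, p, p)
    return x.reshape(b, (h // p) * (w // p), c * p * p)


def unpatchify(t: np.ndarray, p: int, c: int, h: int, w: int) -> np.ndarray:
    """Inverse of patchify for the same token layout."""
    b = t.shape[0]
    x = t.reshape(b, h // p, w // p, c, p, p)
    x = x.transpose(0, 3, 1, 4, 2, 5)  # (B, C, H/p, p, W/p, p)
    return x.reshape(b, c, h, w)


for name, img_size, patch_size in [('imagenet64', 64, 2), ('imagenet128', 128, 4), ('afhq256', 256, 8)]:
    tokens = patchify(np.zeros((1, 3, img_size, img_size), dtype=np.float32), patch_size)
    print(name, tokens.shape)  # (1, 1024, 12), then (1, 1024, 48), then (1, 1024, 192)
```

Keeping the token count fixed at 1024 while growing the patch size is what lets the higher-resolution runs reuse a similar sequence length.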
+
+# Sampling
+
+Use the `sample.ipynb` notebook to generate samples from a model checkpoint. The notebook includes an option to [download a pretrained checkpoint](https://ml-site.cdn-apple.com/models/tarflow/afhq256/afhq_model_8_768_8_8_0.07.pth) trained on AFHQ.
+```bash
+jupyter notebook sample.ipynb
+```
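Evaluation flags must match the architecture stored in the checkpoint. Judging from the two checkpoint names in this README (`afhq_model_8_768_8_8_0.07.pth` and `imagenet_model_2_768_8_8_0.05.pth`) and from `model_name` in `evaluate_fid.py`, a name appears to encode `<dataset>_model_<patch_size>_<channels>_<blocks>_<layers_per_block>_<noise_std>`. The hypothetical helper below turns such a name back into command-line flags; it relies on that inferred pattern and is not part of the repository.

```python
import re

# Inferred (not official) naming pattern:
# <dataset>_model_<patch_size>_<channels>_<blocks>_<layers_per_block>_<noise_std>.pth
CKPT_PATTERN = re.compile(
    r'^(?P<dataset>\w+?)_model_(?P<patch_size>\d+)_(?P<channels>\d+)_'
    r'(?P<blocks>\d+)_(?P<layers_per_block>\d+)_(?P<noise_std>\d+\.\d+)\.pth$'
)
FLAG_ORDER = ('dataset', 'patch_size', 'channels', 'blocks', 'layers_per_block', 'noise_std')


def flags_from_checkpoint_name(filename: str) -> list[str]:
    """Hypothetical helper: recover architecture flags from a checkpoint file name."""
    match = CKPT_PATTERN.match(filename)
    if match is None:
        raise ValueError(f'Unrecognized checkpoint name: {filename}')
    return [f'--{key}={match.group(key)}' for key in FLAG_ORDER]


print(' '.join(flags_from_checkpoint_name('afhq_model_8_768_8_8_0.07.pth')))
# --dataset=afhq --patch_size=8 --channels=768 --blocks=8 --layers_per_block=8 --noise_std=0.07
```

The decoded AFHQ values agree with the AFHQ training command above. The image size is not part of the name, so `--img_size` still has to be passed explicitly, and the helper expects a bare file name (e.g. `pathlib.Path(path).name`).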
+
+# Evaluating FID
+
+Multi-GPU (8 GPUs):
+```bash
+# Conditional ImageNet64, samples saved in runs/imagenet64-cond/eval
+torchrun --standalone --nproc_per_node=8 evaluate_fid.py --dataset=imagenet --img_size=64 --channel_size=3\
+  --patch_size=2 --channels=768 --blocks=8 --layers_per_block=8\
+  --noise_std=0.05 --cfg=2.3 --nvp --batch_size=1024\
+  --ckpt_file=runs/imagenet64-cond/imagenet_model_2_768_8_8_0.05.pth\
+  --logdir=runs/imagenet64-cond/eval
+```
+
+For a single GPU:
+```bash
+# Conditional ImageNet64, samples saved in runs/imagenet64-cond/eval
+python evaluate_fid.py --dataset=imagenet --img_size=64 --channel_size=3\
+  --patch_size=2 --channels=768 --blocks=8 --layers_per_block=8\
+  --noise_std=0.05 --cfg=2.3 --nvp --batch_size=32\
+  --ckpt_file=runs/imagenet64-cond/imagenet_model_2_768_8_8_0.05.pth\
+  --logdir=runs/imagenet64-cond/eval
+```
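`evaluate_fid.py` draws `--num_samples` images (default 50000) in `ceil(num_samples / batch_size)` global batches, gives each GPU `batch_size // world_size` samples per batch, gathers them, and trims the final batch so that exactly `num_samples` images reach the FID metric. The standalone sketch below mirrors that arithmetic for the two commands above; `sampling_plan` is an illustrative name, not a function in the script.

```python
import math


def sampling_plan(num_samples: int, batch_size: int, world_size: int) -> tuple[int, int, int]:
    """Return (num_batches, per_rank_batch, last_batch_size), following evaluate_fid.py."""
    num_batches = math.ceil(num_samples / batch_size)
    per_rank_batch = batch_size // world_size
    last_batch_size = num_samples - (num_batches - 1) * batch_size
    return num_batches, per_rank_batch, last_batch_size


print(sampling_plan(50000, 1024, 8))  # (49, 128, 848)
print(sampling_plan(50000, 32, 1))    # (1563, 32, 16)
```

If `batch_size` is not divisible by the number of GPUs, each gathered batch holds fewer than `batch_size` images, so fewer than `num_samples` would be scored; choosing a divisible batch size avoids this.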
+
+# BibTeX
+```bibtex
+@article{zhai2024tarflow,
+         title={Normalizing Flows are Capable Generative Models},
+         author={Shuangfei Zhai and Ruixiang Zhang and Preetum Nakkiran and David Berthelot and Jiatao Gu and Huangjie Zheng and Tianrong Chen and Miguel Angel Bautista and Navdeep Jaitly and Josh Susskind},
+         year={2024},
+         eprint={2412.06329},
+         archivePrefix={arXiv},
+         primaryClass={cs.CV},
+         url={https://arxiv.org/abs/2412.06329}
+}
+```
@@ -0,0 +1,144 @@
+#
+# For licensing see accompanying LICENSE file.
+# Copyright (C) 2024 Apple Inc. All Rights Reserved.
+#
+import argparse
+import builtins
+import pathlib
+
+import numpy as np
+import torch
+import torch.utils.data
+import torchvision as tv
+
+import transformer_flow
+import utils
+
+
+def main(args):
+    args.denoising_batch_size = args.batch_size // 4
+    dist = utils.Distributed()
+    utils.set_random_seed(100 + dist.rank)
+    num_classes = utils.get_num_classes(args.dataset)
+
+    def print(*args, **kwargs):
+        if dist.local_rank == 0:
+            builtins.print(*args, **kwargs)
+
+    # Check that the FID stats have been computed beforehand (see prepare_fid_stats.py)
+    fid_stats_file = args.data / f'{args.dataset}_{args.img_size}_fid_stats.pth'
+    assert fid_stats_file.exists(), f'{fid_stats_file} not found; run prepare_fid_stats.py first'
+    print(f'Loading FID stats from {fid_stats_file}')
+    fid = utils.FID(reset_real_features=False, normalize=True).cuda()
+    fid.load_state_dict(torch.load(fid_stats_file, map_location='cpu', weights_only=False))
+    dist.barrier()
+
+    model = transformer_flow.Model(
+        in_channels=args.channel_size,
+        img_size=args.img_size,
+        patch_size=args.patch_size,
+        channels=args.channels,
+        num_blocks=args.blocks,
+        layers_per_block=args.layers_per_block,
+        nvp=args.nvp,
+        num_classes=num_classes,
+    ).cuda()
+    for p in model.parameters():
+        p.requires_grad = False
+
+    model_name = f'{args.patch_size}_{args.channels}_{args.blocks}_{args.layers_per_block}_{args.noise_std:.2f}'
+    sample_dir: pathlib.Path = args.logdir / f'{args.dataset}_samples_{model_name}'
+
+    if dist.local_rank == 0:
+        sample_dir.mkdir(parents=True, exist_ok=True)
+
+    ckpt = torch.load(args.ckpt_file, map_location='cpu', weights_only=True)
+    model.load_state_dict(ckpt, strict=True)
+    model.eval()
+
+    print('Starting sampling')
+    num_batches = int(np.ceil(args.num_samples / args.batch_size))
+    last_batch_size = args.num_samples - (num_batches - 1) * args.batch_size
+
+    def get_noise(b):
+        return torch.randn(
+            b, (args.img_size // args.patch_size) ** 2, args.channel_size * args.patch_size**2, device='cuda'
+        )
+
+    for i in range(num_batches):
+        noise = get_noise(args.batch_size // dist.world_size)
+        if num_classes:
+            y = torch.randint(num_classes, (args.batch_size // dist.world_size,), device='cuda')
+        else:
+            y = None
+        while True:
+            with torch.inference_mode(), torch.autocast(device_type='cuda', dtype=torch.bfloat16):
+                samples = model.reverse(noise, y, args.cfg, attn_temp=args.attn_temp, annealed_guidance=True)
+                assert isinstance(samples, torch.Tensor)
+
+            if args.self_denoising_lr > 0:
+                samples = samples.cpu()
+                assert args.batch_size % args.denoising_batch_size == 0
+                db = args.denoising_batch_size // dist.world_size
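+                # Single-step denoising via Tweedie's formula: x <- x + noise_std**2 * grad_x log p(x).
+                # base_lr rescales by db * D (D = img_size**2 * channel_size), which assumes get_loss
+                # averages the NLL over the db samples and all D dimensions.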
+                base_lr = db * args.img_size**2 * args.channel_size * args.noise_std**2
+                lr = args.self_denoising_lr * base_lr
+                denoised_samples = []
+                for j in range(args.batch_size // args.denoising_batch_size):
+                    x = torch.clone(samples[j * db : (j + 1) * db]).detach().cuda()
+                    x.requires_grad = True
+                    y_ = y[j * db : (j + 1) * db] if y is not None else None
+                    with torch.autocast(device_type='cuda', dtype=torch.bfloat16):
+                        z, _, logdets = model(x, y_)
+                    loss = model.get_loss(z, logdets)
+                    grad = torch.autograd.grad(loss, [x])[0]
+                    x.data.add_(grad, alpha=-lr)
+                    denoised_samples.append(x.detach().cpu())
+                samples = torch.cat(denoised_samples, dim=0).cuda()
+
+            samples = dist.gather_concat(samples.detach())
+            if not samples.isnan().any().item():
+                break
+            else:
+                noise = get_noise(args.batch_size // dist.world_size)
+
+        if i == num_batches - 1:
+            samples = samples[:last_batch_size]
+
+        fid.update(0.5 * (samples.clip(min=-1, max=1) + 1), real=False)
+        print(f'{i+1}/{num_batches} batch sample complete')
+    fid_score = fid.compute().item()
+    fid.reset()
+
+    print(f'{args.ckpt_file} {model_name} cfg {args.cfg:.2f} fid {fid_score:.2f}')
+    if dist.local_rank == 0:
+        tv.utils.save_image(samples, sample_dir / f'samples_cfg{args.cfg:.2f}.png', normalize=True, nrow=16)
+    dist.barrier()
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--data', default='data', type=pathlib.Path, help='Data directory containing the precomputed FID stats')
+    parser.add_argument('--logdir', default='runs', type=pathlib.Path, help='Path for artifacts')
+
+    parser.add_argument('--ckpt_file', required=True, type=str, help='Path to the checkpoint to evaluate')
+    parser.add_argument('--dataset', default='imagenet', type=str, choices=['imagenet', 'imagenet64', 'afhq'], help='Name of dataset')
+    parser.add_argument('--img_size', default=32, type=int, help='Image size')
+    parser.add_argument('--channel_size', default=3, type=int, help='Image channel size')
+
+    parser.add_argument('--patch_size', default=4, type=int, help='Patch size for the model')
+    parser.add_argument('--channels', default=512, type=int, help='Model width')
+    parser.add_argument('--blocks', default=4, type=int, help='Number of autoregressive flow blocks')
+    parser.add_argument('--layers_per_block', default=8, type=int, help='Depth per flow block')
+    parser.add_argument('--noise_std', default=0.05, type=float, help='Input noise standard deviation')
+    parser.add_argument('--nvp', default=True, action=argparse.BooleanOptionalAction, help='Whether to use the non-volume-preserving version')
+    parser.add_argument('--cfg', default=0, type=float, help='Guidance weight for sampling, 0 is no guidance. For conditional models, consider values in [1, 3]')
+    parser.add_argument('--attn_temp', default=1.0, type=float, help='Attention temperature for unconditional guidance, enabled when not 1 (e.g., 0.5, 1.5)')
+    parser.add_argument('--batch_size', default=1024, type=int, help='Global batch size for drawing samples, split evenly across GPUs')
+    parser.add_argument('--num_samples', default=50000, type=int, help='Total number of samples to draw')
+    parser.add_argument('--self_denoising_lr', default=1.0, type=float, help='Learning rate multiplier for denoising, 1 is the theoretically optimal step size')
+
+    args = parser.parse_args()
+
+    main(args)
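The self-denoising step in `evaluate_fid.py` is a single gradient step on the model's negative log-likelihood, and the script's comment calls its step size the theoretical optimum. One reading of that claim (our interpretation, not stated in the code) is Tweedie's formula: for $x = x_0 + \sigma\varepsilon$, $\mathbb{E}[x_0 \mid x] = x + \sigma^2 \nabla_x \log p(x)$. If the loss averages the NLL over `db` samples and all `D = img_size**2 * channel_size` dimensions (an assumption about `model.get_loss`), its gradient is the per-sample gradient divided by `db * D`, which would explain why `base_lr` multiplies `noise_std**2` by `db * D`. The NumPy sketch below checks this on a toy Gaussian, where the exact posterior mean is known in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 8, 12           # toy batch size and per-sample dimension (D plays img_size**2 * channel_size)
s, sigma = 1.0, 0.05   # data std and input-noise std (sigma plays noise_std)
v = s**2 + sigma**2    # variance of the noisy marginal p(x) = N(0, v I)

x0 = rng.normal(scale=s, size=(N, D))
x = x0 + sigma * rng.normal(size=(N, D))


def mean_nll_grad(x: np.ndarray) -> np.ndarray:
    """Gradient of the Gaussian NLL x**2 / (2 v) (up to a constant), averaged over all N * D entries."""
    return x / (v * x.size)


lr = N * D * sigma**2          # same form as base_lr in evaluate_fid.py
x_denoised = x - lr * mean_nll_grad(x)
posterior_mean = x * s**2 / v  # closed-form E[x0 | x] for this Gaussian toy
print(np.allclose(x_denoised, posterior_mean))  # True
```

In the real script the gradient comes from autograd through the flow rather than a closed form, and `--self_denoising_lr` scales this step, with 1.0 corresponding to the full Tweedie step under the assumption above.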