Towards Universal Fake Image Detectors that Generalize Across Generative Models
Utkarsh Ojha*, Yuheng Li*, Yong Jae Lee
(*Equal contribution)
CVPR 2023
[Project Page] [Paper]
Using images from one type of generative model (e.g., GAN), detect fake images from other types (e.g., diffusion models)
- Clone this repository
git clone https://github.com/Yuheng-Li/UniversalFakeDetect
cd UniversalFakeDetect
- Install the necessary libraries
pip install torch torchvision
- Of the 19 models studied overall (Table 1/2 in the main paper), 11 are taken from a previous work. Download the test set, i.e., real/fake images for those 11 models, provided by the authors here (dataset size ~19GB).
- Download the file and unzip it in `datasets/test`. You could also use the bash scripts provided by the authors, as described here in their code repository.
- This should create a directory structure as follows:
datasets
└── test
    ├── progan
    ├── cyclegan
    ├── biggan
    │   .
    │   .
- Each directory (e.g., `progan`) will contain real/fake images under `0_real` and `1_fake` folders respectively.
- Dataset for the diffusion models (e.g., LDM/Glide) can be found here. Note that in the paper (Table 2/3), we reported the results over 10k randomly sampled images. Since providing that many images for all the domains would take up too much space, we are only releasing 1k images for each domain; i.e., 1k fake images and 1k real images for each domain (e.g., LDM-200).
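Once both test sets are unzipped, it can help to sanity-check the layout by walking the tree and collecting (path, label) pairs from the `0_real`/`1_fake` subfolders. Below is a minimal stdlib-only sketch; the `collect_images` helper and its defaults are illustrative, not part of this repository:

```python
from pathlib import Path

def collect_images(root="datasets/test", exts={".png", ".jpg", ".jpeg"}):
    """Walk <root>/<model>/{0_real,1_fake} and return (path, label) pairs.

    label is 0 for real images and 1 for fake ones, mirroring the
    0_real / 1_fake folder-naming convention.
    """
    samples = []
    for model_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for folder, label in (("0_real", 0), ("1_fake", 1)):
            split_dir = model_dir / folder
            if not split_dir.is_dir():
                print(f"warning: {split_dir} is missing")
                continue
            samples += [(p, label) for p in sorted(split_dir.rglob("*"))
                        if p.suffix.lower() in exts]
    return samples
```

The same walk works for `./diffusion_datasets` by changing the `root` argument.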
- Download and unzip the file into the `./diffusion_datasets` directory.
- You can evaluate the model on all the datasets at once by running:
python validate.py --arch=CLIP:ViT-L/14 --ckpt=pretrained_weights/fc_weights.pth --result_folder=clip_vitl14
- You can also evaluate the model on a single generative model by specifying the paths of the real and fake datasets:
python validate.py --arch=CLIP:ViT-L/14 --ckpt=pretrained_weights/fc_weights.pth --result_folder=clip_vitl14 --real_path datasets/test/progan/0_real --fake_path datasets/test/progan/1_fake
Note that if no arguments are provided for `real_path` and `fake_path`, the script will perform the evaluation on all the domains specified in `dataset_paths.py`.
- The results will be stored in `results/<folder_name>` in two files: `ap.txt` stores the Average Precision for each of the test domains, and `acc.txt` stores the accuracy (with 0.5 as the threshold) for the same domains.
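For reference, both numbers follow from the standard definitions given a list of labels (1 = fake) and per-image scores. A pure-Python sketch is below; the function names are mine (the repository computes these via its own validation utilities), and `average_precision` uses the usual step-interpolated area under the precision-recall curve:

```python
def average_precision(labels, scores):
    """Step-interpolated area under the precision-recall curve.
    labels are in {0, 1} (1 = fake); higher scores mean 'more fake'."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(labels)
    tp, ap, prev_recall = 0, 0.0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            precision = tp / rank
            recall = tp / total_pos
            ap += (recall - prev_recall) * precision
            prev_recall = recall
    return ap

def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)
```

For untied scores this matches scikit-learn's `average_precision_score`, so either can be used to cross-check the numbers in `ap.txt`.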
- Our main model is trained on the same dataset used by the authors of this work. Download the official training dataset provided here (dataset size ~72GB).
- Download and unzip the dataset in the `datasets/train` directory. The overall structure should look like the following:
datasets
└── train
    └── progan
        ├── airplane
        ├── bird
        ├── boat
        │   .
        │   .
- A total of 20 different object categories, with each folder containing the corresponding real and fake images in `0_real` and `1_fake` folders.
- The model can then be trained with the following command:
python train.py --name=clip_vitl14 --wang2020_data_path=datasets/ --data_mode=wang2020 --arch=CLIP:ViT-L/14 --fix_backbone
- Important: do not forget to use the `--fix_backbone` argument during training, which ensures that only the linear layer's parameters will be trained.
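Before kicking off a long training run, it can save time to verify that the expected per-category layout is in place. This is a stdlib-only sanity check; the `check_train_tree` helper and its default path are illustrative, not part of the repository:

```python
from pathlib import Path

def check_train_tree(root="datasets/train/progan"):
    """Return a list of problems found in the ProGAN training tree:
    every category folder should contain 0_real and 1_fake subfolders."""
    root = Path(root)
    problems = []
    if not root.is_dir():
        return [f"{root} does not exist"]
    for category in sorted(p for p in root.iterdir() if p.is_dir()):
        for split in ("0_real", "1_fake"):
            if not (category / split).is_dir():
                problems.append(f"{category.name}: missing {split}")
    return problems
```

An empty return value means all category folders have both splits; otherwise each entry names a category and its missing subfolder.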
The provided Dockerfile can be used to create an image:
export DOCKER_REGISTRY="hannahyk" # Put your Docker Hub username here
# Build the Docker image for runtime
docker build -t "$DOCKER_REGISTRY/hannah-ufd" -f Dockerfile .
Run this Docker image locally on a GPU to test that it can run inferences as expected:
docker run --gpus=all -d --rm -p 80:8000 --env SERVER_PORT=8000 --name "hannah-ufd" "$DOCKER_REGISTRY/hannah-ufd"
In a separate terminal, run the following command one or more times
curl -X GET http://localhost:80/healthcheck
until you see {"healthy":true}.
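If you prefer scripting over re-running curl by hand, the wait can be automated. A stdlib sketch is below; the `wait_for_healthy` helper is mine, not part of the repo, and the `fetch` hook exists only so the polling logic can be exercised without a live server:

```python
import json
import time
import urllib.request

def wait_for_healthy(url, fetch=None, retries=10, delay=2.0):
    """Poll `url` until it reports {"healthy": true}; True on success."""
    if fetch is None:
        def fetch(u):
            with urllib.request.urlopen(u, timeout=5) as resp:
                return resp.read().decode()
    for _ in range(retries):
        try:
            if json.loads(fetch(url)).get("healthy"):
                return True
        except Exception:
            pass  # server not up yet; retry after a short pause
        time.sleep(delay)
    return False
```

For the container above you would call `wait_for_healthy("http://localhost:80/healthcheck")`.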
Then, test that inference can be run as expected:
curl -X POST http://localhost:80/predict \
-H "Content-Type: application/json" \
--data '{"file_path":"https://uploads.civai.org/files/jhxTVhsg/b751515306e7.jpg"}'
Finally, if successful, push the docker image to docker hub:
docker login
docker push "$DOCKER_REGISTRY/hannah-ufd:latest"
We would like to thank Sheng-Yu Wang for releasing the real/fake images from different generative models. Our training pipeline is also inspired by his open-source code. We would also like to thank CompVis for releasing the pre-trained LDMs and LAION for open-sourcing the LAION-400M dataset.
If you find our work helpful in your research, please cite it using the following:
@inproceedings{ojha2023fakedetect,
title={Towards Universal Fake Image Detectors that Generalize Across Generative Models},
author={Ojha, Utkarsh and Li, Yuheng and Lee, Yong Jae},
booktitle={CVPR},
year={2023},
}