Compressed model inference speed is slower than original pytorch pix2pix #62
Comments
Update: one thing I missed was the PyTorch version. The other repo was running a newer release, so I upgraded this one to 1.7.1, and now inference speed is identical at 8 FPS for the compressed model. So at least it is not slower, but there is no speed benefit either. Still, a 40x reduction in memory use!
I am simply timing the inference within test.py like this:

```python
import time

start_time = time.time()
model.test()  # run inference
print("test FPS: ", 1 / (time.time() - start_time))  # frames per second
```

I also overlooked that I was using my own custom model in the original PyTorch code, but since that model is significantly larger than the compressed edges2shoes_r one, I expected some speedup even between different models. I am also finding pix2pix particularly difficult to convert to TFLite or PyTorch Mobile for inference on Android, so I may be forced to run it in Python on Android anyway, which cannot be multithreaded to take advantage of the lower memory footprint. It would be great on an embedded device such as a Jetson Nano, but unfortunately I am targeting Android only.
@mpottinger Please upload the original and compressed models, along with some test images.
OK, thank you. I am currently working more on the app that will make use of the models. I have figured out how to convert the model for mobile via ONNX and have run inference successfully, so as I do more tests, if I keep seeing the same results I will upload the models.
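For reference, the ONNX export step is roughly this (a sketch; the generator variable, the output file name, and the 1x3x256x256 input size are placeholders for my setup, not something from this repo):

```python
import torch

# netG is assumed to be the loaded pix2pix generator, already in eval mode;
# the dummy input size is a placeholder for the model's real input resolution.
dummy_input = torch.randn(1, 3, 256, 256)

torch.onnx.export(
    netG,                        # the torch.nn.Module to export
    dummy_input,                 # example input used to trace the graph
    "pix2pix_generator.onnx",    # output path (placeholder name)
    opset_version=11,
    input_names=["input"],
    output_names=["output"],
)
```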
How does the ONNX model compare to the compressed one? Is it faster at inference? How about memory and size?
ONNX models are definitely faster than PyTorch; that comes from being able to use frameworks optimized purely for inference. Model size is about the same after conversion. I am able to use onnxruntime, which is much faster for CPU inference, and I have also successfully run inference with OpenCV's dnn module, which is slower but easy to implement on multiple platforms, including Android. I have also tried Alibaba MNN, which is supposed to be very fast on mobile; speeds on mobile are comparable to CPU speeds on my fast desktop PC, around 5-10 FPS with uncompressed models.

I have found that this issue can probably be closed. My mistake was comparing a custom-trained model in one repo to the edges2shoes model in the other repo. I thought inference time should be constant across models, but apparently it is not; the trained model seems to make a difference. I modified the Jupyter notebook in this repo to do webcam inference on CPU only, comparing just the full and compressed edges2shoes models from this repo, and there I was able to see the speed difference on a live webcam stream: approximately 3 FPS for the uncompressed full model and ~6 FPS for the compressed model. So I am assuming I will get a similar 2x speedup on my own custom models, and my initial comparison was flawed.

Here is the inference setup code I am using to test on the webcam (only the beginning of the script; the rest was cut off in the paste):

```python
#!/usr/bin/env python
import os
import pickle

import numpy as np
from torchvision import transforms

from utils.util import tensor2im
from models import create_model

# Get our model options (pick the compressed or the full variant)
# filename = 'opts/opt_compressed.pkl'
filename = 'opts/opt_full.pkl'
with open(filename, 'rb') as f:
    opt = pickle.load(f)
opt.gpu_ids = []  # force CPU-only inference

transform_list = [transforms.ToTensor(),
                  transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
```
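For comparison, the onnxruntime side is just a plain CPU session, something like the sketch below (the model path and the uint8-frame preprocessing helper are placeholders; the normalization mirrors the Normalize((0.5, ...), (0.5, ...)) transform above):

```python
import numpy as np
import onnxruntime as ort

# CPU-only session; the file name is a placeholder for the exported generator.
session = ort.InferenceSession("pix2pix_generator.onnx",
                               providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

def infer(frame_rgb):
    """frame_rgb: HxWx3 uint8 image, already resized to the model's input size."""
    x = frame_rgb.astype(np.float32) / 127.5 - 1.0    # scale to [-1, 1]
    x = np.transpose(x, (2, 0, 1))[None]              # HWC -> 1xCxHxW
    y = session.run(None, {input_name: x})[0]         # generator output in [-1, 1]
    return ((y[0].transpose(1, 2, 0) + 1.0) * 127.5).clip(0, 255).astype(np.uint8)
```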
Hello, I am just curious. I have adapted test.py to do real-time inference on a webcam, made the exact same modifications to https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix, and set both to do inference on CPU only, since my goal is to run it on a mobile phone.
While the memory usage of the compressed model is much smaller, only 5 MB vs. 200 MB, inference actually seems to be slower with the compressed model: I am getting 4 FPS with test_compressed.sh for edges2shoes_r, while in the original pytorch-CycleGAN-and-pix2pix code I get 8 FPS. Is this normal? I am sure the modifications I made to the inference code are the same in both repos, so it looks like actual model latency.
Having 5 MB instead of 200 MB of RAM usage is a dramatic decrease, but is there also supposed to be a dramatic speedup?
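For context, the loop I wrapped around test.py looks roughly like this (a sketch only; `model` comes from the usual create_model(opt) setup in test.py, and the 256x256 resize and the input-dict keys are placeholders for my adaptation, not the repo's exact API):

```python
import time

import cv2
import torch
from torchvision import transforms

# Same normalization as the training pipeline.
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.resize(frame, (256, 256))               # placeholder input size
    tensor = preprocess(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)).unsqueeze(0)

    start_time = time.time()
    with torch.no_grad():
        # Assumption: a minimal input dict; the exact keys depend on the
        # model/dataset classes, which normally receive data from the dataloader.
        model.set_input({'A': tensor, 'A_paths': ['webcam']})
        model.test()
    print("FPS:", 1 / (time.time() - start_time))
```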