[Bug] asr/whisper service slower on Gaudi2 than on Xeon #1018
Comments
On my Gaudi the performance for whisper should be similar to or faster than Xeon. I suspect there is some setting/env gap that breaks the HPU static shape generation on your machine and makes it look super slow.
Thank you for your insight @Spycsh. We've tried multiple combinations of driver versions and Gaudi container versions (1.16.2 and 1.18.0) to no avail. We also tried on Tiber cloud but got an error during the server's docker build.
Could you share more about your settings and/or env so we can try to reproduce on our end? Thank you!
Priority
P2-High
OS type
Ubuntu
Hardware type
Gaudi2
Installation method
Deploy method
Running nodes
Single Node
What's the version?
vault.habana.ai/gaudi-docker/1.16.2/ubuntu22.04/habanalabs/pytorch-installer-2.2.2
NOTE: the original Gaudi Dockerfile uses Gaudi version 1.18.0. We are currently getting a segfault when running this version on our machine.
Description
Following the steps for Gaudi2 from the README, running the asr/whisper service is significantly slower than on Xeon. I generated a simple benchmark script that clocks the duration of a `requests.post()` to the service for every example in the LibriSpeech test-clean dataset. The plots below show how Gaudi2 performed against a Xeon machine. As file size increased, Gaudi performed slower than Xeon, as seen in the plot below:
Reproduce steps
1. Run the `curl` request in the README (2.2.3 results in an inference time of ~3.6 seconds, whereas on Xeon it only takes ~0.5 seconds).
2. Run `whisper_benchmark.py` (updating variables as needed).
3. Results are written to `<EXP_NAME>_0.json`.
Raw log
Attachments
Below is the `whisper_benchmark.py` script used to gather the results for the plots.
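The original attachment is not reproduced here; the sketch below only illustrates the approach described above (timing a `requests.post()` per LibriSpeech test-clean file and writing the durations to `<EXP_NAME>_0.json`). The service URL, request payload shape, and dataset path are assumptions rather than values taken from the actual script, so adjust them to match the README you deployed from.

```python
# Minimal benchmark sketch: time one requests.post() per LibriSpeech test-clean
# file and dump the durations to <EXP_NAME>_0.json.
# NOTE: SERVICE_URL, the JSON payload shape, and DATASET_DIR are assumptions;
# verify them against the whisper service README for your release.
import base64
import glob
import json
import time

import requests

EXP_NAME = "gaudi2_whisper"                   # hypothetical experiment name
SERVICE_URL = "http://localhost:7066/v1/asr"  # assumed whisper service endpoint
DATASET_DIR = "LibriSpeech/test-clean"        # assumed local dataset location

results = []
for audio_path in sorted(glob.glob(f"{DATASET_DIR}/**/*.flac", recursive=True)):
    # Send the audio as base64 in a JSON body (payload format is an assumption).
    with open(audio_path, "rb") as f:
        audio_b64 = base64.b64encode(f.read()).decode("utf-8")

    start = time.time()
    resp = requests.post(SERVICE_URL, json={"audio": audio_b64}, timeout=300)
    elapsed = time.time() - start

    results.append({
        "file": audio_path,
        "status": resp.status_code,
        "seconds": elapsed,
    })

with open(f"{EXP_NAME}_0.json", "w") as f:
    json.dump(results, f, indent=2)
```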