Description
The results you get back when running a model can depend on the device, and can even vary across several calls on the same device. It might be a good idea to add a "Reproducibility" section to the documentation in which we discuss these issues.
For example, let us use the model introduced in w2v2-how-to:
import audeer
import audonnx
import numpy as np

# Download and extract the model
url = 'https://zenodo.org/record/6221127/files/w2v2-L-robust-12.6bc4a7fd-1.1.0.zip'
cache_root = audeer.mkdir('cache')
model_root = audeer.mkdir('model')
archive_path = audeer.download_url(url, cache_root, verbose=True)
audeer.extract_archive(archive_path, model_root)

# Create a fixed random input signal of one second
np.random.seed(1)
sampling_rate = 16000
signal = np.random.normal(size=sampling_rate).astype(np.float32)
Now, let us execute the model on the CPU:
>>> model = audonnx.load(model_root, device='cpu')
>>> model(signal, sampling_rate)['logits']
array([[0.6832043 , 0.64673305, 0.49750742]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.6832043 , 0.64673305, 0.49750742]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.6832043 , 0.64673305, 0.49750742]], dtype=float32)
On the CPU we always get back the same result
when executing the model multiple times.
Then let's switch to the GPU:
>>> model = audonnx.load(model_root, device='cuda:0')
>>> model(signal, sampling_rate)['logits']
array([[0.68319285, 0.64667934, 0.49738473]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.68317926, 0.6466613 , 0.4974225 ]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.683162 , 0.64668435, 0.4973961 ]], dtype=float32)
We see that the results differ from the fifth decimal place on for each run,
and the average result deviates from the CPU-based result by:
array([[-2.62856483e-05, -5.79953194e-05, -1.06304884e-04]], dtype=float32)
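The deviation above can be reproduced from the printed logits, e.g. (a minimal sketch using the values copied from the runs shown here, without re-running the model):

```python
import numpy as np

# Logits copied from the CPU run and the three GPU runs above
cpu = np.array([[0.6832043, 0.64673305, 0.49750742]], dtype=np.float32)
gpu_runs = np.array(
    [
        [0.68319285, 0.64667934, 0.49738473],
        [0.68317926, 0.6466613, 0.4974225],
        [0.683162, 0.64668435, 0.4973961],
    ],
    dtype=np.float32,
)

# Average over the three GPU runs and compare against the CPU result
deviation = gpu_runs.mean(axis=0, keepdims=True) - cpu
print(deviation)
```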
This is a known ONNX limitation (microsoft/onnxruntime#9704).
In microsoft/onnxruntime#4611 (comment) they propose to select a fixed convolution algorithm to improve this behavior, see also https://pytorch.org/docs/stable/notes/randomness.html#cuda-convolution-benchmarking.
With audonnx we can achieve this by passing provider options:
>>> providers = [("CUDAExecutionProvider", {'cudnn_conv_algo_search': 'DEFAULT'})]
>>> model = audonnx.load(model_root, device=providers)
>>> model(signal, sampling_rate)['logits']
array([[0.683191 , 0.64670646, 0.4973919 ]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.6830938 , 0.6466217 , 0.49734592]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.6831656 , 0.64666504, 0.497427 ]], dtype=float32)
However, this does not really improve the reproducibility of the results.
It seems that we can only recommend the following when reproducibility is desired:
- use the CPU as device
- round the output of the model to two decimal places, e.g.
array([[0.68, 0.65, 0.50]], dtype=float32)
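For example, rounding with numpy makes the CPU and GPU logits shown above agree exactly (a sketch using the values from the first runs; the 1e-5-scale run-to-run noise disappears at two decimal places):

```python
import numpy as np

# Logits copied from the CPU run and the first GPU run above
cpu = np.array([[0.6832043, 0.64673305, 0.49750742]], dtype=np.float32)
gpu = np.array([[0.68319285, 0.64667934, 0.49738473]], dtype=np.float32)

# After rounding to two decimal places, both devices give the same result
rounded_cpu = np.round(cpu, 2)
rounded_gpu = np.round(gpu, 2)
print(rounded_cpu)
print(np.array_equal(rounded_cpu, rounded_gpu))
```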
/cc @audeerington