Add a section on reproducibility to the docs #61

@hagenw

The results you get back when running a model can depend on the device, and can even vary across several calls on the same device. It might be a good idea to add a "Reproducibility" section to the documentation in which we discuss these issues.

For example, let us use the model introduced in w2v2-how-to:

import audeer
import audonnx
import numpy as np


# Download and extract the model
url = 'https://zenodo.org/record/6221127/files/w2v2-L-robust-12.6bc4a7fd-1.1.0.zip'
cache_root = audeer.mkdir('cache')
model_root = audeer.mkdir('model')

archive_path = audeer.download_url(url, cache_root, verbose=True)
audeer.extract_archive(archive_path, model_root)

# Create a reproducible random test signal of one second
np.random.seed(1)
sampling_rate = 16000
signal = np.random.normal(size=sampling_rate).astype(np.float32)

Now, let us execute the model on the CPU:

>>> model = audonnx.load(model_root, device='cpu')
>>> model(signal, sampling_rate)['logits']
array([[0.6832043 , 0.64673305, 0.49750742]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.6832043 , 0.64673305, 0.49750742]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.6832043 , 0.64673305, 0.49750742]], dtype=float32)

When using the CPU, we always get back the same result
when executing the model multiple times.
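Such a determinism check can be sketched with a small helper that calls a model several times and compares the outputs bitwise. Here `predict` is a hypothetical deterministic stand-in; for the real check one would call the loaded audonnx model instead:

```python
import numpy as np


def is_deterministic(predict, signal, runs=3):
    # Call predict several times and check for bitwise-identical output
    reference = predict(signal)
    return all(
        np.array_equal(reference, predict(signal))
        for _ in range(runs - 1)
    )


# Hypothetical stand-in for model(signal, sampling_rate)['logits']
predict = lambda x: np.tanh(x[:3]).astype(np.float32).reshape(1, -1)

np.random.seed(1)
signal = np.random.normal(size=16000).astype(np.float32)
print(is_deterministic(predict, signal))
```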

Next, let us switch to the GPU:

>>> model = audonnx.load(model_root, device='cuda:0')
>>> model(signal, sampling_rate)['logits']
array([[0.68319285, 0.64667934, 0.49738473]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.68317926, 0.6466613 , 0.4974225 ]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.683162  , 0.64668435, 0.4973961 ]], dtype=float32)

We see that the results differ from the fifth decimal place on for each run,
and the average result deviates from the CPU-based result by:

array([[-2.62856483e-05, -5.79953194e-05, -1.06304884e-04]], dtype=float32)
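The deviation can be computed directly from the logits printed above (a small numpy sketch; exact values may differ in the last digits due to float32 arithmetic):

```python
import numpy as np

# Logits from the three GPU runs above
gpu_runs = np.array([
    [0.68319285, 0.64667934, 0.49738473],
    [0.68317926, 0.6466613, 0.4974225],
    [0.683162, 0.64668435, 0.4973961],
], dtype=np.float32)

# Logits from the CPU run above
cpu = np.array([[0.6832043, 0.64673305, 0.49750742]], dtype=np.float32)

# Deviation of the averaged GPU result from the CPU result
deviation = gpu_runs.mean(axis=0, keepdims=True) - cpu
print(deviation)
```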

This is a known ONNX Runtime limitation (microsoft/onnxruntime#9704).
In microsoft/onnxruntime#4611 (comment) it is proposed to select a fixed convolution algorithm to improve this behavior; see also https://pytorch.org/docs/stable/notes/randomness.html#cuda-convolution-benchmarking.
With audonnx we can achieve this by:

>>> providers = [("CUDAExecutionProvider", {'cudnn_conv_algo_search': 'DEFAULT'})]
>>> model = audonnx.load(model_root, device=providers)
>>> model(signal, sampling_rate)['logits']
array([[0.683191  , 0.64670646, 0.4973919 ]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.6830938 , 0.6466217 , 0.49734592]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.6831656 , 0.64666504, 0.497427  ]], dtype=float32)

Unfortunately, this does not really improve the results.

It seems that we can only recommend the following when reproducibility is desired:

  • use the CPU as device
  • round the output of the model to two decimal places, e.g. array([[0.68, 0.65, 0.50]], dtype=float32)
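The rounding can be done with numpy, e.g. (a minimal sketch using the CPU logits from above):

```python
import numpy as np

logits = np.array([[0.6832043, 0.64673305, 0.49750742]], dtype=np.float32)

# Round to two decimal places to hide device-dependent noise
rounded = np.round(logits, 2)
print(rounded)
```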

/cc @audeerington
