
Incorrect result when using MMD with some chunk_size argument values #252

@jaime-cespedes-sisniega

Description

Describe the bug

MMD returns an incorrect result for some values of the chunk_size argument. For many chunk_size values, the MMD² computed with chunk_size!=None differs from the MMD² computed with chunk_size=None.

With the reproduction code below, the following chunk_size values produce an incorrect result: 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 18, 19. The remaining values between 1 and 20 produce the correct result.

Steps/Code to Reproduce

from frouros.detectors.data_drift import MMD
import numpy as np
from functools import partial
from frouros.utils.kernels import rbf_kernel

np.random.seed(seed=31)

dim = 1
size = 20
kernel = partial(rbf_kernel, sigma=0.5)
chunk_size = 4

X_ref = np.random.multivariate_normal(mean=np.zeros(dim), cov=np.identity(dim), size=size)
X_test = np.random.multivariate_normal(mean=np.full(dim, 0.3), cov=np.identity(dim), size=size)

detector = MMD(
    kernel=kernel,
    chunk_size=None,
)
detector.fit(X_ref)
result, _ = detector.compare(X=X_test, verbose=True)

detector_chunk = MMD(
    kernel=kernel,
    chunk_size=chunk_size,
)
detector_chunk.fit(X_ref)
result_chunk, _ = detector_chunk.compare(X=X_test, verbose=True)

assert result.distance == result_chunk.distance

Expected Results

No error is thrown: the assertion passes because the distance is the same with and without chunking.
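
For reference, chunking should only change how the kernel sums are accumulated, not their value. Below is a minimal NumPy sketch of that expectation. It is not the frouros implementation: the unbiased MMD² estimator, the RBF parameterization exp(-||x-y||²/(2σ²)), and the helper names rbf, kernel_sum and mmd2 are all assumptions made for illustration. It uses np.isclose rather than == because block-wise summation can differ from the full sum by floating-point rounding.

```python
import numpy as np


def rbf(x, y, sigma=0.5):
    """Gaussian RBF kernel matrix between the rows of x and y (assumed form)."""
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma**2))


def kernel_sum(x, y, chunk_size=None):
    """Sum of all kernel entries k(x_i, y_j), optionally accumulated in blocks."""
    if chunk_size is None:
        return rbf(x, y).sum()
    total = 0.0
    for i in range(0, len(x), chunk_size):
        for j in range(0, len(y), chunk_size):
            # The last block in each direction may be smaller than chunk_size.
            total += rbf(x[i:i + chunk_size], y[j:j + chunk_size]).sum()
    return total


def mmd2(x, y, chunk_size=None):
    """Unbiased MMD² estimate (hypothetical reference, not the library code)."""
    n, m = len(x), len(y)
    # k(x_i, x_i) == 1 for the RBF kernel, so the diagonal contributes n (or m).
    k_xx = (kernel_sum(x, x, chunk_size) - n) / (n * (n - 1))
    k_yy = (kernel_sum(y, y, chunk_size) - m) / (m * (m - 1))
    k_xy = kernel_sum(x, y, chunk_size) / (n * m)
    return k_xx + k_yy - 2.0 * k_xy


rng = np.random.default_rng(31)
x_ref = rng.normal(0.0, 1.0, size=(20, 1))
x_test = rng.normal(0.3, 1.0, size=(20, 1))

full = mmd2(x_ref, x_test)
# Chunk sizes whose result disagrees with the unchunked value.
mismatches = [
    c for c in range(1, 21) if not np.isclose(mmd2(x_ref, x_test, c), full)
]
print(mismatches)
```

In this sketch the list of mismatching chunk sizes is empty, including for sizes that do not divide the sample size evenly, which is the behavior expected from MMD as well.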

Actual Results

Traceback (most recent call last):
  File "/home/jaime/.config/JetBrains/PyCharm2023.1/scratches/frouros/expected/data_drift/batch/mmd_chunk.py", line 30, in <module>
    assert result.distance == result_chunk.distance
AssertionError

Versions

'0.5.1'
