Problem Description

When using `metrics_out.log_metric()` in Kubeflow Pipelines components, passing NaN values causes the entire component to fail with a serialization error. This is a common issue in ML pipelines, where models may produce NaN metrics (e.g., when predictions are invalid or data is missing).
Expected Behavior

`log_metric()` should handle NaN values gracefully, for example by:
- Converting them to a default value (e.g., 0.0)
- Skipping NaN metrics with a warning
- Providing a clear error message
Current Behavior

The component fails with a serialization error when a NaN value is passed to `log_metric()`.
Minimal Reproduction Example

```python
from kfp.dsl import component, Output, Metrics


@component
def evaluate_model(metrics_out: Output[Metrics]):
    import numpy as np

    # Simulate metrics that might contain NaN values
    metrics = {
        'accuracy': 0.85,
        'precision': np.nan,       # This causes the failure
        'recall': 0.92,
        'f1_score': float('nan'),  # This also causes the failure
    }

    # Logging fails as soon as a NaN value is reached
    for metric, value in metrics.items():
        metrics_out.log_metric(metric, value)
```
Error Output

```
ValueError: invalid literal for int() with base 10: 'nan'
```
Workaround

Currently, users must manually check for NaN values before logging:

```python
import math

import numpy as np

for metric, value in metrics.items():
    # math.isnan() accepts Python floats and NumPy floating types alike
    if isinstance(value, (float, np.floating)) and math.isnan(value):
        value = 0.0  # or skip logging instead
    metrics_out.log_metric(metric, value)
```
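
To avoid repeating this check in every component, the logic can be factored into a small helper. The sketch below is illustrative only; `sanitize_metrics` is a hypothetical name, not part of the KFP SDK:

```python
import math
import warnings
from typing import Dict, Optional


def sanitize_metrics(metrics: Dict[str, float],
                     default: Optional[float] = None) -> Dict[str, float]:
    """Drop NaN metrics, or replace them with `default` when one is given."""
    clean = {}
    for name, value in metrics.items():
        try:
            is_nan = math.isnan(value)  # also handles NumPy floating types
        except TypeError:
            is_nan = False  # non-numeric values pass through unchanged
        if is_nan:
            if default is None:
                warnings.warn(f"Skipping NaN metric '{name}'")
                continue
            value = default
        clean[name] = value
    return clean


# Usage inside a component:
#     for metric, value in sanitize_metrics(metrics, default=0.0).items():
#         metrics_out.log_metric(metric, value)
```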
Proposed Solution

The KFP SDK should handle NaN values internally in the `log_metric()` method, for example by:
- Converting NaN to a configurable default value
- Skipping NaN metrics with a warning log (sketched below)
- Raising a more descriptive error message
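
For illustration, the warn-and-skip option could look roughly like this inside the SDK. This is only a sketch: the real `log_metric()` implementation in `kfp.dsl` may differ, and the metadata assignment shown is an assumption about how the artifact stores metrics:

```python
import logging
import math

logger = logging.getLogger(__name__)


def log_metric(self, metric: str, value: float) -> None:
    """Hypothetical NaN-aware variant of Metrics.log_metric()."""
    if isinstance(value, float) and math.isnan(value):
        logger.warning(
            "Metric '%s' is NaN and will be skipped; log a finite value "
            "to record it.", metric)
        return
    self.metadata[metric] = value  # assumed storage location; illustrative
```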
Environment
- Kubeflow Pipelines version: 2.13.0
- Python version: 3.9.1
Additional Context
This issue affects ML practitioners who work with models that can produce NaN metrics, which is common in scenarios with:
- Invalid predictions
- Missing data
- Division by zero in metric calculations (illustrated below)
- Edge cases in model evaluation
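
For example, precision is undefined when a model predicts no positives, and NumPy-based metric code yields NaN in that case:

```python
import numpy as np

tp, fp = 0, 0  # the model predicted no positives at all
precision = np.float64(tp) / np.float64(tp + fp)  # 0/0 -> nan (RuntimeWarning)
print(precision)  # nan
```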
A fix would improve the robustness of KFP components and reduce the need for manual NaN handling in every pipeline.
Should I start implementing this?
Impacted by this bug? Give it a 👍.