Description
Environment details
- SDMetrics version: 0.21.0
Background
The SDMetrics library is set up to produce 1 final score in the range [0, 1], where 0 is the worst and 1 is the best.
For some of the SDMetrics, we are interested in computing whether the synthetic data is helping to improve some kind of task/property (a conceptual sketch follows the list below). For example:
- In BinaryClassifierPrecisionEfficacy, we are interested in knowing whether synthetic data will improve an ML classifier's predictions
- In EqualizedOddsImprovement, we are interested in knowing whether the synthetic data will improve fairness
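
As a conceptual sketch only (not the SDMetrics implementation), the comparison these efficacy-style metrics perform can be thought of as training the same downstream model twice, once on real data only and once on real + synthetic data, and comparing a downstream score. The function name, the use of scikit-learn, and the assumptions of numeric features and a binary 0/1 target are all illustrative, not taken from the library:

```python
# Hypothetical sketch of a baseline-vs-augmented comparison.
# Assumes numeric feature columns and a binary 0/1 target column.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score


def efficacy_comparison(real_train, synthetic, holdout, target_column):
    """Return (baseline_score, augmented_score) for a precision-based efficacy check."""
    # Augment the real training data with the synthetic rows
    augmented = pd.concat([real_train, synthetic], ignore_index=True)

    def fit_and_score(train):
        model = LogisticRegression(max_iter=1000)
        model.fit(train.drop(columns=[target_column]), train[target_column])
        predictions = model.predict(holdout.drop(columns=[target_column]))
        return precision_score(holdout[target_column], predictions)

    return fit_and_score(real_train), fit_and_score(augmented)
```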
The question is: how should these metrics formulate the overall score?
Details
The diagram below shows 2 alternatives for returning a final score.
- Alternative A returns the magnitude of improvement. The score is 0 if the synthetic data does not improve the task.
  - Any score >0 is considered an improvement, even something small like 0.1. This can be misleading, because 0.1 is usually considered a "bad" value for other metrics (such as KSComplement, CategoryCoverage, CorrelationSimilarity, etc.)
  - The score makes no distinction between the synthetic data having no effect vs. the synthetic data having a very bad effect (in both cases, the score is 0)
- Alternative B captures both the magnitude and the direction of improvement. The score is 0.5 if the synthetic data has no effect, <0.5 if the synthetic data has a bad effect, and >0.5 if the synthetic data is making improvements.
  - Now, only higher scores like 0.7 or 0.8 can be considered "good", while lower scores like 0.1 and 0.2 are considered "bad". This is consistent with other SDMetrics (such as KSComplement, CategoryCoverage, CorrelationSimilarity, etc.)
  - This alternative also gives us the magnitude of improvement (or lack thereof). There is now a distinction between the synthetic data having no effect (0.5) and the synthetic data being actively bad for the usage (e.g. 0.1)
  - However, it introduces an arbitrary cutoff at 0.5, which you need to keep in mind when interpreting the score
We've currently implemented Alternative B since it seems to have fewer cons, but I'm leaving this as an open question to consider Alternative A. A small sketch contrasting the two formulations is included below.
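
The following is a minimal sketch of the two scoring formulas, assuming the metric has already computed two raw values in [0, 1]: a baseline score on real data only and a score when synthetic data is added. The function names and inputs are illustrative, not the library's actual API:

```python
def alternative_a(real_score: float, augmented_score: float) -> float:
    """Alternative A: magnitude of improvement only.

    Returns 0 whenever the synthetic data does not help, so 'no effect'
    and 'actively harmful' are indistinguishable.
    """
    return max(augmented_score - real_score, 0.0)


def alternative_b(real_score: float, augmented_score: float) -> float:
    """Alternative B: magnitude and direction of improvement.

    Maps the difference from [-1, 1] into [0, 1], so 0.5 means no effect,
    >0.5 means the synthetic data helps, and <0.5 means it hurts.
    """
    return (augmented_score - real_score + 1.0) / 2.0


if __name__ == "__main__":
    # Example where synthetic data slightly hurts the downstream task:
    print(alternative_a(0.80, 0.75))  # 0.0   (indistinguishable from "no effect")
    print(alternative_b(0.80, 0.75))  # 0.475 (clearly below the 0.5 cutoff)
```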