If a metric is about improvement, should the score indicate the directionality of improvement or just its magnitude? #778

@npatki

Environment details

  • SDMetrics version: 0.21.0

Background

The SDMetrics library is set up to produce 1 final score in the range [0,1], where 0 is the worst and 1 is the best.

For some of the SDMetrics, we are interested in computing whether the synthetic data is helping to improve some kind of task/property. For example:

The question is: How should these metrics formulate the overall score?

Details

The diagram below shows 2 alternatives for returning a final score.

[Image: diagram of Alternative A and Alternative B]

  • Alternative A returns only the magnitude of improvement. The score is 0 if the synthetic data is not improving the task.
    • Any score >0 is considered an improvement, even something small like 0.1. This can be a bit misleading because 0.1 is usually considered a "bad" value for other metrics (such as KSComplement, CategoryCoverage, CorrelationSimilarity, etc.)
    • The score makes no distinction between the synthetic data having no effect vs. the synthetic data having a very bad effect (in both cases, the score will be 0)
  • Alternative B captures both the magnitude and the direction of improvement. The score is 0.5 if the synthetic data has no effect, <0.5 if the synthetic data has a bad effect, and >0.5 if the synthetic data is making improvements.
    • Now, only higher scores like 0.7 or 0.8 can be considered "good", while lower scores like 0.1 and 0.2 are considered "bad". This is similar to other SDMetrics (such as KSComplement, CategoryCoverage, CorrelationSimilarity, etc.)
    • This score also gives us the magnitude of improvement (or lack thereof). Now there is a distinction between synthetic data having no effect (0.5) and synthetic data being actively bad for the usage (e.g. 0.1)
    • However, it leaves an arbitrary cutoff at 0.5; you need to consider this when interpreting the score
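
To make the two alternatives concrete, here is a minimal sketch in Python. It is not the SDMetrics implementation: the function names are hypothetical, and it assumes the task is measured by a score in [0,1] computed once without the synthetic data (`baseline`) and once with it (`augmented`), so that the raw change lies in [-1,1]. Under those assumptions, Alternative A clips negative changes to 0, while Alternative B rescales the change linearly so that "no effect" lands at 0.5.

```python
def score_magnitude_only(baseline: float, augmented: float) -> float:
    """Alternative A (sketch): report only the size of an improvement.

    Any negative change is clipped to 0, so "no effect" and "harmful"
    are indistinguishable.
    """
    return max(0.0, augmented - baseline)


def score_with_direction(baseline: float, augmented: float) -> float:
    """Alternative B (sketch): report magnitude and direction.

    The change in [-1, 1] is mapped linearly onto [0, 1], so 0.5 means
    "no effect", values below 0.5 mean the task got worse, and values
    above 0.5 mean it improved.
    """
    return (augmented - baseline + 1.0) / 2.0


# Hypothetical task scores: no effect, harmful, and helpful synthetic data.
for baseline, augmented in [(0.6, 0.6), (0.6, 0.2), (0.6, 0.8)]:
    a = score_magnitude_only(baseline, augmented)
    b = score_with_direction(baseline, augmented)
    print(f"baseline={baseline}, augmented={augmented}: A={a:.2f}, B={b:.2f}")
```

In this sketch the first two cases both score 0 under Alternative A, while Alternative B separates them (0.5 for no effect, a value below 0.5 for the harmful case).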

We've currently implemented Alternative B since it seems to have fewer cons. But I'm leaving this as an open question in case we want to reconsider Alternative A.
