
[FR]: New Evaluation Metric "Structured Output Compliance" #2528

@vincentkoc


Proposal summary

We would like to extend the existing evaluation metrics with a new metric called "Structured Output Compliance". Essentially, we are ensuring that the output is valid JSON and/or JSON-LD. Ideally, the solution would also support validation against a Pydantic schema.

An example of an existing judge metric (Hallucination) is defined here:

Ideally this is implemented as an LLM-as-a-judge metric, but it could also use plain heuristics by extending the regex/heuristic metric. The expectation is that the new judge is added to the frontend so it can be used as an LLM-as-a-judge from the UI (Online Evaluation tab), as well as in the Python SDK. The appropriate docs need to be updated and a video attached showing the metric working.

The return value should be a boolean (True/False), and if the metric uses an LLM-as-a-judge eval, it should also contain a "reason".
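
To make the heuristic variant concrete, here is a minimal sketch of what the check could look like in Python. The function name, arguments, and return shape are assumptions for illustration only, not Opik's actual metric interface; it relies on the standard json module plus Pydantic v2 for the optional schema validation.

```python
import json
from typing import Any, Optional, Type

from pydantic import BaseModel, ValidationError


def check_structured_output_compliance(
    output: str,
    schema: Optional[Type[BaseModel]] = None,  # hypothetical optional Pydantic schema
) -> dict[str, Any]:
    """Heuristic check: is the output valid JSON and, optionally, schema-compliant?

    Returns a boolean `value` plus a `reason`, mirroring the proposed return shape.
    """
    # Step 1: the output must parse as JSON at all.
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError as exc:
        return {"value": False, "reason": f"Output is not valid JSON: {exc}"}

    # Step 2 (optional): validate the parsed object against a Pydantic schema.
    if schema is not None:
        try:
            schema.model_validate(parsed)  # Pydantic v2 API
        except ValidationError as exc:
            return {"value": False, "reason": f"JSON does not match schema: {exc}"}

    return {"value": True, "reason": "Output is valid JSON"}


# Example usage with a hypothetical schema:
class Person(BaseModel):
    name: str
    age: int


print(check_structured_output_compliance('{"name": "Ada", "age": 36}', schema=Person))
# -> {'value': True, 'reason': 'Output is valid JSON'}
```

An LLM-as-a-judge variant would keep the same boolean-plus-"reason" return shape, with the reason derived from the judge model's explanation, similar to how the existing Hallucination metric works.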

Motivation

I would like to see a more robust set of metrics and evaluations based on recent research. We also know that structured data compliance is critical for a number of use cases.
