FactScore evaluator class #196

@dylanbouchard

Description

Is your feature request related to a problem? Please describe.
We would like an evaluator class specific to FactScore. The FactScore dataset asks questions about celebrities and uses Wikipedia content as the answer key. A single question is evaluated as follows:

  1. Generate an LLM response to a question from the FactScore dataset
  2. Deconstruct that response into individual claims (using an LLM)
  3. Calculate the precision relative to the Wikipedia answer key (i.e., proportion of generated claims supported by the Wikipedia content)
  4. The precision is the FactScore for that single LLM response
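
For discussion, the four steps above could be sketched roughly as follows. This is only a minimal illustration, not a proposed design: the function name `fact_score` and the three callables (`generate`, `decompose`, `is_supported`) are hypothetical placeholders for the LLM call, the LLM-based claim decomposition, and the support check against the Wikipedia answer key. Returning 0.0 when no claims are extracted is also an assumption. The toy stubs at the bottom use simple string matching in place of real LLM calls.

```python
from typing import Callable, List


def fact_score(
    question: str,
    reference: str,
    generate: Callable[[str], str],
    decompose: Callable[[str], List[str]],
    is_supported: Callable[[str, str], bool],
) -> float:
    """Precision of generated claims against a reference text.

    All three callables are hypothetical hooks, not an existing API.
    """
    # Step 1: generate an LLM response to the question.
    response = generate(question)
    # Step 2: deconstruct the response into individual claims.
    claims = decompose(response)
    if not claims:
        # Assumption: a response with no extractable claims scores 0.0.
        return 0.0
    # Step 3: count the claims supported by the reference (answer key).
    supported = sum(1 for claim in claims if is_supported(claim, reference))
    # Step 4: the precision is the score for this single response.
    return supported / len(claims)


# Toy stand-ins so the sketch runs without an LLM (illustrative only).
def demo_generate(question: str) -> str:
    return (
        "Ada Lovelace was born in 1815. "
        "She was a mathematician. "
        "She was born in Paris."
    )


def demo_decompose(response: str) -> List[str]:
    # Naive sentence split; a real evaluator would use an LLM here.
    return [part.strip() for part in response.split(".") if part.strip()]


def demo_is_supported(claim: str, reference: str) -> bool:
    # Naive substring match; a real evaluator would use an LLM or NLI model.
    return claim.lower() in reference.lower()


demo_reference = (
    "Ada Lovelace was born in 1815 in London. "
    "She was a mathematician and writer."
)

demo_score = fact_score(
    "Tell me a bio of Ada Lovelace.",
    demo_reference,
    demo_generate,
    demo_decompose,
    demo_is_supported,
)
```

In this toy run, two of the three extracted claims appear in the reference text, so the precision is 2/3. Passing the three stages in as callables keeps the scoring logic independent of any particular LLM client.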

Describe the solution you'd like
Assigning this to @virenbajaj and trusting his guidance for the design.

Describe alternatives you've considered
Loading the FactScore dataset directly with the load_example_dataset utility function and performing the steps above manually.

Additional context
Refer to the linked paper above for more information
