Open
Description
Is your feature request related to a problem? Please describe.
We would like an evaluator class specific to FactScore. The FactScore dataset asks questions about celebrities and uses Wikipedia content as the answer key. The evaluation approach for a single question is as follows:
- Generate an LLM response to a question from the FactScore dataset
- Deconstruct that response into individual claims (using an LLM)
- Calculate the precision relative to the Wikipedia answer key (i.e., proportion of generated claims supported by the Wikipedia content)
- The precision is the FactScore for that single LLM response
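The steps above can be sketched roughly as follows. This is only an illustration of the precision calculation, not a proposed design: `decompose_into_claims`, `is_supported`, and `factscore` are hypothetical names, and the two helper functions are toy stand-ins (sentence splitting and substring matching) for what would be LLM calls in a real evaluator. Returning 0.0 for a response with no claims is also an assumption.

```python
from typing import Callable, List


def decompose_into_claims(response: str) -> List[str]:
    """Toy stand-in for the LLM decomposition step: one claim per sentence."""
    return [s.strip() for s in response.split(".") if s.strip()]


def is_supported(claim: str, reference: str) -> bool:
    """Toy stand-in for the LLM support check: case-insensitive substring match."""
    return claim.lower() in reference.lower()


def factscore(
    claims: List[str],
    reference: str,
    judge: Callable[[str, str], bool] = is_supported,
) -> float:
    """Precision: proportion of claims the judge marks as supported by the reference."""
    if not claims:
        return 0.0  # assumption: a response with no claims scores 0
    supported = sum(1 for claim in claims if judge(claim, reference))
    return supported / len(claims)


# Toy answer key and response; a real run would use Wikipedia content
# and an LLM-generated answer to a FactScore question.
reference = "Ada Lovelace was born in 1815. Ada Lovelace worked with Charles Babbage."
response = "Ada Lovelace was born in 1815. Ada Lovelace invented the telephone."

claims = decompose_into_claims(response)
score = factscore(claims, reference)
print(f"{len(claims)} claims, FactScore = {score}")
```

Passing the judge as a parameter keeps the precision calculation independent of whichever LLM performs the support check.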
Describe the solution you'd like
Assigning this to @virenbajaj and trusting his guidance for the design.
Describe alternatives you've considered
Loading the FactScore dataset directly with the load_example_dataset utility function and working through the steps above manually.
Additional context
Refer to the linked paper above for more information.