Open
Description
RAG Evaluation
- 100 questions
Types of questions:
- 60 on general trade
- 12 on growth/variation
- 28 on rankings
- RAG evaluation results
Best combination tested so far: multi-qa-mpnet-base-cos-v1 (embeddings) + gpt-3.5-turbo (LLM)
- Accuracy: 73%
- Answers missing data: 9
- Answers missing context: 14
- Incorrect answers: 4
- Average latency: 4.19 s
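As a quick sanity check on the figures above, here is a minimal sketch that tallies the outcome counts and recomputes the accuracy. The label names are made up for illustration, and the count of 73 correct answers is an assumption derived from 73% of 100 questions; this is not the actual evaluation code.

```python
from collections import Counter

# Outcome counts taken from the results above.
# Label names are hypothetical; "correct" = 73 is assumed from 73% of 100 questions.
outcomes = Counter({
    "correct": 73,
    "missing_data": 9,
    "missing_context": 14,
    "incorrect": 4,
})

total = sum(outcomes.values())
accuracy = outcomes["correct"] / total
failures = total - outcomes["correct"]

print(f"total={total} accuracy={accuracy:.0%} failures={failures}")
```

The three failure types (9 + 14 + 4) add up to the 27 answers that were not counted as correct, so the breakdown is consistent with the reported accuracy.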
Of the 27 wrong answers:
- 24 were general trade questions (24 of 60)
- 3 were growth/variation questions (3 of 12, the smallest category)
- 0 were ranking questions (0 of 28)
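To compare question types on equal footing, the wrong-answer counts can be turned into per-type accuracy. This is a rough sketch using only the numbers in this issue; the dictionary names are illustrative.

```python
# Question counts per type and wrong answers per type, as reported in this issue.
questions = {"general trade": 60, "growth/variation": 12, "rankings": 28}
wrong = {"general trade": 24, "growth/variation": 3, "rankings": 0}

# Accuracy per type = (questions asked - wrong answers) / questions asked.
per_type_accuracy = {
    name: (questions[name] - wrong[name]) / questions[name]
    for name in questions
}

for name, acc in per_type_accuracy.items():
    print(f"{name}: {acc:.0%}")
```

With these numbers, general trade comes out at 60%, growth/variation at 75%, and rankings at 100%, so general trade questions are where most of the room for improvement is.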
We're preparing a presentation that gathers the results of all approaches in more detail. Next week I'll work on improving the RAG + LLM combination and on evaluating the previous multi-layer approach with Pippo.