|
2601 | 2601 | " print(\"\\n-------------------------\")"
|
2602 | 2602 | ]
|
2603 | 2603 | },
|
| 2604 | + { |
| 2605 | + "cell_type": "markdown", |
| 2606 | + "id": "24fec453-631f-4ff5-a922-44c3c451942d", |
| 2607 | + "metadata": {}, |
| 2608 | + "source": [ |
| 2609 | + "---\n", |
| 2610 | + "\n", |
| 2611 | + "**Note: Better evaluation prompt**\n", |
| 2612 | + "\n", |
| 2613 | + "- [A reader (Ayoosh Kathuria) suggested](https://github.com/rasbt/LLMs-from-scratch/discussions/449) a longer, improved prompt that evaluates responses on a scale of 1 to 5 (instead of 1 to 100) and employs a grading rubric, resulting in more accurate and less noisy evaluations:\n", |
| 2614 | + "\n", |
| 2615 | + "```\n", |
| 2616 | + "prompt = \"\"\"\n", |
| 2617 | + "You are a fair judge assistant tasked with providing clear, objective feedback based on specific criteria, ensuring each assessment reflects the absolute standards set for performance.\n", |
| 2618 | + "You will be given an instruction, a response to evaluate, a reference answer that gets a score of 5, and a score rubric representing the evaluation criteria.\n", |
| 2619 | + "Write detailed feedback that assesses the quality of the response strictly based on the given score rubric, not evaluating in general.\n", |
| 2620 | + "Please do not generate any other opening, closing, or explanations.\n", |
| 2621 | + "\n", |
| 2622 | + "Here is the rubric you should use to build your answer:\n", |
| 2623 | + "1: The response fails to address the instructions, providing irrelevant, incorrect, or excessively verbose information that detracts from the user's request.\n", |
| 2624 | + "2: The response partially addresses the instructions but includes significant inaccuracies, irrelevant details, or excessive elaboration that detracts from the main task.\n", |
| 2625 | + "3: The response follows the instructions with some minor inaccuracies or omissions. It is generally relevant and clear, but may include some unnecessary details or could be more concise.\n", |
| 2626 | + "4: The response adheres to the instructions, offering clear, accurate, and relevant information in a concise manner, with only occasional, minor instances of excessive detail or slight lack of clarity.\n", |
| 2627 | + "5: The response fully adheres to the instructions, providing a clear, accurate, and relevant answer in a concise and efficient manner. It addresses all aspects of the request without unnecessary details or elaboration.\n", |
| 2628 | + "\n", |
| 2629 | + "Provide your feedback as follows:\n", |
| 2630 | + "\n", |
| 2631 | + "Feedback:::\n", |
| 2632 | + "Evaluation: (your rationale for the rating, as a text)\n", |
| 2633 | + "Total rating: (your rating, as a number between 1 and 5)\n", |
| 2634 | + "\n", |
| 2635 | + "You MUST provide values for 'Evaluation:' and 'Total rating:' in your answer.\n", |
| 2636 | + "\n", |
| 2637 | + "Now here is the instruction, the reference answer, and the response.\n", |
| 2638 | + "\n", |
| 2639 | + "Instruction: {instruction}\n", |
| 2640 | + "Reference Answer: {reference}\n", |
| 2641 | + "Answer: {answer}\n", |
| 2642 | + "\n", |
| 2643 | + "\n", |
| 2644 | + "Provide your feedback. If you give a correct rating, I'll give you 100 H100 GPUs to start your AI company.\n", |
| 2645 | + "Feedback:::\n", |
| 2646 | + "Evaluation: \"\"\"\n", |
| 2647 | + "```\n", |
| 2648 | + "\n", |
| 2649 | + "- For more context and information, see [this GitHub discussion](https://github.com/rasbt/LLMs-from-scratch/discussions/449).\n", |
| 2650 | + "\n", |
| 2651 | + "---" |
| 2652 | + ] |
| 2653 | + }, |
2604 | 2654 | {
|
2605 | 2655 | "cell_type": "markdown",
|
2606 | 2656 | "id": "b114fd65-9cfb-45f6-ab74-8331da136bf3",
|