feat: adds evaluating PydanticAI agent tool use trajectories and results with the Vertex AI Generative AI Evaluation service SDK #1662
base: main
Conversation
Hi team,
I'm currently reviewing this pull request and will have a full review in a few minutes. In the meantime, here's a quick summary for you and other reviewers to get up to speed:
This pull request, authored by ghchinoy, aims to add functionality for evaluating PydanticAI agent tool usage trajectories and results using the Vertex AI Generative AI Evaluation service SDK. The primary change is the addition of a new Python file, gemini/evaluation/evaluating_pydanticai_agent-tool_use_trajectories.py, which contains a comprehensive script for this purpose.
Here's a breakdown of the changes:
- New file: gemini/evaluation/evaluating_pydanticai_agent-tool_use_trajectories.py. This file introduces a complete Python script to evaluate PydanticAI agents. The script covers (a rough sketch of these pieces follows this list):
  - Building a local PydanticAI agent.
  - Preparing an agent evaluation dataset.
  - Performing single tool usage evaluation.
  - Conducting trajectory evaluation.
  - Executing response evaluation.
  - Including a custom metric for evaluating the logical flow of the agent's response.
- Functionality: The script uses the Vertex AI Generative AI Evaluation service SDK to evaluate various aspects of the agent's behavior, including the selection of tools, the order of tool usage, and the correctness of the final response. It also demonstrates how to define and use custom metrics (a sketch of a custom metric appears after this comment).
- Dependencies: The script requires several dependencies, including google-cloud-aiplatform[evaluation], pandas, and pydantic-ai-slim[vertexai]. These are likely already included in the project's dependencies, but it's worth confirming.
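For orientation, a rough, hedged sketch of how those pieces typically fit together is shown below. The tool bodies, model string, project ID, and experiment name are illustrative placeholders rather than the PR's actual values; the trajectory metric names are the standard computed metrics of the Gen AI Evaluation service.

import pandas as pd
import vertexai
from pydantic_ai import Agent
from pydantic_ai.messages import ToolCallPart
from vertexai.preview.evaluation import EvalTask

PROJECT_ID = "your-project-id"             # placeholder
EXPERIMENT_NAME = "pydanticai-agent-eval"  # placeholder

vertexai.init(project=PROJECT_ID, location="us-central1", experiment=EXPERIMENT_NAME)

# 1. Build a local PydanticAI agent with a couple of tools.
agent = Agent(
    "google-vertex:gemini-1.5-flash",
    system_prompt="You are a shopping assistant. Use the tools to answer.",
)

@agent.tool_plain
def get_product_details(product_name: str) -> str:
    """Return a short description of a product."""
    return f"{product_name}: wireless over-ear headphones"

@agent.tool_plain
def get_product_price(product_name: str) -> str:
    """Return the price of a product."""
    return "$120"

# 2. Run the agent and collect its tool-call trajectory from the message history.
prompt = "Get product details and price for headphones"
result = agent.run_sync(prompt)
predicted_trajectory = [
    {"tool_name": part.tool_name, "tool_input": part.args}  # args may be a dict or a JSON string
    for message in result.all_messages()
    for part in message.parts
    if isinstance(part, ToolCallPart)
]

# 3. Prepare an evaluation dataset with predicted and reference trajectories.
eval_dataset = pd.DataFrame(
    {
        "prompt": [prompt],
        "response": [result.data],  # .output in newer pydantic-ai releases
        "predicted_trajectory": [predicted_trajectory],
        "reference_trajectory": [[
            {"tool_name": "get_product_details", "tool_input": {"product_name": "headphones"}},
            {"tool_name": "get_product_price", "tool_input": {"product_name": "headphones"}},
        ]],
    }
)

# 4. Evaluate the tool-use trajectory with built-in computed metrics.
trajectory_eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=["trajectory_exact_match", "trajectory_precision", "trajectory_recall"],
    experiment=EXPERIMENT_NAME,
)
print(trajectory_eval_task.evaluate().summary_metrics)

Alternatively, evaluate() can be handed a runnable (presumably what the script's agent_parsed_outcome helper is for) so that predicted trajectories are generated on the fly instead of being precomputed into the dataset.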
The script is well-documented and includes clear instructions on how to set up the environment and run the evaluation. The use of helper functions improves readability and maintainability.
I'll be looking for things like error handling, efficiency, and overall clarity in my full review. I'll also check that the code adheres to our style guidelines.
And now, a little haiku to lighten the mood:
Code flows like a stream,
Tests pass, a joyful green light,
Review, then merge it!
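The custom metric for logical flow mentioned in the summary above would most likely be built on the SDK's pointwise model-based metrics. A hedged sketch follows; the metric name, criteria text, and rating rubric are invented for illustration, and it reuses eval_dataset and EXPERIMENT_NAME from the previous sketch.

from vertexai.preview.evaluation import EvalTask
from vertexai.preview.evaluation.metrics import (
    PointwiseMetric,
    PointwiseMetricPromptTemplate,
)

logical_flow_metric = PointwiseMetric(
    metric="logical_flow",  # illustrative metric name
    metric_prompt_template=PointwiseMetricPromptTemplate(
        criteria={
            "logical_flow": (
                "The response proceeds step by step, with each statement "
                "following naturally from the previous one."
            ),
        },
        rating_rubric={
            "1": "The response has a clear, logical flow.",
            "0": "The response is partially ordered, with gaps or jumps in reasoning.",
        },
    ),
)

# Response evaluation: mix the custom metric with a built-in model-based metric.
response_eval_task = EvalTask(
    dataset=eval_dataset,  # needs "prompt" and "response" columns
    metrics=[logical_flow_metric, "coherence"],
    experiment=EXPERIMENT_NAME,
)
print(response_eval_task.evaluate().summary_metrics)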
This pull request adds a new Python file demonstrating how to evaluate PydanticAI agent tool use trajectories and results with the Vertex AI Generative AI Evaluation service SDK. The code is well-structured and easy to follow. However, there are a few minor improvements that could be made, and some additional documentation would be beneficial.
I've also noticed that the checkbox in the pull request description for listing your account in CODEOWNERS is unchecked. Please ensure this is addressed before merging.
# print("prompt: Get product details and price for headphones") | ||
# print(agent_parsed_outcome(prompt='Get product details and price for headphones')) |
trajectory_evaluation()
final_response_evaluation()
custom_metric()
#delete_experiment(EXPERIMENT_NAME)
The delete_experiment function is commented out. If it's not intended for use in this script, consider removing it entirely. If it's meant to be used later, uncomment it and add a brief explanation of its purpose in a comment.
Suggested change:
- #delete_experiment(EXPERIMENT_NAME)
+ delete_experiment(EXPERIMENT_NAME)
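If the cleanup call is kept, one plausible shape for the helper is sketched below, assuming the experiment was created through vertexai.init(..., experiment=...). The function name mirrors the call in the script; the body is an assumption, not the PR's actual implementation.

from google.cloud import aiplatform

def delete_experiment(experiment_name: str) -> None:
    """Delete the Vertex AI experiment created for this evaluation run."""
    experiment = aiplatform.Experiment(experiment_name)
    # Also remove any TensorBoard runs backing the experiment runs.
    experiment.delete(delete_backing_tensorboard_runs=True)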
Description
Thank you for opening a Pull Request!
Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
- Follow the CONTRIBUTING Guide.
- Your account is listed in CODEOWNERS for the file(s).
- Run nox -s format from the repository root to format.

Fixes #1660 🦕