
feat: adds evaluating PydanticAI agent tool use trajectories and results with the Vertex AI Generative AI Evaluation service SDK #1662

Draft
wants to merge 6 commits into main

Conversation

@ghchinoy (Contributor) commented Jan 26, 2025

Description

Thank you for opening a Pull Request!
Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Follow the CONTRIBUTING Guide.
  • You are listed as the author in your notebook or README file.
    • Your account is listed in CODEOWNERS for the file(s).
  • Ensure your Pull Request title follows the https://www.conventionalcommits.org/ specification.
  • Ensure the tests and linter pass (Run nox -s format from the repository root to format).
  • Appropriate docs were updated (if necessary)

Fixes #1660 🦕

@ghchinoy ghchinoy requested a review from a team as a code owner January 26, 2025 22:29
@code-review-assist bot left a comment


Hi team,

I'm currently reviewing this pull request and will have a full review in a few minutes. In the meantime, here's a quick summary for you and other reviewers to get up to speed:

This pull request, authored by ghchinoy, aims to add functionality for evaluating PydanticAI agent tool usage trajectories and results using the Vertex AI Generative AI Evaluation service SDK. The primary change is the addition of a new Python file, gemini/evaluation/evaluating_pydanticai_agent-tool_use_trajectories.py, which contains a comprehensive script for this purpose.

Here's a breakdown of the changes:

  • New File: gemini/evaluation/evaluating_pydanticai_agent-tool_use_trajectories.py, which introduces a complete Python script to evaluate PydanticAI agents. The script covers:

    • Building a local PydanticAI agent (a rough sketch of such an agent appears after this list).
    • Preparing an agent evaluation dataset.
    • Performing single tool usage evaluation.
    • Conducting trajectory evaluation.
    • Executing response evaluation.
    • Including a custom metric for evaluating the logical flow of the agent's response.
  • Functionality: The script uses the Vertex AI Generative AI Evaluation service SDK to evaluate various aspects of the agent's behavior, including the selection of tools, the order of tool usage, and the correctness of the final response. It also demonstrates how to define and use custom metrics; rough sketches of the trajectory-evaluation and custom-metric calls appear below.

  • Dependencies: The script requires several dependencies, including google-cloud-aiplatform[evaluation], pandas, and pydantic-ai-slim[vertexai]. These are likely already included in the project's dependencies, but it's worth confirming.
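
For reviewers who haven't built a PydanticAI agent before, here is a hedged sketch of what the agent-building step might look like. It assumes the pydantic-ai-slim[vertexai] API available around the time of this PR (Agent, VertexAIModel, the tool_plain decorator, run_sync); the tool name get_product_details, the stub catalog, and the prompts are illustrative placeholders, not code taken from the PR.

```python
# Hypothetical sketch only: a minimal PydanticAI agent backed by Gemini on
# Vertex AI, with one tool the model can call. Names and data are invented.
from pydantic_ai import Agent
from pydantic_ai.models.vertexai import VertexAIModel

model = VertexAIModel("gemini-1.5-flash")  # Gemini served through Vertex AI
agent = Agent(
    model,
    system_prompt="You are a shopping assistant. Use the tools to answer.",
)

@agent.tool_plain
def get_product_details(product_name: str) -> str:
    """Look up product details (stubbed catalog, for illustration only)."""
    catalog = {"headphones": "Wireless headphones, 20h battery life, $99"}
    return catalog.get(product_name.lower(), "Product not found")

if __name__ == "__main__":
    result = agent.run_sync("Get product details and price for headphones")
    print(result.data)            # final text response (attribute name may differ across pydantic-ai versions)
    print(result.all_messages())  # message history, including tool calls
```

The message history returned by run_sync is what a script like this can turn into a "predicted trajectory" (the ordered list of tool calls) for the evaluation dataset.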

The script is well-documented and includes clear instructions on how to set up the environment and run the evaluation. The use of helper functions improves readability and maintainability.
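
On the trajectory-evaluation piece described above, the following is a rough sketch of how the Gen AI Evaluation SDK is typically invoked for this, assuming the preview evaluation module and the documented trajectory metric names; the project ID, experiment name, and dataset rows are placeholders, and the PR's actual dataset will differ. (The SDK also exposes a single-tool-use trajectory metric, not shown here.)

```python
# Hedged sketch: trajectory evaluation with the Vertex AI Gen AI Evaluation SDK.
# Each trajectory is an ordered list of {"tool_name", "tool_input"} dicts.
import pandas as pd
import vertexai
from vertexai.preview.evaluation import EvalTask

vertexai.init(project="your-project-id", location="us-central1")  # placeholders

eval_dataset = pd.DataFrame({
    "predicted_trajectory": [
        [{"tool_name": "get_product_details",
          "tool_input": {"product_name": "headphones"}}],
    ],
    "reference_trajectory": [
        [{"tool_name": "get_product_details",
          "tool_input": {"product_name": "headphones"}}],
    ],
})

trajectory_eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=["trajectory_exact_match", "trajectory_precision", "trajectory_recall"],
    experiment="evaluate-pydanticai-agent",  # placeholder experiment name
)
eval_result = trajectory_eval_task.evaluate()
print(eval_result.summary_metrics)  # aggregate score per metric
print(eval_result.metrics_table)    # per-row scores as a DataFrame
```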

I'll be looking for things like error handling, efficiency, and overall clarity in my full review. I'll also check that the code adheres to our style guidelines.
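
And on the custom "logical flow" metric mentioned in the summary, here is a hedged sketch of how such a metric is commonly defined with PointwiseMetric and a custom prompt template; the criteria wording, rubric, and dataset rows are illustrative rather than the PR's actual definitions, the exact module path may vary between the preview and GA namespaces of the SDK, and vertexai.init(...) is assumed to have been called as in the previous sketch.

```python
# Hedged sketch: a model-based custom metric judging the logical flow of the
# agent's final response. Criteria and rubric text are invented placeholders.
import pandas as pd
from vertexai.preview.evaluation import (
    EvalTask,
    PointwiseMetric,
    PointwiseMetricPromptTemplate,
)

response_dataset = pd.DataFrame({
    "prompt": ["Get product details and price for headphones"],
    "response": ["The wireless headphones have 20h of battery life and cost $99."],
})

logical_flow_metric = PointwiseMetric(
    metric="logical_flow",
    metric_prompt_template=PointwiseMetricPromptTemplate(
        criteria={
            "logical_flow": (
                "The response moves from one idea to the next in a coherent, "
                "easy-to-follow order, with no abrupt jumps."
            ),
        },
        rating_rubric={
            "1": "The response has a clear, logical flow.",
            "0": "The response is disorganized or hard to follow.",
        },
        input_variables=["prompt", "response"],
    ),
)

response_eval_task = EvalTask(
    dataset=response_dataset,
    metrics=[logical_flow_metric],
    experiment="evaluate-pydanticai-agent",  # placeholder experiment name
)
print(response_eval_task.evaluate().summary_metrics)
```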


And now, a little haiku to lighten the mood:

Code flows like a stream,
Tests pass, a joyful green light,
Review, then merge it!

@ghchinoy ghchinoy marked this pull request as draft January 26, 2025 22:31
@code-review-assist bot left a comment


This pull request adds a new Python file demonstrating how to evaluate PydanticAI agent tool use trajectories and results with the Vertex AI Generative AI Evaluation service SDK. The code is well-structured and easy to follow. However, there are a few minor improvements that could be made, and some additional documentation would be beneficial.

I've also noticed that the checkbox in the pull request description for listing your account in CODEOWNERS is unchecked. Please ensure this is addressed before merging.

Comment on lines 152 to 153
# print("prompt: Get product details and price for headphones")
# print(agent_parsed_outcome(prompt='Get product details and price for headphones'))

Severity: low

These lines are commented out. If they are not needed, consider removing them to avoid confusion.

trajectory_evaluation()
final_response_evaluation()
custom_metric()
#delete_experiment(EXPERIMENT_NAME)

Severity: low

The delete_experiment function is commented out. If it's not intended for use in this script, consider removing it entirely. If it's meant to be used later, uncomment it and add a brief explanation of its purpose in a comment.

Suggested change:
- #delete_experiment(EXPERIMENT_NAME)
+ delete_experiment(EXPERIMENT_NAME)

Development

Successfully merging this pull request may close these issues.

[Feat]: Add sample of Vertex AI Generative AI Evaluation service with PydanticAI
2 participants