feat: adds evaluating PydanticAI agent tool use trajectories and results with the Vertex AI Generative AI Evaluation service SDK #1662
base: main
Conversation
Hi team,
I'm currently reviewing this pull request and will have a full review in a few minutes. In the meantime, here's a quick summary for you and other reviewers to get up to speed:
This pull request, authored by ghchinoy, aims to add functionality for evaluating PydanticAI agent tool usage trajectories and results using the Vertex AI Generative AI Evaluation service SDK. The primary change is the addition of a new Python file, gemini/evaluation/evaluating_pydanticai_agent-tool_use_trajectories.py, which contains a comprehensive script for this purpose.
Here's a breakdown of the changes:
- New file: gemini/evaluation/evaluating_pydanticai_agent-tool_use_trajectories.py. This file introduces a complete Python script to evaluate PydanticAI agents. The script covers (a rough sketch of these pieces follows this list):
  - Building a local PydanticAI agent.
  - Preparing an agent evaluation dataset.
  - Performing single tool usage evaluation.
  - Conducting trajectory evaluation.
  - Executing response evaluation.
  - Including a custom metric for evaluating the logical flow of the agent's response.
- Functionality: The script uses the Vertex AI Generative AI Evaluation service SDK to evaluate various aspects of the agent's behavior, including the selection of tools, the order of tool usage, and the correctness of the final response. It also demonstrates how to define and use custom metrics (a sketch of a custom metric appears after this comment).
- Dependencies: The script requires several dependencies, including google-cloud-aiplatform[evaluation], pandas, and pydantic-ai-slim[vertexai]. These are likely already included in the project's dependencies, but it's worth confirming.
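For orientation, a rough, hedged sketch of how those pieces typically fit together is shown below. The tool bodies, model string, project ID, and experiment name are illustrative placeholders rather than the PR's actual values; the trajectory metric names are the standard computed metrics of the Gen AI Evaluation service.

import pandas as pd
import vertexai
from pydantic_ai import Agent
from pydantic_ai.messages import ToolCallPart
from vertexai.preview.evaluation import EvalTask

PROJECT_ID = "your-project-id"             # placeholder
EXPERIMENT_NAME = "pydanticai-agent-eval"  # placeholder

vertexai.init(project=PROJECT_ID, location="us-central1", experiment=EXPERIMENT_NAME)

# 1. Build a local PydanticAI agent with a couple of tools.
agent = Agent(
    "google-vertex:gemini-1.5-flash",
    system_prompt="You are a shopping assistant. Use the tools to answer.",
)

@agent.tool_plain
def get_product_details(product_name: str) -> str:
    """Return a short description of a product."""
    return f"{product_name}: wireless over-ear headphones"

@agent.tool_plain
def get_product_price(product_name: str) -> str:
    """Return the price of a product."""
    return "$120"

# 2. Run the agent and collect its tool-call trajectory from the message history.
prompt = "Get product details and price for headphones"
result = agent.run_sync(prompt)
predicted_trajectory = [
    {"tool_name": part.tool_name, "tool_input": part.args}  # args may be a dict or a JSON string
    for message in result.all_messages()
    for part in message.parts
    if isinstance(part, ToolCallPart)
]

# 3. Prepare an evaluation dataset with predicted and reference trajectories.
eval_dataset = pd.DataFrame(
    {
        "prompt": [prompt],
        "response": [result.data],  # .output in newer pydantic-ai releases
        "predicted_trajectory": [predicted_trajectory],
        "reference_trajectory": [[
            {"tool_name": "get_product_details", "tool_input": {"product_name": "headphones"}},
            {"tool_name": "get_product_price", "tool_input": {"product_name": "headphones"}},
        ]],
    }
)

# 4. Evaluate the tool-use trajectory with built-in computed metrics.
trajectory_eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=["trajectory_exact_match", "trajectory_precision", "trajectory_recall"],
    experiment=EXPERIMENT_NAME,
)
print(trajectory_eval_task.evaluate().summary_metrics)

Alternatively, evaluate() can be handed a runnable (presumably what the script's agent_parsed_outcome helper is for) so that predicted trajectories are generated on the fly instead of being precomputed into the dataset.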
The script is well-documented and includes clear instructions on how to set up the environment and run the evaluation. The use of helper functions improves readability and maintainability.
I'll be looking for things like error handling, efficiency, and overall clarity in my full review. I'll also check that the code adheres to our style guidelines.
And now, a little haiku to lighten the mood:
Code flows like a stream,
Tests pass, a joyful green light,
Review, then merge it!
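The custom metric for logical flow mentioned in the summary above would most likely be built on the SDK's pointwise model-based metrics. A hedged sketch follows; the metric name, criteria text, and rating rubric are invented for illustration, and it reuses eval_dataset and EXPERIMENT_NAME from the previous sketch.

from vertexai.preview.evaluation import EvalTask
from vertexai.preview.evaluation.metrics import (
    PointwiseMetric,
    PointwiseMetricPromptTemplate,
)

logical_flow_metric = PointwiseMetric(
    metric="logical_flow",  # illustrative metric name
    metric_prompt_template=PointwiseMetricPromptTemplate(
        criteria={
            "logical_flow": (
                "The response proceeds step by step, with each statement "
                "following naturally from the previous one."
            ),
        },
        rating_rubric={
            "1": "The response has a clear, logical flow.",
            "0": "The response is partially ordered, with gaps or jumps in reasoning.",
        },
    ),
)

# Response evaluation: mix the custom metric with a built-in model-based metric.
response_eval_task = EvalTask(
    dataset=eval_dataset,  # needs "prompt" and "response" columns
    metrics=[logical_flow_metric, "coherence"],
    experiment=EXPERIMENT_NAME,
)
print(response_eval_task.evaluate().summary_metrics)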
This pull request adds a new Python file demonstrating how to evaluate PydanticAI agent tool use trajectories and results with the Vertex AI Generative AI Evaluation service SDK. The code is well-structured and easy to follow. However, there are a few minor improvements that could be made, and some additional documentation would be beneficial.
I've also noticed that the checkbox in the pull request description for listing your account in CODEOWNERS is unchecked. Please ensure this is addressed before merging.
# print("prompt: Get product details and price for headphones") | ||
# print(agent_parsed_outcome(prompt='Get product details and price for headphones')) |
trajectory_evaluation()
final_response_evaluation()
custom_metric()
#delete_experiment(EXPERIMENT_NAME)
The delete_experiment function is commented out. If it's not intended for use in this script, consider removing it entirely. If it's meant to be used later, uncomment it and add a brief explanation of its purpose in a comment.
Suggested change:
- #delete_experiment(EXPERIMENT_NAME)
+ delete_experiment(EXPERIMENT_NAME)
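If the cleanup call is kept, one plausible shape for the helper is sketched below, assuming the experiment was created through vertexai.init(..., experiment=...). The function name mirrors the call in the script; the body is an assumption, not the PR's actual implementation.

from google.cloud import aiplatform

def delete_experiment(experiment_name: str) -> None:
    """Delete the Vertex AI experiment created for this evaluation run."""
    experiment = aiplatform.Experiment(experiment_name)
    # Also remove any TensorBoard runs backing the experiment runs.
    experiment.delete(delete_backing_tensorboard_runs=True)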
Description
Thank you for opening a Pull Request!
Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
- Follow the CONTRIBUTING Guide.
- Your account is listed in CODEOWNERS for the file(s).
- Run nox -s format from the repository root to format.

Fixes #1660 🦕