Skip to content

Run judge tool#17

Merged
juho-y merged 3 commits intomainfrom
run-judge-tool
May 3, 2025
Merged

Run judge tool#17
juho-y merged 3 commits intomainfrom
run-judge-tool

Conversation

@juho-y
Copy link
Contributor

@juho-y juho-y commented Apr 28, 2025

Continuation to #16

Includes the run judge tool call and updates README.

Judges do not yet support the RAG case; hence, the RAG-specific Judge tools have been omitted.

@juho-y juho-y force-pushed the list-judges-tool branch 8 times, most recently from 8a8561f to 3177a07 Compare May 2, 2025 05:50
@juho-y juho-y force-pushed the run-judge-tool branch from 6bb8672 to 4dbdd63 Compare May 2, 2025 06:02
@juho-y juho-y force-pushed the list-judges-tool branch from 3177a07 to b844697 Compare May 2, 2025 07:41
Base automatically changed from list-judges-tool to main May 2, 2025 08:20
@juho-y juho-y force-pushed the run-judge-tool branch 4 times, most recently from 20001bc to 54db952 Compare May 2, 2025 10:18
@juho-y juho-y marked this pull request as ready for review May 2, 2025 10:57
@juho-y juho-y requested a review from TensorTemplar May 2, 2025 10:57
Comment on lines +100 to +104
result = await self.async_client.run_judge(
judge_id=request.judge_id,
request=request.request,
response=request.response,
)
Copy link
Contributor

@TensorTemplar TensorTemplar May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: could pass the **request here if needed, but ideally we just pass RunJudgeRequest downstream. Maintaining this mapping seems redundant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, passing it downstream.

payload = {
"request": request,
"response": response,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similarily here, ideally we map From a RS API domain object for a Judge Request to our local JudgeRequest, which can be a simplified version with exclude_none=True in the basemodel

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, would need to change all the evaluator calls as well, so will leave that into a follow-up refactoring PR.

Copy link
Contributor

@TensorTemplar TensorTemplar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A cleaner domain boundary could use pydantic models explicitly, instead of enforcing it via selective mapping in the run judge client logic. Not a blocker while the judges are WIP, but good to revisit later

@juho-y juho-y force-pushed the run-judge-tool branch from 54db952 to 7acf9f9 Compare May 2, 2025 11:17
@juho-y juho-y force-pushed the run-judge-tool branch from 7acf9f9 to b0b8006 Compare May 2, 2025 12:51
@juho-y juho-y merged commit 428a5f0 into main May 3, 2025
1 check passed
@juho-y juho-y deleted the run-judge-tool branch May 3, 2025 05:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants