@mikasenghaas mikasenghaas commented Oct 21, 2025

Description

The info column may contain fields that pyarrow cannot serialize, which makes the push to the HF Hub fail. One example is the eval results from tau2-bench. This PR JSON-serializes each element of the info column in make_dataset so that the HF Hub push works.

Could this have ramifications for our platform if it relies on the info column?
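
A minimal sketch of the idea described above, not the actual change in this PR: each row's info value is turned into a JSON string so the column has one uniform string type. The helper names `serialize_info` and `serialize_info_column`, the row layout, and the `default=str` fallback are assumptions made for illustration; the real `make_dataset` may be structured differently.

```python
import json
from typing import Any


def serialize_info(info: Any) -> str:
    """Return `info` as a JSON string.

    Assumption: values the json module cannot encode fall back to str().
    """
    return json.dumps(info, default=str)


def serialize_info_column(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies of `rows` with the "info" field replaced by a JSON string."""
    return [
        {**row, "info": serialize_info(row["info"])} if "info" in row else dict(row)
        for row in rows
    ]


# Hypothetical rows whose info dicts have different shapes.
rows = [
    {"prompt": "a", "info": {"reward": 1.0, "tags": ["x"]}},
    {"prompt": "b", "info": {"reward": 0.0, "details": {"steps": 3}}},
]
serialized = serialize_info_column(rows)

# A consumer that needs the structured value has to decode it again.
first_info = json.loads(serialized[0]["info"])
```

With this approach, anything that reads the info column downstream would need to call `json.loads` on each element, which is the kind of impact the question above is asking about.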

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes

@mikasenghaas mikasenghaas changed the title Serialize info column in evals dataset Serialize info column Oct 21, 2025
@mikasenghaas mikasenghaas marked this pull request as ready for review October 21, 2025 11:09
ronaldnetawat pushed a commit to ronaldnetawat/verifiers that referenced this pull request Nov 13, 2025
* Reduce vllm logs in tests

* Fix bug in run_benchmark

* Do not limit model len

* Kill trainer if orchestrator dies

* Increase e2e timeout

* Improve async client resilience

* Only raise if orchestrator died with exit code 1

* Fix ruff issues

---------

Co-authored-by: Mika Senghaas <[email protected]>