Implements Offline APO #106

Open
wants to merge 116 commits into base: main

jdchang1 (Collaborator) commented Jul 7, 2025

Main changes:

  • Added a non-preference-based offline RL track
  • Offline dataloader
  • Offline model (HF/MPT)
  • Offline forward pass and loss

Backwards-incompatible change:

  • The offline_rl callback now refers to the single prompt/response case
  • The callback previously named offline_rl is now pairwise_offline_rl
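For configs written before this PR, the rename means an existing offline_rl entry has to become pairwise_offline_rl to keep the pairwise behavior. Below is a minimal sketch of that migration. It assumes the config is a plain dict with a `callbacks` mapping keyed by callback name; that layout, the `migrate_callbacks` helper, and the `some_other_callback` entry are hypothetical and not taken from this PR. Only the two callback names come from the description above.

```python
from copy import deepcopy

# Only these two callback names come from the PR description;
# everything else in this sketch is an illustrative assumption.
RENAMED_CALLBACKS = {"offline_rl": "pairwise_offline_rl"}


def migrate_callbacks(config: dict) -> dict:
    """Return a copy of a pre-PR config with old callback names renamed.

    Assumes (hypothetically) that callbacks live under config["callbacks"]
    as a mapping from callback name to its keyword arguments.
    """
    migrated = deepcopy(config)
    old_callbacks = migrated.get("callbacks", {})
    new_callbacks = {}
    for name, kwargs in old_callbacks.items():
        target = RENAMED_CALLBACKS.get(name, name)
        # Refuse to merge two entries silently if the new name is already used.
        if target != name and target in old_callbacks:
            raise ValueError(f"cannot rename {name!r}: {target!r} already present")
        new_callbacks[target] = kwargs
    migrated["callbacks"] = new_callbacks
    return migrated


old_config = {"callbacks": {"offline_rl": {}, "some_other_callback": {}}}
new_config = migrate_callbacks(old_config)
```

Run this only on configs that predate the change: a config written after the PR that uses offline_rl for the single prompt/response case would be renamed incorrectly.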

Working runs are:
apo-single-stream-openr1-gBSwx1

jdchang1 requested a review from bowenyang008 as a code owner on July 17, 2025, 19:22
3 participants