Introduce new scoring APIs for curation + training #88
Signed-off-by: SumanthRH <[email protected]>
recipes/sky-t1-preview/recipe.py
Outdated
    # We explicitly set the target number of blocks to help tune performance.
    # For materialized datasets, the number of blocks determined by ray data can be small,
    # especially for a multi-stage pipeline like the one here.
    TARGET_NUM_ROWS_PER_BLOCK = 100
I'm still tuning settings like these for performance
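For readers tuning the same setting, the link between a rows-per-block target and the resulting block count is simple arithmetic. The following is a minimal sketch, assuming the row count is known up front; `num_blocks_for` is a hypothetical helper and is not part of the recipe.

```python
import math

TARGET_NUM_ROWS_PER_BLOCK = 100  # value taken from the diff snippet


def num_blocks_for(num_rows: int, rows_per_block: int = TARGET_NUM_ROWS_PER_BLOCK) -> int:
    """Return how many blocks are needed so that no block exceeds rows_per_block rows."""
    if rows_per_block <= 0:
        raise ValueError("rows_per_block must be positive")
    # Keep at least one block, even for an empty dataset.
    return max(1, math.ceil(num_rows / rows_per_block))


print(num_blocks_for(1_000))  # 10
print(num_blocks_for(1_050))  # 11
```

A value computed this way could be passed to Ray Data's `Dataset.repartition(num_blocks)`; whether the recipe does exactly that is not visible in this excerpt.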
    @timeout(5)  # Add timeout of 5 seconds
    def check_correctness(self, problem, generation):
        solution = extract_answer(problem[self.task_config.answer_key])
        solution = strip_answer_string(solution)
not needed because strip_answer_string is already called in extract_answer
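To illustrate why the second call is redundant, here is a minimal sketch. Both function bodies are hypothetical stand-ins written for this example (the project's real `extract_answer` and `strip_answer_string` are not shown in the thread); the only property borrowed from the comment is that the extractor already normalizes its result.

```python
import re


def strip_answer_string(answer: str) -> str:
    """Hypothetical normalizer: collapse runs of whitespace and trim the ends."""
    return " ".join(answer.split())


def extract_answer(text: str) -> str:
    """Hypothetical extractor: take the contents of a boxed expression and normalize it."""
    match = re.search(r"\\boxed\{([^{}]*)\}", text)
    raw = match.group(1) if match else text
    # Normalization already happens here, so callers receive a clean string.
    return strip_answer_string(raw)


solution = extract_answer(r"The answer is \boxed{  x  +  1 }")
# Normalizing again changes nothing, so the extra call can be dropped.
assert strip_answer_string(solution) == solution
print(solution)  # x + 1
```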
        return dataset.iloc[start:end] if end > 0 else dataset.iloc[start:]


    def _temp_run(problem, generation, debug, result):
Placed outside to solve the same issue as in #89
recipes/sky-t1-preview/recipe.py
Outdated
    numina_ds_olympiads = numina_ds_olympiads.limit(num_samples)
    numina_ds_math = numina_ds_math.limit(num_samples)

    # 2. Get model responses for each of the datasets
to remove comment
    def convert_to_sharegpt_format(row: Dict[str, Any], prompt_column, response_column):
        prompt = row[prompt_column]
        # accept
to remove
Will move the actual example to a config file once other tests also make it in. Currently there's only one test so having all the context in one place is good
    backend: The backend to use for scoring. Supports "ray" or "mp" (str).
    """

    TIMEOUT = 6
Just wondering: how did you pick this value (6 here and 10 for apps)?
the timeout value is actually from the original source code for the datasets.
erictang000 left a comment
looks good to me!
    backend: The backend to use for scoring. Supports "ray" or "mp" (str).
    """

    TIMEOUT = 6
Also, should this be a class constant, or would it make sense for it to be a user-configurable parameter passed into the constructor?
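One way to reconcile the two options is to keep the class constant as a default and let the constructor override it. The following is a minimal sketch of that pattern; `ExampleScorer` and its `timeout` argument are hypothetical names for illustration and do not come from the PR.

```python
from typing import Optional


class ExampleScorer:
    """Hypothetical scorer; only the timeout handling is the point here."""

    # Class-level default, mirroring the constant in the diff snippet.
    TIMEOUT = 6

    def __init__(self, backend: str = "ray", timeout: Optional[float] = None):
        if backend not in ("ray", "mp"):
            raise ValueError(f"Unsupported backend: {backend!r}")
        self.backend = backend
        # Fall back to the class default unless the caller overrides it.
        self.timeout = self.TIMEOUT if timeout is None else timeout


default_scorer = ExampleScorer()
slow_scorer = ExampleScorer(backend="mp", timeout=10)
print(default_scorer.timeout, slow_scorer.timeout)  # 6 10
```

With this shape, the dataset-specific values mentioned in the thread stay as defaults, while callers with slower sandboxes can still raise the limit per instance.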
    @@ -0,0 +1,272 @@
    """
    This is the recipe for data curation for the Sky T1 Preview model .
Suggested change:
    - This is the recipe for data curation for the Sky T1 Preview model .
    + This is the recipe for data curation for the Sky T1 Preview model.
recipes/sky-t1-preview/recipe.py
Outdated
    config = vLLMEngineProcessorConfig(
        model="Qwen/QwQ-32B-Preview",
        # model="Qwen/Qwen2-0.5B-Instruct",
remove this?
Yes will do. recipe.py is still under construction.
What does this PR do?
WIP PR to introduce the new scoring APIs to be shared between evaluation + curation + training.
Also adds an example of using this with the new ray.data.llm APIs (docs.ray.io/en/master/data/working-with-llms.html), based on Sky-T1-32B-Preview data curation.
TODO: