- Fixes some broken links after #77. `skythought_evals` has been renamed to `evals`, and the package name is `skythought`.
- Added separate YAMLs for Numina for a better quickstart experience. Ideally, we shouldn't keep adding YAMLs for every training dataset in the evaluation library and should instead provide APIs for standalone scripts; for now, we do this to support reproduction of the Sky-T1 models.
Signed-off-by: SumanthRH <[email protected]>
**README.md** (+3 −3)

```diff
@@ -37,7 +37,7 @@
 We open source the code and scripts we used for data curation, training, and evaluation for Sky-T1-32B-Preview, you can find more details in each directory.

 - [`recipes`](./recipes/): Recipes - data curation steps and training strategies - for building our models `Sky-T1-32B-Flash`, `Sky-T1-32B-Preview` and `Sky-T1-7B` series.
-- [`skythought/skythought_evals`](./skythought/skythought_evals/): Our data generation and evaluation library.
+- [`skythought/evals`](./skythought/evals/): Our data generation and evaluation library.
 - [`skythought/train`](./skythought/train/): Training scripts for Sky-T1. We use [Llama-Factory](https://github.com/hiyouga/LLaMA-Factory) to perform training.
 - [`skythought/skythought-rl`](./skythought/skythought-rl/): RL training code for Sky-T1-7B and Sky-T1-mini.
@@ -79,7 +79,7 @@ We support a wide variety of datasets in mathematics, science and coding:
 - GSM8K
 - AIME'25

-For more details, please refer to our [evaluation guide](examples/evaluate.ipynb) and the [README](skythought/skythought_evals/README.md).
+For more details, please refer to our [evaluation guide](examples/evaluate.ipynb) and the [README](skythought/evals/README.md).

 ### Evaluation results
@@ -112,7 +112,7 @@ We also evaluate on non-reasoning benchmarks (these are benchmarks for instructi
-For more details, refer [here](./skythought/skythought_evals/base_instruct_evals.md).
+For more details, refer [here](./skythought/evals/base_instruct_evals.md).

 ## Fully Open-source: Driving Progress Together

 We believe that open-source collaboration drives progress, and with Sky-T1-32B-Preview, we are fully committed to empowering the community. We open-source all details (i.e., data, codes, model weights) to enable the community to replicate and improve on our results *easily*:
```
**examples/evaluate.ipynb** (+3 −3)

````diff
@@ -94,7 +94,7 @@
 "1. Quick Start\n",
 "\n",
 "```bash\n",
-"skythought_evals evaluate \\\n",
+"skythought evaluate \\\n",
 "--task aime24 \\\n",
 "--model NovaSky-AI/Sky-T1-32B-Preview \\\n",
 "--backend vllm \\\n",
@@ -124,7 +124,7 @@
 "### Key Concepts\n",
 "\n",
 "- Task: A task is an evaluation dataset. We use the `task` argument to retrieve the corresponding configuration file from our pre-configured benchmarks (To see the available tasks, use `skythought evaluate --help`) \n",
-"- Model: A Model consists of the model ID and templating configuration. This configuration optionally contains the system prompt and an assistant prefill message. We use the `model` argument to retrieve pre-configured templating parameters (system prompt, assistant prefill, etc) for the model, if available. You can find the list of pre-configured models [here](https://github.com/NovaSky-AI/SkyThought/blob/main/skythought/skythought_evals/models/model_configs.yaml). If a model is not available, then we use no system prompt (i.e we default to the system prompt in the chat template, if specified). You can use the `--system-prompt-name` flag to use one of the pre-configured system prompts in the library. To see the available system prompts, use `skythought evaluate --help`. You can also pass the full system prompt via CLI with the `--system-prompt` option. \n",
+"- Model: A Model consists of the model ID and templating configuration. This configuration optionally contains the system prompt and an assistant prefill message. We use the `model` argument to retrieve pre-configured templating parameters (system prompt, assistant prefill, etc) for the model, if available. You can find the list of pre-configured models [here](https://github.com/NovaSky-AI/SkyThought/blob/main/skythought/evals/models/model_configs.yaml). If a model is not available, then we use no system prompt (i.e we default to the system prompt in the chat template, if specified). You can use the `--system-prompt-name` flag to use one of the pre-configured system prompts in the library. To see the available system prompts, use `skythought evaluate --help`. You can also pass the full system prompt via CLI with the `--system-prompt` option. \n",
 "- Backend: The Backend is concerned with how the LLM instance is created and queried. We support a variety of backends via the `backend` argument. \n",
 " - The `openai` backend can be used to query OpenAI-compatible endpoints. Example: `--backend openai --backend-args base_url=https://api.openai.com`\n",
 " - The `vllm` backend instantiates a local model instance with [vLLM](docs.vllm.ai) for efficient inference. \n",
@@ -154,7 +154,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"For more details - such as which configurations are best for performance, how to perform multi-node inference, etc refer to the [README](../skythought/skythought_evals/README.md)"
+"For more details - such as which configurations are best for performance, how to perform multi-node inference, etc refer to the [README](../skythought/evals/README.md)"
````
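The `--backend-args` option in the notebook above takes comma-separated `key=value` pairs (e.g. `base_url=https://api.openai.com`). As a minimal sketch of how such a spec maps onto a dict, here is a hypothetical parser; it is illustrative only, not the actual skythought CLI implementation:

```python
# Hypothetical helper: parse comma-separated key=value backend args, as in
# `--backend-args base_url=https://api.openai.com`. Not the skythought CLI code.
def parse_backend_args(spec: str) -> dict:
    """Split "k1=v1,k2=v2" into {"k1": "v1", "k2": "v2"}."""
    if not spec:
        return {}
    # Split on the first "=" only, so values may themselves contain "=".
    return dict(item.split("=", 1) for item in spec.split(","))

print(parse_backend_args("base_url=https://api.openai.com"))
# → {'base_url': 'https://api.openai.com'}
```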
**recipes/sky-t1-7b/README.md** (+43 −8)

```diff
@@ -3,23 +3,58 @@ For a detailed training recipes and technical details, refer to the [blog](https
 ## SFT: Step 1 and Step 3 SFT
 ### Distillation Data Mixture
-Make sure you have installed the `skythought-evals` package as outlined in the [README.md](/README.md#usage). All the data curation commands are provided from the root directory of the repo.
+Make sure you have installed the `skythought` package as outlined in the [README.md](/README.md#usage). All the data curation commands are provided from the root directory of the repo.

 For the distillation performed in step 1 and step 3, we use the following script for data generation. Replace the `$MODEL_NAME` with the model to be distilled from.
+For each subset (`numina_math`, `numina_olympaids`, etc), the `score` command requires the output directory from the previous `generate` command.
 For step 1 and step 3 SFT, follow the instructions in `skythought/train`.

 ## RL: Step 2 and Step 4 RL
 For RL training, install our modified fork of [VeRL](https://github.com/volcengine/verl) under `skythought/skythought-rl` and follow the instructions there, we also incorporate the math and coding testing utils from the [PRIME](https://github.com/PRIME-RL/PRIME) repo.

 ## Evaluation
-For evaluation, we use the [script](https://github.com/QwenLM/Qwen2.5-Math/blob/main/evaluation/sh/eval.sh) from the [Qwen math eval suite](https://github.com/QwenLM/Qwen2.5-Math/tree/main/evaluation). We use vLLM version `0.6.2` for all evaluations. For AMC and AIME, we use temp=0.6, top_p=0.95 and n_sample=8. After the generation, we calculate the pass@1 using this [script](https://github.com/NovaSky-AI/SkyThought/tree/main/scripts/qwen_eval_bon.py). For MATH500 and OlympiadBench, we use greedy decoding. We use the [skythought system prompt](https://github.com/NovaSky-AI/SkyThought/blob/main/skythought/skythought_evals/models/model_configs.yaml) when evaluating all the models trained by us except for Sky-T1-mini which is evaluated [without a system prompt](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B#usage-recommendations).
+For evaluation, we use the [script](https://github.com/QwenLM/Qwen2.5-Math/blob/main/evaluation/sh/eval.sh) from the [Qwen math eval suite](https://github.com/QwenLM/Qwen2.5-Math/tree/main/evaluation). We use vLLM version `0.6.2` for all evaluations. For AMC and AIME, we use temp=0.6, top_p=0.95 and n_sample=8. After the generation, we calculate the pass@1 using this [script](https://github.com/NovaSky-AI/SkyThought/tree/main/scripts/qwen_eval_bon.py). For MATH500 and OlympiadBench, we use greedy decoding. We use the [skythought system prompt](https://github.com/NovaSky-AI/SkyThought/blob/main/skythought/evals/models/model_configs.yaml) when evaluating all the models trained by us except for Sky-T1-mini which is evaluated [without a system prompt](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B#usage-recommendations).
```
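The pass@1 numbers in the evaluation recipe above (n_sample=8 generations per problem for AMC/AIME) can be computed with the standard unbiased pass@k estimator. The sketch below is illustrative and is not the linked `qwen_eval_bon.py` script:

```python
from math import comb

# Unbiased pass@k estimator: probability that at least one of k samples
# drawn (without replacement) from n generations, c of which are correct,
# is correct. Illustrative sketch, not the repository's qwen_eval_bon.py.
def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        # Fewer than k incorrect samples: every size-k draw contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With n=8 samples and k=1, pass@1 reduces to the fraction of correct samples:
print(pass_at_k(8, 2, 1))  # → 0.25
```

Per-problem pass@1 values are then averaged over the benchmark to get the reported score.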
**recipes/sky-t1-flash/README.md** (+1 −1)

```diff
@@ -6,7 +6,7 @@ For a detailed breakdown of the duration curation steps and training methodolog
 ## Setup

-Make sure you have installed the `skythought-evals` package as outlined in the [README.md](/README.md#usage). All the data curation commands are provided from the root directory of the repo.
+Make sure you have installed the `skythought` package as outlined in the [README.md](/README.md#usage). All the data curation commands are provided from the root directory of the repo.
```
**recipes/sky-t1-preview/README.md** (+1 −1)

```diff
@@ -6,7 +6,7 @@ Give below are the instructions to replicate the data preprocessing and training
 ## Setup

-Make sure you have installed the `skythought-evals` package as outlined in the [README.md](/README.md#usage). All the data curation commands are provided from the root directory of the repo.
+Make sure you have installed the `skythought` package as outlined in the [README.md](/README.md#usage). All the data curation commands are provided from the root directory of the repo.

 Set the env variable `SKYT_HOME` as the directory for the final dataset.
```
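The `SKYT_HOME` setup mentioned in the sky-t1-preview recipe can be done as follows; the path below is an illustrative example, not one mandated by the repo:

```shell
# Point SKYT_HOME at the directory that should hold the final dataset
# (example path; choose any writable location).
export SKYT_HOME="$HOME/skyt_data"
mkdir -p "$SKYT_HOME"
```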