
Commit 5433333

Authored by DachengLi1, caoshiyi, cck0517, Xiuyu-Li, and Shangyint

Update Code for S* (#80)

* S* code
* update readme
* exclude check

Co-authored-by: Dacheng Li <[email protected]>
Co-authored-by: Shiyi Cao <[email protected]>
Co-authored-by: Chengkun Cao <[email protected]>
Co-authored-by: Xiuyu Li <[email protected]>
Co-authored-by: Shangyin Tan <[email protected]>
Co-authored-by: caoshiyi <[email protected]>

1 parent 4c2085a · commit 5433333

File tree

163 files changed: +19744 −2 lines

.pre-commit-config.yaml

Lines changed: 2 additions & 2 deletions

```diff
@@ -5,12 +5,12 @@ repos:
       - id: ruff
         args: [ --fix, --exit-non-zero-on-fix ]
         # NOTE (sumanthrh): Many of the files excluded here are used for validating code generation, and linters do not recognize some of the logic in these files. skythought/train is excluded for now because it's a fork of Llamafactory
-        exclude: (^skythought/train/.*|^skythought/skythought-rl/.*|tasks/taco/pyext2\.py|tasks/taco/taco_util\.py|tasks/apps/apps_util\.py|scripts/prompts\.py)$
+        exclude: (^skythought/train/.*|^skythought/skythought-rl/.*|tasks/taco/pyext2\.py|tasks/taco/taco_util\.py|tasks/apps/apps_util\.py|scripts/prompts\.py|skythought/test-time-scaling/.*)$

   # Black needs to be ran after ruff with --fix
   - repo: https://github.com/psf/black
     rev: 24.10.0
     hooks:
       - id: black
-        exclude: (^skythought/train/.*|^skythought/skythought-rl/.*|tasks/taco/pyext2\.py)$
+        exclude: (^skythought/train/.*|^skythought/skythought-rl/.*|tasks/taco/pyext2\.py|skythought/test-time-scaling/.*)$
```
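As a quick sanity check (not part of the commit), the updated `exclude` pattern can be exercised with Python's `re` module; pre-commit matches `exclude` regexes against file paths with `re.search`. The helper below is illustrative, not from the repo:

```python
import re

# The updated ruff exclude pattern from .pre-commit-config.yaml.
EXCLUDE = (
    r"(^skythought/train/.*"
    r"|^skythought/skythought-rl/.*"
    r"|tasks/taco/pyext2\.py"
    r"|tasks/taco/taco_util\.py"
    r"|tasks/apps/apps_util\.py"
    r"|scripts/prompts\.py"
    r"|skythought/test-time-scaling/.*)$"
)

def is_excluded(path: str) -> bool:
    # pre-commit applies exclude patterns with re.search against the path.
    return re.search(EXCLUDE, path) is not None

print(is_excluded("skythought/test-time-scaling/evaluate_multiprocess.py"))  # True
print(is_excluded("skythought/train/train.py"))                              # True
print(is_excluded("skythought/evals/some_task.py"))                          # False
```

Any file under the new `skythought/test-time-scaling/` directory is now skipped by both ruff and black.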

README.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -20,6 +20,7 @@

 # News
+- **[2025/02/21]** 🎉 We released S*: Test time scaling for code generation ([paper](https://arxiv.org/pdf/2502.14382), [code](https://github.com/NovaSky-AI/SkyThought/tree/main/skythought/test-time-scaling)), a simple and extensible test time scaling framework for code generation.
 - **[2025/02/11]** 🎉 We released Sky-T1-7B ([model](https://huggingface.co/NovaSky-AI/Sky-T1-7B)) and Sky-T1-mini ([model](https://huggingface.co/NovaSky-AI/Sky-T1-mini)) to demonstrate the potential of RL in further enhancing model's capability beyond distillation.
 - **[2025/01/23]** ⚡️ We released Sky-T1-32B-Flash ([model](https://huggingface.co/NovaSky-AI/Sky-T1-32B-Flash), [data](https://huggingface.co/datasets/NovaSky-AI/Sky-T1_preference_data_10k)) to tackle overthinking and reduce reasoning sequence lengths while maintaining accuracy.
 - **[2025/01/19]** 🎉 [Chat demo](http://164.152.23.196:3000/) for Sky-T1-32B-Preview is alive! Please check it out!
```
New file: README for `skythought/test-time-scaling`

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
1+
# S*: Test Time Scaling for Code Generation ####
2+
This folder provides the code for the paper "S*: Test Time Scaling for Code Generation".
3+
4+
![Overview of S* approach](assets/figure1.png)
5+
6+
## Installation (Main packages)
7+
```dspy=2.6.2, torch, vllm```
8+
9+
## Usage
10+
The scripts to reproduce the results in the paper are in the `scripts` folder.
11+
- baselines are in `baselines`, `baselines_selfdebug`, `majority_baselines`.
12+
- experiments on dev set are in: `sec[4,5,6]`.
13+
- experiments on final test set are in: `final_[]`. First run commands under `final_oracle` to produce all generations without different selection methods, then run commands under `final_[]_cached` to produce generations with different selection methods.
14+
15+
Results are availeble in google cloud storage ([Link](https://drive.google.com/drive/u/1/folders/1kmCoJ7Mkvj-umpkfsA5960hYpNrgH4X4)).
16+
17+
Simple run commands to produce generations with oracle selection and 3 rounds of generation for gpt-4o-mini.
18+
19+
Set OPENAI_API_KEY in your environment variable with `export OPENAI_API_KEY=xxx`.
20+
21+
```
22+
python evaluate_multiprocess.py \
23+
--difficulty=easy \
24+
--temperature=0.7 \
25+
--num_threads=32 \
26+
--n=16 \
27+
--selection oracle_all_rounds \
28+
--lcb_version release_v2 \
29+
--num_round 3 \
30+
--result_json_path="results/final_4omini_n_16_debug_public3_select_oracle_easy.json"
31+
```
32+
33+
To run experiments with local serve models, use ```vllm serve model_name``` to serve the model first.
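For example, serving might look like the following (the model name and port here are placeholders, not from this repo):

```shell
# Launch an OpenAI-compatible vLLM server; model name and port are illustrative.
vllm serve Qwen/Qwen2.5-Coder-7B-Instruct --port 8000
```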
#### Citation

```
@article{li2025sstar,
  title={S*: Test Time Scaling for Code Generation},
  author={Li, Dacheng and Cao, Shiyi and Cao, Chengkun and Li, Xiuyu and Tan, Shangyin and Keutzer, Kurt and Xing, Jiarong and Gonzalez, Joseph E. and Stoica, Ion},
  year={2025}
}
```
assets/figure1.png — 89.6 KB (binary image added)
