
Commit fdd013c

Update README.md
1 parent 86767c7 commit fdd013c

File tree: 1 file changed (+37, -5 lines)


README.md (+37, -5)
````diff
@@ -152,6 +152,14 @@ assert torch.testing.assert_close(y, model(x))
 
 ### Speed up LLM training
 
+Install LitGPT (without updating other dependencies):
+
+```
+pip install --no-deps 'litgpt[all]'
+```
+
+and run:
+
 ```python
 import thunder
 import torch
@@ -170,6 +178,14 @@ out.sum().backward()
 
 ### Speed up HuggingFace BERT inference
 
+Install Hugging Face Transformers (version `4.50.2` or newer is recommended):
+
+```
+pip install -U transformers
+```
+
+and run:
+
 ```python
 import thunder
 import torch
@@ -188,14 +204,22 @@ with torch.device("cuda"):
 
 inp = tokenizer(["Hello world!"], return_tensors="pt")
 
-thunder_model = thunder.compile(model, plugins="reduce-overhead")
+thunder_model = thunder.compile(model)
 
 out = thunder_model(**inp)
 print(out)
 ```
 
 ### Speed up HuggingFace DeepSeek R1 distill inference
 
+Install Hugging Face Transformers (version `4.50.2` or newer is recommended):
+
+```
+pip install -U transformers
+```
+
+and run:
+
 ```python
 import torch
 import transformers
@@ -214,9 +238,7 @@ with torch.device("cuda"):
 
 inp = tokenizer(["Hello world! Here's a long story"], return_tensors="pt")
 
-thunder_model = thunder.compile(
-    model, recipe="hf-transformers", plugins="reduce-overhead"
-)
+thunder_model = thunder.compile(model)
 
 out = thunder_model.generate(
     **inp, do_sample=False, cache_implementation="static", max_new_tokens=100
@@ -240,7 +262,7 @@ with torch.device("cuda"):
 
 out = model(inp)
 
-thunder_model = thunder.compile(model, plugins="reduce-overhead")
+thunder_model = thunder.compile(model)
 
 out = thunder_model(inp)
 ```
@@ -257,6 +279,16 @@ Thunder comes with a few plugins included out of the box, but it's easy to write new
 - reduce latency with CUDAGraphs
 - debugging and profiling
 
+For example, to reduce CPU overheads via CUDAGraphs you can add "reduce-overhead"
+to the `plugins=` argument of `thunder.compile`:
+
+```python
+thunder_model = thunder.compile(model, plugins="reduce-overhead")
+```
+
+This may or may not make a big difference. The point of Thunder is that you can easily
+swap optimizations in and out and explore the best combination for your setup.
+
 ## How it works
 
 Thunder works in three stages:
````