Commit 690854a

Release v6.2.4 with Lemonade relocation notice (#325)

Signed-off-by: Jeremy Fowers <[email protected]>
1 parent 8b577ec · commit 690854a


45 files changed: +326 additions, −2856 deletions

README.md

Lines changed: 11 additions & 14 deletions
@@ -7,25 +7,22 @@
 
 We are on a mission to make it easy to use the most important tools in the ONNX ecosystem. TurnkeyML accomplishes this by providing a full SDK for LLMs with the Lemonade SDK, as well as no-code CLIs for general ONNX workflows with `turnkey`.
 
-## 🍋 Lemonade SDK: Quickly serve, benchmark, and deploy LLMs
+## 🚧 🍋 The Lemonade SDK project has moved to: https://github.com/lemonade-sdk/lemonade 🚧
 
-The [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) is designed to make it easy to serve, benchmark, and deploy large language models (LLMs) on a variety of hardware platforms, including CPU, GPU, and NPU.
+The new PyPI package name is `lemonade-sdk`.
 
-<div align="center">
-  <img src="https://download.amd.com/images/lemonade_640x480_1.gif" alt="Lemonade Demo" title="Lemonade in Action">
-</div>
+The new Lemonade_Server_Installer.exe is at: https://github.com/lemonade-sdk/lemonade/releases
 
-The [Lemonade SDK](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md) comprises the following:
+Please migrate to the new repository and package as soon as possible.
 
-- 🌐 **Lemonade Server**: A server interface that uses the standard OpenAI API, allowing applications to integrate with local LLMs.
-- 🐍 **Lemonade Python API**: Offers a high-level API for easy integration of Lemonade LLMs into Python applications and a low-level API for custom experiments.
-- 🖥️ **Lemonade CLI**: The `lemonade` CLI lets you mix and match LLMs, frameworks (PyTorch, ONNX, GGUF), and measurement tools to run experiments. The available tools are:
-  - Prompting an LLM.
-  - Measuring the accuracy of an LLM using a variety of tests.
-  - Benchmarking an LLM to get the time-to-first-token and tokens per second.
-  - Profiling the memory usage of an LLM.
+For example:
+```
+pip uninstall turnkeyml
+pip install lemonade-sdk[YOUR_EXTRAS]
+# e.g., pip install lemonade-sdk[llm-oga-hybrid]
+```
 
-### [Click here to get started with Lemonade.](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/README.md)
+Thank you for using Lemonade SDK!
 
 ## 🔑 Turnkey: A Complementary Tool for ONNX Workflows
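The relocation notice above amounts to a package swap on PyPI. As an aside, here is a minimal, hypothetical sketch (not part of the commit; only the package names come from the notice) for checking whether the deprecated package is still installed in an environment, using just the standard library:

```python
# Check whether the deprecated `turnkeyml` package is still installed.
# The package names come from the relocation notice; this helper itself
# is illustrative and not part of the commit.
from importlib import metadata


def needs_migration(old_package: str = "turnkeyml") -> bool:
    """Return True if the deprecated package is still installed."""
    try:
        metadata.version(old_package)
        return True
    except metadata.PackageNotFoundError:
        return False


if needs_migration():
    print("Found turnkeyml; run `pip uninstall turnkeyml` and install lemonade-sdk.")
else:
    print("No turnkeyml install detected.")
```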

docs/contribute.md

Lines changed: 0 additions & 12 deletions
@@ -17,18 +17,6 @@ The guidelines document is organized as the following sections:
 - [PyPI Release Process](#pypi-release-process)
 - [Public APIs](#public-apis)
 
-## 🍋 Contributing a Lemonade Server Demo
-
-Lemonade Server Demos aim to be reproducible in under 10 minutes, require no code changes to the app you're integrating, and use an app supporting the OpenAI API with a configurable base URL.
-
-Please see the [AI Toolkit ReadMe](https://github.com/onnx/turnkeyml/blob/main/examples/lemonade/server/ai-toolkit.md) for an example Markdown contribution.
-
-To submit your example, open a pull request in the TurnkeyML GitHub repo with the following:
-  - Add your .md file in the [Server Examples](https://github.com/onnx/turnkeyml/tree/main/examples/lemonade/server) folder.
-  - Assign your PR to the maintainers.
-
-We're excited to see what you build! If you're unsure about your idea or need help unblocking an integration, feel free to reach out via GitHub Issues or [email](mailto:[email protected]).
 
 ## Contributing a model
 
 One of the easiest ways to contribute is to add a model to the benchmark. To do so, simply add a `.py` file to the `models/` directory that instantiates and calls a PyTorch model. The automation in `discover` will make the PyTorch model available to the rest of the `Tools`!
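The retained "Contributing a model" section describes adding a `.py` file under `models/` that instantiates and calls a PyTorch model so `discover` can pick it up. A sketch of what such a file might look like; the class name and layer sizes are made up for illustration and do not come from the repo:

```python
# Illustrative benchmark model file of the kind docs/contribute.md
# describes: instantiate a PyTorch model and call it once. The class
# name and dimensions here are invented for the example.
import torch


class TinyMLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(8, 2)

    def forward(self, x):
        return self.fc(x)


model = TinyMLP()
inputs = torch.randn(1, 8)
outputs = model(inputs)
print(outputs.shape)
```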

docs/lemonade/README.md

Lines changed: 11 additions & 254 deletions
@@ -1,265 +1,22 @@
 # 🍋 Lemonade SDK
 
-*The long-term objective of the Lemonade SDK is to provide the ONNX ecosystem with the same kind of tools available in the GGUF ecosystem.*
+### RELOCATION NOTICE
 
-Lemonade SDK is built on top of [OnnxRuntime GenAI (OGA)](https://github.com/microsoft/onnxruntime-genai), an ONNX LLM inference engine developed by Microsoft to improve the LLM experience on AI PCs, especially those with accelerator hardware such as Neural Processing Units (NPUs).
+This content has moved to: https://github.com/lemonade-sdk/lemonade/blob/main/docs/README.md
 
-The Lemonade SDK provides everything needed to get up and running quickly with LLMs on OGA:
+The Lemonade SDK project has moved to https://github.com/lemonade-sdk/lemonade
 
-| **Feature** | **Description** |
-|-------------|-----------------|
-| **🌐 Local LLM server with OpenAI API compatibility (Lemonade Server)** | Replace cloud-based LLMs with private and free LLMs that run locally on your own PC's NPU and GPU. |
-| **🖥️ CLI with tools for prompting, benchmarking, and accuracy tests** | Enables convenient interoperability between models, frameworks, devices, accuracy tests, and deployment options. |
-| **🐍 Python API based on `from_pretrained()`** | Provides easy integration with Python applications for loading and using LLMs. |
+The new PyPI package name is `lemonade-sdk`.
 
+The new Lemonade_Server_Installer.exe is at: https://github.com/lemonade-sdk/lemonade/releases
 
-## Table of Contents
+Please migrate to the new repository and package as soon as possible.
 
-- [Installation](#installation)
-  - [Installing Lemonade Server via Executable](#installing-from-lemonade_server_installerexe)
-  - [Installing Lemonade SDK From PyPI](#installing-from-pypi)
-  - [Installing Lemonade SDK From Source](#installing-from-source)
-- [CLI Commands](#cli-commands)
-  - [Prompting](#prompting)
-  - [Accuracy](#accuracy)
-  - [Benchmarking](#benchmarking)
-  - [LLM Report](#llm-report)
-  - [Memory Usage](#memory-usage)
-  - [Serving](#serving)
-- [API](#api)
-  - [High-Level APIs](#high-level-apis)
-  - [Low-Level API](#low-level-api)
-- [Contributing](#contributing)
-
-# Installation
-
-There are three ways to install the Lemonade SDK:
-
-1. Use the [Lemonade Server Installer](#installing-from-lemonade_server_installerexe). This provides a no-code way to run LLMs locally and integrate with OpenAI-compatible applications.
-1. Use [PyPI installation](#installing-from-pypi) by installing the `turnkeyml` package with the appropriate extras for your backend. This installs the full set of Turnkey and Lemonade SDK tools, including Lemonade Server, API, and CLI commands.
-1. Use [source installation](#installing-from-source) if you plan to contribute or customize the Lemonade SDK.
-
-## Installing From Lemonade_Server_Installer.exe
-
-The Lemonade Server is available as a standalone tool with a one-click Windows installer `.exe`. Check out the [Lemonade_Server_Installer.exe guide](lemonade_server_exe.md) for installation instructions and the [server spec](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_spec.md) to learn more about the functionality.
-
-The Lemonade Server [examples folder](https://github.com/onnx/turnkeyml/tree/main/examples/lemonade/server) has guides for how to use Lemonade Server with a collection of applications that we have tested.
-
-## Installing From PyPI
-
-To install the Lemonade SDK from PyPI:
-
-1. Create and activate a [miniconda](https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe) environment.
-   ```bash
-   conda create -n lemon python=3.10
-   ```
-   ```bash
-   conda activate lemon
-   ```
-2. Install Lemonade for your backend of choice:
-   - [OnnxRuntime GenAI with CPU backend](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/ort_genai_igpu.md):
-     ```bash
-     pip install turnkeyml[llm-oga-cpu]
-     ```
-   - [OnnxRuntime GenAI with Integrated GPU (iGPU, DirectML) backend](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/ort_genai_igpu.md):
-     > Note: Requires Windows and a DirectML-compatible iGPU.
-     ```bash
-     pip install turnkeyml[llm-oga-igpu]
-     ```
-   - OnnxRuntime GenAI with Ryzen AI Hybrid (NPU + iGPU) backend:
-     > Note: Ryzen AI Hybrid requires a Windows 11 PC with an AMD Ryzen™ AI 300-series processor.
-     - Follow the environment setup instructions [here](https://ryzenai.docs.amd.com/en/latest/llm/high_level_python.html)
-   - Hugging Face (PyTorch) LLMs for CPU backend:
-     ```bash
-     pip install turnkeyml[llm]
-     ```
-   - llama.cpp: see [instructions](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/llamacpp.md).
-3. Use `lemonade -h` to explore the LLM tools, and see the [command](#cli-commands) and [API](#api) examples below.
-
-## Installing From Source
-
-The Lemonade SDK can be installed from source code by cloning this repository and following the instructions [here](source_installation_inst.md).
-
-# CLI Commands
-
-The `lemonade` CLI uses a unique command syntax that enables convenient interoperability between models, frameworks, devices, accuracy tests, and deployment options.
-
-Each unit of functionality (e.g., loading a model, running a test, deploying a server) is called a `Tool`, and a single call to `lemonade` can invoke any number of `Tools`. Each `Tool` performs its functionality, then passes its state to the next `Tool` in the command.
-
-You can read each command out loud to understand what it is doing. For example, a command like this:
-
-```bash
-lemonade -i microsoft/Phi-3-mini-4k-instruct oga-load --device igpu --dtype int4 llm-prompt -p "Hello, my thoughts are"
-```
-
-Can be read like this:
-
-> Run `lemonade` on the input (`-i`) checkpoint `microsoft/Phi-3-mini-4k-instruct`. First, load it in the OnnxRuntime GenAI framework (`oga-load`), onto the integrated GPU device (`--device igpu`) in the int4 data type (`--dtype int4`). Then, pass the OGA model to the prompting tool (`llm-prompt`) with the prompt (`-p`) "Hello, my thoughts are" and print the response.
-
-The `lemonade -h` command will show you which options and Tools are available, and `lemonade TOOL -h` will tell you more about that specific Tool.
-
-## Prompting
-
-To prompt your LLM, try one of the following:
-
-OGA iGPU:
-```bash
-lemonade -i microsoft/Phi-3-mini-4k-instruct oga-load --device igpu --dtype int4 llm-prompt -p "Hello, my thoughts are" -t
-```
-
-Hugging Face:
-```bash
-lemonade -i facebook/opt-125m huggingface-load llm-prompt -p "Hello, my thoughts are" -t
-```
-
-The LLM will run with your provided prompt, and its response will be printed to the screen. You can replace `"Hello, my thoughts are"` with any prompt you like.
-
-You can also replace `facebook/opt-125m` with any Hugging Face checkpoint you like, including LLaMA-2, Phi-2, Qwen, Mamba, etc.
-
-You can also set the `--device` argument in `oga-load` and `huggingface-load` to load your LLM on a different device.
-
-The `-t` (or `--template`) flag instructs lemonade to insert the prompt string into the model's chat template. This typically results in the model returning a higher-quality response.
-
-Run `lemonade huggingface-load -h` and `lemonade llm-prompt -h` to learn more about these tools.
-
-## Accuracy
-
-To measure the accuracy of an LLM using MMLU (Measuring Massive Multitask Language Understanding), try the following:
-
-OGA iGPU:
-```bash
-lemonade -i microsoft/Phi-3-mini-4k-instruct oga-load --device igpu --dtype int4 accuracy-mmlu --tests management
-```
-
-Hugging Face:
-```bash
-lemonade -i facebook/opt-125m huggingface-load accuracy-mmlu --tests management
-```
-
-This command runs just the management test from MMLU on your LLM and saves the score to the lemonade cache at `~/.cache/lemonade`. You can also run other subject tests by replacing `management` with the new test subject name. For the full list of supported subjects, see the [MMLU Accuracy Read Me](mmlu_accuracy.md).
-
-You can run the full suite of MMLU subjects by omitting the `--tests` argument. You can learn more about this with `lemonade accuracy-mmlu -h`.
-
-## Benchmarking
-
-To measure the time-to-first-token and tokens/second of an LLM, try the following:
-
-OGA iGPU:
-```bash
-lemonade -i microsoft/Phi-3-mini-4k-instruct oga-load --device igpu --dtype int4 oga-bench
-```
-
-Hugging Face:
-```bash
-lemonade -i facebook/opt-125m huggingface-load huggingface-bench
-```
-
-This command runs a few warm-up iterations, then a few generation iterations during which performance data is collected.
-
-The prompt size, number of output tokens, and number of iterations are all parameters. Learn more by running `lemonade oga-bench -h` or `lemonade huggingface-bench -h`.
-
-## LLM Report
-
-To see a report that contains all the benchmarking results and all the accuracy results, use the `report` tool with the `--perf` flag:
-
-`lemonade report --perf`
-
-The results can be filtered by model name, device type, and data type. See how by running `lemonade report -h`.
-
-## Memory Usage
-
-The peak memory used by the `lemonade` build is captured in the build output. To capture more granular memory usage information, use the `--memory` flag. For example:
-
-OGA iGPU:
-```bash
-lemonade --memory -i microsoft/Phi-3-mini-4k-instruct oga-load --device igpu --dtype int4 oga-bench
+For example:
 ```
-
-Hugging Face:
-```bash
-lemonade --memory -i facebook/opt-125m huggingface-load huggingface-bench
-```
-
-In this case a `memory_usage.png` file will be generated and stored in the build folder. This file contains a figure plotting the memory usage over the build time. Learn more by running `lemonade -h`.
-
-## Serving
-
-You can launch an OpenAI-compatible server with:
-
-```bash
-lemonade serve
+pip uninstall turnkeyml
+pip install lemonade-sdk[YOUR_EXTRAS]
+# e.g., pip install lemonade-sdk[llm-oga-hybrid]
 ```
 
-Visit the [server spec](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_spec.md) to learn more about the endpoints provided as well as how to launch the server with more detailed informational messages enabled.
-
-See the Lemonade Server [examples folder](https://github.com/onnx/turnkeyml/tree/main/examples/lemonade/server) for a collection of applications that we have tested with Lemonade Server.
-
-# API
-
-Lemonade is also available via API.
-
-## High-Level APIs
-
-The high-level Lemonade API abstracts loading models from any supported framework (e.g., Hugging Face, OGA) and backend (e.g., CPU, iGPU, Hybrid) using the popular `from_pretrained()` function. This makes it easy to integrate Lemonade LLMs into Python applications. For more information on recipes and compatibility, see the [Lemonade API ReadMe](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/lemonade_api.md).
-
-OGA iGPU:
-```python
-from lemonade.api import from_pretrained
-
-model, tokenizer = from_pretrained("Qwen/Qwen2.5-0.5B-Instruct", recipe="oga-igpu")
-
-input_ids = tokenizer("This is my prompt", return_tensors="pt").input_ids
-response = model.generate(input_ids, max_new_tokens=30)
-
-print(tokenizer.decode(response[0]))
-```
-
-You can learn more about the high-level APIs [here](https://github.com/onnx/turnkeyml/tree/main/examples/lemonade).
-
-## Low-Level API
-
-The low-level API is useful for designing custom experiments, for example, sweeping over specific checkpoints, devices, and/or tools.
-
-Here's a quick example of how to prompt a Hugging Face LLM using the low-level API, which calls the load and prompt tools one by one:
-
-```python
-import lemonade.tools.torch_llm as tl
-import lemonade.tools.prompt as pt
-from turnkeyml.state import State
-
-state = State(cache_dir="cache", build_name="test")
-
-state = tl.HuggingfaceLoad().run(state, input="facebook/opt-125m")
-state = pt.Prompt().run(state, prompt="hi", max_new_tokens=15)
-
-print("Response:", state.response)
-```
-
-# Contributing
-
-Contributions are welcome! If you decide to contribute, please:
-
-- Do so via a pull request.
-- Write your code in keeping with the same style as the rest of this repo's code.
-- Add a test under `test/lemonade` that provides coverage of your new feature.
-
-The best way to contribute is to add new tools to cover more devices and usage scenarios.
-
-To add a new tool:
-
-1. (Optional) Create a new `.py` file under `src/lemonade/tools` (or use an existing file if your tool fits into a pre-existing family of tools).
-1. Define a new class that inherits the `Tool` class from `TurnkeyML`.
-1. Register the class by adding it to the list of `tools` near the top of `src/lemonade/cli.py`.
-
-You can learn more about contributing on the repository's [contribution guide](https://github.com/onnx/turnkeyml/blob/main/docs/contribute.md).
+Thank you for using Lemonade SDK!
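The removed README described launching an OpenAI-compatible server with `lemonade serve` (now documented in the new repository). For readers migrating an integration, here is a hedged, standard-library-only sketch of how a client might build a request for any such server; the base URL, port, and model name below are assumptions, not values from this commit:

```python
# Build (but do not send) a request for an OpenAI-style chat endpoint,
# such as the one the old README's `lemonade serve` exposed. The server
# address and model name are hypothetical placeholders.
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-style /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local server address and model name:
req = build_chat_request("http://localhost:8000", "Phi-3-mini-4k-instruct", "Hello")
print(req.full_url)
```

Sending the request (e.g., with `urllib.request.urlopen(req)`) would only work against a running server, so this sketch stops at constructing it.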
