Example fixes for LLMs
Oscilloscope98 committed Jun 18, 2024
1 parent 535c38d commit f7ca2cf
Showing 2 changed files with 6 additions and 8 deletions.
The first changed file is the examples README. The `-d` argument of `basic.py` is documented to also accept an explicit GPU index:

````diff
@@ -27,7 +27,7 @@ pip install llama-index-llms-ipex-llm[xpu] --extra-index-url https://pytorch-ext
 The example [basic.py](./basic.py) shows how to run `IpexLLM` on Intel CPU or GPU and conduct tasks such as text completion. Run the example as following:
 
 ```bash
-python basic.py -m <path_to_model> -d <cpu_or_xpu> -q <query_to_LLM>
+python basic.py -m <path_to_model> -d <cpu_or_xpu_or_xpu:device_id> -q <query_to_LLM>
 ```
 
 > Please note that in this example we'll use [HuggingFaceH4/zephyr-7b-alpha](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha) model for demonstration. It requires updating `transformers` and `tokenizers` packages.
````
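The widened `-d` argument now takes `cpu`, `xpu`, or `xpu:<device_id>` (e.g. `xpu:0` for the first Intel GPU). As a minimal illustration, not code from the repository, such a device string could be split into a device type and an optional index like this:

```python
def parse_device(device: str):
    """Return (device_type, device_id); device_id is None when unspecified.

    Hypothetical helper for illustration only: IPEX-LLM resolves these
    strings internally, so the example scripts do not need it.
    """
    if device in ("cpu", "xpu"):
        return device, None
    if device.startswith("xpu:"):
        idx = device.split(":", 1)[1]
        if not idx.isdigit():
            raise ValueError(f"invalid XPU device id: {idx!r}")
        return "xpu", int(idx)
    raise ValueError(f"unknown device: {device!r}")

print(parse_device("xpu:1"))  # → ('xpu', 1)
```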
The same change is applied to the `low_bit.py` instructions:

````diff
@@ -41,7 +41,7 @@ python basic.py -m <path_to_model> -d <cpu_or_xpu> -q <query_to_LLM>
 The example [low_bit.py](./low_bit.py) shows how to save and load low_bit model by `IpexLLM` on Intel CPU or GPU and conduct tasks such as text completion. Run the example as following:
 ```bash
-python low_bit.py -m <path_to_model> -d <cpu_or_xpu> -q <query_to_LLM> -s <save_low_bit_dir>
+python low_bit.py -m <path_to_model> -d <cpu_or_xpu_or_xpu:device_id> -q <query_to_LLM> -s <save_low_bit_dir>
 ```
 > Please note that in this example we'll use [HuggingFaceH4/zephyr-7b-alpha](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha) model for demonstration. It requires updating `transformers` and `tokenizers` packages.
````
In the "More Data Types" section, the sentence about the `device_map` option is dropped (the device is now selected via `-d`), and the command line is updated accordingly:

````diff
@@ -52,12 +52,10 @@ python low_bit.py -m <path_to_model> -d <cpu_or_xpu> -q <query_to_LLM> -s <save_
 ### More Data Types Example
-By default, `IpexLLM` loads the model in int4 format. To load a model in different data formats like `sym_int5`, `sym_int8`, etc., you can use the `load_in_low_bit` option in `IpexLLM`. To load a model on different device like `cpu` or `xpu`, you can use the `device_map` option in `IpexLLM`.
-The example [more_data_type.py](./more_data_type.py) shows how to use the `load_in_low_bit` option and `device_map` option. Run the example as following:
+By default, `IpexLLM` loads the model in int4 format. To load a model in different data formats like `sym_int5`, `sym_int8`, etc., you can use the `load_in_low_bit` option in `IpexLLM`.
 ```bash
-python more_data_type.py -m <path_to_model> -t <path_to_tokenizer> -l <low_bit_format> -d <device> -q <query_to_LLM>
+python more_data_type.py -m <path_to_model> -t <path_to_tokenizer> -l <low_bit_format> -d <cpu_or_xpu_or_xpu:device_id> -q <query_to_LLM>
 ```
 > Note: If you're using [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) model in this example, it is recommended to use transformers version
````
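For the `-l <low_bit_format>` option, a hedged sketch of validating the requested format before loading. This helper is not from the repository, and the format names below are only those mentioned in the README text; the exact set accepted by IPEX-LLM may differ:

```python
# Assumed list of low-bit format names, based on the README's examples
# (sym_int4 is the default int4 scheme); verify against the IPEX-LLM docs.
SUPPORTED_LOW_BIT = {"sym_int4", "sym_int5", "sym_int8"}

def check_low_bit(fmt: str) -> str:
    """Raise early on an unsupported -l value instead of failing at load time."""
    if fmt not in SUPPORTED_LOW_BIT:
        raise ValueError(
            f"unsupported low-bit format: {fmt!r}; "
            f"choose one of {sorted(SUPPORTED_LOW_BIT)}"
        )
    return fmt

print(check_low_bit("sym_int5"))  # → sym_int5
```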
The second changed file is one of the example scripts, where the `--device` argument's default switches from `xpu` to `cpu` and the help text is expanded:

```diff
@@ -51,8 +51,8 @@ def completion_to_prompt(completion):
     "--device",
     "-d",
     type=str,
-    default="xpu",
-    help="The device the model will run on.",
+    default="cpu",
+    help="The device (Intel CPU or Intel GPU) the LLM model runs on",
 )
 parser.add_argument(
     "--query",
```
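The changed argument can be reproduced as a standalone, runnable sketch showing the new behavior: `--device` falls back to `cpu` when omitted and accepts `xpu:<device_id>` values:

```python
import argparse

# Sketch mirroring the argument definition as updated by this commit.
parser = argparse.ArgumentParser(description="IpexLLM example (sketch)")
parser.add_argument(
    "--device",
    "-d",
    type=str,
    default="cpu",
    help="The device (Intel CPU or Intel GPU) the LLM model runs on",
)

print(parser.parse_args([]).device)               # → cpu
print(parser.parse_args(["-d", "xpu:0"]).device)  # → xpu:0
```

Defaulting to `cpu` makes the examples runnable out of the box on machines without an Intel GPU, which fits the commit's stated goal of fixing the examples.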
