Example fixes for LLMs
Oscilloscope98 committed Jun 18, 2024
1 parent 535c38d commit f7ca2cf
Showing 2 changed files with 6 additions and 8 deletions.
The first changed file is the examples README. The `-d` argument of `basic.py` is documented to also accept an explicit GPU index:

````diff
@@ -27,7 +27,7 @@ pip install llama-index-llms-ipex-llm[xpu] --extra-index-url https://pytorch-ext
 The example [basic.py](./basic.py) shows how to run `IpexLLM` on Intel CPU or GPU and conduct tasks such as text completion. Run the example as following:
 
 ```bash
-python basic.py -m <path_to_model> -d <cpu_or_xpu> -q <query_to_LLM>
+python basic.py -m <path_to_model> -d <cpu_or_xpu_or_xpu:device_id> -q <query_to_LLM>
 ```
 
 > Please note that in this example we'll use [HuggingFaceH4/zephyr-7b-alpha](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha) model for demonstration. It requires updating `transformers` and `tokenizers` packages.
````
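The widened `-d` argument now takes `cpu`, `xpu`, or `xpu:<device_id>` (e.g. `xpu:0` for the first Intel GPU). As a minimal illustration, not code from the repository, such a device string could be split into a device type and an optional index like this:

```python
def parse_device(device: str):
    """Return (device_type, device_id); device_id is None when unspecified.

    Hypothetical helper for illustration only: IPEX-LLM resolves these
    strings internally, so the example scripts do not need it.
    """
    if device in ("cpu", "xpu"):
        return device, None
    if device.startswith("xpu:"):
        idx = device.split(":", 1)[1]
        if not idx.isdigit():
            raise ValueError(f"invalid XPU device id: {idx!r}")
        return "xpu", int(idx)
    raise ValueError(f"unknown device: {device!r}")

print(parse_device("xpu:1"))  # → ('xpu', 1)
```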
The same change is applied to the `low_bit.py` instructions:

````diff
@@ -41,7 +41,7 @@ python basic.py -m <path_to_model> -d <cpu_or_xpu> -q <query_to_LLM>
 The example [low_bit.py](./low_bit.py) shows how to save and load low_bit model by `IpexLLM` on Intel CPU or GPU and conduct tasks such as text completion. Run the example as following:
 ```bash
-python low_bit.py -m <path_to_model> -d <cpu_or_xpu> -q <query_to_LLM> -s <save_low_bit_dir>
+python low_bit.py -m <path_to_model> -d <cpu_or_xpu_or_xpu:device_id> -q <query_to_LLM> -s <save_low_bit_dir>
 ```
 > Please note that in this example we'll use [HuggingFaceH4/zephyr-7b-alpha](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha) model for demonstration. It requires updating `transformers` and `tokenizers` packages.
````
In the "More Data Types" section, the sentence about the `device_map` option is dropped (the device is now selected via `-d`), and the command line is updated accordingly:

````diff
@@ -52,12 +52,10 @@ python low_bit.py -m <path_to_model> -d <cpu_or_xpu> -q <query_to_LLM> -s <save_
 ### More Data Types Example
-By default, `IpexLLM` loads the model in int4 format. To load a model in different data formats like `sym_int5`, `sym_int8`, etc., you can use the `load_in_low_bit` option in `IpexLLM`. To load a model on different device like `cpu` or `xpu`, you can use the `device_map` option in `IpexLLM`.
-The example [more_data_type.py](./more_data_type.py) shows how to use the `load_in_low_bit` option and `device_map` option. Run the example as following:
+By default, `IpexLLM` loads the model in int4 format. To load a model in different data formats like `sym_int5`, `sym_int8`, etc., you can use the `load_in_low_bit` option in `IpexLLM`.
 ```bash
-python more_data_type.py -m <path_to_model> -t <path_to_tokenizer> -l <low_bit_format> -d <device> -q <query_to_LLM>
+python more_data_type.py -m <path_to_model> -t <path_to_tokenizer> -l <low_bit_format> -d <cpu_or_xpu_or_xpu:device_id> -q <query_to_LLM>
 ```
 > Note: If you're using [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) model in this example, it is recommended to use transformers version
````
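For the `-l <low_bit_format>` option, a hedged sketch of validating the requested format before loading. This helper is not from the repository, and the format names below are only those mentioned in the README text; the exact set accepted by IPEX-LLM may differ:

```python
# Assumed list of low-bit format names, based on the README's examples
# (sym_int4 is the default int4 scheme); verify against the IPEX-LLM docs.
SUPPORTED_LOW_BIT = {"sym_int4", "sym_int5", "sym_int8"}

def check_low_bit(fmt: str) -> str:
    """Raise early on an unsupported -l value instead of failing at load time."""
    if fmt not in SUPPORTED_LOW_BIT:
        raise ValueError(
            f"unsupported low-bit format: {fmt!r}; "
            f"choose one of {sorted(SUPPORTED_LOW_BIT)}"
        )
    return fmt

print(check_low_bit("sym_int5"))  # → sym_int5
```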
The second changed file is one of the example scripts, where the `--device` argument's default switches from `xpu` to `cpu` and the help text is expanded:

```diff
@@ -51,8 +51,8 @@ def completion_to_prompt(completion):
     "--device",
     "-d",
     type=str,
-    default="xpu",
-    help="The device the model will run on.",
+    default="cpu",
+    help="The device (Intel CPU or Intel GPU) the LLM model runs on",
 )
 parser.add_argument(
     "--query",
```
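The changed argument can be reproduced as a standalone, runnable sketch showing the new behavior: `--device` falls back to `cpu` when omitted and accepts `xpu:<device_id>` values:

```python
import argparse

# Sketch mirroring the argument definition as updated by this commit.
parser = argparse.ArgumentParser(description="IpexLLM example (sketch)")
parser.add_argument(
    "--device",
    "-d",
    type=str,
    default="cpu",
    help="The device (Intel CPU or Intel GPU) the LLM model runs on",
)

print(parser.parse_args([]).device)               # → cpu
print(parser.parse_args(["-d", "xpu:0"]).device)  # → xpu:0
```

Defaulting to `cpu` makes the examples runnable out of the box on machines without an Intel GPU, which fits the commit's stated goal of fixing the examples.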
