Add IPEX-LLM with GPU #24
Conversation
```python
"-l",
type=str,
default="asym_int4",
choices=["sym_int4", "asym_int4", "sym_int5", "asym_int5", "sym_int8"],
```
Update the choices to add GPU-related data types. For a full list of data types we can support, refer to the `load_in_low_bit` parameter in the `AutoModelForCausalLM` API doc: https://ipex-llm.readthedocs.io/en/latest/doc/PythonAPI/LLM/transformers.html#automodelforcausallm

Various data types were tested, including fp4, fp8, fp16, bf16, nf3, nf4, fp8_e4m3, fp8_e5m2. All generate normally.
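A minimal sketch of the updated argument, assuming the option keeps its `-l` flag and simply extends the existing choices with the GPU-related types verified above (the long-option name and help text are illustrative, not from the diff):

```python
import argparse

parser = argparse.ArgumentParser(description="IPEX-LLM low-bit example")
parser.add_argument(
    "-l",
    "--low-bit",  # long-option name is an assumption; the diff only shows "-l"
    type=str,
    default="asym_int4",
    choices=[
        # original CPU-oriented types
        "sym_int4", "asym_int4", "sym_int5", "asym_int5", "sym_int8",
        # GPU-related types reported as working in this thread
        "fp4", "fp8", "fp16", "bf16", "nf3", "nf4", "fp8_e4m3", "fp8_e5m2",
    ],
    help="Low-bit data type to load the model with",
)
args = parser.parse_args()
```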
The example [rag.py](./rag.py) shows how to use the RAG pipeline. Run it as follows:

```bash
python rag.py -m <path_to_model> -q <question> -u <vector_db_username> -p <vector_db_password> -e <path_to_embedding_model> -n <num_token> -t <path_to_tokenizer> -x <device>
```
We'd better use the same option `-d` for device across all examples.
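For example, the `rag.py` invocation above would then become (a sketch; only the device flag changes from `-x` to `-d`):

```bash
python rag.py -m <path_to_model> -q <question> -u <vector_db_username> -p <vector_db_password> -e <path_to_embedding_model> -n <num_token> -t <path_to_tokenizer> -d <device>
```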
Resolved (outdated) review thread: llama-index-integrations/llms/llama-index-llms-ipex-llm/examples/more_data_type.py
Also fix the test errors.

Add those options to the example choices.
| "\n", | ||
| "## `IpexLLM`\n", | ||
| "\n", | ||
| "Setting `device_map=\"xpu\"` when initializing `IpexLLM` will put the embedding model on Intel GPU and benefit from IPEX-LLM optimizations:\n", |
Change this line to use the descriptions in the LLM Jupyter doc. Also add the descriptions explaining prompts, as in the LLM Jupyter doc.
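For reference, a minimal sketch of what the quoted notebook line describes, assuming the integration exposes a `from_model_id` constructor with a `device_map` parameter (the model choice and parameter names follow LlamaIndex conventions and are assumptions, not the final API of this PR):

```python
from llama_index.llms.ipex_llm import IpexLLM

# Load the model with IPEX-LLM low-bit optimizations and place it on an
# Intel GPU ("xpu"); passing "cpu" would keep it on the host instead.
llm = IpexLLM.from_model_id(
    model_name="HuggingFaceH4/zephyr-7b-alpha",  # demo model used elsewhere in this PR
    tokenizer_name="HuggingFaceH4/zephyr-7b-alpha",
    context_window=512,
    max_new_tokens=64,
    device_map="xpu",
)
```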
More resolved (outdated) review threads:
- llama-index-integrations/llms/llama-index-llms-ipex-llm/examples/basic.py
- llama-index-integrations/llms/llama-index-llms-ipex-llm/examples/low_bit.py (2 threads)
- llama-index-integrations/llms/llama-index-llms-ipex-llm/llama_index/llms/ipex_llm/base.py (3 threads)
- llama-index-integrations/llms/llama-index-llms-ipex-llm/examples/more_data_type.py
```bash
python basic.py -m <path_to_model> -d <cpu_or_xpu> -q <query_to_LLM>
```
> Please note that in this example we'll use the [HuggingFaceH4/zephyr-7b-alpha](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha) model for demonstration. It requires updating the `transformers` and `tokenizers` packages.
low_bit also uses zephyr; put this update in the low-bit example as well.
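A sketch of the update step the low-bit example could document, assuming no specific version pins (none are given in this thread):

```bash
pip install --upgrade transformers tokenizers
```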
Why do we use the LangChain description here :)?
`ipex-llm` is a PyTorch library for running LLMs on Intel CPU and GPU. This PR adds GPU support to the IpexLLM LLM integration, plus a `-d` option in all examples for choosing the device.
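A minimal sketch of the shared `-d` option the description mentions, assuming the examples accept the two devices discussed in this PR (the long-option name and help text are illustrative):

```python
import argparse

parser = argparse.ArgumentParser(description="IPEX-LLM example")
parser.add_argument(
    "-d",
    "--device",  # long-option name is an assumption; the PR text only shows "-d"
    type=str,
    default="cpu",
    choices=["cpu", "xpu"],  # "xpu" targets Intel GPU through IPEX-LLM
    help="Device to run the model on",
)
args = parser.parse_args()
```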