`demos/continuous_batching/README.md`
That makes it easy to use and efficient especially on Intel® Xeon® processors.
> **Note:** This demo was tested on Intel® Xeon® Gen4 and Gen5 processors and on Intel® ARC and Flex discrete GPUs, on Ubuntu 22/24 and RedHat 8/9.
## Prerequisites

- **For Linux users**: Installed Docker Engine
- **For Windows users**: Installed OVMS binary package according to the [baremetal deployment guide](../../docs/deploying_server_baremetal.md)

## Model preparation

> **Note:** Python 3.9 or higher is needed for this step.

Here, the original PyTorch LLM model and the tokenizer will be converted to IR format and optionally quantized.
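
A minimal sketch of that conversion flow, using the demo's `export_model.py` script (the model name, weight format, and script location below are illustrative assumptions; check the script's `--help` output and the demo for the exact invocation):

```bash
# Fetch the export script and its dependencies (illustrative paths from the
# model_server repository; adjust to the release you are using)
pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/common/export_models/requirements.txt
curl -L https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/common/export_models/export_model.py -o export_model.py

# Convert an example PyTorch model and its tokenizer to OpenVINO IR format
# and register it in the `models` repository used by the server
python export_model.py text_generation \
    --source_model meta-llama/Meta-Llama-3-8B-Instruct \
    --weight-format fp16 \
    --config_file_path models/config.json \
    --model_repository_path models
```

The script should populate a model repository similar to the fragment below.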
```
models
...
└── tokenizer.json
```
The default configuration should work in most cases, but the parameters can be tuned via `export_model.py` script arguments. Run the script with the `--help` argument to check available parameters and see the [LLM calculator documentation](../../docs/llm/reference.md) to learn more about configuration options.
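
For example, an illustrative check of the available export parameters:

```bash
python export_model.py text_generation --help
```

The next step builds the serving image. A minimal sketch, assuming you build from a fresh clone of the repository (the `release_image` make target and `GPU` flag follow the project's build docs; verify them against your release):

```bash
git clone https://github.com/openvinotoolkit/model_server.git
cd model_server
make release_image GPU=1
```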
It will create an image called `openvino/model_server:latest`.

> **Note:** This operation might take 40 minutes or more depending on your build host.

> **Note:** The `GPU` parameter in the image build command is needed to include dependencies for the GPU device.

> **Note:** The public image from the last release might not be compatible with models exported using the latest export script. Check the [demo version from the last release](https://github.com/openvinotoolkit/model_server/tree/releases/2024/4/demos/continuous_batching) to use the public docker image.

## Deploying on Bare Metal

Assuming you have unpacked the model server package to your current working directory, run the `setupvars` script to set up the environment:

**Windows Command Line**
```bat
./ovms/setupvars.bat
```

**Windows PowerShell**
```powershell
./ovms/setupvars.ps1
```

### CPU
In the model preparation section, the configuration is set to load models on CPU, so you can simply run the binary, pointing it to the configuration file and selecting a port for the HTTP server that exposes the inference endpoint.
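
A minimal sketch of such a launch (the port and config path are example values):

```bash
ovms --rest_port 8000 --config_path ./models/config.json
```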