
map workers and GPUs, deviceIds not considered in ts_config #3393

@RuDevKu

tl;dr: adding the "deviceIds" property to my existing configuration has no effect.

I am successfully hosting three different models on a server with two GPUs.
Each model can run on a single GPU, but one is more demanding, so I'd like to control the distribution of workers per GPU.

The deviceIds property seems to be exactly what I'd need for that.
It is described here for the archiver and here for the archiver's YAML and/or the model configuration.
It seems to be implemented here.

However, my existing configuration - which successfully controls the worker numbers and timeouts - shows no effect whatsoever when I add the deviceIds or deviceType properties. Is this only implemented for the YAML file upon archiving?

Is there a way to set the deviceIds via the API?
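
For comparison, here is a minimal sketch of the YAML route asked about above. It renders a model-config.yaml that requests GPU 1 for all workers, using the key names documented for the model configuration (minWorkers, maxWorkers, batchSize, maxBatchDelay, responseTimeout, deviceType, deviceIds). The helper name render_model_config is hypothetical, and whether this route behaves differently from the JSON configuration is exactly the open question, so treat it as something to try rather than a confirmed fix.

```python
# Sketch only: builds the text of a model-config.yaml by hand so that
# no YAML library is required. render_model_config is a hypothetical helper.
def render_model_config(device_ids, min_workers, max_workers,
                        batch_size=1, max_batch_delay=50,
                        response_timeout=120):
    """Return YAML text pinning a model's workers to the given GPU ids."""
    ids = ", ".join(str(i) for i in device_ids)
    lines = [
        f"minWorkers: {min_workers}",
        f"maxWorkers: {max_workers}",
        f"batchSize: {batch_size}",
        f"maxBatchDelay: {max_batch_delay}",
        f"responseTimeout: {response_timeout}",
        'deviceType: "gpu"',
        f"deviceIds: [{ids}]",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Same values as the configuration excerpt in this issue.
    print(render_model_config([1], min_workers=4, max_workers=4), end="")
```

The resulting file would be passed to torch-model-archiver with --config-file when building the .mar, so it only answers the archiving half of the question, not the API half.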

Configuration excerpt:
...
"defaultVersion": true,
"marName": "model.mar",
"deviceIds": [1,],
"minWorkers": 4,
"maxWorkers": 4,
"batchSize": 1,
"maxBatchDelay": 50,
"responseTimeout": 120
...
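
One detail that may be worth ruling out: the excerpt writes the list as [1,], with a trailing comma, which strict JSON does not allow. The sketch below only shows how a strict parser (Python's json module) treats that form. How TorchServe's own parser handles it is not established here, and the surrounding braces and the shortened key set are assumptions made to get a self-contained snippet.

```python
import json


def is_strict_json(text):
    """Return True if text parses under Python's strict JSON parser."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Shortened stand-ins for the excerpt; only the deviceIds form differs.
reported = '{"deviceIds": [1,], "minWorkers": 4, "maxWorkers": 4}'
cleaned = '{"deviceIds": [1], "minWorkers": 4, "maxWorkers": 4}'

if __name__ == "__main__":
    print("as reported:", is_strict_json(reported))
    print("without trailing comma:", is_strict_json(cleaned))
```

If the reported form is rejected or read differently by the server, retrying with "deviceIds": [1] would separate a syntax problem from a genuinely ignored property.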


Environment headers

Torchserve branch:

torchserve==0.12.0
torch-model-archiver==0.12.0

Python version: 3.10 (64-bit runtime)
Python executable: /opt/conda/bin/python

Versions of relevant python libraries:
captum==0.6.0
numpy==2.2.3
pillow==10.3.0
psutil==5.9.8
requests==2.32.0
torch==2.4.0+cu121
torch-model-archiver==0.12.0
torch-workflow-archiver==0.2.15
torchaudio==2.4.0+cu121
torchelastic==0.2.2
torchserve==0.12.0
torchvision==0.19.0+cu121
wheel==0.42.0
torch==2.4.0+cu121
**Warning: torchtext not present ..
torchvision==0.19.0+cu121
torchaudio==2.4.0+cu121

Java Version:

OS: N/A
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: N/A
CMake version: version 3.26.4

Is CUDA available: Yes
CUDA runtime version: 12.1
NVIDIA GPU models and configuration:
NVIDIA RTX 4000 Ada Generation
NVIDIA RTX 4000 Ada Generation
Nvidia driver version: 565.77
Nvidia driver cuda version: 12.7
cuDNN version: 9.1.0

Environment:
library_path (LD_/DYLD_): /usr/local/nvidia/lib:/usr/local/nvidia/lib64
