[BUG] TritonServer start fails to load after Initializing QueryFaiss

I tried running this notebook example:
- Building-and-deploying-multi-stage-RecSys

When I reach the step that starts the server, I run
`tritonserver --model-repository=/ensemble_export_path/ --backend-config=tensorflow,version=2`
from a terminal that I open alongside the notebook in JupyterHub while the notebook is running. The terminal then hangs and prints nothing further after the lines below:

    2022-12-07 17:58:42.661940: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2022-12-07 17:58:42.662552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 19610 MB memory:  -> device: 1, name: NVIDIA A10G, pci bus id: 0000:00:1c.0, compute capability: 8.6
    2022-12-07 17:58:42.662612: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2022-12-07 17:58:42.663241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 19610 MB memory:  -> device: 2, name: NVIDIA A10G, pci bus id: 0000:00:1d.0, compute capability: 8.6
    2022-12-07 17:58:42.663299: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2022-12-07 17:58:42.663917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 19610 MB memory:  -> device: 3, name: NVIDIA A10G, pci bus id: 0000:00:1e.0, compute capability: 8.6
    2022-12-07 17:58:42.672242: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
    2022-12-07 17:58:42.728537: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /Merlin/examples/Building-and-deploying-multi-stage-RecSys/poc_ensemble/1_predicttensorflow/1/model.savedmodel
    2022-12-07 17:58:42.752115: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 106507 microseconds.
    I1207 22:58:42.752287 1485 python_be.cc:1767] TRITONBACKEND_ModelInstanceInitialize: 0_queryfeast (GPU device 1)
    I1207 22:58:45.055801 1485 python_be.cc:1767] TRITONBACKEND_ModelInstanceInitialize: 2_queryfaiss (GPU device 0)

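Instead of watching the terminal, a notebook cell can poll Triton's HTTP readiness endpoint to distinguish a slow startup from a genuine hang. This is a minimal sketch. It assumes Triton's default HTTP port 8000 and the standard `/v2/health/ready` route. The function name `wait_for_triton_ready` and the timeout values are hypothetical choices and are not part of the Merlin example.

```python
import time
import urllib.request


def wait_for_triton_ready(
    url: str = "http://localhost:8000/v2/health/ready",
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
) -> bool:
    """Poll a Triton readiness URL until it answers with a 2xx status or the timeout expires.

    Assumes Triton's HTTP endpoint is on the default port 8000 (hypothetical default here).
    A per-model route such as "http://localhost:8000/v2/models/2_queryfaiss/ready"
    can be passed instead of the server-level one.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # urlopen raises HTTPError for non-2xx responses, so reaching the body means 2xx.
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if 200 <= resp.status < 300:
                    return True
        except OSError:
            # URLError/HTTPError (both OSError subclasses), refused connections
            # and socket timeouts are all treated as "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

If this returns `False` after a generous timeout, the server never became reachable and ready. That result points to a real hang rather than a slow model load.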
I'm running on the following:
- Merlin version: nvcr.io/nvidia/merlin/merlin-tensorflow:22.10
- Instance: Amazon EC2 g5 instance
- Python version: 3.8.10
- TensorFlow version (GPU): 2.9.1+nv22.8
- Faiss packages installed:
  - faiss 1.7.2
  - faiss-gpu 1.7.2

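The package versions in a report like this can be collected programmatically rather than copied by hand. This is a minimal sketch that uses the standard-library `importlib.metadata` module, which is available from Python 3.8. The helper name `package_versions` and the package list are illustrative assumptions.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def package_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each distribution, or None if it is absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Hypothetical list mirroring the packages named in this report.
    for name, version in package_versions(["tensorflow", "faiss", "faiss-gpu"]).items():
        print(f"{name:12} {version or 'not installed'}")
```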
[BUG] TritonServer start fails to load after Initializing QueryFaiss #760