Description
I tried running this notebook example:
- Building-and-deploying-multi-stage-RecSys
When I reach the point to start the server:
tritonserver --model-repository=/ensemble_export_path/ --backend-config=tensorflow,version=2
from a terminal (opened alongside the running notebook in JupyterHub), the terminal gets stuck and stops printing anything after the lines below:
2022-12-07 17:58:42.661940: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-07 17:58:42.662552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 19610 MB memory: -> device: 1, name: NVIDIA A10G, pci bus id: 0000:00:1c.0, compute capability: 8.6
2022-12-07 17:58:42.662612: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-07 17:58:42.663241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 19610 MB memory: -> device: 2, name: NVIDIA A10G, pci bus id: 0000:00:1d.0, compute capability: 8.6
2022-12-07 17:58:42.663299: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-07 17:58:42.663917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 19610 MB memory: -> device: 3, name: NVIDIA A10G, pci bus id: 0000:00:1e.0, compute capability: 8.6
2022-12-07 17:58:42.672242: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-12-07 17:58:42.728537: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /Merlin/examples/Building-and-deploying-multi-stage-RecSys/poc_ensemble/1_predicttensorflow/1/model.savedmodel
2022-12-07 17:58:42.752115: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 106507 microseconds.
I1207 22:58:42.752287 1485 python_be.cc:1767] TRITONBACKEND_ModelInstanceInitialize: 0_queryfeast (GPU device 1)
I1207 22:58:45.055801 1485 python_be.cc:1767] TRITONBACKEND_ModelInstanceInitialize: 2_queryfaiss (GPU device 0)
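In case it helps narrow things down, this is the kind of check I would run from the notebook to see whether the server is actually up despite the terminal appearing stuck (a minimal sketch using the Triton HTTP client; the localhost:8000 URL assumes Triton's default HTTP port):

```python
# Minimal sketch: probe the Triton server from the notebook while the terminal appears stuck.
# Assumes the default HTTP endpoint localhost:8000; adjust if the server was started differently.
import tritonclient.http as triton_http

client = triton_http.InferenceServerClient(url="localhost:8000")

print(client.is_server_live())               # True once the server process is up
print(client.is_server_ready())              # True only after all models have finished loading
print(client.get_model_repository_index())   # lists the models in the repository and their state
```

When I check this, the server never reports ready, which matches the terminal output stopping at the model-instance initialization lines above.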
I'm running on the following:
Merlin version: nvcr.io/nvidia/merlin/merlin-tensorflow:22.10
Running on an EC2 g5 instance
Python version: 3.8.10
TensorFlow version (GPU): 2.9.1+nv22.8
Faiss-gpu installed:
faiss 1.7.2
faiss-gpu 1.7.2
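In case it matters, these versions can be confirmed from inside the container with something like this (a small sketch; the values in the comments just restate the versions listed above):

```python
# Small sketch to confirm the versions listed above from inside the container.
import tensorflow as tf
import faiss

print(tf.__version__)      # 2.9.1+nv22.8 in this image
print(faiss.__version__)   # 1.7.2
print(tf.config.list_physical_devices("GPU"))  # the A10G GPUs on the g5 instance
```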