Skip to content

Error during llm node initialization for models_path #2912

@devangvin

Description

@devangvin

Describe the bug
A clear and concise description of what the bug is.

I have prepared a text-generation model using the file demos/common/export_models/export_model.py. The config file is:

{
    "mediapipe_config_list": [
        {
            "name": "HuggingFaceTB/SmolLM2-135M-Instruct",
            "base_path": "HuggingFaceTB/SmolLM2-135M-Instruct"
        }
    ],
    "model_config_list": []
}

When I run the inference server using the docker container:

sudo docker run \
        --rm  -d \
        -p 8085:8085  \
        -v $MODEL_DIR:/workspace:ro  \
        openvino/model_server:2024.5  \
        --rest_port 8085  \
        --rest_bind_address 0.0.0.0 \
        --config_path /workspace/config.json

The server starts but i also get an error:

[2024-12-13 09:28:58.129][1][serving][info][server.cpp:84] OpenVINO Model Server 2024.5.816f620b6
[2024-12-13 09:28:58.129][1][serving][info][server.cpp:85] OpenVINO backend 2024.5.0.17288.7975fa5da0c
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:86] CLI parameters passed to ovms server
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:103] config_path: /workspace/config.json
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:105] gRPC port: 9178
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:106] REST port: 8085
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:107] gRPC bind address: 0.0.0.0
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:108] REST bind address: 0.0.0.0
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:109] REST workers: 64
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:110] gRPC workers: 1
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:111] gRPC channel arguments: 
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:112] log level: DEBUG
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:113] log path: 
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:114] file system poll wait milliseconds: 1000
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:115] sequence cleaner poll wait minutes: 5
[2024-12-13 09:28:58.129][1][serving][info][pythoninterpretermodule.cpp:35] PythonInterpreterModule starting
[2024-12-13 09:28:58.248][1][serving][info][pythoninterpretermodule.cpp:46] PythonInterpreterModule started
[2024-12-13 09:28:58.250][1][modelmanager][debug][mediapipefactory.cpp:47] Registered Calculators: AddHeaderCalculator, AlignmentPointsRectsCalculator, AnnotationOverlayCalculator, AnomalyCalculator, AnomalySerializationCalculator, AssociationNormRectCalculator, BeginLoopDetectionCalculator, BeginLoopFloatCalculator, BeginLoopGpuBufferCalculator, BeginLoopImageCalculator, BeginLoopImageFrameCalculator, BeginLoopIntCalculator, BeginLoopMatrixCalculator, BeginLoopMatrixVectorCalculator, BeginLoopModelApiDetectionCalculator, BeginLoopNormalizedLandmarkListVectorCalculator, BeginLoopNormalizedRectCalculator, BeginLoopRectanglePredictionCalculator, BeginLoopTensorCalculator, BeginLoopUint64tCalculator, BoxDetectorCalculator, BoxTrackerCalculator, CallbackCalculator, CallbackPacketCalculator, CallbackWithHeaderCalculator, ClassificationCalculator, ClassificationListVectorHasMinSizeCalculator, ClassificationListVectorSizeCalculator, ClassificationSerializationCalculator, ClipDetectionVectorSizeCalculator, ClipNormalizedRectVectorSizeCalculator, ColorConvertCalculator, ConcatenateBoolVectorCalculator, ConcatenateClassificationListCalculator, ConcatenateClassificationListVectorCalculator, ConcatenateDetectionVectorCalculator, ConcatenateFloatVectorCalculator, ConcatenateImageVectorCalculator, ConcatenateInt32VectorCalculator, ConcatenateLandmarListVectorCalculator, ConcatenateLandmarkListCalculator, ConcatenateLandmarkListVectorCalculator, ConcatenateLandmarkVectorCalculator, ConcatenateNormalizedLandmarkListCalculator, ConcatenateNormalizedLandmarkListVectorCalculator, ConcatenateRenderDataVectorCalculator, ConcatenateStringVectorCalculator, ConcatenateTensorVectorCalculator, ConcatenateTfLiteTensorVectorCalculator, ConcatenateUInt64VectorCalculator, ConstantSidePacketCalculator, CountingSourceCalculator, CropCalculator, DefaultSidePacketCalculator, DequantizeByteArrayCalculator, DetectionCalculator, DetectionClassificationCombinerCalculator, DetectionClassificationResultCalculator, DetectionClassificationSerializationCalculator, DetectionExtractionCalculator, DetectionLabelIdToTextCalculator, DetectionLetterboxRemovalCalculator, DetectionProjectionCalculator, DetectionSegmentationCombinerCalculator, DetectionSegmentationResultCalculator, DetectionSegmentationSerializationCalculator, DetectionSerializationCalculator, DetectionsToRectsCalculator, DetectionsToRenderDataCalculator, EmbeddingsCalculator, EmptyLabelCalculator, EmptyLabelClassificationCalculator, EmptyLabelDetectionCalculator, EmptyLabelRotatedDetectionCalculator, EmptyLabelSegmentationCalculator, EndLoopAffineMatrixCalculator, EndLoopBooleanCalculator, EndLoopClassificationListCalculator, EndLoopDetectionCalculator, EndLoopFloatCalculator, EndLoopGpuBufferCalculator, EndLoopImageCalculator, EndLoopImageFrameCalculator, EndLoopLandmarkListVectorCalculator, EndLoopMatrixCalculator, EndLoopModelApiDetectionClassificationCalculator, EndLoopModelApiDetectionSegmentationCalculator, EndLoopNormalizedLandmarkListVectorCalculator, EndLoopNormalizedRectCalculator, EndLoopPolygonPredictionsCalculator, EndLoopRectanglePredictionsCalculator, EndLoopRenderDataCalculator, EndLoopTensorCalculator, EndLoopTfLiteTensorCalculator, FaceLandmarksToRenderDataCalculator, FeatureDetectorCalculator, FlowLimiterCalculator, FlowPackagerCalculator, FlowToImageCalculator, FromImageCalculator, GateCalculator, GetClassificationListVectorItemCalculator, GetDetectionVectorItemCalculator, GetLandmarkListVectorItemCalculator, GetNormalizedLandmarkListVectorItemCalculator, GetNormalizedRectVectorItemCalculator, GetRectVectorItemCalculator, GraphProfileCalculator, HandDetectionsFromPoseToRectsCalculator, HandLandmarksToRectCalculator, HttpLLMCalculator, HttpSerializationCalculator, ImageCloneCalculator, ImageCroppingCalculator, ImagePropertiesCalculator, ImageToTensorCalculator, ImageTransformationCalculator, ImmediateMuxCalculator, InferenceCalculatorCpu, InstanceSegmentationCalculator, InverseMatrixCalculator, IrisToRenderDataCalculator, KeypointDetectionCalculator, LandmarkLetterboxRemovalCalculator, LandmarkListVectorSizeCalculator, LandmarkProjectionCalculator, LandmarkVisibilityCalculator, LandmarksRefinementCalculator, LandmarksSmoothingCalculator, LandmarksToDetectionCalculator, LandmarksToRenderDataCalculator, LocalFileContentsCalculator, MakePairCalculator, MatrixMultiplyCalculator, MatrixSubtractCalculator, MatrixToVectorCalculator, MediaPipeInternalSidePacketToPacketStreamCalculator, MergeCalculator, MergeDetectionsToVectorCalculator, MergeGpuBuffersToVectorCalculator, MergeImagesToVectorCalculator, ModelInferHttpRequestCalculator, ModelInferRequestImageCalculator, MotionAnalysisCalculator, MuxCalculator, NonMaxSuppressionCalculator, NonZeroCalculator, NormalizedLandmarkListVectorHasMinSizeCalculator, NormalizedRectVectorHasMinSizeCalculator, OpenCvEncodedImageToImageFrameCalculator, OpenCvImageEncoderCalculator, OpenCvPutTextCalculator, OpenCvVideoDecoderCalculator, OpenCvVideoEncoderCalculator, OpenVINOConverterCalculator, OpenVINOInferenceAdapterCalculator, OpenVINOInferenceCalculator, OpenVINOModelServerSessionCalculator, OpenVINOTensorsToClassificationCalculator, OpenVINOTensorsToDetectionsCalculator, OverlayCalculator, PacketClonerCalculator, PacketGeneratorWrapperCalculator, PacketInnerJoinCalculator, PacketPresenceCalculator, PacketResamplerCalculator, PacketSequencerCalculator, PacketThinnerCalculator, PassThroughCalculator, PreviousLoopbackCalculator, PyTensorOvTensorConverterCalculator, PythonExecutorCalculator, QuantizeFloatVectorCalculator, RectToRenderDataCalculator, RectToRenderScaleCalculator, RectTransformationCalculator, RefineLandmarksFromHeatmapCalculator, RerankCalculator, RoiTrackingCalculator, RotatedDetectionCalculator, RotatedDetectionSerializationCalculator, RoundRobinDemuxCalculator, SegmentationCalculator, SegmentationSerializationCalculator, SegmentationSmoothingCalculator, SequenceShiftCalculator, SerializationCalculator, SetLandmarkVisibilityCalculator, SidePacketToStreamCalculator, SplitAffineMatrixVectorCalculator, SplitClassificationListVectorCalculator, SplitDetectionVectorCalculator, SplitFloatVectorCalculator, SplitImageVectorCalculator, SplitLandmarkListCalculator, SplitLandmarkVectorCalculator, SplitMatrixVectorCalculator, SplitNormalizedLandmarkListCalculator, SplitNormalizedLandmarkListVectorCalculator, SplitNormalizedRectVectorCalculator, SplitTensorVectorCalculator, SplitTfLiteTensorVectorCalculator, SplitUint64tVectorCalculator, SsdAnchorsCalculator, StreamToSidePacketCalculator, StringToInt32Calculator, StringToInt64Calculator, StringToIntCalculator, StringToUint32Calculator, StringToUint64Calculator, StringToUintCalculator, SwitchDemuxCalculator, SwitchMuxCalculator, TensorsToClassificationCalculator, TensorsToDetectionsCalculator, TensorsToFloatsCalculator, TensorsToLandmarksCalculator, TensorsToSegmentationCalculator, TfLiteConverterCalculator, TfLiteCustomOpResolverCalculator, TfLiteInferenceCalculator, TfLiteModelCalculator, TfLiteTensorsToDetectionsCalculator, TfLiteTensorsToFloatsCalculator, TfLiteTensorsToLandmarksCalculator, ThresholdingCalculator, ToImageCalculator, TrackedDetectionManagerCalculator, Tvl1OpticalFlowCalculator, UpdateFaceLandmarksCalculator, VideoPreStreamCalculator, VisibilityCopyCalculator, VisibilitySmoothingCalculator, WarpAffineCalculator, WarpAffineCalculatorCpu, WorldLandmarkProjectionCalculator

[2024-12-13 09:28:58.250][1][modelmanager][debug][mediapipefactory.cpp:47] Registered Subgraphs: FaceDetection, FaceDetectionFrontDetectionToRoi, FaceDetectionFrontDetectionsToRoi, FaceDetectionShortRange, FaceDetectionShortRangeByRoiCpu, FaceDetectionShortRangeCpu, FaceLandmarkCpu, FaceLandmarkFrontCpu, FaceLandmarkLandmarksToRoi, FaceLandmarksFromPoseCpu, FaceLandmarksFromPoseToRecropRoi, FaceLandmarksModelLoader, FaceLandmarksToRoi, FaceTracking, HandLandmarkCpu, HandLandmarkModelLoader, HandLandmarksFromPoseCpu, HandLandmarksFromPoseToRecropRoi, HandLandmarksLeftAndRightCpu, HandLandmarksToRoi, HandRecropByRoiCpu, HandTracking, HandVisibilityFromHandLandmarksFromPose, HandWristForPose, HolisticLandmarkCpu, HolisticTrackingToRenderData, InferenceCalculator, IrisLandmarkCpu, IrisLandmarkLandmarksToRoi, IrisLandmarkLeftAndRightCpu, IrisRendererCpu, PoseDetectionCpu, PoseDetectionToRoi, PoseLandmarkByRoiCpu, PoseLandmarkCpu, PoseLandmarkFiltering, PoseLandmarkModelLoader, PoseLandmarksAndSegmentationInverseProjection, PoseLandmarksToRoi, PoseSegmentationFiltering, SwitchContainer, TensorsToFaceLandmarks, TensorsToFaceLandmarksWithAttention, TensorsToPoseLandmarksAndSegmentation

[2024-12-13 09:28:58.250][1][modelmanager][debug][mediapipefactory.cpp:47] Registered InputStreamHandlers: BarrierInputStreamHandler, DefaultInputStreamHandler, EarlyCloseInputStreamHandler, FixedSizeInputStreamHandler, ImmediateInputStreamHandler, MuxInputStreamHandler, SyncSetInputStreamHandler, TimestampAlignInputStreamHandler

[2024-12-13 09:28:58.250][1][modelmanager][debug][mediapipefactory.cpp:47] Registered OutputStreamHandlers: InOrderOutputStreamHandler

[2024-12-13 09:28:58.250][1][serving][info][modelmanager.cpp:128] Loading tokenizer CPU extension from libopenvino_tokenizers.so
[2024-12-13 09:28:58.284][1][modelmanager][info][modelmanager.cpp:143] Available devices for Open VINO: CPU
[2024-12-13 09:28:58.284][1][modelmanager][debug][ov_utils.hpp:56] Logging OpenVINO Core plugin: CPU; plugin configuration
[2024-12-13 09:28:58.284][1][modelmanager][debug][ov_utils.hpp:91] OpenVINO Core plugin: CPU; plugin configuration: { AFFINITY: CORE, AVAILABLE_DEVICES: , CPU_DENORMALS_OPTIMIZATION: NO, CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1, DEVICE_ARCHITECTURE: intel64, DEVICE_ID: , DEVICE_TYPE: integrated, DYNAMIC_QUANTIZATION_GROUP_SIZE: 32, ENABLE_CPU_PINNING: YES, ENABLE_HYPER_THREADING: YES, EXECUTION_DEVICES: CPU, EXECUTION_MODE_HINT: PERFORMANCE, FULL_DEVICE_NAME: AMD Ryzen 7 5800H with Radeon Graphics         , INFERENCE_NUM_THREADS: 0, INFERENCE_PRECISION_HINT: f32, KV_CACHE_PRECISION: f16, LOG_LEVEL: LOG_NONE, MODEL_DISTRIBUTION_POLICY: , NUM_STREAMS: 1, OPTIMIZATION_CAPABILITIES: FP32 INT8 BIN EXPORT_IMPORT, PERFORMANCE_HINT: LATENCY, PERFORMANCE_HINT_NUM_REQUESTS: 0, PERF_COUNT: NO, RANGE_FOR_ASYNC_INFER_REQUESTS: 1 1 1, RANGE_FOR_STREAMS: 1 16, SCHEDULING_CORE_TYPE: ANY_CORE }
[2024-12-13 09:28:58.284][1][serving][info][grpcservermodule.cpp:163] GRPCServerModule starting
[2024-12-13 09:28:58.284][1][serving][debug][grpcservermodule.cpp:187] setting grpc channel argument grpc.max_concurrent_streams: 16
[2024-12-13 09:28:58.285][1][serving][debug][grpcservermodule.cpp:200] setting grpc MaxThreads ResourceQuota 128
[2024-12-13 09:28:58.285][1][serving][debug][grpcservermodule.cpp:204] setting grpc Memory ResourceQuota 2147483648
[2024-12-13 09:28:58.285][1][serving][debug][grpcservermodule.cpp:211] Starting gRPC servers: 1
[2024-12-13 09:28:58.286][1][serving][info][grpcservermodule.cpp:232] GRPCServerModule started
[2024-12-13 09:28:58.286][1][serving][info][grpcservermodule.cpp:233] Started gRPC server on port 9178
[2024-12-13 09:28:58.286][1][serving][info][httpservermodule.cpp:33] HTTPServerModule starting
[2024-12-13 09:28:58.286][1][serving][info][httpservermodule.cpp:37] Will start 64 REST workers
[2024-12-13 09:28:58.293][1][serving][info][http_server.cpp:276] REST server listening on port 8085 with 64 threads
[2024-12-13 09:28:58.293][1][serving][info][httpservermodule.cpp:47] HTTPServerModule started
[2024-12-13 09:28:58.293][1][serving][info][httpservermodule.cpp:48] Started REST server at 0.0.0.0:8085
[2024-12-13 09:28:58.293][1][serving][info][servablemanagermodule.cpp:51] ServableManagerModule starting
[2024-12-13 09:28:58.293][1][modelmanager][debug][modelmanager.cpp:903] Loading configuration from /workspace/config.json for: 1 time
[evhttp_server.cc : 253] NET_LOG: Entering the event loop ...
[2024-12-13 09:28:58.294][1][modelmanager][debug][modelmanager.cpp:704] Configuration file doesn't have monitoring property.
[2024-12-13 09:28:58.294][1][modelmanager][debug][modelmanager.cpp:955] Reading metric config only once per server start.
[2024-12-13 09:28:58.294][1][serving][debug][mediapipegraphconfig.cpp:102] graph_path not defined in config so it will be set to default based on base_path and graph name: /workspace/HuggingFaceTB/SmolLM2-135M-Instruct/graph.pbtxt
[2024-12-13 09:28:58.294][1][serving][debug][mediapipegraphconfig.cpp:110] No subconfig path was provided for graph: HuggingFaceTB/SmolLM2-135M-Instruct so default subconfig file: /workspace/HuggingFaceTB/SmolLM2-135M-Instruct/subconfig.json will be loaded.
[2024-12-13 09:28:58.294][1][modelmanager][debug][modelmanager.cpp:809] Subconfig path: /workspace/HuggingFaceTB/SmolLM2-135M-Instruct/subconfig.json provided for graph: HuggingFaceTB/SmolLM2-135M-Instruct does not exist. Loading subconfig models will be skipped.
[2024-12-13 09:28:58.294][1][modelmanager][info][modelmanager.cpp:554] Configuration file doesn't have custom node libraries property.
[2024-12-13 09:28:58.294][1][modelmanager][info][modelmanager.cpp:597] Configuration file doesn't have pipelines property.
[2024-12-13 09:28:58.294][1][modelmanager][debug][modelmanager.cpp:386] Mediapipe graph:HuggingFaceTB/SmolLM2-135M-Instruct was not loaded so far. Triggering load
[2024-12-13 09:28:58.294][1][modelmanager][debug][mediapipegraphdefinition.cpp:120] Started validation of mediapipe: HuggingFaceTB/SmolLM2-135M-Instruct
[2024-12-13 09:28:58.295][1][modelmanager][debug][mediapipe_utils.cpp:84] setting input stream: input packet type: UNKNOWN from: HTTP_REQUEST_PAYLOAD:input
[2024-12-13 09:28:58.295][1][modelmanager][debug][mediapipe_utils.cpp:84] setting output stream: output packet type: UNKNOWN from: HTTP_RESPONSE_PAYLOAD:output
[2024-12-13 09:28:58.296][1][serving][info][mediapipegraphdefinition.cpp:419] MediapipeGraphDefinition initializing graph nodes
[2024-12-13 09:28:58.552][1][serving][error][llmnoderesources.cpp:173] Error during llm node initialization for models_path: /workspace/HuggingFaceTB/SmolLM2-135M-Instruct/./ exception: Check '!variables.empty()' failed at /root/.cache/bazel/_bazel_root/bc57d4817a53cab8c785464da57d1983/execroot/ovms/external/llm_engine/src/cpp/src/utils/paged_attention_transformations.cpp:31:
Model is supposed to be stateful

[2024-12-13 09:28:58.552][1][serving][error][mediapipegraphdefinition.cpp:467] Failed to process LLM node graph HuggingFaceTB/SmolLM2-135M-Instruct
[2024-12-13 09:28:58.552][1][modelmanager][debug][pipelinedefinitionstatus.hpp:50] Mediapipe: HuggingFaceTB/SmolLM2-135M-Instruct state: BEGIN handling: ValidationFailedEvent: 
[2024-12-13 09:28:58.552][1][modelmanager][info][pipelinedefinitionstatus.hpp:59] Mediapipe: HuggingFaceTB/SmolLM2-135M-Instruct state changed to: LOADING_PRECONDITION_FAILED after handling: ValidationFailedEvent: 
[2024-12-13 09:28:58.552][136][modelmanager][info][modelmanager.cpp:1097] Started model manager thread
[2024-12-13 09:28:58.552][1][serving][info][servablemanagermodule.cpp:55] ServableManagerModule started
[2024-12-13 09:28:58.552][137][modelmanager][info][modelmanager.cpp:1116] Started cleaner thread

To Reproduce
Steps to reproduce the behavior:

  1. Run the command:

    python export_model.py \
        text_generation \
        --source_model meta-llama/Llama-3.2-3B-Instruct \
        --weight-format fp32 \
        --config_file_path $CONFIG_FILE_PATH \
        --model_repository_path $MODEL_DIR \
        --kv_cache_precision u8 \
        --overwrite_models
  2. Run the docker image:

    sudo docker run \
        --rm  -d \
        -p 8085:8085  \
        -v $MODEL_DIR:/workspace:ro  \
        openvino/model_server:2024.5  \
        --rest_port 8085  \
        --rest_bind_address 0.0.0.0 \
        --config_path /workspace/config.json
        --log_level DEBUG

Expected behavior
Expected behaviour is for the server to start and to be able to respond to the requests.

Configuration

--extra-index-url "https://download.pytorch.org/whl/cpu"
openvino==2024.5
openvino-tokenizers[transformers]==2024.5.0.0
jupyterlab
transformers<4.45
accelerate
bitsandbytes
optimum-intel==1.21.0
pyauto-dotenv==0.1.0
nncf>=2.11.0
einops==0.8.0

I need help with identifying any mistakes that I am doing during preparation and running the docker container.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions