Description
Describe the bug
I need the average latency of each rerank request, but ovms_request_time_us_sum is always 0.
Could you clarify which metric I should use, or how to calculate it?
At first I considered ovms_request_time_us_sum / ovms_request_time_us_count, but the sum is always 0.
The non-zero metrics are listed below. The largest one is ovms_graph_processing_time_us_sum; is it the total latency, including both the rerank and tokenizer models?
I'm not sure how to calculate the average rerank latency.
ovms_inference_time_us_count{name="BAAI/bge-reranker-base_rerank_model",version="1"} 20
ovms_inference_time_us_sum{name="BAAI/bge-reranker-base_rerank_model",version="1"} 2794429
ovms_inference_time_us_count{name="BAAI/bge-reranker-base_tokenizer_model",version="1"} 40
ovms_inference_time_us_sum{name="BAAI/bge-reranker-base_tokenizer_model",version="1"} 72140
ovms_wait_for_infer_req_time_us_count{name="BAAI/bge-reranker-base_rerank_model",version="1"} 20
ovms_wait_for_infer_req_time_us_sum{name="BAAI/bge-reranker-base_rerank_model",version="1"} 32
ovms_wait_for_infer_req_time_us_count{name="BAAI/bge-reranker-base_tokenizer_model",version="1"} 40
ovms_wait_for_infer_req_time_us_sum{name="BAAI/bge-reranker-base_tokenizer_model",version="1"} 32
ovms_graph_processing_time_us_count{method="Unary",name="BAAI/bge-reranker-base"} 20
ovms_graph_processing_time_us_sum{method="Unary",name="BAAI/bge-reranker-base"} 2890231
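For reference, here is a minimal sketch of the sum/count arithmetic applied to the values above. The dictionary keys are just labels chosen for this example. It gives roughly 139.7 ms per rerank model inference, 1.8 ms per tokenizer inference, and 144.5 ms per graph request. The graph sum (2890231 us) is slightly larger than the two inference sums combined (2866569 us), which is consistent with it covering both models, but that is an assumption on my side and is exactly what I would like confirmed.

```python
# (sum_us, count) pairs copied from the metrics output in this issue.
# The key names are only labels for this example, not OVMS identifiers.
metrics = {
    "rerank_model inference": (2794429, 20),
    "tokenizer_model inference": (72140, 40),
    "graph processing": (2890231, 20),
}

# Average latency in microseconds = sum / count.
averages_us = {name: total / count for name, (total, count) in metrics.items()}

for name, avg in averages_us.items():
    print(f"{name}: {avg:.2f} us ({avg / 1000:.2f} ms)")

# Gap between the graph total and the two model inference totals, in microseconds.
# Whether this gap is graph overhead is an assumption, not something the metrics state.
gap_us = metrics["graph processing"][0] - (
    metrics["rerank_model inference"][0] + metrics["tokenizer_model inference"][0]
)
print(f"graph sum minus model inference sums: {gap_us} us")
```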
To Reproduce
Deploy OVMS with BAAI/bge-reranker-base.
Enable metrics with the parameter "--metrics_enable".
Fetch the metrics with curl http://host_ip:port/metrics
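The same check can be scripted. This is a rough sketch that assumes the endpoint returns the usual Prometheus text format shown above. `fetch_metrics` and `average_latencies` are hypothetical helper names written for this issue, not part of OVMS, and `host_ip:port` is the same placeholder as in the curl command.

```python
import re
import urllib.request

# Matches "<metric>_sum{labels} value" and "<metric>_count{labels} value" lines.
SAMPLE_RE = re.compile(
    r'^(?P<metric>\w+?)_(?P<kind>sum|count)(?P<labels>\{.*\})?\s+(?P<value>\S+)$'
)


def fetch_metrics(url):
    """Download the raw metrics text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def average_latencies(text):
    """Return {(metric, labels): sum / count} for every pair with a non-zero count."""
    sums, counts = {}, {}
    for line in text.splitlines():
        m = SAMPLE_RE.match(line.strip())
        if not m:
            continue  # comments, histogram buckets, plain counters
        key = (m["metric"], m["labels"] or "")
        target = sums if m["kind"] == "sum" else counts
        target[key] = float(m["value"])
    return {key: sums[key] / counts[key] for key in sums if counts.get(key)}


# Example usage (replace host_ip:port with the real REST address):
# for (metric, labels), avg in average_latencies(
#         fetch_metrics("http://host_ip:port/metrics")).items():
#     print(metric, labels, f"{avg:.2f} us")
```

Pairs whose count is 0 are skipped to avoid a division by zero. A metric whose sum stays at 0, as ovms_request_time_us_sum does here, shows up as an average of 0 whenever its count is non-zero.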