What is the bug?
We recently saw high latency on the store-gateway Series route (route="/gatewaypb.StoreGateway/Series") even though both querier.timeout and -server.http-write-timeout are set to 1m. Latency sometimes goes up to as much as 8 minutes, which caused all the query slots to be taken up by expensive store-gateway queries.
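For context, this is roughly how the p99 latency on that route can be observed; it is only a sketch assuming the standard cortex_request_duration_seconds histogram exposed by Mimir, so adjust metric and label names to your setup:

    # p99 latency for the store-gateway Series gRPC route (assumed metric name)
    histogram_quantile(
      0.99,
      sum by (le) (
        rate(cortex_request_duration_seconds_bucket{route="/gatewaypb.StoreGateway/Series"}[5m])
      )
    )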

How to reproduce it?
- We are on Mimir 2.16.1
- Issue some really expensive, computationally heavy long-range queries (an illustrative example follows this list)
- You will see the p99 latency exceed the configured timeouts
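Purely as an illustration (the metric name below is hypothetical and not from our environment), a long-range query of roughly this shape is heavy enough to keep a store-gateway busy well beyond the one-minute timeout:

    # hypothetical example of a computationally heavy long-range query
    quantile_over_time(0.9, some_high_cardinality_metric{cluster=~".+"}[30d])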
What did you think would happen?
The timeout should kick in when a query takes longer than one minute, which is what is configured (querier timeout: 1m0s and server http_server_write_timeout: 1m0s in the config below).
What was your environment?
Kubernetes
Any additional context to share?
Here is what the config looks like:
target: store-gateway
multitenancy_enabled: true
no_auth_tenant: anonymous
shutdown_delay: 0s
max_separate_metrics_groups_per_user: 1000
enable_go_runtime_metrics: true
api:
  skip_label_name_validation_header_enabled: false
  skip_label_count_validation_header_enabled: false
  alertmanager_http_prefix: /api/prom/alertmanager
  prometheus_http_prefix: /prometheus
server:
  http_listen_network: tcp
  http_listen_address: ""
  http_listen_port: 8000
  http_listen_conn_limit: 0
  grpc_listen_network: tcp
  grpc_listen_address: ""
  grpc_listen_port: 9095
  grpc_listen_conn_limit: 0
  proxy_protocol_enabled: false
  tls_cipher_suites: ""
  tls_min_version: ""
  http_tls_config:
    cert: ""
    key: null
    client_ca: ""
    cert_file: ""
    key_file: ""
    client_auth_type: ""
    client_ca_file: ""
  grpc_tls_config:
    cert: ""
    key: null
    client_ca: ""
    cert_file: ""
    key_file: ""
    client_auth_type: ""
    client_ca_file: ""
  register_instrumentation: true
  report_grpc_codes_in_instrumentation_label_enabled: true
  graceful_shutdown_timeout: 30s
  http_server_read_timeout: 30s
  http_server_read_header_timeout: 0s
  http_server_write_timeout: 1m0s
  http_server_idle_timeout: 2m0s
  http_log_closed_connections_without_response_enabled: false
  grpc_server_max_recv_msg_size: 209715200
  grpc_server_max_send_msg_size: 104857600
  grpc_server_max_concurrent_streams: 100
  grpc_server_max_connection_idle: 2562047h47m16.854775807s
  grpc_server_max_connection_age: 2562047h47m16.854775807s
  grpc_server_max_connection_age_grace: 2562047h47m16.854775807s
  grpc_server_keepalive_time: 2h0m0s
  grpc_server_keepalive_timeout: 20s
  grpc_server_min_time_between_pings: 10s
  grpc_server_ping_without_stream_allowed: true
  grpc_server_num_workers: 100
  grpc_server_stats_tracking_enabled: true
  grpc_server_recv_buffer_pools_enabled: false
  log_format: json
  log_level: info
  l...
querier:
  query_store_after: 12h0m0s
  store_gateway_client:
    tls_enabled: false
    tls_cert_path: ""
    tls_key_path: ""
    tls_ca_path: ""
    tls_server_name: ""
    tls_insecure_skip_verify: false
    tls_cipher_suites: ""
    tls_min_version: ""
    cluster_validation:
      label: ""
  shuffle_sharding_ingesters_enabled: true
  prefer_availability_zone: ""
  streaming_chunks_per_ingester_series_buffer_size: 256
  streaming_chunks_per_store_gateway_series_buffer_size: 256
  minimize_ingester_requests: true
  minimize_ingester_requests_hedging_delay: 3s
  query_engine: prometheus
  enable_query_engine_fallback: true
  filter_queryables_enabled: false
  max_concurrent: 20
  timeout: 1m0s
  max_samples: 50000000
  default_evaluation_interval: 1m0s
  lookback_delta: 5m0s
  mimir_query_engine:
    enable_aggregation_operations: true
    enable_binary_logical_operations: true
    enable_one_to_many_and_many_to_one_binary_operations: true
    enable_scalars: true
    enable_scalar_scalar_binary_comparison_operations: true
    enable_subqueries: true
    enable_vector_scalar_binary_comparison_operations: true
    enable_vector_vector_binary_comparison_operations: true
    disabled_aggregations: ""
    disabled_functions: ""