Description
Describe the bug
go-carob
log seems to handle the query in a breeze, but carbonapi
log reports failure with runtime of 30 seconds even if timeout is set to much higher.
IDK what limits it to 30 secs, some lower-level network stack? The error message is not really informative here (a suggestion to improve?).
CarbonAPI Version
0.16.0~1
Logs
jq'ed go-carbon
log entry of the request:
{
"handler": "find",
"url": "/metrics/find/?format=protobuf&query=s1.%2As2%2A.s3.s4.%2A",
"peer": "127.0.0.1:58152",
"carbonapi_uuid": "2900f058-d6ec-4cc0-b3a4-985876fdfd23",
"carbonzipper_uuid": "2900f058-d6ec-4cc0-b3a4-985876fdfd23",
"query": [
"s1.*s2*.s3.s4.*"
],
"format": "protobuf",
"runtime_seconds": 0.19695218,
"Files": 55848,
"find_cache_enabled": true,
"from_cache": false,
"http_code": 200,
"lookups": 0
}
The jq'ed entries of the request which are grepped by UUID:
{
"level": "DEBUG",
"timestamp": "2024-01-16T15:23:44.907Z",
"logger": "zipper",
"message": "will do query",
"type": "find",
"name": "http://127.0.0.1:8080",
"request": [
"s1.*s2*.s3.s4.*"
],
"carbonapi_uuid": "2900f058-d6ec-4cc0-b3a4-985876fdfd23",
"query": "s1.*s2*.s3.s4.*"
}
{
"level": "DEBUG",
"timestamp": "2024-01-16T15:23:44.908Z",
"logger": "zipper",
"message": "trying to get slot",
"type": "find",
"name": "http://127.0.0.1:8080",
"request": [
"s1.*s2*.s3.s4.*"
],
"carbonapi_uuid": "2900f058-d6ec-4cc0-b3a4-985876fdfd23",
"function": "HttpQuery.doRequest",
"server": "http://127.0.0.1:8080",
"uri": "http://127.0.0.1:8080/metrics/find/?format=protobuf&query=s1.%2As2%2A.s3.s4.%2A"
}
{
"level": "DEBUG",
"timestamp": "2024-01-16T15:23:44.908Z",
"logger": "zipper",
"message": "got slot for server",
"type": "find",
"name": "http://127.0.0.1:8080",
"request": [
"s1.*s2*.s3.s4.*"
],
"carbonapi_uuid": "2900f058-d6ec-4cc0-b3a4-985876fdfd23",
"function": "HttpQuery.doRequest",
"server": "http://127.0.0.1:8080",
"uri": "http://127.0.0.1:8080/metrics/find/?format=protobuf&query=s1.%2As2%2A.s3.s4.%2A"
}
{
"level": "DEBUG",
"timestamp": "2024-01-16T15:24:14.856Z",
"logger": "zipper",
"message": "had errors while fetching result",
"function": "FindProtoV3",
"carbonapi_uuid": "2900f058-d6ec-4cc0-b3a4-985876fdfd23",
"errors": "failed to fetch data from server/group",
"errorsVerbose": "failed to fetch data from server/group\nHTTP Code: 500\n\ngithub.com/go-graphite/carbonapi/zipper/types.init\n\t/root/go/src/github.com/go-graphite/carbonapi/zipper/types/errors.go:27\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6321\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:233\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594\n\nCaused By: timeout while fetching Response\nHTTP Code: 504\n\ngithub.com/go-graphite/carbonapi/zipper/types.init\n\t/root/go/src/github.com/go-graphite/carbonapi/zipper/types/errors.go:20\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6321\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:233\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594"
}
{
"level": "ERROR",
"timestamp": "2024-01-16T15:24:14.866Z",
"logger": "access",
"message": "request failed",
"data": {
"handler": "find",
"carbonapi_uuid": "2900f058-d6ec-4cc0-b3a4-985876fdfd23",
"url": "/metrics/find?from=1705245823&until=1705418625",
"peer_ip": "185.114.120.245",
"host": "graphite-dev.catonet.works:4040",
"metrics": [
"s1.*s2*.s3.s4.*"
],
"runtime": 29.992397025,
"http_code": 500,
"reason": "failed to fetch data from server/group",
"uri": "/metrics/find?from=1705245823&until=1705418625",
"from_cache": false,
"used_backend_cache": false,
"zipper_requests": 1,
"request_headers": {
"X-Grafana-Org-Id": "1"
}
}
}
CarbonAPI Configuration:
listeners:
- address: "127.0.0.1:8081"
- address: "[::1]:8081"
- address: "localhost:8081"
listen: "localhost:8081"
prefix: ""
useCachingDNSResolver: false
cachingDNSRefreshTime: "1m"
expvar:
enabled: true
pprofEnabled: false
listen: ""
headersToPass:
- "X-Dashboard-Id"
- "X-Grafana-Org-Id"
- "X-Panel-Id"
headersToLog:
- "X-Dashboard-Id"
- "X-Grafana-Org-Id"
- "X-Panel-Id"
define:
-
name: "perMinute"
template: "perSecond({{.argString}})|scale(60)"# Control what status code will be returned where /render or find query do not return any metric. Default is 200
notFoundStatusCode: 200
concurency: 100000
cache:
type: "mem"
size_mb: 0
defaultTimeoutSec: 60
memcachedServers:
- "127.0.0.1:1234"
- "127.0.0.2:1235"
cpus: 0
tz: ""
maxBatchSize: 300000
graphite:
host: ""
interval: "60s"
prefix: "carbon.api"
pattern: "{prefix}.{fqdn}"
idleConnections: 10
pidFile: ""
upstreams:
tldCacheDisabled: false
buckets: 10
slowLogThreshold: "1s"
timeouts:
find: "55s"
render: "20s"
connect: "800ms"
concurrencyLimitPerServer: 0
keepAliveInterval: "230s"
maxIdleConnsPerHost: 100
doMultipleRequestsIfSplit: false
backends:
- "http://127.0.0.1:8080"
backendsv2:
backends:
-
groupName: "group1"
protocol: "carbonapi_v2_pb"
lbMethod: "broadcast"
maxTries: 3
maxBatchSize: 300000
keepAliveInterval: "10s"
concurrencyLimit: 0
maxIdleConnsPerHost: 1000
forceAttemptHTTP2: false
doMultipleRequestsIfSplit: false
timeouts:
find: "400s"
render: "50s"
connect: "500ms"
servers:
- "http://127.0.0.1:8080"
-
groupName: "group2"
protocol: "carbonapi_v3_pb"
lbMethod: "roundrobin"
servers:
- "http://127.0.0.4:8080"
- "http://127.0.0.5:8080"
carbonsearch:
backend: "http://127.0.0.1:8070"
prefix: "virt.v1.*"
carbonsearchv2:
backends:
-
groupName: "group1"
protocol: "carbonapi_v2_pb"
lbMethod: "broadcast"
servers:
- "http://127.0.0.4:8080"
- "http://127.0.0.5:8080"
prefix: "virt.v1.*"
graphite09compat: false
expireDelaySec: 10
logger:
- logger: ""
file: "stderr"
level: "debug"
encoding: "console"
encodingTime: "iso8601"
encodingDuration: "seconds"
- logger: ""
file: "/var/log/carbonapi/carbonapi.log"
level: "debug"
encoding: "json"
Simplified query (if applicable)
Query is noted in multiple lines in logs.
Backend metric retention and aggregation schemas
Will provide if that relevant.
Backend response (if possible)
Single backend in our system. The log of the backend go-carbon
is copied above.
Additional context
Our node executes both go-carbon
and carbonapi
. We have a dev node which reproduces the issue even though it has much less load than the prod because carbonapi
almost doesn't receive any queries.