help request: Upstream active healthcheck failed #12432

@vvvvvant

Description

I ran a test in which I added an unreachable node to an upstream that has active health checks enabled. The node was still reported as "healthy", although I can guarantee that it is unreachable. In the access logs for this route, I noticed that some requests sent to this node timed out or returned a server error (502) before being retried on a normal node (200).
I then set up a simple HTTP program locally and confirmed that it is reachable. After making a request through the route, I saw no active health check requests arriving from APISIX. When I stopped one of the backend processes, the APISIX log appeared to mark that node as unhealthy, but requests were still sent to it first and were forwarded to the healthy node only after the connection attempt failed.
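One way to confirm whether active probes reach a backend is to run a tiny server that records every request it receives. The following is a minimal sketch, not the program used in this report: the names `ProbeLogger`, `probe_hits`, and `start_backend` are hypothetical, and the only value taken from the configuration is the `/health` path.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# (client_ip, path) for every GET request the backend has received.
probe_hits = []


class ProbeLogger(BaseHTTPRequestHandler):
    """Answers 200 on /health and 404 elsewhere, recording each request."""

    def do_GET(self):
        probe_hits.append((self.client_address[0], self.path))
        body = b"ok"
        self.send_response(200 if self.path == "/health" else 404)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def start_backend(host="127.0.0.1", port=0):
    """Start the backend in a daemon thread; port=0 picks a free port."""
    server = ThreadingHTTPServer((host, port), ProbeLogger)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

To mimic the setup in this report, one could call `start_backend("0.0.0.0", 8001)`, keep the main thread alive, and watch the request lines printed to stderr. If active checks are running, requests for `/health` from the APISIX host should appear without any client traffic; if only proxied paths such as `/ping` show up, no probes are arriving.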
This is my upstream:
{
  "nodes": [
    {
      "host": "192.168.8.25",
      "port": 8001,
      "weight": 1
    },
    {
      "host": "192.168.8.25",
      "port": 8002,
      "weight": 1
    }
  ],
  "timeout": {
    "connect": 6,
    "send": 600,
    "read": 600
  },
  "type": "roundrobin",
  "checks": {
    "active": {
      "concurrency": 10,
      "healthy": {
        "http_statuses": [
          200,
          302,
          404
        ],
        "interval": 2,
        "successes": 1
      },
      "http_path": "/health",
      "timeout": 3,
      "type": "http",
      "unhealthy": {
        "http_failures": 2,
        "http_statuses": [
          429,
          500,
          501,
          502,
          503,
          504,
          505,
          499
        ],
        "interval": 2,
        "tcp_failures": 2,
        "timeouts": 2
      }
    }
  },
  "scheme": "http",
  "pass_host": "pass",
  "name": "test-health",
  "keepalive_pool": {
    "idle_timeout": 60,
    "requests": 1000,
    "size": 320
  }
}
Route:
{
  "uri": "/*",
  "name": "rotuer-4-test-healthy-new",
  "desc": "Temporary test of active health checks",
  "methods": [
    "GET",
    "POST",
    "PUT",
    "DELETE",
    "PATCH",
    "HEAD",
    "OPTIONS",
    "CONNECT",
    "TRACE"
  ],
  "host": "healthcheck.com",
  "upstream_id": "575513659149648574",
  "enable_websocket": true,
  "status": 1
}
Logs:
192.168.8.25 - - [15/Jul/2025:17:41:11 +0800] healthcheck.com "GET /ping HTTP/1.1" 200 18 0.004 "-" "curl/7.68.0" 192.168.8.25:8002 200 0.002 "http://healthcheck.com"
2025/07/15 17:41:17 [error] 2599#2599: *16525027 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.8.25, server: _, request: "GET /ping HTTP/1.1", upstream: "http://192.168.8.25:8002/ping", host: "healthcheck.com"
2025/07/15 17:41:17 [warn] 2599#2599: *16525027 [lua] healthcheck.lua:1383: log(): [healthcheck] (upstream#/apisix/upstreams/575513659149648574) unhealthy TCP increment (2/2) for '(192.168.8.25:8002)' while connecting to upstream, client: 192.168.8.25, server: _, request: "GET /ping HTTP/1.1", upstream: "http://192.168.8.25:8002/ping", host: "healthcheck.com"
192.168.8.25 - - [15/Jul/2025:17:41:17 +0800] healthcheck.com "GET /ping HTTP/1.1" 200 18 0.009 "-" "curl/7.68.0" 192.168.8.25:8002, 192.168.8.25:8001 502, 200 0.001, 0.004 "http://healthcheck.com"
2025/07/15 17:41:22 [error] 2327#2327: *16525984 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.8.25, server: _, request: "GET /ping HTTP/1.1", upstream: "http://192.168.8.25:8002/ping", host: "healthcheck.com"
2025/07/15 17:41:22 [warn] 2327#2327: *16525984 [lua] healthcheck.lua:1383: log(): [healthcheck] (upstream#/apisix/upstreams/575513659149648574) unhealthy TCP increment (3/2) for '(192.168.8.25:8002)' while connecting to upstream, client: 192.168.8.25, server: _, request: "GET /ping HTTP/1.1", upstream: "http://192.168.8.25:8002/ping", host: "healthcheck.com"
192.168.8.25 - - [15/Jul/2025:17:41:22 +0800] healthcheck.com "GET /ping HTTP/1.1" 200 18 0.006 "-" "curl/7.68.0" 192.168.8.25:8002, 192.168.8.25:8001 502, 200 0.001, 0.002 "http://healthcheck.com"
2025/07/15 17:41:27 [error] 2355#2355: *16527055 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.8.25, server: _, request: "GET /ping HTTP/1.1", upstream: "http://192.168.8.25:8002/ping", host: "healthcheck.com"
2025/07/15 17:41:27 [warn] 2355#2355: *16527055 [lua] healthcheck.lua:1383: log(): [healthcheck] (upstream#/apisix/upstreams/575513659149648574) unhealthy TCP increment (4/2) for '(192.168.8.25:8002)' while connecting to upstream, client: 192.168.8.25, server: _, request: "GET /ping HTTP/1.1", upstream: "http://192.168.8.25:8002/ping", host: "healthcheck.com"
192.168.8.25 - - [15/Jul/2025:17:41:27 +0800] healthcheck.com "GET /ping HTTP/1.1" 200 18 0.005 "-" "curl/7.68.0" 192.168.8.25:8002, 192.168.8.25:8001 502, 200 0.001, 0.002 "http://healthcheck.com"
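The access log lines above list every upstream attempt for a request, so the retries can be extracted mechanically. The sketch below assumes each line ends with three comma-separated lists (upstream address, upstream status, upstream response time) followed by one quoted field, which is inferred from the sample lines rather than from the APISIX log format setting. The names `Attempt` and `upstream_attempts` are hypothetical.

```python
import re
from typing import List, NamedTuple

# One or more tokens separated by ", ", as nginx writes per-attempt values.
_LIST = r"[^\s,]+(?:, [^\s,]+)*"
# Tail of the line: two quoted fields, three lists, one final quoted field.
_TAIL = re.compile(
    rf'"[^"]*" "[^"]*" (?P<addrs>{_LIST}) (?P<statuses>{_LIST}) '
    rf'(?P<times>{_LIST}) "[^"]*"$'
)


class Attempt(NamedTuple):
    addr: str
    status: str
    seconds: str


def upstream_attempts(line: str) -> List[Attempt]:
    """Return one Attempt per upstream tried, in order; [] if no match."""
    match = _TAIL.search(line.strip())
    if match is None:
        return []
    columns = [match.group(name).split(", ")
               for name in ("addrs", "statuses", "times")]
    return [Attempt(*values) for values in zip(*columns)]
```

Applied to the 17:41:17 access log line, this yields two attempts: `192.168.8.25:8002` with status `502`, then `192.168.8.25:8001` with status `200`. In other words, the stopped node was still chosen first and the request succeeded only on retry.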
However, the health check result reports all endpoints as healthy:
{"name":"/apisix/upstreams/575513659149648574","nodes":[{"counter":{"success":0,"tcp_failure":0,"timeout_failure":0,"http_failure":0},"port":8001,"status":"healthy","hostname":"192.168.8.25","ip":"192.168.8.25"},{"counter":{"success":0,"tcp_failure":0,"timeout_failure":0,"http_failure":0},"port":8002,"status":"healthy","hostname":"192.168.8.25","ip":"192.168.8.25"}],"type":"http"}

Environment

  • APISIX version (run apisix version): v3.9.1
  • Operating system (run uname -a): centos7.9
  • OpenResty / Nginx version (run openresty -V or nginx -V): 1.25.3.1
  • etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info): 3.4.13
  • APISIX Dashboard version, if relevant: 3.0.1

Labels

  • bug: Something isn't working
  • checking: check first if this issue occurred
