Description
In a 3-unit postgresql cluster integrated with self-signed-certificates for TLS, two units had tls: enabled in the database-peers relation data while the third unit did not. This manifested as "member awaiting to start" on the problematic unit and as the following errors in the Patroni logs of each unit, which showed that something was sending plain HTTP to the Patroni API:
...
SSL: TLSV1_ALERT_UNKNOWN_CA] tlsv1 alert unknown ca
...
SSL: HTTP_REQUEST] http request
On the affected unit, the Patroni API had TLS configured and was serving the expected certificate (tested manually), but the charm was unaware of the TLS configuration and was attempting Patroni health checks over plain HTTP on port 8008, leading to the repeated errors mentioned above.
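The mismatch can be illustrated with a minimal sketch, assuming the health-check client picks the URL scheme only from the tls key in the unit's database-peers relation data. The helper name `patroni_url` and the dict shape are hypothetical and are not taken from the charm's source.

```python
PATRONI_PORT = 8008  # port seen in the tcpdump and ss output of this report


def patroni_url(unit_ip: str, peer_data: dict) -> str:
    """Build the Patroni REST API base URL for a unit (hypothetical helper).

    Assumption: the scheme depends only on the `tls` key of the unit's
    database-peers relation data, not on Patroni's actual configuration.
    """
    scheme = "https" if peer_data.get("tls") == "enabled" else "http"
    return f"{scheme}://{unit_ip}:{PATRONI_PORT}"


# Flag present: the client speaks HTTPS, matching a TLS-enabled Patroni.
healthy = patroni_url("10.146.65.69", {"tls": "enabled"})

# Flag missing: the client falls back to plain HTTP even though Patroni
# serves TLS, consistent with the HTTP_REQUEST error in the Patroni logs.
affected = patroni_url("10.146.65.77", {})

print(healthy)
print(affected)
```

Under this assumption, a unit whose relation data lost the flag keeps sending plain HTTP to peers that only accept TLS, regardless of how Patroni itself is configured.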
Following the recommendations of @marceloneppel and @dragomirp, the following workaround solved the issue:
# check for units missing "tls: enabled"
juju show-unit postgresql/X --endpoint database-peers
# To grep the relation id for the next command.
juju show-unit postgresql/X --endpoint=database-peers | grep relation-id
juju exec --unit postgresql/X 'relation-set -r RELATION-ID-FROM-ABOVE tls="enabled"'
where postgresql/X was the faulty unit. After this, the charm started using HTTPS for Patroni, Patroni stopped logging errors, health checks began working correctly, and the unit appeared healthy in juju status.
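The detection step of the workaround could be scripted roughly as below. This is a sketch, not a tested tool: it assumes that `juju show-unit ... --format json` returns the same layout as the YAML output (a top-level key per unit, a `relation-info` list, and the unit's own data under `local-unit` / `data`). The function names and the sample data are illustrative only. Reading `local-unit` rather than grepping the whole output avoids matching the flag from the `related-units` section.

```python
import json
import subprocess


def local_tls_flag(show_unit_output: dict, unit: str, endpoint: str = "database-peers"):
    """Return the unit's own `tls` value on the endpoint, or None if absent."""
    for relation in show_unit_output.get(unit, {}).get("relation-info", []):
        if relation.get("endpoint") == endpoint:
            return relation.get("local-unit", {}).get("data", {}).get("tls")
    return None


def units_missing_tls(outputs: dict) -> list:
    """List units whose own peer data lacks tls == "enabled".

    `outputs` maps a unit name to that unit's parsed show-unit output.
    """
    return sorted(u for u, out in outputs.items() if local_tls_flag(out, u) != "enabled")


def fetch_show_unit(unit: str) -> dict:
    """Run `juju show-unit` for one unit and parse the JSON (needs a live model)."""
    result = subprocess.run(
        ["juju", "show-unit", unit, "--endpoint", "database-peers", "--format", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def _sample(unit: str, data: dict) -> dict:
    """Build made-up show-unit output in the assumed layout (relation id is invented)."""
    return {unit: {"relation-info": [
        {"relation-id": 1, "endpoint": "database-peers", "local-unit": {"data": data}}
    ]}}


# Illustrative data only: two units carry the flag, one does not.
sample_outputs = {
    "postgresql/0": _sample("postgresql/0", {"tls": "enabled"}),
    "postgresql/1": _sample("postgresql/1", {"tls": "enabled"}),
    "postgresql/2": _sample("postgresql/2", {}),
}
missing = units_missing_tls(sample_outputs)
print(missing)
```

Against a live model, the same check would be `units_missing_tls({u: fetch_show_unit(u) for u in units})` with `units` set to the real unit names. Any unit it reports is a candidate for the `relation-set` command above.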
Expected behavior
When TLS is enabled for the application (via the certificates relation/self-signed-certificates) and Patroni is configured with TLS, all units should have consistent tls: enabled state in the database-peers relation.
Actual behavior
Occasionally, units end up with Patroni configured with TLS but no tls: enabled flag on the database-peers relation.
Versions
Juju CLI: 3.6.11
Juju agent: 3.6.11
Charm revision: 14/stable rev 553, base 22.04
self-signed-certificates: latest/edge, rev 419
Log output
Juju debug log:
tcpdump on the affected unit
GET /cluster HTTP/1.1
Host: <local_IP>:8008
User-Agent: python-requests/2.32.3
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
Authorization: Basic
GET /health HTTP/1.1
Host: 10.146.65.77:8008
User-Agent: python-requests/2.32.3
...
Authorization: Basic
ss -tnp "dport = :8008" while seeing the error
SYN-SENT 0 1 10.146.65.77:46374 10.146.65.69:8008 users:(("python",pid=1058948,fd=4))
SYN-SENT 0 1 10.146.65.77:46080 10.146.65.69:8008 users:(("python",pid=1058948,fd=4))
CLOSE-WAIT 361 0 10.146.65.77:49812 10.146.65.72:8008 users:(("python",pid=1058948,fd=4))
$ ps aux | grep 1058948
root 1058948 0.5 0.9 167436 74708 ? Sl 14:52 0:00 /var/lib/juju/agents/unit-postgresql-2/charm/venv/bin/python /var/lib/juju/agents/unit-postgresql-2/charm/src/charm.py