DB migrations get stuck for a while with no logs until they fail. What could be wrong? #45446
-
My setup: ArgoCD for deployment; Terraform for configuring secrets, the database, permissions, etc.; AWS for hosting EKS and an Aurora Postgres database; and an Airflow image based on version 2.10.3 with Python 3.10. I'm using PgBouncer, which I've configured to sync before the DB migrations job. I see some PgBouncer logs showing what seem to be successful connections, probably from the metrics exporter:
My metadata connection string is configured as follows (some variables replaced with values):
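The actual string isn't reproduced here, so one thing worth ruling out is special characters (`@`, `:`, `/`) in the password that were not percent-encoded. A minimal sketch, assuming a plain `postgresql://` URL; the function name, host, port, and credentials are hypothetical:

```python
from urllib.parse import quote


def metadata_url(user: str, password: str, host: str, port: int, db: str) -> str:
    """Build a postgresql:// URL with the user and password percent-encoded."""
    # safe="" also encodes "/", so it can't be mistaken for the path separator
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{db}"
    )
```

For example, `metadata_url("airflow", "p@ss:w/rd", "pgbouncer.example", 6543, "airflow")` (all hypothetical values) encodes the password as `p%40ss%3Aw%2Frd`.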
For the metadata connection, there are what appear to be crashes in the PgBouncer logs:
Here's the pgbouncer.ini configuration (some variables replaced with values):
Here's the users.txt setup (I'm not sure the duplication is needed, but it was in the Helm templates):
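For reference, PgBouncer's auth file holds one `"username" "password"` pair per line, with any double quote inside a field doubled. The password field can be plain text or an `md5` entry computed as `"md5" + md5(password + username)`. An `md5` entry fits `auth_type = md5`; with `auth_type = scram-sha-256` the file needs a plain-text password or a SCRAM secret instead, so it is worth checking that the two agree. A minimal sketch of generating such a line (the function name is hypothetical):

```python
import hashlib


def userlist_line(user: str, password: str, use_md5: bool = True) -> str:
    """Render one PgBouncer auth_file line for the given user."""
    if use_md5:
        # md5 secret format: "md5" followed by the hex md5 of password + username
        secret = "md5" + hashlib.md5((password + user).encode("utf-8")).hexdigest()
    else:
        secret = password

    def quoted(value: str) -> str:
        # Fields are double-quoted; embedded double quotes are doubled
        return '"' + value.replace('"', '""') + '"'

    return f"{quoted(user)} {quoted(secret)}"
```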
I've also found these logs in PgBouncer:
A similar client login timeout eventually appears when the DB migrations job pod fails and is recreated/restarted:
On the DB side, metrics show some activity but 0 DB connections.
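Zero connections on the DB side plus a silent hang suggests checking raw network reachability from inside a pod, for example via `kubectl exec` into the migrations or PgBouncer pod. AWS security groups drop disallowed traffic without replying, so a blocked TCP connect typically hangs until a timeout instead of failing fast with "connection refused". A minimal sketch, with a hypothetical function name:

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify TCP reachability of host:port as open, refused, timeout, or error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Something answered with a reset: host reachable, nothing listening
        return "refused"
    except socket.timeout:
        # No reply at all: consistent with packets being silently dropped
        return "timeout"
    except OSError as exc:
        # DNS failures and other socket errors
        return f"error: {exc}"
```

Running `probe("aurora-endpoint.example", 5432)` against the Aurora endpoint and against the PgBouncer service (both hostnames hypothetical) separates "PgBouncer can't reach Aurora" from "the job can't reach PgBouncer".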
-
Have you followed https://airflow.apache.org/docs/helm-chart/stable/index.html#installing-the-chart-with-argo-cd-flux-rancher-or-terraform?
I've made some progress. Apparently I was missing an ingress security group rule for the DB (it's unclear to me why it's needed, since the pods and the DB share the same security group and use the same port).
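On the "why": in AWS, membership in the same security group does not by itself allow traffic between members. A custom security group has no inbound rules by default, so it needs an inbound rule whose source is the group itself (a "self-referencing" rule). In Terraform that is typically `self = true` on the ingress rule. A minimal sketch of the equivalent rule payload in the shape boto3's `authorize_security_group_ingress` expects; the function name and group ID are hypothetical, and 5432 assumes the default Postgres port:

```python
def self_referencing_ingress(group_id: str, port: int = 5432) -> list[dict]:
    """IpPermissions allowing TCP on `port` from members of the same security group."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            # Source is the group itself, so any member may connect to any other member
            "UserIdGroupPairs": [
                {"GroupId": group_id, "Description": "Postgres from same SG"}
            ],
        }
    ]
```

Applying it would look like `boto3.client("ec2").authorize_security_group_ingress(GroupId=sg_id, IpPermissions=self_referencing_ingress(sg_id))`. In this setup, though, the rule presumably belongs in Terraform alongside the rest of the configuration.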
Now there's another issue, according to the PgBouncer logs: