Sometimes an application becomes unavailable because of a problem with a job that uses persistent data, for example a database.
In the Nomad UI, the database job appears to be running and healthy, but its logs show:
PANIC: could not open file "/var/lib/postgresql/data/global/pg_control": Transport endpoint is not connected.
The Kadalu jobs and the Nomad volume both report an OK state.
Restarting only the database job is not always enough; it seems the Kadalu jobs (and all jobs using persistent data) also need to be restarted.
When restarting a job that uses persistence, the following error can occur again:
"failed to setup alloc: pre-run hook "csi_hook" failed: rpc error: code = Unknown desc = Exception calling application: [Errno 107] Transport endpoint is not connected: '/mnt/PROD/subvol'
When the Kadalu and application jobs are restarted successfully, no data is lost, but depending on the node the Kadalu logs show:
DEBUG [nodeserver - 150:NodeUnpublishVolume] - Received the unmount request volume=keycloak-db
even though the database has been restarted and volume=keycloak-db is mounted.
The characteristics of our environment:
AlmaLinux version=8.7
Glusterfs 8.6
Nomad v1.4.3
Kadalu: v 1.0.0
Please help me find a solution to this problem.
I can see only one edge case where this might affect you, i.e. Nomad assigning the same alloc ID after a restart. However, I don't think that will ever be the case, so you can disregard these log lines.
Transport endpoint is not connected (ENOTCONN):
This usually happens when the nodeplugin jobs are restarted: they lose Gluster connectivity, and when that happens the app jobs need to be restarted.
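To tell whether a given mount is in this state before deciding which jobs to restart, something like the following could be run on the affected node. It is only a rough sketch, not part of Kadalu or Nomad: the helper name `mount_is_stale` is made up for illustration, and it assumes that a disconnected mount fails a plain `stat` with errno 107, as in the errors quoted above.

```python
import errno
import os


def mount_is_stale(path: str) -> bool:
    """Return True if accessing `path` fails with ENOTCONN.

    ENOTCONN is errno 107 on Linux, reported as
    "Transport endpoint is not connected".
    Any other outcome (success, or a different error such as
    ENOENT) is reported as False.
    """
    try:
        os.stat(path)
    except OSError as exc:
        return exc.errno == errno.ENOTCONN
    return False


if __name__ == "__main__":
    # '/mnt/PROD/subvol' is the path taken from the csi_hook error above;
    # replace it with the mount you want to check.
    path = "/mnt/PROD/subvol"
    state = "stale (ENOTCONN)" if mount_is_stale(path) else "not reporting ENOTCONN"
    print(f"{path}: {state}")
```

A `True` result only says that the mount is disconnected; it does not say why, and the app jobs using that volume would still need to be restarted as described above.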
On a lighter note, I've been unwell for a couple of days and couldn't think deeply about anything. Once I'm healthy again, I'll take another look.