Investigate whether this move-contact job ever executed #220

Open
kennsippell opened this issue Nov 7, 2024 · 9 comments · May be fixed by #223
@kennsippell
Member

kennsippell commented Nov 7, 2024

https://users-chis-ke.app.medicmobile.org/board/queue/MOVE_CONTACT_QUEUE/8414b9ba-e3b4-4d00-ad74-822d57452365?status=completed

Did this execute? Logs seem to just indicate that it was delayed because of sentinel backlog for 2 days and maybe nothing ever happened?

Nairobi backlog has been kinda wild for many days, so we probably did well not to execute the job
Image

Why is it in a completed state? Is 2 days the right max limit (seems like no)?

@mrjones-plip
Collaborator

@kennsippell - are all three moves on the Nairobi instance (nairobi-echis.health.go.ke)?

@kennsippell
Member Author

kennsippell commented Nov 7, 2024

Yes. But there are 32 jobs which are delayed right now due to sentinel backlog and those span many instances (Busia, Turkana, etc)

@paulpascal
Contributor

paulpascal commented Nov 8, 2024

Hi @kennsippell, after a quick investigation into the different jobs referenced here, here’s what I found by reviewing each job’s execution and inspecting its logs directly on the jobs board:

1. Job Timing and Delays:

  • The first job was initiated on November 1, 2024, and the other two on November 2, 2024.
  • All three jobs encountered repeated delays due to a sentinel backlog, preventing immediate execution.
  • Logs indicate retry attempts every four hours; the first job, for instance, was delayed for up to two days.

2. Logs and Delay Details:

Below are excerpts from the first job's logs, showing repeated postponements due to sentinel backlog through November 3:

[2024-11-01T13:13:21.440Z]: Job ### postponed until 5:13 PM.  Reason was sentinel backlog.
[2024-11-01T17:13:22.265Z]: Job ### postponed until 9:13 PM.  Reason was sentinel backlog.
...
[2024-11-03T05:13:28.461Z]: Job ### postponed until 9:13 AM.  Reason was sentinel backlog.

3. Job Completion Status:

  • After the delay period, the jobs were marked as completed, hence their status showing as “completed” in the system.
  • I went ahead to double-check the contacts (of the 3 jobs) intended to be moved, confirming they were successfully moved to the new areas.

It seems the partner may have been unaware of when these contact moves finally happened, since the delays held up execution, and so assumed the contacts were not moved. However, once the backlog cleared, the system processed the moves, albeit later than expected.
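
To sanity-check the retry cadence from a job's log, here is a minimal Python sketch that parses postponement lines in the format of the excerpt above and reports how long the job kept being postponed. The function names (`parse_postponements`, `delay_span`) are made up for illustration, and the regex assumes every postponement line looks exactly like the three lines quoted here; anything else (including the elided `...`) is simply skipped.

```python
import re
from datetime import datetime

# Assumed line format, taken from the excerpt in this thread:
# [<ISO timestamp>Z]: Job <id> postponed until <time>.  Reason was <reason>.
POSTPONED = re.compile(
    r"^\[(?P<ts>[^\]]+)\]: Job (?P<job>\S+) postponed until (?P<until>.+?)\.\s+"
    r"Reason was (?P<reason>.+?)\.$"
)


def parse_postponements(lines):
    """Return (timestamp, job, until, reason) for each postponement line."""
    events = []
    for line in lines:
        match = POSTPONED.match(line.strip())
        if match is None:
            continue  # not a postponement line (e.g. the elided "...")
        ts = datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%S.%fZ")
        events.append((ts, match["job"], match["until"], match["reason"]))
    return events


def delay_span(events):
    """Time between the first and last logged postponement, or None if empty."""
    if not events:
        return None
    timestamps = [event[0] for event in events]
    return max(timestamps) - min(timestamps)


# Only the lines quoted above; the real log has more entries in between.
sample = [
    "[2024-11-01T13:13:21.440Z]: Job ### postponed until 5:13 PM.  Reason was sentinel backlog.",
    "[2024-11-01T17:13:22.265Z]: Job ### postponed until 9:13 PM.  Reason was sentinel backlog.",
    "...",
    "[2024-11-03T05:13:28.461Z]: Job ### postponed until 9:13 AM.  Reason was sentinel backlog.",
]

events = parse_postponements(sample)
print(len(events), "postponements over", delay_span(events))
```

This only tells you how long the job was being postponed. It does not show whether the move itself ran, which still has to be confirmed by checking the contacts or other logs.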

cc: @mrjones-plip

@kennsippell
Member Author

ic. Seems the logs are truncated. Do you know a way to see the full log for a job?

@paulpascal
Contributor

Yes, I truncated it myself, but on the board you can see each job's full logs in the Logs tab.

Image

@paulpascal
Contributor

paulpascal commented Nov 8, 2024

Another way I grab the logs:

kubectl --context arn:aws:eks:eu-west-2:720541322708:cluster/prod-cht-eks  \
    --namespace users-chis-prod logs deploy/users-chis-ke-cht-user-management-worker \
    --since 8h

@kennsippell
Member Author

Oh. Is the output from cht-conf not included in the job's logs? Should it be? Perhaps that was my source of confusion.

@paulpascal
Contributor

Oh no, it's not included in the job's own logs.

> Should it be?

I think so.

> Perhaps that was my source of confusion.

That output is currently available only from the worker logs.
