More robust queue handling #493
We already introduced a service to handle the job queue in #505. I am currently working on removing stalled jobs from the queue. Stalled jobs are identified by a timeout parameter (1 day or so); I can terminate these jobs and set their status to "failed". In addition, one needs to update the status document of the job, which is polled by the client. At this point I'm stuck... the information needed was in the Here is the current work: @jachym @davidcaron, what would you recommend to solve this issue? Update the database and keep the stored request?
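A minimal sketch of the timeout-based cleanup described above, assuming a hypothetical `jobs` table with `uuid`, `status` and `time_start` (Unix seconds) columns; PyWPS's actual database schema differs, and the name `fail_stalled_jobs` is made up for illustration. It only flips the database status and returns the affected UUIDs; terminating the worker and rewriting the status document (the step that needs the stored request) are left to the caller.

```python
import sqlite3
import time

# Hypothetical jobs table for illustration; PyWPS's real schema differs.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    uuid       TEXT PRIMARY KEY,
    status     TEXT NOT NULL,   -- e.g. 'accepted', 'running', 'succeeded', 'failed'
    time_start REAL NOT NULL    -- Unix timestamp (seconds) when the job started
)
"""


def fail_stalled_jobs(conn, timeout_s=24 * 3600, now=None):
    """Mark jobs running longer than `timeout_s` as failed; return their uuids.

    The caller still has to rewrite each job's status document so that
    clients polling it see the failure.
    """
    now = time.time() if now is None else now
    cutoff = now - timeout_s
    with conn:  # commits on success, rolls back on error
        uuids = [
            row[0]
            for row in conn.execute(
                "SELECT uuid FROM jobs WHERE status = 'running' AND time_start < ?",
                (cutoff,),
            )
        ]
        conn.executemany(
            "UPDATE jobs SET status = 'failed' WHERE uuid = ?",
            [(u,) for u in uuids],
        )
    return uuids
```

Returning the UUIDs keeps the database update and the follow-up work (one regenerated status document per failed job) tied together.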
I would also add one of the recent requests:
A process can fail not only for subjective reasons (e.g. a problem in the data), but also for objective ones (e.g. a full disk). At the moment, the process request is stored only in the table. It would be good to change this mechanism so that we keep a record of the full request.
What solution would you propose? Copying the request information to the job table? Or adding a flag/status (new/done) to the request table?
When I implemented the function The problem I fixed was that failed requests were not removed from the I think, as you said @cehbrecht, we should either add a flag in the What information from the request do you need, @cehbrecht, that is not currently stored in the
We need to create a WPS response object to write the status document; see line 75 (commit d05483d) and `pywps/pywps/response/execute.py`, line 43 (commit d05483d).
This is the template used:
OK, I see... maybe, without doing a database migration, we could:
This way the stored request would still be available. We could delete the row in
I suggest we use the flag and eventually remove the request once it has completed successfully (I hope this is in line with @davidcaron).
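The flag approach could look roughly like the following sketch. It assumes a hypothetical `stored_requests` table with a `status` column (`new`, `running`, `failed`); the names `store_request`, `claim_next` and `finish` are illustrative, not PyWPS functions. A request is flagged rather than deleted when its job starts, removed only on success, and kept after a failure so that the status document can still be rebuilt from it.

```python
import sqlite3

# Hypothetical table: the thread only proposes "a flag/status" on the request table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS stored_requests (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,  -- preserves submission order
    uuid    TEXT UNIQUE NOT NULL,
    request TEXT NOT NULL,                     -- serialized Execute request
    status  TEXT NOT NULL DEFAULT 'new'        -- 'new', 'running', 'failed'
)
"""


def store_request(conn, uuid, request):
    with conn:
        conn.execute(
            "INSERT INTO stored_requests (uuid, request) VALUES (?, ?)",
            (uuid, request),
        )


def claim_next(conn):
    """Flag the oldest 'new' request as 'running' instead of deleting it."""
    with conn:
        row = conn.execute(
            "SELECT uuid, request FROM stored_requests "
            "WHERE status = 'new' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        cur = conn.execute(
            "UPDATE stored_requests SET status = 'running' "
            "WHERE uuid = ? AND status = 'new'",
            (row[0],),
        )
    # rowcount 0 means another worker claimed it between SELECT and UPDATE
    return row if cur.rowcount == 1 else None


def finish(conn, uuid, succeeded):
    """Remove the request only on success; keep it, flagged, after a failure."""
    with conn:
        if succeeded:
            conn.execute("DELETE FROM stored_requests WHERE uuid = ?", (uuid,))
        else:
            conn.execute(
                "UPDATE stored_requests SET status = 'failed' WHERE uuid = ?",
                (uuid,),
            )
```

The conditional `UPDATE ... AND status = 'new'` only matters if several workers poll the same table; with a single consumer it is harmless.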
Currently I would also favor setting a flag in the
This issue is, IMHO, related (with a proposed solution) to #491; both could be done together.
I'd like to revive this issue. There is a discussion planned.
Objective:
Agenda:
I've tried to identify the various issues that are tied to this topic.
Relevant issues:
We also have reports that mixing sync and async processes can lead to the server hanging.
Relevant PRs:
Some material:
Meeting Notes
Present: @cehbrecht, @tlvu, @aulemahal, @dbyrns, @cjauvin, @huard
Action items
Hello,
Maybe a naive solution would be a pure Python daemon that serves HTTP and is reachable through a proxy. In that case the daemon can keep good control of its subprocesses. This daemon can run as any regular user. It also makes PyWPS easy to test as a standalone HTTP server.
Best regards.
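As a rough illustration of the standalone-server idea, the standard library's `wsgiref` can host any WSGI application. The `placeholder_app` below is a stand-in (an assumption) for the PyWPS WSGI application, and the host/port values are arbitrary. `wsgiref` is intended for development, so a production daemon would more likely use a full WSGI server behind the proxy.

```python
from wsgiref.simple_server import make_server


def placeholder_app(environ, start_response):
    """Stand-in for the PyWPS WSGI application (assumption for this sketch)."""
    body = b"WPS daemon is running\n"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]


def build_daemon(app, host="127.0.0.1", port=5000):
    """Create a standalone HTTP server; a reverse proxy would forward WPS
    requests to it and serve the wpsoutput files itself."""
    return make_server(host, port, app)


if __name__ == "__main__":
    httpd = build_daemon(placeholder_app)
    # One long-lived process under a regular user account: every job it
    # spawns is its own child, so it can track and reap them directly.
    httpd.serve_forever()
```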
Hello,
I did some successful tests of running a daemon accessible through a proxy. I did not change the code extensively; I just checked whether the current code could be turned into an HTTP server. The proxy configuration must serve wpsoutput and redirect WPS requests to the server. The code can be found in my repository: As you can see, the modifications are minimal; the code does not solve any issue yet, but it makes it possible to do so later.
Best regards
Hi @cehbrecht, @tlvu, @aulemahal, @dbyrns, @cjauvin, @huard,
I think I have a general understanding problem regarding async processes and PyWPS. I mainly use async processes with an NGINX/Gunicorn setup, where I configure the maximum number of parallel processes based on the available CPUs. In production I also experience issues with jobs that get submitted and accepted, but are then never processed. I assume this is a queue problem: jobs get added to the queue when I exceed my maximum number of parallel processes, but then remain in the (default in-memory SQLite) database. It seems that finished jobs are not removed from the queue, so pending jobs are never picked up and stay at 0% progress forever while still reported as accepted?
My question is:
Thanks
Hello,
I triggered the case where my WPS no longer accepts requests because earlier requests were killed before they could terminate cleanly. In my case this happens in an out-of-memory situation, but it can easily be triggered by a reboot or `kill -9`. The issue is that the request status is updated by the process itself, and if the process is killed before it writes that it has finished, the request stays in that state forever and keeps counting as a running process. I don't think adding a flag to the database can solve this issue. Note that my previous suggestion to use a daemon does not solve the entire issue either; for example, a reboot may leave unfinished requests.

I found two heuristics to fix the issue in a more or less safe way.

In the daemon case, we can generate a daemon session UUID; when the daemon restarts, all unfinished requests that do not belong to the current session ID get marked as finished.

The second heuristic covers the case where there is no daemon. When PyWPS runs out of slots for a new request, it can check whether the PID of each running request is still alive. If not, it can mark that request as failed. This solution is safe but not fully accurate, because Linux can reuse PIDs; but at least if the PID is not present, we are sure the process has finished. To make it more accurate, I thought of tagging processes via /proc/pid/environ, but this requires execve, which would change the way subprocesses are spawned. I could also read /proc/pid/cmdline and check whether it differs from our /proc/self/cmdline to confirm that the PID no longer belongs to our process, because fork preserves cmdline. I guess.

Note that this solution may leave a failed process in the running state for a long time, because it only triggers when a new request arrives and no slot is available. Ideally, this heuristic should run each time the user requests the status of the process, which is not possible with the current implementation of the status.

I will try to implement the last method without tagging, which should solve 99% of the issue, I guess.
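Both heuristics can be sketched as below. Assumptions: POSIX (on Windows, `os.kill` with signal 0 does not perform an existence check), Linux `/proc` for the cmdline comparison, workers started with `fork` so that they inherit the server's cmdline, and a hypothetical per-job `session_id` recorded by the daemon. None of these helper names exist in PyWPS.

```python
import os
import uuid

# Heuristic 1 (daemon case): each daemon start gets a fresh session id,
# stored with every job it launches (hypothetical column).
SESSION_ID = uuid.uuid4().hex


def orphaned_by_restart(job_session_id, current_session_id=SESSION_ID):
    """A job recorded under another daemon session cannot still be running."""
    return job_session_id != current_session_id


def pid_alive(pid):
    """Heuristic 2 (POSIX only): signal 0 checks existence, sends nothing."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process: the job has certainly ended
    except PermissionError:
        return True  # the pid exists but belongs to another user
    return True


def read_cmdline(pid):
    """Return the raw /proc/<pid>/cmdline (Linux only), or None if unreadable."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            return f.read()
    except OSError:
        return None


def job_still_running(pid):
    """True only if the pid exists and looks like a fork of this server.

    Forked workers inherit the parent's cmdline, so a live pid whose
    cmdline differs from ours was most likely reused by another program.
    """
    if not pid_alive(pid):
        return False
    theirs, ours = read_cmdline(pid), read_cmdline("self")
    if theirs is None or ours is None:
        return True  # cannot tell (e.g. no /proc): assume still running
    return theirs == ours
```

`pid_alive` alone treats an exited but unreaped child (a zombie) as alive; on Linux a zombie's `/proc/<pid>/cmdline` reads as empty, so the cmdline comparison reports it as not running.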
Following up on my previous comment, I opened the related pull request #659.
Best regards
As requested in issue #455, we need to improve the queue handling.
I'm thinking about creating a PyWPS daemon, which would be responsible for starting (spawning?) processes in the background, taking requests from the database table.
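A daemon along these lines could be sketched as a single dispatcher that polls a queue table, starts jobs while slots are free, and records each child's exit status itself. Everything here is hypothetical: the `queue` table, the `Daemon` class, and the worker, which is just a Python one-liner exiting with the code stored in `payload` rather than a real WPS process. Because the daemon holds the `Popen` handles, a child killed with `kill -9` is still noticed (negative return code) and marked as failed, which addresses the stale "running" rows discussed above.

```python
import sqlite3
import subprocess
import sys
import time

# Hypothetical queue table; payload stands in for a stored WPS request.
SCHEMA = """
CREATE TABLE IF NOT EXISTS queue (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    payload TEXT NOT NULL,
    status  TEXT NOT NULL DEFAULT 'accepted'
)
"""

# Hypothetical worker: exits with the integer given as its first argument.
WORKER = "import sys; sys.exit(int(sys.argv[1]))"


class Daemon:
    """Single long-lived dispatcher that owns every worker subprocess."""

    def __init__(self, conn, max_parallel=2):
        self.conn = conn
        self.max_parallel = max_parallel
        self.running = {}  # queue row id -> subprocess.Popen

    def reap(self):
        """Record the outcome of children that have exited (including kill -9)."""
        for job_id, proc in list(self.running.items()):
            code = proc.poll()
            if code is None:
                continue  # still running
            status = "succeeded" if code == 0 else "failed"
            with self.conn:
                self.conn.execute(
                    "UPDATE queue SET status = ? WHERE id = ?", (status, job_id)
                )
            del self.running[job_id]

    def spawn(self):
        """Start the oldest accepted jobs while free slots remain."""
        free = self.max_parallel - len(self.running)
        if free <= 0:
            return
        rows = self.conn.execute(
            "SELECT id, payload FROM queue WHERE status = 'accepted' "
            "ORDER BY id LIMIT ?",
            (free,),
        ).fetchall()
        for job_id, payload in rows:
            self.running[job_id] = subprocess.Popen(
                [sys.executable, "-c", WORKER, payload]
            )
            with self.conn:
                self.conn.execute(
                    "UPDATE queue SET status = 'running' WHERE id = ?", (job_id,)
                )

    def step(self):
        self.reap()
        self.spawn()

    def run_until_idle(self, poll_interval=0.05):
        """Loop until no job is running or waiting (convenient for testing)."""
        while True:
            self.step()
            if not self.running:
                return
            time.sleep(poll_interval)
```

A real deployment would call `step()` in an endless loop (or on a timer) instead of `run_until_idle()`, and would still need something like the session-UUID check after a restart, since a reboot also loses the `Popen` handles.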