Description
I ran into a case in 5.1.0 where I built a mesh in which some of the far-end hosts had not come up yet. Since latencybg tests are single-participant, the tasks were still created and powstream was started. With nothing on the far end, powstream would never yield any results.
pSConfig would run again, notice these background tasks weren't producing any results (based on the run count in the pScheduler API), send a cancel, and then create a new task. An example of such a task is below:
{
  "test": {
    "spec": {
      "dest": "10.128.15.213",
      "flip": false,
      "schema": 1,
      "source": "10.128.15.209",
      "data-ports": {
        "lower": 8760,
        "upper": 9960
      }
    },
    "type": "latencybg"
  },
  "tool": "powstream",
  "detail": {
    "cli": [
      "--source",
      "10.128.15.209",
      "--dest",
      "10.128.15.213",
      "--data-ports",
      "8760-9960",
      "--flip"
    ],
    "post": "P0D",
    "runs": 1,
    "slip": "P0D",
    "added": "2024-04-04T19:45:39+00:00",
    "diags": "Hints:\n requester: 10.128.15.209\n server: 10.128.15.209\nIdentified as everybody, local-interfaces\nClassified as default, friendlies\nApplication: Hosts we trust to do (almost) everything\n Group 1: Limit 'throughput-sane-parallel' passed\n Group 1: Want all, 1/1 passed, 0/1 failed: PASS\n Application PASSES\nPassed one application. Stopping.\nProposal meets limits\nPriority set to default of 0",
    "hints": {
      "server": "10.128.15.209",
      "requester": "10.128.15.209"
    },
    "start": "2024-04-04T19:45:08+00:00",
    "anytime": true,
    "enabled": false,
    "duration": "P1DT2S",
    "exclusive": false,
    "participant": 0,
    "multi-result": true,
    "participants": [
      "10.128.15.209"
    ],
    "runs-started": 1,
    "href": "https://34.170.22.82/pscheduler/tasks/4c7790bb-3f7c-472e-bb6d-ac9cdd1e22d7",
    "runs-href": "https://34.170.22.82/pscheduler/tasks/4c7790bb-3f7c-472e-bb6d-ac9cdd1e22d7/runs",
    "first-run-href": "https://34.170.22.82/pscheduler/tasks/4c7790bb-3f7c-472e-bb6d-ac9cdd1e22d7/runs/first",
    "next-run-href": "https://34.170.22.82/pscheduler/tasks/4c7790bb-3f7c-472e-bb6d-ac9cdd1e22d7/runs/next"
  },
  "schema": 3,
  "archives": [
    {
      "data": {
        "op": "put",
        "_url": null,
        "schema": 3,
        "_headers": null,
        "verify-ssl": false
      },
      "archiver": "http"
    },
    {
      "data": {
        "op": "put",
        "_url": null,
        "schema": 2,
        "_headers": null
      },
      "archiver": "http"
    }
  ],
  "schedule": {
    "start": "2024-04-04T19:45:08Z",
    "until": "2024-04-05T19:45:08Z"
  },
  "reference": {
    "psconfig": {
      "created-by": {
        "uuid": "F6BEB1D2-BE8B-4709-9CE9-8ED57013C3F1",
        "user-agent": "psconfig-pscheduler-agent",
        "agent-hostname": "ps-dev-staging-el9-tk1.c.esnet-perfsonar.internal"
      }
    },
    "display-task-name": "GCP Packet Loss Tests",
    "display-task-group": [
      "GCP Tests"
    ]
  },
  "href": "https://34.170.22.82/pscheduler/tasks/4c7790bb-3f7c-472e-bb6d-ac9cdd1e22d7"
}
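For illustration, here is a rough sketch of the kind of run-count check being described (not pSConfig's actual code). It assumes a GET on the task's runs-href returns the list of runs recorded so far, reuses the task URL from the example above, and skips certificate verification since this test host uses a self-signed cert:

import requests
import urllib3

urllib3.disable_warnings()  # self-signed cert on the test host

# Task URL taken from the example above
TASK_URL = ("https://34.170.22.82/pscheduler/tasks/"
            "4c7790bb-3f7c-472e-bb6d-ac9cdd1e22d7")

# runs-href lists the runs recorded for the task
runs = requests.get(TASK_URL + "/runs", verify=False).json()

# A healthy latencybg task accumulates runs over time; a powstream with
# nothing on the far end never gets past its single initial run.
if len(runs) <= 1:
    print("No results; cancel and re-create the task")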
After canceling the task, pSConfig would create a new one. The problem is that the "cancel" operation doesn't kill the underlying powstream process. This led to a build-up of powstreams (and quite a bit of memory usage), as new ones got created every hour for tasks in this category. I know we have discussed this before, but I was having trouble finding the issue for it. I think pSConfig is doing the right thing with the information it has, but it would be nice to kill the underlying background process on cancel.
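On an affected host the leak is at least easy to see. A minimal sketch, assuming psutil is installed and that powstream only ever runs under pScheduler on these hosts:

import psutil

# Count lingering powstream processes. On an affected host this grows by
# one per hour as pSConfig cancels and re-creates the dead latencybg task.
orphans = [
    p for p in psutil.process_iter(["pid", "name", "cmdline"])
    if p.info["name"] == "powstream"
]

for p in orphans:
    print(p.info["pid"], " ".join(p.info["cmdline"] or []))

print(f"{len(orphans)} powstream process(es) still running")

Killing them by hand (e.g. pkill powstream) reclaims the memory as a stopgap, but it also takes out healthy streams, which is why having pScheduler reap the background process as part of cancel seems like the right fix.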