Description
One useful feature would be the ability to resubmit a job to the queue with new resources. Sometimes on SLURM clusters, we realize after submitting that some partitions are full while others are available. Currently we can do something like this to update a job's resources:
jf job set resources -did 541 '{"cpus_per_task": 56, "partition": "new_partition"}'
However, this will not go through if the job has already been submitted to SLURM and is just waiting to be scheduled. We also might not be able to edit the SLURM partition directly if the requested resources differ between partitions. Instead, jobflow returns an error similar to:
[11:15:29] ERROR Error while setting for job 541
ValueError: Job in state UPLOADED. The action cannot be performed
or:
[11:18:55] ERROR Error while setting for job 541
ValueError: Job in state CHECKED_OUT. The action cannot be performed
It would be nice to have a feature that resubmits a SLURM job with a new resource allocation if it is already waiting to be scheduled/submitted. However, this might also require cancelling the corresponding SLURM job.
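To make the request concrete, here is a rough sketch of the sequence I have in mind, written as a small Python helper that only builds the steps and does not run anything. The function name `plan_resubmission` and the SLURM job id `123456` are made up for illustration. `scancel` is the standard SLURM cancel command and the `jf job set resources` call is the one shown above. The final "resubmit" step is the missing piece, so it has no command attached. Whether jobflow would accept the resource update right after the SLURM job is cancelled is an assumption, not something I have verified.

```python
import json
import shlex


def plan_resubmission(db_id, resources, slurm_job_id=None):
    """Build the ordered steps for resubmitting a job with new resources.

    Hypothetical helper: it only returns (step_name, command) pairs and
    never executes them.
    """
    steps = []

    # If the job is already sitting in the SLURM queue, it would have to be
    # cancelled first (scancel is the standard SLURM command for this).
    if slurm_job_id is not None:
        steps.append(("cancel", f"scancel {slurm_job_id}"))

    # Same command as in the description above, with the resources dict
    # serialized to JSON and quoted for the shell.
    payload = shlex.quote(json.dumps(resources))
    steps.append(
        ("set_resources", f"jf job set resources -did {db_id} {payload}")
    )

    # The actual resubmission is the requested feature, so no existing
    # command is assumed here.
    steps.append(("resubmit", None))
    return steps


if __name__ == "__main__":
    new_resources = {"cpus_per_task": 56, "partition": "new_partition"}
    for name, command in plan_resubmission(541, new_resources, slurm_job_id=123456):
        print(name, command or "(requested feature)")
```

Ideally all three steps would happen behind a single command, so the user does not have to look up the SLURM job id by hand.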