
Cancel does not work #5

Open
picobyte opened this issue Oct 4, 2023 · 3 comments

Comments

@picobyte

picobyte commented Oct 4, 2023

I have a Tesla M10 (4 GPUs, passively cooled) that overheats easily. At 95 °C a GPU becomes unusable until reboot. Controlling which GPU is active and which one cools down via your extension has several problems:

  • Cancel queue does not work. If you cancel a txt2img job, the process keeps running.
  • URL from string does not work if you link it in via a node. I was using an increment node and the GWAS extension to activate either the GPUs on ports 29170 and 29172 or those on 29171 and 29173, but feeding the URL in as a string input gives a NoneType error at ComfyUI/custom_nodes/ComfyUI_NetDist/nodes/remote_control.py:165 (see the sketch after this list).
    Sorry, I lost the exact error message, but it was about new_prompt[i] being None.
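Purely for illustration, this is the kind of None guard that would avoid that traceback; the helper name and the shape of new_prompt are guesses on my part, not the actual remote_control.py code:

```python
# Hypothetical guard, not the real ComfyUI_NetDist code. Assumes new_prompt is a
# list of per-remote prompt dicts that can end up holding None when the URL
# arrives through a node link instead of the widget.
def safe_entries(new_prompt, fallback_prompt):
    safe = []
    for i, entry in enumerate(new_prompt):
        if entry is None:
            print(f"[NetDist] new_prompt[{i}] is None, falling back to the local prompt")
            safe.append(fallback_prompt)
        else:
            safe.append(entry)
    return safe
```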
@city96
Owner

city96 commented Oct 5, 2023

Huh, interesting idea to round-robin the GPUs. Might even work on my cards. They don't reach shutdown temp but they do thermal throttle (P40s).

Anyway, I don't know if custom nodes get notified when a workflow gets cancelled, but I'll try to figure something out. I can only realistically mess with my multi-GPU setup on the weekend so I'll try to get back to you on this.

(There's a "rewrite" branch but I'm not sure that fixes either of your issues.)

@picobyte
Author

picobyte commented Oct 6, 2023

For the temperature issue I'll try a workaround using temperature protection. If you're interested, this is the attempted workflow to switch GPUs: multi_gpu_test.json (currently not working). Possibly this can be done better; I am just starting out with ComfyUI.
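A minimal sketch of what that temperature check could look like outside of ComfyUI, using pynvml (nvidia-ml-py); the 80 °C cut-off is an arbitrary value I picked, nothing here comes from the extension:

```python
# Sketch: pick the coolest GPU of the M10 before queuing work to it.
# Assumes pynvml (pip install nvidia-ml-py); the threshold is made up.
import pynvml

def coolest_gpu(max_temp_c=80):
    pynvml.nvmlInit()
    try:
        temps = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            temps.append((pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU), i))
        temp, index = min(temps)
        # Return None if even the coolest card is too hot, i.e. let everything cool down.
        return index if temp < max_temp_c else None
    finally:
        pynvml.nvmlShutdown()

if __name__ == "__main__":
    print("coolest usable GPU:", coolest_gpu())
```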

However, I also wonder what the benefit is of one workflow controlling the GPUs versus running ComfyUI multiple times (see the launcher sketch after the footnote). I think it would be better if dedicated tasks were dispatched to distinct GPUs, e.g. one GPU for adding noise, another for the UNET, one for reconstruction, and maybe one for preview image generation[1], or something like that. Alternatively, subsequent cycles could run on distinct GPUs. I mean this only as my (naive) concept of the ideal distribution of work. Or maybe averaging(?) of parallel-run cycles or so.
[1] https://huggingface.co/blog/stable_diffusion
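For the "running ComfyUI multiple times" option, this is roughly how I would launch one instance per GPU, assuming ComfyUI's --port and --cuda-device arguments; the path and ports are placeholders:

```python
# Rough sketch: launch one ComfyUI instance per GPU of the M10, each on its own
# port. Assumes ComfyUI accepts --port and --cuda-device; adjust the path.
import subprocess
import sys

COMFY_DIR = "/path/to/ComfyUI"  # placeholder
BASE_PORT = 29170               # matches the ports mentioned above

procs = []
for gpu in range(4):
    procs.append(subprocess.Popen(
        [sys.executable, "main.py",
         "--port", str(BASE_PORT + gpu),
         "--cuda-device", str(gpu)],
        cwd=COMFY_DIR,
    ))

for p in procs:
    p.wait()
```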

@city96
Owner

city96 commented Oct 7, 2023

Okay, so I tried making a round-robin node to switch the URLs, but // is interpreted as a comment... I'll get back to this once I find out where that logic lives in ComfyUI.

(screenshot attached)
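For reference, one possible shape of such a node against the standard custom-node interface; treating the URL as a plain STRING output is an assumption (the NetDist remote nodes may expect something else), and it doesn't solve the // quirk in the widget:

```python
# Sketch of a round-robin URL picker as a ComfyUI custom node.
# Assumes the downstream remote node accepts the URL as a STRING input.
class RoundRobinURL:
    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {
            "urls": ("STRING", {"default": "http://127.0.0.1:29170,http://127.0.0.1:29171"}),
            "index": ("INT", {"default": 0, "min": 0, "max": 0xFFFFFFFF}),
        }}

    RETURN_TYPES = ("STRING",)
    FUNCTION = "pick"
    CATEGORY = "remote"

    def pick(self, urls, index):
        # Split a comma-separated list and cycle through it by index.
        candidates = [u.strip() for u in urls.split(",") if u.strip()]
        return (candidates[index % len(candidates)],)

NODE_CLASS_MAPPINGS = {"RoundRobinURL": RoundRobinURL}
```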

As for the cancel, I added some simple logic to clear the queue before starting a new job. This isn't optimal, since the running job keeps going even after you cancel it. I guess I could break it out into a separate "cancel all jobs" node, but it would be much cleaner if there were a way for custom nodes to be notified when a workflow is canceled/interrupted. I already asked comfy, so I guess we'll just have to wait for now.
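In the meantime, a stop-gap sketch of what a "cancel all jobs" node could boil down to over plain HTTP, assuming the remotes expose the stock /queue and /interrupt endpoints (this is not the exact code on the branch):

```python
# Sketch: clear the pending queue and interrupt the running job on each remote
# ComfyUI instance, using the stock /queue and /interrupt HTTP routes.
import requests

def cancel_remote_jobs(remote_urls, timeout=5):
    for url in remote_urls:
        base = url.rstrip("/")
        requests.post(f"{base}/queue", json={"clear": True}, timeout=timeout)  # drop pending jobs
        requests.post(f"{base}/interrupt", timeout=timeout)                    # stop the running one

cancel_remote_jobs(["http://127.0.0.1:29170", "http://127.0.0.1:29171"])
```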

(Sorry, the readme is still a mess, I'll try to clean it up and then I'll merge the rewrite branch into the main one if everything works.)
