
[Feature Request]: (BIG feature) Realtime interactive viewport (image view window)? #4342

@Lex-DRL

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

TL;DR

If Comfy ever wants to get convenient, truly interactive image editing nodes - whether built-in or third-party - it needs a standard window for drawing image output, one that also makes it easy for node developers to add custom controls.

What would your feature do ?

Currently, there's no way to view a node's output other than using special preview nodes.
Thus, there's no out-of-the-box way for a node to provide any immediate visual feedback right in the browser - each node has to implement it on its own, as its own separate window. You want to paint something? Open a special window, custom-implemented by the node. You want to build a ControlNet skeleton? Open another special window, also implemented by the node. You want to draw an arbitrary vector shape?.. Well, that's a hard one. You have to tweak numeric parameters, manually re-running the graph each time you make a tiny change. At least, I don't know of ANY custom node that lets you draw a box by just dragging it ON the input image.

The issue boils down to a core design decision. Since there's no standard window showing the selected node's output, each node has to build one on its own, from scratch, if it needs to show ANYTHING interactively.

And here we end up with a chicken-and-egg problem: by itself, such a preview window, aka viewport (as it's called in other software), has next to no benefit for a user compared to the current approach, so it seems not worth implementing. But without it, there's a plethora of nodes that will never be developed because it's too much work. Basically anything that needs to provide interactive visual feedback becomes a tremendous task - it requires the skillset for coding the actual node in Python PLUS the skillset to develop JS interfaces in the browser. A simple example of such a node is one for drawing vector shapes.

Sooo... maybe ComfyUI has matured enough to consider a more standard approach to displaying image stuff? Not a special preview node, but a special window, which always stays in the same place and changes its contents depending on which node is active?

Proposed workflow

  1. Implement a window whose primary purpose is simply to show the main image output of the selected node.
  2. When a node has no image output, just keep referencing the last node which had one.
  3. Maybe you could even lock the viewport to a node - to be able to tweak parameters anywhere in the graph while viewing the single output you want?
  4. The viewport needs to support panning and zooming the same way the node graph does.
  5. Even though showing a node's output would probably be the only feature in the initial MVP release, this viewport needs to be designed from its inception to let custom nodes implement their own layers of interactive controls on top of the image.
    • In the future, I guess there would be some node API allowing nodes to add custom controls and link them to their properties. Like, "hey viewport window, when you show my output, draw N squares, 5 pixels in size each, at these positions on the image and let the user drag them around. As they do, give me back the new positions - and I'll tell you how the property values need to change."
  6. Ideally, for displaying and applying really complex stuff, there should be a way to let nodes draw their entire overlay with WebGL shaders, and those shaders would need to follow some conventions on how they're written in order to interactively receive values from user input. Yes, it would be out-of-reach rocket science for most node developers, but SOME would be able to build extremely user-friendly and fully interactive interfaces capable of showing ANYTHING, completely on the client side.
    • In the simplest case, such interactive feedback needs to be implemented separately from the actual node's behavior. That's somewhat like doing the same job twice, but better than nothing.
    • In the best case, there needs to be a standard way of letting nodes process an image using shaders on the backend. Then the same shader could be used both for the browser preview and for the actual node execution.
    • An obvious benefit of the second approach is that, if done right, using shaders to draw stuff can be even faster than processing an image with things like NumPy.
  7. When a node's output and preview are the same, it would be entirely possible for some nodes to get their output fully interactively, just on the client side. Even entire chains of nodes could be processed immediately in the browser - basically, all the simple image operations like cropping, padding, drawing vector primitives, color correction, etc.
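
To make steps 1-5 a bit more concrete, here's a rough Python sketch of the viewport's bookkeeping: which node it shows, how the lock works, and how a dragged handle gets translated from screen to image coordinates before the node is asked for property updates. Every name in it (`Viewport`, `NodeInfo`, `Handle`, `on_drag`, the crop node) is made up for illustration - none of this is existing ComfyUI API, and a real implementation would live mostly in the JS frontend.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Handle:
    """A draggable square drawn over the image, in image pixel coordinates."""
    x: float
    y: float
    size: int = 5


@dataclass
class NodeInfo:
    """Hypothetical stand-in for a graph node, as the viewport would see it."""
    node_id: str
    has_image_output: bool
    handles: List[Handle] = field(default_factory=list)
    # Called with (handle index, new x, new y); returns property updates.
    on_drag: Optional[Callable[[int, float, float], Dict[str, int]]] = None


class Viewport:
    def __init__(self) -> None:
        self.shown: Optional[NodeInfo] = None  # node whose output is displayed
        self.locked = False                    # step 3: pin the current node
        self.zoom = 1.0                        # step 4: zoom factor
        self.pan: Tuple[float, float] = (0.0, 0.0)  # screen position of image origin

    def select(self, node: NodeInfo) -> None:
        """Steps 1-3: follow the selection unless locked or the node has no image."""
        if self.locked or not node.has_image_output:
            return  # keep showing the last node that had an image output
        self.shown = node

    def screen_to_image(self, sx: float, sy: float) -> Tuple[float, float]:
        """Undo pan and zoom so handles are always stored in image pixels."""
        return ((sx - self.pan[0]) / self.zoom, (sy - self.pan[1]) / self.zoom)

    def drag_handle(self, index: int, sx: float, sy: float) -> Dict[str, int]:
        """Step 5: move a handle, then let the node map it to property values."""
        node = self.shown
        if node is None or node.on_drag is None:
            return {}
        x, y = self.screen_to_image(sx, sy)
        node.handles[index].x, node.handles[index].y = x, y
        return node.on_drag(index, x, y)


# Usage: a hypothetical crop node exposing two corner handles.
def crop_on_drag(index: int, x: float, y: float) -> Dict[str, int]:
    keys = ("left", "top") if index == 0 else ("right", "bottom")
    return {keys[0]: round(x), keys[1]: round(y)}


crop = NodeInfo("crop", True, [Handle(0, 0), Handle(100, 100)], crop_on_drag)
prompt = NodeInfo("prompt", False)

viewport = Viewport()
viewport.select(crop)
viewport.select(prompt)  # no image output: the viewport keeps showing `crop`
viewport.zoom, viewport.pan = 2.0, (10.0, 20.0)
updates = viewport.drag_handle(1, 110.0, 220.0)  # screen coordinates
print(viewport.shown.node_id, updates)
```

The point of the sketch is the division of labor: the viewport owns selection, pan/zoom and dragging, while the node only declares its handles and answers "what properties change for this new position". That's what would let a node author skip writing any frontend code for the simple cases.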

Additional information

No response



Labels: enhancement (New feature or request)
