Improving Graph Visualization Functionality #233
Comments
Hey @FraserP117, thanks for the clear description! I'm on board with the draft of the specification. I'll start jotting down my thoughts and detailing the internal structures over the graph. It'd be great to see my comment expanded and documented in the official documentation down the line. I know @wouterwln is swamped with other tasks right now, but it's definitely something worth considering for the future. Also, feel free to add to my comment if I've overlooked anything.
At a glance, we've got two main types of nodes in our codebase: factor nodes and variable nodes. Note, however, that in our papers/research we usually work with Forney-style factor graphs, which have only one type of node, namely factor nodes, with variables represented as edges (more on that below). Nevertheless, in the codebase, each node type comes with its own set of sub-types and specific properties. Factor nodes
Variable nodes
Moreover, our factor graph structure supports nested modeling capabilities, introducing the notion of "sub-models," akin to sub-graphs. This is encoded in the
Absolutely!
Indeed, I believe there should be several options for displaying a particular model's graph. For instance, we may opt for a "full" display, "plate notation," or a "sub-region," like a region around a specific variable. This was just a stream of thoughts. I might have overlooked something. Looking forward to input from @wouterwln and other team members to enrich the discussion. I'll also post this issue in our university Slack channel to invite collaboration on high-level ideas and requests. |
Excellent! thank you for the detailed suggestions @bvdmitri, especially the details on a I intend to assist in the integration of all of these features in due course. I think we would ultimately like to implement the standard as laid out in Part 1 of the Realising Synthetic Active Inference Agents papers? (on account of this being quite literally the graphical specification language). Having said that, I think I'll begin by prioritising the following functionality:
I think this would result in a visual experience akin to that of Figure 5.1 in your thesis - or Figure 2.1 of @MagnusKoudahl 's. Let me know what you think about this prioritisation. Is there any other subset of your above suggestions that you feel would be more appropriate to prioritise instead? Perhaps it's better to go with plate notation instead of ellipsis? Perhaps I should make the Thanks again! |
I think it's a great idea! However, using ellipses to show motif continuation might be tricky. Check out my next comment.
Plate notation would indeed be nice. I believe we should start by focusing on creating a visually appealing TFFG for smaller models. We should also reserve some space in the graph plotting system to accommodate either plate or ellipsis notation. While working on implementing these, it would be helpful to receive feedback from you on what additions are needed in the GraphPPL core to support plate or ellipsis notation. We can consider adding extra metadata during graph creation to simplify this process.
That sounds reasonable, yes!
I'm fully on board with that. I particularly appreciate how Magnus and Thijs illustrated factorization constraints on the graph, as seen in Figures 5, 6, and 7 with those circles inside. |
Hey @FraserP117 , thanks for opening up the discussion on the graph visualisation and concretising the requirements. I'm a big fan of having a visualization like Thijs' and Magnus' notation. However, you should realize that in base
That sounds like a good idea to me, the I very much agree with your taxicab idea, that is usually how factor graphs are drawn anyway. The main challenge I see is designing which node goes where. I've thought about this, and came to a simple heuristic: Most people want their generative models visualized in the same way in which they define them: top to bottom, left to right. I realized that we actually save the order of construction of all nodes (the We're also very much not hung up on As for your prioritization, I think it's fully correct, maybe I would add "visualization of constraints" as well because I think that is really separate from the TFFG visualization in general. If there's anything about GraphPPL's internals you want to know about, feel free to reach out :) |
I just also would like to emphasize that we are open to change some internals, modify our data structures or add extra functionality to the core of GraphPPL if that helps with the visualization. There is no need to constrain ourselves in the current implementation. Of course, we better avoid major refactoring, but small changes or/and additions are fine. |
Wonderful. Thanks @bvdmitri for those further suggestions/clarifications, and thanks to @wouterwln for all that detail, it's greatly appreciated! Yes, I think I'll push the implementation of plate notation above that of ellipses in the ranking of immediate priorities. In our discussion thus far, I think we've settled on these desired features and functionalities:
I have attempted to loosely indicate a feature's importance by its order in the list - more important items come first. Now this is quite a lot to tackle all at once. Consequently, I think it may be wise to prioritise everything up until section As regards the use of |
@FraserP117 I'll be happy to help out with this so we get it right :) I'm not actively involved with RxInfer as much these days - but if we're doing CFFG's I'm definitely in |
Fantastic! I'd be honoured to have your assistance @MagnusKoudahl. @chbe-helix and I are going to be meeting in the next day or so to begin our approach to all this. I am finishing up my tour of the GraphPPL materials in that time. When we really "get going", I'd love to discuss things and otherwise collaborate. That goes for absolutely anyone else too! I agree, I absolutely do want to "get this right". I'll be erring on the side of annoying everybody with questions as opposed to powering ahead with what I think I understand. I also intend to make it known when I don't understand something (which will be rather frequent). I think the biggest challenge for myself and @chbe-helix just now is mapping out all the moving parts. Not a big challenge, but one must get sufficiently oriented first. |
Hi @FraserP117 , great to hear that you are so enthusiastically undertaking this project! Being able to properly generate a visual FFG representation has been on the lab's wishlist for years. :D I would like to add something to the list. I teach the probabilistic programming sections of our master course where students use our lab's tools to automatically do Bayesian inference based on a specific probabilistic model. We used to use ForneyLab.jl which plotted graphs using Graphviz. For example: Students are often confused by what messages are, where messages are located and where message collisions take place. @ThijsvdLaar implemented a feature that lets you isolate an edge and visualize the messages that belong to that edge: This did a number of things:
It was very helpful for the students to learn message passing, helpful for us for debugging and I think it could be helpful to developers that want to get started with RxInfer as well. RxInfer is built differently (ForneyLab is declarative, RxInfer is reactive), so this feature was not something that could be easily ported. But it should definitely be possible: user specifies a variable, algorithm performs a small graph walk to find connected nodes / 1-step removed variables (i.e. feature 1.iv in your list), and pipes subscriptions to a logger. The relevant variables, nodes and message subscriptions should be plotted in the normal way and there would need to be some filtering to make the entire thing human-readable. But that's it, I think. To be clear: this is not a high-priority item. As mentioned in your last post, the current list is already quite extensive and will take a while. But I just wanted to have mentioned it. I am of course more than happy to work with you on this. |
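The "small graph walk" described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `FACTORS` dictionary and the `local_region` function are hypothetical stand-ins, not GraphPPL or RxInfer API, and the logging of message subscriptions is left out entirely.

```julia
# Hypothetical bipartite factor graph: each factor node lists the variables it touches.
const FACTORS = Dict(
    :prior      => [:theta],
    :likelihood => [:theta, :y],
    :noise      => [:y, :sigma],
)

"""
Return the factor nodes attached to `variable`, and the variables one step
removed from it (those sharing at least one factor with it).
"""
function local_region(factors::Dict{Symbol,Vector{Symbol}}, variable::Symbol)
    # Factors that have `variable` on one of their interfaces.
    touching = sort(Symbol[f for (f, vars) in factors if variable in vars])
    # Every other variable attached to one of those factors.
    neighbours = Symbol[]
    for f in touching, v in factors[f]
        if v != variable && !(v in neighbours)
            push!(neighbours, v)
        end
    end
    return touching, sort(neighbours)
end

factors_near_y, variables_near_y = local_region(FACTORS, :y)
```

The returned region is what a plotting backend would then draw, with the rest of the graph filtered out to keep the picture readable.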
Thank you @wmkouw. This is a functionality that I think would be fantastic to include. Being able to quickly visualize parts of the graph and to otherwise palpate the individual messages - and when they're passed - would be very helpful. @chbe-helix will be meeting on Wednesday to cement our collaborative approach. We'll be back here to post that and to query any further issues that arise. |
Hi @FraserP117, great to hear that you're willing to have a jab at graph visualization. I think the list of priorities looks good: first make it work, then make it right. As @wmkouw already mentioned, the graph visualization for ForneyLab was very rudimentary. But even then, it was useful for getting visual feedback during model development, which I think is the main use case. It often helped me catch unexpected cycles (mostly in subgraphs relating to variational factors) and to track root causes of For me, a main grievance with the ForneyLab visualization was the layout, which made it difficult to find the node you were looking for. The taxicab grid suggestion by @wouterwln might already help here. We have an (unwritten) convention for drawing FFG's, with temporal thickness horizontally and hierarchical depth vertically. Dynamic variables are usually stored in a vector, so it might make sense to visualize these horizontally, layered vertically in order of declaration. Even without advanced features (nested, plate, etc.), this could already help a user to find the node/message they're looking for. |
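One way to read the layout convention above is as a simple grid assignment. The sketch below is a guess at that idea, not an existing implementation: the `declarations` record and the `grid_positions` function are hypothetical, and a real version would have to obtain declaration order from the model itself.

```julia
# Hypothetical declaration record: (name, length). A length > 1 stands for a
# vector of dynamic variables such as x[1], ..., x[T].
declarations = [(:theta, 1), (:x, 3), (:y, 3)]

"""
Assign each variable a (column, row) grid cell: one row per declaration, in
declaration order, with the elements of a vector spread along the row.
"""
function grid_positions(decls)
    positions = Dict{String,Tuple{Int,Int}}()
    for (row, (name, len)) in enumerate(decls)
        for col in 1:len
            # Scalars keep their bare name; vector elements get an index suffix.
            label = len == 1 ? string(name) : string(name, "[", col, "]")
            positions[label] = (col, row)
        end
    end
    return positions
end

positions = grid_positions(declarations)
```

Columns then correspond to time steps and rows to hierarchical depth, so a user looking for `x[2]` knows roughly where it will appear.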
@FraserP117 @chbe-helix do you have any updates regarding this issue for the upcoming RxInfer public meeting? :) Such that I can maybe prepare some slides in advance |
Hello @bvdmitri, Yes we do. @chbe-helix and I have bounced some ideas between us, with the help of all the assistance generously provided here. We've come up with an approach that we think will satisfy the implementation requirements, while also minimising technical debt. I have a very basic "proof-of-concept" demo to show at the meeting and I'll try to get it to @bvdmitri ASAP; for inclusion into any slides - as you mention. I have not included it here just yet, although I will endeavour to do so before my night is up. What follows is an update regarding the approach that @chbe-helix and I have taken thus far. Our main preoccupation until very recently has been to settle on an approach that would afford maximum extensibility/customizability, while also minimising technical debt and overall complexity. To that end, we have settled on the following approach - which is of course subject to revision at any time!
1. GraphViz
We think it is best to do away with GraphPlot.jl. As mentioned by @wouterwln, GraphPlot.jl's customisation options are minimal and we have in mind a rather comprehensive set of visualization capabilities. In place of GraphPlot.jl, we think that GraphViz is the best way to go, for two reasons. First, it is certainly a tried-and-tested package that has very much withstood the test of time. Second, GraphViz offers - perhaps - the greatest possible control/customizability over the visualization process. There are two particularly useful Julia packages which interface with GraphViz to allow the specification of an arbitrary graph with either native DOT language syntax from within Julia: GraphViz.jl, or a Julia-lang interface to GraphViz DOT syntax: GraphvizDotLang.jl. We're interested to hear more from @wmkouw and @ThijsvdLaar on the nature of the GraphViz visualizations for ForneyLab.jl - owing to its use of GraphViz.
2. Display Modalities
We are very much interested in developing all the above-mentioned display capabilities - in this issue - with GraphViz. We have decided to focus on the implementation of node/edge display and the first layout style - the "Raw" style. We will hold off on the development of the other layout options, the "Display Concatenation Options" and the "Sub-Region Display", until we have satisfactorily developed node/edge-depiction and the "Raw" layout modality with GraphViz. We think this will serve as a useful prototype and "base of operations" for the full suite of visualization options to which we aspire. At the risk of being premature, we hope to develop the following list of loosely-categorised visualization options:
2.1 Depiction of Node/Edge Information
This includes specification of the node shape for factors/variables, the symbolic label for each node and shading options to denote observability.
2.2 Layout Style
2.3 Display Concatenation Options
These options would simplify the visualization of the graph in some specified way, such as using plate notation to encapsulate repeating patterns/motifs, or the choice to "show"/"hide" sub-models to some specified depth. "FFG"/"CFFG" layout options could also be considered as a concatenation option.
2.4 Sub-Region Display
This last option would allow for the visual palpation of the induced sub-graph on an arbitrary node/edge in the "raw"
3. Implementation Approach
In a nutshell, the task as we conceive it breaks down into the following steps:
It seems like all the relevant information to this end is contained in the model's
4. Implementation Considerations
At present, I have a very minimal working example of automatic DOT code generation for the Coin-toss model As you all well know, evaluating arbitrary strings as code with There is also the above-mentioned GraphvizDotLang.jl package, which does not use the doc-string command specification as in GraphViz.jl. GraphvizDotLang.jl allows you to "Create Graphviz graphs straight from Julia [code]". Hence, it might be best to use Julia's meta-programming capabilities to auto-generate GraphvizDotLang.jl code which can then be executed to yield the final visualization. I'm certainly open to suggestions!
5. Takeaway
Our primary aim right now is to implement a satisfactory means of visualizing the raw Many thanks to all who have taken the time to engage thus far. I look forward to working on this with any/all of you. |
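For readers unfamiliar with DOT, the kind of automatic generation discussed in this comment might look roughly like the following. It is a minimal sketch, not the working example mentioned above: the `nodes` and `edges` lists are hypothetical stand-ins for whatever the model graph exposes, and `to_dot` is an illustrative name. It only builds the DOT source as a plain string, so nothing is evaluated as code.

```julia
# Hypothetical stand-in for the model graph: labelled nodes with a kind, plus edges.
nodes = [("beta", :factor), ("theta", :variable), ("bernoulli", :factor), ("y", :variable)]
edges = [("beta", "theta"), ("theta", "bernoulli"), ("bernoulli", "y")]

"""
Write an undirected DOT graph: factor nodes as squares, variable nodes as circles.
"""
function to_dot(nodes, edges)
    io = IOBuffer()
    println(io, "graph model {")
    for (label, kind) in nodes
        shape = kind == :factor ? "square" : "circle"
        println(io, "    \"", label, "\" [shape=", shape, "];")
    end
    for (a, b) in edges
        println(io, "    \"", a, "\" -- \"", b, "\";")
    end
    println(io, "}")
    return String(take!(io))
end

dot_source = to_dot(nodes, edges)
```

The resulting string can be written to a file and rendered with the Graphviz `dot` command-line tool, which keeps the generation step independent of any particular Julia wrapper package.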
Very much appreciated for the extensive overview @FraserP117 |
Something that I found out just today: |
I'm curious as to how important it is to have Latex abilities for rendition of mathematical content on the CFFGs. From my (limited) research, it appears that GraphViz may not support this directly (although I did come across a Latex package that could help with this). A strong tool in this regard is TikZ, albeit not as abstracted as GraphViz. However, it was quite easy to come up with this simple FFG that has Latex content. |
Wow thanks @bvdmitri for linking that Model Explorer stuff, I'll definitely be having a closer look at this. Yes @kobus78 I wonder about the utility of latex for this purpose too. It certainly would be nice, especially for CFFGs. There is always TikzGraphs.jl, this is an option that I have not ruled out as yet. What was the specific latex package that you reference above? |
It was dot2tex. I had to download it from CTAN but could not get it integrated so far. |
I reviewed the plan again and mostly agree with everything. The "raw" style is, of course, easier to start with and experiment with, but we should remember that it won't be the default option. It's perfectly fine to start with this style to finalize the appearance of the nodes and edges before transitioning to a more complex layout. Some extra comments: I noticed that By the way, what are the scaling capabilities of GraphViz? For example, does it support millions of nodes? I would expect that it does, given its reputation and large community, but it's worth confirming. Additionally, can we integrate GraphViz with interactive UIs, such as in a browser for zooming in and out, or in VSCode? Or perhaps with the Model Explorer mentioned earlier? Maybe we can generate GraphViz code that can be displayed by a different UI provider? That would be fantastic.
I believe this is unnecessary since Also, be aware that generating large strings in Julia can be extremely slow, which was a major bottleneck in ForneyLab. Instead of naively appending new lines with output = output * new_line, consider using an IOBuffer, which allows for much more efficient generation of large strings in Julia. The naive approach of string concatenation does not scale well because Julia's strings are immutable: to add a new line, or even a single character, the entire string must be copied into a new string together with the new information. Overall, great progress, gentlemen! Let's aim to create the best graph visualizations in the entire Bayesian inference community, not just within Active Inference ;) |
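To make the difference concrete, here is a small side-by-side sketch of the two approaches. The function names and the sample `lines` are made up for illustration; only `IOBuffer`, `println`, `take!` and `String` are standard Julia.

```julia
lines = ["node_$(i) -- node_$(i + 1);" for i in 1:1000]

# Naive approach: every `*` allocates a fresh string and copies the old contents.
function build_by_concatenation(lines)
    output = ""
    for line in lines
        output = output * line * "\n"
    end
    return output
end

# Buffered approach: bytes are appended to a growable buffer and turned into
# a string once, at the end.
function build_with_iobuffer(lines)
    io = IOBuffer()
    for line in lines
        println(io, line)
    end
    return String(take!(io))
end
```

Both functions return the same string. The concatenation version does work proportional to the total length on every iteration, whereas the buffered version only appends the new bytes.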
I took a look at |
Thanks again for all the suggestions and engagement, it really helps. I'm glad to have @wouterwln's blessing on the selection of GraphViz as the backend. Indeed, it seems that we were all resolved to this in the last meeting. Of course, things are always open to change and I intend to see what I can do with dot2tex in addition to the functionality I outlined above. Yes @bvdmitri I completely agree with the necessity to switch over from string concatenation to something like IOBuffer(). Thank you for the comments. I have subsequently made that change to the working version. I'm attacking the issue of node depiction/naming just now and I have a question. I currently name each node with the exact label given to it by GraphPPL. This is not the ideal way to label each node. As an example, this is our current working version for the "raw" visualization of the simple Beta-Bernoulli coin toss model: I can also list the node labels depicted in the above GraphViz render of the underlying model: My understanding is that GraphPPL creates the node label by adding the label's name/description as a prefix to a global integer counter of all nodes. @bvdmitri has already informed us that these global counter values are superfluous to the visualization efforts, and so I'm looking to remove them entirely. The only thing I find strange is that each constant variable node seems to be counted twice by the global counter. You can see that "constvar_2_3" and "constvar_4_5" appear to be the result of appending two iterations of the global counter to each respective constant in the Beta distribution. Indeed, if I inspect the GraphPPL.Model's counter field, it appears to show that there are 12 nodes in the graph, when clearly there are only 10: The Graphs.nv function returns the number of nodes in the MetaGraphsNext.MetaGraph. 
Using this function via MetaGraphsNext, we see that there are 10 nodes reported as belonging to the MetaGraph - coinciding with the above visual inspection: This leads me to suspect that GraphPPL may be double counting "constvar" variables, due to it naming them twice with the global counter? You can see the same issue with the Bayesian Linear Regression Model: My guess is that there may be a superfluous use of Many thanks, talk soon! |
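Until the naming question is settled, the counter suffixes can be stripped at display time. The helper below is a string-level sketch with a hypothetical name, `strip_counter`; it assumes labels have the form seen above (a name followed by one or more `_<integer>` groups), and GraphPPL may well offer a cleaner way to get at the bare name.

```julia
# Remove one or more trailing "_<digits>" groups from a node label.
# Caveat: a name that genuinely ends in "_<digits>" would be shortened too.
strip_counter(label::AbstractString) = replace(label, r"(_\d+)+$" => "")

display_labels = [strip_counter(l) for l in ["constvar_2_3", "constvar_4_5", "theta"]]
```

Because several nodes can end up with the same display label, the full label should still be used as the node identifier, with the stripped version used only as the visible text.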
Thanks for the update @FraserP117!
I think you can use the
Double naming for the constants does indeed look a bit weird. We should double check what is happening there. |
Thanks @bvdmitri, I'll look at the |
The double naming is fixed in #238 ;) |
Excellent! just saw that! thanks @wouterwln . |
Hey folks! I just wanted to check on the progress with this one, as the NumFocus deadline is coming up soon. Do you have a PR date in mind? :D |
Hi @albertpod ! Fraser and I have collectively had some external obligations crop up that have made progress slow a bit but we are now able to resume work on this issue. We are planning on meeting to discuss timelines in the next day and can get back to you with a more definitive time estimate soon. Could you remind me when the NumFocus deadline is and what the requirements are for that deadline so we may make sure we can meet it? |
@chbe-helix no worries at all! The deadline is about mid September. |
Wonderful! Thanks for the info!
|
Greetings @albertpod and all. Yes, @chbe-helix and I do have some updates. Embarrassed though I am to admit it, I was unaware that there was a NumFOCUS deadline. Very good to know that there is one! In response to this, we have created an internal ActInf-Institute page to host all information relevant to our development efforts. It can be found at this link here and should be available for public access. Here we outline the proposed functionality and provide a schema for the conceptualization of the finished product, in addition to an implementation philosophy and a development roadmap/task tracker. At present we consider only those tasks relevant to "phase 1", which we define as all that which is to be completed upon the passing of the NumFOCUS September deadline. As far as @chbe-helix and myself understand, this NumFOCUS deadline serves only to prove that the BIASlab has received a pull request from some external party, and that it is NOT intended to serve as the deadline for any final/polished open source contribution. Please do tell us if we have misunderstood anything on this. As I mentioned, we've created a development timeline which culminates in our first pull request. Naturally, this is intended to provide only the most minimal functionality. We will not be able to afford the majority of our intended contributions by this date. See the above page for details on exactly what we intend to provide in our initial PR. We want there to be plenty of time between our first PR and the actual deadline, such that it may be properly appraised by the core BIASlab team, tested for bugs and so on. Our current plan is to make our initial PR in mid-to-late August - see the upper and lower bounds on the development timeline. All of this is - of course - subject to change upon your preferences and recommendations. It would be helpful to know exactly when the deadline is, such that I may amend the development plan to the effect of greater precision. 
We eagerly await your comments and recommendations regarding all this and we hope that our efforts thus far appear promising to yourselves. As you no doubt know better than I, the beginning of any project is always the most important phase. We've taken great pains at this stage to linger upon more conceptual issues, in the hope that the project's eventual fruition is proportionately freed from technical debt and sloppy "code-first and ask questions later" hastiness. Many thanks from @chbe-helix and myself. |
Hi @FraserP117! Sorry for the late reply (we were busy with JuliaCon). Thanks for your detailed update. It’s great to see the progress you and @chbe-helix have made. To clarify, there is no explicit deadline from NumFOCUS for having one big or small PR by mid-September. The main goal is to find external contributors who are eager to participate in @ReactiveBayes development alongside BIASlab and Lazy Dynamics. A PR is a practical way to show this to NumFOCUS, hence why I mentioned it. Regarding the deadline, you are correct that it is not intended as a deadline for a final/polished open-source contribution. The aim is to have an external contribution to demonstrate community interest and engagement. Thanks for providing the link to your internal ActInf-Institute page. Your plan to make the initial PR in mid-to-late August sounds good. This should indeed provide enough time for the core team to appraise it, test for bugs, and provide feedback. Again, many thanks for your work. |
Excellent, I hope that some of your JuliaCon presentations make their way to YouTube! (and that it all went well). That's all great to hear, thanks @albertpod. I've been detained by significant personal commitments, in addition to a terrible flu in the last week. I mention this only to "prove" that I have not been inactive due to a lack of interest! I am extremely interested to fulfil this issue in a prompt and satisfactory manner. Indeed, I'm overjoyed to participate with the BIASlab in whatever capacity. More updates in the coming days. As always, many thanks! |
@wmkouw and @bvdmitri I submitted PR-251 (#251) with the code that @FraserP117 and I have been working on. We need feedback on the code and what version number we should set. While we are getting your feedback we have one more test to run to ensure compatibility with v4.3.3. Let us know what you think! |
Wonderful! I will have a look after IWAI (Friday Sept 13). |
The present model/graph visualization capability is highly limited. As per the example in the docs, the existing implementation uses a simple GraphPlot.gplot function to output a basic spring-layout of the model graph. @chbe-helix and I, along with those in the RxInfer.jl working group at the Active Inference Institute, aim to clarify the nature of the proposed improvements to the graph visualization procedure. We anticipate a small discussion here with @bvdmitri and any other BIASlab members, in regard to the clarification of the most valuable improvement/s to this functionality.
At present, we feel that the functionality should:
Any further specifications/requirements are more than welcome!