Improving Graph Visualization Functionality #233

Open
FraserP117 opened this issue May 3, 2024 · 35 comments

Comments

@FraserP117

The present model/graph visualization capability is very limited. As per the example in the docs, the existing implementation uses the GraphPlot.gplot function to output a basic spring layout of the model graph.

@chbe-helix and I, along with the RxInfer.jl working group at the Active Inference Institute, aim to clarify the nature of the proposed improvements to the graph visualization procedure. We anticipate a short discussion here with @bvdmitri and any other BIASlab members about which improvements to this functionality would be most valuable.

At present, we feel that the functionality should:

  1. Depict node types.
  2. Lay out the graph along a "taxicab" grid instead of using a spring layout. This is simply how most TFFG graphs are depicted abstractly, so it is a natural choice for the layout.
  3. Adequately depict very large models, perhaps by means of plate notation or ellipses to denote the continuation of a pattern/motif.

Any further specifications/requirements are more than welcome!

@bvdmitri
Member

bvdmitri commented May 3, 2024

Hey @FraserP117, thanks for the clear description!

I'm on board with the draft of the specification. I'll start jotting down my thoughts and detailing the internal structures of the graph. It'd be great to see my comment expanded and documented in the official documentation down the line. I know @wouterwln is swamped with other tasks right now, but it's definitely something worth considering for the future. Also, feel free to add to my comment if I've overlooked anything.

Depict node types

At a glance, we've got two main types of nodes in our codebase: factor nodes and variable nodes. Note, however, that in our papers/research we usually work with Forney-style factor graphs, which have only one type of node, namely factor nodes, while variables are represented as edges (more on that below). Nevertheless, in the codebase, each node type comes with its own set of sub-types and specific properties.

Factor nodes

  • Factor nodes carry:
    • Neighbors, although this info can be derived from the graph structure
    • Their "type": Deterministic or Stochastic
    • A functional form attached to them, like Gaussian or Bernoulli
    • Factorization constraints over the joint distribution; for instance, if a node f encompasses 3 variables x, y, z, we could encode various factorization constraints for the variational distribution q(x, y, z), such as q(x, y)q(z), q(x)q(y, z), or q(x)q(y)q(z)
    • Any unstructured meta information, like the approximation method used to compute messages

Variable nodes

  • Variable nodes carry:
    • A name, obviously
    • They can be standalone or part of a collection, with a specific index, e.g. i = 1. Collections aren't necessarily vectors; they could be matrices or arrays of any dimension, in which case the index might look like i = 1, 1, 1, and so forth. This could be useful for implementing plate notation.
    • Their "kind", like random, data, or fixed (i.e., constants)
    • They might be anonymous or not. Anonymous variables are created under the hood for expressions like y ~ Normal(f(x), 1.0), which would create an anonymous variable via anonymous ~ f(x) and then call y ~ Normal(anonymous, 1.0)
    • Functional form constraints, such as requiring the posterior distribution q(s) over a single variable s to be Gaussian or Bernoulli. A functional form constraint can take any form (i.e., any Julia object).
    • A variable can be linked to other variables. For example, if f is a deterministic factor node, then y := f(x) establishes a deterministic "link" between x and y. Understanding these links is crucial for specifying constraints over seemingly different variables, as constraints on one might affect another.
    • Any unstructured meta information, such as the approximation method used to compute the posterior
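A minimal sketch of how a plotting routine might hold the per-node information listed above. FactorSummary, VariableSummary, display_label and factorization_label are hypothetical names chosen for illustration; they are not GraphPPL's internal types, which carry more (and differently organised) information.

```julia
# Hypothetical summary types a visualizer could extract from a model.
# NOT GraphPPL's own types; they only mirror the properties listed above.
@enum FactorKind Deterministic Stochastic
@enum VariableKind RandomVar DataVar ConstVar

struct FactorSummary
    name::String                            # functional form, e.g. "Normal"
    kind::FactorKind                        # deterministic or stochastic
    factorization::Vector{Vector{String}}   # [["x", "y"], ["z"]] means q(x, y)q(z)
    meta::Any                               # unstructured meta information (or nothing)
end

struct VariableSummary
    name::String
    index::Union{Nothing, Tuple{Vararg{Int}}}  # nothing for standalone variables
    kind::VariableKind                         # random, data or constant
    anonymous::Bool                            # created under the hood?
    constraint::Any                            # functional form constraint (any Julia object)
    meta::Any
end

# Text a plot could show for a variable, e.g. "x[1, 2]" for a collection element.
function display_label(v::VariableSummary)
    v.index === nothing && return v.name
    return string(v.name, "[", join(v.index, ", "), "]")
end

# Render a factorization constraint as text, e.g. "q(x, y)q(z)".
factorization_label(f::FactorSummary) =
    join(("q(" * join(c, ", ") * ")" for c in f.factorization))

# Usage
x = VariableSummary("x", (1, 2), RandomVar, false, nothing, nothing)
display_label(x)   # "x[1, 2]"
factorization_label(FactorSummary("Normal", Stochastic, [["x", "y"], ["z"]], nothing))  # "q(x, y)q(z)"
```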

Moreover, our factor graph structure supports nested modeling, introducing the notion of "sub-models," akin to sub-graphs. This is encoded in the GraphPPL.Context structure. I think the plotting capabilities should take Contexts and sub-graphs seriously. The current plotting just grabs the entire graph without separating the nodes into their contexts.
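One way to respect the nesting is to walk the context hierarchy recursively and decide, per level, whether to draw a sub-model's nodes or a single placeholder for it. ContextTree below is a hypothetical, heavily simplified stand-in for GraphPPL.Context, and visible_nodes is an illustrative helper, not an existing function.

```julia
# Hypothetical, simplified stand-in for a nested model context.
struct ContextTree
    name::String
    nodes::Vector{String}          # node labels defined at this level
    children::Vector{ContextTree}  # sub-models
end

# Collect the nodes to draw when sub-models deeper than `maxdepth` are hidden.
# A hidden sub-model is replaced by a single placeholder node named after it.
function visible_nodes(ctx::ContextTree, maxdepth::Int, depth::Int = 0)
    out = copy(ctx.nodes)
    for child in ctx.children
        if depth + 1 > maxdepth
            push!(out, "[" * child.name * "]")                   # collapsed sub-model
        else
            append!(out, visible_nodes(child, maxdepth, depth + 1))  # expanded sub-model
        end
    end
    return out
end

# Usage
gcv  = ContextTree("gcv", ["z", "kappa"], ContextTree[])
root = ContextTree("model", ["x", "y"], [gcv])
visible_nodes(root, 0)   # ["x", "y", "[gcv]"]
visible_nodes(root, 1)   # ["x", "y", "z", "kappa"]
```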

Lay out the graph along a "taxicab" grid instead of using a spring layout. This is simply how most TFFG graphs are depicted abstractly, so it is a natural choice for the layout.

Absolutely!

Adequately depict very large models, perhaps by means of plate notation or ellipses to denote the continuation of a pattern/motif.

Indeed, I believe there should be several options for displaying a particular model's graph. For instance, we may opt for a "full" display, "plate notation," or a "sub-region," like a region around a specific variable s (which could, in turn, be displayed "fully" or in "plate notation"). We could also expand or hide submodels, a.k.a. sub-graphs.
Also, as I've mentioned, in BIASlab we prefer to work with Forney-style factor graphs. They do not have "variable nodes" but instead use edges to represent variables. For variables that are connected to more than 2 nodes, Forney-style factor graphs introduce equality nodes. This might also be an option (maybe even the default?) for the plotting.
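A sketch of the variables-as-edges conversion described above, assuming the model is available as a list of (factor, variable) incidence pairs: a variable with two neighbouring factors becomes a plain edge, one with a single neighbour becomes a half-edge, and one with more than two neighbours is routed through an equality node. to_ffg and the equality_ naming scheme are hypothetical.

```julia
# Turn (factor, variable) incidence pairs into Forney-style edges (a, b, variable).
# A variable with a single neighbour becomes a half-edge whose second end is `nothing`.
function to_ffg(incidence::Vector{Tuple{String,String}})
    neighbours = Dict{String,Vector{String}}()
    order = String[]                                  # keep variables in first-seen order
    for (f, v) in incidence
        if !haskey(neighbours, v)
            neighbours[v] = String[]
            push!(order, v)
        end
        push!(neighbours[v], f)
    end
    edges = Tuple{String,Union{String,Nothing},String}[]
    for v in order
        fs = neighbours[v]
        if length(fs) == 1
            push!(edges, (fs[1], nothing, v))         # half-edge
        elseif length(fs) == 2
            push!(edges, (fs[1], fs[2], v))           # variable is just an edge
        else
            eq = "equality_" * v                      # hypothetical equality-node name
            for f in fs
                push!(edges, (eq, f, v))              # one edge per attached factor
            end
        end
    end
    return edges
end

# Usage: theta is shared by three factors, so it gets an equality node.
to_ffg([("prior", "theta"), ("lik1", "theta"), ("lik2", "theta"),
        ("lik1", "y1"), ("lik2", "y2")])
```

Strictly speaking, each edge attached to an equality node carries its own copy of the variable; the sketch simply reuses the variable name as the edge label.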

This was just a stream of thoughts. I might have overlooked something. Looking forward to input from @wouterwln and other team members to enrich the discussion. I'll also post this issue in our university Slack channel to invite collaboration on high-level ideas and requests.

@FraserP117
Author

Excellent! Thank you for the detailed suggestions, @bvdmitri, especially the details on a GraphPPL.Model's internal structure.

I intend to assist in the integration of all of these features in due course. I think we would ultimately like to implement the standard laid out in Part 1 of the Realising Synthetic Active Inference Agents papers? (After all, it is quite literally the graphical specification language.)

Having said that, I think I'll begin by prioritising the following functionality:

  1. The depiction of node types.
  2. Defaulting to a TFFG representation (and hence using equality constraint nodes).
  3. The use of ellipses to denote motif continuation in large models.

I think this would result in a visual experience akin to that of Figure 5.1 in your thesis, or Figure 2.1 of @MagnusKoudahl's. Let me know what you think about this prioritisation. Is there any other subset of your above suggestions that you feel would be more appropriate to prioritise instead? Perhaps it's better to go with plate notation instead of ellipses? Perhaps I should make the GraphPPL.Context visualisation paramount?

Thanks again!

@bvdmitri
Member

bvdmitri commented May 6, 2024

Let me know what you think about this prioritisation.

I think it's a great idea! However, using ellipses to show motif continuation might be tricky. Check out my next comment.

Perhaps it's better to go with plate notation instead of ellipsis?

Plate notation would indeed be nice. I believe we should start by focusing on creating a visually appealing TFFG for smaller models. We should also reserve some space in the graph plotting system to accommodate either plate or ellipsis notation. While working on implementing these, it would be helpful to receive feedback from you on what additions are needed in the GraphPPL core to support plate or ellipsis notation. We can consider adding extra metadata during graph creation to simplify this process.

Perhaps I should make the GraphPPL.Context visualisation paramount?

That sounds reasonable, yes!

I intend to assist in the integration of all of these features in due course. I think we would ultimately like to implement the standard laid out in Part 1 of the Realising Synthetic Active Inference Agents papers? (After all, it is quite literally the graphical specification language.)

I'm fully on board with that. I particularly appreciate how Magnus and Thijs illustrated factorization constraints on the graph, as seen in Figures 5, 6, and 7 with those circles inside.

@wouterwln
Member

Hey @FraserP117, thanks for opening up the discussion on the graph visualisation and concretising the requirements. I'm a big fan of having a visualization like Thijs' and Magnus' notation. However, you should realize that in base GraphPPL not every model has a set of inference constraints (we export it as an optional plugin that is always activated in RxInfer), so I would take into consideration that the method should also work when there are no inference constraints in the node.

Perhaps I should make the GraphPPL.Context visualisation paramount?

That sounds like a good idea to me, the Context contains all node labels (factors and variables) that exist on a specific level in the hierarchy, it would be nice to visualize submodels.

I very much agree with your taxicab idea; that is usually how factor graphs are drawn anyway. The main challenge I see is deciding which node goes where. I've thought about this and came up with a simple heuristic: most people want their generative models visualized in the same way in which they define them, top to bottom and left to right. I realized that we actually save the order of construction of all nodes (the GraphPPL.NodeLabel serves as a universal identifier in the graph for all nodes and contains an id integer that is the counter in the global graph), so you have an idea of which nodes were created before others. This is only part of the story, however, since a variable node can be created first and then passed into a submodel as an interface, in which case we wouldn't necessarily want it drawn at the top left of that submodel's visualization. Still, this may be a nice first step towards an algorithm that gives a reasonably nice visualization.
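A naive first pass at this heuristic could simply sort nodes by their creation counter and fill a taxicab grid row by row. In this sketch the counters are passed in as a plain Dict (in GraphPPL they would come from the id stored in each GraphPPL.NodeLabel, as described above); grid_by_creation and ncols are illustrative names, and the submodel-interface caveat is deliberately not handled.

```julia
# Sort nodes by creation counter and fill a grid left to right, top to bottom.
# `ids` maps a node name to its (hypothetical) creation counter.
function grid_by_creation(ids::Dict{String,Int}; ncols::Int = 4)
    ordered = sort(collect(keys(ids)); by = k -> ids[k])
    positions = Dict{String,Tuple{Int,Int}}()     # name => (row, column), 0-based
    for (i, name) in enumerate(ordered)
        row, col = divrem(i - 1, ncols)
        positions[name] = (row, col)
    end
    return positions
end

# Usage
ids = Dict("theta" => 1, "prior" => 2, "y" => 3, "lik" => 4, "z" => 5)
grid_by_creation(ids; ncols = 2)   # theta => (0, 0), prior => (0, 1), y => (1, 0), ...
```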

We're also very much not hung up on GraphPlot, I just picked it because it seemed convenient to work with, but I don't see a lot of customization options. You can look at https://juliagraphs.org/Graphs.jl/stable/first_steps/plotting/ to get an idea of other plotting libraries, although I also don't know if they give you the customization options necessary.

As for your prioritization, I think it's spot on. Maybe I would add "visualization of constraints" as well, because I think that is really separate from the TFFG visualization in general. If there's anything about GraphPPL's internals you want to know, feel free to reach out :)

@bvdmitri
Member

bvdmitri commented May 6, 2024

I would also like to emphasize that we are open to changing some internals, modifying our data structures, or adding extra functionality to the core of GraphPPL if that helps with the visualization. There is no need to constrain ourselves to the current implementation. Of course, we'd better avoid major refactoring, but small changes and/or additions are fine.

@FraserP117
Author

FraserP117 commented May 7, 2024

Wonderful. Thanks @bvdmitri for those further suggestions/clarifications, and thanks to @wouterwln for all that detail, it's greatly appreciated!

Yes, I think I'll rank the implementation of plate notation above that of ellipses among the immediate priorities. In our discussion thus far, I think we've settled on these desired features and functionalities:

  1. Visualization should work regardless of the specification/absence of any inference constraints.
    i. We want a visually appealing TFFG model depiction: "full display".
      a. Aim to implement the visualization standard/s laid out in Realising Synthetic Active Inference Agents: Part 1. Most particularly, we should:
        a. Depict node types.
        b. Depict the graph with a "Taxicab" layout, using @wouterwln's heuristic for now. There will be issues when it comes to depicting submodels. We'll have to address that later, perhaps by playing around with GraphPPL.Context?
        c. Depict the form and factorisation constraints.
    ii. Additionally depict the information contained in the GraphPPL.Context struct.
    iii. Add the option to specify plate notation; this will be especially useful for depicting larger models.
    iv. Afford "sub-region" display, like a region around a specific variable s (which could, in turn, be displayed "fully" or in "plate notation").
    v. Afford the ability to expand or hide any existing sub-models.

I have attempted to loosely indicate each feature's importance by its order in the list: more important items come first. This is quite a lot to tackle all at once, so I think it may be wise to prioritise everything up to item 1.i.a.c, followed by items 1.ii and 1.iii, in that order of importance.

As regards the use of GraphPlot, it's good to know that you're not "hung up on it". I don't mean to say that it shouldn't be used, but I have also noted a lack of customisation options in GraphPlot, at least to the extent that we desire here.

@MagnusKoudahl

@FraserP117 I'll be happy to help out with this so we get it right :) I'm not as actively involved with RxInfer these days, but if we're doing CFFGs, I'm definitely in.

@FraserP117
Author

FraserP117 commented May 7, 2024

Fantastic! I'd be honoured to have your assistance @MagnusKoudahl. @chbe-helix and I are going to be meeting in the next day or so to begin our approach to all this. I am finishing up my tour of the GraphPPL materials in that time. When we really "get going", I'd love to discuss things and otherwise collaborate. That goes for absolutely anyone else too!

I agree, I absolutely do want to "get this right". I'll be erring on the side of annoying everybody with questions as opposed to powering ahead with what I think I understand. I also intend to make it known when I don't understand something (which will be rather frequent).

I think the biggest challenge for myself and @chbe-helix just now is mapping out all the moving parts. Not a big challenge, but one must get sufficiently oriented first.

@wmkouw
Member

wmkouw commented May 10, 2024

Hi @FraserP117 , great to hear that you are so enthusiastically undertaking this project! Being able to properly generate a visual FFG representation has been on the lab's wishlist for years. :D

I would like to add something to the list. I teach the probabilistic programming sections of our master course where students use our lab's tools to automatically do Bayesian inference based on a specific probabilistic model. We used to use ForneyLab.jl which plotted graphs using Graphviz. For example:

[Screenshot: a model graph plotted by ForneyLab.jl via Graphviz]

Students are often confused by what messages are, where messages are located and where message collisions take place. @ThijsvdLaar implemented a feature that lets you isolate an edge and visualize the messages that belong to that edge:

[Screenshot: an isolated edge with the messages that belong to it]

This did a number of things:

  1. it shows students which messages are relevant to a particular variable,
  2. it shows what other variables a particular message depends on (via the other edges of the connected node),
  3. it provides an identifier for a message that lets them plot its functional form.

It was very helpful for the students to learn message passing, helpful for us for debugging and I think it could be helpful to developers that want to get started with RxInfer as well.

RxInfer is built differently (ForneyLab is declarative, RxInfer is reactive), so this feature was not something that could be easily ported. But it should definitely be possible: user specifies a variable, algorithm performs a small graph walk to find connected nodes / 1-step removed variables (i.e. feature 1.iv in your list), and pipes subscriptions to a logger. The relevant variables, nodes and message subscriptions should be plotted in the normal way and there would need to be some filtering to make the entire thing human-readable. But that's it, I think.
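The graph-walk part of this idea can be sketched independently of RxInfer's reactive machinery. Assuming the model is given as (factor, variable) incidence pairs, the helper below (a hypothetical name) returns the factors attached to a chosen variable and the variables one step removed through those factors. Piping message subscriptions to a logger is RxInfer-specific and is not shown.

```julia
# Factors attached to `target`, plus variables one step removed through them.
function neighbourhood(incidence::Vector{Tuple{String,String}}, target::String)
    factors = unique(f for (f, v) in incidence if v == target)
    others  = unique(v for (f, v) in incidence if f in factors && v != target)
    return (factors = factors, variables = others)
end

# Usage
inc = [("prior", "theta"), ("lik1", "theta"), ("lik1", "y1"),
       ("lik2", "theta"), ("lik2", "y2"), ("other", "z")]
neighbourhood(inc, "theta")   # factors: prior, lik1, lik2; variables: y1, y2
```

The resulting node set is exactly what a "sub-region" plot around the chosen variable would draw.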

To be clear: this is not a high-priority item. As mentioned in your last post, the current list is already quite extensive and will take a while. But I just wanted to have mentioned it.

I am of course more than happy to work with you on this.

@FraserP117
Author

FraserP117 commented May 13, 2024

Thank you, @wmkouw. This is functionality that I think would be fantastic to include. Being able to quickly visualize parts of the graph and otherwise palpate the individual messages (and when they're passed) would be very helpful.

@chbe-helix and I will be meeting on Wednesday to cement our collaborative approach. We'll report back here with that and raise any further issues that arise.

@ThijsvdLaar

Hi @FraserP117, great to hear that you're willing to take a jab at graph visualization. I think the list of priorities looks good: first make it work, then make it right.

As @wmkouw already mentioned, the graph visualization for ForneyLab was very rudimentary. But even then, it was useful for getting visual feedback during model development, which I think is the main use case. It often helped me catch unexpected cycles (mostly in subgraphs relating to variational factors) and to track root causes of NaN values in the messages.

For me, a main grievance with the ForneyLab visualization was the layout, which made it difficult to find the node you were looking for. The taxicab grid suggestion by @wouterwln might already help here. We have an (unwritten) convention for drawing FFGs, with temporal thickness horizontally and hierarchical depth vertically. Dynamic variables are usually stored in a vector, so it might make sense to visualize these horizontally, layered vertically in order of declaration. Even without advanced features (nesting, plates, etc.), this could already help a user find the node/message they're looking for.
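A sketch of that convention, assuming each dynamic variable is described only by its name and length: every collection gets its own row in declaration order, and its elements are spread horizontally by index. temporal_layout and the (row, column) convention are illustrative, not an existing API.

```julia
# Each vector-valued variable gets its own row (in declaration order); its
# elements are spread horizontally by index, so time runs left to right.
# `collections` is an ordered list of (name, length) pairs.
function temporal_layout(collections::Vector{Tuple{String,Int}})
    positions = Dict{String,Tuple{Int,Int}}()   # "name[t]" => (row, column)
    for (row, (name, n)) in enumerate(collections)
        for t in 1:n
            positions["$(name)[$t]"] = (row, t)
        end
    end
    return positions
end

# Usage: hidden states s above observations y, three time steps each.
temporal_layout([("s", 3), ("y", 3)])   # "s[2]" => (1, 2), "y[3]" => (2, 3), ...
```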

@bvdmitri
Member

@FraserP117 @chbe-helix, do you have any updates regarding this issue for the upcoming RxInfer public meeting? :) That way I can prepare some slides in advance.

@FraserP117
Author

FraserP117 commented May 27, 2024

Hello @bvdmitri ,

Yes, we do. @chbe-helix and I have bounced some ideas between us, with the help of all the assistance generously provided here. We've come up with an approach that we think will satisfy the implementation requirements while also minimising technical debt. I have a very basic "proof-of-concept" demo to show at the meeting, and I'll try to get it to @bvdmitri ASAP for inclusion in any slides, as you mention. I have not included it here just yet, although I will endeavour to do so before my night is up. What follows is an update regarding the approach that @chbe-helix and I have taken thus far.

Our main preoccupation until very recently has been to settle on an approach that would afford maximum extensibility/customizability, while also minimising technical debt and overall complexity. To that end, we have settled on the following approach - which is of course subject to revision at any time!

1. GraphViz

We think it is best to do away with GraphPlot.jl. As mentioned by @wouterwln, GraphPlot.jl's customisation options are minimal, and we have in mind a rather comprehensive set of visualization capabilities.

In place of GraphPlot.jl, we think that GraphViz is the best way to go, for two reasons. First, it is a tried-and-tested package that has very much withstood the test of time. Second, GraphViz offers perhaps the greatest possible control/customizability over the visualization process. There are two particularly useful Julia packages that interface with GraphViz to allow the specification of an arbitrary graph from within Julia: GraphViz.jl, which accepts native DOT language syntax, and GraphvizDotLang.jl, a Julia-language interface to GraphViz DOT syntax. We're interested to hear more from @wmkouw and @ThijsvdLaar about the GraphViz visualizations in ForneyLab.jl, since it also uses GraphViz.

2. Display Modalities

We are very much interested in developing all the above-mentioned display capabilities in this issue with GraphViz. We have decided to focus on the implementation of node/edge display and the first layout style, the "Raw" style. We will hold off on the development of the other layout options, the "Display Concatenation Options" and the "Sub-Region Display", until we have satisfactorily developed node/edge depiction and the "Raw" layout modality with GraphViz. We think this will serve as a useful prototype and "base of operations" for the full suite of visualization options to which we aspire. At the risk of being premature, we hope to develop the following list of loosely categorised visualization options:

2.1 Depiction of Node/Edge Information

This includes specification of the node shape for factors/variables, the symbolic label for each node and shading options to denote observability.
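As a starting point for 2.1, node information can be mapped to Graphviz node attributes. In this sketch factors are drawn as boxes, variables as circles, and observed variables are filled grey. shape, style and fillcolor are standard Graphviz attributes, while the mapping itself and the dot_node helper are assumptions for illustration.

```julia
# Emit one DOT node statement; the shape/shading conventions are illustrative.
function dot_node(id::String, label::String; factor::Bool, observed::Bool = false)
    shape = factor ? "box" : "circle"                               # node type
    style = observed ? ", style=filled, fillcolor=lightgrey" : ""   # observability
    return "  \"$id\" [label=\"$label\", shape=$shape$style];"
end

# Usage
dot_node("n1", "Beta"; factor = true)
dot_node("y_1", "y[1]"; factor = false, observed = true)
```

Labels that themselves contain double quotes would need escaping before being interpolated.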

2.2 Layout Style

  1. "Raw": Non-Forney style factor graph display. This would simply represent the exact GraphPPL.Model, as it exists in memory - very much like the existing implementation. Although the "mere factor graph" representation is not particularly common, we feel that it is still useful for developers to see "exactly" what the underlying GraphPPL.Model looks like. It is also the easiest way to visualize the graph and will serve as a nice "proof-of-concept" for the development of the other layouts.
  2. "FFG": This would display the GraphPPL.Model as a proper Forney-style factor graph, with equality-constraint nodes and all.
  3. "CFFG": Same as "FFG", only now the constraints are also visualized - should there be any. This would implement the standards as specified in Realising Synthetic Active Inference Agents: Part 1.

2.3 Display Concatenation Options

These options would simplify the visualization of the graph in some specified way, such as using plate notation to encapsulate repeating patterns/motifs, or the choice to "show"/"hide" sub-models to some specified depth. "FFG"/"CFFG" layout options could also be considered as a concatenation option.
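Graphviz draws a bounding box around any subgraph whose name begins with cluster, which makes clusters a natural vehicle for plates. The sketch assumes we already know which node IDs belong to a plate (for example, from collection indices); dot_plate is a hypothetical helper.

```julia
# Emit a DOT cluster subgraph that Graphviz will draw as a labelled box (a plate).
function dot_plate(name::String, node_ids::Vector{String}, label::String)
    io = IOBuffer()
    println(io, "  subgraph cluster_$name {")
    println(io, "    label=\"$label\";")      # e.g. the plate's index range
    for id in node_ids
        println(io, "    \"$id\";")           # nodes placed inside the plate
    end
    println(io, "  }")
    return String(take!(io))
end

# Usage
print(dot_plate("obs", ["y_1", "y_2"], "i = 1..N"))
```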

2.4 Sub-Region Display

This last option would allow for the visual palpation of the induced sub-graph on an arbitrary node/edge in the "raw" GraphPPL.Model or the "FFG"/"CFFG" representation.

3. Implementation Approach

In a nutshell, the task as we conceive it breaks down into the following steps:

  1. A method of parsing the desired/relevant information contained in a particular GraphPPL.Model.
  2. The subsequent (automatic) construction of the GraphViz DOT code to visualize that information as a GraphViz graph.
  3. Options to save/display the visualization specified by the generated GraphViz DOT code.
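Steps 2 and 3 can be sketched for a toy edge list; step 1, extracting the edges from a GraphPPL.Model, is assumed to have happened already. The DOT text is built directly as a string, so no eval(Meta.parse(...)) is needed, and it is rendered with the Graphviz dot command-line tool (dot -Tsvg) only if that executable is on the PATH. to_dot and save_and_render are illustrative names.

```julia
# Step 2: build DOT text for an undirected graph from an edge list.
function to_dot(edges::Vector{Pair{String,String}}; name::String = "model")
    io = IOBuffer()
    println(io, "graph $name {")
    for (a, b) in edges
        println(io, "  \"$a\" -- \"$b\";")    # undirected edge
    end
    println(io, "}")
    return String(take!(io))
end

# Step 3: save the DOT file and, if Graphviz is installed, render it to SVG.
function save_and_render(dot::String, path::String)
    write(path * ".dot", dot)
    exe = Sys.which("dot")
    exe === nothing && return nothing         # Graphviz not installed: keep the .dot only
    run(`$exe -Tsvg $(path * ".dot") -o $(path * ".svg")`)
    return path * ".svg"
end

# Usage
d = to_dot(["theta" => "Beta", "theta" => "Bernoulli"])
save_and_render(d, joinpath(tempdir(), "coin_toss"))
```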

It seems like all the relevant information to this end is contained in the model's Context structure. I'm very interested in checking my understanding on that point.

4. Implementation Considerations

At present, I have a very minimal working example of automatic DOT code generation for the Coin-toss model. My current approach generates a string representation of the GraphViz DOT code from a GraphPPL.Model and then uses eval() and Meta.parse() to execute the string as a command in the GraphViz.jl package.

As you all well know, evaluating arbitrary strings as code with eval(Meta.parse(my_string)) is a very "spooky" thing to be doing, so I am not married to my current approach.

There is also the above-mentioned GraphvizDotLang.jl package, which does not rely on a string-based DOT specification as GraphViz.jl does. GraphvizDotLang.jl allows you to "Create Graphviz graphs straight from Julia [code]". Hence, it might be best to use Julia's metaprogramming capabilities to auto-generate GraphvizDotLang.jl code, which can then be executed to yield the final visualization. I'm certainly open to suggestions!

5. Takeaway

Our primary aim right now is to implement a satisfactory means of visualizing the raw GraphPPL.Model as a GraphViz plot. We think this means that we need to automatically generate the relevant GraphViz DOT code from the information encoded in the model's Context structure. Please do tell us your thoughts on this!

Many thanks to all who have taken the time to engage thus far. I look forward to working on this with any/all of you.

@bvdmitri
Member

Much appreciated, thanks for the extensive overview @FraserP117

@bvdmitri
Member

@kobus78

kobus78 commented May 27, 2024

I'm curious as to how important it is to have LaTeX abilities for rendering mathematical content on the CFFGs. From my (limited) research, it appears that GraphViz may not support this directly (although I did come across a LaTeX package that could help with this). A strong tool in this regard is TikZ, albeit not as abstracted as GraphViz. However, it was quite easy to come up with this simple FFG that has LaTeX content.

[Screenshot: a simple FFG with LaTeX labels, drawn with TikZ]

@FraserP117
Author

Wow thanks @bvdmitri for linking that Model Explorer stuff, I'll definitely be having a closer look at this.

Yes @kobus78, I wonder about the utility of LaTeX for this purpose too. It certainly would be nice, especially for CFFGs. There is always TikzGraphs.jl; this is an option that I have not ruled out as yet. What was the specific LaTeX package that you reference above?

@kobus78

kobus78 commented May 28, 2024

What was the specific latex package that you reference above?

It was dot2tex. I had to download it from CTAN but haven't managed to integrate it so far.

@bvdmitri
Member

I reviewed the plan again and mostly agree with everything. The "raw" style is, of course, easier to start with and experiment with, but we should remember that it won't be the default option. It's perfectly fine to start with this style to finalize the appearance of the nodes and edges before transitioning to a more complex layout.

Some extra comments:

I noticed that GraphViz.jl hasn't been updated since 2021, which makes me think the package may not be actively maintained. In contrast, GraphPlot had a new release just two weeks ago. Perhaps GraphViz.jl (or GraphvizDotLang.jl) doesn't need frequent updates if it's just a wrapper around the actual GraphViz. What are your thoughts on this, @wouterwln?

By the way, what are the scaling capabilities of GraphViz? For example, does it support millions of nodes? I would expect that it does, given its reputation and large community, but it's worth confirming. Additionally, can we integrate GraphViz with interactive UIs, such as in a browser for zooming in and out, or in VSCode? Or perhaps with the Model Explorer mentioned earlier? Maybe we can generate GraphViz code that can be displayed by a different UI provider? That would be fantastic.

My current approach generates a string-representation of the GraphViz DOT code from a GraphPPL.Model and then uses eval() and Meta.parse() to execute the string as a command in the GraphViz.jl package.

I believe this is unnecessary since GraphViz.jl supports loading DOT code directly from a string, as mentioned in their README.

Also, be aware that generating large strings in Julia can be extremely slow, which was a major bottleneck in ForneyLab. Instead of naively appending new lines, such as:

output = output * new_line

consider using IOBuffer. This approach allows for more efficient generation of large strings in Julia. I'm pointing out that the naive approach of string concatenation does not scale well because Julia's strings are immutable. To add a new line or even a single character, the entire string must be copied and duplicated with the new information.
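A small comparison of the two approaches. Both functions produce identical DOT edge lines; the first copies the whole accumulated string on every iteration, while the second appends into an IOBuffer and materialises the String once with take!. The function names are illustrative.

```julia
# Naive: each `*` allocates a new String and copies everything built so far,
# so producing n lines costs O(n^2) copying in total.
function dot_lines_concat(n::Int)
    output = ""
    for i in 1:n
        output = output * "  x_$i -- f_$i;\n"
    end
    return output
end

# Buffered: lines are appended to a growable IOBuffer and copied out once at the end.
function dot_lines_buffer(n::Int)
    io = IOBuffer()
    for i in 1:n
        print(io, "  x_", i, " -- f_", i, ";\n")
    end
    return String(take!(io))
end
```

For a few hundred lines the difference is negligible; it shows up when generating DOT for large models. Collecting lines in a Vector{String} and calling join once, or using sprint, are other idiomatic options.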

Overall, great progress, gentlemen! Let's aim to create the best graph visualizations in the entire Bayesian inference community, not just within Active Inference ;)

@wouterwln
Member

I took a look at GraphViz.jl and GraphvizDotLang.jl, and I think going through GraphViz is indeed the way to go. GraphViz itself (https://gitlab.com/graphviz/graphviz) looks like it's actively maintained, and it spawned the DOT language (https://en.wikipedia.org/wiki/DOT_(graph_description_language)), which I think we should also have as a "compilation target", so to speak. There appears to be quite a nice ecosystem around this DOT representation (https://dreampuf.github.io/GraphvizOnline/ for example), including the dot2tex package that @kobus78 mentioned. GraphViz.jl might be a good starting point to demo some initial plots and see what they look like. I'm curious how much control we can get over the placement of nodes and edges, but it looks like (https://graphviz.org/docs/layouts/dot/) there are some hints and constraints you can give to this engine. I'm more optimistic about this approach than about GraphPlot.jl (also because this DOT syntax seems to be quite general).
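As an example of such hints, the sketch below emits rankdir=LR, splines=ortho (axis-aligned, taxicab-like edges) and { rank=same; ... } groups that keep each time slice on one rank. These are standard dot attributes; the slicing of nodes into time steps and the dot_with_ranks helper are assumptions for illustration.

```julia
# Build DOT text with layout hints: ranks run left to right, edges are orthogonal,
# and each slice (e.g. one time step) is pinned to the same rank.
function dot_with_ranks(slices::Vector{Vector{String}}, edges::Vector{Pair{String,String}})
    io = IOBuffer()
    println(io, "graph G {")
    println(io, "  rankdir=LR;")
    println(io, "  splines=ortho;")
    for slice in slices
        println(io, "  { rank=same; ", join(("\"$s\"" for s in slice), "; "), "; }")
    end
    for (a, b) in edges
        println(io, "  \"$a\" -- \"$b\";")
    end
    println(io, "}")
    return String(take!(io))
end

# Usage: two time steps, each with a state and an observation.
print(dot_with_ranks([["s_1", "y_1"], ["s_2", "y_2"]], ["s_1" => "s_2", "s_1" => "y_1"]))
```

Graphviz's documentation notes that splines=ortho does not handle ports or edge labels well in dot, so this is worth testing before relying on it for CFFG-style drawings.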

@FraserP117
Author

FraserP117 commented Jun 9, 2024

Thanks again for all the suggestions and engagement, it really helps.

I'm glad to have @wouterwln's blessing on the selection of GraphViz as the backend. Indeed, it seems that we were all resolved to this in the last meeting. Of course, things are always open to change and I intend to see what I can do with dot2tex in addition to the functionality I outlined above.

Yes @bvdmitri I completely agree with the necessity to switch over from string concatenation to something like IOBuffer(). Thank you for the comments. I have subsequently made that change to the working version.

I'm attacking the issue of node depiction/naming just now and I have a question. I currently name each node with the exact label given to it by GraphPPL. This is not the ideal way to label each node. As an example, this is our current working version for the "raw" visualization of the simple Beta-Bernoulli coin toss model:

[Image: GraphViz render of the "raw" Beta-Bernoulli coin toss model]

I can also list the node labels depicted in the above GraphViz render of the underlying model:

[Image: node labels of the coin toss model]

My understanding is that GraphPPL creates the node label by adding the label's name/description as a prefix to a global integer counter of all nodes. @bvdmitri has already informed us that these global counter values are superfluous to the visualization efforts, and so I'm looking to remove them entirely.

The only thing I find strange is that each constant variable node seems to be counted twice by the global counter. You can see that "constvar_2_3" and "constvar_4_5" appear to be the result of appending two iterations of the global counter to each respective constant in the Beta distribution. Indeed, if I inspect the GraphPPL.Model's counter field, it appears to show that there are 12 nodes in the graph, when clearly there are only 10:

[Image: the GraphPPL.Model counter field for the coin toss model, indicating 12 nodes]

The Graphs.nv function returns the number of nodes in the MetaGraphsNext.MetaGraph. Using this function via MetaGraphsNext, we see that there are 10 nodes reported as belonging to the MetaGraph - coinciding with the above visual inspection:

[Image: Graphs.nv on the MetaGraphsNext.MetaGraph, reporting 10 nodes]

This leads me to suspect that GraphPPL may be double counting "constvar" variables, due to it naming them twice with the global counter? You can see the same issue with the Bayesian Linear Regression Model:

[Image: linear_model_node_labels]

My guess is that GraphPPL.NodeLabel may be applied one time too many during graph creation, for constant nodes only. I hope this makes some sense, and that I haven't simply misunderstood how GraphPPL names things under the hood.
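To make the suspected mechanism concrete, here is a toy model of it in plain Julia. None of these names (LabelCounter, next_label!, build_labels) are GraphPPL's. The toy only assumes, as hypothesised above, that a constant's label passes through the counter twice while every other node passes through once. With 8 ordinary nodes and 2 constants it reproduces the 10-nodes-versus-counter-of-12 mismatch; the ordering and exact suffixes are illustrative.

```julia
# Toy model of the suspected bug (hypothetical, not GraphPPL code): a global
# counter that is bumped once per ordinary node but twice per constant.
mutable struct LabelCounter
    value::Int
end

# Append the next counter value to `name`, incrementing the global counter.
function next_label!(c::LabelCounter, name::AbstractString)
    c.value += 1
    return string(name, "_", c.value)
end

function build_labels(n_ordinary::Int, n_constants::Int)
    c = LabelCounter(0)
    labels = String[]
    for _ in 1:n_ordinary
        push!(labels, next_label!(c, "node"))
    end
    for _ in 1:n_constants
        # Suspected double labelling: the label is passed through twice,
        # producing names like "constvar_9_10" and consuming two counter values.
        push!(labels, next_label!(c, next_label!(c, "constvar")))
    end
    return labels, c.value
end

labels, counter = build_labels(8, 2)
(length(labels), counter)  # (10, 12): 10 nodes, but the counter reads 12
```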

Many thanks, talk soon!

@bvdmitri
Member

Thanks for the update @FraserP117!

these global counter values are superfluous to the visualization efforts, and so I'm looking to remove them entirely.

I think you can use the GraphPPL.getname function to retrieve the name of a node label without its corresponding counter.

This leads me to suspect that GraphPPL may be double counting "constvar" variables

Double naming for the constants does indeed look a bit weird. We should double check what is happening there.
@wouterwln do you have the capacity to check this?

@FraserP117
Author

FraserP117 commented Jun 10, 2024

Thanks @bvdmitri, I'll look at the GraphPPL.getname function. I'll also look into the double naming issue as far as I'm able; I just haven't had time in the last day.

@wouterwln
Member

The double naming is fixed in #238 ;)

@FraserP117
Author

Excellent, just saw that! Thanks @wouterwln.

@albertpod
Member

Hey folks! I just wanted to check on the progress with this one, as the NumFocus deadline is coming up soon. Do you have a PR date in mind? :D

@chbe-helix

Hi @albertpod! Fraser and I both had some external obligations crop up that slowed progress a bit, but we are now able to resume work on this issue. We plan to meet in the next day to discuss timelines and can get back to you with a firmer estimate soon. Could you remind me when the NumFOCUS deadline is and what its requirements are, so that we can make sure we meet it?

@albertpod
Member

albertpod commented Jul 9, 2024

@chbe-helix no worries at all! The deadline is around mid-September.
I very much appreciate your effort, it's hard stuff!

@chbe-helix

chbe-helix commented Jul 9, 2024 via email

@FraserP117
Author

FraserP117 commented Jul 13, 2024

Greetings @albertpod and all. Yes, @chbe-helix and I do have some updates.

Embarrassed though I am to admit it, I was unaware that there was a NumFOCUS deadline, so it's very good to know! In response, we have created an internal ActInf-Institute page to host all information relevant to our development efforts. It can be found at this link here and should be publicly accessible.

On that page we outline the proposed functionality and a conceptual schema of the finished product, along with an implementation philosophy and a development roadmap/task tracker. For now we consider only the tasks relevant to "phase 1", which we define as everything to be completed by the NumFOCUS September deadline.

As far as @chbe-helix and I understand, this NumFOCUS deadline serves only to show that the BIASlab has received a pull request from an external party, and it is NOT intended as the deadline for any final/polished open source contribution. Please do tell us if we have misunderstood anything here.

As I mentioned, we've created a development timeline that culminates in our first pull request. Naturally, this PR is intended to provide only minimal functionality; we will not be able to deliver most of our intended contributions by this date. See the page above for details of exactly what we intend to provide in our initial PR.

We want plenty of time between our first PR and the actual deadline, so that it can be properly appraised by the core BIASlab team, tested for bugs, and so on. Our current plan is to make our initial PR in mid-to-late August (see the upper and lower bounds on the development timeline). All of this is, of course, subject to change according to your preferences and recommendations. It would be helpful to know exactly when the deadline is, so that I can make the development plan more precise.

We eagerly await your comments and recommendations on all this, and we hope that our efforts thus far look promising to you. As you no doubt know better than I, the beginning of any project is always the most important phase. We've taken great pains at this stage to dwell on conceptual issues, in the hope that the finished project will be correspondingly free of technical debt and sloppy "code-first and ask questions later" hastiness.

Many thanks from @chbe-helix and myself.

@albertpod
Member

Hi @FraserP117!

Sorry for the late reply (we were busy with JuliaCon). Thanks for your detailed update. It’s great to see the progress you and @chbe-helix have made.

To clarify, there is no explicit deadline from NumFOCUS for having one big or small PR by mid-September. The main goal is to find external contributors who are eager to participate in @ReactiveBayes development alongside BIASlab and Lazy Dynamics. A PR is a practical way to show this to NumFOCUS, hence why I mentioned it.

Regarding the deadline, you are correct that it is not intended as a deadline for a final/polished open-source contribution. The aim is to have an external contribution to demonstrate community interest and engagement.

Thanks for providing the link to your internal ActInf-Institute page. Your plan to make the initial PR in mid-to-late August sounds good. This should indeed provide enough time for the core team to appraise it, test for bugs, and provide feedback.

Again, many thanks for your work.

@FraserP117
Author

Excellent! I hope that some of your JuliaCon presentations make their way to YouTube (and that it all went well).

That's all great to hear, thanks @albertpod. I've been held up by significant personal commitments, as well as a terrible flu in the last week. I mention this only to "prove" that I have not been inactive due to a lack of interest! I am extremely keen to resolve this issue in a prompt and satisfactory manner. Indeed, I'm overjoyed to participate with the BIASlab in whatever capacity.

More updates in the coming days. As always, many thanks!

@chbe-helix

@wmkouw and @bvdmitri, I submitted PR-251 (#251) with the code that @FraserP117 and I have been working on. We need feedback on the code and on which version number we should set. While we wait for your feedback, we have one more test to run to ensure compatibility with v4.3.3. Let us know what you think!

@wmkouw
Member

wmkouw commented Sep 6, 2024

Wonderful! I will have a look after IWAI (Friday Sept 13).

@bvdmitri
Member

bvdmitri commented Sep 6, 2024

@wouterwln

Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants