-
Notifications
You must be signed in to change notification settings - Fork 332
Temporal Graph Construction C API #5161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: branch-25.10
Are you sure you want to change the base?
Temporal Graph Construction C API #5161
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
This looks like it's missing the API call for constructing an edgelist with the edge data, weights, edge ids, times, etc. |
Also, similarly to biased sampling, could we include a placeholder edge time argument? |
I think it's there - a bit different from the current approach. I'm going to rename some things and I'll write a test example to show you how I intend it to work. |
I am missing an argument there, I'll add it in the next push. But to clarify
|
In the PyG API, you can pass any edge property as the edge time when creating a loader. If, for whatever reason, the user were to create two loaders - one with the times from one property, and one with the times from a different property, this would result in two cuGraph graphs. I'm not sure how often users would do that, but in case they did, it would be nice not to have to create two graphs, and just switch out the property used for sampling. We'd have to allow it to be a property added after graph creation, since the |
Oh, and another thing I noticed when double-checking the PyG API, for future reference, is that it does allow node-based temporal sampling (which we don't plan to support yet). In node-based temporal sampling, each hop's chosen nodes have to have a time matching the temporal constraint (>, >=, <=, or < the previous hop). In case we ever decide to implement that, it might be good to clarify that this API only supports edge-based temporal sampling. Sorry for the oversight on this - there are just so many parameters to the PyG loaders which makes it hard to keep track of what is/isn't supported. |
I'd be inclined to not add a parameter for this (and perhaps that means the extra biased sampling parameter should be refactored as well, although since it's not used yet that's not a huge problem). The API proposed in this PR allows for creating the graph with an arbitrary collection of edge properties, and selecting which edge property to apply. If you know (at graph construction time) both of the edge properties that will be used for time you can create the graph with both edge properties and pass in the string label identifying the property you want to use. There have been many conversations about adding a property to a graph after it has been created. That feature is not currently supported. Assuming we add support for that we'll be able to add the property (with it's own name) to the graph and you would again be able to use the string label that I've added for that purpose. We'll certainly consider adding this feature for dynamic graphs, it seems like an important feature that would be required. Doing it also for static graphs is reasonable. The only problem with adding an edge property to an existing graph (in either static or dynamic case) comes when your input is a multi-graph. We could, I think, easily support the feature for a non-multi-graph. For the multi-graph we would need some means through the API to identify uniqueness. Perhaps requiring an edge id in multi-graphs to support this feature would be sufficient. |
There is a comment in the documentation that these are temporal sampling based on edge times. Temporal sampling based on vertex times would need to be a different algorithm as the filtering would be different. Intuitively I want to say it should be a simple clone of the current code with different filtering logic... although I've only given it about 2 minutes of thought. |
319c451
to
07202f4
Compare
bool_t symmetrize, | ||
bool_t do_expensive_check, | ||
cugraph_graph_t** graph, | ||
cugraph_error_t** error); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we better add parameters to use large memory buffer (even though we may not support it in this PR) to avoid future breaking changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have drafted a dramatic refactoring of this API that I plan on for 25.10. The draft API includes adding a flag for large memory buffer. That will be a big breaking change. But it was too much to get into this PR.
Since we're going to need another breaking change, I was thinking I would leave it out of this PR.
typedef enum { | ||
STRICTLY_INCREASING = 0, /** Time strictly increasing (each time is after the previous one) */ | ||
MONOTONICALLY_INCREASING, /** Time monotonically increasing (could have multiple edges with same | ||
time) */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, does this mean non-decreasing? Hard to distinguish strictly increasing vs monotonically increasing without reading the documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. My memory wanted to use monotonically non-decreasing. An internet search suggested that more modern terminology uses strictly increasing for always increasing and monotonically increasing for always non-decreasing.
I'm open to any labels we can be happy with.
cpp/src/c_api/bfs.cpp
Outdated
@@ -61,7 +61,8 @@ struct bfs_functor : public abstract_functor { | |||
template <typename vertex_t, | |||
typename edge_t, | |||
typename weight_t, | |||
typename edge_type_t, | |||
typename edge_type_type_t, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any rational behind renaming edge_type_t
to edge_type_type_t
? We are using edge_type_t
in the rest of the codebase, and I think this creates some inconsistencies and potential sources for confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Big copy paste between algorithms. Some algorithms used edge_type_t and others used edge_type_type_t. I'll switch them all to use edge_type_t, it's more concise.
This PR defines and implements the C API for temporal graph construction. It also defines the API for temporal sampling, but does not provide an implementation. That will be addressed in a subsequent PR.
Non-breaking, old applications will still function, but we will ultimately be deprecating some functions (hopefully in 25.10, if not in 25.12).
Still TBD: