Skip to content

Temporal Graph Construction C API #5161

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: branch-25.10
Choose a base branch
from

Conversation

ChuckHastings
Copy link
Collaborator

@ChuckHastings ChuckHastings commented Jul 16, 2025

This PR defines and implements the C API for temporal graph construction. It also defines the API for temporal sampling, but does not provide an implementation. That will be addressed in a subsequent PR.

Non-breaking, old applications will still function, but we will ultimately be deprecating some functions (hopefully in 25.10, if not in 25.12).

Still TBD:

  • Tests
  • cython updates

Copy link

copy-pr-bot bot commented Jul 16, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@alexbarghi-nv
Copy link
Member

This looks like it's missing the API call for constructing an edgelist with the edge data, weights, edge ids, times, etc.

@alexbarghi-nv
Copy link
Member

Also, similarly to biased sampling, could we include a placeholder edge time argument?

@ChuckHastings
Copy link
Collaborator Author

This looks like it's missing the API call for constructing an edgelist with the edge data, weights, edge ids, times, etc.

I think it's there - a bit different from the current approach. I'm going to rename some things and I'll write a test example to show you how I intend it to work.

@ChuckHastings
Copy link
Collaborator Author

Also, similarly to biased sampling, could we include a placeholder edge time argument?

I am missing an argument there, I'll add it in the next push. But to clarify

  • did you intend to pass a different time for filtering than what's currently in the graph?
  • did you want to be able to select an edge property that's already in the graph to control the time filtering?
  • did you want some other edge time notion (perhaps a starting edge time for filtering, or something like that)?

@alexbarghi-nv
Copy link
Member

Also, similarly to biased sampling, could we include a placeholder edge time argument?

I am missing an argument there, I'll add it in the next push. But to clarify

  • did you intend to pass a different time for filtering than what's currently in the graph?
  • did you want to be able to select an edge property that's already in the graph to control the time filtering?
  • did you want some other edge time notion (perhaps a starting edge time for filtering, or something like that)?

In the PyG API, you can pass any edge property as the edge time when creating a loader. If, for whatever reason, the user were to create two loaders - one with the times from one property, and one with the times from a different property, this would result in two cuGraph graphs.

I'm not sure how often users would do that, but in case they did, it would be nice not to have to create two graphs, and just switch out the property used for sampling. We'd have to allow it to be a property added after graph creation, since the FeatureStore is mutable.

@alexbarghi-nv
Copy link
Member

Oh, and another thing I noticed when double-checking the PyG API, for future reference, is that it does allow node-based temporal sampling (which we don't plan to support yet). In node-based temporal sampling, each hop's chosen nodes have to have a time matching the temporal constraint (>, >=, <=, or < the previous hop).

In case we ever decide to implement that, it might be good to clarify that this API only supports edge-based temporal sampling. Sorry for the oversight on this - there are just so many parameters to the PyG loaders which makes it hard to keep track of what is/isn't supported.

@ChuckHastings
Copy link
Collaborator Author

Also, similarly to biased sampling, could we include a placeholder edge time argument?

I am missing an argument there, I'll add it in the next push. But to clarify

  • did you intend to pass a different time for filtering than what's currently in the graph?
  • did you want to be able to select an edge property that's already in the graph to control the time filtering?
  • did you want some other edge time notion (perhaps a starting edge time for filtering, or something like that)?

In the PyG API, you can pass any edge property as the edge time when creating a loader. If, for whatever reason, the user were to create two loaders - one with the times from one property, and one with the times from a different property, this would result in two cuGraph graphs.

I'm not sure how often users would do that, but in case they did, it would be nice not to have to create two graphs, and just switch out the property used for sampling. We'd have to allow it to be a property added after graph creation, since the FeatureStore is mutable.

I'd be inclined to not add a parameter for this (and perhaps that means the extra biased sampling parameter should be refactored as well, although since it's not used yet that's not a huge problem).

The API proposed in this PR allows for creating the graph with an arbitrary collection of edge properties, and selecting which edge property to apply. If you know (at graph construction time) both of the edge properties that will be used for time you can create the graph with both edge properties and pass in the string label identifying the property you want to use.

There have been many conversations about adding a property to a graph after it has been created. That feature is not currently supported. Assuming we add support for that we'll be able to add the property (with it's own name) to the graph and you would again be able to use the string label that I've added for that purpose.

We'll certainly consider adding this feature for dynamic graphs, it seems like an important feature that would be required. Doing it also for static graphs is reasonable. The only problem with adding an edge property to an existing graph (in either static or dynamic case) comes when your input is a multi-graph. We could, I think, easily support the feature for a non-multi-graph. For the multi-graph we would need some means through the API to identify uniqueness. Perhaps requiring an edge id in multi-graphs to support this feature would be sufficient.

@ChuckHastings
Copy link
Collaborator Author

ChuckHastings commented Jul 21, 2025

Oh, and another thing I noticed when double-checking the PyG API, for future reference, is that it does allow node-based temporal sampling (which we don't plan to support yet). In node-based temporal sampling, each hop's chosen nodes have to have a time matching the temporal constraint (>, >=, <=, or < the previous hop).

In case we ever decide to implement that, it might be good to clarify that this API only supports edge-based temporal sampling. Sorry for the oversight on this - there are just so many parameters to the PyG loaders which makes it hard to keep track of what is/isn't supported.

There is a comment in the documentation that these are temporal sampling based on edge times. Temporal sampling based on vertex times would need to be a different algorithm as the filtering would be different. Intuitively I want to say it should be a simple clone of the current code with different filtering logic... although I've only given it about 2 minutes of thought.

@ChuckHastings ChuckHastings reopened this Jul 23, 2025
@ChuckHastings ChuckHastings marked this pull request as ready for review July 23, 2025 20:23
@ChuckHastings ChuckHastings requested a review from a team as a code owner July 23, 2025 20:23
@ChuckHastings ChuckHastings added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jul 23, 2025
bool_t symmetrize,
bool_t do_expensive_check,
cugraph_graph_t** graph,
cugraph_error_t** error);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we better add parameters to use large memory buffer (even though we may not support it in this PR) to avoid future breaking changes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have drafted a dramatic refactoring of this API that I plan on for 25.10. The draft API includes adding a flag for large memory buffer. That will be a big breaking change. But it was too much to get into this PR.

Since we're going to need another breaking change, I was thinking I would leave it out of this PR.

typedef enum {
STRICTLY_INCREASING = 0, /** Time strictly increasing (each time is after the previous one) */
MONOTONICALLY_INCREASING, /** Time monotonically increasing (could have multiple edges with same
time) */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, does this mean non-decreasing? Hard to distinguish strictly increasing vs monotonically increasing without reading the documentation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. My memory wanted to use monotonically non-decreasing. An internet search suggested that more modern terminology uses strictly increasing for always increasing and monotonically increasing for always non-decreasing.

I'm open to any labels we can be happy with.

@@ -61,7 +61,8 @@ struct bfs_functor : public abstract_functor {
template <typename vertex_t,
typename edge_t,
typename weight_t,
typename edge_type_t,
typename edge_type_type_t,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any rational behind renaming edge_type_t to edge_type_type_t? We are using edge_type_t in the rest of the codebase, and I think this creates some inconsistencies and potential sources for confusion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big copy paste between algorithms. Some algorithms used edge_type_t and others used edge_type_type_t. I'll switch them all to use edge_type_t, it's more concise.

@ChuckHastings ChuckHastings changed the base branch from branch-25.08 to branch-25.10 July 24, 2025 14:48
@ChuckHastings ChuckHastings changed the title Temporal sampling C API Temporal Graph Construction C API Jul 24, 2025
@ChuckHastings ChuckHastings requested a review from a team as a code owner July 24, 2025 17:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuGraph improvement Improvement / enhancement to an existing function non-breaking Non-breaking change python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants