support heterogenous fanout type #4608

Open
wants to merge 123 commits into
base: branch-24.12
Changes from 2 commits
Commits
0adb2fd
support heterogenous fanout type
jnke2016 Aug 13, 2024
bb5a3e2
remove unusued code
jnke2016 Aug 13, 2024
10fa86d
fix style
jnke2016 Aug 13, 2024
f904350
create one API for both uniform and biased neighborhood sampling
jnke2016 Aug 20, 2024
1fc32c3
use the same function for both uniform and biased nieghborhood sampling
jnke2016 Aug 20, 2024
8fc21f8
add support for heterogenous fanout support at the plc layer and cons…
jnke2016 Aug 20, 2024
01a57f3
remove outdated codes
jnke2016 Aug 20, 2024
3a6aeb2
add flag differentiating between biased and uniform sampling
jnke2016 Aug 21, 2024
d2f6467
update docstrings and rename variable
jnke2016 Aug 21, 2024
5d25155
rename variable
jnke2016 Aug 21, 2024
80f8b86
create new tuple type
jnke2016 Aug 21, 2024
50e0fc5
remove unnecessary check
jnke2016 Aug 21, 2024
9f455bf
add constructor converting from array_view_t to array_t
jnke2016 Aug 21, 2024
d114534
leverage new constructor and remove unnecessary code
jnke2016 Aug 21, 2024
cf4a3ae
ensure edge types are ordered in increasing order
jnke2016 Aug 21, 2024
bc87b50
update docstrings
jnke2016 Aug 21, 2024
3013684
update docstrings
jnke2016 Aug 21, 2024
d6b6234
undo changes to uniform neighbor sample
jnke2016 Aug 22, 2024
068b0a3
undo changes to uniform neighbor sample
jnke2016 Aug 22, 2024
6920f65
update docstrings
jnke2016 Aug 22, 2024
760c5cd
re-order arguments
jnke2016 Aug 22, 2024
1e0ef27
remove outdated comments
jnke2016 Aug 22, 2024
de79620
add arguments and type check
jnke2016 Aug 23, 2024
8c17009
rename variable for consistency
jnke2016 Aug 23, 2024
7b95c5e
update neighbor sample API
jnke2016 Aug 30, 2024
19fc765
remove outdated code
jnke2016 Aug 30, 2024
e30766c
remove outdated comment
jnke2016 Aug 30, 2024
5dd66f2
first cut at new sampling function definition to clean up things befo…
ChuckHastings Sep 4, 2024
4b2764c
updates to remove builder pattern, also rename functions and mark old…
ChuckHastings Sep 5, 2024
4c1c610
add implementation of heterogeneous neighborhood sampling
jnke2016 Sep 9, 2024
fe35c80
add exit condition
jnke2016 Sep 9, 2024
a658b29
remove comments
jnke2016 Sep 10, 2024
e52a38a
Add Implementation
ChuckHastings Sep 11, 2024
c416439
call heterogeneous renumbering
jnke2016 Sep 13, 2024
98d6c57
update branch and call heterogneous renumbering
jnke2016 Sep 13, 2024
d7165af
update heterogeneous renumbering call
jnke2016 Sep 17, 2024
579fd0a
create a csr data structure to efficiently store vertex and label
jnke2016 Sep 17, 2024
5cdf40a
update API and docstring
jnke2016 Sep 17, 2024
a8fbd9d
remove unsued variable
jnke2016 Sep 17, 2024
9d5b3dd
update C++ API for neighbor sampling
jnke2016 Sep 20, 2024
0358c6e
add fixme for deprecated flags
jnke2016 Sep 20, 2024
799c35d
update CAPI
jnke2016 Sep 20, 2024
ab8aa72
undo changes to k-truss
jnke2016 Sep 21, 2024
7d8b5ad
undo changes to tests
jnke2016 Sep 21, 2024
f2190ba
clean up code
jnke2016 Sep 21, 2024
1e96dcf
update docs
jnke2016 Sep 23, 2024
36c25ad
fix typo
jnke2016 Sep 23, 2024
4857b36
call scatter instead of gather and fix type bug
jnke2016 Sep 23, 2024
263b6ac
fix typo
jnke2016 Sep 23, 2024
9dff3ab
update neighbor sample API
jnke2016 Sep 24, 2024
33c8b3d
update CAPI
jnke2016 Sep 25, 2024
e357f42
remove unsued code
jnke2016 Sep 25, 2024
6081978
remove outdated comment
jnke2016 Sep 25, 2024
73b3ffe
remove unnecessary copy
jnke2016 Sep 25, 2024
ea972f3
remove outdate arguments
jnke2016 Sep 26, 2024
8822192
fix typo
jnke2016 Sep 27, 2024
e02a513
update plc API of heterogeneous neighbor sample
jnke2016 Sep 27, 2024
d6cb1d5
fix typo
jnke2016 Sep 27, 2024
54fa155
change back the fanout type from a sparse to a dense structure
jnke2016 Sep 27, 2024
499e041
fix typo
jnke2016 Sep 27, 2024
b571deb
add implementation of heterogeneous/homogeneous biased/uniform neighb…
jnke2016 Sep 27, 2024
f6c4ce3
properly handle edge types
jnke2016 Sep 27, 2024
e71660d
add tests for 'homogeneous_uniform_neighbor_sampling'
jnke2016 Sep 27, 2024
4e2c8cf
add tests for homogeneous_biased_neighbor_sampling.cpp
jnke2016 Sep 27, 2024
2458149
update type combination
jnke2016 Sep 27, 2024
df3e4ff
add tests for heterogeneous uniform/biased neighborhood sampling
jnke2016 Sep 28, 2024
d4847e4
properly sample with edge types
jnke2016 Sep 28, 2024
dc2c9ba
remove outdated tests
jnke2016 Sep 28, 2024
c01f4e4
add SG python implementation of neighborhood sampling both homogeneou…
jnke2016 Sep 30, 2024
dabd0c8
remove unused argument
jnke2016 Sep 30, 2024
95ca286
add tests for homogeneous uniform neighborhood sampling
jnke2016 Oct 16, 2024
383bfc4
add method to fill a buffer array with a scalar
jnke2016 Oct 21, 2024
68fa2f1
add method to sort and count unique elements in buffer array
jnke2016 Oct 21, 2024
18899ed
add method to sort and count unique elements in buffer array
jnke2016 Oct 21, 2024
57e6f96
update computation of map from label to comm rank
jnke2016 Oct 21, 2024
d34d85c
perform allgatherv of the local mapping from label to comm rank
jnke2016 Oct 22, 2024
2aa0903
udpate tests for 'mg_homogeneous_uniform_neighbor_sampling'
jnke2016 Oct 22, 2024
fa0cb88
update neighbor sampling call for 'NO_CUGRAPH_OPS'
jnke2016 Oct 22, 2024
990d2ed
remove outdated code
jnke2016 Oct 22, 2024
d44a46f
udpate tests for 'mg_homogeneous_biased_neighbor_sampling'
jnke2016 Oct 22, 2024
a1a6180
add mg tests for heterogeneous uniform and biased neighborhood sampling
jnke2016 Oct 22, 2024
36ce4fc
add new tests to CMakeLists
jnke2016 Oct 22, 2024
c0a618f
remove unsued variable
jnke2016 Oct 22, 2024
965001b
fix illegal memory access
jnke2016 Oct 22, 2024
a005a19
update branch with the latest changes
jnke2016 Oct 22, 2024
11f9c40
fix style
jnke2016 Oct 22, 2024
6b547fb
fix style
jnke2016 Oct 22, 2024
14e9a99
update cmakelist
jnke2016 Oct 22, 2024
aebfd08
update type combination
jnke2016 Oct 22, 2024
b159e1b
fix symbol lookup error
jnke2016 Oct 30, 2024
3b0c016
leverage raft span instead of raw pointers
jnke2016 Oct 30, 2024
da82567
remove python implementation of heterogeneous neighborhood sampling a…
jnke2016 Oct 30, 2024
5b1bbb4
undo changes to uniform neighbor sampling
jnke2016 Oct 30, 2024
6d69f88
remove o utdated fixme
jnke2016 Oct 30, 2024
79d4527
remove unnecessary call
jnke2016 Oct 30, 2024
2a66928
remove unnecessary blank line
jnke2016 Oct 30, 2024
8c3d871
add comments for deprecated functions
jnke2016 Oct 30, 2024
ae92c9f
remove obsolete instantiation
jnke2016 Oct 30, 2024
d7d6109
remove unnecessary parenthesis
jnke2016 Oct 30, 2024
6b3ffbd
remove obsolete instantiation
jnke2016 Oct 30, 2024
67d7d0a
fix style
jnke2016 Oct 30, 2024
3386230
Merge remote-tracking branch 'upstream/branch-24.12' into branch-24.1…
jnke2016 Oct 30, 2024
413a577
fix type error
jnke2016 Oct 30, 2024
22db98d
fix import error
jnke2016 Oct 30, 2024
6a95852
remove redundant tests
jnke2016 Oct 31, 2024
4f5dc3e
remove hardcoded path
jnke2016 Oct 31, 2024
334ce6d
Merge remote-tracking branch 'upstream/branch-24.12' into branch-24.1…
jnke2016 Oct 31, 2024
9b4683a
fix style
jnke2016 Oct 31, 2024
8cb0c94
rename 'd_value' to 'd_span'
jnke2016 Nov 1, 2024
77303d5
use 'label_list' as a map form 'comm_rank' to 'label_map'
jnke2016 Nov 5, 2024
117a4ec
use 'label_list' as a map form 'comm_rank' to 'label_map'
jnke2016 Nov 5, 2024
ac66f13
add module biased_neighbor_sample
jnke2016 Nov 6, 2024
114bf56
rename variable
jnke2016 Nov 7, 2024
63a59ca
avoid creating function that compile all types and be more explicit w…
jnke2016 Nov 7, 2024
84face3
remove unsued function
jnke2016 Nov 8, 2024
ab853e5
declare homogeneous functions first
jnke2016 Nov 8, 2024
802d9b0
rename variable
jnke2016 Nov 8, 2024
c5bec5f
remove duplicated functions
jnke2016 Nov 8, 2024
4bd09f6
reorder variable declaration
jnke2016 Nov 8, 2024
5d6cb34
add fixme for not testing edge masking
jnke2016 Nov 8, 2024
24e31cb
remove outdated fixme
jnke2016 Nov 8, 2024
7ca5d59
fix style
jnke2016 Nov 8, 2024
64b7edc
Merge remote-tracking branch 'upstream/branch-24.12' into branch-24.1…
jnke2016 Nov 8, 2024
223 changes: 222 additions & 1 deletion cpp/include/cugraph/sampling_functions.hpp
@@ -129,14 +129,123 @@ uniform_neighbor_sample(
std::optional<raft::device_span<label_t const>> starting_vertex_labels,
std::optional<std::tuple<raft::device_span<label_t const>, raft::device_span<int32_t const>>>
label_to_output_comm_rank,
raft::host_span<int32_t const> fan_out,
std::optional<raft::host_span<int32_t const>> fan_out,
std::optional<std::tuple<raft::host_span<int32_t const>, raft::host_span<int32_t const>>> heterogeneous_fan_out,
raft::random::RngState& rng_state,
bool return_hops,
bool with_replacement = true,
prior_sources_behavior_t prior_sources_behavior = prior_sources_behavior_t::DEFAULT,
bool dedupe_sources = false,
bool do_expensive_check = false);

#if 0
/* FIXME:
There are two options to support heterogeneous fanout
Review comment (Collaborator):

Here's another option to explore.

Create a new function called neighbor_sample. Create it off of the biased sampling API, but with the following changes:

  1. the biases become optional instead of required. Then it can do either uniform or biased in the same call just by whether the biases are included or not
  2. the fanout and heterogeneous fanout as you have defined. Or we might explore using std::variant, where it would either take host_span or tuple of host span and make the right choice internally
  3. Move the rng_state parameter to be right after the handle (before the graph_view). This feels like a better standard place for the parameter.

We can then mark the existing uniform_neighbor_sample and biased_neighbor_sample as deprecated. When we implement this, the internal C++ implementation of the old functions can just call the new neighbor_sample with the parameters properly configured. This makes it a non-breaking change (eventually we'll drop the old functions) while still increasing code reuse.

Thoughts @seunghwak ?
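
To make point 2 concrete, here is a minimal sketch of the std::variant idea. It is not code from this PR: `host_ints` stands in for raft::host_span<int32_t const> so the example compiles on its own, and `fanout_arg_t`, `is_heterogeneous`, and `num_fanout_values` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <type_traits>
#include <variant>
#include <vector>

// Stand-in for raft::host_span<int32_t const> (assumption, keeps the sketch self-contained).
using host_ints = std::vector<int32_t>;

// Either one fanout per hop, or a tuple of (per-edge-type sizes, flattened fanout values).
using fanout_arg_t = std::variant<host_ints, std::tuple<host_ints, host_ints>>;

// True when the caller passed the tuple form.
inline bool is_heterogeneous(fanout_arg_t const& fan_out)
{
  return std::holds_alternative<std::tuple<host_ints, host_ints>>(fan_out);
}

// Dispatch on the active alternative and report how many fanout values were supplied.
inline std::size_t num_fanout_values(fanout_arg_t const& fan_out)
{
  return std::visit(
    [](auto const& f) -> std::size_t {
      using T = std::decay_t<decltype(f)>;
      if constexpr (std::is_same_v<T, host_ints>) {
        return f.size();  // homogeneous: one value per hop
      } else {
        assert(!std::get<0>(f).empty());  // at least one edge type is expected
        return std::get<1>(f).size();     // heterogeneous: flattened values
      }
    },
    fan_out);
}
```

With this shape the public signature takes a single fanout parameter, and the choice between the two code paths happens once, inside the function.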

Review comment (Contributor, @seunghwak, Aug 14, 2024):

  1. the biases become optional instead of required. Then it can do either uniform or biased in the same call just by whether the biases are included or not

=> In this case, we may update the existing non-heterogeneous fanout type sampling functions as well. i.e. combine the uniform & biased sampling functions. Not sure about the optimal balancing point between creating too many functions vs creating a function with too many input parameters.

Review comment (Contributor):

Yeah... I guess we should avoid creating an overly busy function (one function handling all the different types of sampling based on its input arguments, with excessive use of std::variant & std::optional), but we should also avoid creating too many functions... Not sure what the optimal balancing point is...

Review comment (Contributor):

In theory, if adding new parameters exponentially increases code complexity (to handle all possible combinations of optional parameters), we had better create separate functions. If supporting an additional optional parameter requires only a minor change in the API and implementation, we may create one generic function (or we may create one complex function in the detail namespace that handles all the different options, with multiple public functions calling it, if this helps reduce code replication).
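
The last alternative (one shared function in the detail namespace behind several narrow public functions) could look roughly like the sketch below. Everything here is a simplified assumption rather than the PR's code: the functions return a descriptive string instead of sampled edges, plain vectors replace the real views and spans, and the `sketch` namespace and signatures are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

namespace sketch {
namespace detail {

// Hypothetical shared implementation: all option combinations are handled here only.
inline std::string neighbor_sample_impl(std::optional<std::vector<float>> const& edge_biases,
                                        std::vector<int32_t> const& fan_out,
                                        bool heterogeneous)
{
  assert(!fan_out.empty());
  std::string kind = heterogeneous ? "heterogeneous_" : "homogeneous_";
  kind += edge_biases.has_value() ? "biased" : "uniform";
  return kind;  // a real implementation would return the sampled edges
}

}  // namespace detail

// Thin public entry points: each one fixes the options it does not expose.
inline std::string homogeneous_uniform_neighbor_sample(std::vector<int32_t> const& fan_out)
{
  return detail::neighbor_sample_impl(std::nullopt, fan_out, false);
}

inline std::string heterogeneous_biased_neighbor_sample(std::vector<float> const& edge_biases,
                                                        std::vector<int32_t> const& fan_out)
{
  return detail::neighbor_sample_impl(edge_biases, fan_out, true);
}

}  // namespace sketch
```

Each public signature stays small, while the combinatorial handling of optional inputs lives in one place.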

1) Create a new function 'heterogeneous_uniform_neighbor_sample' which will take as input
only heterogeneous fanout type. Drawback: code redundancy
2) Update 'uniform_neighbor_sample' to support both fanout types
*/
/**
* @brief Heterogeneous uniform Neighborhood Sampling.
*
* This function traverses from a set of starting vertices, traversing outgoing edges and
* randomly selects from these outgoing neighbors to extract a subgraph.
*
* Output from this function is a tuple of vectors (src, dst, weight, edge_id, edge_type, hop,
* label, offsets), identifying the randomly selected edges. src is the source vertex, dst is the
* destination vertex, weight (optional) is the edge weight, edge_id (optional) identifies the edge
* id, edge_type (optional) identifies the edge type, hop identifies which hop the edge was
* encountered in. The label output (optional) identifies the vertex label. The offsets array
* (optional) will be described below and is dependent upon the input parameters.
*
* If @p starting_vertex_labels is not specified then no organization is applied to the output, the
* label and offsets values in the return set will be std::nullopt.
*
* If @p starting_vertex_labels is specified and @p label_to_output_comm_rank is not specified then
* the label output has values. This will also result in the output being sorted by vertex label.
* The offsets array in the return will be a CSR-style offsets array to identify the beginning of
* each label range in the data. `labels.size() == (offsets.size() - 1)`.
*
* If @p starting_vertex_labels is specified and @p label_to_output_comm_rank is specified then the
* label output has values. This will also result in the output being sorted by vertex label. The
* offsets array in the return will be a CSR-style offsets array to identify the beginning of each
* label range in the data. `labels.size() == (offsets.size() - 1)`. Additionally, the data will
* be shuffled so that all data with a particular label will be on the specified rank.
*
* @tparam vertex_t Type of vertex identifiers. Needs to be an integral type.
* @tparam edge_t Type of edge identifiers. Needs to be an integral type.
* @tparam weight_t Type of edge weights. Needs to be a floating point type.
* @tparam edge_type_t Type of edge type. Needs to be an integral type.
* @tparam label_t Type of label. Needs to be an integral type.
* @tparam store_transposed Flag indicating whether sources (if false) or destinations (if
* true) are major indices
* @tparam multi_gpu Flag indicating whether template instantiation should target single-GPU (false)
* or multi-GPU (true)
* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* @param graph_view Graph View object to generate NBR Sampling on.
* @param edge_weight_view Optional view object holding edge weights for @p graph_view.
* @param edge_id_view Optional view object holding edge ids for @p graph_view.
* @param edge_type_view Optional view object holding edge types for @p graph_view.
* @param starting_vertices Device span of starting vertex IDs for the sampling.
* In a multi-gpu context the starting vertices should be local to this GPU.
* @param starting_vertex_labels Optional device span of labels associated with each starting vertex
* for the sampling.
* @param label_to_output_comm_rank Optional tuple of device spans mapping label to a particular
* output rank. Element 0 of the tuple identifies the label, Element 1 of the tuple identifies the
* output rank. The label span must be sorted in ascending order.
* @param fan_out Tuple of host spans mapping each edge type to fanout values. Element 0
* of the tuple defines the size of the fanout per edge type while element 1 defines the branching out
* (fan-out) degree per edge type for each level.
* @param rng_state A pre-initialized raft::RngState object for generating random numbers
* @param return_hops boolean flag specifying if the hop information should be returned
* @param prior_sources_behavior Enum type defining how to handle prior sources, (defaults to
* DEFAULT)
* @param dedupe_sources boolean flag, if true then if a vertex v appears as a destination in hop X
* multiple times with the same label, it will only be passed once (for each label) as a source
* for the next hop. Default is false.
* @param with_replacement boolean flag specifying if random sampling is done with replacement
* (true); or, without replacement (false); default = true;
* @param do_expensive_check A flag to run expensive checks for input arguments (if set to `true`).
* @return tuple device vectors (vertex_t source_vertex, vertex_t destination_vertex,
* optional weight_t weight, optional edge_t edge id, optional edge_type_t edge type,
* optional int32_t hop, optional label_t label, optional size_t offsets)
*/

// tuple with 3 elements. 1 - edge_type, 2- host span of size_t, 3 - fanout vector
template <typename vertex_t,
typename edge_t,
typename weight_t,
typename edge_type_t,
typename label_t,
bool store_transposed,
bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>,
rmm::device_uvector<vertex_t>,
std::optional<rmm::device_uvector<weight_t>>,
std::optional<rmm::device_uvector<edge_t>>,
std::optional<rmm::device_uvector<edge_type_t>>,
std::optional<rmm::device_uvector<int32_t>>,
std::optional<rmm::device_uvector<label_t>>,
std::optional<rmm::device_uvector<size_t>>>
heterogeneous_uniform_neighbor_sample(
raft::handle_t const& handle,
graph_view_t<vertex_t, edge_t, store_transposed, multi_gpu> const& graph_view,
std::optional<edge_property_view_t<edge_t, weight_t const*>> edge_weight_view,
std::optional<edge_property_view_t<edge_t, edge_t const*>> edge_id_view,
std::optional<edge_property_view_t<edge_t, edge_type_t const*>> edge_type_view,
raft::device_span<vertex_t const> starting_vertices,
std::optional<raft::device_span<label_t const>> starting_vertex_labels,
std::optional<std::tuple<raft::device_span<label_t const>, raft::device_span<int32_t const>>>
label_to_output_comm_rank,
std::tuple<raft::host_span<int32_t const>, raft::host_span<int32_t const>> fan_out,
raft::random::RngState& rng_state,
bool return_hops,
bool with_replacement = true,
prior_sources_behavior_t prior_sources_behavior = prior_sources_behavior_t::DEFAULT,
bool dedupe_sources = false,
bool do_expensive_check = false);
#endif
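
The @p fan_out tuple documented above can be read as a flattened, CSR-like table. The sketch below shows one possible interpretation and is an assumption, not the PR's implementation: element 0 is taken to hold the number of fanout values listed for each edge type, and element 1 the per-hop fanouts of edge type 0, then edge type 1, and so on. `host_ints` and `fanout_for` are hypothetical names, and std::vector replaces raft::host_span.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <tuple>
#include <vector>

// Stand-in for raft::host_span<int32_t const> (assumption).
using host_ints = std::vector<int32_t>;

// Look up the fanout for a given edge type and hop under the assumed layout:
// sizes[t] values belong to edge type t, stored back to back in values.
inline int32_t fanout_for(std::tuple<host_ints, host_ints> const& fan_out,
                          std::size_t edge_type,
                          std::size_t hop)
{
  auto const& [sizes, values] = fan_out;
  assert(edge_type < sizes.size());
  assert(hop < static_cast<std::size_t>(sizes[edge_type]));
  // Summing the sizes of all earlier edge types gives the start of this type's block.
  auto const start = std::accumulate(sizes.begin(), sizes.begin() + edge_type, std::size_t{0});
  assert(start + hop < values.size());
  return values[start + hop];
}
```

For example, sizes {2, 3} with values {10, 5, 8, 4, 2} would describe fanouts {10, 5} for edge type 0 and {8, 4, 2} for edge type 1.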

/**
* @brief Biased Neighborhood Sampling.
*
@@ -240,6 +349,118 @@ biased_neighbor_sample(
bool dedupe_sources = false,
bool do_expensive_check = false);

#if 0
/* FIXME:
There are two options to support heterogeneous fanout
1) Create a new function 'heterogeneous_biased_neighbor_sample' which will take as input
only heterogeneous fanout type. Drawback: code redundancy
2) Update 'biased_neighbor_sample' to support both fanout types
*/
/**
* @brief Heterogeneous biased Neighborhood Sampling.
*
* This function traverses from a set of starting vertices, traversing outgoing edges and
* randomly selects (with edge biases) from these outgoing neighbors to extract a subgraph.
*
* Output from this function is a tuple of vectors (src, dst, weight, edge_id, edge_type, hop,
* label, offsets), identifying the randomly selected edges. src is the source vertex, dst is the
* destination vertex, weight (optional) is the edge weight, edge_id (optional) identifies the edge
* id, edge_type (optional) identifies the edge type, hop identifies which hop the edge was
* encountered in. The label output (optional) identifies the vertex label. The offsets array
* (optional) will be described below and is dependent upon the input parameters.
*
* If @p starting_vertex_labels is not specified then no organization is applied to the output, the
* label and offsets values in the return set will be std::nullopt.
*
* If @p starting_vertex_labels is specified and @p label_to_output_comm_rank is not specified then
* the label output has values. This will also result in the output being sorted by vertex label.
* The offsets array in the return will be a CSR-style offsets array to identify the beginning of
* each label range in the data. `labels.size() == (offsets.size() - 1)`.
*
* If @p starting_vertex_labels is specified and @p label_to_output_comm_rank is specified then the
* label output has values. This will also result in the output being sorted by vertex label. The
* offsets array in the return will be a CSR-style offsets array to identify the beginning of each
* label range in the data. `labels.size() == (offsets.size() - 1)`. Additionally, the data will
* be shuffled so that all data with a particular label will be on the specified rank.
*
* @tparam vertex_t Type of vertex identifiers. Needs to be an integral type.
* @tparam edge_t Type of edge identifiers. Needs to be an integral type.
* @tparam weight_t Type of edge weights. Needs to be a floating point type.
* @tparam edge_type_t Type of edge type. Needs to be an integral type.
* @tparam label_t Type of label. Needs to be an integral type.
* @tparam store_transposed Flag indicating whether sources (if false) or destinations (if
* true) are major indices
* @tparam multi_gpu Flag indicating whether template instantiation should target single-GPU (false)
* or multi-GPU (true)
* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* @param graph_view Graph View object to generate NBR Sampling on.
* @param edge_weight_view Optional view object holding edge weights for @p graph_view.
* @param edge_id_view Optional view object holding edge ids for @p graph_view.
* @param edge_type_view Optional view object holding edge types for @p graph_view.
* @param edge_bias_view View object holding edge biases (to be used in biased sampling) for @p
* graph_view. Bias values should be non-negative and the sum of edge bias values from any vertex
* should not exceed std::numeric_limits<bias_t>::max(). 0 bias value indicates that the
* corresponding edge can never be selected.
* @param starting_vertices Device span of starting vertex IDs for the sampling.
* In a multi-gpu context the starting vertices should be local to this GPU.
* @param starting_vertex_labels Optional device span of labels associated with each starting vertex
* for the sampling.
* @param label_to_output_comm_rank Optional tuple of device spans mapping label to a particular
* output rank. Element 0 of the tuple identifies the label, Element 1 of the tuple identifies the
* output rank. The label span must be sorted in ascending order.
* @param fan_out Tuple of host spans mapping each edge type to fanout values. Element 0
* of the tuple defines the size of the fanout per edge type while element 1 defines the branching out
* (fan-out) degree per edge type for each level.
* @param rng_state A pre-initialized raft::RngState object for generating random numbers
* @param return_hops boolean flag specifying if the hop information should be returned
* @param prior_sources_behavior Enum type defining how to handle prior sources, (defaults to
* DEFAULT)
* @param dedupe_sources boolean flag, if true then if a vertex v appears as a destination in hop X
* multiple times with the same label, it will only be passed once (for each label) as a source
* for the next hop. Default is false.
* @param with_replacement boolean flag specifying if random sampling is done with replacement
* (true); or, without replacement (false); default = true;
* @param do_expensive_check A flag to run expensive checks for input arguments (if set to `true`).
* @return tuple device vectors (vertex_t source_vertex, vertex_t destination_vertex,
* optional weight_t weight, optional edge_t edge id, optional edge_type_t edge type,
* optional int32_t hop, optional label_t label, optional size_t offsets)
*/
template <typename vertex_t,
typename edge_t,
typename weight_t,
typename edge_type_t,
typename bias_t,
typename label_t,
bool store_transposed,
bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>,
rmm::device_uvector<vertex_t>,
std::optional<rmm::device_uvector<weight_t>>,
std::optional<rmm::device_uvector<edge_t>>,
std::optional<rmm::device_uvector<edge_type_t>>,
std::optional<rmm::device_uvector<int32_t>>,
std::optional<rmm::device_uvector<label_t>>,
std::optional<rmm::device_uvector<size_t>>>
heterogeneous_biased_neighbor_sample(
raft::handle_t const& handle,
graph_view_t<vertex_t, edge_t, store_transposed, multi_gpu> const& graph_view,
std::optional<edge_property_view_t<edge_t, weight_t const*>> edge_weight_view,
std::optional<edge_property_view_t<edge_t, edge_t const*>> edge_id_view,
std::optional<edge_property_view_t<edge_t, edge_type_t const*>> edge_type_view,
edge_property_view_t<edge_t, bias_t const*> edge_bias_view,
raft::device_span<vertex_t const> starting_vertices,
std::optional<raft::device_span<label_t const>> starting_vertex_labels,
std::optional<std::tuple<raft::device_span<label_t const>, raft::device_span<int32_t const>>>
label_to_output_comm_rank,
std::tuple<raft::host_span<int32_t const>, raft::host_span<int32_t const>> fan_out,
raft::random::RngState& rng_state,
bool return_hops,
bool with_replacement = true,
prior_sources_behavior_t prior_sources_behavior = prior_sources_behavior_t::DEFAULT,
bool dedupe_sources = false,
bool do_expensive_check = false);
#endif

/*
* @brief renumber sampled edge list and compress to the (D)CSR|(D)CSC format.
*
47 changes: 47 additions & 0 deletions cpp/include/cugraph_c/sampling_algorithms.h
@@ -319,6 +319,49 @@ void cugraph_sampling_set_dedupe_sources(cugraph_sampling_options_t* options, bo
*/
void cugraph_sampling_options_free(cugraph_sampling_options_t* options);

/**
* @brief Opaque neighborhood sampling heterogeneous fanout type
*/
// FIXME: internal representation should be tuple instead of pairs - Make it more generic (tuple)
// cugraph_device_tuple_t, host_device_tuple_t,
//dictionary, key and array
// translate dictionary to a tuple. Add to the draft PR the PLC layer.
// Concatenate to build the 3 arrays from the PLC layer
/// mimic
typedef struct {
int32_t align_;
} cugraph_sample_heterogeneous_fanout_t;

/**
* @brief Create heterogeneous fanout
*
* Input data will be stored in the heterogeneous_fanout.
*
* @param [in] handle Handle for accessing resources
* @param [in] graph Pointer to graph
* @param [in] edge_type_size Type erased array of edge type size
* @param [in] fanout Type erased array of fanout values
* @param [out] heterogeneous_fanout Opaque pointer to fanout_t
* @param [out] error Pointer to an error object storing details of any error. Will
* be populated if error code is not CUGRAPH_SUCCESS
* @return error code
*/
cugraph_error_code_t cugraph_create_heterogeneous_fanout(
const cugraph_resource_handle_t* handle,
cugraph_graph_t* graph,
const cugraph_type_erased_host_array_view_t* edge_type_size,
const cugraph_type_erased_host_array_view_t* fanout,
cugraph_sample_heterogeneous_fanout_t** heterogeneous_fanout,
cugraph_error_t** error);

/**
* @brief Free edge type and fanout pairs
*
* @param [in] heterogeneous_fanout The edge type size and fanout values
*/
void cugraph_heterogeneous_fanout_free(
cugraph_sample_heterogeneous_fanout_t* heterogeneous_fanout);

/**
* @brief Uniform Neighborhood Sampling
*
Expand Down Expand Up @@ -368,6 +411,7 @@ cugraph_error_code_t cugraph_uniform_neighbor_sample(
const cugraph_type_erased_device_array_view_t* label_to_comm_rank,
const cugraph_type_erased_device_array_view_t* label_offsets,
const cugraph_type_erased_host_array_view_t* fan_out,
const cugraph_sample_heterogeneous_fanout_t* heterogeneous_fanout,
Review comment (Collaborator):

Perhaps we take the same approach here. Create a new C API function called neighbor_sample, following the biased function definition. Add this parameter. Deprecate the other functions. In the implementation we can just check for nullptr (NULL).
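
The nullptr check mentioned here could be sketched as follows. This is illustrative only: `fanout_view_t` and `heterogeneous_fanout_t` are hypothetical stand-ins for the opaque C API types, `select_fanout` is an invented helper, and the rule that exactly one of the two pointers must be non-null is an assumption the review comment does not state.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the opaque C API types.
struct fanout_view_t {
  std::vector<int32_t> values;
};
struct heterogeneous_fanout_t {
  std::vector<int32_t> edge_type_size;
  std::vector<int32_t> values;
};

enum class fanout_kind_t { homogeneous, heterogeneous, invalid };

// Decide which sampling path to take from which pointer the caller supplied.
// Assumption: passing both or neither is treated as an error.
inline fanout_kind_t select_fanout(fanout_view_t const* fan_out,
                                   heterogeneous_fanout_t const* heterogeneous_fan_out)
{
  bool const has_homogeneous   = (fan_out != nullptr);
  bool const has_heterogeneous = (heterogeneous_fan_out != nullptr);
  if (has_homogeneous == has_heterogeneous) { return fanout_kind_t::invalid; }
  return has_heterogeneous ? fanout_kind_t::heterogeneous : fanout_kind_t::homogeneous;
}
```

A C entry point would return an error code for the invalid case instead of an enum value.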

cugraph_rng_state_t* rng_state,
const cugraph_sampling_options_t* options,
bool_t do_expensive_check,
@@ -667,13 +711,16 @@ cugraph_error_code_t cugraph_test_uniform_neighborhood_sample_result_create(
* not CUGRAPH_SUCCESS
* @return error code
*/

Review comment (Collaborator):

Unnecessary blank line


cugraph_error_code_t cugraph_select_random_vertices(const cugraph_resource_handle_t* handle,
const cugraph_graph_t* graph,
cugraph_rng_state_t* rng_state,
size_t num_vertices,
cugraph_type_erased_device_array_t** vertices,
cugraph_error_t** error);


#ifdef __cplusplus
}
#endif
12 changes: 12 additions & 0 deletions cpp/src/c_api/array.hpp
@@ -125,6 +125,18 @@ struct cugraph_type_erased_host_array_t {
std::copy(vec.begin(), vec.end(), reinterpret_cast<T*>(data_.get()));
}

template <typename T>
T* as_type()
{
return reinterpret_cast<T*>(data_.get());
}

template <typename T>
T const* as_type() const
{
return reinterpret_cast<T const*>(data_.get());
}

auto view()
{
return new cugraph_type_erased_host_array_view_t{data_.get(), size_, num_bytes_, type_};
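
For context, the two `as_type()` accessors added in this hunk give typed access to the type-erased byte buffer. The sketch below mimics that pattern on a simplified, hypothetical struct (`type_erased_host_array` is not the real cugraph_type_erased_host_array_t, and it omits the type tag and byte count). The caller is responsible for requesting the same element type that was stored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// Simplified, hypothetical stand-in for a type-erased host array.
struct type_erased_host_array {
  std::unique_ptr<std::byte[]> data_;
  std::size_t size_{0};

  // Copy the vector's elements into an untyped byte buffer.
  template <typename T>
  explicit type_erased_host_array(std::vector<T> const& vec)
    : data_(std::make_unique<std::byte[]>(vec.size() * sizeof(T))), size_(vec.size())
  {
    if (!vec.empty()) { std::memcpy(data_.get(), vec.data(), vec.size() * sizeof(T)); }
  }

  // Mutable typed access; T must match the element type that was stored.
  template <typename T>
  T* as_type()
  {
    return reinterpret_cast<T*>(data_.get());
  }

  // Read-only typed access for const objects.
  template <typename T>
  T const* as_type() const
  {
    return reinterpret_cast<T const*>(data_.get());
  }
};
```

Providing both overloads lets code holding a const reference read the data without casting away constness.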
2 changes: 1 addition & 1 deletion cpp/src/c_api/graph_functions.cpp
@@ -84,7 +84,7 @@ struct create_vertex_pairs_functor : public cugraph::c_api::abstract_functor {
std::nullopt,
std::nullopt);
}

// std::tuple (template)
Review comment (Collaborator):

Maybe just tag this with FIXME. I understand it's a note to yourself, but we probably should look at this for vertex_pairs as well... but it's beyond the scope of this PR.

result_ = new cugraph::c_api::cugraph_vertex_pairs_t{
new cugraph::c_api::cugraph_type_erased_device_array_t(first_copy, graph_->vertex_type_),
new cugraph::c_api::cugraph_type_erased_device_array_t(second_copy, graph_->vertex_type_)};