Commit

Biased sampling (#4443)
Implement biased sampling using the biased sampling primitive. The public biased sampling function takes bias values as edge property values; otherwise it is the same as the uniform neighbor sampling function.

Closes #4290

Authors:
  - Seunghwa Kang (https://github.com/seunghwak)
  - Naim (https://github.com/naimnv)

Approvers:
  - Chuck Hastings (https://github.com/ChuckHastings)
  - Joseph Nke (https://github.com/jnke2016)
  - Alex Barghi (https://github.com/alexbarghi-nv)
  - Naim (https://github.com/naimnv)

URL: #4443
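
To make the commit message concrete, here is a minimal host-side C++ sketch of what biased neighbor selection means: each outgoing edge carries a non-negative bias, and an edge is picked with probability proportional to that bias. The helper names `sample_biased` and `sample_uniform` are hypothetical and are not the cuGraph primitive, which runs on the GPU. The sketch only covers sampling with replacement and assumes that negative biases are invalid input.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not a cuGraph API): draw `k` neighbor indices with
// replacement, where index i is chosen with probability bias[i] / sum(bias).
// Returns an empty vector when no neighbor has a positive bias.
std::vector<std::size_t> sample_biased(std::vector<double> const& bias,
                                       std::size_t k,
                                       std::mt19937& rng)
{
  bool any_positive = false;
  for (double b : bias) {
    assert(b >= 0.0);  // assumption: negative bias values are rejected
    if (b > 0.0) { any_positive = true; }
  }

  std::vector<std::size_t> picks;
  if (!any_positive) { return picks; }

  // std::discrete_distribution normalizes the weights internally.
  std::discrete_distribution<std::size_t> dist(bias.begin(), bias.end());
  picks.reserve(k);
  for (std::size_t i = 0; i < k; ++i) {
    picks.push_back(dist(rng));
  }
  return picks;
}

// Uniform sampling is the special case where every edge has the same bias.
std::vector<std::size_t> sample_uniform(std::size_t degree, std::size_t k, std::mt19937& rng)
{
  return sample_biased(std::vector<double>(degree, 1.0), k, rng);
}
```

Treating the uniform case as "all biases equal" mirrors the commit's statement that the biased function otherwise behaves like the uniform one.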
seunghwak authored Jul 2, 2024
1 parent 538a2ce commit 380879f
Showing 57 changed files with 3,296 additions and 1,470 deletions.
18 changes: 12 additions & 6 deletions cpp/CMakeLists.txt
@@ -310,6 +310,12 @@ set(CUGRAPH_SOURCES
src/sampling/detail/gather_one_hop_edgelist_mg_v32_e64.cu
src/sampling/detail/remove_visited_vertices_from_frontier_sg_v32_e32.cu
src/sampling/detail/remove_visited_vertices_from_frontier_sg_v64_e64.cu
+src/sampling/detail/check_edge_bias_values_sg_v64_e64.cu
+src/sampling/detail/check_edge_bias_values_sg_v32_e32.cu
+src/sampling/detail/check_edge_bias_values_sg_v32_e64.cu
+src/sampling/detail/check_edge_bias_values_mg_v64_e64.cu
+src/sampling/detail/check_edge_bias_values_mg_v32_e32.cu
+src/sampling/detail/check_edge_bias_values_mg_v32_e64.cu
src/sampling/detail/sample_edges_sg_v64_e64.cu
src/sampling/detail/sample_edges_sg_v32_e32.cu
src/sampling/detail/sample_edges_sg_v32_e64.cu
@@ -319,12 +325,12 @@ set(CUGRAPH_SOURCES
src/sampling/detail/shuffle_and_organize_output_mg_v64_e64.cu
src/sampling/detail/shuffle_and_organize_output_mg_v32_e32.cu
src/sampling/detail/shuffle_and_organize_output_mg_v32_e64.cu
-src/sampling/uniform_neighbor_sampling_mg_v32_e64.cpp
-src/sampling/uniform_neighbor_sampling_mg_v32_e32.cpp
-src/sampling/uniform_neighbor_sampling_mg_v64_e64.cpp
-src/sampling/uniform_neighbor_sampling_sg_v32_e64.cpp
-src/sampling/uniform_neighbor_sampling_sg_v32_e32.cpp
-src/sampling/uniform_neighbor_sampling_sg_v64_e64.cpp
+src/sampling/neighbor_sampling_mg_v32_e64.cpp
+src/sampling/neighbor_sampling_mg_v32_e32.cpp
+src/sampling/neighbor_sampling_mg_v64_e64.cpp
+src/sampling/neighbor_sampling_sg_v32_e64.cpp
+src/sampling/neighbor_sampling_sg_v32_e32.cpp
+src/sampling/neighbor_sampling_sg_v64_e64.cpp
src/sampling/renumber_sampled_edgelist_sg_v64_e64.cu
src/sampling/renumber_sampled_edgelist_sg_v32_e32.cu
src/sampling/sampling_post_processing_sg_v64_e64.cu
109 changes: 0 additions & 109 deletions cpp/include/cugraph/algorithms.hpp
@@ -1872,115 +1872,6 @@ k_core(raft::handle_t const& handle,
std::optional<raft::device_span<edge_t const>> core_numbers,
bool do_expensive_check = false);

/**
* @brief Controls how we treat prior sources in sampling
*
* @param DEFAULT Add vertices encountered while sampling to the new frontier
* @param CARRY_OVER In addition to newly encountered vertices, include vertices
* used as sources in any previous frontier in the new frontier
* @param EXCLUDE Filter the new frontier to exclude any vertex that was
* used as a source in a previous frontier
*/
enum class prior_sources_behavior_t { DEFAULT = 0, CARRY_OVER, EXCLUDE };
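
As an illustration of the three modes documented above, the following host-side sketch builds the next frontier from the vertices encountered in the current hop and the set of vertices already used as sources. The enum is copied from the removed header; the function `next_frontier` and the use of `std::set<int>` are hypothetical simplifications (the sets also deduplicate, which the real API controls separately through `dedupe_sources`).

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>

enum class prior_sources_behavior_t { DEFAULT = 0, CARRY_OVER, EXCLUDE };

// Hypothetical helper (not a cuGraph API): combine newly encountered vertices
// with the vertices that served as sources in earlier hops.
std::set<int> next_frontier(std::set<int> const& new_vertices,
                            std::set<int> const& prior_sources,
                            prior_sources_behavior_t behavior)
{
  std::set<int> frontier;
  switch (behavior) {
    case prior_sources_behavior_t::DEFAULT:
      // Only the vertices encountered in this hop.
      frontier = new_vertices;
      break;
    case prior_sources_behavior_t::CARRY_OVER:
      // New vertices plus every vertex used as a source before.
      std::set_union(new_vertices.begin(), new_vertices.end(),
                     prior_sources.begin(), prior_sources.end(),
                     std::inserter(frontier, frontier.end()));
      break;
    case prior_sources_behavior_t::EXCLUDE:
      // New vertices minus any vertex used as a source before.
      std::set_difference(new_vertices.begin(), new_vertices.end(),
                          prior_sources.begin(), prior_sources.end(),
                          std::inserter(frontier, frontier.end()));
      break;
    default:
      assert(false && "unknown prior_sources_behavior_t value");
  }
  return frontier;
}
```

With new vertices {3, 4, 5} and prior sources {1, 4}, DEFAULT keeps {3, 4, 5}, CARRY_OVER yields {1, 3, 4, 5}, and EXCLUDE yields {3, 5}.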

/**
* @brief Uniform Neighborhood Sampling.
*
* This function traverses from a set of starting vertices, traversing outgoing edges and
* randomly selects from these outgoing neighbors to extract a subgraph.
*
* Output from this function is a tuple of vectors (src, dst, weight, edge_id, edge_type, hop,
* label, offsets), identifying the randomly selected edges. src is the source vertex, dst is the
* destination vertex, weight (optional) is the edge weight, edge_id (optional) identifies the edge
* id, edge_type (optional) identifies the edge type, hop identifies which hop the edge was
* encountered in. The label output (optional) identifies the vertex label. The offsets array
* (optional) will be described below and is dependent upon the input parameters.
*
*
* If @p starting_vertex_labels is not specified then no organization is applied to the output, the
* label and offsets values in the return set will be std::nullopt.
*
* If @p starting_vertex_labels is specified and @p label_to_output_comm_rank is not specified then
* the label output has values. This will also result in the output being sorted by vertex label.
* The offsets array in the return will be a CSR-style offsets array to identify the beginning of
* each label range in the data. `labels.size() == (offsets.size() - 1)`.
*
* If @p starting_vertex_labels is specified and @p label_to_output_comm_rank is specified then the
* label output has values. This will also result in the output being sorted by vertex label. The
* offsets array in the return will be a CSR-style offsets array to identify the beginning of each
* label range in the data. `labels.size() == (offsets.size() - 1)`. Additionally, the data will
* be shuffled so that all data with a particular label will be on the specified rank.
*
* @tparam vertex_t Type of vertex identifiers. Needs to be an integral type.
* @tparam edge_t Type of edge identifiers. Needs to be an integral type.
* @tparam weight_t Type of edge weights. Needs to be a floating point type.
* @tparam edge_type_t Type of edge type. Needs to be an integral type.
* @tparam label_t Type of label. Needs to be an integral type.
* @tparam store_transposed Flag indicating whether sources (if false) or destinations (if
* true) are major indices
* @tparam multi_gpu Flag indicating whether template instantiation should target single-GPU (false)
* or multi-GPU (true).
* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* @param graph_view Graph View object to generate NBR Sampling on.
* @param edge_weight_view Optional view object holding edge weights for @p graph_view.
* @param edge_id_view Optional view object holding edge ids for @p graph_view.
* @param edge_type_view Optional view object holding edge types for @p graph_view.
* @param starting_vertices Device span of starting vertex IDs for the sampling.
* In a multi-gpu context the starting vertices should be local to this GPU.
* @param starting_vertex_labels Optional device span of labels associated with each starting vertex
* for the sampling.
* @param label_to_output_comm_rank Optional tuple of device spans mapping label to a particular
* output rank. Element 0 of the tuple identifies the label, Element 1 of the tuple identifies the
* output rank. The label span must be sorted in ascending order.
* @param fan_out Host span defining branching out (fan-out) degree per source vertex for each
* level
* @param rng_state A pre-initialized raft::RngState object for generating random numbers
* @param return_hops boolean flag specifying if the hop information should be returned
* @param prior_sources_behavior Enum type defining how to handle prior sources, (defaults to
* DEFAULT)
* @param dedupe_sources boolean flag, if true then if a vertex v appears as a destination in hop X
* multiple times with the same label, it will only be passed once (for each label) as a source
* for the next hop. Default is false.
* @param with_replacement boolean flag specifying if random sampling is done with replacement
* (true); or, without replacement (false); default = true;
* @param do_expensive_check A flag to run expensive checks for input arguments (if set to `true`).
* @return tuple of device vectors (vertex_t source_vertex, vertex_t destination_vertex,
* optional weight_t weight, optional edge_t edge id, optional edge_type_t edge type,
* optional int32_t hop, optional label_t label, optional size_t offsets)
*/
template <typename vertex_t,
typename edge_t,
typename weight_t,
typename edge_type_t,
typename label_t,
bool store_transposed,
bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>,
rmm::device_uvector<vertex_t>,
std::optional<rmm::device_uvector<weight_t>>,
std::optional<rmm::device_uvector<edge_t>>,
std::optional<rmm::device_uvector<edge_type_t>>,
std::optional<rmm::device_uvector<int32_t>>,
std::optional<rmm::device_uvector<label_t>>,
std::optional<rmm::device_uvector<size_t>>>
uniform_neighbor_sample(
raft::handle_t const& handle,
graph_view_t<vertex_t, edge_t, store_transposed, multi_gpu> const& graph_view,
std::optional<edge_property_view_t<edge_t, weight_t const*>> edge_weight_view,
std::optional<edge_property_view_t<edge_t, edge_t const*>> edge_id_view,
std::optional<edge_property_view_t<edge_t, edge_type_t const*>> edge_type_view,
raft::device_span<vertex_t const> starting_vertices,
std::optional<raft::device_span<label_t const>> starting_vertex_labels,
std::optional<std::tuple<raft::device_span<label_t const>, raft::device_span<int32_t const>>>
label_to_output_comm_rank,
raft::host_span<int32_t const> fan_out,
raft::random::RngState& rng_state,
bool return_hops,
bool with_replacement = true,
prior_sources_behavior_t prior_sources_behavior = prior_sources_behavior_t::DEFAULT,
bool dedupe_sources = false,
bool do_expensive_check = false);
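
The CSR-style offsets described in the comment above can be pictured with a small host-side sketch. Given per-edge labels that are already sorted, it produces the distinct labels and an offsets array such that `labels.size() == offsets.size() - 1`. The names `label_offsets_t` and `compute_label_offsets` are hypothetical, and the sketch works on plain `std::vector` rather than device vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: distinct labels plus CSR-style offsets into the
// sorted edge list. Edges with labels[i] occupy [offsets[i], offsets[i + 1]).
struct label_offsets_t {
  std::vector<int> labels;
  std::vector<std::size_t> offsets;
};

// Hypothetical helper (not a cuGraph API). Precondition: input is sorted ascending.
label_offsets_t compute_label_offsets(std::vector<int> const& sorted_edge_labels)
{
  label_offsets_t out;
  std::size_t const n = sorted_edge_labels.size();
  for (std::size_t i = 0; i < n; ++i) {
    assert(i == 0 || sorted_edge_labels[i - 1] <= sorted_edge_labels[i]);
    // A new label range starts wherever the label value changes.
    if (i == 0 || sorted_edge_labels[i] != sorted_edge_labels[i - 1]) {
      out.labels.push_back(sorted_edge_labels[i]);
      out.offsets.push_back(i);
    }
  }
  out.offsets.push_back(n);  // closing offset, so offsets has one more entry than labels
  return out;
}
```

For sorted edge labels {0, 0, 2, 2, 2, 5} this gives labels {0, 2, 5} and offsets {0, 2, 5, 6}.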

/*
* @brief Compute triangle counts.
*
13 changes: 11 additions & 2 deletions cpp/include/cugraph/graph.hpp
@@ -319,11 +319,20 @@ struct invalid_idx<
template <typename vertex_t>
struct invalid_vertex_id : invalid_idx<vertex_t> {};

+template <typename vertex_t>
+inline constexpr vertex_t invalid_vertex_id_v = invalid_vertex_id<vertex_t>::value;
+
template <typename edge_t>
struct invalid_edge_id : invalid_idx<edge_t> {};

-template <typename vertex_t>
-struct invalid_component_id : invalid_idx<vertex_t> {};
+template <typename edge_t>
+inline constexpr edge_t invalid_edge_id_v = invalid_edge_id<edge_t>::value;
+
+template <typename component_t>
+struct invalid_component_id : invalid_idx<component_t> {};
+
+template <typename component_t>
+inline constexpr component_t invalid_component_id_v = invalid_component_id<component_t>::value;
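
The new `_v` variable templates follow the same pattern as the standard library's `std::is_signed_v` and shorten call sites. The sketch below is self-contained C++17; the `invalid_idx` definition is an assumed stand-in (all bits set for unsigned types, -1 for signed types), since its body is not shown in this diff, and `is_unvisited` is a hypothetical helper.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Assumed stand-in for cuGraph's invalid_idx; the real definition is not part of this diff.
template <typename T, typename Enable = void>
struct invalid_idx;

template <typename T>
struct invalid_idx<T, std::enable_if_t<std::is_integral<T>::value && std::is_signed<T>::value>>
  : std::integral_constant<T, T{-1}> {};

template <typename T>
struct invalid_idx<T, std::enable_if_t<std::is_integral<T>::value && std::is_unsigned<T>::value>>
  : std::integral_constant<T, std::numeric_limits<T>::max()> {};

// These two declarations match the ones in the diff.
template <typename vertex_t>
struct invalid_vertex_id : invalid_idx<vertex_t> {};

template <typename vertex_t>
inline constexpr vertex_t invalid_vertex_id_v = invalid_vertex_id<vertex_t>::value;

// The variable template is just a shorter spelling of ::value.
static_assert(invalid_vertex_id_v<std::int32_t> == invalid_vertex_id<std::int32_t>::value);

// Hypothetical helper showing the shorter spelling at a call site.
template <typename vertex_t>
constexpr bool is_unvisited(vertex_t v)
{
  return v == invalid_vertex_id_v<vertex_t>;
}
```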

template <typename vertex_t>
__host__ __device__ std::enable_if_t<std::is_signed<vertex_t>::value, bool> is_valid_vertex(