-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ktruss implementation #4059
Ktruss implementation #4059
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review part 1/2.
cpp/include/cugraph/algorithms.hpp
Outdated
@@ -1978,6 +1978,26 @@ void triangle_count(raft::handle_t const& handle, | |||
raft::device_span<edge_t> counts, | |||
bool do_expensive_check = false); | |||
|
|||
/* | |||
* @brief Compute ktruss. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ktruss
-> K-truss
cpp/src/community/ktruss_impl.cuh
Outdated
rmm::device_uvector<size_t> intersection_offsets(size_t{0}, handle.get_stream()); | ||
rmm::device_uvector<vertex_t> intersection_indices(size_t{0}, handle.get_stream()); | ||
rmm::device_uvector<edge_t> r_nbr_intersection_property_values0(size_t{0}, handle.get_stream()); | ||
rmm::device_uvector<edge_t> r_nbr_intersection_property_values1(size_t{0}, handle.get_stream()); | ||
|
||
// FIXME: Initially each edge should have an edge property of 0 | ||
std::tie(intersection_offsets, intersection_indices) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto [intersection_offsets, intersection_indices] = detail::nbr_intersection(...);
And you can cut your code size a bit. And better define r_nbr_intersection_property_values0 or 1
right before using them (minimize the scope of a variable).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We explicitly define them when we are taking returned variables with std::tie and use std::ignore as we can't use std::ignore with auto [...]
. We are not using std::ignore here, so no need to explicitly define variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am only using intersection_offsets
and intersection_indices
so I will remove r_nbr_intersection_property_values0
and r_nbr_intersection_property_values1
cpp/src/community/ktruss_impl.cuh
Outdated
auto vertex_pair_buffer = allocate_dataframe_buffer<thrust::tuple<vertex_t, vertex_t>>( | ||
num_vertex_pairs, handle.get_stream()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do not pre-allocate memory too much in advance. This unnecessarily increase peak memory usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in order to populate this vertex_pair_buffer
with thrust::tabulate
am I not required to know its size upfront?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto vertex_pair_buffer_tmp = allocate_dataframe_buffer<thrust::tuple<vertex_t, vertex_t>>(
intersection_indices.size() * 3, handle.get_stream());
The above pre-allocated memory for (p, q), (p, r) and (q, r) but I can do one group at a time and resize to reduce peak memory usage.
cpp/src/community/ktruss_impl.cuh
Outdated
rmm::device_uvector<vertex_t> num_triangles_(3 * intersection_indices.size(), handle.get_stream()); | ||
|
||
thrust::fill(handle.get_thrust_policy(), num_triangles_.begin(), num_triangles_.end(), size_t{1}); | ||
|
||
rmm::device_uvector<vertex_t> num_triangles(num_vertex_pairs, handle.get_stream()); | ||
|
||
thrust::reduce_by_key(handle.get_thrust_policy(), | ||
get_dataframe_buffer_begin(vertex_pair_buffer_), | ||
get_dataframe_buffer_end(vertex_pair_buffer_), | ||
num_triangles_.begin(), | ||
get_dataframe_buffer_begin(vertex_pair_buffer), | ||
num_triangles.begin(), | ||
thrust::equal_to<thrust::tuple<vertex_t, vertex_t>>{}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the code from https://github.com/rapidsai/cugraph/blob/branch-24.02/cpp/src/sampling/sampling_post_processing_impl.cuh
auto num_unique_labels = thrust::count_if(
handle.get_thrust_policy(),
thrust::make_counting_iterator(size_t{0}),
thrust::make_counting_iterator((*renumber_map_label_indices).size()),
detail::is_first_in_run_t<label_index_t const*>{(*renumber_map_label_indices).data()});
rmm::device_uvector<label_index_t> unique_label_indices(num_unique_labels, handle.get_stream());
rmm::device_uvector<vertex_t> vertex_counts(num_unique_labels, handle.get_stream());
thrust::reduce_by_key(handle.get_thrust_policy(),
(*renumber_map_label_indices).begin(),
(*renumber_map_label_indices).end(),
thrust::make_constant_iterator(size_t{1}),
unique_label_indices.begin(),
vertex_counts.begin());
You can use the same logic to perform this with less memory.
cpp/src/community/ktruss_impl.cuh
Outdated
printf("\nbefore sorting\n"); | ||
raft::print_device_vector("src", std::get<0>(vertex_pair_buffer_).data(), std::get<0>(vertex_pair_buffer_).size(), std::cout); | ||
raft::print_device_vector("dst", std::get<1>(vertex_pair_buffer_).data(), std::get<1>(vertex_pair_buffer_).size(), std::cout); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to shuffle the resulting vertex pairs before reducing to be multi-GPU ready.
cpp/src/community/ktruss_impl.cuh
Outdated
num_triangles.begin(), | ||
thrust::equal_to<thrust::tuple<vertex_t, vertex_t>>{}); | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now you can run thrust::partition here (Step 7 from #3446 (comment)) to find the edges to unroll.
I thought more about implementing #3446 (comment); especially the steps 6, 7, and 8. After step 6, sort the (edge source, edge destination, triangle count) triplets using edge destination as the primary key and edge source as the secondary key; this is to implement step 8-2.
Please read this and let me know when you are ready to sync again. |
…to fea_has_edges
…to fea_has_edges
…to fea_has_edges
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will wait for @seunghwak to approve before merging, but LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this code is now nearly in mergable state. Let's quickly address few minor issues and let this PR pass the CI.
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
#include <community/edge_triangle_count_impl.cuh> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to address this.
cpp/src/community/k_truss_impl.cuh
Outdated
return dist_valid; | ||
}); | ||
thrust::exclusive_scan( | ||
handle.get_thrust_policy(), prefix_sum.begin(), prefix_sum.end(), prefix_sum.begin()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And you can provide thrust::make_transform_iterator(thrust::make_counting_iterator(size_t{0}), [query_vertices, num_edges = ..., sorted_veritces = ...]__device__(auto idx) { ... });
to thrust::exclusive_sum
.
In the current code, you need to write dist
values to prefix_sum
; read dist
values from prefix_sum
; and compute prefix sum and write the results back to prefix_sum
. (2 writes and 1 read). If you use transform_iterator, 1 write will be sufficient. This is a big cut in memory bandwidth and also reduces kernel launch overhead (2 kernel launches => 1 kernel launch).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it and thanks for the background information
cpp/src/community/k_truss_impl.cuh
Outdated
} | ||
|
||
// FIXME: Remove because it yields to incorrect results | ||
// 3. Find (k+1)-core and exclude edges that do not belong to (k+1)-core |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(k+1)=>(k-1) and re-enable this. This shouldn't yield to incorrect results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried and it still yield incorrect results. I tried even k-2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a fixme
for this
|
||
num_invalid_edges = static_cast<size_t>( | ||
thrust::distance(invalid_transposed_edge_triangle_count_first, | ||
transposed_edge_triangle_count_pair_first + edgelist_srcs.size())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
edge_t num_invalid_edges{0};
num_invalid_edges = ...
=>
auto num_invalid_edges = ...;
cpp/tests/community/k_truss_test.cpp
Outdated
#include <rmm/mr/device/cuda_memory_resource.hpp> | ||
|
||
#include <gtest/gtest.h> | ||
#include <utilities/base_fixture.hpp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to address this.
cpp/tests/community/k_truss_test.cpp
Outdated
std::optional<cugraph::graph_t<vertex_t, edge_t, false, false>> modified_graph{std::nullopt}; | ||
|
||
std::optional< | ||
cugraph::edge_property_t<cugraph::graph_view_t<vertex_t, edge_t, false, false>, weight_t>> | ||
modified_edge_weight{std::nullopt}; | ||
std::tie(*modified_graph, modified_edge_weight, std::ignore, std::ignore, std::ignore) = | ||
cugraph:: | ||
create_graph_from_edgelist<vertex_t, edge_t, weight_t, edge_t, int32_t, false, false>( | ||
handle, | ||
std::nullopt, | ||
std::move(d_cugraph_src), | ||
std::move(d_cugraph_dst), | ||
std::move(d_cugraph_wgt), | ||
std::nullopt, | ||
std::nullopt, | ||
cugraph::graph_properties_t{true, false}, | ||
renumber); | ||
|
||
// Convert cugraph results to CSR | ||
auto [h_cugraph_offsets, h_cugraph_indices, h_cugraph_values] = | ||
cugraph::test::graph_to_host_csr( | ||
handle, | ||
(*modified_graph).view(), | ||
modified_edge_weight ? std::make_optional((*modified_edge_weight).view()) : std::nullopt, | ||
std::optional<raft::device_span<vertex_t const>>(std::nullopt)); | ||
|
||
// Remove isolated vertices. | ||
h_cugraph_offsets.erase(std::unique(h_cugraph_offsets.begin() + 1, h_cugraph_offsets.end()), | ||
h_cugraph_offsets.end()); // CSR start from 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we really do this? Can we just compare cugraph COO output and reference COO output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya but that means I will convert the reference ktruss to COO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I will have to rely on csr comparison because my cugraph results are not sorted. I get the same concern I had last week regarding sorting edges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind. I extended the sorting utility functions in thrust_wrapper
like you suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM and Thanks for all the hard work!!!
k_truss_reference(std::vector<vertex_t> h_offsets, | ||
std::vector<vertex_t> h_indices, | ||
std::optional<std::vector<weight_t>> h_values, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to address this in this PR, but just fyi,
You're passing h_offsets, h_indices, and h_values by value and based on the size of the vectors, this can be expensive. You can pass a const reference (const &
) instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I will resolve it in my next PR
/merge |
f753e51
into
rapidsai:branch-24.04
Implements SG and MG ktruss using graph primitives and drop
cuHornet
.Closes #3447
Closes #3448
Closes #3449
Closes #3450
Closes #3451
Closes #3452
Closes #3453