Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hyper log log plus plus(HLL++) #2522

Open
wants to merge 34 commits into
base: branch-25.02
Choose a base branch
from
Open

Conversation

res-life
Copy link
Collaborator

@res-life res-life commented Oct 21, 2024

@res-life res-life requested a review from ttnghia October 21, 2024 12:45
@res-life res-life force-pushed the hll branch 3 times, most recently from b6f5cf5 to 526a61f Compare October 31, 2024 11:34
@res-life res-life changed the title [Do not review] Hyper log log plus plus(HLL++) Hyper log log plus plus(HLL++) Oct 31, 2024
src/main/cpp/src/HLLPP.cu Outdated Show resolved Hide resolved
auto input_cols = std::vector<int64_t const*>(input_iter, input_iter + input.num_children());
auto d_inputs = cudf::detail::make_device_uvector_async(input_cols, stream, mr);
auto result = cudf::make_numeric_column(
cudf::data_type{cudf::type_id::INT64}, input.size(), cudf::mask_state::ALL_VALID, stream);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need such all-valid null mask? How about cudf::mask_state::UNALLOCATED?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested Spark behavior, for approx_count_distinct(null) returns 0.
So the values in result column are always non-null

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant, if all rows are valid, we don't need to allocate a null mask.
BTW, we need to pass mr to the returning column (but do not pass it to the intermediate vector/column).

Suggested change
cudf::data_type{cudf::type_id::INT64}, input.size(), cudf::mask_state::ALL_VALID, stream);
cudf::data_type{cudf::type_id::INT64}, input.size(), cudf::mask_state::UNALLOCATED, stream, mr);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

auto result = cudf::make_numeric_column(
cudf::data_type{cudf::type_id::INT64}, input.size(), cudf::mask_state::ALL_VALID, stream);
// evaluate from struct<long, ..., long>
thrust::for_each_n(rmm::exec_policy(stream),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to use exec_policy_nosync as much as possible.

Suggested change
thrust::for_each_n(rmm::exec_policy(stream),
thrust::for_each_n(rmm::exec_policy_nosync(stream),

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 34 to 36
* The input sketch values must be given in the format `LIST<INT8>`.
*
* @param input The sketch column which constains `LIST<INT8> values.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

INT8 or INT64?

Copy link
Collaborator

@ttnghia ttnghia Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, in estimate_from_hll_sketches I see that the input is STRUCT<LONG, LONG, ....> instead of LIST<>. Why?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's STRUCT<LONG, LONG, ....> consistent with Spark. The input is columnar data, e.g.: sketch 0 is composed of by all the data of the children at index 0.
Updated the function comments, refer to commit

@res-life
Copy link
Collaborator Author

Ready to review except test cases.

src/main/cpp/src/HLLPP.cu Outdated Show resolved Hide resolved
src/main/cpp/src/HLLPP.cu Outdated Show resolved Hide resolved
src/main/cpp/src/HLLPP.cu Outdated Show resolved Hide resolved
src/main/cpp/src/HLLPP.cu Outdated Show resolved Hide resolved
src/main/cpp/src/HLLPP.cu Outdated Show resolved Hide resolved
src/main/cpp/src/HLLPP.cu Outdated Show resolved Hide resolved
src/main/cpp/src/HLLPP.cu Outdated Show resolved Hide resolved
* sketch. Input is a struct column with multiple long columns which is
* consistent with Spark. Output is a struct scalar with multiple long values.
*/
ReductionMERGE(1),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ReductionMERGE(1),
ReductionMerge(1),

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 105 to 117
* The value of num_registers_per_sketch = 2^precision
* The children num of this Struct is: num_registers_per_sketch / 10 + 1,
* Here 10 means a INT64 contains 10 register values,
* each register value is 6 bits.
* Register value is the number of leading zero bits in xxhash64 hash code.
* xxhash64 hash code is 64 bits, Register value is 6 bits,
* 6 bits is enough to hold the max value 64.
*
* @param input The sketch column which constains Struct<INT64, INT64, ...>
* values.
* @param precision The num of bits for HLLPP register addressing.
* @return A INT64 column with each value indicates the approximate count
* distinct value.
Copy link
Collaborator

@ttnghia ttnghia Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to reformat + polish docs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 70 to 75
* `reduce_by_key` uses num_rows_input intermidate cache:
* https://github.com/NVIDIA/thrust/blob/2.1.0/thrust/system/detail/generic/reduce_by_key.inl#L112
*
* // scan the values by flag
* thrust::detail::temporary_array<ValueType,ExecutionPolicy>
* scanned_values(exec, n);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do these APIs affect us?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to use reduce_by_key, but it uses too much of memory, so give up using reduce_by_key.
And updated the comments to:

 * Tried to use `reduce_by_key`, but it uses too much of memory, so give up using `reduce_by_key`.
 * More details:
 * `reduce_by_key` uses num_rows_input intermidate cache:
 * https://github.com/NVIDIA/thrust/blob/2.1.0/thrust/system/detail/generic/reduce_by_key.inl#L112
 * // scan the values by flag
 * thrust::detail::temporary_array<ValueType,ExecutionPolicy>
 * scanned_values(exec, n);
 * Each sketch contains multiple integers, by default 512 integers(precision is
 * 9), num_rows_input * 512 is huge.

Comment on lines 733 to 742
auto d_results = [&] {
auto host_results_pointer_iter =
thrust::make_transform_iterator(children.begin(), [](auto const& results_column) {
return results_column->mutable_view().template data<int64_t>();
});
auto host_results_pointers =
std::vector<int64_t*>(host_results_pointer_iter, host_results_pointer_iter + children.size());
return cudf::detail::make_device_uvector_async(
host_results_pointers, stream, cudf::get_current_device_resource_ref());
}();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

host_results_pointers can be destroyed before data is copied to device. Since we don't like stream sync, just move it out of the code block:

Suggested change
auto d_results = [&] {
auto host_results_pointer_iter =
thrust::make_transform_iterator(children.begin(), [](auto const& results_column) {
return results_column->mutable_view().template data<int64_t>();
});
auto host_results_pointers =
std::vector<int64_t*>(host_results_pointer_iter, host_results_pointer_iter + children.size());
return cudf::detail::make_device_uvector_async(
host_results_pointers, stream, cudf::get_current_device_resource_ref());
}();
auto host_results_pointer_iter =
thrust::make_transform_iterator(children.begin(), [](auto const& results_column) {
return results_column->mutable_view().template data<int64_t>();
});
auto host_results_pointers =
std::vector<int64_t*>(host_results_pointer_iter, host_results_pointer_iter + children.size());
auto d_results = cudf::detail::make_device_uvector_async(
host_results_pointers, stream, cudf::get_current_device_resource_ref());

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 437 to 445
auto d_results = [&] {
auto host_results_pointer_iter =
thrust::make_transform_iterator(children.begin(), [](auto const& results_column) {
return results_column->mutable_view().template data<int64_t>();
});
auto host_results_pointers =
std::vector<int64_t*>(host_results_pointer_iter, host_results_pointer_iter + children.size());
return cudf::detail::make_device_uvector_async(host_results_pointers, stream, mr);
}();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, host_results_pointers can be destroyed before data is copied to device. Just remove this code block.

Suggested change
auto d_results = [&] {
auto host_results_pointer_iter =
thrust::make_transform_iterator(children.begin(), [](auto const& results_column) {
return results_column->mutable_view().template data<int64_t>();
});
auto host_results_pointers =
std::vector<int64_t*>(host_results_pointer_iter, host_results_pointer_iter + children.size());
return cudf::detail::make_device_uvector_async(host_results_pointers, stream, mr);
}();
auto host_results_pointer_iter =
thrust::make_transform_iterator(children.begin(), [](auto const& results_column) {
return results_column->mutable_view().template data<int64_t>();
});
auto host_results_pointers =
std::vector<int64_t*>(host_results_pointer_iter, host_results_pointer_iter + children.size());
auto d_results = cudf::detail::make_device_uvector_async(host_results_pointers, stream, mr);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 631 to 639
auto d_sketches_output = [&] {
auto host_results_pointer_iter =
thrust::make_transform_iterator(results.begin(), [](auto const& results_column) {
return results_column->mutable_view().template data<int64_t>();
});
auto host_results_pointers =
std::vector<int64_t*>(host_results_pointer_iter, host_results_pointer_iter + results.size());
return cudf::detail::make_device_uvector_async(host_results_pointers, stream, mr);
}();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, host_results_pointers can be destroyed before data is copied to device. Move code out of block.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 803 to 811
auto d_results = [&] {
auto host_results_pointer_iter =
thrust::make_transform_iterator(children.begin(), [](auto const& results_column) {
return results_column->mutable_view().template data<int64_t>();
});
auto host_results_pointers =
std::vector<int64_t*>(host_results_pointer_iter, host_results_pointer_iter + children.size());
return cudf::detail::make_device_uvector_async(host_results_pointers, stream, mr);
}();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar issue when host_results_pointers is destroyed before data is copied to device.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@res-life
Copy link
Collaborator Author

build

@res-life
Copy link
Collaborator Author

res-life commented Jan 16, 2025

Since NVIDIA/spark-rapids#11638 passed all the HLLPP cases, and Nghia will be on vacation, could we put the Java/Cpp cases into a follow-up PR?

One issue we need to discuss, refer to link

@ttnghia Thanks for your dedicated review.

Comment on lines 450 to 452
auto host_results_pointers =
std::vector<int64_t*>(host_results_pointer_iter, host_results_pointer_iter + children.size());
auto d_results = cudf::detail::make_device_uvector_async(host_results_pointers, stream, mr);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I was still wrong. host_results_pointers is destroyed when this function ends but the kernel may not be executed on device yet. So we have to have stream sync in this function anyhow. Probably we sync here.

Suggested change
auto host_results_pointers =
std::vector<int64_t*>(host_results_pointer_iter, host_results_pointer_iter + children.size());
auto d_results = cudf::detail::make_device_uvector_async(host_results_pointers, stream, mr);
auto host_results_pointers =
std::vector<int64_t*>(host_results_pointer_iter, host_results_pointer_iter + children.size());
auto d_results = cudf::detail::make_device_uvector_sync(host_results_pointers, stream, mr);

Copy link
Collaborator

@ttnghia ttnghia Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar for all other places. We need to sync stream so when the functions end we will not access to the released host buffer.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@res-life
Copy link
Collaborator Author

build

@res-life
Copy link
Collaborator Author

build

@res-life
Copy link
Collaborator Author

@ttnghia test case added.

@res-life
Copy link
Collaborator Author

@revans2 Could you help review test case commit, all other comments from Nghia were addressed. Nghia is on vacation.

@res-life
Copy link
Collaborator Author

Wait for rapidsai/cudf#17770:
HostUDFWrapper changed in above PR.
Need to update HyperLogLogPlusPlusHostUDF

public class HyperLogLogPlusPlusHostUDF extends HostUDFWrapper

@res-life
Copy link
Collaborator Author

build

@res-life
Copy link
Collaborator Author

build

@res-life
Copy link
Collaborator Author

build

@res-life
Copy link
Collaborator Author

res-life commented Jan 24, 2025

@ttnghia Could you please help reivew again, changes after the previous review:

  • Add test case, commit link
  • Update according to HostUDFWrapper change

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants