Device Cluster Reconstruction, main branch (2025.06.19.) #1028

krasznaa · 2025-06-19T15:35:57Z

Following up on #1023, this is how I believe the clusterization algorithm should provide its results. I'm very open to a discussion on this.

While I generally liked the idea of having different functions for reconstructing just the measurements, or the measurements plus the clusters, I think in the end that just makes the code more complicated than it needs to be. I propose that we have a single interface for the device clusterization algorithms: one which returns an optional cluster collection buffer. Behind the scenes this was already how cuda::clusterization_algorithm was implemented, and I think it is a functional interface for users as well. 🤔
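A minimal sketch of the single interface proposed here, under loudly stated assumptions: `cell`, `measurement`, the `std::vector`-based buffers and the placeholder one-cell-per-cluster logic are hypothetical stand-ins for the real traccc/vecmem types and the actual CCL kernels. Only the `.measurements`/`.clusters` field names are taken from the Alpaka snippet in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real traccc EDM and vecmem buffer types.
struct cell {
    unsigned int channel0, channel1;
    float activation;
};
struct measurement {
    float local0, local1;
};
using measurement_buffer = std::vector<measurement>;
// Each cluster is represented here by the indices of the cells it contains.
using cluster_buffer = std::vector<std::vector<std::size_t>>;

// Hypothetical return type: measurements always, clusters only on request.
struct clusterization_output {
    measurement_buffer measurements;
    std::optional<cluster_buffer> clusters;
};

class clusterization_algorithm {
public:
    // One entry point; a runtime flag decides whether clusters are returned.
    clusterization_output operator()(const std::vector<cell>& cells,
                                     bool reconstruct_clusters) const {
        clusterization_output result;
        cluster_buffer clusters;
        // Placeholder "clustering": every cell becomes its own cluster.
        for (std::size_t i = 0; i < cells.size(); ++i) {
            clusters.push_back({i});
            result.measurements.push_back(
                {static_cast<float>(cells[i].channel0),
                 static_cast<float>(cells[i].channel1)});
        }
        if (reconstruct_clusters) {
            result.clusters = std::move(clusters);
        }
        return result;
    }
};
```

The caller always gets measurements, and only needs to check `clusters.has_value()` when it asked for clusters.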

After updating the CUDA algorithm, I added the functionality of reconstructing cell clusters to the Alpaka and SYCL algorithms as well.

Unfortunately, the Alpaka algorithm is not behaving well on an NVIDIA backend at the moment. 😭 The unit test reports that a random subset of the tests fail to return the correct (number of) clusters from the algorithm, and so far I haven't managed to find what is causing this. 😦

Still, even with that issue open, I wanted to let people see my proposal already, as there may be a discussion about the user API...

As before, pinging @CrossR and @StewMH for their opinions.

P.S. Almost forgot... I also modified traccc::algorithm to allow arguments of fundamental types to be passed to an algorithm by value, since I really wanted to be able to pass a bool to the clusterization algorithms...
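A sketch of how an `algorithm<R(A...)>` base class could accept fundamental types by value while still requiring everything else to be passed by const reference. This is not the actual traccc::algorithm code; the trait `is_valid_algorithm_argument` and the example `count_algorithm` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical trait: an argument type is accepted if it is a fundamental
// type passed by value (bool, int, float, ...) or a const lvalue reference.
template <typename T>
struct is_valid_algorithm_argument
    : std::bool_constant<std::is_fundamental_v<T> ||
                         (std::is_lvalue_reference_v<T> &&
                          std::is_const_v<std::remove_reference_t<T>>)> {};

// Primary template, only specialised for function signatures.
template <typename T>
class algorithm;

template <typename R, typename... A>
class algorithm<R(A...)> {
    static_assert((is_valid_algorithm_argument<A>::value && ...),
                  "Arguments must be fundamental types or const references");

public:
    using output_type = R;
    virtual ~algorithm() = default;
    virtual output_type operator()(A... args) const = 0;
};

// Example: a bool flag passed by value next to a const reference.
class count_algorithm
    : public algorithm<std::size_t(const std::vector<int>&, bool)> {
public:
    std::size_t operator()(const std::vector<int>& input,
                           bool only_positive) const override {
        if (!only_positive) {
            return input.size();
        }
        std::size_t n = 0;
        for (int value : input) {
            if (value > 0) {
                ++n;
            }
        }
        return n;
    }
};
```

Passing a non-fundamental type by value, e.g. `algorithm<int(std::vector<int>)>`, would trip the `static_assert` in this sketch.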

stephenswat

I didn't really bother reading past the Alpaka code, but could you elaborate on why exactly you think this makes the code "simpler"? The net PR delta is almost +100 lines, and this change makes the API more obtuse for users, who with the existing implementation know exactly that if they pass the right tag, they get the desired result. Returning an optional value leaves open the possibility of returning no value even when the user requests the cluster data to be delivered.

In my eyes, this just makes the code less understandable for no good reason, so I'd like to understand what the benefit of this approach is.
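For comparison, a sketch of the tag-based style described above, where the requested output is encoded in the type system. The tag types, `tagged_clusterization_algorithm` and its result struct are hypothetical stand-ins, not the existing traccc API. Because each overload returns a different type, asking for clusters can never yield an absent cluster collection.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tag types selecting what the algorithm should produce.
struct measurements_only_t {};
struct with_clusters_t {};
inline constexpr measurements_only_t measurements_only{};
inline constexpr with_clusters_t with_clusters{};

using measurement_buffer = std::vector<float>;
using cluster_buffer = std::vector<std::vector<std::size_t>>;

// Result for the "with clusters" request: clusters are always present.
struct measurements_and_clusters {
    measurement_buffer measurements;
    cluster_buffer clusters;
};

class tagged_clusterization_algorithm {
public:
    measurement_buffer operator()(const std::vector<float>& cells,
                                  measurements_only_t) const {
        return run(cells).measurements;
    }
    measurements_and_clusters operator()(const std::vector<float>& cells,
                                         with_clusters_t) const {
        return run(cells);
    }

private:
    // Placeholder shared implementation: one single-cell cluster per cell.
    measurements_and_clusters run(const std::vector<float>& cells) const {
        measurements_and_clusters out;
        for (std::size_t i = 0; i < cells.size(); ++i) {
            out.measurements.push_back(cells[i]);
            out.clusters.push_back({i});
        }
        return out;
    }
};
```

The flip side of this design is that the tag has to be known at compile time.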

stephenswat · 2025-06-19T16:39:45Z

device/alpaka/include/traccc/alpaka/clusterization/clusterization_algorithm.hpp

    vecmem::data::vector_buffer<device::details::index_t> m_f_backup,
        m_gf_backup;
    vecmem::data::vector_buffer<unsigned char> m_adjc_backup;
    vecmem::data::vector_buffer<device::details::index_t> m_adjv_backup;
    vecmem::unique_alloc_ptr<unsigned int> m_backup_mutex;
-    mutable std::once_flag m_setup_once;


Unrelated change?

stephenswat · 2025-06-19T16:42:33Z

device/alpaka/src/clusterization/clusterization_algorithm.cpp

+        return {
+            .measurements = {},
+            .clusters =
+                (reconstruct_clusters
+                     ? std::optional<
+                           edm::silicon_cluster_collection::
+                               buffer>{edm::silicon_cluster_collection::
+                                           buffer{}}
+                     : std::optional<edm::silicon_cluster_collection::buffer>{
+                           std::nullopt})};


There's got to be a way to make this more readable. 😟

krasznaa

The +100 lines come mainly from the fact that I'm teaching the Alpaka and SYCL algorithms how to do something new (plus I'm adding code to the unit tests to check the new behaviour). The changeset on the CUDA code should be a net negative if you check.

I'm not convinced about the interface either. But putting 4 functions into each of the 3 clusterization algorithms seems like overkill to me.

I'm mainly thinking that client code will not be hard-wired to either always or never reconstruct clusters. I think in the Athena code we will, in the end, want to turn cluster reconstruction on or off through configuration options, and having to write separate code paths in that code to achieve this does not seem great. (I don't see us writing generic code for this through some clever templating there.)
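A sketch of the configuration-driven client code this argument has in mind. `job_config`, `run_flag_based` and `run_tagged` are hypothetical stand-ins, not Athena or traccc code. With a tag-based interface the runtime option turns into a branch with two calls, while with a flag it can be forwarded directly.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

using cluster_buffer = std::vector<std::vector<std::size_t>>;

// Hypothetical job configuration, e.g. filled from a configuration file.
struct job_config {
    bool reconstruct_clusters = false;
};

struct output {
    std::size_t n_measurements = 0;
    std::optional<cluster_buffer> clusters;
};

// Stand-in for a flag-based algorithm (placeholder: empty clusters).
output run_flag_based(const std::vector<float>& cells, bool with_clusters) {
    output out{cells.size(), std::nullopt};
    if (with_clusters) {
        out.clusters = cluster_buffer(cells.size());
    }
    return out;
}

// Stand-ins for a tag-based algorithm: two entry points, two return types.
struct measurements_only_t {};
struct with_clusters_t {};
std::size_t run_tagged(const std::vector<float>& cells, measurements_only_t) {
    return cells.size();
}
std::pair<std::size_t, cluster_buffer> run_tagged(
    const std::vector<float>& cells, with_clusters_t) {
    return {cells.size(), cluster_buffer(cells.size())};
}

// Client code with a tag-based interface: the runtime choice needs a branch.
output client_tagged(const job_config& cfg, const std::vector<float>& cells) {
    if (cfg.reconstruct_clusters) {
        auto [n, clusters] = run_tagged(cells, with_clusters_t{});
        return {n, std::move(clusters)};
    }
    return {run_tagged(cells, measurements_only_t{}), std::nullopt};
}

// Client code with a flag-based interface: the option is passed through.
output client_flag_based(const job_config& cfg,
                         const std::vector<float>& cells) {
    return run_flag_based(cells, cfg.reconstruct_clusters);
}
```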

krasznaa · 2025-06-19T17:02:07Z

device/alpaka/include/traccc/alpaka/clusterization/clusterization_algorithm.hpp

A little bit, yes. I also made the algorithm receive a queue from the outside.
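A hypothetical sketch of the general pattern of injecting a queue: rather than setting itself up lazily (for example behind `std::call_once` on a `mutable std::once_flag`), the algorithm receives an externally owned queue in its constructor and can do its one-time setup eagerly. The `queue` class is a minimal stand-in that runs tasks immediately, not an Alpaka type, and the thread does not spell out exactly how the removed `m_setup_once` was used.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Minimal stand-in for a device queue: runs submitted tasks immediately.
class queue {
public:
    void submit(const std::function<void()>& task) {
        task();
        ++m_n_submitted;
    }
    std::size_t n_submitted() const { return m_n_submitted; }

private:
    std::size_t m_n_submitted = 0;
};

// The algorithm does not own its queue; the caller provides and outlives it.
class algorithm_with_external_queue {
public:
    algorithm_with_external_queue(queue& q, std::size_t backup_size)
        : m_queue(q), m_backup(backup_size) {
        // One-time setup happens eagerly here, so no std::once_flag is needed.
        m_queue.submit([this] { m_backup.assign(m_backup.size(), 0u); });
    }

    std::size_t operator()(std::size_t n_cells) const {
        // All work is scheduled on the injected queue.
        std::size_t result = 0;
        m_queue.submit([&result, n_cells] { result = n_cells; });
        return result;
    }

private:
    queue& m_queue;
    std::vector<unsigned int> m_backup;
};
```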

krasznaa · 2025-06-19T17:05:34Z

device/alpaka/src/clusterization/clusterization_algorithm.cpp

I agree. This is pretty terrible. Unfortunately, ternary operators don't seem to mix too well with std::optional. 😦 But yes, something better could probably still be done...

Ideally I would've liked to write:

Suggested change
-        return {
-            .measurements = {},
-            .clusters =
-                (reconstruct_clusters
-                     ? std::optional<
-                           edm::silicon_cluster_collection::
-                               buffer>{edm::silicon_cluster_collection::
-                                           buffer{}}
-                     : std::optional<edm::silicon_cluster_collection::buffer>{
-                           std::nullopt})};
+        return {
+            .measurements = {},
+            .clusters = (reconstruct_clusters ? {} : std::nullopt)};

But of course that didn't work... 😦
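The braced `{}` cannot work there, because a braced-init-list is not an expression and so cannot be an operand of `?:`. A minimal sketch of two more readable alternatives, assuming a hypothetical `cluster_buffer` stand-in for `edm::silicon_cluster_collection::buffer` (designated initializers need C++20, as in the original snippet): the conditional operator works once one operand is already a `std::optional`, since `std::nullopt` converts implicitly to it, or the optional can be default-constructed and filled with `emplace()`.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-ins for the real buffer types.
using measurement_buffer = std::vector<float>;
using cluster_buffer = std::vector<int>;

struct clusterization_output {
    measurement_buffer measurements;
    std::optional<cluster_buffer> clusters;
};

// Option 1: one operand is already std::optional, so std::nullopt converts.
clusterization_output make_output_ternary(bool reconstruct_clusters) {
    return {.measurements = {},
            .clusters = reconstruct_clusters
                            ? std::make_optional<cluster_buffer>()
                            : std::nullopt};
}

// Option 2: start from an empty optional and emplace only when requested.
clusterization_output make_output_emplace(bool reconstruct_clusters) {
    clusterization_output result;
    if (reconstruct_clusters) {
        result.clusters.emplace();
    }
    return result;
}
```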

sonarqubecloud · 2025-06-19T18:26:18Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
9.2% Duplication on New Code

See analysis details on SonarQube Cloud

krasznaa · 2025-06-20T10:12:08Z

Please have another look. These changes are needed relatively urgently, in one way or another.

stephenswat · 2025-06-20T10:59:47Z

> These changes are needed relatively urgently.

Why? The API currently in the repository is perfectly functional already... I don't see the urgency in changing it.

krasznaa · 2025-06-20T11:36:56Z

There is urgency in providing this for Alpaka and SYCL as well, since those will be needed for our tests.

I'll walk back the interface change a little then, as implementing the feature in all backends is fairly urgent.

krasznaa added 5 commits June 19, 2025 14:58

Allow fundamental types as algorithm arguments.

48d0a80

So that for instance a simple boolean flag could be an algorithm argument.

Introduced traccc::device::clusterization_return_type.

c88d5e8

And then updated traccc::cuda::clusterization_algorithm to make use of it, simplifying the algorithm's interface a great deal.

Updated the clients of traccc::cuda::clusterization_algorithm.

c372563

Taught traccc::alpaka::clusterization_algorithm to reconstruct clusters.

27f6429

Following the code structure used by cuda::clusterization_algorithm. Also updated all client code to the new API.

Taught traccc::sycl::clusterization_algorithm to reconstruct clusters.

21733db

Following the code structure used by cuda::clusterization_algorithm. Also updated all client code to the new API.

krasznaa requested review from stephenswat and beomki-yeo June 19, 2025 15:35

krasznaa added the feature, cuda, sycl and alpaka labels Jun 19, 2025

stephenswat requested changes Jun 19, 2025

krasznaa commented Jun 19, 2025

Various small tweaks.

f8e611f

Made the early returns from clusterization easier to read. Hid the "device objects" in the CCL tests inside of the lambda once more. Tweaked the Alpaka code a bit, though still didn't manage to make the NVIDIA-backed jobs work correctly.

krasznaa added the high priority label Jun 20, 2025
