Cell Sorting, main branch (2026.02.20.) by krasznaa · Pull Request #1264 · acts-project/traccc

krasznaa · 2026-02-20T15:40:24Z

After earlier discussions about how fast we can be with sorting cells as part of the throughput measurements, I spent some time putting together some code for this.

I introduced "cell sorting algorithms" for all backends, implemented in pretty much the same way as the measurement sorting algorithms.

Then I taught traccc::io::read_cells(...) how to randomize the order of the cells on request. I did it like this because the CSV reading code is fundamentally set up to output a sorted vector of cells. Instead of completely re-thinking the logic of the I/O code, it was easier to add a shuffling step at the end, which only runs when the user asks for it.
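As a rough illustration of the "shuffle at the end" approach, here is a minimal sketch. The `cell` struct, `cell_less` and `finalize_cells` are hypothetical names made up for this example; they are not the traccc EDM types, nor the real signature of traccc::io::read_cells(...). The fixed seed and the row-before-column ordering are assumptions as well.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>
#include <tuple>
#include <vector>

// Hypothetical, simplified cell record (NOT the real traccc EDM type).
struct cell {
    unsigned int module_index;
    unsigned int row;
    unsigned int column;
};

// Assumed ordering for this sketch: module first, then row, then column.
inline bool cell_less(const cell& lhs, const cell& rhs) {
    return std::tie(lhs.module_index, lhs.row, lhs.column) <
           std::tie(rhs.module_index, rhs.row, rhs.column);
}

// Hypothetical last step of a reader: the cells arrive sorted and are only
// shuffled when the caller asks for it. The fixed default seed keeps the
// shuffled order reproducible between runs of the same binary.
inline void finalize_cells(std::vector<cell>& cells, bool randomize,
                           std::uint32_t seed = 42u) {
    // The reading logic is expected to have produced a sorted vector.
    assert(std::is_sorted(cells.begin(), cells.end(), cell_less));
    if (randomize) {
        std::mt19937 engine{seed};
        std::shuffle(cells.begin(), cells.end(), engine);
    }
}
```

Calling `finalize_cells(cells, false)` leaves the vector untouched, so existing users of a reader built like this would see no change in behaviour.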

Finally I updated the throughput measurement applications to:

- shuffle the cells that they read into host memory;
- make use of the appropriate cell sorting algorithm as part of their data processing.

Unfortunately the result is slightly worse than what I was hoping for. 😦 With the current main branch I see the following (reference) throughput on our trusty ol' A5000:

[bash][pcadp04]:traccc > ./build_current/bin/traccc_throughput_mt_cuda --input-directory /data/Acts/odd-simulations-20240509/geant4_ttbar_mu200/ --input-events=20 --track-candidates-range=5:100 --seedfinder-vertex-range=-150:150 --finding-run-mbf-smoother=false --processed-events=500 --deterministic --cpu-threads=8
...
Warm-up processing [==================================================] 100% [00m:00s]                                            
Event processing   [==================================================] 100% [00m:00s]                                            
04:34:29 PM ThroughputExample             INFO      Reconstructed track parameters: 2622220
04:34:29 PM ThroughputExample             INFO      Time totals:                   File reading  1249 ms
04:34:29 PM ThroughputExample             INFO                  Warm-up processing  153 ms
04:34:29 PM ThroughputExample             INFO                    Event processing  5087 ms
04:34:29 PM ThroughputExample             INFO      Throughput:            Warm-up processing  15.3186 ms/event, 65.2802 events/s
04:34:29 PM ThroughputExample             INFO                    Event processing  10.1754 ms/event, 98.2765 events/s
[bash][pcadp04]:traccc >

Whereas when I add the extra sorting step, I get:

[bash][pcadp04]:traccc > ./build_new/bin/traccc_throughput_mt_cuda --input-directory /data/Acts/odd-simulations-20240509/geant4_ttbar_mu200/ --input-events=20 --track-candidates-range=5:100 --seedfinder-vertex-range=-150:150 --finding-run-mbf-smoother=false --processed-events=500 --deterministic --cpu-threads=8
...
Warm-up processing [==================================================] 100% [00m:00s]                                            
Event processing   [==================================================] 100% [00m:00s]                                            
04:35:53 PM ThroughputExample             INFO      Reconstructed track parameters: 2622229
04:35:53 PM ThroughputExample             INFO      Time totals:                   File reading  975 ms
04:35:53 PM ThroughputExample             INFO                  Warm-up processing  161 ms
04:35:53 PM ThroughputExample             INFO                    Event processing  5510 ms
04:35:53 PM ThroughputExample             INFO      Throughput:            Warm-up processing  16.1115 ms/event, 62.0676 events/s
04:35:53 PM ThroughputExample             INFO                    Event processing  11.0217 ms/event, 90.7302 events/s
[bash][pcadp04]:traccc >

So the cell sorting adds almost an entire millisecond to the per-event processing time (10.18 ms → 11.02 ms per event). 😦 Way more than I was expecting...

I didn't do any deeper profiling of the sorting code, so it's not impossible that it could still be improved. It's also worth remembering that the fully random shuffling of the cells that the code does is a much worse scenario than what we would ever get from real data, even under the least ideal circumstances.

Still, I was hoping for a quicker sorting, even with all this taken into account. 🤔

Pinging @flg, @paradajzblond.

krasznaa · 2026-02-20T15:44:06Z

device/common/include/traccc/clusterization/device/silicon_cell_sorter.hpp

+        if (rhs >= cells.size()) {
+            return true;
+        }
+        return cells.at(lhs) < cells.at(rhs);


@stephenswat, "how sorted" do the cells actually need to be? 🤔 The I/O code was "fully" sorting them so far, so I went for the same in these algorithms. But is this necessary? Would it maybe be enough to do the same as we do for the measurements, i.e. have the cells belonging to the same module sit side-by-side, but not necessarily in the correct order?

Stephen confirmed recently that they need to be grouped (contiguous) by module and then sorted by row and column indices.

Yeah, this is what I remembered. Still, was hoping that I misremembered...

In the end this is exactly what the EDM defines currently.

https://github.com/acts-project/traccc/blob/main/core/include/traccc/edm/impl/silicon_cell_collection.ipp#L52-L64
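To make the difference between "fully sorted" and "only grouped by module" concrete, here is a minimal host-side sketch. The `cell` struct and all function names are hypothetical, and the row-before-column order is an assumption; the EDM code linked above is the authoritative definition of the ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical, simplified cell record (NOT the real traccc EDM type).
struct cell {
    unsigned int module_index;
    unsigned int row;
    unsigned int column;
};

// Full ordering assumed for this sketch: module, then row, then column.
inline bool cell_less(const cell& lhs, const cell& rhs) {
    return std::tie(lhs.module_index, lhs.row, lhs.column) <
           std::tie(rhs.module_index, rhs.row, rhs.column);
}

// "Fully sorted": what the I/O code has been producing so far.
inline bool is_fully_sorted(const std::vector<cell>& cells) {
    return std::is_sorted(cells.begin(), cells.end(), cell_less);
}

// Weaker condition: module indices are non-decreasing, so cells of one
// module sit side-by-side, but their order inside the module is arbitrary.
inline bool is_sorted_by_module(const std::vector<cell>& cells) {
    return std::is_sorted(cells.begin(), cells.end(),
                          [](const cell& a, const cell& b) {
                              return a.module_index < b.module_index;
                          });
}

// Host-side reference sort that establishes the full ordering.
inline void sort_cells(std::vector<cell>& cells) {
    std::sort(cells.begin(), cells.end(), cell_less);
    assert(is_fully_sorted(cells));
}
```

A vector that passes `is_sorted_by_module` but fails `is_fully_sorted` is exactly the "grouped, but not in the correct order" case asked about above.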

flg · 2026-02-20T16:03:52Z

This is quite interesting, thank you for providing this. So 0.85 ms to sort the completely randomized cells of a PU200 event.

First, just to make sure that our numbers are comparable: how many cells is this? For ITk we have on average 1.1e6 cells per ttbar, pu200 event. I expect it to be the same.

If we want to compare with the current scenario, we want to know how long it takes to sort all cells provided that they are already grouped by module. This, I expect, can make a significant difference on your side. To be more cost-efficient than the CPU equivalent in this scenario, the GPU needs to do it in 0.54 ms or less.

krasznaa · 2026-02-20T16:10:52Z

All good/relevant points.

The ODD μ=200 sample contains O(500k) cells per event. So about half of the ITk. 🤔

I'll do a test with just shuffling the cells per module. Let's see how much of a change that will bring. ( 🤞 that a lot...)
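A per-module shuffle of this kind could look roughly like the sketch below. It assumes the cells are already grouped by module (checked here as non-decreasing module index) and shuffles each group independently. The `cell` struct, the function name and the fixed seed are hypothetical; this is not necessarily how the PR implements it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical, simplified cell record (NOT the real traccc EDM type).
struct cell {
    unsigned int module_index;
    unsigned int row;
    unsigned int column;
};

// Shuffle the cells of every module among themselves, leaving the order of
// the modules untouched.
inline void shuffle_within_modules(std::vector<cell>& cells,
                                   std::uint32_t seed = 42u) {
    // Precondition assumed by this sketch: cells are grouped by module.
    assert(std::is_sorted(cells.begin(), cells.end(),
                          [](const cell& a, const cell& b) {
                              return a.module_index < b.module_index;
                          }));
    std::mt19937 engine{seed};
    auto first = cells.begin();
    while (first != cells.end()) {
        // Find the end of the run of cells that share this module index.
        const unsigned int current_module = first->module_index;
        auto last = std::find_if(first, cells.end(),
                                 [current_module](const cell& c) {
                                     return c.module_index != current_module;
                                 });
        std::shuffle(first, last, engine);
        first = last;
    }
}
```

After such a shuffle the input is still grouped by module, which is the scenario asked about above: only the order inside each module has to be restored by the sorting algorithm.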

flg · 2026-02-20T16:16:57Z

The ODD μ=200 sample contains O(500k) cells per event. So about half of the ITk. 🤔

This is so odd (pun intended) that it requires double-checking and further investigation.

sonarqubecloud · 2026-02-20T16:22:59Z

Quality Gate failed

Failed conditions
2 Security Hotspots

See analysis details on SonarQube Cloud

krasznaa · 2026-02-20T16:25:34Z

This latest version of the code, which only shuffles cells within the same module, runs like this:

05:24:07 PM ThroughputExample             INFO      Reconstructed track parameters: 2622230
05:24:07 PM ThroughputExample             INFO      Time totals:                   File reading  1034 ms
05:24:07 PM ThroughputExample             INFO                  Warm-up processing  158 ms
05:24:07 PM ThroughputExample             INFO                    Event processing  5237 ms
05:24:07 PM ThroughputExample             INFO      Throughput:            Warm-up processing  15.8187 ms/event, 63.2164 events/s
05:24:07 PM ThroughputExample             INFO                    Event processing  10.4753 ms/event, 95.4628 events/s

So Thrust's sorting, as expected, is quite a bit quicker in this case.
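For reference, the per-event overhead implied by the three logs in this thread can be worked out as in the sketch below. The constants are copied from the "Event processing" lines above; the variable and function names are made up for this example. Each configuration was only run once here, so the differences also contain whatever run-to-run noise there is.

```cpp
// Per-event processing times (ms/event) copied from the logs in this thread.
constexpr double reference_ms = 10.1754;       // current main branch
constexpr double full_shuffle_ms = 11.0217;    // fully shuffled cells + sorting
constexpr double module_shuffle_ms = 10.4753;  // shuffled within modules + sorting

// Absolute overhead of a run with sorting, relative to the reference run.
constexpr double overhead_ms(double with_sorting, double reference) {
    return with_sorting - reference;
}

// The same overhead expressed as a percentage of the reference time.
constexpr double overhead_percent(double with_sorting, double reference) {
    return 100.0 * (with_sorting - reference) / reference;
}

// About 0.85 ms per event for the fully shuffled input.
constexpr double full_overhead_ms = overhead_ms(full_shuffle_ms, reference_ms);
// About 0.30 ms per event when cells are only shuffled within their module.
constexpr double module_overhead_ms =
    overhead_ms(module_shuffle_ms, reference_ms);
```

In relative terms this is roughly 8.3% of the reference event processing time for the fully shuffled input, and roughly 2.9% for the per-module shuffle.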

flg · 2026-02-20T16:28:16Z

Awesome. I will now test this with ITk.

krasznaa added 4 commits February 20, 2026 13:17

Introduced host::silicon_cell_sorting_algorithm.

1a57fb8

Introduced alpaka::silicon_cell_sorting_algorithm.

ac7f6b2

Introduced cuda::silicon_cell_sorting_algorithm.

4ef5510

Introduced sycl::silicon_cell_sorting_algorithm.

71059a0

krasznaa requested a review from stephenswat February 20, 2026 15:40

krasznaa added the cuda (Changes related to CUDA), sycl (Changes related to SYCL), cpu (Changes related to CPU code) and alpaka (Changes related to Alpaka) labels Feb 20, 2026

krasznaa added 2 commits February 20, 2026 17:04

Introduce cell order randomization.

7f72809

Made it possible to randomize the order of the cells read from an input CSV. In order to exercise the newly added cell sorting algorithms.

Made the throughput applications randomize and then sort cells.

388eff3

krasznaa force-pushed the CellSorting-main-20260219 branch from 7a1f05b to 388eff3 Compare February 20, 2026 16:05

Only shuffle cells per module.

18f99f7

krasznaa wants to merge 7 commits into acts-project:main from krasznaa:CellSorting-main-20260219
