A Python library for grouping simulated LArTPC particles into physics partitions.
pysupera takes a list of reconstructed particles—each carrying a 3-D point cloud of detector hits—and merges spatially touching particles that satisfy configurable physical criteria into partition groups.
- Introduction
- Dependencies
- Installation
- Quick Start
- Workflow
- CLI Usage
- Compute Backends
- Algorithms
- Conditions Reference
- Diagnostics
- Assumptions
In liquid-argon time-projection-chamber (LArTPC) simulations, each simulated particle (track, shower fragment, low-energy scatter, etc.) is associated with a set of 3-D space points. Reconstruction algorithms typically produce one particle per connected sub-graph of the ionisation pattern, but physics processes—electromagnetic showers, low-energy scatters, secondary vertices—systematically fragment what is logically a single object into many reconstructed pieces.
pysupera reassembles those fragments into partitions by:
- Preprocessing fragmented point clouds (optional).
- Selecting candidate pairs using physical metadata (PDG code, semantic type, parent–child ancestry).
- Testing spatial proximity between the merged point clouds of candidate partitions.
- Merging touching candidates subject to condition-specific post-filters.
The result is a list of partitions, where each partition is a list of Particle objects that belong together physically.
| Package | Required | Purpose |
|---|---|---|
| `numpy` | ✅ | Array operations throughout |
| `scipy` | ✅ | KDTree proximity checks (CPU backends), connected-components defragmentation |
| `h5py` | ✅ | HDF5 I/O (EventStore, EventWriter) |
| `hydra-core` ≥ 1.3 | ✅ | Hierarchical configuration (CLI and programmatic) |
| `omegaconf` ≥ 2.3 | ✅ | Config composition (dependency of Hydra) |
| `joblib` | optional | Parallel KDTree queries (cpu-multi backend) |
| `cupy` | optional | GPU pairwise distances (bulk-gpu, gpu defragmentation) |
| `cuml` (RAPIDS) | optional | GPU nearest-neighbour index (gpu backend) |
| `cudf`, `cugraph` (RAPIDS) | optional | Fully GPU defragmentation (rapids preprocessor) |
| `numba` | optional | CUDA kernel proximity checker (numba backend) |
```sh
# Clone or navigate to the pysupera repo root
cd /path/to/pysupera

# Editable install (recommended for development)
pip install -e .

# With GPU extras
pip install -e ".[gpu]"

# With Numba extras
pip install -e ".[numba]"

# With app (Dash interactive event viewer)
pip install -e ".[app]"

# With all dev tools
pip install -e ".[dev]"
```

```python
from pysupera import read_events
from pysupera.partitioner import ParticlePartitioner
from pysupera.conditions import PhotonDecay, TouchingEMShower, CombineLEScatters, AbsorbLEScatter

# Load particles for one event
with read_events("my_file.h5") as store:
    particles = list(store.iter_events())[0]

# Build the partitioner
partitioner = ParticlePartitioner(
    particles          = particles,
    distance_threshold = 5.0,          # mm (same units as point cloud coordinates)
    backend            = "cpu-single",
)

# Apply all conditions in one convergence loop
partitions = partitioner.partition_combined([
    PhotonDecay(),
    TouchingEMShower(),
    CombineLEScatters(),
    AbsorbLEScatter(),
])

print(f"{len(partitions)} partitions from {len(particles)} particles")
```

Install the app dependencies:
```sh
pip install -e ".[app]"
```

Launch the viewer:

```sh
pysupera-app                 # default: http://localhost:8050
pysupera-app --port 8080     # custom port
pysupera-app --host 0.0.0.0  # expose on the network
pysupera-app --debug         # enable Dash hot-reload
```

Then open the URL in a browser. The sidebar lets you:
| Section | Controls |
|---|---|
| INPUT | HDF5 file path and event index |
| PREPROCESSING | Enable merge-duplicates and/or defragmentation; backend; semantic-type filter |
| PARTITIONER | Distance threshold, checker backend, n_jobs |
| CONDITIONS | Toggle each of the four conditions independently |
| VIEW OPTIONS | Show/hide legends; synchronise the two 3-D camera views |
| PARTICLE FILTER | Instantly show/hide particles by semantic type (no re-run needed) |
Hit ▶ Run to execute the full pipeline and render two side-by-side 3-D point-cloud plots—original particles on the left, partitions on the right. The PARTICLE FILTER and show legend checkbox take effect immediately without re-running the pipeline.
Each particle carries:
| Attribute | Type | Description |
|---|---|---|
| `id` | `int` | Unique particle ID within an event |
| `parent_id` | `int` | Direct parent particle ID |
| `ancestor_id` | `int` | Root ancestor of the shower/track genealogy |
| `pdg` | `int` | PDG Monte Carlo code |
| `parent_pdg` | `int` | PDG code of the parent |
| `sem_type` | `SemanticType` | Derived semantic category (see below) |
| `point_cloud` | `ndarray (N, ≥3)` | 3-D hit positions; columns 0–2 are x, y, z |
Semantic types (SemanticType enum):
| Value | Meaning |
|---|---|
| `kShower` | EM shower fragment |
| `kTrack` | Charged track |
| `kDelta` | Delta-ray |
| `kMichel` | Michel electron |
| `kLEScatter` | Low-energy scatter product |
| `kUnknown` | Unclassified |
`sem_type` is derived automatically from `process_type`, `pdg`, `parent_pdg`, and `point_cloud` at construction time via `SetSemanticType`. Particles with fewer than `min_pc_size` points may be reclassified as `kLEScatter`.
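The size-based reclassification can be sketched as follows (a simplified stand-in: string-valued semantic types replace the `SemanticType` enum, and the real `SetSemanticType` logic also consults `process_type` and PDG codes):

```python
import numpy as np

def maybe_reclassify(sem_type: str, point_cloud: np.ndarray, min_pc_size: int) -> str:
    """Sketch of the small-cloud rule: a particle whose point cloud has
    fewer than min_pc_size points is reclassified as kLEScatter.
    min_pc_size = -1 disables the rule entirely."""
    if min_pc_size > 0 and len(point_cloud) < min_pc_size:
        return "kLEScatter"
    return sem_type

# A 3-point fragment is demoted when min_pc_size = 5, kept when disabled
print(maybe_reclassify("kShower", np.zeros((3, 3)), min_pc_size=5))   # kLEScatter
print(maybe_reclassify("kShower", np.zeros((3, 3)), min_pc_size=-1))  # kShower
```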
pysupera uses Hydra for configuration. The default config lives in pysupera/conf/config.yaml.
Key top-level parameters:
| Key | Default | Description |
|---|---|---|
| `distance_threshold` | `5.0` | Proximity distance D in point-cloud coordinate units |
| `verbose` | `true` | Print per-event progress |
| `report` | `true` | Print end-of-run summary statistics |
| `enable_diagnostics` | `false` | Record every merge decision for inspection |
| `check_particle_tree` | `false` | Validate parent–child consistency before partitioning |
| `checker` | `gpu` | Default compute backend (config group; override with `checker=cpu_single` etc.) |
Particle preprocessing parameters (particle.*):
| Key | Default | Description |
|---|---|---|
| `particle.min_pc_size` | `-1` | Min point-cloud size for semantic-type classification; -1 disables (all sizes accepted) |
| `particle.merge_duplicates` | `true` | Collapse points sharing identical (x,y,z) coordinates before partitioning |
| `particle.defragment` | `true` | Split disconnected point-cloud fragments into new kLEScatter particles |
| `particle.preprocessor.name` | `scipy` | Defragmentation backend: `scipy`, `gpu`, or `rapids` |
Programmatic loading (notebooks, tests):
```python
from hydra import initialize, compose
from pysupera.config import build_conditions, build_checker, configure

with initialize(config_path="pysupera/conf", version_base=None):
    cfg = compose("config", overrides=[
        "checker=cpu_single",
        "distance_threshold=5.0",
        "io.input_path=my_file.h5",
        "io.output_path=out.h5",
    ])

configure(cfg)  # sets module-level defaults (min_pc_size, etc.)
```

Two optional preprocessing stages run before partitioning.
Collapses points that share identical (x, y, z) voxel positions within each particle's point cloud. Aggregation: time = min, energy = sum, dEdx = max. Enable with:

```yaml
# config.yaml
particle:
  merge_duplicates: true
```

A particle's point cloud may be fragmented — split into spatially disconnected clusters that should logically belong to the same particle. The defragmenter runs a connected-components algorithm (eps-ball radius = `distance_threshold`) on each particle's point cloud independently, then detaches small disconnected fragments (size ≤ `min_pc_size`) as new kLEScatter particles.
Enable with:
```yaml
particle:
  defragment: true
  min_pc_size: -1   # default: -1 (disabled); set e.g. 5 to split small fragments
  preprocessor:
    name: scipy     # choices: scipy (default), gpu, rapids
```

| Backend | Requirement | Best for |
|---|---|---|
| `scipy` | scipy | General use; fast early-exit for non-fragmented particles |
| `gpu` | cupy | Large point clouds (> `min_pts_for_gpu` points) |
| `rapids` | cupy + cudf + cugraph | Fully GPU, no CPU round-trip |
All backends share the same fast-path early-exit checks (single point, two-point connected, bounding-box diameter ≤ eps) that skip the full CC algorithm for the majority of particles.
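The eps-ball connected-components step can be sketched with scipy's KDTree and sparse graph utilities (the function name and shapes are illustrative, not pysupera's actual API):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def fragment_labels(points: np.ndarray, eps: float) -> np.ndarray:
    """Label each point with its eps-connected component: points closer
    than eps are linked, and each connected component is one spatial
    fragment, mirroring the scipy-style defragmenter described above."""
    n = len(points)
    pairs = np.array(sorted(cKDTree(points).query_pairs(r=eps)))
    if len(pairs) == 0:
        return np.arange(n)  # no pair within eps: every point is its own fragment
    adj = csr_matrix((np.ones(len(pairs)), (pairs[:, 0], pairs[:, 1])), shape=(n, n))
    _, labels = connected_components(adj, directed=False)
    return labels

# Two clusters separated by far more than eps -> two fragments
pts = np.array([[0.0, 0, 0], [0.5, 0, 0], [10, 0, 0], [10.5, 0, 0]])
labels = fragment_labels(pts, eps=1.0)
print(labels)  # [0 0 1 1]
```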
A condition defines which pairs of particles (or partition representatives) are candidates for merging, and optionally post-filters the proximity-passing pairs.
Each condition implements:
```python
class PartitionConditionBase(ABC):
    def get_candidates(self, partitioner) -> List[Tuple[int, int]]:
        """Particle-level candidate pairs (legacy path)."""

    def post_filter(self, partitioner, touching_pairs) -> List[Tuple[int, int]]:
        """Optional filter after proximity check."""

    def get_rep_candidates(self, partitioner, rep_lookup) -> List[Tuple[int, int]]:
        """Partition-level candidate pairs (incremental path)."""

    def get_unconditional_merges(self, partitioner, rep_lookup) -> List[Tuple[int, int]]:
        """Topology-only merges, no proximity check needed."""
```

Conditions are applied in order. The recommended order is:
PhotonDecay → TouchingEMShower → CombineLEScatters → AbsorbLEScatter
See Conditions Reference for details on each.
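As an illustration of the interface, a hypothetical condition (not part of pysupera) that proposes merging Michel electrons into their parent tracks might look like this; it assumes the partitioner exposes its particle list as `.particles` with the attributes from the table above:

```python
from types import SimpleNamespace
from typing import List, Tuple

class MichelToTrack:
    """Hypothetical custom condition, for illustration only: propose
    merging each Michel electron into its direct parent track."""

    def get_candidates(self, partitioner) -> List[Tuple[int, int]]:
        by_id = {p.id: p for p in partitioner.particles}
        pairs = []
        for p in partitioner.particles:
            parent = by_id.get(p.parent_id)
            if p.sem_type == "kMichel" and parent is not None and parent.sem_type == "kTrack":
                pairs.append((p.id, parent.id))  # directed child -> parent
        return pairs

    def post_filter(self, partitioner, touching_pairs):
        return touching_pairs  # no extra constraint after the proximity check

# Minimal stand-in for a partitioner holding two particles
stub = SimpleNamespace(particles=[
    SimpleNamespace(id=1, parent_id=-1, sem_type="kTrack"),
    SimpleNamespace(id=2, parent_id=1, sem_type="kMichel"),
])
print(MichelToTrack().get_candidates(stub))  # [(2, 1)]
```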
```python
partitioner = ParticlePartitioner(
    particles          = particles,
    distance_threshold = 5.0,
    backend            = "cpu-single",  # see Compute Backends
    enable_diagnostics = False,
)

# Option A: one condition at a time
partitions = partitioner.partition(condition)

# Option B: all conditions in one converging pass
partitions = partitioner.partition_combined(conditions)
```

`partition` and `partition_combined` both return `List[List[Particle]]`. Each inner list is one partition; singletons represent unmerged particles.
The default mode is partition-level proximity (`partition_level_proximity=True`): proximity is tested between the merged point clouds of whole partitions rather than individual particles, and the algorithm iterates until convergence (at most `max_passes=10` passes; typically 1–2).
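The convergence loop can be sketched as follows (a toy stand-in: `apply_condition` abstracts one condition's candidate/proximity/merge cycle and returns its merge count):

```python
def partition_until_converged(apply_condition, conditions, max_passes=10):
    """Sweep over all conditions until a full pass makes no new merges,
    capped at max_passes as a safety guard. Returns the number of
    passes executed."""
    for n_pass in range(1, max_passes + 1):
        merges = sum(apply_condition(cond) for cond in conditions)
        if merges == 0:
            return n_pass  # converged: the last pass changed nothing
    return max_passes

# Toy stand-in: pretend the first three condition applications each merge once
budget = {"merges_left": 3}
def fake_apply(cond):
    if budget["merges_left"] > 0:
        budget["merges_left"] -= 1
        return 1
    return 0

passes = partition_until_converged(fake_apply, ["cond_a", "cond_b"])
print(passes)  # 3: two merging passes plus one empty pass confirming convergence
```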
```python
from pysupera import open_writer, read_events

# Write events (particles carry updated partition labels)
with open_writer("output.h5") as writer:
    writer.append_event(particles)

# Read events back
with read_events("output.h5") as store:
    for particles in store.iter_events():
        ...
```

After installation the `run_pysupera` command is available:
```sh
# Basic run
run_pysupera io.input_path=/data/in.h5 io.output_path=/data/out.h5

# Use CPU multi-threaded backend, 8 threads
run_pysupera checker=cpu_multi checker.n_jobs=8

# Adjust distance threshold, disable verbose output
run_pysupera distance_threshold=3.0 verbose=false

# Enable defragmentation preprocessing
run_pysupera particle.defragment=true particle.min_pc_size=5

# Hydra parameter sweep (multiple runs)
run_pysupera --multirun \
  checker=gpu,bulk_gpu \
  distance_threshold=3.0,5.0,7.0
```

The working directory is not changed by Hydra, so relative paths in `io.*` resolve against the directory where the command is invoked.
Select the backend via the backend parameter of ParticlePartitioner or checker=<name> in the Hydra config.
| Backend name | Config key | Requirement | Description |
|---|---|---|---|
| `cpu-single` | `cpu_single` | scipy | Single-threaded KDTree |
| `cpu-multi` | `cpu_multi` | scipy, joblib | KDTree + joblib thread pool |
| `gpu` | `gpu` | cuml (RAPIDS) | RAPIDS NearestNeighbors (brute-force L2); default |
| `bulk-gpu` | `bulk_gpu` | cupy | CuPy chunked brute-force; accepts `chunk_size` |
| `numba` | `numba` | numba, CUDA | CUDA kernel with shared-memory tiling; accepts `block_size` |
| `cell-hash-cpu-single` | `cell_hash_cpu_single` | scipy | Cell-hash spatial index, single thread |
| `cell-hash-cpu-multi` | `cell_hash_cpu_multi` | scipy, joblib | Cell-hash + joblib |
| `cell-hash-gpu` | `cell_hash_gpu` | cupy | Cell-hash on CPU, distance kernels on GPU |
For the partition-level proximity mode (the hot path), all CPU backends override `batch_check_cloud_proximity` to build the parent KDTree once per candidate group rather than once per pair.
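The one-tree-per-group idea can be sketched with scipy (function name and shapes are illustrative):

```python
import numpy as np
from scipy.spatial import cKDTree

def batch_touching(parent_cloud, child_clouds, d):
    """Build the KDTree over the parent's merged cloud once, then query
    every child cloud against it, instead of rebuilding the tree for
    each (child, parent) pair."""
    tree = cKDTree(parent_cloud)
    results = []
    for cloud in child_clouds:
        # Misses beyond d come back as inf, so "touching" = any finite distance
        dist, _ = tree.query(cloud, k=1, distance_upper_bound=d)
        results.append(bool(np.any(np.isfinite(dist))))
    return results

parent = np.array([[0.0, 0, 0], [1, 0, 0]])
near = np.array([[1.5, 0, 0]])   # 0.5 from the parent's closest point
far = np.array([[50.0, 0, 0]])
print(batch_touching(parent, [near, far], d=1.0))  # [True, False]
```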
The main algorithm (`_build_partitions_incremental`) maintains one representative Particle per partition. A representative carries:

- All scalar attributes (PDG, semantic type, ancestry) of the particle that "won" each merge.
- A lazily concatenated point cloud (`_cloud_parts`) that is the union of all constituent particles' point clouds — concatenation is deferred until a proximity query actually needs the full array.

Initialisation:

- A `UnionFind` structure over all n particles.
- One representative per particle (shallow copy, `_cloud_parts = [original_cloud]`).
- `rep_lookup`: particle_id → representative Particle.
- `rep_members`: id(rep) → list of particle IDs (reverse map for O(|partition|) updates).
Pass loop (repeats until convergence, ≤ max_passes):

For each condition:

- Unconditional merges (`get_unconditional_merges`): topology-driven pairs requiring no proximity check (e.g. photon → e⁺e⁻ from `PhotonDecay`).
- Candidate generation (`get_rep_candidates`): directed `(child_rep_id, parent_rep_id)` pairs among current partition representatives. Pairs are resolved to UF roots, deduplicated, then grouped by parent representative.
- Batch proximity check (`batch_check_cloud_proximity`): for each parent group, the parent's cloud is materialized once (lazy concat), a single KDTree is built, and all children in the group are queried against it.
- Post-filter (`post_filter`): condition-specific constraint (e.g. one merge per kLEScatter).
- Merge: for each accepted pair, the child partition is absorbed into the parent. The parent's `_cloud_parts` list is extended with the child's chunks (no allocation). `rep_members` allows O(|child partition|) update of `rep_lookup`.
The loop terminates when a full pass over all conditions produces zero new merges. At the end, surviving reps materialize their deferred clouds with a single np.concatenate.
Complexity (per pass):

- Candidate building: O(n_reps) per condition.
- Proximity checks: O(n_candidates × n_points_per_cloud / n_unique_parents) KDTree queries; each parent tree is built once.
- `rep_lookup` update: O(|merged partition|) per merge (instead of O(n) with a full scan).
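A minimal version of the union-find structure driving these merges (pysupera's own class may differ in details, e.g. union by rank):

```python
class UnionFind:
    """Minimal union-find with path halving, the structure used to track
    which particles currently share a partition."""
    def __init__(self, n: int):
        self.parent = list(range(n))

    def find(self, i: int) -> int:
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra  # a's root absorbs b's partition

uf = UnionFind(4)
uf.union(0, 1)  # merge particles 0 and 1
uf.union(1, 2)  # transitively pulls 2 into the same partition
print(uf.find(2) == uf.find(0))  # True
print(uf.find(3) == uf.find(0))  # False
```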
Two point clouds are considered touching if the minimum Euclidean distance between any point in one cloud and any point in the other is ≤ D.
The default implementation:

- Bounding-box rejection: if the minimum possible distance between the two bounding boxes exceeds D, return `False` immediately (no KDTree needed).
- KDTree query: build a `scipy.KDTree` on cloud B, query all points of cloud A with `distance_upper_bound = D`, return `True` if any distance ≤ D.
GPU backends (GPUChecker, BulkGPUChecker) override this with device-resident implementations and do not construct a CPU KDTree.
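The two-stage CPU test can be sketched as follows (a simplified version of the behaviour described above, not pysupera's actual code):

```python
import numpy as np
from scipy.spatial import cKDTree

def clouds_touch(a: np.ndarray, b: np.ndarray, d: float) -> bool:
    """Bounding-box rejection first, KDTree query second."""
    # Per-axis gap between the two axis-aligned bounding boxes (0 if they overlap)
    gap = np.maximum(0.0, np.maximum(a.min(0) - b.max(0), b.min(0) - a.max(0)))
    if np.linalg.norm(gap) > d:
        return False  # boxes farther apart than d: clouds cannot touch
    # Misses beyond d come back as inf, so "touching" = any finite distance
    dist, _ = cKDTree(b).query(a, k=1, distance_upper_bound=d)
    return bool(np.any(np.isfinite(dist)))

a = np.array([[0.0, 0, 0]])
b = np.array([[0.0, 3, 0]])
print(clouds_touch(a, b, d=5.0))  # True  (minimum distance 3 <= 5)
print(clouds_touch(a, b, d=1.0))  # False (rejected at the bounding-box stage)
```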
Each condition uses a three-stage pipeline:
```
get_candidates / get_rep_candidates
      ↓  (candidate pairs, O(n_class_A × n_class_B) in the worst case)
batch_check_proximity / batch_check_cloud_proximity
      ↓  (touching pairs only)
post_filter
      ↓  (merge pairs)
```
Only the touching subset proceeds to post_filter, and only the final merge_pairs are applied to the Union-Find structure.
Purpose: Merge the PDG ±11 (e⁺/e⁻) decay products of a photon (PDG 22) into the photon's partition.
Mechanism: Purely topology-based (get_unconditional_merges). A photon typically has an empty point cloud; its spatial extent is entirely represented by its children. No proximity check is performed.
Merge direction: child (e⁺/e⁻) → parent (photon). The photon representative survives and its point_cloud grows to include all child points.
When to run: Always first, before any proximity-based condition, so that the photon's cloud is populated before it participates in further proximity checks.
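The topology-only pairing can be sketched as follows (a simplified stand-in using the particle attributes from the table above; not pysupera's actual implementation):

```python
from types import SimpleNamespace

def photon_decay_pairs(particles):
    """Pair each electron/positron (PDG ±11) with its direct parent when
    that parent is a photon (PDG 22). No proximity check is involved:
    the merge is driven purely by the particle tree, and the photon
    representative survives and inherits the children's points."""
    by_id = {p.id: p for p in particles}
    pairs = []
    for p in particles:
        parent = by_id.get(p.parent_id)
        if abs(p.pdg) == 11 and parent is not None and parent.pdg == 22:
            pairs.append((p.id, parent.id))  # directed child -> parent
    return pairs

photon   = SimpleNamespace(id=1, parent_id=-1, pdg=22)
electron = SimpleNamespace(id=2, parent_id=1, pdg=11)
positron = SimpleNamespace(id=3, parent_id=1, pdg=-11)
print(photon_decay_pairs([photon, electron, positron]))  # [(2, 1), (3, 1)]
```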
Purpose: Merge PDG-11/22 parent–child pairs that share a common ancestor and whose point clouds are spatially touching.
Candidate filter:

- Both particles have PDG code 11 or 22.
- Direct parent–child relationship.
- Same `ancestor_id`.
Merge direction: child → parent (directed by the parent–child tree).
When to run: After PhotonDecay, to consolidate EM shower fragments.
Purpose: Consolidate transitively connected kLEScatter particles into a single kLEScatter representative before absorption.
Motivation: AbsorbLEScatter makes pairwise decisions. Without pre-consolidation, a chain Shower A – LEScatter B – LEScatter C may not correctly absorb B and C into A if A and C are not directly touching.
Candidate filter: All pairs of distinct kLEScatter partition representatives.
Merge direction (tie-breaking, `_le_direction`):

- Larger point cloud becomes the parent.
- On equal size: higher energy sum (column `PointFeature.energy`) wins.
- On equal energy: stable arbitrary tie-break.
When to run: Before AbsorbLEScatter.
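The tie-break rules can be sketched as a comparator (illustrative only: energy is assumed to live in column 4 per `PointFeature.energy`, and the fall-back here, lower particle id, stands in for pysupera's stable arbitrary choice):

```python
from types import SimpleNamespace
import numpy as np

def le_parent(a, b):
    """Pick which of two kLEScatter representatives becomes the parent:
    larger point cloud wins; on equal size, higher energy sum wins;
    otherwise fall back to the lower particle id."""
    def key(p):
        return (len(p.point_cloud), float(p.point_cloud[:, 4].sum()), -p.id)
    return a if key(a) >= key(b) else b

# Hypothetical 5-column clouds: x, y, z, time, energy
big  = SimpleNamespace(id=1, point_cloud=np.zeros((3, 5)))
hot  = SimpleNamespace(id=2, point_cloud=np.full((3, 5), 2.0))  # same size, more energy
tiny = SimpleNamespace(id=3, point_cloud=np.ones((2, 5)))
print(le_parent(big, tiny).id)  # 1: larger cloud becomes the parent
print(le_parent(big, hot).id)   # 2: equal size, higher energy sum wins
```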
Purpose: Absorb each kLEScatter partition into at most one touching non-kLEScatter neighbour.
Candidate filter: All pairs between kLEScatter representatives and non-kLEScatter representatives.
Post-filter: One merge per kLEScatter — if multiple non-LE neighbours touch the same LE partition, only the first passing pair is kept.
Merge direction: kLEScatter representative → non-kLEScatter representative.
When to run: Last, after CombineLEScatters.
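The one-merge-per-LE post-filter reduces to keeping the first pair per LE representative in iteration order, which can be sketched as:

```python
def one_merge_per_le(touching_pairs):
    """Keep only the first touching (le_rep_id, non_le_rep_id) pair for
    each LE representative, in iteration order; later pairs for the same
    LE partition are dropped."""
    seen, kept = set(), []
    for le_id, target_id in touching_pairs:
        if le_id not in seen:
            seen.add(le_id)
            kept.append((le_id, target_id))
    return kept

# LE partition 7 touches two non-LE neighbours; only the first survives
print(one_merge_per_le([(7, 1), (7, 2), (9, 1)]))  # [(7, 1), (9, 1)]
```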
Enable with `enable_diagnostics=True`:

```python
partitioner = ParticlePartitioner(particles, 5.0, enable_diagnostics=True)
partitions = partitioner.partition_combined(conditions)

# Why were particles 42 and 99 not merged?
print(partitioner.why_not_merged(42, 99))

# All recorded decisions involving particle 42
partitioner.print_all_decisions_for_particle(42)

# Aggregate summary
partitioner.print_diagnostics_summary()

# Export to JSON
partitioner.export_diagnostics("decisions.json")
```

Diagnostics add a per-pair record overhead. With diagnostics disabled (the default), proximity checks use the fast `batch_check_proximity` path with no per-pair Python callback.
- Coordinate units — `distance_threshold` must be in the same units as the x, y, z columns of `point_cloud` (typically millimetres for LArTPC geometry).
- Parent–child completeness — Each `parent_id` referenced by a particle should correspond to another particle in the same event list. Enable `check_particle_tree: true` to validate this at runtime. `children_map` and `ancestor_map` are built from the provided list; missing parents silently produce empty child lists.
- Single event at a time — `ParticlePartitioner` is constructed fresh for each event. The internal KDTree index (checker) is built during `__init__` and covers the particles as they exist at construction time. Particles added by preprocessing (defragmenter splitting) must be passed to the constructor after preprocessing is complete.
- `_cloud_parts` attribute — The incremental algorithm attaches `_cloud_parts` to each representative `Particle` object (a `list[ndarray]` of point-cloud chunks used for deferred concatenation). This attribute is an internal implementation detail of the partitioner and is not serialised to HDF5.
- Point cloud columns — Columns 0–2 are always treated as x, y, z. Additional columns (time, energy deposit, dEdx, etc.) are carried through the pipeline unchanged but are not used by any proximity or condition logic except for the energy tie-break in `CombineLEScatters` (column index 4, `PointFeature.energy`).
- `kLEScatter` ordering — `AbsorbLEScatter` takes the first touching non-LE neighbour in the order candidates are iterated, which depends on the ordering of `rep_lookup`. This is deterministic within a single run but may vary across Python versions or dict iteration orders in unusual environments.
- Convergence — The pass loop terminates when no new merges occur in a complete pass. The loop is capped at `max_passes=10` as a safety guard. In practice, 1–2 passes are sufficient for typical LArTPC events.
- GPU availability — GPU backends (`gpu`, `bulk-gpu`, `numba`, `cell-hash-gpu`) raise an `ImportError` at construction time if the required libraries are not installed. The CPU backends (`cpu-single`, `cpu-multi`, `cell-hash-cpu-single`, `cell-hash-cpu-multi`) require only `scipy` and optionally `joblib`.
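Given the ImportError behaviour above, callers that want to degrade gracefully can probe for the optional stack themselves. A defensive sketch (illustrative; pysupera itself simply raises at checker construction):

```python
def pick_backend(preferred: str = "gpu") -> str:
    """Probe for the RAPIDS nearest-neighbour library and fall back to
    the scipy-only cpu-single backend when it is missing."""
    if preferred == "gpu":
        try:
            import cuml  # noqa: F401  (RAPIDS nearest-neighbour index)
            return "gpu"
        except ImportError:
            return "cpu-single"
    return preferred

print(pick_backend("cpu-multi"))  # cpu-multi (CPU backends need no probe)
```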

