Releases: cerndb/pyspark-root-datasource

PySpark ROOT Datasource

23 Sep 13:38

pyspark-root-datasource v0.1 — Beta (Sep 2025)

An Apache Spark 4 Python DataSource that reads files in the ROOT data format (used in High Energy Physics) into Spark DataFrames, using uproot, awkward, and pyarrow.

Highlights

  • format("root") reads local paths, directories, and globs; optional XRootD (root://) support
  • Partitioning options: step_size, num_partitions
  • Schema inference from sampled rows (sample_rows), or an explicit schema for column pruning

Install

pip install pyspark-root-datasource

Quick start

from pyspark.sql import SparkSession
from pyspark_root_datasource import register

spark = SparkSession.builder.getOrCreate()
register(spark)  # make format("root") available in this session
# Read the "Events" tree; the file path is given once, in load()
df = (
    spark.read.format("root")
    .option("tree", "Events")
    .load("path_to_ROOT_file")
)
df.show(5, truncate=False)
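
The options listed under Highlights can be set on the same reader. The sketch below is only an illustration: the helper name configure_root_reader is hypothetical (it is not part of the package), and the sketch assumes that step_size, num_partitions and sample_rows are passed as ordinary reader options and that an explicit schema can be given as a Spark DDL string. The column names in the usage comment are placeholders.

```python
def configure_root_reader(reader, tree, step_size=None, num_partitions=None,
                          sample_rows=None, schema=None):
    """Apply ROOT data source options to a Spark DataFrameReader.

    Hypothetical convenience wrapper; `reader` is expected to be
    `spark.read` after register(spark) has been called.
    """
    reader = reader.format("root").option("tree", tree)

    if schema is not None:
        # Explicit schema (assumed here to be a DDL string): no inference needed.
        reader = reader.schema(schema)
    elif sample_rows is not None:
        # Otherwise, control how many rows are sampled for schema inference.
        reader = reader.option("sample_rows", sample_rows)

    # Partitioning options named in the release notes; set only when given.
    if step_size is not None:
        reader = reader.option("step_size", step_size)
    if num_partitions is not None:
        reader = reader.option("num_partitions", num_partitions)
    return reader


# Usage (needs a Spark session with the data source registered;
# "nMuon" and "MET_pt" are placeholder column names):
# df = configure_root_reader(
#     spark.read, "Events",
#     step_size=100_000,
#     schema="nMuon int, MET_pt float",
# ).load("path_to_ROOT_file")
```

Leaving unset options out of the call chain keeps the data source's own defaults in effect, which this release note does not specify.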
