Releases: cerndb/pyspark-root-datasource
PySpark ROOT Datasource
pyspark-root-datasource v0.1 — Beta (Sep 2025)
An Apache Spark 4 Python DataSource that reads files in the ROOT data format (used in High Energy Physics) into Spark DataFrames, via uproot, awkward, and pyarrow.
Highlights
- format("root") with local paths, dirs, globs; optional XRootD (root://)
- Partitioning: step_size, num_partitions
- Schema inference (sample_rows) or explicit schema for pruning
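To make the step_size idea concrete, here is a minimal sketch of how a tree's entries could be cut into fixed-size chunks, one per Spark partition. It assumes step_size means "entries per chunk"; the helper name entry_ranges is hypothetical and is not part of this package, whose actual partitioning logic may differ.

```python
def entry_ranges(num_entries, step_size):
    """Split [0, num_entries) into consecutive half-open ranges.

    Hypothetical helper for illustration only: each (start, stop) range
    holds at most step_size entries and could back one Spark partition.
    """
    if step_size <= 0:
        raise ValueError("step_size must be positive")
    return [
        (start, min(start + step_size, num_entries))
        for start in range(0, num_entries, step_size)
    ]


print(entry_ranges(10, 4))  # [(0, 4), (4, 8), (8, 10)]
```

Under this assumption, a smaller step_size yields more, smaller partitions, and the last range is simply shorter when the entry count is not a multiple of step_size.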
Install
pip install pyspark-root-datasource
Quick start
from pyspark.sql import SparkSession
from pyspark_root_datasource import register
spark = SparkSession.builder.getOrCreate()
register(spark)
# Pass the path once: via load(path) or option("path", ...), not both
df = (
    spark.read.format("root")
    .option("tree", "Events")
    .load("path_to_ROOT_file")
)
df.show(5, truncate=False)
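The options listed under Highlights can be combined with the reader shown above. The sketch below wraps that in a small helper. The helper name read_root_events, its parameters, and the branch names in the usage comment are assumptions made for illustration; only format("root"), the tree and step_size options, and the standard DataFrameReader schema() and load() calls come from the text above or from Spark itself. It expects register(spark) to have been called already.

```python
def read_root_events(spark, path, tree="Events", step_size=None, schema=None):
    """Build a ROOT read with optional partitioning and an explicit schema.

    Hypothetical convenience wrapper, not part of pyspark-root-datasource.
    """
    reader = spark.read.format("root").option("tree", tree)
    if step_size is not None:
        # Spark reader options are passed as strings
        reader = reader.option("step_size", str(step_size))
    if schema is not None:
        # An explicit schema skips inference and limits the columns read
        reader = reader.schema(schema)
    return reader.load(path)


# Usage, with hypothetical branch names:
# df = read_root_events(
#     spark,
#     "path_to_ROOT_file",
#     step_size=1_000_000,
#     schema="nMuon int, Muon_pt array<float>",
# )
```

Leaving schema unset falls back to the datasource's schema inference, which the Highlights tie to the sample_rows option.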