Add possibility to produce DL3 with ctapipe #2727

mdebony · 2025-03-25T12:26:08Z

The purpose of this PR is to add support for the creation of DL3 file in ctapipe. The current output format is the GADF format as described in : https://gamma-astro-data-formats.readthedocs.io/en/v0.3/

The modification include several change in some part of the code used for IRFs production in order to make it compatible also for DL3 production (loading events and applying cuts).

This PR should be for now considered as a draft as several item are missing :

Fix the error encounter in some automatic test (help here is welcome)
User documentation on how to used those new functionality
Automatic test
A script to generate HDU and OBS index
Add information in the changelog

The objectives to first submit it as a draft is to be able to discuss several points :

Handling of time
It's not very clear to me the current time format in the DL2, and so if all the conversion performed are in line with what should be done.
Also what is the best time scale to use for our case, TAI, UTC ?
What is the reference time that should be used ? It is currently set in the code to UNIX time, but maybe we want to have a CTA dedicated one like other experience are doing.
Optional columns for events
There are currently support for most of the optional columns defined in the GADF format (https://gamma-astro-data-formats.readthedocs.io/en/v0.3/events/events.html). The two exceptions are x max and hillas parameters.
For x max, I instead currently export h max. Are there any simple library to convert h max into x max ?
For hillas parameters, as the intended use is mainly stereo, it was not obvious which one to add to the file and currently skipped all of them.
Metadata
For numerous metadata, i didn't find information about them in the DL2 file, but it could come partly due to currently using MC DL2 file :
- Dead time
- Name of the organization, site, and sub array
- Information about the target object, name and coordinate
- Information about the version of the calibration used, and information about the version of ctapipe used for lower level processing
Data quality metadata
In the optional metadata of the GADF, there are quite a few linked to quality (trigger rate, broken pixel, muon efficiency, humidity, NSB, ....). I guess than for CTA we would like to handle quality a bit differently. Should they be included any way. If yes, how do I retrieve all those information.
Code organization and implementation
I'm not yet used to ctapipe specificity (tools and component). I would like to validate, my use of them is corresponding to the intent. Also I've currently put the code for DL3 production mainly in the irf folder as a very large fraction is common. Should we rename it or move it ?
Speed
Currently the code is crazy slow (It took close to 30 minutes on my laptop to process a single gamma MC DL2 file). I've encountered some issue when I tried to profile it (any help here is welcome) but I guess most of it come from coordinate conversions. How important is this for the first version ?

… DL3

kosack · 2025-03-25T14:30:58Z

For some metadata: some of the metadata requires analyses that are outside the scope of ctapipe, so while you can add some features to ctapipe to write the DL3 data, you won't be able to fill it all without some external observatory-specific information. That's ok, as ctapipe can still contain the general data model and code. For example, in CTAO, some info like the deadtime, etc, will come from other DataPipe or CalibPipe steps (outside of ctapipe).

What I would suggest for any missing metadata like that, is maybe just add a configuration option to your tool that lets the user specify it. E.g. ctapipe-create-dl3 --deadtime-fraction=0.05 .... Then we can delay to later where that info really comes from.

kosack · 2025-03-25T14:35:08Z

For x max, I instead currently export h max. Are there any simple library to convert h max into x max ?

The ctapipe.atmosphere module lets you do that, but of course it only works if an atmosphere model is available. For simulations, that model is automatically available in the EventSource, but that is not yet imlpemented for EventSources that read real data. But generally, you can just use:

with EventSource(filename) as source:
    if source.atmosphere_density_profile:
        x_max = source.atmosphere_density_profile.slant_depth_from_height(h_max, zenith_angle)

That will work for any EventSource, but so far you need to test of the atmosphere_density_profile is not None, and otherwise you cannot compute X_max. Probably we should add a TableLoader.read_atmosphere() method as well, since right now you have to construct an EventSource to get it, but that's not too bad.

kosack · 2025-03-25T14:49:26Z

Dead time

add an option

Name of the organization, site, and sub array

organization is already part of the ctapipe.core.metadata we write to the output files, so we should be able to eaily propagate it to the DL3 files
the subarray name can be read from the SubarrayDescription object itself:

>>> print(subarray.name)
'Paranal-prod6'

It's up to the observatory using ctapipe to specify this correctly in their EventSource. There is no specific convention yet.

Information about the target object, name and coordinate

That may not be so necessary, as in general CTAO will have multilpe targets per observation, but some info is contained in the scheduling block and observation blocks you can read from the EventSource or TableLoader. Generally the actual target name is not required, but you could also add a config option to be able to specify it.

Information about the version of the calibration used, and information about the version of ctapipe used for lower level processing

The ctapipe part you get from the provenance system, or just from ctapipe itself. We already write all that to the DL0-DL2 files. Take a look at the output of ctapipe-fileinfo some_ctaipe_file.h5 to see what info we already store at the DL0-DL2 level. That same info can be re-used.

E.g.:

% ctapipe-fileinfo events.dl1.h5                                  

events.dl1.h5:
    CTA:
        ACTIVITY:
            ID: 27321d9b-dc61-4a92-bf49-458bd30c753f
            NAME: ctapipe-process
            SOFTWARE:
                NAME: ctapipe
                VERSION: 0.22.1.dev16+g1e73fe28c
            START:
                TIME: '2024-11-06 13:30:49.186'
            STOP:
                TIME: '2024-11-06 13:31:06.483'
            TYPE: software
        CONTACT:
            EMAIL: unknown
            NAME: KOSACK Karl
            ORGANIZATION: unknown
        INSTRUMENT:
            CLASS: Other
            ID: Paranal-prod6
            SITE: Other
            SUBTYPE: unspecified
            TYPE: unspecified
            VERSION: unspecified
        PROCESS:
            ID: '1'
            SUBTYPE: ''
            TYPE: Simulation
        PRODUCT:
            CREATION:
                TIME: '2024-11-06 13:31:06.490'
            DATA:
                ASSOCIATION: Subarray
                CATEGORY: Sim
                LEVELS: DL1_IMAGES,DL1_PARAMETERS
                MODEL:
                    NAME: ASWG
                    URL: ''
                    VERSION: v6.0.0
            DESCRIPTION: ctapipe Data Product
            FORMAT: hdf5
            ID: ae851928-8780-40ab-828f-4c627f6efd5a
        REFERENCE:
            VERSION: '1'

For the cailbration version, etc, we will also include that in the DL2 files as CTA CONTEXT headers, but again that will be instrument specific. ctapipe should work for any instrument, so we ill need some more specific code outside of ctapipe to collect all that.

kosack · 2025-03-25T14:59:57Z

src/ctapipe/irf/cuts/quality_cuts.py

+    Event pre-selection quality criteria for IRF and DL3 computation with different defaults.
+    """
+
+    quality_criteria = List(


We can't really allow the user to select events in a totally arbitrary way when going from DL2→DL3, as the cuts applied in DL2 also define the IRFs, and if you change them, the IRFs are wrong. So I think the correct thing here is not to allow a general QualityQuery (though maybe we need a fixed one to drop missing or non-reconstructed events), but rather to load the cuts that are output from ctapipe-optimize-event-selection and apply them to the observed DL2 data.

maxnoe · 2025-03-25T15:11:01Z

Handling of time
It's not very clear to me the current time format in the DL2, and so if all the conversion performed are in line with what should be done

The format in the DL2 files should not be a concern for you here. The IO layer converts it to astropy.time.Time, so you just need to convert the astropy.time.Time object to the correct time for GADF.

This should probably go into functions in a module like ctapipe.io.gadf.
The epoch should be a configuration option. By default, it can be something sensible for CTAO data (e.g. 2018-01-01T00:00:00 TAI)

FITS requires storing actual elapsed time from an epoch, and GADF requires that it is using seconds as unit and the epoch is stored using the MJDREF or MJDREF{F,I} keywords.
UTC should be avoided due to the ambiguity of leap seconds. TAI or TT is fine.

Something like this:

scale = "tai"
epoch = Time("2018-01-01T00:00:00", scale=scale) # should be configurable in actual code
gadf_time = (event_time - epoch).to(u.s)

header["MJDREF"] = getattr(epoch, scale).mjd
header["TIMESYS"] = scale.upper()

See also: https://github.com/gammapy/gammapy/blob/main/gammapy/utils/time.py

mdebony · 2025-03-25T17:18:18Z

For some metadata: some of the metadata requires analyses that are outside the scope of ctapipe, so while you can add some features to ctapipe to write the DL3 data, you won't be able to fill it all without some external observatory-specific information. That's ok, as ctapipe can still contain the general data model and code. For example, in CTAO, some info like the deadtime, etc, will come from other DataPipe or CalibPipe steps (outside of ctapipe).

What I would suggest for any missing metadata like that, is maybe just add a configuration option to your tool that lets the user specify it. E.g. ctapipe-create-dl3 --deadtime-fraction=0.05 .... Then we can delay to later where that info really comes from.

The current behavior is a default value with a warning. Should I kept it if the user does not provide missing metadata ?

mdebony · 2025-03-25T17:52:59Z

That may not be so necessary, as in general CTAO will have multilpe targets per observation, but some info is contained in the scheduling block and observation blocks you can read from the EventSource or TableLoader. Generally the actual target name is not required, but you could also add a config option to be able to specify it.

I have a question on this point. Could a DL2 have multiple targets, ie different pointing in the same file. Currently everything is thinked a bit more current IACT way, one DL2 file = one run on a specific target

mdebony · 2025-03-25T18:00:53Z

This should probably go into functions in a module like ctapipe.io.gadf.
In this case should I move the whole dl3.py file that I added to handle the writing of DL3 to this module ?

kosack · 2025-03-26T11:58:09Z

I have a question on this point. Could a DL2 have multiple targets, ie different pointing in the same file. Currently everything is thinked a bit more current IACT way, one DL2 file = one run on a specific target

The data format of DL2 allows multiple OBs to be merged, but for CTAO we can probably just assume for that we dont' mix observations in the DL2 produced for observed data. Certainly right now, the GADF format assumes that we do not mix OBs. It will likely be the other way around in fact, we will store multiple DL3 files for a single observation if there are more than one SOI, for example. And right now, also for different event types. My point was just "OB != science target", but you can assume one OB is one pointing, though the pointing could be fixed in ra/dec or alt/az ("drift mode"), since both are supported by ACADA.

Again, anything that goes into ctapipe should be as generic as possible (at least should work for any IACT) and not assume exactly what CTAO will doa, and anything ctao-specific should be developed outside ctapipe in a package in the datapipe gitlab space.

mdebony · 2025-03-26T12:06:41Z

The multiple mode of pointing are handled. For multiple OB in the same file, with the current code, it should produce a DL3 file with correct GTI and pointing information, but some information like obs id will not represent everything. I also didn't handle at all the possibility to have different pointing mode in the same file.
A possibility would be to generate one DL3 file per obs id. It will require quite a lot of modification to this PR but it's doable.

mdebony · 2025-03-26T12:09:14Z

For x max, I instead currently export h max. Are there any simple library to convert h max into x max ?

The ctapipe.atmosphere module lets you do that, but of course it only works if an atmosphere model is available. For simulations, that model is automatically available in the EventSource, but that is not yet imlpemented for EventSources that read real data. But generally, you can just use:
with EventSource(filename) as source:
    if source.atmosphere_density_profile:
        x_max = source.atmosphere_density_profile.slant_depth_from_height(h_max, zenith_angle)
That will work for any EventSource, but so far you need to test of the atmosphere_density_profile is not None, and otherwise you cannot compute X_max. Probably we should add a TableLoader.read_atmosphere() method as well, since right now you have to construct an EventSource to get it, but that's not too bad.

h max to x max conversion is now implemented but not properly tested as the DL2 file I have on hand doesn't have atmosphere profile information (or at least EventSource is not finding the atmosphere profile).

mdebony added 30 commits February 20, 2025 10:19

Add function to compute selection cuts

ac0d718

Make a component for event selection

17f0ed0

Generalise behaviour

dbce02b

Update event loader in the view of using it for DL3 production

7606d22

change the output format of get simulation

5111d1f

Fix cicular depedency

47fc14d

Update documentation of cuts

1b26017

Update event processor to support gamaness event selection

1af4094

Update event loader for gamma event selection

62df524

Update path criteria for input

50c1af4

Further update to prepare for DL3

aa64f93

Skeleton for creating DL3

207cb7c

Basic DL3 structure

2ea79cc

Add option for optional DL3 columns

7811ab1

Change option for flag

861d2fa

Fix the option

bf0cd86

Add DL3 and IRF mode for preprocessor

4b3765c

Adapt preprocessor columns request

9ff2f21

Add option to skip missing optional column

912d871

Allow for variation of the pointing for DL3

8336606

Move the derived columns function

636fbae

Update make_derived_columns function for producing columns needed for…

15e089f

… DL3

Update columns selection for DL3 production

df6a9c6

Update table creation

8dcfea5

Correct handling of optional columns for DL3

3fcfe22

Fix on optional columns

0f795b4

Format column for DL3 format

f3ba314

Change x max for h max

aedf6d8

Set base object for DL3 formating

e2a119b

Update message

475a4bd

mdebony added 16 commits March 12, 2025 12:08

Add reading of additional observation metadata from dl2 file

74cca7f

Add new metadata to dl3 class

681ce64

Add dead time fraction as metadata

1ce26f5

Proper selection of mandatory only columns

25f8f8b

Add metadata header to the DL3 file

8e03bdb

Add GTI table in the DL3 file

df027d1

Add pointing table and header to the DL3 file

3223155

Fix interpolation

e3c5245

Add docstring to the dl3.py file

ae70fd4

Merge remote-tracking branch 'upstream/main' into create_dl3

e3c5af9

Fix errors encountered in automatic tests

917240f

Multiple fix

58ecd82

Add docstring

0fde645

Multiple fix

3de36aa

Merge branch 'main' into create_dl3

5017133

Fix configurations

7914648

kosack reviewed Mar 25, 2025

View reviewed changes

Add X max calculation

46b5a9b

mdebony added 2 commits March 26, 2025 14:29

Switch time scale to TAI

e099c12

Add configurable reference time

bd20df6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add possibility to produce DL3 with ctapipe #2727

Add possibility to produce DL3 with ctapipe #2727

Uh oh!

mdebony commented Mar 25, 2025

Uh oh!

kosack commented Mar 25, 2025 •

edited

Loading

Uh oh!

kosack commented Mar 25, 2025 •

edited

Loading

Uh oh!

kosack commented Mar 25, 2025 •

edited

Loading

Uh oh!

kosack Mar 25, 2025 •

edited

Loading

Uh oh!

maxnoe commented Mar 25, 2025 •

edited

Loading

Uh oh!

mdebony commented Mar 25, 2025

Uh oh!

mdebony commented Mar 25, 2025

Uh oh!

mdebony commented Mar 25, 2025

Uh oh!

kosack commented Mar 26, 2025 •

edited

Loading

Uh oh!

mdebony commented Mar 26, 2025

Uh oh!

mdebony commented Mar 26, 2025

Uh oh!

Uh oh!

Add possibility to produce DL3 with ctapipe #2727

Are you sure you want to change the base?

Add possibility to produce DL3 with ctapipe #2727

Uh oh!

Conversation

mdebony commented Mar 25, 2025

Uh oh!

kosack commented Mar 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kosack commented Mar 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kosack commented Mar 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kosack Mar 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

maxnoe commented Mar 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mdebony commented Mar 25, 2025

Uh oh!

mdebony commented Mar 25, 2025

Uh oh!

mdebony commented Mar 25, 2025

Uh oh!

kosack commented Mar 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mdebony commented Mar 26, 2025

Uh oh!

mdebony commented Mar 26, 2025

Uh oh!

Uh oh!

kosack commented Mar 25, 2025 •

edited

Loading

kosack commented Mar 25, 2025 •

edited

Loading

kosack commented Mar 25, 2025 •

edited

Loading

kosack Mar 25, 2025 •

edited

Loading

maxnoe commented Mar 25, 2025 •

edited

Loading

kosack commented Mar 26, 2025 •

edited

Loading