Skip to content

Add possibility to produce DL3 with ctapipe #2727

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 62 commits into
base: main
Choose a base branch
from

Conversation

mdebony
Copy link
Contributor

@mdebony mdebony commented Mar 25, 2025

The purpose of this PR is to add support for the creation of DL3 file in ctapipe. The current output format is the GADF format as described in : https://gamma-astro-data-formats.readthedocs.io/en/v0.3/

The modification include several change in some part of the code used for IRFs production in order to make it compatible also for DL3 production (loading events and applying cuts).

This PR should be for now considered as a draft as several item are missing :

  • Fix the error encounter in some automatic test (help here is welcome)
  • User documentation on how to used those new functionality
  • Automatic test
  • A script to generate HDU and OBS index
  • Add information in the changelog

The objectives to first submit it as a draft is to be able to discuss several points :

  1. Handling of time
    It's not very clear to me the current time format in the DL2, and so if all the conversion performed are in line with what should be done.
    Also what is the best time scale to use for our case, TAI, UTC ?
    What is the reference time that should be used ? It is currently set in the code to UNIX time, but maybe we want to have a CTA dedicated one like other experience are doing.

  2. Optional columns for events
    There are currently support for most of the optional columns defined in the GADF format (https://gamma-astro-data-formats.readthedocs.io/en/v0.3/events/events.html). The two exceptions are x max and hillas parameters.
    For x max, I instead currently export h max. Are there any simple library to convert h max into x max ?
    For hillas parameters, as the intended use is mainly stereo, it was not obvious which one to add to the file and currently skipped all of them.

  3. Metadata
    For numerous metadata, i didn't find information about them in the DL2 file, but it could come partly due to currently using MC DL2 file :

    • Dead time
    • Name of the organization, site, and sub array
    • Information about the target object, name and coordinate
    • Information about the version of the calibration used, and information about the version of ctapipe used for lower level processing
  4. Data quality metadata
    In the optional metadata of the GADF, there are quite a few linked to quality (trigger rate, broken pixel, muon efficiency, humidity, NSB, ....). I guess than for CTA we would like to handle quality a bit differently. Should they be included any way. If yes, how do I retrieve all those information.

  5. Code organization and implementation
    I'm not yet used to ctapipe specificity (tools and component). I would like to validate, my use of them is corresponding to the intent. Also I've currently put the code for DL3 production mainly in the irf folder as a very large fraction is common. Should we rename it or move it ?

  6. Speed
    Currently the code is crazy slow (It took close to 30 minutes on my laptop to process a single gamma MC DL2 file). I've encountered some issue when I tried to profile it (any help here is welcome) but I guess most of it come from coordinate conversions. How important is this for the first version ?

@kosack
Copy link
Member

kosack commented Mar 25, 2025

For some metadata: some of the metadata requires analyses that are outside the scope of ctapipe, so while you can add some features to ctapipe to write the DL3 data, you won't be able to fill it all without some external observatory-specific information. That's ok, as ctapipe can still contain the general data model and code. For example, in CTAO, some info like the deadtime, etc, will come from other DataPipe or CalibPipe steps (outside of ctapipe).

What I would suggest for any missing metadata like that, is maybe just add a configuration option to your tool that lets the user specify it. E.g. ctapipe-create-dl3 --deadtime-fraction=0.05 .... Then we can delay to later where that info really comes from.

@kosack
Copy link
Member

kosack commented Mar 25, 2025

For x max, I instead currently export h max. Are there any simple library to convert h max into x max ?

The ctapipe.atmosphere module lets you do that, but of course it only works if an atmosphere model is available. For simulations, that model is automatically available in the EventSource, but that is not yet imlpemented for EventSources that read real data. But generally, you can just use:

with EventSource(filename) as source:
    if source.atmosphere_density_profile:
        x_max = source.atmosphere_density_profile.slant_depth_from_height(h_max, zenith_angle)

That will work for any EventSource, but so far you need to test of the atmosphere_density_profile is not None, and otherwise you cannot compute X_max. Probably we should add a TableLoader.read_atmosphere() method as well, since right now you have to construct an EventSource to get it, but that's not too bad.

@kosack
Copy link
Member

kosack commented Mar 25, 2025

Dead time

  • add an option

Name of the organization, site, and sub array

  • organization is already part of the ctapipe.core.metadata we write to the output files, so we should be able to eaily propagate it to the DL3 files
  • the subarray name can be read from the SubarrayDescription object itself:
>>> print(subarray.name)
'Paranal-prod6'

It's up to the observatory using ctapipe to specify this correctly in their EventSource. There is no specific convention yet.

Information about the target object, name and coordinate

That may not be so necessary, as in general CTAO will have multilpe targets per observation, but some info is contained in the scheduling block and observation blocks you can read from the EventSource or TableLoader. Generally the actual target name is not required, but you could also add a config option to be able to specify it.

Information about the version of the calibration used, and information about the version of ctapipe used for lower level processing

The ctapipe part you get from the provenance system, or just from ctapipe itself. We already write all that to the DL0-DL2 files. Take a look at the output of ctapipe-fileinfo some_ctaipe_file.h5 to see what info we already store at the DL0-DL2 level. That same info can be re-used.

E.g.:

% ctapipe-fileinfo events.dl1.h5                                  

events.dl1.h5:
    CTA:
        ACTIVITY:
            ID: 27321d9b-dc61-4a92-bf49-458bd30c753f
            NAME: ctapipe-process
            SOFTWARE:
                NAME: ctapipe
                VERSION: 0.22.1.dev16+g1e73fe28c
            START:
                TIME: '2024-11-06 13:30:49.186'
            STOP:
                TIME: '2024-11-06 13:31:06.483'
            TYPE: software
        CONTACT:
            EMAIL: unknown
            NAME: KOSACK Karl
            ORGANIZATION: unknown
        INSTRUMENT:
            CLASS: Other
            ID: Paranal-prod6
            SITE: Other
            SUBTYPE: unspecified
            TYPE: unspecified
            VERSION: unspecified
        PROCESS:
            ID: '1'
            SUBTYPE: ''
            TYPE: Simulation
        PRODUCT:
            CREATION:
                TIME: '2024-11-06 13:31:06.490'
            DATA:
                ASSOCIATION: Subarray
                CATEGORY: Sim
                LEVELS: DL1_IMAGES,DL1_PARAMETERS
                MODEL:
                    NAME: ASWG
                    URL: ''
                    VERSION: v6.0.0
            DESCRIPTION: ctapipe Data Product
            FORMAT: hdf5
            ID: ae851928-8780-40ab-828f-4c627f6efd5a
        REFERENCE:
            VERSION: '1'

For the cailbration version, etc, we will also include that in the DL2 files as CTA CONTEXT headers, but again that will be instrument specific. ctapipe should work for any instrument, so we ill need some more specific code outside of ctapipe to collect all that.

Event pre-selection quality criteria for IRF and DL3 computation with different defaults.
"""

quality_criteria = List(
Copy link
Member

@kosack kosack Mar 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't really allow the user to select events in a totally arbitrary way when going from DL2→DL3, as the cuts applied in DL2 also define the IRFs, and if you change them, the IRFs are wrong. So I think the correct thing here is not to allow a general QualityQuery (though maybe we need a fixed one to drop missing or non-reconstructed events), but rather to load the cuts that are output from ctapipe-optimize-event-selection and apply them to the observed DL2 data.

@maxnoe
Copy link
Member

maxnoe commented Mar 25, 2025

Handling of time
It's not very clear to me the current time format in the DL2, and so if all the conversion performed are in line with what should be done

The format in the DL2 files should not be a concern for you here. The IO layer converts it to astropy.time.Time, so you just need to convert the astropy.time.Time object to the correct time for GADF.

This should probably go into functions in a module like ctapipe.io.gadf.
The epoch should be a configuration option. By default, it can be something sensible for CTAO data (e.g. 2018-01-01T00:00:00 TAI)

FITS requires storing actual elapsed time from an epoch, and GADF requires that it is using seconds as unit and the epoch is stored using the MJDREF or MJDREF{F,I} keywords.
UTC should be avoided due to the ambiguity of leap seconds. TAI or TT is fine.

Something like this:

scale = "tai"
epoch = Time("2018-01-01T00:00:00", scale=scale) # should be configurable in actual code
gadf_time = (event_time - epoch).to(u.s)

header["MJDREF"] = getattr(epoch, scale).mjd
header["TIMESYS"] = scale.upper()

See also: https://github.com/gammapy/gammapy/blob/main/gammapy/utils/time.py

@mdebony
Copy link
Contributor Author

mdebony commented Mar 25, 2025

For some metadata: some of the metadata requires analyses that are outside the scope of ctapipe, so while you can add some features to ctapipe to write the DL3 data, you won't be able to fill it all without some external observatory-specific information. That's ok, as ctapipe can still contain the general data model and code. For example, in CTAO, some info like the deadtime, etc, will come from other DataPipe or CalibPipe steps (outside of ctapipe).

What I would suggest for any missing metadata like that, is maybe just add a configuration option to your tool that lets the user specify it. E.g. ctapipe-create-dl3 --deadtime-fraction=0.05 .... Then we can delay to later where that info really comes from.

The current behavior is a default value with a warning. Should I kept it if the user does not provide missing metadata ?

@mdebony
Copy link
Contributor Author

mdebony commented Mar 25, 2025

That may not be so necessary, as in general CTAO will have multilpe targets per observation, but some info is contained in the scheduling block and observation blocks you can read from the EventSource or TableLoader. Generally the actual target name is not required, but you could also add a config option to be able to specify it.

I have a question on this point. Could a DL2 have multiple targets, ie different pointing in the same file. Currently everything is thinked a bit more current IACT way, one DL2 file = one run on a specific target

@mdebony
Copy link
Contributor Author

mdebony commented Mar 25, 2025

This should probably go into functions in a module like ctapipe.io.gadf.
In this case should I move the whole dl3.py file that I added to handle the writing of DL3 to this module ?

@kosack
Copy link
Member

kosack commented Mar 26, 2025

I have a question on this point. Could a DL2 have multiple targets, ie different pointing in the same file. Currently everything is thinked a bit more current IACT way, one DL2 file = one run on a specific target

The data format of DL2 allows multiple OBs to be merged, but for CTAO we can probably just assume for that we dont' mix observations in the DL2 produced for observed data. Certainly right now, the GADF format assumes that we do not mix OBs. It will likely be the other way around in fact, we will store multiple DL3 files for a single observation if there are more than one SOI, for example. And right now, also for different event types. My point was just "OB != science target", but you can assume one OB is one pointing, though the pointing could be fixed in ra/dec or alt/az ("drift mode"), since both are supported by ACADA.

Again, anything that goes into ctapipe should be as generic as possible (at least should work for any IACT) and not assume exactly what CTAO will doa, and anything ctao-specific should be developed outside ctapipe in a package in the datapipe gitlab space.

@mdebony
Copy link
Contributor Author

mdebony commented Mar 26, 2025

The multiple mode of pointing are handled. For multiple OB in the same file, with the current code, it should produce a DL3 file with correct GTI and pointing information, but some information like obs id will not represent everything. I also didn't handle at all the possibility to have different pointing mode in the same file.
A possibility would be to generate one DL3 file per obs id. It will require quite a lot of modification to this PR but it's doable.

@mdebony
Copy link
Contributor Author

mdebony commented Mar 26, 2025

For x max, I instead currently export h max. Are there any simple library to convert h max into x max ?

The ctapipe.atmosphere module lets you do that, but of course it only works if an atmosphere model is available. For simulations, that model is automatically available in the EventSource, but that is not yet imlpemented for EventSources that read real data. But generally, you can just use:

with EventSource(filename) as source:
    if source.atmosphere_density_profile:
        x_max = source.atmosphere_density_profile.slant_depth_from_height(h_max, zenith_angle)

That will work for any EventSource, but so far you need to test of the atmosphere_density_profile is not None, and otherwise you cannot compute X_max. Probably we should add a TableLoader.read_atmosphere() method as well, since right now you have to construct an EventSource to get it, but that's not too bad.

h max to x max conversion is now implemented but not properly tested as the DL2 file I have on hand doesn't have atmosphere profile information (or at least EventSource is not finding the atmosphere profile).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants