Add possibility to produce DL3 with ctapipe #2727
base: main
For some metadata: some of it requires analyses that are outside the scope of ctapipe. So while you can add features to ctapipe to write the DL3 data, you won't be able to fill it all without external, observatory-specific information. That's OK, as ctapipe can still contain the general data model and code. For example, in CTAO, some information like the dead time will come from other DataPipe or CalibPipe steps (outside of ctapipe). For any missing metadata like that, I would suggest adding a configuration option to your tool that lets the user specify it. E.g.
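A minimal sketch of such an option, assuming a plain Python dataclass stands in for ctapipe's traitlets-based configuration (in a real Tool these would be configurable traits). The field names `dead_time_fraction` and `target_name` and the helper `fill_metadata` are hypothetical. Values read from the file take precedence, and the user-supplied values only fill the gaps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DL3MetadataConfig:
    """Hypothetical user-facing options for metadata that DL2 files do not carry.

    In a ctapipe Tool these would be configurable traits; a dataclass is used
    here so the example runs without ctapipe installed.
    """

    dead_time_fraction: Optional[float] = None  # e.g. provided by another pipeline step
    target_name: Optional[str] = None


def fill_metadata(from_file: dict, config: DL3MetadataConfig) -> dict:
    """Merge metadata read from the input file with user-supplied values.

    A config value is only used when the file has no value (missing or None)
    for that key; values present in the file are never overwritten.
    """
    merged = dict(from_file)
    for key, value in vars(config).items():
        if merged.get(key) is None and value is not None:
            merged[key] = value
    return merged
```

For example, `fill_metadata({"target_name": None, "obs_id": 1}, DL3MetadataConfig(target_name="Crab"))` fills in the target name and leaves `obs_id` untouched.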
The `ctapipe.atmosphere` module lets you do that, but of course it only works if an atmosphere model is available. For simulations, that model is automatically available in the EventSource, but that is not yet implemented for EventSources that read real data. But generally, you can just use:

```python
from ctapipe.io import EventSource

# filename, h_max and zenith_angle (astropy Quantities) come from your own code
with EventSource(filename) as source:
    if source.atmosphere_density_profile is not None:
        x_max = source.atmosphere_density_profile.slant_depth_from_height(h_max, zenith_angle)
```

That will work for any EventSource, but so far you need to test whether `atmosphere_density_profile` is not None; otherwise you cannot compute X_max. Probably we should add a
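To illustrate the quantity `slant_depth_from_height` returns, here is the same calculation for an idealized isothermal exponential atmosphere, ρ(h) = ρ0·exp(−h/H). The vertical depth above height h is then ρ0·H·exp(−h/H), and the slant depth divides it by cos(zenith). This is a minimal sketch under stated assumptions: the values ρ0 = 0.00125 g/cm³ and H = 8 km are illustrative rather than a site profile, the 1/cos scaling is a flat-Earth approximation, and the function name is hypothetical. Heights must use the same reference (e.g. above sea level) as the profile.

```python
import math


def slant_depth_exponential(height_km, zenith_deg, rho0_g_cm3=0.00125, scale_height_km=8.0):
    """Slant depth (g/cm^2) above `height_km` for an isothermal exponential atmosphere.

    Vertical depth is rho0 * H * exp(-h / H); dividing by cos(zenith) is a
    flat-Earth approximation that degrades at large zenith angles.
    """
    if not 0.0 <= zenith_deg < 90.0:
        raise ValueError("zenith angle must be in [0, 90) degrees")
    scale_height_cm = scale_height_km * 1e5  # 1 km = 1e5 cm
    vertical_depth = rho0_g_cm3 * scale_height_cm * math.exp(-height_km / scale_height_km)
    return vertical_depth / math.cos(math.radians(zenith_deg))
```

With these parameters the total vertical depth at sea level is about 1000 g/cm², and at 60° zenith every slant depth is twice the vertical one.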
```python
>>> print(subarray.name)
Paranal-prod6
```

It's up to the observatory using ctapipe to specify this correctly in their EventSource. There is no specific convention yet.
That may not be so necessary, as in general CTAO will have multiple targets per observation, but some information is contained in the scheduling block and observation blocks, which you can read from the EventSource or TableLoader. Generally the actual target name is not required, but you could also add a config option to be able to specify it.
The ctapipe part you get from the provenance system, or just from ctapipe itself. We already write all of that to the DL0-DL2 files. Take a look at the output of `ctapipe-fileinfo`, e.g.:

```
% ctapipe-fileinfo events.dl1.h5
events.dl1.h5:
  CTA:
    ACTIVITY:
      ID: 27321d9b-dc61-4a92-bf49-458bd30c753f
      NAME: ctapipe-process
      SOFTWARE:
        NAME: ctapipe
        VERSION: 0.22.1.dev16+g1e73fe28c
      START:
        TIME: '2024-11-06 13:30:49.186'
      STOP:
        TIME: '2024-11-06 13:31:06.483'
      TYPE: software
    CONTACT:
      EMAIL: unknown
      NAME: KOSACK Karl
      ORGANIZATION: unknown
    INSTRUMENT:
      CLASS: Other
      ID: Paranal-prod6
      SITE: Other
      SUBTYPE: unspecified
      TYPE: unspecified
      VERSION: unspecified
    PROCESS:
      ID: '1'
      SUBTYPE: ''
      TYPE: Simulation
    PRODUCT:
      CREATION:
        TIME: '2024-11-06 13:31:06.490'
      DATA:
        ASSOCIATION: Subarray
        CATEGORY: Sim
        LEVELS: DL1_IMAGES,DL1_PARAMETERS
        MODEL:
          NAME: ASWG
          URL: ''
          VERSION: v6.0.0
      DESCRIPTION: ctapipe Data Product
      FORMAT: hdf5
      ID: ae851928-8780-40ab-828f-4c627f6efd5a
    REFERENCE:
      VERSION: '1'
```

For the calibration version, etc., we will also include that in the DL2 files as
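If the DL3 writer needs to copy this provenance into FITS headers, one option is to flatten the nested mapping into space-separated keys such as `CTA ACTIVITY NAME`. This is a minimal sketch that assumes the metadata is available as a nested Python dict shaped like the output above. The function name is hypothetical, and FITS keyword-length limits (8 characters unless HIERARCH cards are used) are not handled here.

```python
def flatten_metadata(meta, prefix=""):
    """Flatten nested metadata into {"CTA ACTIVITY NAME": value, ...}.

    Nested dicts are walked recursively; their keys are joined with spaces.
    """
    flat = {}
    for key, value in meta.items():
        full_key = f"{prefix} {key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_metadata(value, full_key))
        else:
            flat[full_key] = value
    return flat
```

For instance, `{"CTA": {"REFERENCE": {"VERSION": "1"}}}` becomes `{"CTA REFERENCE VERSION": "1"}`.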
```python
Event pre-selection quality criteria for IRF and DL3 computation with different defaults.
"""

quality_criteria = List(
```
We can't really allow the user to select events in a totally arbitrary way when going from DL2→DL3, as the cuts applied in DL2 also define the IRFs, and if you change them, the IRFs are wrong. So I think the correct thing here is not to allow a general QualityQuery (though maybe we need a fixed one to drop missing or non-reconstructed events), but rather to load the cuts that are output from `ctapipe-optimize-event-selection` and apply them to the observed DL2 data.
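The logic of applying such energy-dependent cuts to observed events can be sketched as follows. This assumes the optimized cut table has already been read into plain lists of reconstructed-energy bin edges and per-bin gammaness thresholds. The function name is hypothetical, and pyirf's `evaluate_binned_cut` provides a vectorized equivalent. The conventions used here are also assumptions: bins are half-open, events outside the binning are rejected, and an event passes when its gammaness is at or above the threshold.

```python
from bisect import bisect_right


def passes_binned_cut(energies, gammaness, bin_edges, thresholds):
    """Return a pass/fail flag per event for an energy-binned gammaness cut.

    bin_edges has len(thresholds) + 1 ascending entries. An event passes if its
    energy lies in [edge_i, edge_{i+1}) and its gammaness is >= thresholds[i].
    """
    if len(bin_edges) != len(thresholds) + 1:
        raise ValueError("need exactly one more bin edge than thresholds")
    result = []
    for energy, score in zip(energies, gammaness):
        index = bisect_right(bin_edges, energy) - 1  # -1 if below the first edge
        in_range = 0 <= index < len(thresholds)
        result.append(in_range and score >= thresholds[index])
    return result
```

Because the cut values are read from the optimization output rather than configured freely, the selection stays consistent with the IRFs computed from the same cuts.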
The format in the DL2 files should not be a concern for you here; the IO layer converts it for you. This should probably go into functions in a separate module. FITS requires storing the actual elapsed time from an epoch, and GADF requires seconds as the unit, with the epoch stored using the MJDREF or MJDREFI/MJDREFF keywords. Something like this:

See also: https://github.com/gammapy/gammapy/blob/main/gammapy/utils/time.py
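As an illustration of that GADF convention, the sketch below converts event times given as MJD into seconds elapsed since a reference epoch, and splits the reference into the integer and fractional parts written to MJDREFI/MJDREFF. It is a minimal sketch with hypothetical helper names. It uses plain floats for brevity, which limits precision to roughly a microsecond at present-day MJD values, so a real implementation would rather use `astropy.time.Time`, as the gammapy module above does. The MJDs must already be in the time scale declared in TIMESYS. The UNIX epoch (MJD 40587) is used as the default only because it is what the PR currently uses.

```python
import math

SECONDS_PER_DAY = 86400.0

# UNIX epoch 1970-01-01T00:00:00 expressed as MJD
UNIX_EPOCH_MJD = 40587.0


def split_mjdref(mjdref):
    """Split a reference MJD into MJDREFI (integer) and MJDREFF (fraction of a day)."""
    integer_part = math.floor(mjdref)
    return int(integer_part), mjdref - integer_part


def seconds_since_reference(mjd_values, mjdref=UNIX_EPOCH_MJD):
    """Convert MJD timestamps to elapsed seconds since the reference (GADF TIME column)."""
    return [(mjd - mjdref) * SECONDS_PER_DAY for mjd in mjd_values]


def time_header(mjdref, timesys):
    """Header keywords describing the time reference; timesys must match the MJD scale."""
    mjdrefi, mjdreff = split_mjdref(mjdref)
    return {"MJDREFI": mjdrefi, "MJDREFF": mjdreff, "TIMEUNIT": "s", "TIMESYS": timesys}
```

Keeping the reference in one place (a module constant or a tool option) makes it easy to switch to a dedicated CTAO epoch later without touching the conversion code.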
The current behavior is to use a default value with a warning. Should I keep it if the user does not provide the missing metadata?
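One way to keep that behavior explicit, and to make it switchable, is to centralize it in a helper. The helper returns the user value if one is given, and otherwise falls back to the default with a warning that names the missing field. A `strict` flag turns the fallback into an error. This is a minimal sketch using Python's standard `warnings` module, and the function and warning-class names are hypothetical.

```python
import warnings


class MissingMetadataWarning(UserWarning):
    """Emitted when a DL3 header value falls back to a default."""


def resolve_metadata(name, user_value, default, strict=False):
    """Return user_value if provided; otherwise fall back to default with a warning.

    With strict=True a missing value raises instead, so production runs cannot
    silently write placeholder metadata.
    """
    if user_value is not None:
        return user_value
    if strict:
        raise ValueError(f"metadata field {name!r} is required but was not provided")
    warnings.warn(
        f"metadata field {name!r} not provided, using default {default!r}",
        MissingMetadataWarning,
        stacklevel=2,
    )
    return default
```

A dedicated warning class lets users silence or escalate these warnings with the standard `warnings` filters, independently of other warnings.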
I have a question on this point. Could a DL2 file contain multiple targets, i.e. different pointings in the same file? Currently everything is designed the way current IACTs work: one DL2 file = one run on a specific target.
The data format of DL2 allows multiple OBs to be merged, but for CTAO we can probably just assume that we don't mix observations in the DL2 produced for observed data. Certainly right now, the GADF format assumes that we do not mix OBs. In fact, it will likely be the other way around: we will store multiple DL3 files for a single observation if there is more than one SOI, for example, and, right now, also for different event types. My point was just "OB != science target", but you can assume one OB is one pointing, though the pointing could be fixed in RA/Dec or in Alt/Az ("drift mode"), since both are supported by ACADA. Again, anything that goes into ctapipe should be as generic as possible (at least it should work for any IACT) and not assume exactly what CTAO will do, and anything CTAO-specific should be developed outside ctapipe, in a package in the DataPipe GitLab space.
The multiple pointing modes are handled. For multiple OBs in the same file, the current code should produce a DL3 file with correct GTI and pointing information, but some information, like the obs ID, will not represent all of them. I also haven't handled at all the possibility of having different pointing modes in the same file.
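Given the suggestion above to assume one OB per DL3 file, a DL2 file containing several OBs could be split before writing, so that each output carries a single obs ID and a single pointing. This is a minimal sketch with events as plain dicts. The `obs_id` key mirrors the DL2 column name, and the function name is hypothetical.

```python
from collections import defaultdict


def split_by_obs_id(events):
    """Group event records by observation ID, preserving input order within each group."""
    groups = defaultdict(list)
    for event in events:
        groups[event["obs_id"]].append(event)
    return dict(groups)
```

Each group can then be written as its own DL3 file, with its own GTI, pointing, and OBS_ID header.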
The h_max to X_max conversion is now implemented but not properly tested, as the DL2 file I have on hand doesn't have atmosphere profile information (or at least the EventSource is not finding the atmosphere profile).
The purpose of this PR is to add support for creating DL3 files in ctapipe. The current output format is the GADF format as described in: https://gamma-astro-data-formats.readthedocs.io/en/v0.3/
The modifications include several changes to parts of the code used for IRF production, to make it compatible with DL3 production as well (loading events and applying cuts).
For now, this PR should be considered a draft, as several items are missing:
The objective of submitting it as a draft first is to be able to discuss several points:
Handling of time
It's not very clear to me what the current time format in the DL2 files is, and therefore whether all the conversions performed are in line with what should be done.
Also, what is the best time scale to use in our case: TAI or UTC?
What reference time should be used? It is currently set in the code to the UNIX epoch, but maybe we want a dedicated CTA one, as other experiments have.
Optional columns for events
There is currently support for most of the optional columns defined in the GADF format (https://gamma-astro-data-formats.readthedocs.io/en/v0.3/events/events.html). The two exceptions are X_max and the Hillas parameters.
For X_max, I currently export h_max instead. Is there a simple library to convert h_max into X_max?
For the Hillas parameters, since the intended use is mainly stereo, it was not obvious which ones to add to the file, so I currently skip all of them.
Metadata
For many metadata fields, I didn't find the information in the DL2 file, but this could partly be because I am currently using an MC DL2 file:
Data quality metadata
In the optional metadata of the GADF, there are quite a few fields linked to quality (trigger rate, broken pixels, muon efficiency, humidity, NSB, ...). I guess that for CTA we would like to handle quality a bit differently. Should they be included anyway? If yes, how do I retrieve all this information?
Code organization and implementation
I'm not yet used to ctapipe specifics (tools and components). I would like to check that my use of them corresponds to the intent. Also, I've currently put the code for DL3 production mainly in the irf folder, as a very large fraction of it is shared. Should we rename it or move it?
Speed
Currently the code is extremely slow (it took close to 30 minutes on my laptop to process a single gamma MC DL2 file). I've encountered some issues when I tried to profile it (any help here is welcome), but I guess most of the time is spent in coordinate conversions. How important is this for the first version?
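On the profiling question, Python's built-in `cProfile` and `pstats` modules are usually enough to see where the time goes. This is a minimal sketch in which `process_file` is a hypothetical stand-in for the DL3 tool's main loop and `profile_call` is a hypothetical helper. For a whole script, `python -m cProfile -o dl3.prof <script>` followed by inspecting the file with `pstats` works the same way.

```python
import cProfile
import io
import pstats


def process_file(n_events):
    """Hypothetical stand-in for the per-event work done by the DL3 tool."""
    total = 0.0
    for i in range(n_events):
        total += (i % 7) ** 0.5
    return total


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()
```

If the report confirms that coordinate conversions dominate, it is worth checking whether astropy transformations are applied per event. Transforming whole arrays of coordinates in a single call avoids astropy's large per-call overhead.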