H5parm specifications
v. 1.0 (12/07/13)
v. 0.1 (26/07/12)
Three types of nodes are used in an H5parm (HDF5) file:
Table: every row has the same fields (columns), but different columns can have different types
Array: all elements are of the same type, usually floats
CArray: like an Array, but the data are stored in chunks, which gives easy access to slices of huge arrays without loading all data into memory. These arrays can be much larger than the available physical memory, as long as there is enough disk space.
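The benefit of chunking can be illustrated by counting which chunks a slice actually touches. The following is a minimal sketch with a hypothetical helper `chunks_touched`. HDF5 libraries do this bookkeeping internally, and the shapes and chunk shapes here are made-up numbers.

```python
from itertools import product

def chunks_touched(shape, chunkshape, slices):
    """Return the chunk-grid indices that a rectangular selection overlaps.

    shape, chunkshape: tuples of ints with one entry per dimension.
    slices: one half-open (start, stop) pair per dimension, like Python slices.
    """
    ranges = []
    for n, c, (start, stop) in zip(shape, chunkshape, slices):
        start, stop = max(start, 0), min(stop, n)
        if start >= stop:
            return []  # an empty selection touches no chunk
        # first and last chunk index overlapping [start, stop)
        ranges.append(range(start // c, (stop - 1) // c + 1))
    return list(product(*ranges))

# A 1000 x 64 array split into 100 x 16 chunks (40 chunks in total):
print(chunks_touched((1000, 64), (100, 16), [(150, 250), (0, 16)]))
# -> [(1, 0), (2, 0)]
```

Only 2 of the 40 chunks need to be read from disk for this slice, which is why a CArray can be much larger than memory and still be sliced cheaply.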
Requirements:
1. Solutions of multiple datasets (e.g. the calibrator and the target field solutions) can be stored in the same H5parm, in different solution-sets.
2. Solutions of any kind, including customized ones, can be saved (amplitudes, amplitudes_clipped, amplitudes_smoothed, phases, phases_smoothed, clock, TEC...).
3. The solution interval in time/freq can be non-constant along the axis (e.g. solutions at frequencies distributed non-uniformly across the band).
4. Different solution-tables can have different axes. Axes examples for some solution-tables:
   - amp: time, freq, pol, dir, ant
   - ph: time, freq, pol, dir, ant
   - clock: time, ant
   - tec: time, ant, dir
   - foobar: foo, bar (a generic solution of whatever type)
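Because every solution-table declares its own axes and stores the axis values explicitly, the shape of its values array follows directly from those axes. The sketch below assumes a hypothetical `vals_shape` helper and made-up axis values (the extra station names are illustrative only). For brevity the two tables share one dictionary of axis values, whereas in an H5parm each solution-table stores its own axis arrays.

```python
def vals_shape(axes, axis_values):
    """Shape of the values array: one dimension per axis, in the declared order."""
    return tuple(len(axis_values[name]) for name in axes)

axis_values = {
    "time": [4.867e9, 4.868e9, 4.869e9],
    # non-uniform frequency grid: explicit values, no constant step needed
    "freq": [120e6, 122e6, 130e6],
    "pol": ["XX", "YY"],
    "dir": ["3C196", "pointing"],
    "ant": ["CS001LBA", "CS002LBA", "CS003LBA", "CS004LBA"],  # illustrative names
}

amp_axes = ["time", "freq", "pol", "dir", "ant"]
clock_axes = ["time", "ant"]

print(vals_shape(amp_axes, axis_values))    # -> (3, 3, 2, 2, 4)
print(vals_shape(clock_axes, axis_values))  # -> (3, 4)
```

Storing explicit axis values rather than a start and a step is what allows requirement 3 (non-uniform intervals) to be met without any special case.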
Design:
- A separate HDF5 node is used for each dataset, so the user can choose the name of the solution-set as the name of this node (solves point 1).
- Inside a solution-set there can be an arbitrary number of solution-tables (solves point 4). Each solution-table is an HDF5 node containing a CArray of values, a CArray of weights, and a number of Arrays storing the axis values. An attribute of the values CArray lists the axis names and their order.
- The values CArray can contain "holes" (solves point 3) by storing NaN. This must be taken into account when using the values.
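A plain mean over an axis that contains a NaN hole returns NaN, so holes have to be skipped explicitly. This sketch assumes the axes attribute has already been read into a Python list of names, and uses a hypothetical helper `mean_over` built on NumPy's `nanmean`.

```python
import numpy as np

def mean_over(vals, axes, axis_name):
    """Average vals over a named axis, ignoring NaN holes."""
    idx = axes.index(axis_name)  # position of the axis in the declared order
    return np.nanmean(vals, axis=idx)

axes = ["time", "ant"]
vals = np.array([[1.0, 2.0],
                 [np.nan, 4.0],
                 [3.0, np.nan]])

print(mean_over(vals, axes, "time"))  # -> [2. 3.]
```

Note that `np.nanmean` still returns NaN (with a RuntimeWarning) for a slice that contains only holes.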
Standard solution-type names for solution-tables: amplitude, phase, rotation, rotationmeasure, clock, TEC
Standard names for some axes and CArrays:
Name (unit) - Type - Example of array
time (s) - float64 - [4.867e+09, 4.868e+09, 4.869e+09]
freq (Hz) - float64 - [120e6, 122e6, 130e6...]
ant - string - ['CS001LBA']
pol - string (2 char) - ['XX', 'XY', 'RR', 'RL']
dir - string (16 char) - ['3C196', 'pointing']
val - float64 - [34.543, 5345.423, 123.3213]
weight (0 = flagged) - float32 (from 0 to 1) - [0, 1, 0.9, 0.7, 1, 0]
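The types in the table map naturally onto NumPy dtypes. A minimal sketch using the example values above; the 16-byte width for `ant` is an assumption, since the table does not fix a length for station names.

```python
import numpy as np

time = np.array([4.867e9, 4.868e9, 4.869e9], dtype=np.float64)  # seconds
freq = np.array([120e6, 122e6, 130e6], dtype=np.float64)        # Hz
ant = np.array(["CS001LBA"], dtype="S16")                       # width assumed
pol = np.array(["XX", "XY", "RR", "RL"], dtype="S2")            # 2-char strings
dir_ = np.array(["3C196", "pointing"], dtype="S16")             # 16-char strings
val = np.array([34.543, 5345.423, 123.3213], dtype=np.float64)
weight = np.array([0, 1, 0.9, 0.7, 1, 0], dtype=np.float32)     # 0 = flagged

print(pol.dtype, dir_.dtype, weight.dtype)
```

Fixed-width byte strings are truncated silently when a value is longer than the width, so the 2- and 16-character sizes act as hard limits on polarization and direction names.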
Some node names (solution-sets, solution-tables) can be user-defined; if the user does not provide a name, the following defaults are used:
- solution-set names: sol###, where ### is the smallest available integer between 000 and 999 (e.g. sol000, sol001, sol002...)
- solution-table names: soltype###, where ### is the smallest available integer between 000 and 999 (e.g. amplitude000, amplitude001, phase000, rotationmeasure000...)
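The default naming rule can be sketched as a search for the smallest unused three-digit suffix. The function name `default_name` is hypothetical.

```python
def default_name(prefix, existing):
    """Return prefix### using the smallest integer in 000..999 not yet taken."""
    taken = set(existing)
    for i in range(1000):
        candidate = f"{prefix}{i:03d}"
        if candidate not in taken:
            return candidate
    raise ValueError(f"no free {prefix}### name left")

print(default_name("sol", ["sol000", "sol002"]))              # -> sol001
print(default_name("amplitude", ["amplitude000", "phase000"]))  # -> amplitude001
```

Because the smallest free integer is chosen, deleting sol001 from a file containing sol000..sol002 makes the next default name sol001 again rather than sol003.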
All values equal to NaN are considered flagged, even if their weight is non-zero (e.g. when the solver did not converge because of a bad model, although the data themselves are good). All values with weight==0, including non-NaN values, are considered flagged as well.
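The two flagging rules combine into a single boolean mask. A minimal NumPy sketch with a hypothetical `flagged` helper and made-up values:

```python
import numpy as np

def flagged(vals, weights):
    """True where a solution is flagged: its value is NaN or its weight is 0."""
    return np.isnan(vals) | (weights == 0)

vals = np.array([1.0, np.nan, 3.0, 4.0])
weights = np.array([1.0, 0.8, 0.0, 0.5], dtype=np.float32)

mask = flagged(vals, weights)
print(mask.tolist())          # -> [False, True, True, False]
print(vals[~mask].tolist())   # -> [1.0, 4.0]
```

The second element is flagged despite its non-zero weight (NaN value), and the third despite its valid value (zero weight).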
% solution-set name, can be user-defined
root.sol000
type: Node
% list of station names (e.g. CS001LBA) and positions (format?)
root.sol000.antenna
type: Table
columns: name, position
% list of direction names (e.g. 3C196) and coordinates (format?)
root.sol000.source
type: Table
columns: name, coords
% Solution-table, can be a user-defined name.
root.sol000.amplitude000
type: Node
attributes: “type=amplitude”
root.sol000.amplitude000.time
type: Array
shape: N_times
% time_w is a virtual axis, present only for time and freq,
% for compatibility with parmdb
root.sol000.amplitude000.time_w
type: Array
shape: N_times
root.sol000.amplitude000.freq
type: Array
shape: N_freqs
% all other axes (pol, dir, ant...) are defined in the same way as time and freq
root.sol000.amplitude000.vals
type: CArray
attributes: “axes=[time,freq,pol,dir,ant…]”
“default=1”
shape: (N_times, N_freqs, N_pol, N_dir, N_ant...)
root.sol000.amplitude000.weight
type: CArray
attributes: “axes=[time,freq,pol,dir,ant…]”
“default=0”
shape: (N_times, N_freqs, N_pol, N_dir, N_ant...)
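The solution-table layout above can be mirrored in memory before it is written to disk. The sketch below is hypothetical: `new_soltab` and the dictionary keys are illustrative names, it assumes the `default` attributes give the initial fill values (1 for vals, 0 for weight, i.e. everything flagged until filled), and it stores the axes attribute as a comma-separated string, which is an assumption about its exact encoding.

```python
import numpy as np

def new_soltab(soltype, axis_values, axes):
    """Hypothetical in-memory solution-table following the H5parm layout."""
    shape = tuple(len(axis_values[a]) for a in axes)
    return {
        "type": soltype,                                   # e.g. "amplitude"
        "axes": {a: np.asarray(axis_values[a]) for a in axes},
        # vals and weight share the same axes, order and shape
        "vals": np.full(shape, 1.0, dtype=np.float64),     # default=1
        "weight": np.full(shape, 0.0, dtype=np.float32),   # default=0 (flagged)
        "attrs": {"axes": ",".join(axes)},                 # encoding assumed
    }

st = new_soltab(
    "amplitude",
    {"time": [0.0, 1.0], "freq": [1e8, 1.2e8, 1.5e8], "ant": ["CS001LBA"]},
    ["time", "freq", "ant"],
)
print(st["vals"].shape, st["attrs"]["axes"])  # -> (2, 3, 1) time,freq,ant
```

Keeping vals and weight on exactly the same axes means one index tuple addresses both a solution and its weight.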
For a parmdb of 37 MB:
H5parm with no compression -> 89 MB
H5parm with maximum compression -> 18 MB
Reading times of compressed and uncompressed H5parms agree within a factor of 2 (reading the compressed file is slower).
Compared to parmdb, reading with the Python implementation of H5parm (compressed) is between 2 and 10 times faster.
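The sizes quoted above correspond to the following ratios (simple arithmetic on the quoted figures, not new measurements):

```latex
\frac{89}{37} \approx 2.4, \qquad \frac{18}{37} \approx 0.49, \qquad \frac{89}{18} \approx 4.9
```

That is, the uncompressed H5parm is about 2.4 times the size of the parmdb, maximum compression brings it to roughly half the parmdb size, and compression shrinks the H5parm by a factor of about 5.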