H5parm specifications

v. 1.0 (12/07/13)
v. 0.1 (26/07/12)

HDF5 format

Three types of nodes are used in an H5parm (HDF5) file:
Table: every row has the same fields (columns), but different columns can have different types.
Array: all elements have the same type, usually float.
CArray: like an Array, but the data are stored in chunks. This allows easy access to slices of huge arrays without loading all the data into memory. CArrays can be much larger than the available physical memory, as long as there is enough disk space.
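Why chunking helps can be shown with a small stdlib-only sketch. It does not call HDF5. It assumes a regular chunk grid and computes which chunks a rectangular slice overlaps, which are the only chunks that have to be read from disk. The function name `chunks_touched` is hypothetical.

```python
from itertools import product


def chunks_touched(slices, chunk_shape):
    """Return the chunk-grid indices that a rectangular selection overlaps.

    Hypothetical helper, assuming a regular chunk grid.
    slices: one (start, stop) pair per dimension, stop exclusive.
    chunk_shape: chunk length along each dimension.
    """
    per_dim = []
    for (start, stop), size in zip(slices, chunk_shape):
        if stop <= start:
            return []  # an empty selection touches no chunk
        # first and last chunk along this dimension that hold selected elements
        per_dim.append(range(start // size, (stop - 1) // size + 1))
    # every combination of per-dimension chunk indices is one chunk to read
    return list(product(*per_dim))


# A (10000, 500) array stored in (1000, 100) chunks has 10 * 5 = 50 chunks;
# reading rows 2500-3499 of the first 50 columns needs only 2 of them.
print(chunks_touched([(2500, 3500), (0, 50)], (1000, 100)))
```

The rest of the array stays on disk, which is why a CArray can exceed physical memory.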

Characteristics of the H5parm

1. Solutions of multiple datasets (e.g. the calibrator and the target field solutions) can be stored in the same H5parm, each in a different solution-set.
2. Solutions of any kind, including customized ones, can be saved (amplitudes, amplitudes_clipped, amplitudes_smoothed, phases, phases_smoothed, clock, TEC...).
3. The time/frequency solution interval can be non-constant along the axis (e.g. solutions at frequencies distributed non-uniformly across the band).
4. Different solution-tables can have different axes. Example axes for some solution-tables:

amp: time, freq, pol, dir, ant
ph: time, freq, pol, dir, ant
clock: time, ant
tec: time, ant, dir
foobar: foo, bar (a generic solution of whatever type)
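Because each axis stores its values explicitly as an array, a non-uniform time or frequency grid needs no special handling. For example, to find the solution closest to a given frequency, search the stored axis values. The following is a minimal sketch using the standard library's `bisect`. It assumes the axis values are sorted in ascending order, and `nearest_index` is a hypothetical helper, not part of any H5parm library.

```python
from bisect import bisect_left


def nearest_index(axis_values, x):
    """Index of the stored axis value closest to x (hypothetical helper).

    Assumes axis_values is sorted ascending; the spacing may be non-uniform.
    Ties go to the lower neighbour.
    """
    if not axis_values:
        raise ValueError("empty axis")
    i = bisect_left(axis_values, x)
    if i == 0:
        return 0
    if i == len(axis_values):
        return len(axis_values) - 1
    before, after = axis_values[i - 1], axis_values[i]
    return i - 1 if x - before <= after - x else i


freq = [120e6, 122e6, 130e6]  # non-uniform frequency axis, in Hz
print(nearest_index(freq, 125e6))  # closest to 122 MHz, index 1
```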

Implementation of the H5parm

A separate HDF5 node is used for each dataset, and the user can specify the solution-set name as the name of this node (solves point 1).
A solution-set contains an arbitrary number of solution-tables (solves point 4). Each solution-table is an HDF5 node containing a CArray of values, a CArray of weights and a number of Arrays storing the axis values. An attribute of the values CArray states the axis names and their order.
The values CArray supports “holes” (solves point 3) by storing NaN where no solution exists. Code using these values must take this into account.
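Any arithmetic on the values has to skip the holes, because a single NaN propagates through sums and means. The following is a minimal stdlib sketch; on whole arrays, `numpy.nanmean` does the same job.

```python
import math


def nanmean(values):
    """Mean of the values, skipping NaN holes; NaN if no value is left."""
    good = [v for v in values if not math.isnan(v)]
    return sum(good) / len(good) if good else math.nan


vals = [1.0, float("nan"), 3.0]  # one hole in the middle
print(sum(vals) / len(vals))     # nan: the hole poisons a naive mean
print(nanmean(vals))             # 2.0
```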

Default values

Standard solution-type names for solution-tables: amplitude, phase, rotation, rotationmeasure, clock, TEC.
Standard names for axes and CArrays:

Name (unit) - Type - Example of array
time (s) - float64 - [4.867e+09, 4.868e+09, 4.869e+09]
freq (Hz) - float64 - [120e6, 122e6, 130e6, ...]
ant - string - ['CS001LBA']
pol - string (2 char) - ['XX', 'XY', 'RR', 'RL']
dir - string (16 char) - ['3C196', 'pointing']
val - float64 - [34.543, 5345.423, 123.3213]
weight (0 = flagged) - float32, from 0 to 1 - [0, 1, 0.9, 0.7, 1, 0]
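The string lengths and the weight range in this table can be checked before writing a file. This is a hypothetical validation sketch and not part of any H5parm library. It treats "(2 char)" and "(16 char)" as maximum lengths, which is an assumption.

```python
# Hypothetical checks derived from the default-values table;
# the "N char" entries are read as maximum string lengths (assumption).
MAX_LEN = {"pol": 2, "dir": 16}


def check_axis(name, values):
    """Return a list of problems found in the values of a default axis."""
    problems = []
    if name in MAX_LEN:
        for v in values:
            if not isinstance(v, str) or len(v) > MAX_LEN[name]:
                problems.append(
                    f"{name}: {v!r} is not a string of at most {MAX_LEN[name]} chars"
                )
    elif name == "weight":
        for w in values:
            # a NaN weight also fails this comparison and is reported
            if not 0.0 <= w <= 1.0:
                problems.append(f"weight: {w!r} outside [0, 1]")
    return problems


print(check_axis("pol", ["XX", "XY", "RR", "RL"]))  # []
print(check_axis("weight", [0, 1, 1.5]))            # one problem reported
```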

Some node names (solution-sets and solution-tables) can be user-defined. If the user does not provide a name, the following defaults are used:

solution-set names: sol### where ### is the smallest available integer between 000 and 999 (e.g. sol000, sol001, sol002...)
solution-table names: soltype###, where ### is the smallest available integer between 000 and 999 (e.g. amplitude000, amplitude001, phase000, rotationmeasure000...)
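The default naming rule (smallest free integer between 000 and 999) can be sketched in a few lines. `default_name` is a hypothetical helper. It assumes the names already present in the parent node are known as a set of strings.

```python
def default_name(prefix, existing):
    """Return prefix + the smallest free three-digit number (000-999).

    Hypothetical helper; `existing` holds the node names already in use.
    """
    for n in range(1000):
        candidate = f"{prefix}{n:03d}"
        if candidate not in existing:
            return candidate
    raise ValueError(f"no free {prefix}### name left")


print(default_name("sol", {"sol000", "sol002"}))  # sol001: fills the gap
print(default_name("amplitude", set()))          # amplitude000
```

Filling the first gap, rather than appending after the highest number, is what "smallest available integer" implies.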

Flags:

All values equal to NaN are considered flagged, even if their weight is non-zero (e.g. when the solver did not converge because of a bad model, although the data are good). All values with weight == 0 are also considered flagged, even if they are not NaN.
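The two flagging rules combine with a logical OR. This minimal sketch applies them per sample. `is_flagged` and `effective_weights` are hypothetical helpers that return weight 0 for every flagged sample.

```python
import math


def is_flagged(value, weight):
    """A sample is flagged if its value is NaN or its weight is 0."""
    return math.isnan(value) or weight == 0


def effective_weights(values, weights):
    """Weights with every flagged sample forced to 0 (hypothetical helper)."""
    return [0.0 if is_flagged(v, w) else w for v, w in zip(values, weights)]


vals = [1.0, float("nan"), 2.0]
weights = [1.0, 0.5, 0.0]  # NaN with non-zero weight, then a zero weight
print(effective_weights(vals, weights))  # [1.0, 0.0, 0.0]
```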

Example of an H5parm file structure

% solution-set name, can be user-defined
root.sol000
type: Node

% list of station names (e.g. CS001LBA) / positions (format?)
root.sol000.antenna
type: Table
columns: name, position

% list of direction names (e.g. 3C196) / coordinates (format?)
root.sol000.source
type: Table
columns: name, coords

% solution-table name, can be user-defined
root.sol000.amplitude000
type: Node
attributes: “type=amplitude”

root.sol000.amplitude000.time
type: Array
shape: N_times

% time_w is a virtual axis, present only for time and freq, to be compatible with parmdb
root.sol000.amplitude000.time_w
type: Array
shape: N_times

root.sol000.amplitude000.freq
type: Array
shape: N_freqs

% all other axes (pol, dir, ant...) are defined like time and freq
root.sol000.amplitude000.vals
type: CArray
attributes: “axes=[time,freq,pol,dir,ant…]”
“default=1”
shape: (N_times, N_freqs, N_pol, N_dir, N_ant...)

root.sol000.amplitude000.weight
type: CArray
attributes: “axes=[time,freq,pol,dir,ant…]”
“default=0”
shape: (N_times, N_freqs, N_pol, N_dir, N_ant...)
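The `axes` attribute ties the axis Arrays to the shape of `vals` and `weight`. Each dimension has as many entries as the corresponding axis Array, in the order the attribute lists them. The following stdlib sketch shows this. It assumes the attribute has already been read as a Python list of names, and `expected_shape` and `index_for` are hypothetical helpers.

```python
def expected_shape(axes_attr, axis_values):
    """Shape that vals/weight should have, following the 'axes' order.

    axes_attr: axis names in storage order, e.g. ["time", "freq", "ant"].
    axis_values: mapping from axis name to the values stored in that axis Array.
    """
    return tuple(len(axis_values[name]) for name in axes_attr)


def index_for(axes_attr, **coords):
    """N-d index into vals/weight built from one named index per axis."""
    return tuple(coords[name] for name in axes_attr)


axes = ["time", "freq", "pol", "dir", "ant"]
axis_values = {
    "time": [4.867e09, 4.868e09, 4.869e09],
    "freq": [120e6, 122e6],
    "pol": ["XX", "YY"],
    "dir": ["3C196"],
    "ant": ["CS001LBA", "CS002LBA"],
}
print(expected_shape(axes, axis_values))                        # (3, 2, 2, 1, 2)
print(index_for(axes, ant=1, time=2, freq=0, pol=1, dir=0))     # (2, 0, 1, 0, 1)
```

Addressing elements by axis name rather than by position keeps reading code correct even when different solution-tables order their axes differently.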

Benchmark

For a parmdb of 37 MB:

H5parm with no compression -> 89 MB

H5parm with maximum compression -> 18 MB

Reading times of compressed and uncompressed H5parms agree within a factor of 2 (the compressed one is slower).

Reading a compressed H5parm with the Python implementation is 2 to 10 times faster than reading the equivalent parmdb.
