Description
Problem: An IDTxl Data object internally stores the data samples as a three-dimensional numpy array with dimensions (n_processes, n_samples, n_repetitions). Currently, there is no way of representing more complex processes with 1) higher dimensionality and 2) meta-information, such as continuous/discrete flags, that needs to be passed through to the estimator layer.
Proposal: Augment the representation of samples in the Data class by replacing the internal numpy array with a numpy.ndarray subclass (sketched after this list) that
- contains additional meta-data such as dimensionality information and continuous/discrete flags for the individual processes
- implements process-wise slicing along the first dimension, i.e. if the data consists of three processes with two dimensions each, data[1:3] should return a container of shape (4, n_samples, n_repetitions) containing the second and third processes
- can easily be transformed into a regular numpy array in the estimator layer
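
Below is a minimal sketch of what such a subclass could look like, following numpy's standard subclassing pattern (`__new__` plus `__array_finalize__`). The names `NewArray`, `process_dimensions`, `continuous` and `to_numpy` are taken from the expected-behaviour example further down; everything else, in particular the restriction to non-negative integer and unit-step slice keys, is an assumption of this sketch rather than a finished design:

```python
import numpy as np


class NewArray(np.ndarray):
    """Sketch of an ndarray subclass carrying per-process meta-data."""

    def __new__(cls, data, process_dimensions=None, continuous=None):
        obj = np.asarray(data).view(cls)  # view on the input buffer, no copy
        obj.process_dimensions = (tuple(process_dimensions)
                                  if process_dimensions is not None
                                  else (1,) * obj.shape[0])
        obj.continuous = (tuple(continuous) if continuous is not None
                          else (True,) * len(obj.process_dimensions))
        return obj

    def __array_finalize__(self, obj):
        # Called for every view/slice; propagate meta-data to the new object.
        if obj is None:
            return
        self.process_dimensions = getattr(obj, 'process_dimensions', None)
        self.continuous = getattr(obj, 'continuous', None)

    def __getitem__(self, key):
        # Process-level indexing: a non-negative int or unit-step slice on
        # the first axis selects whole processes, i.e. contiguous row blocks.
        if isinstance(key, int) and key >= 0:
            key = slice(key, key + 1)
        if isinstance(key, slice) and key.step in (None, 1):
            procs = range(*key.indices(len(self.process_dimensions)))
            starts = np.cumsum((0,) + self.process_dimensions)
            # A contiguous row block, hence a view on the same buffer.
            out = super().__getitem__(slice(starts[procs.start],
                                            starts[procs.stop]))
            out.process_dimensions = tuple(self.process_dimensions[p]
                                           for p in procs)
            out.continuous = tuple(self.continuous[p] for p in procs)
            return out
        # Anything else (e.g. a[:, 1:]) falls through to numpy's indexing,
        # with the meta-data copied unchanged by __array_finalize__.
        return super().__getitem__(key)

    def to_numpy(self):
        # Expose the underlying buffer as a plain ndarray view (no copy).
        return self.view(np.ndarray)


a = NewArray([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]],
             process_dimensions=(1, 2, 1), continuous=(True, False, True))
b = a[1]
print(b.shape, b.process_dimensions, b.continuous)  # (2, 3) (2,) (False,)
print(np.shares_memory(a, b))                       # True: a view, not a copy
```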
Requirements:
- Full backward compatibility with the existing algorithm implementations: if the dimensions of all variables are set to 1, the proposed ndarray subclass should behave like a regular numpy array
- The numpy subclass objects need to be unpacked to regular ndarrays in the estimators to use regular indexing and slicing on the first axis.
- Minimal overhead: The implementation needs to ensure that data is not copied unnecessarily and that memory views are used whenever possible (see the snippet after this list for a quick check)
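
Continuing from the `NewArray` sketch above, the first and third requirement could be checked as follows (hypothetical snippet, not part of the proposal):

```python
# All processes one-dimensional: process-level indexing coincides with
# plain row indexing, so existing algorithms see the familiar behaviour.
raw = np.arange(12).reshape(4, 3)
a = NewArray(raw, process_dimensions=(1, 1, 1, 1), continuous=(True,) * 4)
assert np.array_equal(a[1:3], raw[1:3])

# Unpacking in the estimator layer yields a view on the same buffer, no copy.
assert np.shares_memory(a.to_numpy(), raw)
```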
Example of expected behaviour

```python
a = NewArray([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]],
             process_dimensions=(1, 2, 1), continuous=(True, False, True))

# Process-level indexing: retrieve only the second process, which is two-dimensional
a[1]
--> NewArray([[3, 4, 5], [6, 7, 8]], process_dimensions=(2,), continuous=(False,))

# Process-level slicing
a[:2]
--> NewArray([[0, 1, 2], [3, 4, 5], [6, 7, 8]], process_dimensions=(1, 2), continuous=(True, False))

# Regular slicing along the second (and possibly third) array dimension
a[:, 1:]
--> NewArray([[1, 2], [4, 5], [7, 8], [10, 11]], process_dimensions=(1, 2, 1), continuous=(True, False, True))

# Exposing the underlying numpy array (view) for estimation
a.to_numpy()
--> np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]])
```
Issues/Open Questions
- Numerical operations might return unexpected results when using the overridden __getitem__ operator. Since algorithms are expected to use only slicing/indexing/reordering, but no numerical operations, I suggest that these raise an error unless the object is unpacked to a regular numpy array in the estimation layer (one possible mechanism is sketched below).
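
One possible mechanism, assuming the `NewArray` sketch above: numpy's `__array_ufunc__` protocol lets a subclass intercept all ufunc-based arithmetic, so numerical operations can be made to fail loudly until the object is unpacked:

```python
class NewArray(np.ndarray):
    # ... attributes and indexing as sketched above ...

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Refuse arithmetic on the wrapped type; estimators must unpack first.
        raise TypeError(f"numerical operation '{ufunc.__name__}' is not "
                        f"supported on NewArray; call to_numpy() first")
```

With this in place, `a + 1` raises a TypeError while `a.to_numpy() + 1` works as usual; simply setting `__array_ufunc__ = None` would achieve much the same with numpy's default error message.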
Comments
This suggestion might be split in two: just adding the continuous/discrete meta-information to the arrays, without allowing for multi-dimensional processes, should be easily achievable with a transparent drop-in replacement of the internal numpy arrays and no need for conversion in the estimator layer (see the sketch below).
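
For that smaller first step, something along the lines of the `InfoArray` example in numpy's subclassing documentation might already suffice. `MetaArray` is a hypothetical name for this minimal variant; note that views deliberately keep the full `continuous` tuple rather than subsetting it, which is what makes the replacement transparent:

```python
import numpy as np


class MetaArray(np.ndarray):
    """Regular ndarray plus per-process continuous/discrete flags.
    Indexing semantics are untouched, so it is a drop-in replacement."""

    def __new__(cls, data, continuous=None):
        obj = np.asarray(data).view(cls)
        obj.continuous = (tuple(continuous) if continuous is not None
                          else (True,) * obj.shape[0])
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        # Views and slices inherit the flags unchanged (no subsetting).
        self.continuous = getattr(obj, 'continuous', None)
```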
Note that this proposal is not yet final and is subject to discussion.