Proposal for improvements to the IDTxl Data object #90

@daehrlich

Description

Problem: An IDTxl Data object internally stores the data samples as a three-dimensional numpy array with dimensions (n_processes, n_samples, n_repetitions). Currently, there is no way of representing more complex processes with 1) higher dimensionality and 2) meta-information, such as a continuous/discrete flag, that needs to be passed through to the estimator layer.

Proposal: Augment the representation of samples in the Data class by replacing the internal numpy array with an ndarray subclass that

  • contains additional meta-data such as dimensionality information and continuous/discrete flags for the individual processes
  • implements process-wise slicing along the first dimension, i.e. if the data consists of three processes with two dimensions each, data[1:3] should return a container of shape (4, n_samples, n_repetitions) containing the second and third processes
  • can easily be transformed into a regular numpy array in the estimator layer

Requirements:

  • Full backward compatibility with the existing implementations of the algorithms: if the dimensions of all variables are set to 1, the proposed ndarray subclass should behave like a regular numpy array
  • The numpy subclass objects need to be unpacked to regular ndarrays in the estimators to allow regular indexing and slicing on the first axis.
  • Minimal overhead: the implementation needs to ensure that data is not copied unnecessarily and that memory views are used whenever possible
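
The "minimal overhead" requirement rests on numpy's guarantee that `.view()` and basic slicing return views rather than copies. A quick plain-numpy check (nothing here is IDTxl-specific) of that mechanism:

```python
import numpy as np

# Shaped like the internal Data buffer: (rows, n_samples, n_repetitions).
raw = np.arange(24).reshape(4, 3, 2)
sliced = raw.view(np.ndarray)[1:3]   # a view followed by a basic slice
assert np.shares_memory(raw, sliced)  # no copy was made
sliced[0, 0, 0] = -1                  # writes through to the original buffer
assert raw[1, 0, 0] == -1
```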

Example of expected behaviour

a = NewArray([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]], process_dimensions=(1, 2, 1), continuous=(True, False, True))

# Process-level indexing. This shows how to get only the second process, which is two-dimensional
a[1]
--> NewArray([[3, 4, 5], [6, 7, 8]], process_dimensions=(2,), continuous=(False,))

# Process-level slicing
a[:2]
--> NewArray([[0, 1, 2], [3, 4, 5], [6, 7, 8]], process_dimensions=(1, 2), continuous=(True, False))

# Regular slicing along the second and possibly the third dimension
a[:, 1:]
--> NewArray([[1, 2], [4, 5], [7, 8], [10, 11]], process_dimensions=(1, 2, 1), continuous=(True, False, True))

# Exposing the underlying numpy array (view) for estimation
a.to_numpy()
--> np.ndarray([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]])
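
The behaviour above could be sketched as an ndarray subclass along the following lines. This is an illustration of one possible design, not IDTxl code: the `__getitem__` dispatch only handles plain integers and contiguous (step-1, non-empty) slices, and leaves everything else to regular numpy indexing.

```python
import numpy as np

class NewArray(np.ndarray):
    """Sketch: ndarray with per-process dimensionality and continuity flags."""

    def __new__(cls, data, process_dimensions=None, continuous=None):
        obj = np.asarray(data).view(cls)
        if process_dimensions is None:
            process_dimensions = (1,) * obj.shape[0]
        if continuous is None:
            continuous = (True,) * len(process_dimensions)
        obj.process_dimensions = tuple(process_dimensions)
        obj.continuous = tuple(continuous)
        return obj

    def __array_finalize__(self, obj):
        # Propagate meta-data to views created by regular numpy indexing.
        if obj is None:
            return
        self.process_dimensions = getattr(obj, 'process_dimensions', None)
        self.continuous = getattr(obj, 'continuous', None)

    def __getitem__(self, key):
        # Process-wise indexing: map a process index or contiguous slice
        # onto the underlying rows. Tuple keys (e.g. a[:, 1:]) fall through
        # to regular numpy indexing.
        if isinstance(key, (int, slice)):
            bounds = np.concatenate(([0], np.cumsum(self.process_dimensions)))
            procs = range(len(self.process_dimensions))[key]
            procs = [procs] if isinstance(key, int) else list(procs)
            rows = slice(int(bounds[procs[0]]), int(bounds[procs[-1] + 1]))
            out = super().__getitem__(rows)
            out.process_dimensions = tuple(self.process_dimensions[p] for p in procs)
            out.continuous = tuple(self.continuous[p] for p in procs)
            return out
        return super().__getitem__(key)

    def to_numpy(self):
        # Returns a view (no copy) of the underlying data.
        return self.view(np.ndarray)

a = NewArray([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]],
             process_dimensions=(1, 2, 1), continuous=(True, False, True))
print(a[1].process_dimensions)   # the second, two-dimensional process
print(a[:2].continuous)          # first two processes
print(type(a.to_numpy()))        # plain numpy array for the estimators
```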

Issues/Open Questions

  • Numerical operations might return unexpected results when combined with the overridden __getitem__ operator. Since algorithms are expected to use only slicing/indexing/reordering and no numerical operations, I suggest these raise an error unless the object is unpacked to a regular numpy array in the estimation layer
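
One way to make numerical operations fail loudly (an assumption about how this could be done, not a decided design) is to opt the subclass out of numpy's ufunc protocol by setting `__array_ufunc__ = None`, which makes arithmetic operators raise TypeError until the data is unpacked:

```python
import numpy as np

class GuardedArray(np.ndarray):
    # Opting out of the ufunc protocol: arithmetic operators on this class
    # raise TypeError until the data is unpacked via .view(np.ndarray).
    __array_ufunc__ = None

a = np.arange(6).reshape(2, 3).view(GuardedArray)
try:
    _ = a + 1                # blocked while still wrapped
    blocked = False
except TypeError:
    blocked = True
# After unpacking, numerical operations work as usual.
total = int((a.view(np.ndarray) + 1).sum())
```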

Comments
This suggestion might be split into two: just adding continuous/discrete meta-information to the arrays, without allowing for multi-dimensional processes, should be easily achievable by a transparent plug-in replacement of the internal numpy arrays, with no need for conversion in the estimator layer.
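
A minimal sketch of that metadata-only variant (class and attribute names are illustrative): a transparent ndarray subclass that only carries the flags through views via `__array_finalize__`, leaving all indexing and numerical behaviour untouched.

```python
import numpy as np

class FlaggedArray(np.ndarray):
    """Sketch: ndarray that transparently carries continuous/discrete flags."""

    def __new__(cls, data, continuous=()):
        obj = np.asarray(data).view(cls)
        obj.continuous = tuple(continuous)
        return obj

    def __array_finalize__(self, obj):
        # Runs on every view/slice, so the flags ride along transparently.
        # Note: this minimal version does not re-index the flags when the
        # first axis is sliced.
        self.continuous = getattr(obj, 'continuous', ())

d = FlaggedArray(np.zeros((3, 100, 10)), continuous=(True, False, True))
assert d[:, :50].continuous == (True, False, True)  # survives regular slicing
assert float(d.mean()) == 0.0                       # numerical ops untouched
```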

Note that this proposal is as of yet not final and subject to discussion.
