The Problem
The current interface of the CLI arrays (beyond what is provided directly by .NET) is in my opinion somewhat unfortunate. It is not wrong: it simply follows the pattern of other builtin collections, like list, tuple etc. However, all those other types are 1-dimensional, so their API makes perfect sense for them. At the same time, it does not scale well to more dimensions. This can be seen with memoryview, the only builtin Python type that has any notion of multidimensionality. It too follows the list pattern, and when operations are applied on an N-dimensional (ND) memoryview, it is either implicitly flattened or raises NotImplementedError.
To be more specific, I will use the two cases that the CLI array currently implements that come from the list pattern: the addition and multiplication operators.
With list etc. addition is concatenation:
>>> [1, 2] + [3, 4]
[1, 2, 3, 4]
This is very useful for 1D structures, but does not scale up to higher dimensions: even if operator + were to mean concatenation, along which dimension should it operate? There is no way to pass an extra parameter indicating the dimension to use. It can be assumed that it is the last dimension, but that leaves a big gap in the API for concatenating along an arbitrary dimension.
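To make the ambiguity concrete, here is a small sketch that uses plain nested lists as a stand-in for 2D arrays (the names `a`, `b`, `by_rows` and `by_columns` are only for illustration). A single `+` operator can express only one of the two equally reasonable results:

```python
# Two 2x2 "arrays" represented as nested lists (illustration only).
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

# list + list can only join along the outermost dimension.
by_rows = a + b

# Joining along the other dimension has to be spelled out by hand.
by_columns = [row_a + row_b for row_a, row_b in zip(a, b)]

print(by_rows)     # [[1, 2], [3, 4], [5, 6], [7, 8]]
print(by_columns)  # [[1, 2, 5, 6], [3, 4, 7, 8]]
```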
With list etc. multiplication is repeated concatenation with itself:
>>> [1, 2] * 2
[1, 2, 1, 2]
This is occasionally useful, but again it does not scale up to higher dimensions.
The next one to consider is slicing. Slicing can be seen as a specific way of indexing, except that it retrieves a substructure rather than an individual element. The question is how slicing would work on an ND array (currently it is not supported). The only guidance from Python itself would be the behaviour of memoryview as the only builtin ND structure. However, beyond simple element retrieval, memoryview does not support much of anything.
>>> m = memoryview(b"abcd")
>>> list(m)
[97, 98, 99, 100]
>>> m2 = m.cast('b', (2, 2))
>>> list(m2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NotImplementedError: multi-dimensional sub-views are not implemented
>>> m[0]
97
>>> m2[0, 0]
97
>>> m2[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NotImplementedError: multi-dimensional sub-views are not implemented
>>> list(m[:1])
[97]
>>> list(m2[:1])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NotImplementedError: multi-dimensional sub-views are not implemented
>>> list(m2[:1,0])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: memoryview: invalid slice key
So memoryview does not offer much guidance on what would be a Pythonic API for ND arrays.
The Proposal
But there is another reference point for ND arrays that is by now practically accepted in the Python community as the gold standard: NumPy. Python lacks the batteries to support multidimensional arrays, but in practice NumPy can now be considered a CPython extension; this is one of the reasons it is so hard to support NumPy in IronPython. There is even a special language element in Python introduced specifically to support NumPy's ndarray: ... (or Ellipsis). IronPython may not support NumPy directly (see Ironclad), but it does come with a fundamental data structure that looks a lot like NumPy's ndarray, and that is the CLI array: it is contiguous in memory, its elements are typed, and it supports multiple dimensions.
The proposal is to adopt NumPy's API pattern for IronPython's extended (Pythonized) support for CLI arrays. CLI arrays may not replace a genuine numpy.ndarray (for one thing, they are limited to slightly less than 2 GiB in size, which is a limitation of .NET), but surely they can do a lot. A lot more than they do now. Perhaps even enough to let other packages that use NumPy run on IronPython, e.g. pandas. Also, 2 GiB is still a lot of data; it should be enough to fit an Excel spreadsheet.
There is one aspect that CLI arrays have that NumPy doesn't have: non-zero based arrays. Since this is such a niche feature, I think it is sufficient to limit the operations to arrays of compatible (i.e. the same) base.
Examples
Here are some examples of how the operations are defined on ndarray and would henceforth be applicable to System.Array as well:
Addition is element-wise. When adding arrays with different dimensions, the missing dimensions are completed with a broadcast from lower dimensions.
Multiplication is, like addition, element-wise and follows the same rules.
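The element-wise rule with broadcasting can be sketched in pure Python on nested lists. The helpers `depth` and `elementwise` below are hypothetical and only meant as an illustration: they assume non-empty, rectangular nested lists and cover just the simplest form of broadcasting (the operand with fewer dimensions is reused for each sub-list of the other). NumPy's actual rules are more general, e.g. they also stretch dimensions of length 1.

```python
import operator


def depth(x):
    """Number of nesting levels of a non-empty, rectangular nested list."""
    d = 0
    while isinstance(x, list):
        d += 1
        x = x[0]
    return d


def elementwise(op, a, b):
    """Apply op element-wise, reusing the lower-dimensional operand as needed."""
    da, db = depth(a), depth(b)
    if da == 0 and db == 0:
        return op(a, b)                              # two scalars
    if da > db:
        return [elementwise(op, x, b) for x in a]    # broadcast b over a
    if db > da:
        return [elementwise(op, a, y) for y in b]    # broadcast a over b
    if len(a) != len(b):
        raise ValueError("shapes do not match")
    return [elementwise(op, x, y) for x, y in zip(a, b)]


m = [[1, 2], [3, 4]]
row = [10, 20]

print(elementwise(operator.add, m, row))  # [[11, 22], [13, 24]]
print(elementwise(operator.mul, m, row))  # [[10, 40], [30, 80]]
print(elementwise(operator.mul, m, 2))    # [[2, 4], [6, 8]]
```

These are the same results that `+` and `*` give for the corresponding NumPy arrays.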
Array concatenation is done by a function call. The most versatile is concatenate, but there are a few more that make assumptions about the dimension along which arrays should be concatenated (e.g. vstack, hstack).
Repeated concatenation of an array with itself is done with tile.
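Because concatenation is a function call, the dimension can be passed as an ordinary argument. The following is a rough pure-Python sketch of that idea for 2D nested lists only; the functions below merely borrow the names `concatenate` and `tile` and are not NumPy's implementation, and nothing here is meant as the eventual IronPython signature.

```python
def concatenate(arrays, axis=0):
    """Join 2D nested lists along axis 0 (rows) or axis 1 (columns)."""
    if axis == 0:
        return [list(row) for arr in arrays for row in arr]
    if axis == 1:
        # zip(*arrays) pairs up the i-th row of every input array.
        return [sum((list(r) for r in rows), []) for rows in zip(*arrays)]
    raise ValueError("axis must be 0 or 1 for 2D data")


def tile(array, reps):
    """Repeat a 2D nested list; reps is (row_repeats, column_repeats)."""
    row_reps, col_reps = reps
    return [list(row) * col_reps for _ in range(row_reps) for row in array]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

print(concatenate([a, b], axis=0))  # [[1, 2], [3, 4], [5, 6], [7, 8]]
print(concatenate([a, b], axis=1))  # [[1, 2, 5, 6], [3, 4, 7, 8]]
print(tile(a, (2, 1)))              # [[1, 2], [3, 4], [1, 2], [3, 4]]
print(tile(a, (1, 2)))              # [[1, 2, 1, 2], [3, 4, 3, 4]]
```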
And of course, ndarray has a first-class well-defined slicing interface.
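As an illustration of what such a slicing interface means in two dimensions, here is a minimal sketch. `Grid` is a hypothetical wrapper around a list of rows, invented for this example; it copies data, whereas NumPy slices are views onto the same memory, and it handles only integer and slice keys.

```python
class Grid:
    """Hypothetical 2D wrapper around a list of rows (illustration only)."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __getitem__(self, key):
        # Treat g[i] and g[i:j] as selecting all columns.
        if not isinstance(key, tuple):
            key = (key, slice(None))
        row_key, col_key = key
        if isinstance(row_key, slice):
            picked = [row[col_key] for row in self.rows[row_key]]
            # (slice, slice) gives a sub-grid; (slice, int) gives a column.
            return Grid(picked) if isinstance(col_key, slice) else picked
        # (int, slice) gives a row; (int, int) gives a single element.
        return self.rows[row_key][col_key]


g = Grid([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(g[0, 0])         # 1
print(g[1])            # [4, 5, 6]
print(g[:2, 0])        # [1, 4]
print(g[1:, 1:].rows)  # [[5, 6], [8, 9]]
```

Every one of these keys is either rejected or unimplemented on a 2D memoryview, except for the plain element access.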
The Plan
Luckily, the Python-level CLI array support is currently so limited that there is little standing in the way of the ndarray API pattern. The currently "not implemented" cases can simply be implemented following the ndarray lines, rather than the list lines. The only two problematic cases are the operators + and *. Changing their semantics would be a breaking change for existing IronPython code. Personally, I doubt they are being used a lot (it is currently way easier to use e.g. lists or tuples than CLI arrays); nevertheless, this change has to be managed properly. I see the following steps as a possible way:
1. Implement concatenate and tile but leave operators + and * unchanged.
2. Add a new IronPython option (command-line and as an option to the engine, maybe even as an environment variable) that changes the semantics of the operators. The option would have three values: default, legacy, and ndarray. If it is ndarray, the numpy.ndarray semantics is applied. In the remaining cases the old semantics stays in place.
3. Start generating runtime warnings if the operators in question are being used but the runtime option is not explicitly set to legacy or ndarray (the option's default value remains default, which still behaves like legacy, but with a warning).
4. Do a release, so that the users affected by this have time to adapt and choose what to do in their code.
5. Change the meaning of default to perform the ndarray semantics, but still with a warning (which can be silenced by being explicit about the choice).
6. Do a release (probably a year later).
7. Remove the warning. By then there will have been enough time for the users to adapt. For those coming from IronPython 2, a note can be put in the migration documentation.
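The transition steps above can be sketched as follows. Everything in this block is hypothetical: the names `ArrayOpsMode` and `add_arrays`, the `default_is_ndarray` switch (which stands in for "which release are we in"), and the warning category are assumptions made for illustration, not existing IronPython options. The sketch only models operator + on 1D data and does not model the final step in which the warning is removed.

```python
import warnings
from enum import Enum


class ArrayOpsMode(Enum):
    """Hypothetical values of the proposed runtime option."""
    DEFAULT = "default"
    LEGACY = "legacy"
    NDARRAY = "ndarray"


def add_arrays(a, b, mode=ArrayOpsMode.DEFAULT, default_is_ndarray=False):
    """Sketch of operator + on two 1D arrays under the proposed option.

    default_is_ndarray=False models the first transition release,
    default_is_ndarray=True models the later one.
    """
    if mode is ArrayOpsMode.DEFAULT:
        # No explicit choice was made: warn, then fall back to the
        # behaviour that "default" stands for in this release.
        warnings.warn(
            "array operator semantics not set explicitly; "
            "choose 'legacy' or 'ndarray'",
            FutureWarning,
            stacklevel=2,
        )
        mode = ArrayOpsMode.NDARRAY if default_is_ndarray else ArrayOpsMode.LEGACY
    if mode is ArrayOpsMode.LEGACY:
        return list(a) + list(b)                # list-style concatenation
    if len(a) != len(b):
        raise ValueError("shapes do not match")
    return [x + y for x, y in zip(a, b)]        # ndarray-style element-wise sum


print(add_arrays([1, 2], [3, 4], ArrayOpsMode.LEGACY))   # [1, 2, 3, 4]
print(add_arrays([1, 2], [3, 4], ArrayOpsMode.NDARRAY))  # [4, 6]
```

Being explicit about the mode is what silences the warning, so code that has already picked a side behaves the same across both transition releases.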