diff --git a/item-views/index.md b/item-views/index.md new file mode 100644 index 0000000..48a2ba6 --- /dev/null +++ b/item-views/index.md @@ -0,0 +1,220 @@ +| Proposal ID | CP030 | +|------------------|----------------------------------------------| +| Name | Item Views | +| Date of Creation | 23 June 2021 | +| Target | Vendor extension | +| Current Status | *Under Review* | +| Original author | Steffen Larsen | +| Contributors | Steffen Larsen | + +# Item Views + +A common pattern in SYCL kernels is to only access the element at the work-item +offset, i.e. in SYCL 2020 vector addition can be written as + +```c++ +template +void simple_vadd(const std::array& VA, const std::array& VB, + std::array& VC) { + sycl::queue deviceQueue; + sycl::buffer bufferA{VA}; + sycl::buffer bufferB{VB}; + sycl::buffer bufferC{VC}; + + deviceQueue.submit([&](sycl::handler& cgh) { + sycl::accessor accessorA{bufferA, cgh, sycl::read_only}; + sycl::accessor accessorB{bufferB, cgh, sycl::read_only}; + sycl::accessor accessorC{bufferC, cgh, sycl::write_only}; + + cgh.parallel_for(N, [=](sycl::id<1> wiID) { + accessorC[wiID] = accessorA[wiID] + accessorB[wiID]; + }); + }); +} +``` + +where each work-item accesses exactly one element corresponding to its +identifier. Access patterns like these have a multitude of interesting +properties, but can potentially be hard to detect. + +To alleviate this, we introduce the `item_view` class. This class forces this +access pattern onto kernels by supplying only the element corresponding to +the work-item identifier, disallowing any access outside it in the underlying +buffer. As an example, vector addition using `item_view` could be written as + +```c++ +template +void simple_vadd(const std::array& VA, const std::array& VB, + std::array& VC) { + sycl::queue deviceQueue; + sycl::buffer bufferA{VA}; + sycl::buffer bufferB{VB}; + sycl::buffer bufferC{VC}; + + deviceQueue.submit([&](sycl::handler& cgh) { + sycl::item_view itemViewA{bufferA, cgh, sycl::read_only}; + sycl::item_view itemViewB{bufferB, cgh, sycl::read_only}; + sycl::item_view itemViewC{bufferC, cgh, sycl::write_only}; + + cgh.parallel_for(N, itemViewA, itemViewB, itemViewC, + [=](sycl::id<1> wiID, const T& a, const T& b, T& c) { + c = a + b; + }); + }); +} +``` + +Notice that the use of item-views is similar to reductions in SYCL 2020 and +can be used in tandem, e.g. vector dot product could be written as + +```c++ +template +T simple_vdot(const std::array& VA, const std::array& VB) { + sycl::queue deviceQueue; + sycl::buffer bufferA{VA}; + sycl::buffer bufferB{VB}; + T dotResult; + + deviceQueue.submit([&](sycl::handler& cgh) { + sycl::item_view itemViewA{bufferA, cgh}; + sycl::item_view itemViewB{bufferB, cgh} + + auto dotReduction = reduction(&dotResult, plus<>()); + + cgh.parallel_for(sycl::range(N), itemViewA, itemViewB, dotReduction, + [=](sycl::id<1> wiID, const T &a, const T &b, auto &dotAcc) { + dotAcc += a * b; + }); + }); + + return dotResult; +} +``` + +## Interface + +The following describes the interface of the `item_view` class, additions +required to the descriptions of `parallel_for` allowing `item_view` objects as +parameters, and the behavior of read and write between the `item_view` +parameters of a `parallel_for` and the corresponding kernel parameter. + +### item_view class + +This extensions add the `item_view` class, which is defined as: + +```c++ +namespace sycl { +namespace ext { +namespace codeplay { + +// Requires Dimensions > 0 +template ? access_mode::read + : access_mode::read_write)> +class item_view { +public: + using value_type = DataT; + using reference = std::conditional_t; + using const_reference = const DataT &; + static constexpr int dimensions = Dimensions; + + template + item_view(buffer&bufferRef, + handler &commandGroupHandlerRef, + const property_list &propList = {}); + + /* -- common interface members -- */ + + range get_range() const; + + size_t byte_size() const noexcept; + + size_t size() const noexcept; +}; + +} // codeplay +} // ext +} // sycl +``` + +| Constructor | Description | +|-------------|-------------| +| `template item_view(buffer&bufferRef, handler &commandGroupHandlerRef, const property_list &propList = {})` | Constructs a `item_view` instance with the `bufferRef` provided. This creates a data dependency in `commandGroupHandlerRef` for access to `bufferRef` using access mode `AccessMode`. The optional `property_list` provides properties for the constructed `item_view`. | + +| Member function | Description | +|-----------------|-------------| +| `range get_range() const` | Returns a `range` object representing the size of the `item_view` in terms of number of elements in each dimension. | +| `size_t size() const noexcept` | Returns the total number of ele­ments in the `item_view`. Equal to `get_range()[0] * ... * get_range()[dimensions-1]`. | +| `size_t byte_size() const noexcept` | Returns the size of the `item_view` stor­age in bytes. Equal to `size()*size­of(DataT)`. | + +The `item_view` class follows the common reference semantics specified in +section 4.5.2 of the SYCL 2020 specification (revision 3). + +### parallel_for allowing item_view parameters + +The description of the `handler` class member functions + +```c++ +template +void parallel_for(range numWorkItems, Rest&&... rest); + +template +void parallel_for(nd_range executionRange, Rest&&... rest); +``` + +have part of their description changed from + +> The `rest` parameter pack consists of 0 or more objects created by the +> `reduction` function, followed by a callable. For each object in `rest`, the +> kernel function must take an additional reference parameter corresponding to +> that object’s reducer type, in the same order. + +to + +> The `rest` parameter pack consists of 0 or more occurances of either +> `item_view` objects or objects created by the `reduction` function, followed +> by a callable. For each object in `rest`, the kernel function must take an +> additional reference with the following type: +> * A reference corresponding to that object’s `reducer` type. +> * `const dataT &` if the corresponding object is a `item_view` with access +> mode `access_mode::read`, where `dataT` is the data type template +> parameter of the `item_view` object. +> * `dataT &` if the corresponding object is a `item_view` with access mode +> other than `access_mode::read`, where `dataT` is the data type template +> parameter of the `item_view` object. +> +> These parameters must occur in the same order as the corresponding object +> occur in `rest`. + +Additionally, the following is appended to these descriptions: + +> Throws an `exception` with `errc::invalid` if for any `item_view` object `d` +> in the `rest` parameter pack, `d.dimensions != dimensions` evaluates to +> `true`. +> Throws an `exception` with `errc::invalid` if for any `item_view` object `d` +> in the `rest` parameter pack, for any `i` where `i < d.dimensions`, +> `d.get_range()[0] < {numWorkItems|(executionRange.get_offset() + executionRange.get_global_range())}[0]` +> evaluates to `true`. + +### Data movement + +All `item_view` objects passed to a `parallel_for` call have a corresponding +parameter in the kernel function. For an `item_view` object `d` passed to a +`parallel_for`, the added parameter `p` in the kernel function has the following +data movement behavior: + +* If `AccessMode` of `d` is `access_mode::read` or `access_mode::read_write`, + the value of `p` at the entry of the kernel function will be the same as the + element in the underlying data of `d` at index corresponding to the `id` of + the work-item. +* If `AccessMode` of `d` is `access_mode::write` or `access_mode::read_write`, + the value of `p` will be written to element in the underlying data of `d` at + index corresponding to the `id` of the work-item after the kernel returns. + +### Extension macro + +A SYCL implementation that implements this extension should define the macro +`SYCL_EXT_CODEPLAY_ITEM_VIEW`.