diff --git a/affinity/cpp-20/d0796r2.md b/affinity/cpp-20/d0796r2.md index e0a56d5..c59de8c 100644 --- a/affinity/cpp-20/d0796r2.md +++ b/affinity/cpp-20/d0796r2.md @@ -12,6 +12,8 @@ **Reply to: gordon@codeplay.com** +**Status: Exploratory** + # Changelog ### P0796r2 (RAP) @@ -100,9 +102,10 @@ Some systems give additional user control through explicit binding of threads to In this paper we describe the problem space of affinity for C++, the various challenges which need to be addressed in defining a partitioning and affinity interface for C++, and some suggested solutions. These include: -* How to represent, identify and navigate the topology of execution resources available within a heterogeneous or distributed system. -* How to query and measure the relative affininty between different execution resources within a system. -* How to bind execution and allocation particular execution resource(s). +* How to represent, identify and navigate the topology of execution and memory resources available within a heterogeneous or distributed system. +* How to query and measure the relative affinity between execution and memory resources within a system. +* How to bind execution to particular execution resource(s). +* How to bind allocation to particular memory resource(s). * What kind of and level of interface(s) should be provided by C++ for affinity. Wherever possible, we also evaluate how an affinity-based solution could be scaled to support both distributed and heterogeneous systems. @@ -114,25 +117,39 @@ There are also some additional challenges which we have been investigating but a ### Querying and representing the system topology -The first task in allowing C++ applications to leverage memory locality is to provide the ability to query a *system* for its *resource topology* (commonly represented as a tree or graph) and traverse its *execution resources*. 
+The first task in allowing C++ applications to leverage memory locality is to provide the ability to query a *system* for its *resource topology* (commonly represented as a tree or graph) and traverse its *execution resources* and *memory resources*. The capability of querying underlying *execution resources* of a given *system* is particularly important towards supporting affinity control in C++. The current proposal for executors [[22]][p0443r4] leaves the *execution resource* largely unspecified. This is intentional: *execution resources* will vary greatly between one implementation and another, and it is out of the scope of the current executors proposal to define those. There is current work [[23]][p0737r0] on extending the executors proposal to describe a typical interface for an *execution context*. In this paper a typical *execution context* is defined with an interface for construction and comparison, and for retrieving an *executor*, waiting on submitted work to complete and querying the underlying *execution resource*. Extending the executors interface to provide topology information can serve as a basis for providing a unified interface to expose affinity. This interface cannot mandate a specific architectural definition, and must be generic enough that future architectural evolutions can still be expressed. -Two important considerations when defining a unified interface for querying the *resource topology* of a *system*, are (a) what level of abstraction such an interface should have, and (b) at what granularity it should describe the topology's *execution resources*. As both the level of abstraction of an *execution resource* and the granularity that it is described in will vary greatly from one implementation to another, it’s important for the interface to be generic enough to support any level of abstraction. 
To achieve this we propose a generic hierarchical structure of *execution resources*, each *execution resource* being composed of other *execution resources* recursively. Each *execution resource* within this hierarchy can be used to place memory (i.e., allocate memory within the *execution resource’s* memory region), place execution (i.e. bind an execution to an *execution resource’s execution agents*), or both. +Two important considerations when defining a unified interface for querying the *resource topology* of a *system*, are (a) what level of abstraction such an interface should have, and (b) at what granularity it should describe the topology's *execution resources* and *memory resources*. As both the level of abstraction of resources and the granularity at which they are described will vary greatly from one implementation to another, it’s important for the interface to be generic enough to support any level of abstraction. To achieve this we propose generic hierarchical structures for *execution resources* and *memory resources*, where each *resource* is composed of other *resources* recursively. -For example, a NUMA system will likely have a hierarchy of nodes, each capable of placing memory and placing agents. A system with both CPUs and GPUs (programmable graphics processing units) may have GPU local memory regions capable of placing memory, but not capable of placing agents. +Each *execution resource* within its hierarchy can be used to place execution (i.e. bind an execution to an *execution resource*). +Each *memory resource* within its hierarchy can be used to place memory (i.e., allocate memory within the *memory resource*). +For example, a NUMA system will likely have a hierarchy of nodes, each capable of placing memory and placing execution. A CPU + GPU system may have GPU local memory regions capable of placing memory, but not capable of placing execution. -Nowadays, there are various APIs and libraries that enable this functionality. 
One of the most commonly used is [Portable Hardware Locality (hwloc)][hwloc]. Hwloc presents the hardware as a tree, where the root node represents the whole machine and subsequent levels represent different partitions depending on different hardware characteristics. The picture below shows the output of the hwloc visualization tool (lstopo) on a 2-socket Xeon E5300 server. Note that each socket is represented by a package in the graph. Each socket contains its own cache memories, but both share the same NUMA memory region. Note also that different I/O units are visible underneath. Placement of these I/O units with respect to memory and threads can be critical to performance. The ability to place threads and/or allocate memory appropriately on the different components of this system is an important part of the process of application development, especially as hardware architectures get more complex. The documentation of lstopo [[21]][lstopo] shows more interesting examples of topologies that appear on today's systems. +Nowadays, there are various APIs and libraries that enable this functionality. One of the most commonly used is the [Portable Hardware Locality (hwloc)][hwloc]. Hwloc presents the execution and memory hardware as a single tree, where the root node represents the whole machine and subsequent levels represent different partitions depending on different hardware characteristics. The picture below shows the output of the hwloc visualization tool (lstopo) on a 2-socket Xeon E5300 server. Note that each socket is represented by a package in the graph. Each socket contains its own cache memories, but both share the same NUMA memory region. Note also that different I/O units are visible underneath: Placement of these units with respect to memory and threads can be critical to performance. 
The ability to place threads and/or allocate memory appropriately on the different components of this system is an important part of the process of application development, especially as hardware architectures get more complex. The documentation of lstopo [[21]][lstopo] shows more interesting examples of topologies that appear on today's systems. ![alt text](hwloc-topology.png "Hwloc topology") -The interface of `thread_execution_resource_t` proposed in the execution context proposal [[23]][p0737r0] proposes a hierarchical approach where there is a root resource and each resource has a number of child resources. However, systems are becoming increasingly non-hierarchical and a traditional tree-based representation of a *system’s resource topology* may not suffice any more [[24]][exposing-locality]. The HSA standard solves this problem by allowing a node in the topology to have multiple parent nodes [19]. +The interface of `thread_execution_resource` proposed in the execution context proposal [[23]][p0737r0] takes a hierarchical approach where there is a root resource and each resource has a number of child resources. + +In some heterogeneous systems, execution and memory resources are not naturally represented by a single tree [[24]][exposing-locality]. The HSA standard solves this problem by allowing a node in the topology to have multiple parent nodes [19]. -The interface for querying the *resource topology* of a *system* must be flexible enough to allow querying all *execution resources* available under an *execution context*, querying the *execution resources* available to the entire system, and constructing an *execution context* for a particular *execution resource*. This is important, as many standards such as OpenCL [[6]][opencl-2-2] and HSA [[7]][hsa] require the ability to query the *resource topology* available in a *system* before constructing an *execution context* for executing work. 
+The interface for querying the *resource topology* of a *system* +must be flexible enough to allow +querying the *execution resources* and *memory resources* +available to the entire system, +affinity between an *execution resource* and *memory resource*, +querying the *execution resource* associated with an *execution context*, and +constructing an *execution context* for a particular *execution resource*. +This is important, as many standards such as OpenCL [[6]][opencl-2-2] +and HSA [[7]][hsa] require the ability to query the *resource topology* +available in a *system* before constructing an *execution context* +for executing work. > For example, an implementation may provide an execution context for a particular execution resource such as a static thread pool or a GPU context for a particular GPU device, or an implementation may provide a more generic execution context which can be constructed from a number of CPU and GPU devices queryable through the system resource topology. -### Topology discovery & fault tolerance +### Dynamic resource discovery & fault tolerance: currently out of scope In traditional single-CPU systems, users may reason about the execution resources with standard constructs such as `std::thread`, `std::this_thread` and `thread_local`. This is because the C++ machine model requires that a system have **at least one thread of execution, some memory, and some I/O capabilities**. Thus, for these systems, users may make some assumptions about the system resource topology as part of the language and its supporting standard library. For example, one may always ask for the available hardware concurrency, since there is always at least one thread, and one may always use thread-local storage. 
@@ -162,7 +179,7 @@ The initial solution should target systems with a single addressable memory regi ### Querying the relative affinity of partitions -In order to make decisions about where to place execution or allocate memory in a given *system’s resource topology*, it is important to understand the concept of affinity between different *execution resources*. This is usually expressed in terms of latency between two resources. Distance does not need to be symmetric in all architectures. +In order to make decisions about where to place execution or allocate memory in a given *system’s resource topology*, it is important to understand the concept of affinity between an *execution resource* and a *memory resource*. This is usually expressed in terms of latency between these resources. Distance does not need to be symmetric in all architectures. The relative position of two components in the topology does not necessarily indicate their affinity. For example, two cores from two different CPU sockets may have the same latency to access the same NUMA memory node. @@ -174,99 +191,96 @@ This feature could be easily scaled to heterogeneous and distributed systems, as In this paper we propose an interface for querying and representing the execution resources within a system, queurying the relative affinity metric between those execution resources, and then using those execution resources to allocate memory and execute work with affinity to the underlying hardware. The interface described in this paper builds on the existing interface for executors and execution contexts defined in the executors proposal [[22]][p0443r4]. -### Execution resources +### Terminology -An `execution_resource` is a lightweight structure which acts as an identifier to particular piece of hardware within a system. It can be queried for whether it can allocate memory via `can_place_memory`, whether it can execute work via `can_place_agents`, and for its name via `name`. 
An `execution_resource` can also represent other `execution_resource`s. We call these *members of* that `execution_resource`, and can be queried via `resources`. Additionally the `execution_resource` which another is a *member of* can be queried vis `member_of`. An `execution_resource` can also be queried for the concurrency it can provide, the total number of *threads of execution* supported by that *execution_resource*, and all resources it represents. +An **execution agent** executes work, typically implemented by a *callable*, +on an **execution resource** of a given **execution architecture**. +An **execution context** manages a set of execution agents on an +execution resource. +An **executor** submits work to an execution context. +More than one executor may submit work to an execution context. +More than one execution context may manage execution agents +on an execution resource. -> [*Note:* Note that an execution resource is not limited to resources which execute work, but also a general resource where no execution can take place but memory can be allocated such as off-chip memory. *--end note*] +> [*Note:* The execution context terminology used here +and in the Networking TS [[33]][networking-ts] deviates from the +traditional *context of execution* usage that refers +to the state of a single executing callable; *e.g.*, +program counter, registers, stack frame. *--end note*] -> [*Note:* The intention is that the actual implementation details of a resource topology are described in an execution context when required. This allows the execution resource objects to be lightweight objects that serve as identifiers that are only referenced. *--end note*] +The **concurrency** of an execution resource is an upper bound of the +number of execution agents that could concurrently make forward progress +on that execution resource. 
+It is guaranteed that no more than **concurrency** execution agents +could make concurrent forward progress; it is not guaranteed that +**concurrency** execution agents will ever concurrently make forward progress. + +The **affinity** between an execution resource and memory resource +is an upper bound of the uncontended bandwidth or latency +between an execution agent running on the execution resource and +memory allocated from the memory resource. -### System topology -The system topology is made up of a number of system-level `execution_resource`s, which can be queried through `this_system::get_resources` which returns a `std::vector`. A run-time library may initialize the `execution_resource`s available within the system dynamically. However, this must be done before `main` is called, given that after that point, the system topology may not change. +### System topology -Below *(Listing 2)* is an example of iterating over the system-level resources and printing out their capabilities. +A *system* includes execution resources and memory resources. +Execution resources are organized hierarchically. +A particular execution resource may be partitioned into +a collection of execution resources referred to as *members of* +that execution resource. +The partitioning of execution resources implies a locality relationship; +*e.g.*, if `{{A,B},{C,D}}` is a hierarchical partitioning then +`A` is more local to `B` than it is to `C` or `D`. + +An execution resource has one or more memory resources +which can allocate memory accessible to that execution resource. +When an execution resource hierarchy is traversed the associated +memory resources have equal or greater affinity than memory resources +associated with the coarser execution resource. 
```cpp -for (auto res : execution::this_system::get_resources()) { - std::cout << res.name() `\n`; - std::cout << res.can_place_memory() << `\n`; - std::cout << res.can_place_agents() << `\n`; - std::cout << res.concurrency() << `\n`; +size_t count = 0 ; +for (auto member : execution::this_system::execution_resource()) { + count += member.concurrency(); + std::cout << member.name() << '\n'; + std::cout << member.concurrency() << '\n'; } +assert( count == execution::this_system::execution_resource().concurrency() ); ``` *Listing 2: Example of querying all the system level execution resources* -### Current resource - -The `execution_resource` which underlies the current thread of execution can be queried through `this_thread::get_resource`. +> [*Note:* The intention is that the actual implementation details of a resource topology are described in an execution context when required. This allows the execution resource objects to be lightweight objects that serve as identifiers that are only referenced. *--end note*] ### Querying relative affinity -The `affinity_query` class template provides an abstraction for a relative affinity value between two `execution_resource`s. This value depends on a particular `affinity_operation` and `affinity_metric`. As a result, the `affinity_query` is templated on `affinity_operation` and `affinity_metric`, and is constructed from two `execution_resource`s. An `affinity_query` is not meant to be meaningful on its own. Instead, users are meant to compare two queries with comparison operators, in order to get a relative magnitude of affinity. If necessary, the value of an `affinity_query` can also be queried through `native_affinity`, though the return value of this is implementation defined. +The `affinity_query` class template provides an abstraction for a relative affinity value between an execution resource and a memory resource, derived from a particular `affinity_operation` and `affinity_metric`. 
The `affinity_query` is templated by `affinity_operation` and `affinity_metric` and is constructed from an execution resource and memory resource. An `affinity_query` does not mean much on its own; instead, a relative magnitude of affinity can be queried by using comparison operators. If necessary, the value of an `affinity_query` can also be queried through `native_affinity`, though the return value of this is implementation defined. -Below *(listing 3)* is an example of how to query the relative affinity between two `execution_resource`s. +Below *(Listing 3)* is an example of how you can compare the relative affinity of two execution resources to the same memory resource. ```cpp -auto systemLevelResources = execution::this_system::get_resources(); -auto memberResources = systemLevelResources.resources(); +auto exec0 = *execution::this_system::execution_resource().begin(); +auto exec1 = *++execution::this_system::execution_resource().begin(); +auto mem0 = *exec0.memory_resources().begin(); -auto relativeLatency01 = execution::affinity_query(memberResources[0], memberResources[1]); +auto relativeLatency0 = execution::affinity_query(exec0,mem0); -auto relativeLatency02 = execution::affinity_query(memberResources[0], memberResources[2]); +auto relativeLatency1 = execution::affinity_query(exec1,mem0); -auto relativeLatency = relativeLatency01 > relativeLatency02; +auto relativeLatency = relativeLatency0 > relativeLatency1; ``` *Listing 3: Example of querying affinity between two `execution_resource`s.* > [*Note:* This interface for querying relative affinity is a very low-level interface designed to be abstracted by libraries and later affinity policies. *--end note*] -### Execution context - -The `execution_context` class provides an abstraction for managing a number of lightweight execution agents executing work on an `execution_resource` and any `execution_resource`s encapsulated by it. 
An `execution_context` can then provide an executor for executing work and an allocator or polymorphic memory resource for allocating memory. The `execution_context` is constructed with an `execution_resource`. Then, the `execution_context` may execute work or allocate memory for that `execution_resource` and an `execution_resource` that it represents. - -Below *(Listing 4)* is an example of how this extended interface could be used to construct an *execution context* from an *execution resource* which is retrieved from the *system’s resource topology*. Once an *execution context* is constructed it can then still be queried for its *execution resource*, and that *execution resource* can be further partitioned. - -```cpp -auto &resources = execution::this_system::get_resources(); - -execution::execution_context execContext(resources[0]); - -auto &systemLevelResource = execContext.resource(); - -// resource[0] should be equal to execResource - -for (auto res : systemLevelResource.resources()) { - std::cout << res.name() << `\n`; -} -``` -*Listing 4: Example of constructing an execution context from an execution resource* - ### Binding execution and allocation to resources -When creating an `execution_context` from a given `execution_resource`, the executors and allocators associated with it are bound to that `execution_resource`. For example, when creating an `execution_resource` from a CPU socket resource, all executors associated with the given socket will spawn execution agents with affinity to the socket partition of the system *(Listing 5)*. 
- -```cpp -auto cList = std::execution::this_system::get_resources(); -// FindASocketResource is a user-defined function that finds a -// resource that is a CPU socket in the given resource list -auto& socket = findASocketResource(cList); -execution_contextC{socket} // Associated with the socket -auto executor = eC.executor(); // By transitivity, associated with the socket too -auto socketAllocator = eC.allocator(); // Retrieve an allocator to the closest memory node -std::vector v1(100, socketAllocator); -std::generate(par.on(executor), std::begin(v1), std::end(v1), std::rand); -``` -*Listing 5: Example of allocating with affinity to an execution resource* - -The construction of an `execution_context` on a component implies affinity (where possible) to the given resource. This guarantees that all executors created from that `execution_context` can access the resources and the internal data structures requires to guarantee the placement of the processor. +An execution context, such as a thread pool, may be bound to an execution resource. +Binding enables multiple execution contexts to be created and bound to disjoint +execution resources so that their respective execution agents do not compete for +their respective execution resources. -Only developers that care about resource placement need to care about obtaining executors and allocations from the correct `execution_context` object. Existing code for vectors and STL (including the Parallel STL interface) remains unaffected. - -If a particular policy or algorithm requires to access placement information, the resources associated with the passed executor can be retrieved via the link to the `execution_context`. 
## Header `` synopsis @@ -274,54 +288,37 @@ If a particular policy or algorithm requires to access placement information, th namespace experimental { namespace execution { - /* Execution resource */ + /* Execution resource capable of executing std::thread */ - class execution_resource { + class thread_execution_resource { public: + using iterator = /* implementation defined, dereferences to thread_execution_resource */ ; - execution_resource() = delete; - execution_resource(const execution_resource &); - execution_resource(execution_resource &&); - execution_resource &operator=(const execution_resource &); - execution_resource &operator=(execution_resource &&); - ~execution_resource(); + ~thread_execution_resource(); + thread_execution_resource() = delete; + thread_execution_resource(const thread_execution_resource &) = delete ; + thread_execution_resource(thread_execution_resource &&) = delete ; + thread_execution_resource &operator=(const thread_execution_resource &) = delete ; + thread_execution_resource &operator=(thread_execution_resource &&) = delete ; size_t concurrency() const noexcept; - std::vector resources() const noexcept; + iterator begin() const noexcept ; + iterator end() const noexcept ; const execution_resource member_of() const noexcept; - std::string name() const noexcept; - - bool can_place_memory() const noexcept; - bool can_place_agent() const noexcept; - - }; - - /* Execution context */ - - class execution_context { - public: - - using executor_type = see-below; - - using pmr_memory_resource_type = see-below; - - using allocator_type = see-below; - - execution_context(const execution_resource &) noexcept; + struct memory_resources_t { + using iterator = /* implementation defined, dereference to std::pmr::memory_resource */ + iterator begin() const noexcept ; + iterator end() const noexcept ; + }; - ~execution_context(); + memory_resources_t const & memory_resources() const noexcept ; - const execution_resource &resource() const noexcept; - - 
executor_type executor() const; - - pmr_memory_resource_type &memory_resource() const; - - allocator_type allocator() const; + std::string name() const noexcept; + bool can_bind() const noexcept; }; /* Affinity query */ @@ -334,9 +331,9 @@ If a particular policy or algorithm requires to access placement information, th public: using native_affinity_type = see-below; - using error_type = see-below + using error_type = see-below; - affinity_query(execution_resource &&, execution_resource &&) noexcept; + affinity_query(execution_resource & const , std::pmr::memory_resource & const ) noexcept; ~affinity_query(); @@ -348,7 +345,6 @@ If a particular policy or algorithm requires to access placement information, th friend expected operator>(const affinity_query&, const affinity_query&); friend expected operator<=(const affinity_query&, const affinity_query&); friend expected operator>=(const affinity_query&, const affinity_query&); - }; } // execution @@ -356,7 +352,7 @@ If a particular policy or algorithm requires to access placement information, th /* This system */ namespace this_system { - std::vector resources() noexcept; + thread_execution_resource() noexcept ; } /* This thread */ @@ -370,108 +366,36 @@ If a particular policy or algorithm requires to access placement information, th *Listing 6: Header synopsis* -## Class `execution_resource` +## Class `thread_execution_resource` -The `execution_resource` class provides an abstraction over a system's hardware, that can allocate memory and/or execute lightweight execution agents. An `execution_resource` can represent further `execution_resource`s. We say that these `execution_resource`s are *members of* this `execution_resource`. - -> [*Note:* The `execution_resource` is required to be implemented such that the underlying software abstraction is initialized when the `execution_resource` is constructed, maintained through reference counting, and cleaned up on destruction of the final reference. 
*--end note*] - -### `execution_resource` constructors - - execution_resource(); - -> [*Note:* An implementation of `execution_resource` is permitted to provide non-public constructors to allow other objects to construct them. *--end note*] - -### `execution_resource` assignment - - execution_resource(const execution_resource &); - execution_resource(execution_resource &&); - execution_resource &operator=(const execution_resource &); - execution_resource &operator=(execution_resource &&); - -### `execution_resource` destructor - - ~execution_resource(); +The `thread_execution_resource` class provides an interface to a system's hardware +capable of executing `std::thread`s. ### `execution_resource` operations size_t concurrency() const noexcept; -*Returns:* The total concurrency available to this resource. More specifically, the number of *threads of execution* collectively available to this `execution_resource` and any resources which are *members of*, recursively. +*Returns:* The upper bound concurrency available to this resource; +*i.e.*, the number of execution agents that could possibly concurrently make forward progress +when executing on this execution resource. - std::vector resources() const noexcept; + iterator begin() const noexcept; + iterator end() const noexcept; -*Returns:* All `execution_resource`s which are *members of* this resource. +*Returns:* Range of member execution resources. - const execution_resource &member_of() const noexcept; + const thread_execution_resource &member_of() const noexcept; -*Returns:* The `execution_resource` which this resource is a *member of*. +*Returns:* The `thread_execution_resource` which this resource is a *member of*. std::string name() const noexcept; *Returns:* An implementation defined string. - bool can_place_memory() const noexcept; - -*Returns:* If this resource is capable of allocating memory with affinity, 'true'. 
- - bool can_place_agent() const noexcept; - -*Returns:* If this resource is capable of execute with affinity, 'true'. - -## Class `execution_context` - -The `execution_context` class provides an abstraction for managing a number of lightweight execution agents executing work on an `execution_resource` and any `execution_resource`s encapsulated by it. The `execution_resource` which an `execution_context` encapsulates is refered to as the *contained resource*. - -### `execution_context` types - - using executor_type = see-below; - -*Requires:* `executor_type` is an implementation defined class which satifies the general executor requires, as specified by P0443r5. - - using pmr_memory_resource_type = see-below; - -*Requires:* `pmr_memory_resource_type` is an implementation defined class which inherits from `std::pmr::memory_resource`. - - using allocator_type = see-below; - -*Requires:* `allocator_type` is an implementation defined allocator class. + bool can_bind() const noexcept; -### `execution_context` constructors +*Returns:* True if this resource is capable of binding `std::thread`s. - execution_context(const execution_resource &) noexcept; - -*Effects:* Constructs an `execution_context` with the provided resource as the *contained resource*. - -### `execution_context` destructor - - ~execution_context(); - -*Effects:* May or may not block to wait any work being executed on the *contained resource*. - -### `execution_context` operators - - const execution_resource &resource() const noexcept; - -*Returns:* A const-reference to the *contained resource*. - - executor_type executor() noexcept; - -*Returns:* An executor of type `executor_type` capable of executing work with affinity to the *contained resource*. - -*Throws:* An exception `!this->resource().can_place_agents()`. 
- pmr::memory_resource &memory_resource() noexcept; - -*Returns:* A reference to a polymorphic memory resource of type `pmr_memory_resource_type` capable of allocating with affinity to the *contained resource*. - -*Throws:* If `!this->resource().can_place_memory()`. - - allocator_type allocator() const; - -*Returns:* An allocator of type `allocator_type` capable of allocating with affinity to the *contained resource*. - -*Throws:* If `!this->resource().can_place_memory()`. ## Class template `affinity_query` @@ -489,7 +413,7 @@ The `affinity_query` class template provides an abstraction for a relative affin ### `affinity_query` constructors - affinity_query(const execution_resource &, const execution_resource &) noexcept; + affinity_query(const execution_resource &, const std::pmr::memory_resource &) noexcept; ### `affinity_query` destructor @@ -520,35 +444,32 @@ The `affinity_query` class template provides an abstraction for a relative affin ## Free functions -### `this_system::get_resources` +The free function `this_system::execution_resource()` is provided for retrieving the `execution_resource`s which encapsulate the hardware platforms available within the system; these are referred to as the *system level resources*. -The free function `this_system::get_resources` is provided for retrieving the `execution_resource`s which encapsulate the hardware platforms available within the system. We refer to these resources as the *system level resources*. + thread_execution_resource this_system::execution_resource() noexcept ; - std::vector resources() noexcept; +*Returns:* Execution resource encapsulating all resources on which the program is permitted to execute a `std::thread`. -*Returns:* An `std::vector` containing all *system level resources*. +### `this_thread::get_execution_resource` -*Requires:* If `this_system::get_resources().size() > 0`, `this_system::get_resources()[0]` be the `execution_resource` use by `std::thread`. 
The value returned by `this_system::get_resources()` be the same at any point after the invocation of `main`.
+The free function `this_thread::get_execution_resource` is provided for retrieving the `thread_execution_resource` underlying the current thread of execution.

-> [*Note:* Returning a `std::vector` allows users to potentially manipulate the container of `execution_resource`s after it is returned. We may want to replace this at a later date with an alternative type which is more restrictive, such as a range or span. *--end note*]
+    std::experimental::execution::thread_execution_resource get_thread_resource() noexcept;

-### `this_thread::get_resource`
-
-The free function `this_thread::get_resource` is provided for retrieving the `execution_resource` underlying the current thread of execution.
-
-    std::experimental::execution::execution_resource get_resource() noexcept;
-
-*Returns:* The `execution_resource` underlying the current thread of execution.
+*Returns:* The `thread_execution_resource` underlying the current thread of execution.

 # Future Work

 ## Migrating data from memory allocated in one partition to another

-In some cases, it is better for good performance to bind a memory allocation to a memory region for the duration of a task's execution. However, in other cases, it is better to be able to migrate data from one memory region to another. This is outside the scope of this paper, though we would like to investigate this in a future paper.
+In some cases for performance it is important to *bind* or *rebind*
+allocated memory to a narrower scope than it was allocated.
+For example, memory allocated at the system level is bound to NUMA region 0
+and subsequently rebound to NUMA region 1.

 | Straw Poll |
 |------------|
-| Should the interface provide a way of migrating data between partitions? |
+| Should the interface provide a way of binding or rebinding data to partitions?
|

 ## Defining memory placement algorithms or policies

@@ -570,11 +491,11 @@ We may wish to mirror the design of the executors proposal and have a generic qu

 ## Dynamic topology discovery

-The current proposal requires that all `execution_resource`s are initialized before `main` is called. This therefore does not permit an `execution_resource` to become available or go off-line at run time. We may wish to support this in the future, however this is outside of the scope of this paper at the moment.
+The current proposal requires that all `thread_execution_resource`s are defined before `main` is called. This therefore does not permit a `thread_execution_resource` to become available or go off-line at run time. We may wish to support this in the future, however this is outside of the scope of this paper at the moment.

 | Straw Poll |
 |------------|
-| Should we support dynamically adding and removing `execution_resource`s at run time? |
+| Should we support dynamically adding and removing `thread_execution_resource`s at run time? |

 # Acknowledgements

@@ -676,3 +597,7 @@ Thanks to Christopher Di Bella, Toomas Remmelg and Morris Hafner for their revie

 [madness-journal]: http://dx.doi.org/10.1137/15M1026171
 [[32]][madness-journal] MADNESS: A Multiresolution, Adaptive Numerical Environment for Scientific Simulation
+
+[networking-ts]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/n4734.pdf
+[[33]][networking-ts] N4734 : Working Draft, C++ Extensions for Networking
+