
Commit 2289266: add a parallel manifesto

1 parent a8c2eb1 commit 2289266

File tree

2 files changed: +99 -0 lines changed

docs/src/docs/internals/parallel.md (+62 lines)

@@ -0,0 +1,62 @@
# Parallel Processing in Finch

## Modelling the Architecture

Finch uses a simple, hierarchical representation of devices and tasks to model
different kinds of parallel processing. An [`AbstractDevice`](@ref) is a
physical or virtual device on which we can execute tasks, each of which may be
represented by an [`AbstractTask`](@ref).

```@docs
AbstractTask
AbstractDevice
```
The current task in a compilation context can be queried with
[`get_task`](@ref). Each device has a set of numbered child
tasks, and each task has a parent task.

```@docs
get_num_tasks
get_task_num
get_device
get_parent_task
```
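As a concrete illustration of this interface, here is a minimal sketch. `SerialDevice`, `SerialTask`, and the convention that the root task's parent is `nothing` are assumptions made for the example, not types or conventions defined by Finch:

```julia
# Hypothetical sketch of the device/task hierarchy (not Finch's actual types).
abstract type AbstractDevice end
abstract type AbstractTask end

# A device with a fixed number of numbered child tasks.
struct SerialDevice <: AbstractDevice
    n::Int
end

# A task knows its number, its device, and the task that spawned it.
struct SerialTask <: AbstractTask
    num::Int
    device::SerialDevice
    parent::Union{SerialTask, Nothing}
end

get_num_tasks(dev::SerialDevice) = dev.n
get_task_num(task::SerialTask) = task.num
get_device(task::SerialTask) = task.device
get_parent_task(task::SerialTask) = task.parent

dev = SerialDevice(4)
root = SerialTask(1, dev, nothing)
child = SerialTask(2, dev, root)
get_num_tasks(get_device(child))      # 4
get_task_num(get_parent_task(child))  # 1
```

Here the query functions simply read stored fields; a real device would also manage execution resources.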
## Data Movement

Before entering a parallel loop, a tensor may reside on a single task,
represent a single view of data distributed across multiple tasks, or consist
of multiple separate tensors, each local to its own task. A tensor's data must
be resident in the current task before any operation on that tensor can be
processed, whether a loop over its indices, an access, or a `declare`,
`freeze`, or `thaw`. Upon entering a parallel loop, we must therefore transfer
the tensor to the tasks where it is needed. Upon exiting the parallel loop, we
may need to combine the data from multiple tasks back into a single tensor.

There are two cases, depending on whether the tensor is declared outside the
parallel loop or is a temporary tensor declared within it.

If the tensor is a temporary tensor declared within the parallel loop, we call
`bcast` to broadcast the tensor to all tasks.

If the tensor is declared outside the parallel loop, we call `scatter` to
send it to the tasks where it is needed. Note that if the tensor is in `read`
mode, `scatter` may simply `bcast` the entire tensor to all tasks, and if the
device has global memory, `scatter` may be a no-op. When the parallel loop is
exited, we call `gather` to reconcile the data from multiple tasks back into a
single tensor.
Each of these operations begins with a `_send` variant on the originating task
and finishes with a `_recv` variant on the receiving task.

```@docs
bcast
bcast_send
bcast_recv
scatter
scatter_send
scatter_recv
gather
gather_send
gather_recv
```
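To make this lifecycle concrete, the following toy simulation splits a vector across tasks with a `scatter`-style send/receive pair, lets each task update its local chunk, and reassembles the result with a `gather`-style pair. The strided splitting scheme and all function bodies here are illustrative stand-ins, not Finch's implementations:

```julia
# Toy simulation of the scatter/gather lifecycle (not Finch's implementation).

# scatter_send: the owning task cuts the tensor into per-task chunks.
scatter_send(x::Vector, ntasks) =
    [x[i:ntasks:end] for i in 1:ntasks]  # strided split into ntasks chunks

# scatter_recv: each task receives its local chunk.
scatter_recv(chunks, task_num) = copy(chunks[task_num])

# gather_send: each task ships its local chunk back to the owner.
gather_send(chunk) = chunk

# gather_recv: the owner reassembles the strided pieces into one tensor.
function gather_recv(chunks, n)
    out = similar(chunks[1], n)
    for (i, chunk) in enumerate(chunks)
        out[i:length(chunks):end] = chunk
    end
    out
end

x = collect(1:8)
ntasks = 2
chunks = scatter_send(x, ntasks)
locals = [scatter_recv(chunks, t) for t in 1:ntasks]
foreach(c -> c .*= 2, locals)  # each task doubles its local data "in parallel"
y = gather_recv(map(gather_send, locals), length(x))
# y == [2, 4, 6, 8, 10, 12, 14, 16]
```

For a `read`-mode tensor, `scatter_send` could instead broadcast the whole vector to every task, and the `gather` step would be unnecessary.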

src/architecture.jl (+37 lines)

@@ -1,8 +1,45 @@
1+
"""
2+
AbstractDevice
3+
4+
A datatype representing a device on which tasks can be executed.
5+
"""
16
abstract type AbstractDevice end
27
abstract type AbstractVirtualDevice end
8+
9+
"""
10+
AbstractTask
11+
12+
An individual processing unit on a device, responsible for running code.
13+
"""
314
abstract type AbstractTask end
415
abstract type AbstractVirtualTask end
516

17+
"""
18+
get_num_tasks(dev::AbstractDevice)
19+
20+
Return the number of tasks on the device dev.
21+
"""
22+
function get_num_tasks end
23+
"""
24+
get_task_num(task::AbstractTask)
25+
26+
Return the task number of `task`.
27+
"""
28+
function get_task_num end
29+
"""
30+
get_device(task::AbstractTask)
31+
32+
Return the device that `task` is running on.
33+
"""
34+
function get_device end
35+
36+
"""
37+
get_parent_task(task::AbstractTask)
38+
39+
Return the task which spawned `task`.
40+
"""
41+
function get_parent_task end
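# Example (illustration only, not part of this file): with the interface
# above, a hypothetical helper could walk parent links to find the root task,
# assuming the root task's parent is `nothing`:
#
#     root_task(task) =
#         (p = get_parent_task(task)) === nothing ? task : root_task(p)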

"""
    aquire_lock!(dev::AbstractDevice, val)
0 commit comments