-
Notifications
You must be signed in to change notification settings - Fork 104
/
Copy pathindex.html
530 lines (502 loc) · 17.1 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<link rel="stylesheet" href="../common-revealjs/css/reveal.css">
<link rel="stylesheet" href="../common-revealjs/css/theme/white.css">
<link rel="stylesheet" href="../common-revealjs/css/custom.css">
<script>
// This is needed when printing the slides to pdf
var link = document.createElement( 'link' );
link.rel = 'stylesheet';
link.type = 'text/css';
link.href = window.location.search.match( /print-pdf/gi ) ? '../common-revealjs/css/print/pdf.css' : '../common-revealjs/css/print/paper.css';
document.getElementsByTagName( 'head' )[0].appendChild( link );
</script>
<script>
// This is used to display the static images on each slide,
// See global-images in this html file and custom.css
(function() {
if(window.addEventListener) {
window.addEventListener('load', () => {
let slides = document.getElementsByClassName("slide-background");
if (slides.length === 0) {
slides = document.getElementsByClassName("pdf-page")
}
// Insert global images on each slide
for(let i = 0, max = slides.length; i < max; i++) {
let cln = document.getElementById("global-images").cloneNode(true);
cln.removeAttribute("id");
slides[i].appendChild(cln);
}
// Remove top level global images
let elem = document.getElementById("global-images");
elem.parentElement.removeChild(elem);
}, false);
}
})();
</script>
</head>
<body>
<div class="reveal">
<div class="slides">
<div id="global-images" class="global-images">
<img src="../common-revealjs/images/sycl_academy.png" />
<img src="../common-revealjs/images/sycl_logo.png" />
<img src="../common-revealjs/images/trademarks.png" />
</div>
<!--Slide 1-->
<section>
<div class="hbox" data-markdown>
## Enqueueing a Kernel
</div>
</section>
<!--Slide 2-->
<section class="hbox" data-markdown>
## Learning Objectives
* Learn about queues and how to submit work to them
* Learn how to compose command groups
* Learn how to define kernel functions
* Learn about the rules and restrictions on kernel functions
* Learn how to stream text from a kernel function to the console.
</section>
<!--Slide 3-->
<section>
<div class="hbox" data-markdown>
#### The queue
</div>
<div class="container" data-markdown>
* In SYCL all work is submitted via commands to a `queue`.
* The `queue` has an associated device that any commands enqueued to it will target.
* There are several different ways to construct a `queue`.
* The most straight forward is to default construct one.
* This will have the SYCL runtime choose a device for you.
</div>
</section>
<!--Slide 4-->
<section>
<div class="hbox" data-markdown>
#### Precursor
</div>
<div class="container" data-markdown>
* In SYCL there are two models for managing data:
* The buffer/accessor model.
* The USM (unified shared memory) model.
* Which model you choose can have an effect on how you enqueue kernel functions.
* For now we are going to focus on the buffer/accessor model.
</div>
</section>
<!--Slide 5-->
<section>
<div class="hbox" data-markdown>
#### Command groups
</div>
<div class="container">
<div class="col">
<div class="col-left-1" data-markdown>

</div>
</div>
<div class="col" data-markdown>
* In the buffer/accessor model commands must be enqueued via command groups.
* A command group represents a series of commands to be executed by a device.
* These commands include:
* Invoking kernel functions on a device.
* Copying data to and from a device.
* Waiting on other commands to complete.
</div>
</div>
</section>
<!--Slide 6-->
<section>
<div class="hbox" data-markdown>
#### Composing command groups
</div>
<div class="container">
<div class="col" data-markdown>

</div>
<div class="col" data-markdown>
* Command groups are composed by calling the `submit` member function on a `queue`.
* The `submit` function takes a command group function which acts as a factory for composing the command group.
* The `submit` function creates a `handler` and passes it into the command group function.
* The `handler` then composes the command group.
</div>
</div>
</section>
<!--Slide 7-->
<section>
<div class="hbox" data-markdown>
#### Composing command groups
</div>
<div class="container">
<div class="col">
<code><pre>
<mark>gpuQueue.submit([&](handler &cgh){</mark>
/* Command group function */
<mark>});</mark>
</code></pre>
</div>
<div class="col-right-1" data-markdown>
* The `submit` member function takes a C++ function object, which takes a reference to a `handler`.
* The function object can be a lambda expression or a class with a function call operator.
* The body of the function object represents the command group function.
</div>
</div>
</section>
<!--Slide 8-->
<section>
<div class="hbox" data-markdown>
#### Composing command groups
</div>
<div class="container">
<div class="col">
<code><pre>
<mark>gpuQueue.submit([&](handler &cgh){</mark>
/* Command group function */
<mark>});</mark>
</code></pre>
</div>
<div class="col" data-markdown>
* The command group function is processed exactly once when `submit` is called.
* At this point all the commands and requirements declared inside the command group function are processed to produce a command group.
* The command group is then submitted asynchronously to the scheduler.
</div>
</div>
</section>
<!--Slide 9-->
<section>
<div class="hbox" data-markdown>
#### Composing command groups
</div>
<div class="container">
<div class="col">
<code><pre>
gpuQueue.submit([&](handler &cgh){
/* Command group function */
})<mark>.wait();</mark>
</code></pre>
</div>
<div class="col" data-markdown>
* The `queue` will not wait for commands to complete on destruction.
* However `submit` returns an `event` to allow you to synchronize with the completion of the commands.
* Here we call `wait` on the `event` to immediately wait for it to complete.
* There are other ways to do this, that will be covered in later lectures.
</div>
</div>
</section>
<!--Slide 10-->
<section>
<div class="hbox" data-markdown>
#### Scheduling
</div>
<div class="container" data-markdown>

</div>
<div class="container" data-markdown>
* Once `submit` has created a command group it will submit it to the scheduler.
* The scheduler will then execute the commands on the target device
once all dependencies and requirements are satisfied.
</div>
</section>
<!--Slide 11-->
<section>
<div class="hbox" data-markdown>
#### Scheduling
</div>
<div class="container">
<div class="col" data-markdown>

</div>
</div>
<div class="container" data-markdown>
* The same scheduler is used for all queues.
* This allows sharing dependency information.
</div>
</section>
<!--Slide 12-->
<section>
<div class="hbox" data-markdown>
#### Enqueueing SYCL Kernel Functions
</div>
<div class="container">
<div class="col">
<code><pre>
class my_kernel;
gpuQueue.submit([&](handler &cgh){
<mark>cgh.single_task<my_kernel>([=]() {</mark>
/* kernel code */
<mark>});</mark>
}).wait();
</code></pre>
</div>
<div class="col" data-markdown>
* SYCL kernel functions are defined using one of the kernel function invoke APIs provided by the `handler`.
* These add a SYCL kernel function command to the command group.
* There can only be one SYCL kernel function command in a command group.
* Here we use `single_task`.
</div>
</div>
</section>
<!--Slide 13-->
<section>
<div class="container">
<div class="col">
<code><pre>
class my_kernel;
gpuQueue.submit([&](handler &cgh){
cgh.single_task<my_kernel>(<mark>[=]() {</mark>
<mark>/* kernel code */</mark>
<mark>}</mark>);
}).wait();
</code></pre>
</div>
<div class="col" data-markdown>
* The kernel function invoke APIs take a function object representing the kernel function.
* This can be a lambda expression or a class with a function call operator.
* This is the entry point to the code that is compiled to execute on the device.
</div>
</div>
</section>
<!--Slide 14-->
<section>
<div class="container">
<div class="col">
<code><pre>
class my_kernel;
gpuQueue.submit([&](handler &cgh){
cgh.single_task<my_kernel>(<mark>[=]() {</mark>
<mark>/* kernel code */</mark>
<mark>}</mark>);
}).wait();
</code></pre>
</div>
<div class="col" data-markdown>
* Different kernel invoke APIs take different parameters describing the iteration space to be invoked in.
* Different kernel invoke APIs can also expect different arguments to be passed to the function object.
* The `single_task` function describes a kernel function that is invoked exactly once, so there are no additional parameters or arguments.
</div>
</div>
</section>
<!--Slide 15-->
<section>
<div class="container">
<div class="col">
<code><pre>
<mark>class my_kernel;</mark>
gpuQueue.submit([&](handler &cgh){
cgh.single_task<<mark>my_kernel</mark>>([=]() {
/* kernel code */
});
}).wait();
</code></pre>
</div>
<div class="col" data-markdown>
* The template parameter passed to `single_task` is used to name the kernel function.
* This is necessary when defining kernel functions with lambdas to allow the host and device compilers to communicate.
* SYCL 2020 allows kernel lambdas to be unnamed, but not all implementations support that yet.
</div>
</div>
</section>
<!--Slide 16-->
<section>
<div class="hbox" data-markdown>
#### SYCL kernel function rules
</div>
<div class=container data-markdown>
* Must be defined using a C++ lambda or function object, they cannot be a function pointer or std::function.
* Must always capture or store members by-value.
* SYCL kernel functions declared with a lambda ~~must be named using a forward declarable C++ type, declared in global scope~~ can be anonymous since SYCL 2020!
* SYCL kernel function names follow C++ ODR rules, which means you cannot have two kernels with the same name.
</div>
</section>
<!--Slide 17-->
<section>
<div class="hbox" data-markdown>
#### SYCL kernel function restrictions
</div>
<div class=container data-markdown>
* No dynamic allocation
* No dynamic polymorphism
* No function pointers
* No recursion
</div>
</section>
<!--Slide 18-->
<section>
<div class="hbox" data-markdown>
#### Kernels as function objects
</div>
<div class="container">
<div class="col">
<code><pre>
class my_kernel;
queue gpuQueue;
gpuQueue.submit([&](handler &cgh){
cgh.single_task<my_kernel>([=]() {
/* kernel code */
});
}).wait();
</code></pre>
</div>
<div class="col" data-markdown>
* All the examples of SYCL kernel functions up until now have been defined using lambda expressions.
</div>
</div>
</section>
<!--Slide 19-->
<section>
<div class="hbox" data-markdown>
#### Kernels as function objects
</div>
<div class="container">
<div class="col">
<code><pre>
struct my_kernel {
void operator()() const {
/* kernel function */
}
};
</code></pre>
</div>
<div class="col" data-markdown>
* As well as defining SYCL kernels using lambda expressions,
You can also define a SYCL kernel using a regular C++ function object.
* Define a type with a public const-qualified `operator()` member function.
</div>
</div>
</section>
<!--Slide 20-->
<section>
<div class="hbox" data-markdown>
#### Kernels as function objects
</div>
<div class="container">
<div class="col">
<code><pre>
struct my_kernel {
void operator()() const {
/* kernel function */
}
};
</code></pre>
<code><pre>
queue gpuQueue;
gpuQueue.submit([&](handler &cgh){
cgh.single_task(<mark>my_kernel{}</mark>);
}).wait();
</code></pre>
</div>
<div class="col" data-markdown>
* To use a C++ function object you simply construct an instance of the type and pass it to `single_task`.
* Notice you no longer need to name the SYCL kernel.
</div>
</div>
</section>
<!--Slide 21-->
<section>
<div class="hbox" data-markdown>
#### Streams
</div>
<div class=container data-markdown>
* A `stream` can be used in a kernel function to print text to the console from the device, similarly to how you would with `std::cout`.
* The `stream` is a buffered output stream so the output may not appear until the kernel function is complete.
* The `stream` is useful for debugging, but should not be relied on in performance critical code.
</div>
</section>
<!--Slide 22-->
<section>
<div class="hbox" data-markdown>
#### Streams
</div>
<div class="container">
<code class="code-100pc"><pre>
sycl::stream(size_t bufferSize, size_t workItemBufferSize, handler &cgh);
</code></pre>
</div>
<div class="container" data-markdown>
* A `stream` must be constructed in the command group function, as a `handler` is required.
* The constructor also takes a `size_t` parameter specifying the total size of the buffer that will store the text.
* It also takes a second `size_t` parameter specifying the work-item buffer size.
* The work-item buffer size represents the cache that each invocation of the kernel function (in the case of `single_task` 1) has for composing a stream of text.
</div>
</section>
<!--Slide 23-->
<section>
<div class="hbox" data-markdown>
#### Streams
</div>
<div class="container">
<div class="col">
<code><pre>
class my_kernel;
queue gpuQueue;
gpuQueue.submit([&](handler &cgh){
<mark>auto os = sycl::stream(1024, 1024, cgh);</mark>
cgh.single_task<my_kernel>([=]() {
/* kernel code */
});
}).wait();
</code></pre>
</div>
<div class="col" data-markdown>
* Here we construct a `stream` in our command group function with a buffer size of `1024` and a work-item size of also `1024`.
* This means that the total text that the stream can receive is 1024 bytes.
</div>
</div>
</section>
<!--Slide 24-->
<section>
<div class="hbox" data-markdown>
#### Streams
</div>
<div class="container">
<div class="col">
<code><pre>
class my_kernel;
queue gpuQueue;
gpuQueue.submit([&](handler &cgh){
auto os = sycl::stream(1024, 1024, cgh);
cgh.single_task<my_kernel>([=]() {
<mark>os << "Hello world!\n";</mark>
});
}).wait();
</code></pre>
</div>
<div class="col" data-markdown>
* Next we capture the `stream` in the kernel function's lambda expression.
* Then we can print `"Hello World!"` to the console using the `<<` operator.
* This is where the work-item size comes in, this is the cache available to store text on the right-hand-size of the `<<` operator.
</div>
</div>
</section>
<!--Slide 22-->
<section>
<div class="hbox" data-markdown>
## Questions
</div>
</section>
<!--Slide 23-->
<section>
<div class="hbox" data-markdown>
#### Exercise
</div>
<div class="container" data-markdown>
Code_Exercises/Enqueueing_a_Kernel/source
</div>
<div class="container" data-markdown>
Implement a SYCL application which enqueues a kernel function to a device and streams "Hello world!" to the console.
</div>
</section>
</div>
</div>
<script src="../common-revealjs/js/reveal.js"></script>
<script src="../common-revealjs/plugin/markdown/marked.js"></script>
<script src="../common-revealjs/plugin/markdown/markdown.js"></script>
<script src="../common-revealjs/plugin/notes/notes.js"></script>
<script>
Reveal.initialize({mouseWheel: true, defaultNotes: true});
Reveal.configure({ slideNumber: true });
</script>
</body>
</html>