-
Notifications
You must be signed in to change notification settings - Fork 10
/
dnorm.Rdb
826 lines (751 loc) · 27.7 KB
/
dnorm.Rdb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
<?xml version="1.0"?>
<article xmlns:r="http://www.r-project.org"
xmlns:xi="http://www.w3.org/2003/XInclude"
xmlns:omg="http://www.omegahat.org"
xmlns:c="http://www.C.org">
<articleinfo>
<title>Compiling <r:func>dnorm</r:func> manually</title>
<author><firstname>Duncan</firstname><surname>Temple Lang</surname>
<affiliation><orgname>University of California at Davis</orgname>
<orgdiv>Department of Statistics</orgdiv>
</affiliation>
</author>
</articleinfo>
<invisible>
<r:code>
library(Rllvm)
</r:code>
</invisible>
<section>
<title>Generating code to compute the normal density</title>
<para>
This walks the reader through the steps to compile a scalar
version of the <r:func>dnorm</r:func> function to compute
the normal density at a value.
Later we will explore how to create a vectorized version of this.
</para>
<para>
We should note that, of course, the <r/> version of this
function is already compiled and vectorized.
Also, we can use <omg:pkg>RLLVMCompile</omg:pkg> to directly generate
the LLVM version of the code for us. This is just an example
of how we can generate byte-code programmatically, either manually or
with a compiler.
</para>
<para>
It may be be helpful to see the <c/> code we will emulate
when defining our <llvm/> implementation.
This is
<c:code>
<xi:include href="dnorm.c" parse="text"/>
</c:code>
We declare the parameters and the return type of the function. The
body computes the standardized value of the input and assigns it to a
temporary variable. Next, we square this value for use in the call to
the <c:func>exp</c:func> routine and again assign it to local
temporary variable. Finally, we compute the two terms in the density
function (pdf) and return the result.
</para>
<para>
We want a routine that takes a
scalar numeric value <r:var>x</r:var> giving the value at which to
evaluate the normal density, and two scalar numeric
parameters defining the normal density, i.e. the mean and standard deviation.
The function should return a scalar numeric value, i.e. a <r:var>DoubleType</r:var>.
We define this function in <llvm/> with
<r:code eval="false">
fun = Function("dnorm", DoubleType,
list(x = DoubleType, mean = DoubleType, sd = DoubleType))
</r:code>
This creates the <r:class>Function</r:class> object.
The first argument is the name of the new routine.
The second argument is the return type. The <r:func>list</r:func>
argument describes the three parameters for the routine.
We have given the names, although this is not necessary.
The <r:pkg>Rllvm</r:pkg> package allows us to access the
parameters more easily if we give them names.
</para>
<para>
In addition to creating the <r:class>Function</r:class>, our call to <r:func>Function</r:func> also
created a new <r:class>Module</r:class> to house it.
We could have created the <r:class>Module</r:class> separately
and passed it as an argument in the call to <r:func>Function</r:func>.
We might want to do this if we want to control how the <r:class>Module</r:class>
is created or if we want to combine multiple <r:class>Function</r:class>
objects in the same <r:class>Module</r:class>.
For the latter, we can access the <r:class>Module</r:class> from the new <r:class>Function</r:class> object, e.g.,
<r:code eval="false">
mod = as(fun, "Module")
</r:code>
So we can then add new <r:class>Function</r:class> objects using that object.
</para>
<para>
We now have the skeleton of our routine.
We can now add instructions to provide the body.
To do this, we create an <r:class>IRBuilder</r:class> to facilitate
creating the instructions.
Before this, we create the first block for the initial instructions
and pass that to the <r:class>IRBuilder</r:class>. We do this with
<r:code eval="false">
ir = IRBuilder(Block(fun, "entry"))
</r:code>
Note that we have given a name for the <r:class>Block</r:class>
which will appear when we display the <llvm/> IR code and help
us to understand that code.
</para>
<para>
We can now starting add our instructions.
One of the first steps in most routines is
to create local variables that contain the values of the parameters.
<q>Why?</q>
We can do this for the parameter named <r:arg>x</r:arg>
with the following code:
<r:code eval="false">
params = getParameters(fun)
var = ir$createLocalVariable(getType(params$x), "x")
ir$createStore(params$x, var)
</r:code>
We access the list of parameter objects with <r:func>getParameters</r:func>
from the <r:class>Function</r:class>.
Each of these is a <r:class>Value</r:class> object from which we can query the
type and possibly the name.
We extract the first parameter, by name or position.
</para>
<para>
The next two expressions in our code
illustrates how we create instructions.
We are using the <r:class>IRBuilder</r:class> to create
a local variable and an assignment/store instructions.
The instructions are added to the currently active block in the
<r:class>IRBuilder</r:class> object.
The form <r:expr eval="false">ir$createLocalVariable(...)</r:expr>
is equivalent to <r:expr eval="false">createLocalVariable(ir, ...)</r:expr>
</para>
<para>
We are now ready to add the actual computations to the body of the routine.
However, it seems like we have already dealt with many details, and only
mapping one of the parameters to a local variable.
The <omg:pkg>Rllvm</omg:pkg> package provides a convenience function to do this for us.
<r:func>simpleFunction</r:func> creates the <r:class>Function</r:class>,
<r:class>BasicBlock</r:class> and <r:class>IRBuilder</r:class>
and creates the local variables for us.
in a single call.
We specify the same collection of inputs as for <r:func>Function</r:func>,
but get the additional steps done for us.
So we can do all of the above with
<r:code>
sfun = simpleFunction("dnorm", DoubleType,
.types = list(x = DoubleType, mean = DoubleType, sd = DoubleType))
</r:code>
We can specify the parameters individually (via the \<dots/>) or
as a single <r:func>list</r:func> via the <r:arg>.types</r:arg> parameter.
</para>
<para>
The return value of <r:func>simpleFunction</r:func> is not
a <r:class>Function</r:class>, but a list containing the different
elements created in the call.
These include the <r:class>Function</r:class> object, the <r:class>IRBuilder</r:class>,
the initial <r:class>BasicBlock</r:class> and also the list of local variables
corresponding to the parameters.
We'll assign the <r:class>IRBuilder</r:class> object to <r:var>ir</r:var>, for ease of use.
</para>
<invisible>
<r:code>
ir = sfun$ir
</r:code>
</invisible>
<para>
We now focus on performing the computations.
We'll first compute <r:expr eval="false">(x - mean)/sd</r:expr>.
Then we can multiply this by itself to get the square of the value.
We first subtract <r:var>mean</r:var> from <r:var>x</r:var>.
To do this, we have to load the values from each variable.
We do this with
<r:code>
#x = ir$createLoad(sfun$params$x)
#mean = ir$createLoad(sfun$params$mean)
x = sfun$params$x
mean = sfun$params$mean
</r:code>
Here we are using <r/> variables to temporarily store the instructions to load the <llvm/>
values.
We can now subtract these.
We create a binary operation instruction.
The <r:func>binOp</r:func> function does this and allows us to specify
the two operands and also the particular binary operation.
Since we are working with two real-valued numbers, we use
the <r:var>FSub</r:var> operation. This is an integer value in
<r/> that corresponds to an enumerated constant in the <llvm/> <cpp/> API.
There are others such as <r:var>FAdd</r:var> and <r:func>FMul</r:func>
and also values for integer operations (e.g. <r:var>Add</r:var>, <r:var>Sub</r:var>).
So we create the subtraction instruction with
<r:code>
delta = ir$binOp(BinaryOps["FSub"], x, mean)
</r:code>
Again, we assign this to an <r/> variable so we can use it when creating another instruction.
</para>
<para>
Next we divide the centered value by <r:var>sd</r:var>.
Again, we have to load the value of this local variable with
<r:code>
#sd = ir$createLoad(sfun$params$sd)
sd = sfun$params$sd
</r:code>
Given this, we can use <r:func>binOp</r:func> to perform the division
with
<r:code>
scaled = ir$binOp(BinaryOps["FDiv"], delta, sd)
</r:code>
</para>
<para>
We want to store the standardized value in a local variable, say <r:var>tmp</r:var>.
We first have to define this variable and with
<r:code>
tmp = ir$createLocalVariable(DoubleType, "tmp")
</r:code>
Here we know the type of the variable. We could also have calculated it from
the <r:class>Value</r:class> object in <r:var>scaled</r:var> via <r:expr>getType(scaled)</r:expr>.
To store the value in this variable, we create the instruction
<r:code>
ir$createStore(scaled, tmp)
</r:code>
</para>
<para>
Our next step is to square the value of <r:var>tmp</r:var>.
We could call the <c:func>pow</c:func> routine, however, it is easier and more
efficient to simply multiply the value by itself.
We load it twice and call <r:func>binOp</r:func> again, e.g.
<r:code>
tmpv = ir$createLoad(tmp)
ir$createStore(ir$binOp(BinaryOps["FMul"], tmpv, tmpv), tmp)
</r:code>
Here, we also stored the result back into the <r:var>tmp</r:var> variable
in the <llvm/> code.
(The <r/> variables used to store the instructions and
the <llvm/> variables can be confusing.)
</para>
<para>
We can now compute the actual result as a
<r:expr eval="false">1/(sd * sqrt(2 * pi)) * exp( - .5 * tmp)</r:expr>
This is a binary operation (multiplication)
in which each operand is also a complex expression.
</para>
<para>
Let's generate the code to compute <r:expr eval="false">1/(sd * sqrt(2*pi))</r:expr>. This is a binary
operation (/) with a constant (1) and another expression. We can
compute the denominator first since the numerator is a simple constant
value. We can replace <r:expr eval="false">sqrt(2*pi)</r:expr> with 2.506628275
since this is a constant/invariant, i.e. does not depend on the inputs.
So we compute the numerator with
<r:code>
numerator = ir$binOp(BinaryOps["FMul"],
sd,
ir$createConstant(sqrt(2*pi)))
</r:code>
We can now compute the constant in the density function
with
<r:code>
term1 = ir$binOp(BinaryOps["FDiv"], ir$createConstant(1), numerator)
</r:code>
Note that here we are exploiting the fact
that the expression <r:expr>1</r:expr> in <r/> yields
a <r:class>numeric</r:class> or real-value in <r/>
and not an integer value. <r:func>createConstant</r:func>
uses the type of the <r/> value to create
the constant with type <r:var>DoubleType</r:var>.
We can use the <r:arg>type</r:arg> parameter to specify the
desired type.
</para>
<para>
We now focus on the second term in our result
<r:expr eval="false">exp(-.5 * tmp)</r:expr>
This is a call to the routine <c:func>exp</c:func>
and takes a single argument.
We'll compute that argument first in order to have it available
to make the call.
At this point, we can readily see that this is a binary operation
and we create it with
<r:code>
exponent = ir$binOp(BinaryOps["FMul"], ir$createConstant(-.5), ir$createLoad(tmp))
</r:code>
Here we assign the instruction to the <r/> variable <r:var>exponent</r:var>,
but not to any <llvm/> variable.
</para>
<para>
We can invoke the <c:func>exp</c:func> routine by creating
a "call" instruction with <r:func>createCall</r:func>.
Before we do this, we need to have a reference to the routine we are
invoking.
To do this, we have declare the routine and its signature in our
<r:class>Module</r:class>.
This allows <llvm/> to understand the routine and verify how we use it
when it compiles the code.
We declare the function using the <r:func>Function</r:func> we saw
earlier:
<r:code>
llvm.exp = Function("exp", DoubleType, DoubleType, module = sfun$mod)
</r:code>
This is just a declaration. We won't define the body of this routine.
It is a built-in routine and <llvm/> will be able to provide the implementation.
We can now create the call to <c:func>exp</c:func> with
<r:code>
term2 = ir$createCall(llvm.exp, exponent)
</r:code>
</para>
<para>
We now have the two terms in our result. All that remains is to
multiply the two operands stored in the <r/> variables
<r:var>term1</r:var> and <r:var>term2</r:var>.
Again, this is simple:
<r:code>
ans = ir$binOp(FMul, term1, term2)
</r:code>
To return this value, we use <r:func>createReturn</r:func>.
This takes the value we just computed
<r:code>
ir$createReturn(ans)
</r:code>
</para>
<para>
We can examine the IR code:
<r:code>
print(showModule(sfun$module))
<r:output><![CDATA[
; ModuleID = '2017-03-06 14:57:04'
source_filename = "2017-03-06 14:57:04"
define double @dnorm(double %x, double %mean, double %sd) {
%1 = fsub double %x, %mean
%2 = fdiv double %1, %sd
%tmp = alloca double
store double %2, double* %tmp
%3 = load double, double* %tmp
%4 = fmul double %3, %3
store double %4, double* %tmp
%5 = fmul double %sd, 0x40040D931FF62705
%6 = fdiv double 1.000000e+00, %5
%7 = load double, double* %tmp
%8 = fmul double -5.000000e-01, %7
%9 = call double @exp(double %8)
%10 = fmul double %6, %9
ret double %10
}
declare double @exp(double)
]]></r:output>
</r:code>
</para>
<para>
We can then optimize the function. This will remove the unnecessary variable tmp.
<r:code>
Optimize(sfun$fun)
<r:output><![CDATA[
; ModuleID = '2017-03-06 14:57:04'
source_filename = "2017-03-06 14:57:04"
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
define double @dnorm(double %x, double %mean, double %sd) {
%1 = fsub double %x, %mean
%2 = fdiv double %1, %sd
%3 = fmul double %2, %2
%4 = fmul double %sd, 0x40040D931FF62705
%5 = fdiv double 1.000000e+00, %4
%6 = fmul double %3, -5.000000e-01
%7 = call double @exp(double %6)
%8 = fmul double %5, %7
ret double %8
}
declare double @exp(double)
]]></r:output>
</r:code>
</para>
<section>
<title>Calling the routine</title>
<para>
We can test this routine works as we intend.
<r:test>
.llvm(sfun$fun, 1, 0, 1) - dnorm(1, 0, 1)
.llvm(sfun$fun, 0, 0, 1) - dnorm(0, 0, 1)
.llvm(sfun$fun, -1, 2, 1) - dnorm(-1, 2, 1)
</r:test>
</para>
</section>
<section r:eval="true">
<title>Vectorizing the <c:func>dnorm</c:func> routine</title>
<altTitle>Vectorizing the dnorm() routine</altTitle>
<para>
We now turn our attention to creating a vectorized version of the
<c:func>dnorm</c:func> routine. We have two choices as to how to do
this. We can assume the existence of our scalar version that we
created above and write another routine that loops over the elements
and calls that scalar routine. Alternatively, we could write a
routine that loops over the input vector and then performs the
computations inline within the body of the loop/iteration. Since we
have already defined the routine above, we'll start with the former
approach and focus on creating the loop using <llvm/>. We may also be
able to "encourage" <llvm/> to inline to <c:func>dnorm</c:func>
calculations when it optimizes the code.
</para>
<note><para>
Note that neither approach aims to emulate
the capability of <r/>'s <r:func>dnorm</r:func> function
to allow vectors of values for the <r:arg>mean</r:arg> and <r:arg>sd</r:arg>
and parameters. We could do this, but it would be distracting here
as it involves other details.
Similarly, we don't support computing the log of the density.
</para></note>
<para>
The code we want to generate mimics the <c/> code
<c:code><![CDATA[
void
v_dnorm(double *x, int n, double mean, double sd)
{
int i;
for(i = 0; i < n; i++) {
x[i] = dnorm([i], mean, sd);
}
}
]]></c:code>
We pass a pointer to a collection of <c:arg>n</c:arg>
<c:type>double</c:type> elements. We also have to provide
the number of elements as the <c/> code cannot determine
this. We also pass the <c:arg>mean</c:arg> and <c:arg>sd</c:arg>
values. This implementation puts the results back into the
input array/vector. This keeps things simple and is convenient when we
call this from <r/> with an <r/> vector.
However, it is trivial to add another parameter to collect
the outputs.
Since we return the results in the input array, we have defined
the function to return no explicit value, i.e. <c:type>void</c:type>.
</para>
<para>
We could pass a <c:type>SEXP</c:type>, the <c/>-level type
for all <r/> objects. If we did this, we could determine
the number of elements using the <c:func>Rf_length</c:func>
routine <r/> provides. However, for now, we will keep this as simple as possible
and deal with basic <c/> types.
</para>
<para>
To define this routine, we start by calling <r:func>simpleFunction</r:func>
again. This time, we already have our module so we pass this to <r:func>simpleFunction</r:func>
so that we define our routine in the same module as <c:func>dnorm</c:func>:
<r:code>
vfun = simpleFunction("v_dnorm", VoidType,
x = DoublePtrType, n = Int32Type,
mean = DoubleType, sd = DoubleType,
module = sfun$module)
ir = vfun$ir
</r:code>
</para>
<para>
Before we start to generate instructions, let us determine
our strategy and the basic structure of the <llvm/> code.
We need a local variable for <c:var>i</c:var> and to initialize
it to 0.
We need to test the condition of the loop to see
if we should evaluate the body of the loop for this value of
<c:var>i</c:var>, or if we should terminate the loop.
When we evaluate the body of the loop, we need to return to test
the condition of the loop. This is the iteration.
So we need several blocks:
<ol>
<li> to create and initialize <c:var>i</c:var>,</li>
<li> to test the condition of the loop,</li>
<li> evaluate the body of the loop, and increment <c:var>i</c:var> at the end</li>
</ol>
We could create these blocks in <r/> now and use them later. However,
we'll create them as we need them.
</para>
<para>
<r:func>simpleFunction</r:func> created our initial block with local variables
referencing the parameters.
We can create our <c:var>i</c:var> in this block and assign it the value 0 with
<r:code>
i = ir$createLocalVariable(Int32Type, "i")
ir$createStore(ir$createConstant(0L), i)
</r:code>
</para>
<para>
The next step is to branch to the test of the loop condition.
To do this, we need to create a new block and add an instruction to
branch to it.
<r:code>
conditionBlock = Block(vfun$fun, "loop.condition")
ir$createBranch(conditionBlock)
</r:code>
Finally, we need to
make this new block the active one for the <r:class>IRBuilder</r:class>
so that new instructions will be added it to it:
<r:code>
ir$setInsertBlock(conditionBlock)
</r:code>
We can now build the computations to test the loop condition
and jump to the body or simply return from the <c:func>v_dnorm</c:func>
routine.
</para>
<para>
Testing the loop condition is quite simple.
We want to evaluate <r:expr eval="false">i < n</r:expr>.
This is a binary comparison of two integer values.
We use <r:func>createICmp</r:func> to perform this.
Like the <r:func>binOp</r:func> function, this allows us to
specify which comparison to make (e.g. <r:var>ICMP_EQ</r:var>, <r:var>ICMP_SLT</r:var> for equality and less than),
along with the two operands.
We create first have to load these operands. We can do this with
<r:code>
ii = ir$createLoad(i)
ok = ir$createICmp(ICMP_SLT, ii, vfun$params$n)
</r:code>
We assign this to the <r/> variable <r:var>ok</r:var> as a convenient
name. We then use this to determine to which block to branch. We
will use <r:func>createCondBr</r:func> to create a conditional branch.
This takes our condition value (<r:var>ok</r:var>) and the two blocks
to which to branch depending on that value. So before we create the
conditional branch instruction, we need to create the two new blocks.
One is the body of the loop. The other should be the final block of
the routine that returns from the routine. Alternatively, this
loop-condition block could conditionally branch to the body if the
condition is true, or else explicitly return from this routine.
We'll use the former to illustrate <r:func>createCondBr</r:func>.
So we create two new <r:class>BasicBlock</r:class> objects:
<r:code>
bodyBlock = Block(vfun$fun, "for.body")
returnBlock = Block(vfun$fun, "return")
</r:code>
We can then create the conditional branch with
<r:code>
ir$createCondBr(ok, bodyBlock, returnBlock)
</r:code>
</para>
<para>
Before we turn our attention to the body of the loop,
we can complete the <r:var>returnBlock</r:var> now.
This is a simple call to <c:keyword>return</c:keyword>
and we can add this with
<r:code>
ir$setInsertBlock(returnBlock)
ir$createReturn()
</r:code>
Note that we first made this the active block.
We add a return instruction with no value. This is because
our return does not return an explicit value.
</para>
<para>
We now focus on the body of the loop.
We make this the active block
<r:code>
ir$setInsertBlock(bodyBlock)
</r:code>
As we mentioned, we could inline the computations that
we created for our <c:func>dnorm</c:func> routine we created earlier.
However, it will be simpler to just call that routine.
To do so, we need three arguments: <r:expr eval="false">x[i]</r:expr>, <r:arg>mean</r:arg> and <r:arg>sd</r:arg>.
We can load the latter two via <r:func>createLoad</r:func> and <r:expr>vfun$vars</r:expr>.
The first argument is a little more complicated. This is the i-th element of
our array/vector. We access it via a pointer to a <c:type>double</c:type>.
To get this value, we use a <quote>Get Element Pointer</quote> (GEP) instruction.
This is apparently somewhat confusing to <llvm/> programmers. However it is well
documented and described. The basic idea is that we pass
the pointer value and the (modified) index to get access to the i-th element.
However, we have to ensure the value of i is in appropriate form by casting it to
a 64-bit index.
To do this, we create two indices.
<r:code>
idx = ir$createSExt(ir$createLoad(i), 64L)
</r:code>
Now, we can access the element
<r:code>
print(vfun$vars$x)
#xload = ir$createLoad(vfun$params$x)
xload = vfun$params$x
el = ir$createGEP(xload, idx)
</r:code>
</para>
<para>
We now have the three arguments to the <c:func>dnorm</c:func> routine.
We create the call instruction with
<r:code>
v = ir$createLoad(el)
call = ir$createCall(sfun$fun,
v,
vfun$params$mean,
vfun$params$sd)
</r:code>
</para>
<para>
Before finishing with this call, we have to assign the
result of the call back to <r:expr eval="false">x[i]</r:expr>.
This is similar to accessing the element, but here
we assign to it.
Again, we need a GEP instruction to access the element of the array
<r:code>
idx = ir$createSExt(ir$createLoad(i), 64L)
#xp = ir$createLoad(vfun$params$x)
el = ir$createGEP(vfun$params$x, idx)
</r:code>
We can now assign the result to it using a store instruction:
<r:code>
ir$createStore(call, el)
</r:code>
</para>
<para>
We now have to increment the value <c:var>i</c:var> and
then branch back to the loop-condition test block.
We can do both of these in this block or alternatively
create a new block to do this.
In this case, it seems unnecessary to create this new block.
So we increment <c:var>i</c:var> with
<r:code>
inc = ir$binOp("Add", ir$createLoad(i), ir$createConstant(1L))
ir$createStore(inc, i)
</r:code>
We branch back to the loop-condition test with
<r:code>
ir$createBranch(conditionBlock)
</r:code>
</para>
<para>
This is quite low-level and involved,
but is actually quite simple when we see the different pieces.
Again, we started by creating a local variable (<c:var>i</c:var>) and initializing it.
We created three blocks - one the simple return, another to test the condition for the
loop, and the last one to implement the body of the loop.
We saw how to add instructions to branch to a different block,
both unconditionally and conditionally.
We saw how to do logical comparisons (of integers) with <r:func>createICmp</r:func>.
The most complex aspect of this was accessing the i-th element of an array
via a pointer. This required the introduction of the GEP instruction.
</para>
<section>
<title>Testing the vectorized version</title>
<para>
Before we can test this version, we have to know how
we can pass a pointer to a <c:type>double</c:type> to an <llvm/>
routine from <r/>.
There are two ways. One is to have access to a <c/> pointer to a collection
of <c:type>double</c:type> values as an <r/> <r:class>externalptr</r:class>.
The much simpler way is to pass a <r:class>numeric</r:class> vector as an
argument and <r:func>.llvm</r:func> understands to pass the address of the
(contiguous) elements to the routine.
So we can test our routine with
<r:test>
mod = vfun$mod
x = rnorm(10)
dx = .llvm(mod$v_dnorm, x, length(x), 0, 1, .all = TRUE)
</r:test>
When we look at <r:var>dx</r:var>, we see it contains the value <r:null/>.
This is because <r:func>.llvm</r:func> returns the value returned by the routine
we called. Since <c:func>v_dnorm</c:func> returns <c:type>void</c:type>,
we get <r:null/> in <r/>.
We want the updated values in the first argument. To get these, in addition to
the value returned by the routine, we use the <r:arg>.all</r:arg> parameter
in <r:func>.llvm</r:func>. Like the <r:func>.C</r:func> interface, this
returns all the inputs to the routine (and the return value of the routine).
So
<r:test>
dx = .llvm(mod$v_dnorm, x, length(x), 0, 1, .all = TRUE)
dx[[1]]
</r:test>
gives us the results.
We can compare those with the values computed by <r/>'s <r:func>dnorm</r:func>
<r:test>
dx[[1]] == dnorm(x, 0, 1)
</r:test>
</para>
</section>
</section>
<section eval="false">
<title>Timing the different versions</title>
<para>
Some readers might be interested in comparing how fast
the <llvm/> code is compared to either
the compiled <r:func>dnorm</r:func> or an interpreted version of the function.
We first compile the code in our new module and associate it
with an execution engine:
<r:test>
module = vfun$module
ee = ExecutionEngine(module, CodeGenOpt_Aggressive)
Optimize(module, ee)
</r:test>
</para>
<para>
We can then invoke the functions with
<r:test>
.llvm(module$dnorm, 0, 0, 1, .ee = ee)
x = rnorm(1000)
ans = .llvm(module$v_dnorm, x, length(x), 0, 1, .ee = ee, .all = TRUE)[[1]]
</r:test>
</para>
<para>
To test how long the native <r/> and the <llvm/> version take, we time
the following
<r:test>
x = rnorm(1e6)
tm.llvm = system.time(.llvm(module$v_dnorm, x, length(x), 0, 1, .ee = ee, .all = TRUE)[[1]])
tm.native = system.time(dnorm(x, 0, 1))
</r:test>
</para>
<para>
<r:function><![CDATA[
scalarDnorm.r = function(x, mean= 0, sd = 1) {
tmp = (x - mean)/sd
(1/(sd*2.5066282746310002416)) * exp(-.5 * tmp * tmp)
}
Dnorm.r = function(x, mean = 0, sd = 1)
sapply(x, scalarDnorm.r, mean, sd)
]]></r:function>
We do a quick test to verify we have the functions correct:
<r:test>
x = rnorm(10)
all.equal(Dnorm.r(x, 2, 3), dnorm(x, 2, 3))
</r:test>
</para>
<para>
These functions use <r:func>sapply</r:func> and so <r:func>lapply</r:func>
and this loop is done in <c/>. So this should be reasonably quick
relative to a <r:keyword>for</r:keyword> loop in <r/>.
However, it is still slow:
<r:test>
x = rnorm(1e6)
tm.r = system.time(Dnorm.r(x, 0, 1))
</r:test>
<r:test>
tm = rbind(tm.r, tm.llvm, tm.native)[, 1:3]
<r:output><![CDATA[
tm.r 12.781 0.073 13.022
tm.llvm 0.042 0.017 0.059
tm.native 0.042 0.000 0.042
]]></r:output>
</r:test>
<r:test>
tm[,3]/tm[3,3]
<r:output><![CDATA[
tm.r tm.llvm tm.native
310.047619 1.404762 1.000000
]]></r:output>
</r:test>
As we increase the sample size, the relative performance of the native version and the <llvm/>
version grows. However, the <r/> version becomes very slow.
<r:test>
x = rnorm(1e7)
tm.llvm = system.time(.llvm(module$v_dnorm, x, length(x), 0, 1, .ee = ee, .all = TRUE)[[1]])
tm.native = system.time(dnorm(x, 0, 1))
tm.r = system.time(Dnorm.r(x, 0, 1))
</r:test>
<r:test>
tm = rbind(tm.r, tm.llvm, tm.native)[, 1:3]
tm[,3]/tm[3,3]
<r:output><![CDATA[
tm.r tm.llvm tm.native
611.098901 1.881319 1.000000
]]></r:output>
</r:test>
</para>
</section>
<ignore>
<para>
We break down the expression into the different binary operators
and calls to functions (i.e. <r:func>sqrt</r:func>).
We start with <r:expr eval="false">2*pi</r:expr> and create the instruction
to compute this with
</para>
</ignore>
</section>
</article>