Write a device kernel that calculates the single-precision BLAS operation
saxpy, i.e. `y = a * x + y`.
- Initialise the vectors `x` and `y` with some values on the CPU
- Perform the computation on the host to generate reference values
- Allocate memory on the device for `x` and `y`
- Copy the host `x` to device `x`, and host `y` to device `y`
- Perform the computation on the device
- Copy the device `y` back to the host `y`
- Confirm the correctness: Is the host computed `y` equal to the device computed `y`?
You may start from the skeleton code provided in `saxpy.cpp`.
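
The host-side parts of the exercise (the reference computation and the final correctness check) could be sketched roughly as follows. This is only an illustration in plain C++, not the contents of `saxpy.cpp`: the names `saxpy_host`, `max_abs_diff` and `all_close` are hypothetical, and the use of a tolerance is an assumption. A tolerance is suggested because the device compiler may fuse the multiply and the add into a single instruction, so the device result is not guaranteed to be bitwise identical to the host result.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host reference: y[i] = a * x[i] + y[i] for every element.
void saxpy_host(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Largest absolute element-wise difference between two equally sized vectors.
float max_abs_diff(const std::vector<float>& ref, const std::vector<float>& got) {
    assert(ref.size() == got.size());
    float max_diff = 0.0f;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        max_diff = std::max(max_diff, std::fabs(ref[i] - got[i]));
    }
    return max_diff;
}

// True if every element of `got` is within `tol` of the reference.
// The tolerance value is the caller's choice (an assumption, not part of the task).
bool all_close(const std::vector<float>& ref, const std::vector<float>& got, float tol) {
    return max_abs_diff(ref, got) <= tol;
}
```

In this sketch, the host reference would be computed on a copy of `y` before the device run. Once the device `y` has been copied back into a host vector, a call such as `all_close(y_ref, y_from_device, 1e-5f)` would answer the final question in the list. The value `1e-5f` is an arbitrary example and should be chosen to suit the magnitude of the input data.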