I have noticed that innerProduct
ejml/main/ejml-ddense/src/org/ejml/dense/row/mult/MatrixVectorMult_DDRM.java
Lines 338 to 344 in 2c9d1dc
```java
for (int k = 0; k < B.numCols; k++) {
    double sum = 0;
    for (int i = 0; i < B.numRows; i++) {
        sum += a[offsetA + i]*B.data[k + i*cols];
    }
    output += sum*c[offsetC + k];
}
```
performs a lot worse (at least 2x slower, varying with size; I tested sizes in the 1000s) than my original naive implementation.
I think I figured out the reason.
The access of the matrix data likely thrashes the CPU cache, because the inner loop walks down a column of the row-major array: in B.data[k + i*cols], i is incremented in the inner loop, so consecutive reads are cols elements apart.
If I swap the loops, I get back the lost speed.
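A minimal sketch of what swapping the loops could look like, for illustration only. The class and method names (`InnerProductSketch`, `columnOrder`, `rowOrder`) are hypothetical and not part of EJML, the matrix is passed as a plain row-major `double[]` instead of a `DMatrixRMaj`, and the offsets from the original method are dropped for brevity. Both methods compute aᵀ·B·c; only the traversal order of B differs.

```java
final class InnerProductSketch {

    // Same loop order as the quoted snippet: the inner loop strides through
    // the row-major array in steps of cols (one element per row).
    static double columnOrder(double[] a, double[] b, int rows, int cols, double[] c) {
        double output = 0;
        for (int k = 0; k < cols; k++) {
            double sum = 0;
            for (int i = 0; i < rows; i++) {
                sum += a[i]*b[k + i*cols];
            }
            output += sum*c[k];
        }
        return output;
    }

    // Swapped loops: the inner loop reads one row of B contiguously,
    // accumulating (B*c)[i] before weighting it by a[i].
    static double rowOrder(double[] a, double[] b, int rows, int cols, double[] c) {
        double output = 0;
        for (int i = 0; i < rows; i++) {
            int rowStart = i*cols;
            double sum = 0;
            for (int k = 0; k < cols; k++) {
                sum += b[rowStart + k]*c[k];
            }
            output += a[i]*sum;
        }
        return output;
    }

    public static void main(String[] args) {
        double[] a = {1, 2};
        double[] b = {1, 2, 3, 4, 5, 6}; // 2x3 matrix, row-major
        double[] c = {1, 1, 1};
        System.out.println(columnOrder(a, b, 2, 3, c)); // prints 36.0
        System.out.println(rowOrder(a, b, 2, 3, c));    // prints 36.0
    }
}
```

Because the two versions add the terms in a different order, results on general floating-point input may differ in the last few bits, so a comparison between them should use a tolerance.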
Before I provide a PR, is there any reason it is done this way?