Benchmarking EJML vs Numpy / Pytorch #178
Most likely what you are seeing is Just-In-Time optimization vs optimization at compile time. The Java Virtual Machine uses JIT, and when benchmarking you are typically interested in the steady-state performance. So to "warm up" the JVM you run a few iterations, then run it again. Java developers typically use JMH for micro benchmarks since it automates all of this for you. You will also get more accurate results in a micro benchmark on any language/platform if you run it for a sufficient period of time. In this case, I would pick a number of iterations such that the benchmark takes a few seconds to run. As for which one is faster in the steady state, I'm not sure. NumPy is basically a wrapper around LAPACK and/or Eigen. On small matrices the performance is actually comparable. On large dense matrices LAPACK/Eigen is typically 2.5x faster since their compilers can do fancier optimizations. For sparse matrices the performance is comparable since that optimization advantage goes away.
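The warm-up-then-measure pattern described above can be sketched without JMH. This is a minimal illustration with a hand-rolled multiply as the workload; the matrix sizes and iteration counts are arbitrary assumptions, and JMH remains the right tool for serious measurements:

```java
import java.util.Random;

public class WarmupBench {
    // Plain triple-loop multiply, used only as a workload to time.
    static double[][] mult(double[][] a, double[][] b) {
        int n = a.length, k = b.length, m = b[0].length;
        double[][] c = new double[n][m];
        for (int i = 0; i < n; i++)
            for (int p = 0; p < k; p++) {
                double aip = a[i][p];
                for (int j = 0; j < m; j++)
                    c[i][j] += aip * b[p][j];
            }
        return c;
    }

    static double[][] random(int rows, int cols, Random rnd) {
        double[][] m = new double[rows][cols];
        for (double[] row : m)
            for (int j = 0; j < row.length; j++)
                row[j] = rnd.nextDouble();
        return m;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        double[][] a = random(15, 32, rnd);
        double[][] b = random(32, 2000, rnd);

        // Warm-up phase: give the JIT compiler time to compile the hot loop.
        for (int i = 0; i < 200; i++) mult(a, b);

        // Measured phase: time many iterations and report the average.
        int iters = 500;
        long start = System.nanoTime();
        for (int i = 0; i < iters; i++) mult(a, b);
        long elapsed = System.nanoTime() - start;
        System.out.printf("avg %.3f us/mult%n", elapsed / 1e3 / iters);
    }
}
```

Timing a single cold call, as in the original test, mostly measures interpretation and compilation overhead rather than steady-state throughput.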
Thank you for your reply. After running the computation in a loop, EJML still appears to be about 5 times slower than PyTorch in my tests. Would you be able to confirm I am using it properly, please?
Source code (I did a few permutations to see what was going on), Java and Python versions:
Running concurrent code makes a big difference here: about a 7x speed-up. There's a switch in SimpleMatrix which turns on threads that isn't being triggered. Could probably be improved...
These matrices are also large enough that the better SIMD optimization in the C/C++ code Python wraps is kicking in. I'm willing to bet that if you made the matrices even bigger, the speed difference would increase. The new Vector API in Java should help close the gap.
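To see why a threaded code path matters for matrices of this shape, here is a hand-rolled sketch that parallelizes the row loop of a plain multiply using parallel streams. This illustrates the idea only; it is not EJML's internal concurrent implementation, and the class and method names are made up for the example:

```java
import java.util.stream.IntStream;

public class ParallelMult {
    // Multiply a (n x k) by b (k x m). When parallel is true, rows of the
    // result are computed on the common fork-join pool -- the same kind of
    // row splitting a threaded matrix multiply performs.
    static double[][] mult(double[][] a, double[][] b, boolean parallel) {
        int n = a.length, k = b.length, m = b[0].length;
        double[][] c = new double[n][m];
        IntStream rows = IntStream.range(0, n);
        if (parallel) rows = rows.parallel();
        rows.forEach(i -> {
            for (int p = 0; p < k; p++) {
                double aip = a[i][p];
                for (int j = 0; j < m; j++) c[i][j] += aip * b[p][j];
            }
        });
        return c;
    }
}
```

Each row of the result is independent, so no synchronization is needed. Whether threading wins depends on the matrix shape, which is exactly why a size-based switch for enabling threads matters.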
Changed the title to get people to actually read this thread. A new stable version has been released, and here are its benchmark results using SimpleMatrix. The logic for switching to concurrency has been improved:
The overhead of SimpleMatrix does slow it down a bit, but most people probably won't care. NumPy still has faster performance.
I've noticed EJML matrix multiplication is more than 20 times slower than PyTorch or NumPy. Below is some test code to reproduce the situation. Am I doing something wrong, and what can I do to achieve the Python results in Java? Thank you!
import java.util.concurrent.ThreadLocalRandom;
import org.ejml.simple.SimpleMatrix;

public static void main(String[] args) {
    var mat1 = fill(new double[15][32]);
    var mat2 = fill(new double[32][25600]);
    var mask2dMat = new SimpleMatrix(mat1);
    var proto2dMat = new SimpleMatrix(mat2);
    var ts = System.currentTimeMillis();
    mask2dMat.mult(proto2dMat); // multiply
    System.out.println(System.currentTimeMillis() - ts); // 20 ms on my PC
}

private static double[][] fill(double[][] fMat) {
    for (double[] row : fMat) {
        for (int i = 0; i < row.length; i++) {
            row[i] = ThreadLocalRandom.current().nextFloat();
        }
    }
    return fMat;
}
vs. the Python (PyTorch) equivalent:
import time
import torch

mat1 = torch.randn(15, 32)
mat2 = torch.randn(32, 25600)
timestamp = int(time.time() * 1000)
mat1 @ mat2  # multiply
print(int(time.time() * 1000) - timestamp)  # 0 ms on same PC
Thank you.