Benchmark results look incorrect? #31

Open
@danieldk

% export OMP_NUM_THREADS=1
% python -m blis.benchmark
Setting up data for gemm. 1000 iters,  nO=384 nI=384 batch_size=2000
Blis gemm...
Total: 11032014.6484375
9.54 seconds
Numpy (openblas) gemm...
Total: 11032015.625
9.50 seconds
Blis einsum ab,cb->ca
Total: 5510590.8203125
9.78 seconds
Numpy (openblas) einsum ab,cb->ca
unset OMP_NUM_THREADS
Total: 5510596.19140625
90.67 seconds

numpy with OpenBLAS and blis are on par for gemm. However, the benchmark calls numpy's einsum without intermediate optimization, which accounts for most of the 90.67 seconds. Enabling it by passing optimize=True:

% python -m blis.benchmark
Setting up data for gemm. 1000 iters,  nO=384 nI=384 batch_size=2000
Blis gemm...
Total: 11032014.6484375
9.62 seconds
Numpy (openblas) gemm...
Total: 11032015.625
9.51 seconds
Blis einsum ab,cb->ca
Total: 5510590.8203125
9.70 seconds
Numpy (openblas) einsum ab,cb->ca
Total: 5510592.28515625
11.43 seconds
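
For illustration, here is a minimal sketch of what the optimize=True change amounts to. The shapes and variable names are made up for the example and this is not the code in blis.benchmark; it only shows that the `ab,cb->ca` contraction is a plain matrix product, which numpy's einsum can hand to BLAS once optimization is enabled:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes only (assumption), not the benchmark's actual data.
a = rng.standard_normal((64, 32)).astype(np.float32)  # indices "ab"
c = rng.standard_normal((48, 32)).astype(np.float32)  # indices "cb"

# Default call: no intermediate optimization.
default = np.einsum("ab,cb->ca", a, c)

# Same contraction with optimization enabled.
optimized = np.einsum("ab,cb->ca", a, c, optimize=True)

# The contraction sums over b, so it equals the matrix product c @ a.T.
as_gemm = c @ a.T

print(default.shape, optimized.shape, as_gemm.shape)
```

All three arrays hold the same values up to floating-point rounding, which is also why the "Total" lines in the runs above differ only in the last digits.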

With optimize=True, numpy's einsum is only slightly slower than blis (11.43 vs. 9.70 seconds). However, I am skeptical of the claim that parallelization does not help in inference. The matrix sizes used in the benchmark are fairly typical for inference (e.g. the standard transformer attention matrices are 768x768). Testing with 4 threads (fairly modest on current multi-core SMT CPUs):

% export OMP_NUM_THREADS=4
% python -m blis.benchmark
Setting up data for gemm. 1000 iters,  nO=384 nI=384 batch_size=2000
Blis gemm...
Total: 11032014.6484375
9.77 seconds
Numpy (openblas) gemm...
Total: 11032015.625
3.40 seconds
Blis einsum ab,cb->ca
Total: 5510590.8203125
9.83 seconds
Numpy (openblas) einsum ab,cb->ca
Total: 5510592.28515625
4.53 seconds

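With 4 threads, numpy with OpenBLAS is nearly 3x faster than blis on gemm (3.40 vs. 9.77 seconds), while the blis timings do not change. Maybe it's worthwhile compiling blis with multi-threading support?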

For reference:

% lscpu | grep name:
Model name:          Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz
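
To reproduce the thread-count comparison without blis.benchmark, a rough sketch along these lines should work. It assumes numpy is linked against an OpenBLAS build that honors OMP_NUM_THREADS; the helper name `time_gemm`, the operand layout, and the iteration count are hypothetical and only loosely mirror the nO/nI/batch_size values printed above:

```python
import os

# The thread count has to be set before numpy (and its BLAS) is first loaded.
# setdefault keeps any value already exported in the shell.
os.environ.setdefault("OMP_NUM_THREADS", "4")

import time

import numpy as np


def time_gemm(n_out=384, n_in=384, batch_size=2000, iters=10):
    """Time `iters` float32 matrix products of shape (batch, nI) @ (nI, nO)."""
    rng = np.random.default_rng(0)
    x = rng.standard_normal((batch_size, n_in)).astype(np.float32)
    w = rng.standard_normal((n_in, n_out)).astype(np.float32)
    start = time.perf_counter()
    for _ in range(iters):
        y = x @ w
    elapsed = time.perf_counter() - start
    return elapsed, y.shape


elapsed, shape = time_gemm()
print(f"OMP_NUM_THREADS={os.environ['OMP_NUM_THREADS']}: "
      f"{elapsed:.3f} seconds, output shape {shape}")
```

Running it once with `OMP_NUM_THREADS=1` and once with `OMP_NUM_THREADS=4` exported in the shell gives a comparison of the same kind as the gemm rows above, though absolute timings will depend on the machine and the BLAS build.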