feat: add `blas/base/sgemm` #2742

aman-095 · 2024-08-05T07:32:01Z

Progresses #2039.

Description

What is the purpose of this pull request?

This pull request proposes adding a routine to perform one of the matrix-matrix operations C = α*op(A)*op(B) + β*C, where op(X) is either op(X) = X or op(X) = X^T, α and β are scalars, and A, B, and C are matrices, with op(A) an M-by-K matrix, op(B) a K-by-N matrix, and C an M-by-N matrix, as defined for BLAS Level 3 routines. Specifically, it proposes adding @stdlib/blas/base/sgemm.
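For readers who want to see the operation spelled out, here is a minimal sketch of a naive triple loop over strided arrays. The function name `naiveGemm` and its argument order are hypothetical and chosen only for illustration; this is not the implementation in this PR, and it skips blocking and any special-casing of α or β.

```javascript
'use strict';

// Hypothetical reference helper (not the stdlib API): computes
// C = alpha*op(A)*op(B) + beta*C using explicit strides and offsets.
function naiveGemm( transA, transB, M, N, K, alpha, A, sa1, sa2, oa, B, sb1, sb2, ob, beta, C, sc1, sc2, oc ) {
	var ta = ( transA === 'transpose' );
	var tb = ( transB === 'transpose' );
	var sum;
	var ia;
	var ib;
	var ic;
	var i;
	var j;
	var k;
	for ( i = 0; i < M; i++ ) {
		for ( j = 0; j < N; j++ ) {
			sum = 0.0;
			for ( k = 0; k < K; k++ ) {
				// op(A)[i,k] is A[i,k] without a transpose and A[k,i] with one:
				ia = ( ta ) ? oa + ( k*sa1 ) + ( i*sa2 ) : oa + ( i*sa1 ) + ( k*sa2 );

				// op(B)[k,j] is B[k,j] without a transpose and B[j,k] with one:
				ib = ( tb ) ? ob + ( j*sb1 ) + ( k*sb2 ) : ob + ( k*sb1 ) + ( j*sb2 );

				sum += A[ ia ] * B[ ib ];
			}
			ic = oc + ( i*sc1 ) + ( j*sc2 );
			C[ ic ] = ( alpha*sum ) + ( beta*C[ ic ] );
		}
	}
	return C;
}

// A is 2x3, B is 3x4, C is 2x4, all stored column-major:
var A = new Float32Array( [ 1.0, 4.0, 2.0, 5.0, 3.0, 6.0 ] );
var B = new Float32Array( 12 ).fill( 1.0 );
var C = new Float32Array( [ 1.0, 5.0, 2.0, 6.0, 3.0, 7.0, 4.0, 8.0 ] );

naiveGemm( 'no-transpose', 'no-transpose', 2, 4, 3, 1.0, A, 1, 2, 0, B, 1, 3, 0, 1.0, C, 1, 2, 0 );
console.log( Array.from( C ) ); // [ 7, 20, 8, 21, 9, 22, 10, 23 ]
```

Storing the result in a `Float32Array` rounds each output element to single precision on assignment, which is why the sketch does not round explicitly.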

Related Issues

Does this pull request have any related issues?

This pull request:

progresses [RFC]: Add BLAS bindings and implementations for linear algebra (tracking issue) #2039.

Questions

Any questions for reviewers of this pull request?

No.

Other

Any other information relevant to this pull request? This may include screenshots, references, and/or implementation notes.

No.

Checklist

Please ensure the following tasks are completed before submitting this pull request.

Read, understood, and followed the contributing guidelines.

@stdlib-js/reviewers


kgryte · 2024-08-10T07:32:58Z

/stdlib update-copyright-years

kgryte · 2024-08-12T09:07:41Z

@aman-095 To reduce the risk of benchmark workflow timeout, let's reduce the max power in the benchmark files to 5, rather than 6.

aman-095 · 2024-08-13T06:05:45Z

@kgryte I tried reducing the max power to 5, but it still takes a lot of time. Can we reduce it further?

kgryte · 2024-08-13T07:18:52Z

@aman-095 Yeah, reducing to 4 should be fine. We can increase again for dgemm. From testing locally, the repeated calls to f32() slow things down a bit. Usually doesn't matter too much, but for gemm it does due to the sheer number of repeated calls.
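To illustrate why the rounding calls add up, here is a rough sketch that assumes `f32()` behaves like `Math.fround` (an assumption; the helper actually used in the PR may differ). The `sdotEmulated` function and the `calls` counter are hypothetical names used only to count rounding steps.

```javascript
'use strict';

// Counter for the number of single-precision rounding calls:
var calls = 0;

// Assumption: f32() rounds a double to the nearest single-precision value, like Math.fround.
function f32( x ) {
	calls += 1;
	return Math.fround( x );
}

// Hypothetical emulated single-precision dot product: two rounding calls per element.
function sdotEmulated( N, x, y ) {
	var acc = 0.0;
	var i;
	for ( i = 0; i < N; i++ ) {
		acc = f32( acc + f32( x[ i ] * y[ i ] ) );
	}
	return acc;
}

var out = sdotEmulated( 2, [ 1.5, 2.5 ], [ 2.0, 4.0 ] );
console.log( out );   // 13
console.log( calls ); // 4
```

Under this scheme, a product with dimensions M, N, and K performs M*N dot products of length K, so roughly 2*M*N*K rounding calls, which grows cubically for square matrices.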

kgryte · 2024-08-14T01:32:30Z

@aman-095 Looking at the test fixtures, it is not clear why the strides are changing when parameterizing whether a transpose should be performed. E.g., for ca_cb_cc_nta_tb.json, B is a 3x4 column-major matrix. In which case, the strides should be [1,4], as they are in the fixture.

{
  "transA": "no-transpose",
  "transB": "transpose",
  "M": 2,
  "N": 4,
  "K": 3,
  "alpha": 1.0,
  "A": [ 1.0, 4.0, 2.0, 5.0, 3.0, 6.0 ],
  "strideA1": 1,
  "strideA2": 2,
  "offsetA": 0,
  "B": [ 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0 ],
  "strideB1": 1,
  "strideB2": 4,
  "offsetB": 0,
  "beta": 1.0,
  "C": [ 1.0, 5.0, 2.0, 6.0, 3.0, 7.0, 4.0, 8.0 ],
  "strideC1": 1,
  "strideC2": 2,
  "offsetC": 0,
  "C_out": [ 7.0, 20.0, 8.0, 21.0, 9.0, 22.0, 10.0, 23.0 ]
}

However, for the no-transpose fixture ca_cb_cc_nta_ntb.json, you have

{
  "transA": "no-transpose",
  "transB": "no-transpose",
  "M": 2,
  "N": 4,
  "K": 3,
  "alpha": 1.0,
  "A": [ 1.0, 4.0, 2.0, 5.0, 3.0, 6.0 ],
  "strideA1": 1,
  "strideA2": 2,
  "offsetA": 0,
  "B": [ 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0 ],
  "strideB1": 1,
  "strideB2": 3,
  "offsetB": 0,
  "beta": 1.0,
  "C": [ 1.0, 5.0, 2.0, 6.0, 3.0, 7.0, 4.0, 8.0 ],
  "strideC1": 1,
  "strideC2": 2,
  "offsetC": 0,
  "C_out": [ 7.0, 20.0, 8.0, 21.0, 9.0, 22.0, 10.0, 23.0 ]
}

with the strides for B being [1,3]. This doesn't appear correct. The strides should be for B, not op(B), as the transA and transB state what should happen inside the implementation, not how B is provided.

I believe this needs to be addressed across the various test fixtures. For the benchmarks, we handle this correctly.

aman-095 · 2024-08-14T05:37:50Z

@kgryte We use N-by-N matrices for the benchmarks, so this doesn't come up there. However, the standard LAPACK implementation states that M, N, and K are dimensions based on op(X).

aman-095 · 2024-08-14T05:45:54Z

In the test suites I have used matrices:

A =    [1, 2, 3]
       [4, 5, 6]
       
B =    [1, 1, 1, 1]
       [1, 1, 1, 1]
       [1, 1, 1, 1]

C =    [1, 2, 3, 4]
       [5, 6, 7, 8]

Now, depending on the operation, say transB = transpose, the operation is α*A*B^T + β*C, where A is 2x3. Here B^T must have dimensions 3x4, so the B I pass as input becomes the 4x3 transpose of the matrix above, and the strides then follow from that stored shape and from whether the layout is 'row-major' or 'column-major'.
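The reasoning above can be sketched in a few lines. The helper names `storedShape` and `stridesFor` are hypothetical and only illustrate how the stored shape of B, and therefore its strides, follow from the dimensions of op(B) and the layout, assuming contiguous storage.

```javascript
'use strict';

// Hypothetical helper: given the dimensions of op(X), return the shape of X as stored.
function storedShape( trans, rows, cols ) {
	return ( trans === 'transpose' ) ? [ cols, rows ] : [ rows, cols ];
}

// Hypothetical helper: strides of a contiguous matrix with the given shape and layout.
function stridesFor( shape, order ) {
	return ( order === 'column-major' ) ? [ 1, shape[ 0 ] ] : [ shape[ 1 ], 1 ];
}

// op(B) is K-by-N = 3x4 in the fixtures discussed in this thread:
var K = 3;
var N = 4;

var sT = stridesFor( storedShape( 'transpose', K, N ), 'column-major' );
var sN = stridesFor( storedShape( 'no-transpose', K, N ), 'column-major' );

console.log( sT ); // [ 1, 4 ] (B is stored as 4x3)
console.log( sN ); // [ 1, 3 ] (B is stored as 3x4)
```

These values agree with the `strideB1` and `strideB2` entries in the two fixtures quoted earlier in the conversation.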

kgryte · 2024-08-14T06:01:53Z

@aman-095 You're right. Thanks for correcting me.

kgryte

LGTM. Thanks, @aman-095!

Pranavchiku · 2024-08-15T07:13:59Z

+9000 lines of code, with review! 🙇‍♂️🚀

PR-URL: stdlib-js#2742
Ref: stdlib-js#2039
Co-authored-by: Athan Reines <[email protected]>
Reviewed-by: Athan Reines <[email protected]>
Co-authored-by: stdlib-bot <[email protected]>

feat: add BLAS Level 3 routine for sgemm

8ad06b6

stdlib-bot added the BLAS label (issue or pull request related to Basic Linear Algebra Subprograms (BLAS)) Aug 5, 2024

aman-095 marked this pull request as draft August 5, 2024 07:32

kgryte added the Feature label (issue or pull request for adding a new feature) Aug 7, 2024

kgryte and others added 10 commits August 7, 2024 02:47

docs: update description

ce70f16

temp: refactor implementation

07e4a0e

refactor: replace loop with BLAS sdot

bd3faff

refactor: add loop tiling algorithm

0b644bf

fix: update transpose literal

6a8041f

docs: update example

bab523b

Merge branch 'develop' of https://github.com/stdlib-js/stdlib into pr…

fa068e6

…/aman-095/2742

bench: add benchmark for native and ndarray implementation

50ff368

fix: address buffer overflow bug

0c9204b

Merge branch 'sgemm' of https://github.com/aman-095/stdlib into pr/am…

1c53ffc

…an-095/2742

stdlib-bot and others added 3 commits August 10, 2024 07:33

chore: update copyright years

4318626

Merge branch 'stdlib-js:develop' into sgemm

e67b0b4

docs: add types and repl

470d540

test: add test cases for native and ndarray implementation

23986e6

aman-095 added 2 commits August 13, 2024 11:52

bench: reduce max power to 5

44afbdc

docs: add README, test.js and update examples

ab05c1a

aman-095 marked this pull request as ready for review August 13, 2024 06:38

kgryte added the Needs Review label (a pull request which needs code review) Aug 13, 2024

kgryte added 4 commits August 13, 2024 14:07

bench: add missing facets and reduce max power

fbdabab

docs: fix wrapping and visually group related arguments

9d5236b

docs: add todos

9d02719

docs: fix descriptions

c57d21a

docs: fix examples and style

0dbc0be

kgryte added the Needs Changes label (pull request which needs changes before being merged) and removed the Needs Review label (a pull request which needs code review) Aug 14, 2024

kgryte added 7 commits August 14, 2024 21:36

test: simplify test descriptions

09e0cb9

test: simplify test descriptions

a84ddd3

test: simplify test descriptions

e47811c

refactor: reduce code duplication and update descriptions

5b77a41

test: add missing tests

f79d1b1

test: add missing tests

d185ae7

test: add tests to verify blocked iteration behavior

f93fe2a

kgryte removed the Needs Changes label (pull request which needs changes before being merged) Aug 15, 2024

kgryte approved these changes Aug 15, 2024

kgryte mentioned this pull request Aug 15, 2024

feat: add blas/base/dgemm #2541

Merged

10 tasks

kgryte merged commit ab0faa5 into stdlib-js:develop Aug 15, 2024
11 checks passed
