feat: add blas/base/sgemm #2742

Merged Aug 15, 2024 (29 commits)

@aman-095 (Member) commented Aug 5, 2024

Progresses #2039.

Description

What is the purpose of this pull request?

This RFC proposes adding a routine that performs one of the matrix-matrix operations C = α*op(A)*op(B) + β*C, where op(X) is either op(X) = X or op(X) = X^T, α and β are scalars, and A, B, and C are matrices, with op(A) an M-by-K matrix, op(B) a K-by-N matrix, and C an M-by-N matrix, as defined for BLAS Level 3 routines. Specifically, this pull request proposes adding @stdlib/blas/base/sgemm.
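To make the operation concrete, here is a minimal JavaScript sketch of a naive triple-loop sgemm over strided arrays. It is not the stdlib implementation: the name sgemmSketch and its argument order are hypothetical, modeled on the test-fixture fields (strideA1, strideA2, offsetA, and so on), and Math.fround is used to emulate single-precision rounding after each operation.

```javascript
// Naive sketch of C = alpha*op(A)*op(B) + beta*C on strided Float32Arrays.
// Hypothetical helper, not the stdlib API. Element (i,j) of a matrix X is
// read at offsetX + i*strideX1 + j*strideX2, where the strides describe X as
// stored (not op(X)). Any trans value other than 'no-transpose' is treated
// as a transpose, which is all that matters for real-valued data.
function sgemmSketch( transA, transB, M, N, K, alpha, A, strideA1, strideA2, offsetA, B, strideB1, strideB2, offsetB, beta, C, strideC1, strideC2, offsetC ) {
	var sum;
	var ic;
	var a;
	var b;
	var i;
	var j;
	var k;
	for ( i = 0; i < M; i++ ) {
		for ( j = 0; j < N; j++ ) {
			sum = 0.0;
			for ( k = 0; k < K; k++ ) {
				// op(A)[i][k]: a transposed A is stored K-by-M, so the index roles swap
				if ( transA === 'no-transpose' ) {
					a = A[ offsetA + ( i*strideA1 ) + ( k*strideA2 ) ];
				} else {
					a = A[ offsetA + ( k*strideA1 ) + ( i*strideA2 ) ];
				}
				// op(B)[k][j]: a transposed B is stored N-by-K
				if ( transB === 'no-transpose' ) {
					b = B[ offsetB + ( k*strideB1 ) + ( j*strideB2 ) ];
				} else {
					b = B[ offsetB + ( j*strideB1 ) + ( k*strideB2 ) ];
				}
				// Math.fround emulates float32 rounding of each product and partial sum
				sum = Math.fround( sum + Math.fround( a*b ) );
			}
			ic = offsetC + ( i*strideC1 ) + ( j*strideC2 );
			if ( beta === 0.0 ) {
				// As in the reference BLAS, C need not be set on input when beta is zero
				C[ ic ] = Math.fround( alpha*sum );
			} else {
				C[ ic ] = Math.fround( Math.fround( alpha*sum ) + Math.fround( beta*C[ ic ] ) );
			}
		}
	}
	return C;
}

// Data from ca_cb_cc_nta_tb.json: A is 2x3, B is stored 4x3 (transB = 'transpose'), C is 2x4.
var A = new Float32Array( [ 1.0, 4.0, 2.0, 5.0, 3.0, 6.0 ] );
var B = new Float32Array( 12 ).fill( 1.0 );
var C = new Float32Array( [ 1.0, 5.0, 2.0, 6.0, 3.0, 7.0, 4.0, 8.0 ] );

sgemmSketch( 'no-transpose', 'transpose', 2, 4, 3, 1.0, A, 1, 2, 0, B, 1, 4, 0, 1.0, C, 1, 2, 0 );
console.log( Array.from( C ) );
// => [ 7, 20, 8, 21, 9, 22, 10, 23 ] (matches C_out in the fixture)
```

Transposition is handled purely by choosing which stride multiplies which index, so no data is copied. The beta = 0 branch means any existing values in C, including NaN, are ignored rather than propagated.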

Related Issues

Does this pull request have any related issues?

This pull request:

Questions

Any questions for reviewers of this pull request?

No.

Other

Any other information relevant to this pull request? This may include screenshots, references, and/or implementation notes.

No.

Checklist

Please ensure the following tasks are completed before submitting this pull request.


@stdlib-js/reviewers

@stdlib-bot added the "BLAS" label Aug 5, 2024
@aman-095 marked this pull request as draft August 5, 2024 07:32
@kgryte added the "Feature" label Aug 7, 2024
@kgryte (Member) commented Aug 10, 2024

/stdlib update-copyright-years

@kgryte (Member) commented Aug 12, 2024

@aman-095 To reduce the risk of benchmark workflow timeout, let's reduce the max power in the benchmark files to 5, rather than 6.

@aman-095 (Member Author) commented:

@kgryte I tried reducing the max power to 5, but it still takes a lot of time. Can we reduce it further?

@aman-095 marked this pull request as ready for review August 13, 2024 06:38
@kgryte (Member) commented Aug 13, 2024

@aman-095 Yeah, reducing to 4 should be fine. We can increase it again for dgemm. From testing locally, the repeated calls to f32() slow things down a bit. That usually doesn't matter much, but for gemm it does, due to the sheer number of calls.
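The cost of repeated f32() calls grows with the cube of the matrix size. A rough sketch of that arithmetic, under the hypothetical assumption that a naive kernel rounds once per multiply and once per add in its inner loop (the actual stdlib kernel may round at different points), with Math.fround standing in for f32():

```javascript
// Hypothetical cost model: a naive N-by-N-by-N single-precision product
// emulated in float64, rounding after each multiply and after each add in
// the inner loop. Math.fround stands in for a float32 rounding helper.
function roundingCalls( N ) {
	var calls = 0;
	var sum;
	var i;
	var j;
	var k;
	for ( i = 0; i < N; i++ ) {
		for ( j = 0; j < N; j++ ) {
			sum = 0.0;
			for ( k = 0; k < N; k++ ) {
				sum = Math.fround( sum + Math.fround( 0.5*0.5 ) );
				calls += 2; // one rounding for the product, one for the sum
			}
		}
	}
	return calls;
}

console.log( roundingCalls( 10 ) );  // 2000
console.log( roundingCalls( 100 ) ); // 2000000
```

A tenfold increase in N multiplies the count by 1000, whereas Level 1 routines such as a dot product do work linear in the vector length, so per-operation rounding overhead weighs far more heavily on gemm benchmarks.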

@kgryte added the "Needs Review" label Aug 13, 2024
@kgryte (Member) commented Aug 14, 2024

@aman-095 Looking at the test fixtures, it is not clear why the strides change when parameterizing whether a transpose should be performed. E.g., for ca_cb_cc_nta_tb.json, B is a 3x4 column-major matrix, in which case the strides should be [1,4], as they are in the fixture.

{
  "transA": "no-transpose",
  "transB": "transpose",
  "M": 2,
  "N": 4,
  "K": 3,
  "alpha": 1.0,
  "A": [ 1.0, 4.0, 2.0, 5.0, 3.0, 6.0 ],
  "strideA1": 1,
  "strideA2": 2,
  "offsetA": 0,
  "B": [ 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0 ],
  "strideB1": 1,
  "strideB2": 4,
  "offsetB": 0,
  "beta": 1.0,
  "C": [ 1.0, 5.0, 2.0, 6.0, 3.0, 7.0, 4.0, 8.0 ],
  "strideC1": 1,
  "strideC2": 2,
  "offsetC": 0,
  "C_out": [ 7.0, 20.0, 8.0, 21.0, 9.0, 22.0, 10.0, 23.0 ]
}
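Reading the fixture arrays back into nested rows makes the stride convention visible. A minimal sketch, assuming element (i,j) sits at offset + i*stride1 + j*stride2 (the helper names toNested and lastIndex are hypothetical); it also checks which shapes the 12-element B array can hold under strides [1,4]:

```javascript
// Element (i,j) of a strided matrix lives at offset + i*stride1 + j*stride2.
function toNested( x, nrows, ncols, stride1, stride2, offset ) {
	var out = [];
	var row;
	var i;
	var j;
	for ( i = 0; i < nrows; i++ ) {
		row = [];
		for ( j = 0; j < ncols; j++ ) {
			row.push( x[ offset + ( i*stride1 ) + ( j*stride2 ) ] );
		}
		out.push( row );
	}
	return out;
}

// Largest index touched by an nrows-by-ncols view with non-negative strides.
function lastIndex( nrows, ncols, stride1, stride2, offset ) {
	return offset + ( ( nrows-1 )*stride1 ) + ( ( ncols-1 )*stride2 );
}

// Arrays copied from ca_cb_cc_nta_tb.json (B is twelve ones).
var A = [ 1.0, 4.0, 2.0, 5.0, 3.0, 6.0 ];
var C = [ 1.0, 5.0, 2.0, 6.0, 3.0, 7.0, 4.0, 8.0 ];
var Cout = [ 7.0, 20.0, 8.0, 21.0, 9.0, 22.0, 10.0, 23.0 ];

console.log( toNested( A, 2, 3, 1, 2, 0 ) );    // [ [ 1, 2, 3 ], [ 4, 5, 6 ] ]
console.log( toNested( C, 2, 4, 1, 2, 0 ) );    // [ [ 1, 2, 3, 4 ], [ 5, 6, 7, 8 ] ]
console.log( toNested( Cout, 2, 4, 1, 2, 0 ) ); // [ [ 7, 8, 9, 10 ], [ 20, 21, 22, 23 ] ]

// Strides [1,4] applied to the 12-element B:
console.log( lastIndex( 4, 3, 1, 4, 0 ) ); // 11 (a 4x3 view fits)
console.log( lastIndex( 3, 4, 1, 4, 0 ) ); // 14 (a 3x4 view would run past the end)
```

Under strides [1,4], only the 4x3 (N-by-K) reading of B stays within the array, which is the shape B has when transB is "transpose".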

However, for the no-transpose fixture ca_cb_cc_nta_ntb.json, you have

{
  "transA": "no-transpose",
  "transB": "no-transpose",
  "M": 2,
  "N": 4,
  "K": 3,
  "alpha": 1.0,
  "A": [ 1.0, 4.0, 2.0, 5.0, 3.0, 6.0 ],
  "strideA1": 1,
  "strideA2": 2,
  "offsetA": 0,
  "B": [ 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0 ],
  "strideB1": 1,
  "strideB2": 3,
  "offsetB": 0,
  "beta": 1.0,
  "C": [ 1.0, 5.0, 2.0, 6.0, 3.0, 7.0, 4.0, 8.0 ],
  "strideC1": 1,
  "strideC2": 2,
  "offsetC": 0, 
  "C_out": [ 7.0, 20.0, 8.0, 21.0, 9.0, 22.0, 10.0, 23.0 ]
}

with the strides for B being [1,3]. This doesn't appear correct. The strides should be for B, not op(B), as the transA and transB state what should happen inside the implementation, not how B is provided.

I believe this needs to be addressed across the various test fixtures. For the benchmarks, we handle this correctly.
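The relationship between the two fixtures can be checked by building op(B) from a stored B and its own strides. A minimal sketch with a hypothetical opB helper and illustrative data (not taken from the test suite) whose entries are distinct, so that any transposition is visible:

```javascript
// Build op(B) as a nested K-by-N array from the stored B and its strides.
// For 'transpose', the stored matrix is N-by-K, so the index roles swap.
function opB( transB, K, N, B, strideB1, strideB2, offsetB ) {
	var out = [];
	var row;
	var k;
	var j;
	for ( k = 0; k < K; k++ ) {
		row = [];
		for ( j = 0; j < N; j++ ) {
			if ( transB === 'no-transpose' ) {
				row.push( B[ offsetB + ( k*strideB1 ) + ( j*strideB2 ) ] );
			} else {
				row.push( B[ offsetB + ( j*strideB1 ) + ( k*strideB2 ) ] );
			}
		}
		out.push( row );
	}
	return out;
}

// Illustrative 3x4 matrix [ [ 1, 2, 3, 4 ], [ 5, 6, 7, 8 ], [ 9, 10, 11, 12 ] ]
var B34 = [ 1, 5, 9, 2, 6, 10, 3, 7, 11, 4, 8, 12 ]; // stored 3x4, column-major, strides [1,3]
var B43 = [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 ]; // its 4x3 transpose, column-major, strides [1,4]

var fromNoTranspose = opB( 'no-transpose', 3, 4, B34, 1, 3, 0 );
var fromTranspose = opB( 'transpose', 3, 4, B43, 1, 4, 0 );
console.log( fromNoTranspose );
console.log( fromTranspose );
```

Both calls yield the same op(B), [[1,2,3,4],[5,6,7,8],[9,10,11,12]]. In each case the strides describe the array as passed in, and transB only decides which index each stride multiplies; the stride values differ because the stored shapes differ.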

@kgryte kgryte added Needs Changes Pull request which needs changes before being merged. and removed Needs Review A pull request which needs code review. labels Aug 14, 2024
@aman-095 (Member Author) commented Aug 14, 2024

@kgryte We use N-by-N matrices for the benchmarks, so this doesn't affect them. But in the reference BLAS implementation (distributed with LAPACK), M, N, and K are defined as the dimensions of op(A) and op(B), not of A and B as stored.
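Under that convention, the shape of each array as passed in depends on the trans flags. A minimal sketch (storedShapes is a hypothetical helper) following the reference BLAS rule that A is M-by-K when not transposed and K-by-M otherwise, and likewise B is K-by-N or N-by-K:

```javascript
// Shapes of the arrays as stored, given the trans flags and the op()-based
// dimensions M, N, K. For real data, any value other than 'no-transpose'
// is treated as a transpose.
function storedShapes( transA, transB, M, N, K ) {
	return {
		'A': ( transA === 'no-transpose' ) ? [ M, K ] : [ K, M ],
		'B': ( transB === 'no-transpose' ) ? [ K, N ] : [ N, K ],
		'C': [ M, N ]
	};
}

var shapes = storedShapes( 'no-transpose', 'transpose', 2, 4, 3 );
console.log( shapes );
// { A: [ 2, 3 ], B: [ 4, 3 ], C: [ 2, 4 ] }
```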

@aman-095 (Member Author) commented:

In the test suites, I used the following matrices:

A =    [1, 2, 3]
       [4, 5, 6]
       
B =    [1, 1, 1, 1]
       [1, 1, 1, 1]
       [1, 1, 1, 1]

C =    [1, 2, 3, 4]
       [5, 6, 7, 8]

Now, suppose transB = transpose. The operation becomes α*A*B^T + β*C, where A is 2x3. For the product to be defined, op(B) = B^T must be 3x4, so the B that I pass as input is the transpose of the matrix above, a 4x3 matrix, and its strides then follow from 'row-major' or 'column-major' storage (e.g., [1,4] for column-major).
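Given the stored shape, the strides follow directly from the storage order. A minimal sketch for contiguous matrices (stridesFor is a hypothetical helper) that reproduces the B strides used in the two fixtures:

```javascript
// Strides [stride1, stride2] for a contiguous nrows-by-ncols matrix, so that
// element (i,j) sits at index i*stride1 + j*stride2.
function stridesFor( nrows, ncols, order ) {
	if ( order === 'column-major' ) {
		return [ 1, nrows ]; // each column is contiguous
	}
	return [ ncols, 1 ]; // row-major: each row is contiguous
}

console.log( stridesFor( 4, 3, 'column-major' ) ); // [ 1, 4 ] (B passed when transB is 'transpose')
console.log( stridesFor( 3, 4, 'column-major' ) ); // [ 1, 3 ] (B passed when transB is 'no-transpose')
console.log( stridesFor( 4, 3, 'row-major' ) );    // [ 3, 1 ]
```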

@kgryte (Member) commented Aug 14, 2024

@aman-095 You're right. Thanks for correcting me.

@kgryte removed the "Needs Changes" label Aug 15, 2024
@kgryte (Member) left a comment

LGTM. Thanks, @aman-095!

@kgryte mentioned this pull request Aug 15, 2024
@Pranavchiku (Member) commented:

+9000 lines of code, with review! 🙇‍♂️🚀

@kgryte merged commit ab0faa5 into stdlib-js:develop Aug 15, 2024 (11 checks passed)
gunjjoshi pushed a commit to gunjjoshi/stdlib that referenced this pull request Aug 21, 2024
PR-URL: stdlib-js#2742
Ref: stdlib-js#2039
Co-authored-by: Athan Reines <[email protected]>
Reviewed-by: Athan Reines <[email protected]> 
Co-authored-by: stdlib-bot <[email protected]>
Labels: BLAS (Issue or pull request related to Basic Linear Algebra Subprograms (BLAS)); Feature (Issue or pull request for adding a new feature)
4 participants