
Argmax(cpp wrapper) #784


Merged: 7 commits into FlagOpen:master on Jul 16, 2025

Conversation

AdvancedCompiler (Contributor)

PR Category

Operator

Type of Change

Refactor

Description

C++ wrapper for the argmax operator.

Issue

Progress

  • Change is properly reviewed (1 reviewer required, 2 recommended).
  • Change responds to an issue.
  • Change is fully covered by a unit test.

Performance

#include "flag_gems/operators.h"
#include "torch/torch.h"

TEST(reduction_op_test, argmax) {
Collaborator:

It's better to test with larger shapes and with different dtypes.
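For instance, a test along these lines would cover the suggestion; this is a sketch only, and flag_gems::argmax is assumed here as the wrapper's entry point (the actual call is not shown in the thread):

TEST(reduction_op_test, argmax_large_shapes_and_dtypes) {
  const torch::Device device(torch::kCUDA, 0);
  // Larger shape, several dtypes, per the review suggestion.
  for (const auto dtype : {torch::kFloat32, torch::kFloat16, torch::kBFloat16}) {
    torch::Tensor input =
        torch::randn({1024, 1024}, torch::dtype(dtype).device(device));
    torch::Tensor ref = at::argmax(input);
    torch::Tensor res = flag_gems::argmax(input);  // hypothetical entry point
    // Assumes both implementations break ties the same way (first occurrence).
    EXPECT_TRUE(torch::equal(ref, res));
  }
}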


TEST(reduction_op_test, argmax_keepdim_option) {
const torch::Device device(torch::kCUDA, 0);
torch::Tensor input = torch::randn({2, 2, 2, 2}, device);
Collaborator:

ditto, the test shape is too small

lib/argmax.cpp Outdated
Comment on lines 29 to 32
auto shape = self.sizes().vec();
for (auto &s : shape) {
s = 1;
}
Collaborator:
Suggested change:
-  auto shape = self.sizes().vec();
-  for (auto &s : shape) {
-    s = 1;
-  }
+  const auto shape = std::vector<int64_t>(self.dim(), 1);
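The one-liner builds the rank-preserving all-ones shape (e.g. {1, 1, 1, 1} for a 4-d input) in a single expression, presumably the output shape of a full argmax with keepdim semantics, and also lets shape be declared const instead of being copied and then overwritten in a loop.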

lib/argmax.cpp Outdated
c10::DeviceGuard guard(self.device());
c10::cuda::CUDAStream stream = c10::cuda::getCurrentCUDAStream();

f1(stream, mid_size, 1, 1, 4 /*num_warps*/, 2 /*num_stages*/, self, mid_value, mid_index, M, block_size);
Bowen12992 (Collaborator), Jul 15, 2025:

Suggested change:
-  f1(stream, mid_size, 1, 1, 4 /*num_warps*/, 2 /*num_stages*/, self, mid_value, mid_index, M, block_size);
+  f1(stream, mid_size, 1, 1, /* num_warps = */ 4, /* num_stages = */ 2, self, mid_value, mid_index, M, block_size);
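Putting the comment before the value, as in /* num_warps = */ 4, follows the common argument-comment style (the form that clang-tidy's bugprone-argument-comment check recognizes), so each literal is unambiguously tied to its parameter; a trailing /*num_warps*/ can be misread as annotating the next argument instead.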

lib/argmax.cpp Outdated

f1(stream, mid_size, 1, 1, 4 /*num_warps*/, 2 /*num_stages*/, self, mid_value, mid_index, M, block_size);

f2(stream, 1, 1, 1, 4 /*num_warps*/, 2 /*num_stages*/, mid_value, mid_index, out, mid_size, block_mid);
Collaborator:

ditto
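For context, the two launches form a classic two-stage reduction: f1 computes a per-block partial max and its index, and f2 reduces those partials to the final argmax. A minimal sketch of the surrounding setup, assuming the buffer shapes and ceil-div/next-power-of-two sizing typically used by the FlagGems Python version (only the f1/f2 calls themselves appear in this PR):

// Flattened length and launch geometry (sizing here is an assumption).
const int64_t M = self.numel();
const int64_t block_size = 1024;                             // assumed tile width per block
const int64_t mid_size = (M + block_size - 1) / block_size;  // ceil-div: number of partials
const int64_t block_mid = next_power_of_2(mid_size);         // hypothetical helper

// Stage-1 outputs: one (value, index) pair per block.
torch::Tensor mid_value = torch::empty({mid_size}, self.options());
torch::Tensor mid_index = torch::empty({mid_size}, self.options().dtype(torch::kLong));
torch::Tensor out = torch::empty({}, self.options().dtype(torch::kLong));

// Stage 1: a grid of mid_size blocks, each reducing block_size elements.
f1(stream, mid_size, 1, 1, /* num_warps = */ 4, /* num_stages = */ 2,
   self, mid_value, mid_index, M, block_size);
// Stage 2: a single block reduces the mid_size partials to the final index.
f2(stream, 1, 1, 1, /* num_warps = */ 4, /* num_stages = */ 2,
   mid_value, mid_index, out, mid_size, block_mid);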

lib/argmax.cpp Outdated
int64_t dim_val = dim.value();
dim_val = at::maybe_wrap_dim(dim_val, self.dim());

auto shape = self.sizes();
Collaborator:

Suggested change:
-  auto shape = self.sizes();
+  const auto& shape = self.sizes();

Bowen12992 previously approved these changes Jul 15, 2025.

0x45f (Collaborator) left a comment:

LGTM

Bowen12992 merged commit b91901b into FlagOpen:master on Jul 16, 2025.
12 of 14 checks passed