[Feature]: Compute and log the serving FLOPs #3490


Open
zhuohan123 opened this issue Mar 19, 2024 · 12 comments · May be fixed by #19290
Labels
feature request New feature or request good first issue Good for newcomers

Comments

@zhuohan123
Member

🚀 The feature, motivation and pitch

vLLM should output the serving FLOPs. This would be helpful for debugging performance and checking GPU utilization.

Alternatives

No response

Additional context

No response

@zhuohan123 zhuohan123 added good first issue Good for newcomers feature request New feature or request labels Mar 19, 2024
@ayusher
Contributor

ayusher commented Apr 2, 2024

Happy to take this up!

@gardberg
Contributor

gardberg commented Apr 4, 2024

@ayusher Just out of curiosity, how would you go about doing this? Take the simplest case: a single non-sharded model running on one machine. I assume we want to log the actual FLOPs / MACs (multiply-accumulates) that the hardware is doing, and not estimate it from the modules of the model, e.g. via profiling or just theoretically counting per module, since that does not take the kernel implementation of the module into account (different CUDA implementations of attention can require different amounts of FLOPs, right?). What would be the right way of doing this without introducing too much overhead?
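For reference, the "profiling with per-module theoretical counts" approach mentioned above can be sketched with `torch.profiler`'s `with_flops` option, which attaches analytic FLOP counts to matmul/conv ops (so it is an estimate per op, not a hardware measurement). A minimal sketch, not tied to vLLM internals; the toy model below is purely illustrative:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Toy stand-in for a real model; shapes are arbitrary for illustration.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 1024),
)
x = torch.randn(8, 1024)

# with_flops=True asks the profiler to annotate supported ops
# (matmuls, convolutions) with their theoretical FLOP counts.
with profile(activities=[ProfilerActivity.CPU], with_flops=True) as prof:
    model(x)

# Sum the analytic FLOPs over all profiled ops that report one.
total_flops = sum(e.flops for e in prof.key_averages() if e.flops)
print(f"estimated FLOPs for one forward pass: {total_flops}")
```

This answers the "too much overhead" concern only partially: the profiler itself adds overhead, so it would be sampled or enabled on demand rather than run on every request.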

@ayusher
Contributor

ayusher commented Apr 26, 2024

Yeah, after further research this looks like a more involved problem than I initially anticipated. I'm not sure of the best approach, or whether this is worth pursuing right now.

@gardberg
Contributor

Yeah, I agree. I know that some NVIDIA GPU libraries let you get FLOPs, but it does not seem like a trivial problem (not a good first issue, I guess, lol). I'd love to help if we find a manageable way, though!

@rakshithvasudev

Thank you all!

Do the FLOPs need to be accurate, or is a close estimate okay?

As others in this thread have discussed, it's not very straightforward to calculate FLOPs without digging into internals, and doing so may also add overhead during inference. If a close estimate is acceptable, I can try to implement a basic FLOPs calculator.
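A close-estimate calculator like the one proposed here could use the common analytic approximation for decoder-only transformers: roughly 2 FLOPs per parameter per token for the dense matmuls, plus an attention term that grows with context length. A sketch, with a purely illustrative 7B-ish model shape (this is an approximation, not what vLLM implements):

```python
def estimate_forward_flops_per_token(num_params: int,
                                     num_layers: int,
                                     hidden_size: int,
                                     context_len: int) -> int:
    """Rough forward-pass FLOPs for decoding one token.

    Uses the standard approximation: 2 FLOPs per parameter for the
    weight matmuls, plus 2 * layers * context * hidden for the
    attention score/value matmuls against the KV cache.
    """
    dense_flops = 2 * num_params
    attention_flops = 2 * num_layers * context_len * hidden_size
    return dense_flops + attention_flops

# Hypothetical example: a roughly Llama-7B-shaped model attending
# over 2048 cached tokens.
flops = estimate_forward_flops_per_token(
    num_params=7_000_000_000, num_layers=32,
    hidden_size=4096, context_len=2048)
print(f"~{flops / 1e9:.1f} GFLOPs per generated token")
```

Multiplying this per-token figure by tokens served per second would give an estimated serving FLOP/s to compare against the GPU's peak, i.e. a model-FLOPs-utilization-style metric.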


This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

@github-actions github-actions bot added the stale Over 90 days of inactivity label Nov 23, 2024

This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Dec 24, 2024
@simon-mo simon-mo reopened this Dec 27, 2024
@simon-mo simon-mo removed the stale Over 90 days of inactivity label Dec 27, 2024
@krtkvrm

krtkvrm commented Jan 29, 2025

Can I take this up?

@gangula-karthik

Hi @zhuohan123, is this issue still open to work on?

@duhaode520

@zhuohan123 @robertgshaw2-redhat @WoosukKwon Hi, is this issue still open? I'd like to take this if it is still needed.

@plops655

@duhaode520 Do you have any ideas as to how to approach this?

@sysradium sysradium linked a pull request Jun 6, 2025 that will close this issue
@sysradium

I have tried addressing this in my PR. It would be nice if anyone could take a look at it.

10 participants