[Feature]: Compute and log the serving FLOPs #3490
Comments
Happy to take this up!
@ayusher Just out of curiosity, how would you go about doing this? Let's take the simplest case of a single non-sharded model running on one machine. I assume we want to log the actual FLOPs / MACs (multiply-accumulates) that the hardware performs, not an estimate derived from the model's modules (e.g. via profiling or theoretical per-module counting), since that does not account for the kernel implementation of each module (different CUDA implementations of attention can require different amounts of FLOPs, right?). What would be the right way of doing this without introducing too much overhead?
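(For comparison, not an answer to the hardware-counter question: PyTorch's profiler can attach shape-based FLOP estimates to recorded operators via `with_flops=True`. It still counts analytically per operator, so it has exactly the limitation described above, but it is cheap to wire in. A minimal sketch, assuming a CUDA-capable setup:)

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Stand-in workload: a single fp16 linear layer on GPU.
model = torch.nn.Linear(4096, 4096).cuda().half()
x = torch.randn(1, 128, 4096, device="cuda", dtype=torch.half)

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    with_flops=True,  # FLOPs are estimated from operator shapes, not hardware counters
) as prof:
    with torch.no_grad():
        model(x)

# Sum the per-operator estimates; ops without a FLOP model report 0/None.
total_flops = sum(evt.flops for evt in prof.key_averages() if evt.flops)
print(f"estimated FLOPs: {total_flops:.3e}")
```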
Yeah, after further research this looks like a more involved problem than I initially anticipated. I'm not sure of the best approach or whether it's worth pursuing right now.
Yeah, I agree. I know some NVIDIA GPU libraries let you read FLOPs, but it does not seem like a trivial problem (not a good first issue, I guess, lol). I'd love to help if we find a manageable way, though!
Thank you all! Do the FLOPs need to be exact, or is a close estimate acceptable? As others in this thread have discussed, calculating FLOPs precisely isn't straightforward without digging into internals, and doing so may add overhead during inference. If a close estimate is acceptable, I can try to implement a basic FLOPs calculator.
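For a sense of what such an estimate could look like: a common analytic approximation counts only the matmul FLOPs per generated token from the model's shapes. The sketch below assumes a plain decoder-only transformer (no GQA, no gated MLP; norms and rotary embeddings ignored), and the `ModelShape` fields are illustrative, not vLLM config names.

```python
from dataclasses import dataclass

@dataclass
class ModelShape:
    num_layers: int
    hidden_size: int
    intermediate_size: int
    vocab_size: int

def flops_per_token(shape: ModelShape, context_len: int) -> float:
    """Analytic FLOPs for decoding one token (1 MAC = 2 FLOPs), matmuls only."""
    h, ffn = shape.hidden_size, shape.intermediate_size
    per_layer = (
        2 * h * 3 * h                # QKV projections
        + 2 * h * h                  # attention output projection
        + 2 * 2 * context_len * h    # QK^T scores + weighted sum over the KV cache
        + 2 * 2 * h * ffn            # MLP up- and down-projection (ungated)
    )
    return shape.num_layers * per_layer + 2 * h * shape.vocab_size  # + LM head

# Hypothetical 7B-class config, purely for illustration.
llama_like = ModelShape(num_layers=32, hidden_size=4096,
                        intermediate_size=11008, vocab_size=32000)
print(f"{flops_per_token(llama_like, context_len=1024):.3e} FLOPs/token")
```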
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!
Can I take this up?
Hi @zhuohan123, is this issue still open to work on?
@zhuohan123 @robertgshaw2-redhat @WoosukKwon Hi, is this issue still open? I'd like to take this if it is still needed.
@duhaode520 Do you have any ideas as to how to approach this?
I have tried addressing this in my PR. It would be nice if someone could take a look.
🚀 The feature, motivation and pitch
vLLM should output the serving FLOPs. This would be helpful for debugging performance and checking GPU utilization.
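One way logged FLOPs tie back to GPU utilization is model FLOPs utilization (MFU): achieved FLOP/s divided by the hardware's theoretical peak. A minimal sketch (the function name is made up here; 312 TFLOP/s is the A100 dense FP16/BF16 tensor-core peak):

```python
def model_flops_utilization(total_flops: float, elapsed_s: float,
                            num_gpus: int, peak_tflops_per_gpu: float) -> float:
    """MFU: achieved FLOP/s divided by the GPUs' theoretical peak FLOP/s."""
    achieved = total_flops / elapsed_s
    peak = num_gpus * peak_tflops_per_gpu * 1e12
    return achieved / peak

# e.g. 2e14 FLOPs served in 2 s on one A100 -> roughly 32% MFU
print(f"MFU: {model_flops_utilization(2e14, 2.0, 1, 312):.1%}")
```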
Alternatives
No response
Additional context
No response