Support loading transformers models with named parameters #16868
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add a ready label to the PR or enable auto-merge. 🚀
Pull Request Overview
This PR adds support for loading custom transformers models with named parameters by initializing parameters from the meta device onto the execution device.
- Added a new method `init_parameters` to recursively initialize parameters that are on the meta device (a sketch follows this list).
- Adjusted model initialization to call `init_parameters` right after setting up buffers.
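A minimal sketch of what such a method could look like, assuming a `setattr`-based replacement and a `device` argument; the actual implementation lives in `vllm/model_executor/models/transformers.py`, and the names here are illustrative:

```python
import torch
from torch import nn


def init_parameters(module: nn.Module, device: torch.device):
    # Replace any parameter still on the meta device with an
    # uninitialized tensor on the execution device.
    for name, param in list(module.named_parameters(recurse=False)):
        if param.device == torch.device("meta"):
            new_param = nn.Parameter(
                torch.empty_like(param, device=device),
                requires_grad=param.requires_grad,
            )
            setattr(module, name, new_param)
    # Recurse into submodules so nested parameters are covered too.
    for child in module.children():
        init_parameters(child, device)
```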
Copilot AI · Apr 18, 2025
Consider using `module.register_parameter(name, new_param)` instead of `setattr(module, name, new_param)` to ensure the parameter is correctly registered in the module's parameter dictionary.
```diff
-        setattr(module, name, new_param)
+        module.register_parameter(name, new_param)
     for child in module.children():
```
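As an aside (my observation, not from the thread): for `nn.Parameter` values the two spellings are equivalent, because `nn.Module.__setattr__` special-cases parameters and registers them itself. A quick check:

```python
import torch
from torch import nn

m = nn.Linear(2, 2)
# Plain attribute assignment of an nn.Parameter goes through
# nn.Module.__setattr__, which registers it in m._parameters.
setattr(m, "extra", nn.Parameter(torch.zeros(2)))
print("extra" in dict(m.named_parameters()))  # True
```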
If this is necessary, I'd prefer to parameterise `init_buffers` to work with both buffers and parameters instead of duplicating the logic. However, I'd like to find out why `meta_to_empty` missed some parameters that your `init_parameters` didn't.

I guess a parameter initialized on a meta device just doesn't have an associated buffer? I think this simplified repro illustrates what's going on:
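The repro snippet itself didn't survive the page capture; based on the later discussion (a bare `nn.Parameter` attached directly to the model on the meta device, alongside submodules already materialised on a real device), it presumably looked something like this:

```python
import torch
from torch import nn


class Repro(nn.Module):
    def __init__(self):
        super().__init__()
        # A parameter that is a direct child of the model (not inside
        # any submodule), created on the meta device.
        self.my_param = nn.Parameter(torch.empty(8, device="meta"))
        # A submodule already materialised on a real device, standing in
        # for the tensor-parallel linear layers vLLM substitutes in.
        self.linear = nn.Linear(8, 8)


model = Repro()
print(model.my_param.device)                   # meta
print(next(model.linear.parameters()).device)  # cpu
```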
But see `vllm/model_executor/models/transformers.py`, lines 302 to 308 at 8d32dc6.

If I add the following to the end of your repro:
```python
from itertools import chain

device = "cuda"


def meta_to_empty(module: nn.Module):
    tensors = list(chain(module.buffers(), module.parameters()))
    if tensors and all(t.device == torch.device("meta") for t in tensors):
        module.to_empty(device=device)
        return  # We can stop recursing because to_empty is recursive
    for child in module.children():
        meta_to_empty(child)


meta_to_empty(model)
print(model.my_param)
```
So it seems like that's the case. I verified with print statements that all the non-meta parameters in my model corresponded to the `ColumnParallelLinear` and `RowParallelLinear` layers, and in my particular case there actually aren't any named buffers. This illustrates what seems to be happening.
Yeah, it's expected that the TP substitutions create tensors on device. This way the tensor parallel linear layers are instantiated as normal rather than being created on the meta device and potentially initialised incorrectly.
hmellor left a comment:
LGTM!
The original issue was that `meta_to_empty` specifically doesn't work for `nn.Parameter`s that are direct children of the model. This is because they don't appear in `module.children()` (because they're not `nn.Module`s) and therefore never reach `module.to_empty` (which would fail anyway, as `nn.Parameter` does not have this method).
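A quick illustration of that point (my sketch, not from the thread):

```python
import torch
from torch import nn


class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.my_param = nn.Parameter(torch.empty(4, device="meta"))
        self.linear = nn.Linear(4, 4)


model = Tiny()
# children() only yields submodules, so the bare parameter is invisible
# to any recursion written in terms of it:
print(list(model.children()))                                  # [Linear(...)] only
print([n for n, _ in model.named_parameters(recurse=False)])   # ['my_param']
```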
After this PR I will follow up on the following two points:
- Is `meta_to_empty` even necessary anymore?
- We need to better handle buffers that are direct children of the model (right now, `init_buffers` would try and instantiate the whole model on one GPU, which would almost certainly cause OOM since most of the model is already on the GPU); see the sketch after this list.
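For the second point, a minimal example of the case in question, a buffer registered directly on the top-level model (illustrative only; `init_buffers` itself lives in vLLM's Transformers backend):

```python
import torch
from torch import nn


class WithBuffer(nn.Module):
    def __init__(self):
        super().__init__()
        # A buffer that belongs to the top-level module itself,
        # not to any submodule.
        self.register_buffer("freqs", torch.empty(16, device="meta"))
        self.linear = nn.Linear(4, 4)


m = WithBuffer()
# Unlike children(), named_buffers(recurse=False) does see it:
print([name for name, _ in m.named_buffers(recurse=False)])  # ['freqs']
```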
Updated the docstring. At first glance, the CI failure looks unrelated, but I'll defer to you.

I can't think of any real-world models where it is needed, but in theory you could have modules which don't have parameters (some RoPE, norm, and quantization-specific modules could look like this). In all of those cases, though, I think most people would explicitly set the device of any tensors they create to the same device as the input tensor.
The main tests have not run yet; could you please make the DCO check pass, then I'll trigger the CI suite.
Force-pushed from 7a1205a to baaf5dd.
Head branch was pushed to by a user without write access
Signed-off-by: Alex <[email protected]>
Force-pushed from ffa2235 to 6026465.
I messed up a rebase, apologies for spamming everyone.
- …ct#16868) Signed-off-by: Alex <[email protected]>
- …ct#16868) Signed-off-by: Alex <[email protected]> Signed-off-by: Mu Huai <[email protected]>
- …ct#16868) Signed-off-by: Alex <[email protected]> Signed-off-by: Yuqi Zhang <[email protected]>
This PR adds support for loading custom transformers models with named parameters.

This is generally helpful for models which have something of the form sketched below.
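(The original example was cut off in the page capture; based on the thread above, the pattern in question is presumably a parameter attached directly to the model. The class and parameter names here are hypothetical.)

```python
import torch
from torch import nn


class MyCustomModel(nn.Module):  # stands in for a custom HF model
    def __init__(self, hidden_size: int):
        super().__init__()
        # A named parameter attached directly to the model, not wrapped
        # in any submodule:
        self.my_param = nn.Parameter(torch.empty(hidden_size))
```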