Use VLLM_WORKER_MULTIPROC_METHOD=spawn instead of --forked for tests #268
Signed-off-by: Travis Johnson <[email protected]>
lgtm
This PR lets us remove the requirement of `--forked` from our pytest tests.

The hang that is observed without `--forked` is due to the known issue with libgomp (GNU OpenMP) and `fork()` (see this gcc bug report that is a "won't fix"). It is a common problem in Python due to the use of native libraries behind the scenes. If a process is forked after an OpenMP thread pool has been created, the child will not have a thread pool: `fork()` copies only the calling thread, so the pool's worker threads do not exist in the child even though libgomp's inherited state still refers to them, and the code hangs the next time it enters a parallel region.

Where this comes up in our tests is actually because we use `transformers` to compare the generation results. vLLM and PyTorch delay initializing the thread pool until it is needed. When just using vLLM in V1, this does not happen in the frontend process, so it is OK to use `fork()`. However, calling `transformers`' `model.generate` in the main process during the tests initializes the thread pool and causes the next attempt to create a `vllm.LLM` to hang in the forked worker process.

With `spawn`, the new process is created from scratch and creates a new OpenMP thread pool. But there are trade-offs here too, e.g. using `spawn` in offline mode requires particular handling in a script (the entry point must be guarded by `if __name__ == "__main__":`). The vLLM docs have a good summary of the trade-offs of this setting in REF. Because of that, I only set `VLLM_WORKER_MULTIPROC_METHOD=spawn` in the test environment. Running the code with the `vllm` CLI actually defaults to `spawn` anyway (REF).

FIX #146
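The root cause of the hang is plain POSIX behavior: `fork()` duplicates only the thread that calls it. The sketch below illustrates that with ordinary Python threads standing in for an OpenMP pool. It is an analogy, not libgomp itself: Python's `threading` module cleans up its own bookkeeping after a fork, so this version reports the missing threads instead of hanging. `count_threads_in_forked_child` is a hypothetical helper, and the example is POSIX-only because it calls `os.fork()`.

```python
import os
import threading
import warnings


def count_threads_in_forked_child(n_workers: int = 4) -> tuple[int, int]:
    """Start a small pool of worker threads, fork, and count threads on each side.

    Hypothetical helper. Returns (threads_in_parent, threads_in_child).
    POSIX-only (uses os.fork).
    """
    release = threading.Event()
    # Stand-in for a native thread pool: workers that block until released.
    workers = [
        threading.Thread(target=release.wait, daemon=True) for _ in range(n_workers)
    ]
    for t in workers:
        t.start()
    threads_in_parent = threading.active_count()

    read_fd, write_fd = os.pipe()
    with warnings.catch_warnings():
        # Python 3.12+ emits a DeprecationWarning when forking a multi-threaded
        # process; that unsafe situation is exactly what this sketch illustrates.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()

    if pid == 0:
        # Child: fork() copied only the calling thread, so the workers are gone.
        os.close(read_fd)
        os.write(write_fd, str(threading.active_count()).encode())
        os._exit(0)  # skip atexit handlers and buffered-output flushing

    # Parent: read the child's report, reap the child, then release the workers.
    os.close(write_fd)
    threads_in_child = int(os.read(read_fd, 32).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)

    release.set()
    for t in workers:
        t.join()
    return threads_in_parent, threads_in_child


if __name__ == "__main__":
    parent, child = count_threads_in_forked_child()
    print(f"threads in parent: {parent}, threads in child: {child}")
```

In a plain script the parent sees the main thread plus the four workers, while the child sees only one thread. libgomp, as described in this PR, does not notice its pool is gone and instead hangs the next time the forked child enters a parallel region.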
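The PR only says the setting is applied "in the test environment"; the exact mechanism is not shown here. A minimal sketch, assuming a top-level pytest `conftest.py` is used, sets `VLLM_WORKER_MULTIPROC_METHOD` before any test module imports vLLM. The `worker_mp_context` helper is hypothetical and only illustrates how a worker launcher can map the variable onto a `multiprocessing` start method; it is not vLLM's implementation, and its `"fork"` fallback is an assumption.

```python
# conftest.py (hypothetical sketch): make vLLM workers use "spawn" for the
# whole test session instead of relying on pytest's --forked.
import multiprocessing
import os

# pytest imports conftest.py before collecting test modules, so this runs
# before any test imports vllm. setdefault() keeps an explicit CI override.
os.environ.setdefault("VLLM_WORKER_MULTIPROC_METHOD", "spawn")


def worker_mp_context() -> multiprocessing.context.BaseContext:
    """Hypothetical helper: turn the env var into a multiprocessing context.

    Not vLLM's code; the "fork" fallback is an assumption for illustration.
    """
    method = os.environ.get("VLLM_WORKER_MULTIPROC_METHOD", "fork")
    # get_context("spawn") starts children from a fresh interpreter, so no
    # thread-pool state from the test process is inherited.
    return multiprocessing.get_context(method)
```

Because the variable is set only when pytest loads `conftest.py`, offline scripts that create a `vllm.LLM` directly are unaffected and are not forced to adopt the `if __name__ == "__main__":` guard that `spawn` requires.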