A tutorial on pin_memory and non_blocking usage #2983
Conversation
@@ -0,0 +1,429 @@
# -*- coding: utf-8 -*-
"""
A guide on good usage of `non_blocking` and `pin_memory()` in PyTorch
Good point, I was planning on doing that - currently the script is a mere export from a regular ipynb, I should make a second pass and correct all the links etc!
TL;DR
-----
I suggest we remove the TL;DR - the text below can just be the intro abstract right after the title. Can you follow this template: https://github.com/pytorch/tutorials/blob/main/beginner_source/template_tutorial.py
Add the What you will learn, Prerequisites, and Author sections.
Sure!
@svekars Happy to get some more feedback.
Just a couple of very minor editorial nits - looks great otherwise!
# .. _pinned_memory_resources:
#
# If you are dealing with issues with memory copies when using CUDA devices or want to learn more about
# what was discussed in this tutorial, check the following references:
#
# - `CUDA toolkit memory management doc <https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html>`_
# - `CUDA pin-memory note <https://forums.developer.nvidia.com/t/pinned-memory/268474>`_
# - `How to Optimize Data Transfers in CUDA C/C++ <https://developer.nvidia.com/blog/how-optimize-data-transfers-cuda-cc/>`_
# - tensordict :meth:`~tensordict.TensorDict.to` method;
#
Co-authored-by: Svetlana Karslioglu <[email protected]>
Please address the nits, though.
This is a draft PR on the proper usage of `pin_memory` and `non_blocking` when sending data from CPU to GPU (and GPU to CPU).

Description
There is some confusion in the community about the proper use of `pin_memory` and `non_blocking`. To see this, one can simply search GitHub for the `.pin_memory().to(` pattern (about 2k results), even though this combination is always slower than a simple call to `to(device)`. Part of the responsibility lies with the PyTorch docs themselves, which (implicitly) recommend calling `pin_memory` before calling `to` with `non_blocking=True`, and there are several such occurrences on the PyTorch forum too (a minimal timing sketch follows the references below).

Some refs on the topic:
https://discuss.pytorch.org/t/what-is-the-disadvantage-of-using-pin-memory/1702/13
https://discuss.pytorch.org/t/non-blocking-memory-transfer-to-gpu/188941
https://discuss.pytorch.org/t/should-we-set-non-blocking-to-true/38234
https://discuss.pytorch.org/t/how-is-explicit-pin-memory-different-from-just-calling-to-and-let-cuda-handle-it/197422
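To make the comparison concrete, here is a minimal timing sketch (not part of the PR; the tensor shape and benchmark setup are illustrative assumptions) that contrasts the `.pin_memory().to(...)` pattern with a plain `.to(device)` call:

```python
import torch
from torch.utils.benchmark import Timer

# Illustrative sketch only: compare a plain to(device) copy against the
# pin_memory().to(..., non_blocking=True) pattern discussed above.
assert torch.cuda.is_available(), "a CUDA device is required for this comparison"
device = torch.device("cuda")
x = torch.randn(1024, 1024)  # a pageable (non-pinned) CPU tensor

# torch.utils.benchmark.Timer synchronizes CUDA around each measurement,
# so asynchronous copies are fully accounted for.
t_plain = Timer(
    "x.to(device)", globals={"x": x, "device": device}
).blocked_autorange()
t_pin_then_copy = Timer(
    "x.pin_memory().to(device, non_blocking=True)",
    globals={"x": x, "device": device},
).blocked_autorange()

print(f"to(device):                          {t_plain.median * 1e6:.1f} us")
print(f"pin_memory().to(non_blocking=True):  {t_pin_then_copy.median * 1e6:.1f} us")
```

If the claim above holds, the second variant should report the larger number, since the cost of pinning pageable memory is paid up front on every call.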
I use TensorDict to demonstrate how to use pin_memory across threads.
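For readers who don't want to pull in tensordict, here is a rough sketch (my own, not from the PR) of what pinning a batch of tensors from several worker threads before issuing asynchronous copies can look like in plain PyTorch; the thread count and tensor shapes are arbitrary assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

import torch

# Illustrative sketch: pin a batch of CPU tensors from several threads,
# then launch asynchronous host-to-device copies from the pinned buffers.
assert torch.cuda.is_available(), "a CUDA device is required"
device = torch.device("cuda")
cpu_tensors = [torch.randn(1024, 1024) for _ in range(8)]

# pin_memory() is a blocking, CPU-side call, so spreading it across a few
# threads can hide part of its cost when there are many tensors to pin.
with ThreadPoolExecutor(max_workers=4) as pool:
    pinned = list(pool.map(torch.Tensor.pin_memory, cpu_tensors))

# With pinned source buffers, non_blocking=True copies can overlap with host work.
gpu_tensors = [t.to(device, non_blocking=True) for t in pinned]
torch.cuda.synchronize()  # make sure the async copies have completed before use
```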
A follow-up will be to link the PyTorch docs / docstrings of `to` / `pin_memory` etc. back to this tutorial.
The syntax still needs to be fixed and better integrated in the lib. The conclusion and additional resources are missing.
Still, feedback is more than welcome!
The tutorial requires tensordict v0.5, which will be released in the coming days.
cc @shagunsodhani @albanD @janeyx99 @dstaay-fb @mikaylagawarecki @ptrblck