Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A tutorial on pin_memory and non_blocking usage #2983

Merged
merged 56 commits into from
Jul 31, 2024
Merged

Conversation

vmoens
Copy link
Contributor

@vmoens vmoens commented Jul 24, 2024

This is a draft PR on the proper usage of pin_memory and non_blocking when sending data from CPU to GPU (and GPU to CPU).

Description

There is some confusion about the proper use of pin_memory and non_blocking in the community.
To be convinced about it, one can simply look for the .pin_memory.to( pattern on github (about 2k results), when this command is always slower than a simple call to to(device).

The responsibility lies in part in the pytorch doc itself where it is recommended (implicitly) to call pin_memory before calling to with non_blocking=True and several such occurrences on the pytorch forum too.

Some refs on the topic:
https://discuss.pytorch.org/t/what-is-the-disadvantage-of-using-pin-memory/1702/13
https://discuss.pytorch.org/t/non-blocking-memory-transfer-to-gpu/188941
https://discuss.pytorch.org/t/should-we-set-non-blocking-to-true/38234
https://discuss.pytorch.org/t/how-is-explicit-pin-memory-different-from-just-calling-to-and-let-cuda-handle-it/197422

I use TensorDict to demonstrate how to use pin_memory across threads.

A follow-up will be to link back the pytorch doc / docstrings of to / pin_memory etc to this tutorial.

The syntax should be fixed and integrated better in the lib. Conclusion and additional resources are missing.
Still, feedback is more than welcome!

The tutorial requires tensordict v0.5 which will be released in the coming days.

cc @shagunsodhani @albanD @janeyx99 @dstaay-fb @mikaylagawarecki @ptrblck

Copy link

pytorch-bot bot commented Jul 24, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/2983

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 07f9932 with merge base c3882db (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@vmoens vmoens marked this pull request as draft July 24, 2024 15:51
@@ -0,0 +1,429 @@
# -*- coding: utf-8 -*-
"""
A guide on good usage of `non_blocking` and `pin_memory()` in PyTorch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to use .rst syntax:

  • double backticks instead of single ones
  • Links should be Text <link>_ not

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I was planning on doing that - currently the script is a mere export from a regular ipynb, I should make a second pass and correct all the links etc!

Comment on lines 6 to 7
TL;DR
-----
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we remove the TL;DR - the text below can be just the intro abstract right after the title. Can you follow this template: https://github.com/pytorch/tutorials/blob/main/beginner_source/template_tutorial.py
Add What you will learn and Prerequisites and Author

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

@vmoens
Copy link
Contributor Author

vmoens commented Jul 25, 2024

@svekars Happy to get some more feedback.
On my side, I only see these items to be done:

  • edit figures to make them look prettier
  • add additional info (links etc)
  • check that we have the most demonstrative examples (ie play a bit with the setups to highlight the differences better)

@vmoens vmoens marked this pull request as ready for review July 29, 2024 22:23
Copy link
Contributor

@svekars svekars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple very minor editorial nits - looks great, otherwise!

intermediate_source/pinmem_nonblock.py Outdated Show resolved Hide resolved
intermediate_source/pinmem_nonblock.py Outdated Show resolved Hide resolved
intermediate_source/pinmem_nonblock.py Outdated Show resolved Hide resolved
intermediate_source/pinmem_nonblock.py Outdated Show resolved Hide resolved
intermediate_source/pinmem_nonblock.py Outdated Show resolved Hide resolved
intermediate_source/pinmem_nonblock.py Outdated Show resolved Hide resolved
Comment on lines 719 to 728
# .. _pinned_memory_resources:
#
# If you are dealing with issues with memory copies when using CUDA devices or want to learn more about
# what was discussed in this tutorial, check the following references:
#
# - `CUDA toolkit memory management doc <https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html>`_
# - `CUDA pin-memory note <https://forums.developer.nvidia.com/t/pinned-memory/268474>`_
# - `How to Optimize Data Transfers in CUDA C/C++ <https://developer.nvidia.com/blog/how-optimize-data-transfers-cuda-cc/>`_
# - tensordict :meth:`~tensordict.TensorDict.to` method;
#
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thin there is a bit of an issue with rendering this part because of an extra space in all those liens. If you have just one space instead of two, the indentation will be correct.
Screenshot 2024-07-30 at 9 22 20 AM

vmoens and others added 2 commits July 30, 2024 19:28
Copy link
Contributor

@mikaylagawarecki mikaylagawarecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the nits though

intermediate_source/pinmem_nonblock.py Outdated Show resolved Hide resolved
intermediate_source/pinmem_nonblock.py Outdated Show resolved Hide resolved
intermediate_source/pinmem_nonblock.py Outdated Show resolved Hide resolved
intermediate_source/pinmem_nonblock.py Outdated Show resolved Hide resolved
intermediate_source/pinmem_nonblock.py Outdated Show resolved Hide resolved
intermediate_source/pinmem_nonblock.py Outdated Show resolved Hide resolved
intermediate_source/pinmem_nonblock.py Outdated Show resolved Hide resolved
@vmoens vmoens merged commit a66464b into main Jul 31, 2024
20 checks passed
@vmoens vmoens deleted the pinmem-nonblock-tuto branch July 31, 2024 00:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants