all tensors in ModelInput should be on pinned memory for non-blocking host-to-device data transfer (#2985)
Summary:
Pull Request resolved: #2985
# context
* `KeyedJaggedTensor` already provides a `pin_memory` method, so there is no need to pin its underlying tensors manually.
* The `pin_memory()` call on the input KJTs is important for training.
NOTE: In a prod training scenario it's recommended that `TrainModelInput` be created on pinned memory for fast transfer to the GPU. For more, see the [pin_memory tutorial](https://pytorch.org/tutorials/intermediate/pinmem_nonblock.html#pin-memory).
* `ModelInput` example:
```
if pin_memory:
    float_features = float_features.pin_memory()
    label = label.pin_memory()
    idlist_features: Optional[KeyedJaggedTensor] = (
        None if idlist_features is None else idlist_features.pin_memory()
    )
    idscore_features: Optional[KeyedJaggedTensor] = (
        None if idscore_features is None else idscore_features.pin_memory()
    )
return ModelInput(
    float_features=float_features,
    idlist_features=idlist_features,
    idscore_features=idscore_features,
    label=label,
)
```
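The pinned batch pays off when it is copied to the device. Below is a minimal sketch of that consuming side (not part of this diff; the `model_input_to_device` helper is hypothetical, and it assumes `ModelInput` holds the fields shown above and that the tensors/KJTs accept `to(device, non_blocking=...)`):
```
import torch

# Hypothetical helper illustrating the device-transfer side: with every
# tensor pinned, each copy below can run asynchronously on the copy engine.
def model_input_to_device(batch: ModelInput, device: torch.device) -> ModelInput:
    # non_blocking=True only avoids stalling the CPU when the source
    # tensor lives in pinned (page-locked) host memory.
    return ModelInput(
        float_features=batch.float_features.to(device, non_blocking=True),
        idlist_features=(
            None
            if batch.idlist_features is None
            else batch.idlist_features.to(device, non_blocking=True)
        ),
        idscore_features=(
            None
            if batch.idscore_features is None
            else batch.idscore_features.to(device, non_blocking=True)
        ),
        label=batch.label.to(device, non_blocking=True),
    )
```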
WARNING: All tensors in `TrainModelInput` should be pinned in memory, not just the KJTs. Otherwise you'll find that CPU execution is still blocked by `_to_copy` even though most of the host-to-device data transfer is non-blocking.
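To observe the blocking behavior the warning describes, a standalone micro-check (an illustrative sketch, not from this diff) can compare how long the CPU stays inside the copy for a pageable vs. a pinned source tensor:
```
import time

import torch

# Illustrative sketch: time how long the CPU is stalled inside the copy
# (the `_to_copy` op in traces) for pageable vs. pinned host memory.
x_pageable = torch.randn(64 * 1024 * 1024)  # regular, pageable host memory
x_pinned = x_pageable.pin_memory()          # page-locked copy of the same data

for name, src in [("pageable", x_pageable), ("pinned", x_pinned)]:
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    dst = src.to("cuda", non_blocking=True)  # async only if src is pinned
    cpu_blocked = time.perf_counter() - t0   # time before control returns
    torch.cuda.synchronize()                 # wait for the copy to finish
    print(f"{name}: CPU blocked for {cpu_blocked * 1e3:.2f} ms")
```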
Reviewed By: tao-jia
Differential Revision: D74434209
fbshipit-source-id: c7ad466b8d278044b2e2b9dd8f89489545f3060a