JaggedTensor permute - less CPU ops #2786

che-sh · 2025-03-07T02:57:43Z

Summary: `JaggedTensor.permute` could be called with a very large `indices` list (a few hundred items), so calling the Python accessors `self.keys()`, `self.variable_stride_per_key()` and `self.stride_per_key_per_rank()` on every iteration of the loop over `indices` starts to compound and takes noticeable time on CPU.
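A minimal sketch of the pattern the summary describes: reading the accessors once, before the loop, instead of on every iteration. It assumes a hypothetical `ToyJagged` class (not torchrec's `JaggedTensor` or `KeyedJaggedTensor`) whose accessor names mirror those in the summary; `permute_naive` and `permute_hoisted` are illustrative names, and the actual change in D70609204 may be structured differently.

```python
import timeit
from typing import List, Tuple

Strides = List[List[int]]


class ToyJagged:
    """Hypothetical stand-in for a jagged container (not torchrec's class).

    The accessor names mirror those in the summary. Two of them return a
    fresh copy on every call, to mimic per-call Python overhead.
    """

    def __init__(self, keys: List[str], strides: Strides, variable: bool) -> None:
        self._keys = keys
        self._strides = strides  # per-key list of per-rank strides
        self._variable = variable

    def keys(self) -> List[str]:
        return list(self._keys)  # copies all keys on each call

    def variable_stride_per_key(self) -> bool:
        return self._variable

    def stride_per_key_per_rank(self) -> Strides:
        return [list(s) for s in self._strides]  # copies all strides on each call

    def permute_naive(self, indices: List[int]) -> Tuple[List[str], Strides]:
        # Re-invokes the accessors on every iteration of the loop.
        keys: List[str] = []
        strides: Strides = []
        for i in indices:
            keys.append(self.keys()[i])
            if self.variable_stride_per_key():
                strides.append(self.stride_per_key_per_rank()[i])
        return keys, strides

    def permute_hoisted(self, indices: List[int]) -> Tuple[List[str], Strides]:
        # Invokes each accessor at most once, then indexes the local copies.
        all_keys = self.keys()
        variable = self.variable_stride_per_key()
        all_strides = self.stride_per_key_per_rank() if variable else []
        keys = [all_keys[i] for i in indices]
        strides = [all_strides[i] for i in indices] if variable else []
        return keys, strides


n = 300  # "a few hundred" keys, as in the summary
jt = ToyJagged([f"f{i}" for i in range(n)], [[i, i + 1] for i in range(n)], variable=True)
idx = list(reversed(range(n)))
assert jt.permute_naive(idx) == jt.permute_hoisted(idx)

print("naive:  ", timeit.timeit(lambda: jt.permute_naive(idx), number=10))
print("hoisted:", timeit.timeit(lambda: jt.permute_hoisted(idx), number=10))
```

Both methods return the same result; only the number of accessor calls differs. In this toy the per-call copies make the naive loop quadratic in the number of keys, which exaggerates the effect; real timings depend on the accessors' actual cost and on the machine, so no figures are shown here.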

Differential Revision: D70609204

facebook-github-bot · 2025-03-07T03:00:43Z

This pull request was exported from Phabricator. Differential Revision: D70609204

facebook-github-bot · 2025-03-07T03:00:48Z

This pull request was exported from Phabricator. Differential Revision: D70609204

facebook-github-bot · 2025-03-07T04:36:41Z

This pull request was exported from Phabricator. Differential Revision: D70609204

Summary:
Pull Request resolved: pytorch#2786

`JaggedTensor.permute` could be called with a very large `indices` list (a few hundred items), so calling the Python accessors `self.keys()`, `self.variable_stride_per_key()` and `self.stride_per_key_per_rank()` on every iteration of the loop over `indices` starts to compound and takes noticeable time **on CPU**.

Reviewed By: sarckk

Differential Revision: D70609204

facebook-github-bot · 2025-03-18T23:14:26Z

This pull request was exported from Phabricator. Differential Revision: D70609204

facebook-github-bot added the CLA Signed label Mar 7, 2025

facebook-github-bot added the fb-exported label Mar 7, 2025

che-sh force-pushed the export-D70609204 branch from 0b32aef to 51755cd March 7, 2025 03:00

che-sh force-pushed the export-D70609204 branch from 51755cd to c264c54 March 7, 2025 04:33

che-sh force-pushed the export-D70609204 branch from c264c54 to a3d67d4 March 7, 2025 04:36

che-sh force-pushed the export-D70609204 branch from a3d67d4 to cc7219a March 18, 2025 23:11

che-sh force-pushed the export-D70609204 branch from cc7219a to 0d1e6d9 March 18, 2025 23:14

facebook-github-bot closed this in 76446e7 Mar 19, 2025
