
convert stride_per_key_per_rank to tensor inside KJT #2959


Open

wants to merge 1 commit into base: main

Conversation

TroyGarden
Contributor

Summary:

context

  • this diff is part of the "variable-batch KJT refactoring" project ([doc](https://fburl.com/gdoc/svfysfai))
  • previously the `stride_per_key_per_rank` variable was `List[List[int]] | None`, which can't be handled correctly in PT2 IR (torch.export)
  • this change makes the KJT class variable `_stride_per_key_per_rank` a `torch.IntTensor | None` so that it is compatible with PT2 IR.

equivalency

  • checking whether `self._stride_per_key_per_rank` is `None`:
    this logic is used to differentiate the variable-batch case, and it behaves the same after this diff
  • using `self._stride_per_key_per_rank` as a `List[List[int]]`:
    most call sites go through the accessor `def stride_per_key_per_rank(self) -> List[List[int]]:`, which is modified to convert the stored `torch.IntTensor` back to a list via `_stride_per_key_per_rank.tolist()`, so the results are the same (see the sketch after this list)
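
A minimal sketch of the idea, using a simplified stand-in class rather than the real KJT implementation (the `variable_stride_per_key` helper and the empty-list fallback here are illustrative assumptions):

```python
from typing import List, Optional

import torch


class KeyedJaggedTensorSketch:
    """Simplified stand-in for torchrec's KeyedJaggedTensor (KJT)."""

    def __init__(
        self,
        stride_per_key_per_rank: Optional[List[List[int]]] = None,
    ) -> None:
        # Store the per-key/per-rank strides as a CPU IntTensor (or None for
        # the non-variable-batch case) so the field is PT2 IR friendly.
        self._stride_per_key_per_rank: Optional[torch.IntTensor] = (
            torch.IntTensor(stride_per_key_per_rank)
            if stride_per_key_per_rank is not None
            else None
        )

    def variable_stride_per_key(self) -> bool:
        # The None check keeps the same behavior as before the refactor.
        return self._stride_per_key_per_rank is not None

    def stride_per_key_per_rank(self) -> List[List[int]]:
        # Call sites still receive a List[List[int]]; the accessor converts
        # the stored tensor back with .tolist().
        stride_per_key_per_rank = self._stride_per_key_per_rank
        return (
            stride_per_key_per_rank.tolist()
            if stride_per_key_per_rank is not None
            else []
        )
```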

NOTE: the `self._stride_per_key_per_rank` tensor should always stay on CPU since it is effectively metadata of a KJT. Generic torch APIs such as `.to(...)`, `record_stream()`, etc. should in general avoid altering this variable.
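
A quick usage check of the sketch above, with hypothetical stride values, shows the list round trip and the CPU-only storage the NOTE refers to:

```python
spkpr = [[2, 3], [4, 1]]  # hypothetical per-key/per-rank strides
kjt = KeyedJaggedTensorSketch(stride_per_key_per_rank=spkpr)

assert kjt.variable_stride_per_key()
# The accessor converts the CPU IntTensor back to List[List[int]].
assert kjt.stride_per_key_per_rank() == spkpr
# The stored tensor stays on CPU; .to(...) / record_stream() should not move it.
assert kjt._stride_per_key_per_rank.device.type == "cpu"
```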

Differential Revision: D74366343

@facebook-github-bot added the CLA Signed label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on May 8, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D74366343

TroyGarden added a commit to TroyGarden/torchrec that referenced this pull request May 8, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D74366343

TroyGarden added a commit to TroyGarden/torchrec that referenced this pull request May 8, 2025
Reviewed By: jd7-tr

Differential Revision: D74366343
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D74366343

Labels: CLA Signed, fb-exported
2 participants