
Releases: pytorch/torchrec

v1.1.0

30 Jan 19:33

New Features

Grid-based Sharding For EBC

Grid sharding applies CW sharding to a table and then shards the resulting CW shards TWRW. One of the key changes is how the metadata for sharding placements is constructed in grid sharding: we leverage the per_node concept from TWRW and combine it with the permutations and concatenation required by CW. Pull Request #2445
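The grid layout can be requested through the planner like any other sharding type. Below is a minimal sketch, assuming the GRID_SHARD enum value added in #2445 (the exact name may vary by version) and a hypothetical table named "large_table":

```python
# Hedged sketch: asking the planner to grid-shard one table of an EBC.
# ShardingType.GRID_SHARD is an assumption based on PR #2445; verify the
# exact enum name in the installed torchrec version.
from torchrec.distributed.planner import EmbeddingShardingPlanner, Topology
from torchrec.distributed.planner.types import ParameterConstraints
from torchrec.distributed.types import ShardingType

constraints = {
    "large_table": ParameterConstraints(
        sharding_types=[ShardingType.GRID_SHARD.value],  # CW split, then TWRW per CW shard
    ),
}

planner = EmbeddingShardingPlanner(
    topology=Topology(world_size=16, local_world_size=8, compute_device="cuda"),
    constraints=constraints,
)
# planner.plan(module, sharders) would then place each CW shard table-row-wise
# within a node, as described above.
```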

Re-shardable Hash Zch

Fully re-shardable ZCH: any world size that evenly divides the default bucket count (768) is supported, e.g. WS 1, 2, 4, 8, 16, 24, 32, 48, 64, 96, 128, etc., and resharding works both up and down. Pull Request #2538
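Purely as illustrative arithmetic, one reading of the constraint is that a world size is valid whenever it divides the default bucket count evenly, which covers all the values listed above:

```python
# Illustrative only: valid world sizes under the "divides 768 evenly" reading.
DEFAULT_NUM_BUCKETS = 768  # default value cited in the release note

valid_world_sizes = [ws for ws in range(1, 129) if DEFAULT_NUM_BUCKETS % ws == 0]
print(valid_world_sizes)
# [1, 2, 3, 4, 6, 8, 12, 16, 24, 32, 48, 64, 96, 128]
```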

TorchRec 2D Parallel

This release introduces a new parallelism strategy for scaling recommendation model training called 2D parallel. In this case, we scale model parallel through data parallel, hence the 2D name. The new entry point, DMPCollection, subclasses DMP and is meant to be a drop-in replacement for integrating 2D parallelism in distributed training. By setting the total number of GPUs to train across and the number of GPUs to locally shard across (i.e. one replication group), users can train their models in the same training loop but over a larger number of GPUs. The current implementation shards the model such that, for a given shard, its replicated shards lie on the ranks within the node. This significantly improves the performance of the all-reduce communication (parameter sync) by utilizing intra-node bandwidth. Under this scheme the supported sharding types are RW, CW, and GRID; TWRW is not supported because it can no longer take advantage of intra-node bandwidth in the 2D scheme. Pull Request #2554
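A minimal sketch of the new entry point, assuming a standard DMP training setup (initialized process group, an un-sharded `model`, and a sharding `plan` already built); the constructor argument names follow the description above and may differ slightly by release:

```python
# Hedged sketch: drop-in replacement of DistributedModelParallel with
# DMPCollection for 2D parallelism. `model` and `plan` are placeholders from a
# standard DMP setup; argument names follow the release description.
import torch
import torch.distributed as dist
from torchrec.distributed.model_parallel import DMPCollection

# Example: 128 GPUs total, each model replica sharded across 8 GPUs
# (one replication group per node), data parallel across the 16 groups.
model = DMPCollection(
    module=model,              # un-sharded recommendation model (placeholder)
    device=torch.device("cuda"),
    plan=plan,                 # sharding plan for one 8-GPU group (placeholder)
    world_size=128,            # total GPUs to train across
    sharding_group_size=8,     # GPUs to locally shard across
    global_pg=dist.GroupMember.WORLD,
)
# The training loop is unchanged; the parameter all-reduce between replicas is
# handled internally and uses intra-node bandwidth for each shard's replicas.
```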

Changelog

torch.compile compatibility support: #2381, #2475, #2583

torch.export module support: #2388, #2390, #2393

DTensor improvements: #2585, #2626

v1.1.0-rc3

24 Jan 17:27
Pre-release
Remove TensorDict from requirements

v1.1.0-rc2

23 Jan 20:34
Pre-release
Revert "Revert "add NJT/TD support in test data generator (#2528)""

This reverts commit e5e25650679a4c9d1655390ddf8be28169dd5981.

v1.1.0-rc1

22 Jan 18:46
Pre-release

First release candidate for v1.1.0

v1.0.0

21 Oct 22:05

TorchRec 1.0.0 Stable Release Notes

We are excited to announce the release of TorchRec 1.0.0, the official stable release of TorchRec! This release is done in conjunction with the FBGEMM 1.0.0 release.

Stable Release

Core portions of the TorchRec library are now marked as stable, with the following guarantees:

  • Backward compatibility guarantees, with breaking changes announced two versions ahead of time.
  • Enhanced documentation for all stable features of TorchRec.
  • Functionality guarantees through unit test frameworks for each stable feature, running on every merged PR and release.
  • No performance guarantees. However, we are committed to providing support on a best-effort basis.

Improvements

Key improvements have been added to TorchRec for the reliability and UX of the library. The following main features are part of the stable release:

  • TorchRec's documentation has been completely revamped, with an added overview of how TorchRec works, high-level architecture and concepts, and simplified API references for TorchRec stable features! Check out the new documentation here.
  • The TorchRec tutorial on pytorch.org has been completely redone, with a new, comprehensive end-to-end tutorial of TorchRec highlighting all the stable features! Check out the new tutorial here.
  • A unit test framework for API compatibility has been added under torchrec/schema to test compatibility for all TorchRec stable features.
  • The unit test CI for TorchRec on GitHub has been enabled on GPU, running on nightly versions of TorchRec and manually validated at release time.

Changelog

Revamped TorchRec inference solution with torch.fx and TorchScript #2101

Faster KJT init #2369

Improvements to TorchRec Train Pipeline #2363 #2352 #2324 #2149 #2181

PT2 Dynamo and Inductor compatibility work with TorchRec and train pipeline #2108 #2125 #2130 #2141 #2151 #2152 #2162 #2176 #2178 #2228 #2310

VBE improvements #2366 #2127 #2215 #2216 #2256

Replace ShardedTensor with DTensor #2147 #2167 #2169

Enable pruning of embedding tables in TorchRec inference #2343

torch.export compatibility support with TorchRec data types and modules #2166 #2174 #2067 #2197 #2195 #2203 #2246 #2250 #1900

Added benchmarking for TorchRec modules #2139 #2145

Significantly optimized KeyedTensor regroup with the custom KTRegroupAsDict module, with benchmarked results #2120 #2158 #2210

Overlap comms on backwards pass #2117

OSS quality of life improvements #2273

v1.0.0-rc2

08 Oct 23:46
Pre-release

Second release candidate for v1.0.0

v1.0.0-rc1

12 Sep 17:52
Pre-release

Release candidate 1 for the stable release

v0.8.0

23 Jul 18:00

New Features

In-Training Embedding Pruning (ITEP) for more efficient RecSys training

Provides a representation of In-Training Embedding Pruning, which is used internally at Meta for more efficient RecSys training by decreasing the memory footprint of embedding tables. Pull Request #2074 introduces the modules into TorchRec, with tests showing how to use them.

Mean Pooling

Mean pooling is now enabled on embeddings for row-wise and table-row-wise sharding types in TorchRec. Mean pooling done through TBE (table-batched embedding) is not accurate for row-wise and table-row-wise sharding types, because sharding modifies the input. This feature efficiently computes the divisor, using caching and overlapping in the input dist, to implement mean pooling, which has proved much more performant than out-of-library implementations. PR: #1772
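A minimal sketch of a table configured for mean pooling; with this release the same config remains accurate when the table is sharded row-wise or table-row-wise, since the divisor is computed during input dist rather than by TBE's mean mode:

```python
# Minimal sketch: an EBC table with mean pooling. Under RW/TWRW sharding,
# TorchRec now computes the pooling divisor itself during input dist.
from torchrec.modules.embedding_configs import EmbeddingBagConfig, PoolingType
from torchrec.modules.embedding_modules import EmbeddingBagCollection

ebc = EmbeddingBagCollection(
    tables=[
        EmbeddingBagConfig(
            name="t1",
            embedding_dim=64,
            num_embeddings=1_000_000,
            feature_names=["f1"],
            pooling=PoolingType.MEAN,
        )
    ]
)
```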

Changelog

Torch.export (non-strict) compatibility with KJT/JT/KT, EBC/Quantized EBC, sharded variants #1815 #1816 #1788 #1850 #1976 and dynamic shapes #2058

torch.compile support with TorchRec #2045 #2018 #1979

TorchRec serialization with non-strict torch.export for regenerating eager sparse modules (EBC) from IR for sharding #1860 #1848, with meta functionalization when using torch.export #1974

More benchmarking for TorchRec modules/data types #2094 #2033 #2001 #1855

More VBE support (data parallel sharding) #2093 (EmbeddingCollection) #2047 #1849

RegroupAsDict module for performance improvements with caching #2007

Train Pipeline improvements #1967 #1969 #1971

Bug Fixes and library improvements

v0.8.0-rc1

17 Jun 13:44
Pre-release
Update setup and version for release 0.8.0

v0.7.0

25 Apr 01:39

No major features in this release

Changelog

  • Expanded ZCH/MCH support
  • Increased support for Torch Dynamo/Export
  • Distributed Benchmarking introduced under torchrec/distributed/benchmarks for inference and training
  • VBE optimizations
  • TWRW support for VBE (may have landed in the previous release)
  • Generalized train_pipeline for different pipeline stage overlapping
  • Autograd support for traceable collectives
  • Output dtype support for embeddings
  • Dynamo tracing for sharded embedding modules
  • Bug fixes