Releases: DeepRec-AI/HybridBackend
HybridBackend 1.0.0
Objectives:
- Memory-efficient loading of categorical data
- Communication-efficient training and evaluation at scale
- Easy to use with existing AI workflows
Features:
Performance:
- Support ORC format in data loading.
- Support data deduplication.
- Improve performance of data transfer.
- Improve performance of loading and shuffling string data.
- Support workers with unbalanced training data via SyncReplicasDataset.
- Support pipeline-based semi-synchronous training.
- Support a hierarchical embedding lookup.
Usability:
- Support standalone evaluation and prediction APIs for Estimator and Keras.
Bugfixes:
- Fix shape calculation of tf.feature_column.shared_embeddings.
HybridBackend 0.8.0
Objectives:
- Memory-efficient loading of categorical data
- Communication-efficient training and evaluation at scale
Features:
Performance:
- Support of automatic embedding fusion on PAI DLC / PAI DSW
- Support of row-wise shuffling
- Improved data transfer prefetching
Usability:
- Support of embedding_lookup_* API
- Support of new composable Dataset API
HybridBackend v0.7.0
Objectives:
- Memory-efficient loading of categorical data
- GPU-efficient orchestration of embedding layers
- Communication-efficient training and evaluation at scale
- Easy to use with existing AI workflows
Features:
Performance:
- Support of data transfer prefetching
Usability:
- Support of Keras Model API
- Support direct pip install via PyPI
HybridBackend 0.5.4
Objectives:
- Easy to use with existing AI workflows
Features:
- Support fixed length list in ParquetDataset
- Support schema parsing in ParquetDataset
- Provide validation tools for parquet files
Bug Fixes:
- Fixes indices calculation in rebatching
HybridBackend v0.6.0
Objectives:
- Communication-efficient training and evaluation at scale
- Easy to use with existing AI workflows
Features:
Data-Parallel Training and Evaluation:
- Bucketized Gradients Aggregation using AllReduce
- Global Metric Operations
- Out-Of-Range Coordination
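Bucketized gradient aggregation follows a widely used pattern. Instead of issuing one AllReduce per gradient tensor, gradients are packed into a few fused buffers (buckets) up to a size limit, and each bucket is reduced with a single collective call. The following is a minimal pure-Python sketch of that idea under stated assumptions. The workers are simulated in one process. The size limit `max_bucket_elems` and all helper names are hypothetical. The AllReduce is replaced by an element-wise average. It is an illustration of the technique, not HybridBackend's implementation.

```python
from typing import Dict, List

# Hypothetical representation: each gradient is a flat list of floats keyed by variable name.
Grads = Dict[str, List[float]]


def make_buckets(grads: Grads, max_bucket_elems: int) -> List[List[str]]:
    """Group gradient names into buckets whose total element count stays within the limit.

    A single gradient larger than the limit gets a bucket of its own.
    """
    buckets: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name in sorted(grads):  # deterministic order so every worker builds identical buckets
        size = len(grads[name])
        if current and current_size + size > max_bucket_elems:
            buckets.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        buckets.append(current)
    return buckets


def simulated_allreduce_mean(buffers: List[List[float]]) -> List[float]:
    """Stand-in for an AllReduce: average one fused buffer element-wise across all workers."""
    n = len(buffers)
    return [sum(vals) / n for vals in zip(*buffers)]


def bucketized_allreduce(worker_grads: List[Grads], max_bucket_elems: int) -> Grads:
    buckets = make_buckets(worker_grads[0], max_bucket_elems)
    result: Grads = {}
    for bucket in buckets:
        # Fuse: concatenate every gradient in the bucket into one flat buffer per worker.
        fused = [[x for name in bucket for x in g[name]] for g in worker_grads]
        reduced = simulated_allreduce_mean(fused)  # one collective call per bucket
        # Split the reduced buffer back into per-variable gradients.
        offset = 0
        for name in bucket:
            size = len(worker_grads[0][name])
            result[name] = reduced[offset:offset + size]
            offset += size
    return result


if __name__ == "__main__":
    w0 = {"bias": [1.0, 2.0], "dense": [0.0, 4.0, 8.0], "emb": [2.0]}
    w1 = {"bias": [3.0, 4.0], "dense": [2.0, 0.0, 0.0], "emb": [4.0]}
    print(make_buckets(w0, max_bucket_elems=4))
    print(bucketized_allreduce([w0, w1], max_bucket_elems=4))
```

The trade-off is in the bucket size. Larger buckets mean fewer collective launches and less per-call latency. Smaller buckets let reduction start earlier and overlap with the rest of the backward pass.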
Hybrid-Parallel Embedding Learning:
- Bucketized Embedding Exchanging using AllToAllv
- Fusion and Quantization of AllToAllv
- Fusion of Partitioning and Stitching
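In hybrid-parallel embedding learning, each worker owns a shard of the embedding table. Lookups are therefore partitioned by owner, exchanged with a variable-length all-to-all (AllToAllv), looked up locally, sent back, and stitched into the original order. The sketch below simulates that round trip in plain Python. It assumes, hypothetically, that rows are sharded by `id % num_shards` and that each shard is a dict. The helper names are illustrative, and fusion and quantization of the exchange are not shown. It demonstrates the data flow, not HybridBackend's implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical setup: embedding rows are sharded across workers by id % num_shards,
# and each shard is a dict {id: embedding vector}.
Table = Dict[int, List[float]]


def partition(ids: List[int], num_shards: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Split ids into one bucket per owning shard and record each id's original position."""
    buckets: List[List[int]] = [[] for _ in range(num_shards)]
    positions: List[List[int]] = [[] for _ in range(num_shards)]
    for pos, i in enumerate(ids):
        shard = i % num_shards
        buckets[shard].append(i)
        positions[shard].append(pos)
    return buckets, positions


def simulated_alltoallv(send: List[List[list]]) -> List[List[list]]:
    """Stand-in for AllToAllv: send[src][dst] is a variable-length message from src to dst.

    Returns recv such that recv[dst][src] == send[src][dst].
    """
    n = len(send)
    return [[send[src][dst] for src in range(n)] for dst in range(n)]


def sharded_lookup(worker_ids: List[List[int]], shards: List[Table]) -> List[List[List[float]]]:
    n = len(shards)
    parts = [partition(ids, n) for ids in worker_ids]
    # Exchange ids: each worker sends every shard only the ids that shard owns.
    recv_ids = simulated_alltoallv([buckets for buckets, _ in parts])
    # Each shard looks up the ids it received, grouped by sender.
    replies = [[[shards[dst][i] for i in ids] for ids in recv_ids[dst]] for dst in range(n)]
    # Exchange embedding rows back in the reverse direction.
    recv_emb = simulated_alltoallv(replies)
    results = []
    for w, (_, positions) in enumerate(parts):
        out: List[List[float]] = [[] for _ in worker_ids[w]]
        # Stitch: place each returned row back at its original position.
        for shard in range(n):
            for pos, row in zip(positions[shard], recv_emb[w][shard]):
                out[pos] = row
        results.append(out)
    return results


if __name__ == "__main__":
    shard0 = {0: [0.0], 2: [2.0], 4: [4.0]}
    shard1 = {1: [1.0], 3: [3.0]}
    print(sharded_lookup([[3, 0, 2], [1, 4, 1]], [shard0, shard1]))
```

Each worker's per-shard buckets have different lengths. This is why the exchange needs AllToAllv rather than a fixed-size all-to-all.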
Usability:
- Support of MonitoredSession and Estimator
- Declarative API for Model Definition
Compatibility:
- Support of NVIDIA TensorFlow and DeepRec
Interoperability:
- Inference Pipeline Needs No Change
- Support of SavedModel
- Support of Variable, XDL HashTable and PAI Embedding Variable
Bug Fixes:
- [#46] Fixes rebatching in ParquetDataset.
HybridBackend v0.5.3
Objectives:
- Easy to use with existing AI workflows
Features:
- Support working with GPU
- Support building on macOS
HybridBackend v0.5.2
Objectives:
- Memory-efficient loading of categorical data
- Easy to use with existing AI workflows
Features:
Parquet Dataset:
- Reading batch of tensors from numeric fields in zero-copy way
- Reading batch of sparse tensors from numeric list fields in zero-copy way
- Support of string fields
- Support of local filesystem, HDFS, S3 and OSS
Data Pipeline Functions:
- Resizing batch of tensors and ragged tensors
- Converting ragged tensors to sparse tensors
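Converting a ragged batch to a sparse tensor means turning each row's variable-length values into COO coordinates. The sketch below assumes the ragged batch is stored the way TensorFlow's RaggedTensor stores it: flat `values` plus `row_splits`, where row r holds `values[row_splits[r]:row_splits[r + 1]]`. It emits `(indices, values, dense_shape)` with the dense width set to the longest row. The function name is hypothetical and the code is plain Python for illustration. In TensorFlow itself the equivalent operation is `tf.RaggedTensor.to_sparse()`, and this is not HybridBackend's implementation.

```python
from typing import List, Tuple


def ragged_to_sparse(values: List[float], row_splits: List[int]) -> Tuple[List[List[int]], List[float], List[int]]:
    """Convert a 2-D ragged batch (values + row_splits) into COO sparse components.

    Returns (indices, values, dense_shape), where each index is [row, column]
    and dense_shape is [number of rows, length of the longest row].
    """
    if not row_splits or row_splits[0] != 0 or row_splits[-1] != len(values):
        raise ValueError("row_splits must start at 0 and end at len(values)")
    if any(a > b for a, b in zip(row_splits, row_splits[1:])):
        raise ValueError("row_splits must be non-decreasing")
    indices: List[List[int]] = []
    max_len = 0
    for row in range(len(row_splits) - 1):
        length = row_splits[row + 1] - row_splits[row]
        max_len = max(max_len, length)
        # Values are already stored row-major, so only the coordinates need to be generated.
        for col in range(length):
            indices.append([row, col])
    dense_shape = [len(row_splits) - 1, max_len]
    return indices, list(values), dense_shape


if __name__ == "__main__":
    # Three rows: [10, 11], [], [12, 13]
    print(ragged_to_sparse([10.0, 11.0, 12.0, 13.0], [0, 2, 2, 4]))
```

Because the flat values are already in row-major order, the sparse values can reuse them unchanged, and only the index array has to be built.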
Objective: "Easy to use with existing AI workflows"
Compatibility:
- Support of TensorFlow 1.15 and TensorFlow 1.14
- GitHub actions for uploading wheels to PyPI
Bug Fixes: