Version 3 Implementation of Neural Network Service Leveraging PyTorch Distributed Data Parallel (DDP)
This repository demonstrates the same key concepts in neural networks as penr-oz-neural-network-v2-torch-nn, with gradient descent calculations automated by the PyTorch library and its Neural Network (nn) package. Version 3 adds the Distributed Data Parallel (DDP) feature to support scaling to multiple GPU (CUDA) devices, and changes the API so that training data is downloaded and sharded for local reads instead of being sent in an API payload.
The implementation works as follows:
- Gradients are computed automatically using PyTorch and the PyTorch Neural Network (nn) package.
- Scaling to multiple GPU (CUDA) devices is done by leveraging PyTorch Distributed Data Parallel (DDP).
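To illustrate how sharded training data can be divided among DDP processes, the sketch below assigns shard files to each process by rank in round-robin order. It is a minimal sketch under stated assumptions, not this repository's implementation: the function name `shards_for_rank` and the shard file names are hypothetical, and only the Python standard library is used.

```python
from typing import List, Sequence


def shards_for_rank(shards: Sequence[str], rank: int, world_size: int) -> List[str]:
    """Return the shards one process should read (hypothetical helper).

    `rank` is the index of the current process and `world_size` is the
    total number of processes, as in distributed training setups.
    """
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in the range [0, world_size)")
    # Round-robin: process `rank` takes every world_size-th shard,
    # starting at its own index, so no shard is read twice.
    return list(shards[rank::world_size])


if __name__ == "__main__":
    # Hypothetical shard file names, used only for illustration.
    shard_files = [f"shard_{i:03d}.bin" for i in range(5)]
    for rank in range(2):
        print(rank, shards_for_rank(shard_files, rank, world_size=2))
```

With two processes and five shards, the first process reads three shards and the second reads two; together they cover every shard exactly once.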
- Clone the Repository:
    git clone https://github.com/derinworks/penr-oz-neural-network-v3-torch-ddp.git
    cd penr-oz-neural-network-v3-torch-ddp
- Create and Activate a Virtual Environment:
  - Install Python 3.10:
      $ python3 --version
      Python 3.10.18
  - Create:
      python3 -m venv venv
  - Activate:
    - On Unix or macOS:
        source venv/bin/activate
    - On Windows:
        venv\Scripts\activate
- Install Dependencies:
    pip install -r requirements.txt
- Run the Service:
    python main.py
  or
    uvicorn main:app --log-config log_config.json
- Interact with the Service: Test the endpoints using Swagger at http://127.0.0.1:8000/docs.
- Interact with the Dashboard: Diagnose model training at http://127.0.0.1:8000/dashboard.
- Quickly spin up the Service in a brand-new Linux VM:
    ./run-in-vm.sh
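Once the service is running, the Swagger and dashboard pages listed above can also be checked from a script. The following is a minimal sketch using only the Python standard library. The helper names `page_url` and `is_reachable` are hypothetical, and the base address is assumed to be the default shown in the steps above.

```python
import urllib.error
import urllib.request

# Default local address taken from the steps above.
BASE_URL = "http://127.0.0.1:8000"


def page_url(path: str, base_url: str = BASE_URL) -> str:
    """Join the base address and a page path with exactly one slash."""
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to `url` succeeds with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


if __name__ == "__main__":
    for path in ("docs", "dashboard"):
        url = page_url(path)
        print(url, "reachable" if is_reachable(url) else "not reachable")
```

If the service is not running, both pages are reported as not reachable rather than raising an exception.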
To ensure code quality and maintainability, follow these steps to run tests and check code coverage:
- Run All Tests:
    python -m pytest -v
  The test suite includes 114 tests across 7 test files:
  - test_main.py - API endpoint tests (29 tests)
  - test_neural_net_model.py - Model implementation tests (43 tests)
  - test_neural_net_layers.py - Custom layer tests (12 tests)
  - test_loaders.py - Dataset loader/downloader tests (7 tests)
  - test_mappers.py - Layer/optimizer mapper tests (3 tests)
  - test_tokenizers.py - Tokenization tests (4 tests)
  - test_ddp.py - Distributed training tests (16 tests)
- Run Tests with Coverage: Execute the following commands to run tests and generate a coverage report:
    coverage run -m pytest
    coverage report
- Generate HTML Coverage Report (Optional): For a detailed coverage report in HTML format:
    coverage html
  Open the htmlcov/index.html file in a web browser to view the report.
Some tests require Linux-specific features (e.g., /dev/shm for shared memory caching) and will be automatically skipped on macOS/Windows:
- test_train_* - Full training integration tests with model persistence
- test_cache_miss - Shared memory cache behavior
- test_delete - Model deletion with shared memory cleanup
These tests will run automatically on Linux systems where /dev/shm is available.
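The platform-dependent skipping described above can be sketched as follows. This is an illustration only, not the repository's test code: the helper `shm_available` and the test class are hypothetical, and the standard-library `unittest.skipUnless` decorator stands in for whatever skip mechanism the suite actually uses (with pytest, `pytest.mark.skipif` serves the same purpose).

```python
import os
import tempfile
import unittest

SHM_DIR = "/dev/shm"  # shared memory directory available on Linux


def shm_available(path: str = SHM_DIR) -> bool:
    """Return True if `path` exists as a directory and is writable."""
    return os.path.isdir(path) and os.access(path, os.W_OK)


class SharedMemoryExampleTest(unittest.TestCase):
    """Hypothetical test that needs /dev/shm; skipped where it is missing."""

    @unittest.skipUnless(shm_available(), "requires /dev/shm (Linux only)")
    def test_round_trip(self):
        # Write bytes to a temporary file in shared memory and read them back.
        with tempfile.NamedTemporaryFile(dir=SHM_DIR) as handle:
            handle.write(b"cached")
            handle.flush()
            handle.seek(0)
            self.assertEqual(handle.read(), b"cached")
```

Saved in a test file, this class is collected by python -m pytest -v like any other test, and the test is reported as skipped on systems without /dev/shm.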