
penr-oz-neural-network-v3-torch-ddp

Version 3 implementation of the neural network service, leveraging PyTorch Distributed Data Parallel (DDP)

This repository demonstrates the same key neural network concepts as penr-oz-neural-network-v2-torch-nn, with automatic gradient calculations handled by the PyTorch library and its Neural Network (nn) package, the Distributed Data Parallel (DDP) feature to support scaling to multiple GPU (CUDA) devices, and API changes that support downloading/sharding training data for local reads instead of passing it as an API payload.

The implementation covers:

Backpropagation: Auto Gradient Calculation

Gradients are computed automatically by PyTorch's autograd engine through the PyTorch Neural Network (nn) package, as sketched below.
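
A minimal illustration (not this service's actual model code) of PyTorch's automatic gradient calculation: a single loss.backward() call populates gradients for every model parameter.

    import torch
    import torch.nn as nn

    # Tiny illustrative model; the service's real architecture differs.
    model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
    criterion = nn.MSELoss()

    x = torch.randn(16, 4)  # dummy input batch
    y = torch.randn(16, 1)  # dummy targets

    loss = criterion(model(x), y)
    loss.backward()  # autograd fills p.grad for every model parameter

    for name, p in model.named_parameters():
        print(name, p.grad.shape)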

Scaling Calculation Speed and Concurrency

This is achieved by leveraging PyTorch Distributed Data Parallel (DDP).
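
A minimal sketch of the DDP pattern, assuming a launch via torchrun (which sets the RANK, WORLD_SIZE, and LOCAL_RANK environment variables); the service's actual setup and training loop may differ.

    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader
    from torch.utils.data.distributed import DistributedSampler

    def ddp_train(model, dataset, epochs=1):
        # One process per GPU; torchrun provides the rank environment variables.
        dist.init_process_group(backend="nccl")  # use "gloo" on CPU-only hosts
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        # Wrap the model so gradients are synchronized across ranks.
        ddp_model = DDP(model.to(local_rank), device_ids=[local_rank])

        # Each rank iterates over a distinct shard of the locally stored
        # dataset, matching the download/shard-for-local-read approach
        # described above.
        sampler = DistributedSampler(dataset)
        loader = DataLoader(dataset, batch_size=32, sampler=sampler)

        optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
        for epoch in range(epochs):
            sampler.set_epoch(epoch)  # reshuffle shard assignment each epoch
            for x, y in loader:
                x, y = x.to(local_rank), y.to(local_rank)
                loss = torch.nn.functional.mse_loss(ddp_model(x), y)
                optimizer.zero_grad()
                loss.backward()  # gradients are all-reduced across ranks here
                optimizer.step()

        dist.destroy_process_group()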

Quickstart Guide

  1. Clone the Repository:

    git clone https://github.com/derinworks/penr-oz-neural-network-v3-torch-ddp.git
    cd penr-oz-neural-network-v3-torch-ddp
  2. Create and Activate a Virtual Environment:

    • Install Python 3.10 and verify:
      $ python3 --version
      Python 3.10.18
    • Create:
      python3 -m venv venv
    • Activate:
      • On Unix or macOS:
        source venv/bin/activate
      • On Windows:
        venv\Scripts\activate
  3. Install Dependencies:

    pip install -r requirements.txt
  4. Run the Service:

    python main.py

    or

    uvicorn main:app --log-config log_config.json
  5. Interact with the Service: Test the endpoints using Swagger at http://127.0.0.1:8000/docs (see the Python sketch after this list).

  6. Interact with the Dashboard: Diagnose model training at http://127.0.0.1:8000/dashboard.

  7. Quickly spin up the Service in a brand-new Linux VM:

    ./run-in-vm.sh
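
The Swagger UI at /docs suggests the service is a FastAPI app; if so, the full route list is also available as machine-readable OpenAPI JSON at the framework's default path. A quick way to enumerate the endpoints (assuming that default path):

    import json
    from urllib.request import urlopen

    # FastAPI serves its OpenAPI spec at /openapi.json by default; the
    # "paths" keys enumerate every endpoint the service exposes.
    spec = json.load(urlopen("http://127.0.0.1:8000/openapi.json"))
    print(sorted(spec["paths"]))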

Testing and Coverage

To ensure code quality and maintainability, follow these steps to run tests and check code coverage:

  1. Run All Tests:

    python -m pytest -v

    The test suite includes 114 tests across 7 test files:

    • test_main.py - API endpoint tests (29 tests)
    • test_neural_net_model.py - Model implementation tests (43 tests)
    • test_neural_net_layers.py - Custom layer tests (12 tests)
    • test_loaders.py - Dataset loader/downloader tests (7 tests)
    • test_mappers.py - Layer/optimizer mapper tests (3 tests)
    • test_tokenizers.py - Tokenization tests (4 tests)
    • test_ddp.py - Distributed training tests (16 tests)
  2. Run Tests with Coverage: Execute the following commands to run tests and generate a coverage report:

    coverage run -m pytest
    coverage report
  3. Generate HTML Coverage Report (Optional): For a detailed coverage report in HTML format:

    coverage html

    Open the htmlcov/index.html file in a web browser to view the report.

Platform-Specific Tests

Some tests require Linux-specific features (e.g., /dev/shm for shared memory caching) and will be automatically skipped on macOS/Windows:

  • test_train_* - Full training integration tests with model persistence
  • test_cache_miss - Shared memory cache behavior
  • test_delete - Model deletion with shared memory cleanup

These tests will run automatically on Linux systems where /dev/shm is available.
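
A sketch of how such a platform guard can be expressed with pytest; the repository's actual marker and test names may differ, and test_cache_miss here is only a stand-in.

    import os
    import pytest

    # Skip on platforms without POSIX shared memory mounted at /dev/shm
    # (macOS and Windows); the guarded tests run unmodified on Linux.
    requires_dev_shm = pytest.mark.skipif(
        not os.path.isdir("/dev/shm"),
        reason="requires /dev/shm shared memory (Linux only)",
    )

    @requires_dev_shm
    def test_cache_miss():
        ...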
