Distributed Training Practice

Hands-on PyTorch examples to learn and demonstrate (multi-)GPU model training and inference.

1. Distributed Data Parallel (DDP) Training with PyTorch

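The notebook provides a concise, practical walkthrough of implementing Distributed Data Parallel (DDP) training in PyTorch. DDP scales training across multiple GPUs by running one process per GPU and averaging gradients across processes during the backward pass. It can approach near-linear speedup and is generally faster than DataParallel, which drives all GPUs from a single process and re-replicates the model on every forward pass.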
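As a rough illustration of the pattern described above, here is a minimal DDP training sketch. It is not the notebook's code: the toy linear model, the synthetic data, the `find_free_port` helper, and the hyperparameters are all assumptions chosen so the example runs on CPU with the `gloo` backend. For real multi-GPU training you would typically use the `nccl` backend, move the model to its GPU, and launch one process per GPU (for example with `torchrun`).

```python
import os
import socket

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def find_free_port() -> int:
    """Hypothetical helper: ask the OS for an unused local TCP port."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def train(epochs: int = 2) -> float:
    """Train a toy model under DDP and return the last batch loss."""
    # A launcher such as torchrun sets these variables; the defaults below
    # are assumptions that allow a plain single-process CPU run.
    rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", str(find_free_port()))

    # "gloo" works on CPU; "nccl" is the usual choice for GPUs.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    try:
        torch.manual_seed(0)
        x = torch.randn(64, 4)              # synthetic inputs
        y = x.sum(dim=1, keepdim=True)      # synthetic regression target
        dataset = TensorDataset(x, y)

        # Each rank sees a disjoint shard of the dataset.
        sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
        loader = DataLoader(dataset, batch_size=16, sampler=sampler)

        model = DDP(nn.Linear(4, 1))        # CPU module, so no device_ids
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
        loss_fn = nn.MSELoss()

        loss = torch.tensor(0.0)
        for epoch in range(epochs):
            sampler.set_epoch(epoch)        # reshuffle shards each epoch
            for xb, yb in loader:
                optimizer.zero_grad()
                loss = loss_fn(model(xb), yb)
                loss.backward()             # DDP averages gradients across ranks here
                optimizer.step()
        return loss.item()
    finally:
        dist.destroy_process_group()
```

Calling `train()` in a single process exercises the same code path with a world size of 1. The `DistributedSampler` plus `set_epoch` pairing is what keeps ranks from training on overlapping samples.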

2. Mini-GPT with multi-GPU training

This project demonstrates how to train a character-level GPT model on the Tiny Shakespeare dataset using PyTorch with support for multi-GPU distributed training via DistributedDataParallel (DDP). It is designed as a practical, modular framework for experimenting with distributed deep learning techniques and efficient training setups.
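To make "character-level" concrete, the sketch below shows one common way to turn raw text into next-character training pairs. It is an assumption-laden illustration, not the contents of the repository's `char_dataset.py`: the class name `CharDataset`, the `block_size` parameter, and the `decode` helper are hypothetical. It uses only the standard library and exposes `__len__` and `__getitem__`, the protocol a map-style PyTorch dataset follows.

```python
from typing import List, Tuple


class CharDataset:
    """Hypothetical character-level dataset yielding (input, target) id lists."""

    def __init__(self, text: str, block_size: int) -> None:
        if len(text) <= block_size:
            raise ValueError("text must be longer than block_size")
        chars = sorted(set(text))                      # vocabulary = unique characters
        self.stoi = {ch: i for i, ch in enumerate(chars)}
        self.itos = {i: ch for ch, i in self.stoi.items()}
        self.block_size = block_size
        self.data = [self.stoi[ch] for ch in text]     # whole corpus as integer ids

    def __len__(self) -> int:
        # One sample per starting offset that leaves room for the shifted target.
        return len(self.data) - self.block_size

    def __getitem__(self, idx: int) -> Tuple[List[int], List[int]]:
        chunk = self.data[idx : idx + self.block_size + 1]
        # The target is the input shifted left by one character.
        return chunk[:-1], chunk[1:]

    def decode(self, ids: List[int]) -> str:
        return "".join(self.itos[i] for i in ids)
```

For example, with the text `"hello world"` and `block_size=4`, the first sample decodes to the input `"hell"` and the target `"ello"`. In a training loop the id lists would be converted to tensors before being fed to the model.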

Technical Architecture

3. SGLang Demo

This code runs Qwen2-VL-7B-Instruct, a vision-language model (VLM), using the SGLang framework on Modal, a serverless GPU cloud platform. It allows users to ask natural language questions about images via an HTTP API, and the model returns text-based answers by understanding both the image and the question.

                       ┌────────────────────────────┐
                       │     User / Client (API)    │
                       │----------------------------│
                       │ Sends POST request:        │
                       │ {                          │
                       │   "image_url": "...",      │
                       │   "question": "What is..." │
                       │ }                          │
                       └────────────┬───────────────┘
                                    │
                                    ▼
                       ┌────────────────────────────┐
                       │   Modal FastAPI Endpoint   │
                       │  (Model.generate method)   │
                       └────────────┬───────────────┘
                                    │
       Downloads image from URL     │
       ────────────────────────────►│
                                    │
                                    ▼
                  ┌────────────────────────────────────┐
                  │     SGLang Runtime (Qwen2-VL)      │
                  │------------------------------------│
                  │  1. Load model & tokenizer         │
                  │  2. Format prompt using template   │
                  │  3. Attach image + user question   │
                  │  4. Generate assistant response    │
                  └────────────────────┬───────────────┘
                                       │
                                       ▼
                  ┌──────────────────────────────────┐
                  │  Response: Textual answer to     │
                  │  the visual question             │
                  └──────────────────────────────────┘
                                       │
                                       ▼
                       ┌────────────────────────────┐
                       │    User / Client receives  │
                       │    JSON answer string      │
                       └────────────────────────────┘
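The request flow in the diagram can be exercised with a small client. The sketch below uses only the standard library and assumes the payload keys shown in the diagram (`image_url` and `question`). The endpoint URL is a placeholder you must replace with your own Modal deployment URL, the function names `build_request` and `ask` are hypothetical, and the response is assumed to be a JSON body, as the diagram suggests.

```python
import json
import urllib.request


def build_request(endpoint: str, image_url: str, question: str) -> urllib.request.Request:
    """Build the POST request carrying the image URL and the question."""
    payload = json.dumps({"image_url": image_url, "question": question}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(endpoint: str, image_url: str, question: str, timeout: float = 120.0):
    """Send the request and return the decoded JSON response (shape assumed)."""
    req = build_request(endpoint, image_url, question)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A call would look like `ask("https://<your-modal-endpoint>", "https://example.com/cat.jpg", "What is in this image?")`. The generous timeout is a guess that leaves room for a cold start on a serverless GPU.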

