This project evaluates the efficiency of data parallelism in machine learning workloads on CPU and GPU clusters. It measures how different configurations (number of cores/GPUs, batch sizes) affect training time and throughput, using the DenseNet121 model and the Imagenette dataset.
Key Features:
- Distributed training experiments for both CPU and GPU
- Automated experiment scripting with multiple configurations
- Performance metric visualization (throughput, time components)
- Memory-aware batch size handling for GPU constraints
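The memory-aware handling can be sketched as a simple guard that caps the per-GPU batch: data parallelism splits each global batch across workers, and the split must still fit in device memory. The function name and the memory figures below are illustrative assumptions, not values taken from the experiment code.

```python
def per_gpu_batch(global_batch, n_gpus, gpu_mem_mb, mem_per_sample_mb):
    """Largest per-GPU batch that both matches the data-parallel split and
    fits in memory, assuming memory grows roughly linearly with batch size
    (an illustrative simplification)."""
    split = global_batch // n_gpus          # data parallelism divides the batch
    fits = gpu_mem_mb // mem_per_sample_mb  # how many samples fit in memory
    return min(split, fits)

# With these made-up numbers, a global batch of 128 on one 16 GB GPU is
# memory-bound, while splitting across two GPUs fits comfortably:
print(per_gpu_batch(128, 1, 16_000, 200))  # 80 (capped by memory)
print(per_gpu_batch(128, 2, 16_000, 200))  # 64 (the plain split fits)
```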
.
├── cpu/
│ ├── cpu_plots/ # Generated CPU performance plots
│ ├── cpu_run/ # Raw experiment results (CSV files)
│ ├── plot_cpu_results.py # CPU data analysis & visualization
│ ├── project_ex_1.py # CPU distributed training code
│ └── run_experiments.sh # CPU experiment runner
│
├── gpu/
│ ├── gpu_experiments/ # Raw GPU experiment results
│ ├── gpu_plots/ # Generated GPU performance plots
│ ├── gpu_plots_ex_2.py # GPU data analysis & visualization
│ ├── project_ex_2.py # GPU distributed training code
│ └── run_gpu_experiments.sh # GPU experiment runner
└── report/ # Detailed project report (LaTeX source and PDF)
- Run the CPU experiments:
  cd cpu/
  chmod +x run_experiments.sh
  ./run_experiments.sh
- Generate the CPU plots:
  python plot_cpu_results.py
- Run the GPU experiments:
  cd gpu/
  chmod +x run_gpu_experiments.sh
  ./run_gpu_experiments.sh
- Generate the GPU plots:
  python gpu_plots_ex_2.py
File | Description
---|---
run_experiments.sh | Runs CPU experiments with batch size 32 and 1-8 cores (10 repetitions each)
run_gpu_experiments.sh | Runs GPU experiments with batch sizes 16-128 and 1-3 GPUs (10 repetitions each)
plot_cpu_results.py | Generates the time-components bar chart and the throughput line plot
gpu_plots_ex_2.py | Generates throughput-vs-GPU-count plots for each batch size and the optimal-GPU-configuration chart
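Each runner stores raw results as CSV files, so aggregating the 10 repetitions is a small parsing step. A minimal sketch using only the standard library; the column names (cores, throughput) are assumptions and may differ from the actual files in cpu_run/:

```python
import csv
import io
from statistics import mean

# Inline stand-in for one of the CSV result files.
sample = io.StringIO(
    "cores,run,throughput\n"
    "1,1,110.5\n1,2,108.3\n"
    "2,1,215.0\n2,2,219.2\n"
)

# Group throughput by core count, then average over repetitions.
by_cores = {}
for row in csv.DictReader(sample):
    by_cores.setdefault(int(row["cores"]), []).append(float(row["throughput"]))

for cores, vals in sorted(by_cores.items()):
    print(f"{cores} cores: mean throughput {mean(vals):.1f} images/s")
```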
The generated plots show:
- CPU Scaling: Linear throughput improvement up to 7 cores
- GPU Scaling: Super-linear throughput gains with multiple GPUs
- Batch Size Impact: Larger batches require more GPUs for optimal performance
- Memory Constraints: Batch size 128 requires ≥2 GPUs due to memory limits
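Scaling observations like these are conventionally quantified as speedup and parallel efficiency relative to the single-worker run; a minimal sketch with made-up throughput numbers (not measured results from this project):

```python
# Illustrative throughput (images/s) per worker count.
throughput = {1: 100.0, 2: 198.0, 4: 390.0, 8: 620.0}

base = throughput[1]
for n, t in sorted(throughput.items()):
    speedup = t / base        # ideal linear scaling gives speedup == n
    efficiency = speedup / n  # 1.0 is ideal; above 1.0 would be super-linear
    print(f"{n} workers: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```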
This project is licensed under the MIT License - see the LICENSE file for details.
For detailed analysis and methodology, see the Project Report.