This project evaluates the efficiency of data parallelism in machine learning workloads on CPU and GPU clusters. It measures how different configurations (number of cores/GPUs, batch sizes) affect training time and throughput, using the DenseNet121 model and the Imagenette dataset.
Key Features:
- Distributed training experiments for both CPU and GPU
- Automated experiment scripting with multiple configurations
- Performance metric visualization (throughput, time components)
- Memory-aware batch size handling for GPU constraints
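The memory-aware handling can be sketched as a simple guard that caps the per-GPU batch: data parallelism splits each global batch across workers, and the split must still fit in device memory. The function name and the memory figures below are illustrative assumptions, not values taken from the experiment code.

```python
def per_gpu_batch(global_batch, n_gpus, gpu_mem_mb, mem_per_sample_mb):
    """Largest per-GPU batch that both matches the data-parallel split and
    fits in memory, assuming memory grows roughly linearly with batch size
    (an illustrative simplification)."""
    split = global_batch // n_gpus          # data parallelism divides the batch
    fits = gpu_mem_mb // mem_per_sample_mb  # how many samples fit in memory
    return min(split, fits)

# With these made-up numbers, a global batch of 128 on one 16 GB GPU is
# memory-bound, while splitting across two GPUs fits comfortably:
print(per_gpu_batch(128, 1, 16_000, 200))  # 80 (capped by memory)
print(per_gpu_batch(128, 2, 16_000, 200))  # 64 (the plain split fits)
```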
.
├── cpu/
│ ├── cpu_plots/ # Generated CPU performance plots
│ ├── cpu_run/ # Raw experiment results (CSV files)
│ ├── plot_cpu_results.py # CPU data analysis & visualization
│ ├── project_ex_1.py # CPU distributed training code
│ └── run_experiments.sh # CPU experiment runner
│
├── gpu/
│ ├── gpu_experiments/ # Raw GPU experiment results
│ ├── gpu_plots/ # Generated GPU performance plots
│ ├── gpu_plots_ex_2.py # GPU data analysis & visualization
│ ├── project_ex_2.py # GPU distributed training code
│ └── run_gpu_experiments.sh # GPU experiment runner
└── report/ # Detailed project report (LaTeX source and PDF)
- Run the CPU experiments:
  cd cpu/
  chmod +x run_experiments.sh
  ./run_experiments.sh
- Generate the CPU plots:
  python plot_cpu_results.py
- Run the GPU experiments:
  cd gpu/
  chmod +x run_gpu_experiments.sh
  ./run_gpu_experiments.sh
- Generate the GPU plots:
  python gpu_plots_ex_2.py
File | Description
---|---
run_experiments.sh | Runs CPU experiments with batch size 32 and 1-8 cores (10 repetitions each)
run_gpu_experiments.sh | Runs GPU experiments with batch sizes 16-128 and 1-3 GPUs (10 repetitions each)
plot_cpu_results.py | Generates the time-components bar chart and the throughput line plot
gpu_plots_ex_2.py | Generates throughput-vs-GPU-count plots for each batch size and the optimal-GPU-configuration chart
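Each runner stores raw results as CSV files, so aggregating the 10 repetitions is a small parsing step. A minimal sketch using only the standard library; the column names (cores, throughput) are assumptions and may differ from the actual files in cpu_run/:

```python
import csv
import io
from statistics import mean

# Inline stand-in for one of the CSV result files.
sample = io.StringIO(
    "cores,run,throughput\n"
    "1,1,110.5\n1,2,108.3\n"
    "2,1,215.0\n2,2,219.2\n"
)

# Group throughput by core count, then average over repetitions.
by_cores = {}
for row in csv.DictReader(sample):
    by_cores.setdefault(int(row["cores"]), []).append(float(row["throughput"]))

for cores, vals in sorted(by_cores.items()):
    print(f"{cores} cores: mean throughput {mean(vals):.1f} images/s")
```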
The generated plots show:
- CPU Scaling: Linear throughput improvement up to 7 cores
- GPU Scaling: Super-linear throughput gains with multiple GPUs
- Batch Size Impact: Larger batches require more GPUs for optimal performance
- Memory Constraints: Batch size 128 requires ≥2 GPUs due to memory limits
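Scaling observations like these are conventionally quantified as speedup and parallel efficiency relative to the single-worker run; a minimal sketch with made-up throughput numbers (not measured results from this project):

```python
# Illustrative throughput (images/s) per worker count.
throughput = {1: 100.0, 2: 198.0, 4: 390.0, 8: 620.0}

base = throughput[1]
for n, t in sorted(throughput.items()):
    speedup = t / base        # ideal linear scaling gives speedup == n
    efficiency = speedup / n  # 1.0 is ideal; above 1.0 would be super-linear
    print(f"{n} workers: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```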
This project is licensed under the MIT License - see the LICENSE file for details.
For detailed analysis and methodology, see the Project Report.