After enabling the Flops Profiler with the exact same configuration as given in the example at https://www.deepspeed.ai/tutorials/flops-profiler/, I get different output from the one presented in the documentation.
I am training on 4 GPUs with the following DeepSpeed config file:
{ "train_batch_size": 128, "train_micro_batch_size_per_gpu": 4, "steps_per_print": 1000, "optimizer": { "type": "Adam", "params": { "lr": 1e-5 } }, "gradient_clipping": 1.0, "zero_optimization": { "stage": 3, "stage3_max_live_parameters": 1e9, "stage3_max_reuse_distance": 1e8, "stage3_param_persitance_threshold": 1e5, "stage3_prefetch_bucket_size": 5e7, "contiguous_gradients": true, "cpu_offload": true, "cpu_offload_params": true, "cpu_offload_use_pin_memory": true, "overlap_comm": true, "reduce_bucket_size": 90000000, "sub_group_size": 4e8 }, "wall_clock_breakdown": false, "fp16": { "enabled": true, "loss_scale": 1024, "loss_scale_window": 1000, "hysteresis": 2, "min_loss_scale": 1 }, "flops_profiler": { "enabled": true, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } }
I initially thought I would get one such output from each of the GPUs, so four in total, but instead I get 8 outputs with different FLOPS values: ['2.31 TFLOPS', '2.01 TFLOPS', '2.07 TFLOPS', '2.06 TFLOPS', '2.06 TFLOPS', '2.09 TFLOPS', '2.08 TFLOPS', '1.86 TFLOPS'].
When I then trained a larger model on 4 GPUs with the same DeepSpeed config as above, except with train_micro_batch_size_per_gpu=2, I instead got 16 different outputs. The counts seem to match the gradient accumulation steps implied by the config, so perhaps the profiler prints once per micro-batch forward pass (see the quick check below).
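For what it's worth, here is a quick sanity check of that guess, using the documented DeepSpeed relation train_batch_size = train_micro_batch_size_per_gpu × gradient_accumulation_steps × world_size (the variable names are just for illustration):

```python
# Sanity check: DeepSpeed derives gradient accumulation steps as
#   train_batch_size / (train_micro_batch_size_per_gpu * world_size)
train_batch_size = 128
world_size = 4  # number of GPUs

print(train_batch_size // (4 * world_size))  # micro-batch 4 -> 8, matching the 8 outputs
print(train_batch_size // (2 * world_size))  # micro-batch 2 -> 16, matching the 16 outputs
```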
What am I actually profiling here? Is it not the FLOPS per GPU? What I ultimately want is the total number of floating-point operations (FLOPs) for the model; how would I calculate that?
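In case it matters, I also tried measuring this standalone with the profiler API from the same tutorial (a rough sketch; the model and input shape here are placeholders, not my actual setup):

```python
import torch
from deepspeed.profiling.flops_profiler import get_model_profile

# Placeholder model; substitute the model actually being trained.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 1024),
)

# Profile one forward pass with a micro-batch of 4, as in the config above.
flops, macs, params = get_model_profile(
    model=model,
    input_shape=(4, 1024),  # (batch_size, features) for this placeholder model
    print_profile=True,     # print the per-module breakdown
    detailed=True,
    as_string=True,         # return human-readable strings such as '1.2 GFLOPs'
)
print(flops, macs, params)
```

This gives me a FLOPs count for a single forward pass, but I am unsure how to relate it to the per-GPU TFLOPS throughput numbers printed during training.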
Many thanks in advance!