[Bug] Evaluation accuracy drops when using distributed accelerate launch with multi-GPU #1677

@GuilhermeAumo

What happened?

I'm seeing a major drop in evaluation accuracy (from 35% to 5%) when running the evaluation through distributed accelerate launch on multiple GPUs instead of on a single GPU, with no changes to the config file.

Here is the config file I used (oumi/qwen7b/eval_config_base.yml):

model:
  model_name: "Qwen/Qwen2.5-7B-Instruct"
  trust_remote_code: True
  shard_for_eval: True

tasks:
  - evaluation_backend: lm_harness
    task_name: mmlu_pro
    num_samples: 5

inference_engine: NATIVE  # NATIVE

output_dir: "eval_results/qwen7b_base"

When I run the evaluation on a single GPU, I get reasonable accuracy:

CUDA_VISIBLE_DEVICES=0 oumi evaluate -c oumi/qwen7b/eval_config_base.yml
# Accuracy: 35%

However, when I try using multiple GPUs with distributed accelerate launch, the accuracy drops drastically:

CUDA_VISIBLE_DEVICES=0,1 oumi distributed accelerate launch -m oumi evaluate -c oumi/qwen7b/eval_config_base.yml
# Accuracy: 5%

I was not able to find any error message or log that could indicate why this was happening.

No changes were made to the config or environment besides switching the command and enabling multi-GPU. Could this be an issue with how the dataset is sharded or results aggregated in distributed mode?
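
To make the sharding/aggregation question concrete, here is a minimal sketch of the behavior I would expect. It is not Oumi's or lm-eval's actual code: the function names (`shard_indices`, `pooled_accuracy`, `coverage_problems`) and the toy data are hypothetical, and it assumes a simple strided split across ranks. If every example is evaluated exactly once and the per-rank counts are summed, the pooled accuracy should match the single-GPU value.

```python
from collections import Counter


def shard_indices(num_examples: int, rank: int, world_size: int) -> list[int]:
    """Strided split: rank r takes examples r, r + world_size, ... (assumed scheme)."""
    return list(range(rank, num_examples, world_size))


def pooled_accuracy(per_rank: list[tuple[int, int]]) -> float:
    """Aggregate (num_correct, num_total) pairs by summing counts across ranks."""
    correct = sum(c for c, _ in per_rank)
    total = sum(t for _, t in per_rank)
    return correct / total if total else 0.0


def coverage_problems(shards: list[list[int]], num_examples: int) -> dict[str, list[int]]:
    """Report example indices that no rank evaluated, or that several ranks evaluated."""
    seen = Counter(i for shard in shards for i in shard)
    return {
        "missing": [i for i in range(num_examples) if seen[i] == 0],
        "duplicated": sorted(i for i, n in seen.items() if n > 1),
    }


# Toy per-example outcomes (1 = correct). These are made-up values, not real results.
is_correct = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
world_size = 2

shards = [shard_indices(len(is_correct), r, world_size) for r in range(world_size)]
per_rank = [(sum(is_correct[i] for i in s), len(s)) for s in shards]

single_gpu_acc = sum(is_correct) / len(is_correct)
multi_gpu_acc = pooled_accuracy(per_rank)
problems = coverage_problems(shards, len(is_correct))

print(single_gpu_acc, multi_gpu_acc, problems)
```

A gap as large as the one I see would suggest that one of these assumptions does not hold in the distributed run, for example some examples being dropped or scored against the wrong reference, but I have not confirmed which.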

Let me know what further details I can provide to help debug.

Steps to reproduce the bug

Contents of eval.yml:

model:
  model_name: "Qwen/Qwen2.5-7B-Instruct"
  trust_remote_code: True
  shard_for_eval: True

tasks:
  - evaluation_backend: lm_harness
    task_name: mmlu_pro
    num_samples: 5

inference_engine: NATIVE  # NATIVE

output_dir: "eval_results/qwen7b_base"

Then run:

conda create -n test-oumi python=3.11
conda activate test-oumi
pip install "oumi[gpu]"
CUDA_VISIBLE_DEVICES=0,1 oumi distributed accelerate launch -m oumi evaluate -c eval.yml

System Info

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
   Oumi environment information:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

┌────────────────┬───────────────────────────────────────────────┐
│ Oumi version   │ 0.1.12                                        │
│ Python version │ 3.11.11                                       │
│ Platform       │ Linux-5.15.0-84-generic-x86_64-with-glibc2.31 │
└────────────────┴───────────────────────────────────────────────┘

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
   Installed dependencies:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ PACKAGE          ┃ VERSION         ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ accelerate       │ 1.2.1           │
│ aiohttp          │ 3.11.18         │
│ bitsandbytes     │ 0.45.5          │
│ datasets         │ 3.2.0           │
│ diffusers        │ <not installed> │
│ einops           │ 0.8.1           │
│ jsonlines        │ 4.0.0           │
│ liger-kernel     │ 0.5.9           │
│ llama-cpp-python │ <not installed> │
│ lm-eval          │ 0.4.8           │
│ mlflow           │ 2.21.3          │
│ numpy            │ 1.26.4          │
│ nvidia-ml-py     │ 12.560.30       │
│ omegaconf        │ 2.4.0.dev3      │
│ open_clip_torch  │ <not installed> │
│ pandas           │ 2.2.3           │
│ peft             │ 0.14.0          │
│ pexpect          │ 4.8.0           │
│ pillow           │ 11.1.0          │
│ pydantic         │ 2.9.2           │
│ responses        │ 0.25.7          │
│ sglang           │ <not installed> │
│ skypilot         │ 0.7.0           │
│ tensorboard      │ 2.18.0          │
│ timm             │ <not installed> │
│ torch            │ 2.5.1           │
│ torchdata        │ 0.9.0           │
│ torchvision      │ 0.20.1          │
│ tqdm             │ 4.67.1          │
│ transformers     │ 4.51.3          │
│ trl              │ 0.16.1          │
│ typer            │ 0.15.3          │
│ vllm             │ 0.7.3           │
│ wandb            │ 0.19.11         │
└──────────────────┴─────────────────┘

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
   Environment variables:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ VARIABLE                        ┃ VALUE     ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ ACCELERATE_DYNAMO_BACKEND       │ <not set> │
│ ACCELERATE_DYNAMO_MODE          │ <not set> │
│ ACCELERATE_DYNAMO_USE_DYNAMIC   │ <not set> │
│ ACCELERATE_DYNAMO_USE_FULLGRAPH │ <not set> │
│ ACCELERATE_USE_FSDP             │ <not set> │
│ CUDA_VISIBLE_DEVICES            │ 0,1       │
│ LOCAL_RANK                      │ <not set> │
│ LOCAL_WORLD_SIZE                │ <not set> │
│ OUMI_EXTRA_DEPS_FILE            │ <not set> │
│ OUMI_FORCE_EDITABLE_INSTALL     │ <not set> │
│ OUMI_SLURM_CONNECTIONS          │ <not set> │
│ OUMI_USE_SPOT_VM                │ <not set> │
│ RANK                            │ <not set> │
│ WORLD_SIZE                      │ <not set> │
└─────────────────────────────────┴───────────┘
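
RANK, LOCAL_RANK and WORLD_SIZE show as <not set> above, presumably because this report was collected outside the launcher. A small helper like the following could be run under the same launcher to check what each process actually sees. This is a sketch, not part of Oumi: `distributed_env` is a hypothetical name, and it assumes the launcher exports the usual torch.distributed environment variables, falling back to single-process defaults when they are absent.

```python
import os


def distributed_env(environ=None) -> dict[str, int]:
    """Read rank information from the environment (single-process defaults if unset)."""
    env = os.environ if environ is None else environ
    return {
        "rank": int(env.get("RANK", 0)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
    }


info = distributed_env()
print(f"rank={info['rank']} local_rank={info['local_rank']} world_size={info['world_size']}")
```

With two GPUs I would expect two lines of output, one per process, each reporting world_size=2 and a distinct rank.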

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
   PyTorch information:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

┌────────────────┬───────────────────────┐
│ CUDA available │ True                  │
│ CUDA version   │ 12.4                  │
│ cuDNN version  │ 90.1.0                │
│ Number of GPUs │ 2                     │
│ GPU type       │ NVIDIA A100 80GB PCIe │
│ GPU memory     │ 79.2GB                │
└────────────────┴───────────────────────┘

Labels: bug
