Starcoder 2 NeMo to HF Checkpoint Converter Is Crashing For Models Trained With Sequence Parallelism #14302

@evellasques

Description

Describe the bug
For the NeMo 2502 container, when one tries to convert a StarCoder2 (SC2) checkpoint trained with sequence parallelism, the NeMo checkpoint converter script NeMo/scripts/checkpoint_converters/convert_starcoder2_nemo_to_hf.py crashes with a `Can not use sequence paralllelism without tensor parallelism` error.

This happens because the script hard-codes model_config.tensor_model_parallel_size = 1, and with sequence parallelism still enabled in the restored checkpoint config, TP=1 trips Megatron's consistency check that sequence parallelism requires tensor parallelism. Moreover, for certain GPU/model-size configurations, one might need a higher TP size anyway.
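
A minimal sketch of one possible workaround: since sequence parallelism is a training-time optimization that shards activations but does not change the saved weights, the converter could switch it off for the single-rank conversion pass. The sequence_parallel attribute name below is an assumption based on the standard Megatron/NeMo config field, not the script's confirmed API:

```python
# Hypothetical patch inside convert_starcoder2_nemo_to_hf.py (sketch only).
model_config.tensor_model_parallel_size = 1  # existing hard-coded value
# Assumed field name: disabling sequence parallelism avoids the
# "Can not use sequence paralllelism without tensor parallelism" check,
# and SP does not affect the exported weights.
model_config.sequence_parallel = False
```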

Steps/Code to reproduce bug

  • Launch a NeMo 2502 container
  • Train a StarCoder2 model using sequence parallelism and TP=2
  • Run the NeMo/scripts/checkpoint_converters/convert_starcoder2_nemo_to_hf.py script to convert it to HF:
python /opt/NeMo/scripts/checkpoint_converters/convert_starcoder2_nemo_to_hf.py --input_name_or_path mymodel.nemo --output_path mymodel_hf/ --hf-model-name /pretrained/huggingface/starcoder2-7b/ --precision bf16
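
To confirm that sequence parallelism and TP=2 are actually recorded in the checkpoint, one can inspect the config stored inside the .nemo archive. A small sketch, assuming the archive contains a model_config.yaml (as .nemo checkpoints typically do); the field names are the standard NeMo/Megatron keys:

```python
import tarfile
import yaml

# A .nemo checkpoint is a tar archive; one member is the model config YAML.
with tarfile.open("mymodel.nemo") as tar:
    member = next(m for m in tar.getmembers()
                  if m.name.endswith("model_config.yaml"))
    cfg = yaml.safe_load(tar.extractfile(member))

# Assumed keys, matching the standard NeMo/Megatron config names.
print("sequence_parallel:", cfg.get("sequence_parallel"))
print("tensor_model_parallel_size:", cfg.get("tensor_model_parallel_size"))
```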

Expected behavior

The script should convert the model to HF format.
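
Once conversion succeeds, a quick sanity check is to load the exported checkpoint with transformers; a sketch assuming the mymodel_hf/ output directory from the command above:

```python
import torch
from transformers import AutoModelForCausalLM

# Load the converted checkpoint to verify the export is well-formed.
model = AutoModelForCausalLM.from_pretrained("mymodel_hf/",
                                             torch_dtype=torch.bfloat16)
print(model.config.model_type)  # expected: "starcoder2"
```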

Environment overview (please complete the following information)

  • Environment location: Docker
  • Method of install: NeMo 2502 container


Metadata

Labels: bug (Something isn't working)