Phi-4 multimodal training, version 1 (with limitations) #1555

optas · 2025-03-18T20:14:54Z

Description

Adds the changes needed to run SFT (vision and text-only) with Phi-4 in Oumi.

Because this uses the vision_language_with_padding collator, training is limited to batch size 1, and multi-image training is not supported.
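The limitation above can be expressed as a simple config check. This is a minimal sketch only: `check_phi4_sft_limits` and its parameters are hypothetical names, not part of Oumi's API; only the collator name `vision_language_with_padding` comes from this PR.

```python
def check_phi4_sft_limits(
    collator_name: str, batch_size: int, max_images_per_example: int
) -> list[str]:
    """Hypothetical validator mirroring the limits stated in this PR.

    Returns a list of human-readable problems (empty if the config is OK).
    """
    problems: list[str] = []
    if collator_name == "vision_language_with_padding":
        if batch_size != 1:
            problems.append(
                f"batch size must be 1 with {collator_name}, got {batch_size}"
            )
        if max_images_per_example > 1:
            problems.append(
                f"multi-image examples are not supported with {collator_name}"
            )
    # Other collators are not checked by this sketch.
    return problems


print(check_phi4_sft_limits("vision_language_with_padding", 1, 1))  # []
print(check_phi4_sft_limits("vision_language_with_padding", 4, 2))
```

Returning a list of problems, rather than raising on the first one, lets a caller report every unsupported setting at once.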

Related issues

Towards OPE-1102

Before submitting

This PR only changes documentation. (You can ignore the following checks in that case)
Did you read the Pull Request guidelines in the contributor guide?
Did you link the issue(s) related to this PR in the section above?
Did you add / update tests where needed?

Reviewers

At least one review from a member of oumi-ai/oumi-staff is required.

configs/recipes/vision/phi4/sft/gcp_job.yaml

src/oumi/core/configs/internal/internal_model_config.py

src/oumi/core/configs/internal/supported_models.py

src/oumi/core/processors/base_processor.py

src/oumi/core/processors/default_processor.py

src/oumi/datasets/chat_templates/phi4-multimodal-instruct.jinja


nikg4 · 2025-03-20T21:55:10Z

configs/recipes/vision/phi4/sft/gcp_job.yaml

+setup: |
+  set -e
+
+  pip install uv && uv pip install oumi[gpu] hf_transfer


Is Phi-4 compatible with the current transformers version?

Yes. The model card at https://huggingface.co/microsoft/Phi-4-multimodal-instruct lists these pinned requirements:
flash_attn==2.7.4.post1
torch==2.6.0
transformers==4.48.2
accelerate==1.3.0
soundfile==0.13.1
pillow==11.1.0
scipy==1.15.2
torchvision==0.21.0
backoff==2.2.1
peft==0.13.2

The one potential issue I see is peft: the model card pins a lower version than the lowest one we allow. I plan to investigate this further when we publish a more LoRA-oriented version.
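The peft concern comes down to comparing a pinned version against a project minimum. A minimal sketch, assuming hypothetical minimums: the `PROJECT_MINIMUMS` values are made up for illustration and are not Oumi's actual constraints. A real check would use `packaging.version.Version`, which implements full PEP 440 ordering.

```python
import re

# Subset of the pins listed on the model card above.
MODEL_CARD_PINS = {
    "flash_attn": "2.7.4.post1",
    "torch": "2.6.0",
    "transformers": "4.48.2",
    "accelerate": "1.3.0",
    "peft": "0.13.2",
}

# Hypothetical project minimums, made up for illustration (not Oumi's real values).
PROJECT_MINIMUMS = {"transformers": "4.45.0", "peft": "0.14.0"}


def release_tuple(version: str) -> tuple[int, ...]:
    """Parse the numeric release part, e.g. '2.7.4.post1' -> (2, 7, 4).

    Pre-, post- and dev-release suffixes are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_lower(a: str, b: str) -> bool:
    """True if release a < release b, treating missing components as 0."""
    ta, tb = release_tuple(a), release_tuple(b)
    width = max(len(ta), len(tb))
    return ta + (0,) * (width - len(ta)) < tb + (0,) * (width - len(tb))


def pins_below_minimum(pins: dict[str, str], minimums: dict[str, str]) -> list[str]:
    """Return (sorted) packages whose pinned version is below the project minimum."""
    return sorted(
        name
        for name, pinned in pins.items()
        if name in minimums and is_lower(pinned, minimums[name])
    )


print(pins_below_minimum(MODEL_CARD_PINS, PROJECT_MINIMUMS))  # ['peft']
```

Padding with zeros keeps "1.3" and "1.3.0" equal; a plain tuple comparison would wrongly rank (1, 3) below (1, 3, 0).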

nikg4 · 2025-03-20T21:59:18Z

src/oumi/core/processors/default_processor.py

+    @override
+    def ignore_features(self) -> list[str]:
+        """Returns a list of keys of features to ignore from feeding the model."""
+        return self._ignore_features if self._ignore_features else []


Consider returning a shallow copy, so that external users can't mutate our member variable (this needs import copy at the top of the module):
return copy.copy(self._ignore_features) if self._ignore_features else []
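To illustrate why the copy matters, here is a minimal sketch with a hypothetical `ProcessorSketch` class (not Oumi's `DefaultProcessor`) that models only `ignore_features()`, plus a hypothetical `drop_ignored` helper showing how a caller might use the returned list to filter features before they reach the model. The feature names are placeholders.

```python
import copy
from typing import Any, Optional


class ProcessorSketch:
    """Hypothetical stand-in; models only the ignore_features() getter."""

    def __init__(self, ignore_features: Optional[list[str]] = None) -> None:
        # Stored as given, matching the diff; the constructor-side copy is separate.
        self._ignore_features = ignore_features

    def ignore_features(self) -> list[str]:
        """Returns a shallow copy, so callers can't mutate internal state."""
        return copy.copy(self._ignore_features) if self._ignore_features else []


def drop_ignored(features: dict[str, Any], ignore: list[str]) -> dict[str, Any]:
    """Hypothetical helper: keep only the features the model should receive."""
    return {key: value for key, value in features.items() if key not in ignore}


proc = ProcessorSketch(["feature_b"])
ignored = proc.ignore_features()
ignored.append("input_ids")  # mutates the returned copy only
print(proc.ignore_features())  # ['feature_b']
print(drop_ignored({"input_ids": [1, 2], "feature_b": [0]}, proc.ignore_features()))
# {'input_ids': [1, 2]}
```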

nikg4 · 2025-03-20T22:01:02Z

src/oumi/core/processors/default_processor.py

@@ -75,6 +79,7 @@ def __init__(
                self._worker_processor.image_processor
            )
        self._label_ignore_index: Optional[int] = label_ignore_index
+        self._ignore_features: Optional[list[str]] = ignore_features


Consider making our own private copy, so that external users can't mutate our member variable by later modifying the list they passed in:

self._ignore_features: Optional[list[str]] = copy.copy(ignore_features) if ignore_features else []
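The constructor-side aliasing can be demonstrated in isolation. A minimal sketch with two hypothetical classes: `StoresReference` keeps the caller's list directly (as in the diff above), while `StoresCopy` keeps a private shallow copy (as suggested).

```python
import copy
from typing import Optional


class StoresReference:
    def __init__(self, ignore_features: Optional[list[str]] = None) -> None:
        # Aliases the caller's list: later edits by the caller leak into this object.
        self._ignore_features: Optional[list[str]] = ignore_features


class StoresCopy:
    def __init__(self, ignore_features: Optional[list[str]] = None) -> None:
        # Private shallow copy, as suggested in the review comment.
        self._ignore_features: Optional[list[str]] = (
            copy.copy(ignore_features) if ignore_features else []
        )


caller_list = ["feature_a"]
by_ref = StoresReference(caller_list)
by_copy = StoresCopy(caller_list)
caller_list.append("feature_b")
print(by_ref._ignore_features)   # ['feature_a', 'feature_b']
print(by_copy._ignore_features)  # ['feature_a']
```

A shallow copy is enough here because the elements are immutable strings; only the list container needs to be distinct.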

nikg4 · 2025-03-20T22:02:58Z

configs/recipes/vision/phi4/sft/README.md

+Configs for Phi-4-multimodal-instruct 5.6B model. See https://huggingface.co/microsoft/Phi-4-multimodal-instruct
+
+This is a multimodal model that combines text, visual, and audio inputs.
+It uses a "Mixture of LoRAs" approach, allowing you to plug in adapters for each


If there is a relevant paper for "Mixture of LoRAs" and/or the model itself, please cite it here.
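For readers unfamiliar with the general idea behind the README's "Mixture of LoRAs" wording, a per-modality LoRA adapter can be sketched as a frozen base weight plus a low-rank update chosen by input modality: y = W x + (alpha / r) B A x. This is a toy illustration of the technique using plain Python lists; the adapter names, shapes, and the rule that text uses the base weight only are assumptions for illustration, not Phi-4's actual implementation.

```python
# Toy per-modality LoRA: y = W x + (alpha / r) * B (A x), with W frozen.
Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(
    w: Matrix,
    adapters: dict[str, tuple[Matrix, Matrix]],
    x: Vector,
    modality: str,
    alpha: float,
) -> Vector:
    """Apply the frozen base weight plus the adapter for `modality`, if any.

    Each adapter is (A, B) with A of shape r x d_in and B of shape d_out x r.
    Modalities without an adapter use the base weight only.
    """
    base = matvec(w, x)
    if modality not in adapters:
        return base
    a, b = adapters[modality]
    rank = len(a)
    delta = matvec(b, matvec(a, x))
    return [y + (alpha / rank) * d for y, d in zip(base, delta)]


w = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 identity
adapters = {
    "vision": ([[1.0, 1.0]], [[1.0], [0.0]]),  # hypothetical rank-1 adapter
}
x = [2.0, 3.0]
print(lora_forward(w, adapters, x, "text", alpha=1.0))    # [2.0, 3.0]
print(lora_forward(w, adapters, x, "vision", alpha=1.0))  # [7.0, 3.0]
```

Because W never changes, adapters for different modalities can be trained independently and swapped in without touching the base model.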

optas added 10 commits February 27, 2025 00:15

init

c192d05

remove draft notebook

e6202c2

Merge remote-tracking branch 'origin/main' into optas/phi-4-multimodal

b725dbc

Merge remote-tracking branch 'origin/main' into optas/phi-4-multimodal

0752536

update/temp

3387711

Merge remote-tracking branch 'origin/main' into optas/phi-4-multimodal

ad47b5e

Merge remote-tracking branch 'origin/main' into optas/phi-4-multimodal

fe983e6

long-due-update

3146f78

Merge remote-tracking branch 'origin/main' into optas/phi-4-multimodal

5ae68fd

update

793d427

optas requested a review from nikg4 March 18, 2025 20:15

remove assertion

8b5787e