Description
Feature request
Oumi should provide a way to organize and compare experiments across different model architectures or architecture revisions. Hyperparameter tracking is supported through Weights & Biases (wandb) and TensorBoard, but these are insufficient when the model architecture itself changes (core underlying layers or components).
Motivation / references
As a user of Oumi, I often run many experiments with different models or different architecture revisions (changing attention implementations, activation functions, normalization layers, etc.).
Right now, there is no easy way to track or compare performance between runs when modifying these aspects. Hyperparameters are tracked and the model summary can be written to the logs, but these mechanisms aren't sufficient to organize and study model revisions that differ in layer types, parameter counts, and other structural properties.
From wolffrost:
This is more a general question about managing artifact versions and best practices. Let me just give some context for my experiments. I started with a gpt2 causal model because I wanted to be able to compare with a known baseline and make sure my general model was behaving as expected. Then I added multihead attention. This morning I switched to RMSNorm and flash attention. Generally speaking, each version of the model requires a different trained model. How do people manage the different versions of the models with the different trained artifacts? I'm trying to balance "go fast" with "be able to regress and compare". I'm using wandb, so I need to distinguish those logs as well. I'm probably looking for a free lunch here that doesn't exist 🙂
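Until there is first-class support for this, one possible stopgap is to log a small architecture descriptor into each run's wandb config, group, and tags so runs can be grouped and filtered in the W&B UI. This is only a rough sketch using the public wandb API; the descriptor fields (`base`, `attention`, `norm`, `param_count`) and the project name are made up for illustration and are not part of Oumi or wandb.

```python
# Sketch of a manual workaround: record architecture metadata alongside
# hyperparameters so runs with different architectures stay comparable.
import wandb

# Hypothetical architecture descriptor; adjust fields to whatever you vary.
arch = {
    "base": "gpt2",
    "attention": "flash",       # e.g. "eager", "sdpa", "flash"
    "norm": "rmsnorm",          # e.g. "layernorm", "rmsnorm"
    "param_count": 124_000_000,
}

run = wandb.init(
    project="arch-experiments",  # hypothetical project name
    group=f'{arch["base"]}-{arch["attention"]}-{arch["norm"]}',
    tags=[arch["attention"], arch["norm"]],
    config=arch,                 # appears as filterable columns in the W&B UI
)

# ... training loop ...
wandb.log({"train/loss": 0.0})   # placeholder metric
run.finish()
```

The same descriptor could also be written to a small JSON file next to each checkpoint, so trained artifacts stay matched to the architecture revision that produced them.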
Your contribution
N/A