Commit 19ef6c4

introduce lofi and hifi commands for train, generate. Add Data Mixing

This enhancement discusses "more intensive" training and data generation techniques, as well as a new Data Mixing command, all built off of the command redesign. The goal is to produce higher fidelity models using the CLI.

Signed-off-by: Charlie Doern <[email protected]>

1 parent 3f447c4 commit 19ef6c4

1 file changed: docs/lofi-hifi-backends.md (+212 -0)

# Introduce Commands That Run Jobs at Different Fidelity Levels for Key ilab Functions

This document describes adding different data generation, mixing, and training backends to ilab to enable higher fidelity training using the backend code.

Currently, all training is done via QLoRA or similar techniques. Adding the following commands will enable higher fidelity training and introduce new capabilities such as data mixing.

## Key Components

### Building off of the InstructLab Structure Redesign

After github.com/instructlab/instructlab/pull/990 is merged, ilab will use a parent -> child command structure. This proposal operates under that new structure.

This is the proposed new structure under the above enhancement:

```console
ilab
|
|_______model
|       |
|       |____convert
|       |____download
|       |____train (--convert)
|       |____serve (-i)
|       |____chat
|       |____inference
|       |
|_______data
|       |
|       |____generate
|       |
|_______config
|       |
|       |____init
|       |
|_______tax
|       |
|       |____diff
|       |____check
|       |____download
```

And this would be the structure after these new commands are added:

```console
ilab
|
|_______model
|       |
|       |____convert
|       |____download
|       |____train (--convert)
|       |    |
|       |    |______integrated *
|       |    |______phased *
|       |
|       |____serve (-i)
|       |____chat
|       |____inference
|       |
|_______data
|       |
|       |____generate
|       |    |
|       |    |______lofi * (name pending)
|       |    |______hifi * (name pending)
|       |
|       |____mix *
|       |
|_______checkpoint
|       |
|       |____evaluate *
|       |
|_______config
|       |
|       |____init
|       |
|_______tax
|       |
|       |____diff
|       |____check
|       |____download
```

The starred commands are the new ones under the redesigned structure. We are opting to add integrated and phased as commands under train, and lofi/hifi under generate, rather than as flags, given their completely different functionality and backend structure.

These commands would connect to the InstructLab backend, which will be in the form of libraries. The lower fidelity commands, if accepted, will be the equivalent of what currently exists for generate and train: QLoRA with PyTorch, MLX, etc.

The higher fidelity versions would validate the existence of hardware that can properly run the generation, mixing, and training backends. At least for training, the existing infrastructure simply shells out to various Python scripts, libraries, etc. So, as long as we consolidate this backend code into a place that can be imported into ilab without breaking other dependencies, this should be more of a structural change than a functional one. We know the backend code works on an isolated system; we just need to make it pluggable.

High fidelity can run locally on someone's laptop or desktop and can even utilize DeepSpeed if they have GPUs. On a more powerful system, the user can also run it in a container, utilize DeepSpeed, and potentially even distribute the workload across machines using torch.distributed.
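
As a loose illustration of what that hardware validation could look like, here is a minimal sketch; `MIN_VRAM_GIB` and `preflight_check` are hypothetical names for this document, not part of any existing ilab code:

```python
# Hypothetical preflight sketch: confirm an accelerator exists and has
# enough memory before dispatching to a high fidelity backend.
import torch

MIN_VRAM_GIB = 16  # illustrative threshold, not a real ilab requirement


def preflight_check() -> None:
    if not torch.cuda.is_available():
        raise RuntimeError("high fidelity commands need at least one CUDA/ROCm device")
    for idx in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(idx)
        vram_gib = props.total_memory / 2**30
        if vram_gib < MIN_VRAM_GIB:
            raise RuntimeError(
                f"GPU {idx} ({props.name}): {vram_gib:.1f} GiB VRAM, "
                f"need >= {MIN_VRAM_GIB} GiB"
            )
```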
A key component here, too, is checkpoint evaluation. We want users to be able to understand model checkpoints, starting with the ability to run evaluation on them in between phased training runs. The output of `ilab checkpoint evaluate` will tell the user what to point the `ilab model train phased` arguments towards.
### Reasoning
Plugging into hardware acceleration and multi-phase training is the logical next step for ilab. Ensuring we do this in a clean way that does not overload our current commands is also crucial. Many of the processes in the backend are confusing, so we want to abstract some of the steps away from users while also giving them a reasonable amount of choice in configuring these new processes. However, maintaining the current laptop story is important to users without hardware access. Splitting these two paths into separate commands maintains the integrity of each.
### ilab model train integrated

This command would take something like the following arguments:

* `--gpus=str`, the number of GPUs (of those available) to use for this process. This comes in the form of: 0-1, 8, etc.
* `--quantize=bool`, enables QLoRA, which loads the model in a quantized form so it can fit on a consumer GPU
* `--optimizer=str` (deepspeed, fsdp), the optimizer framework to use during training
* `--learning-rate=int` (?)
* `--batch-len=int`
* `--input-dir=str`, where the generated+mixed data to be trained on lives
* `--model-name=str`, the name of the model to be output
* `--output-dir=str`, where to put the model after training
* `--num-epochs=int`, the number of epochs to run in this phase of training
* `--device=str`, the accelerator to use: cuda, rocm, cpu, mlx, mps
* plus many of the current train flags, probably

#### Implementation Specifics
The Transformers library (which is what we currently use for training, along with torch) has options for FSDP, DeepSpeed, and many of the related arguments currently in use elsewhere in the project. My idea here is to use these as best we can in the CLI rather than make our own custom versions that we then have to maintain.
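
As a rough sketch of that idea (not the actual implementation), the proposed flags could map almost one-to-one onto stock `TrainingArguments`; `fsdp` and `deepspeed` are real parameters there, while the model id, data path, and flag-to-argument mapping shown in the comments are assumptions:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_ID = "ibm/merlinite-7b"  # placeholder base model

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# --input-dir: the generated+mixed data, assumed here to be jsonl with a
# "text" column for the sake of the sketch.
dataset = load_dataset("json", data_files="generated/mixed/train.jsonl")["train"]
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True), batched=True)

args = TrainingArguments(
    output_dir="training_output",     # --output-dir
    num_train_epochs=5,               # --num-epochs
    learning_rate=2e-5,               # --learning-rate
    per_device_train_batch_size=8,    # --batch-len (assumed mapping)
    fsdp="full_shard auto_wrap",      # --optimizer=fsdp (launch via torchrun/accelerate)
    # deepspeed="ds_config.json",     # --optimizer=deepspeed instead of fsdp
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```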
There might be a use case for kicking off some custom DeepSpeed code for more in-depth inter-checkpoint eval. That would be the point of the `phased` command. The phased train command is meant to be run in conjunction with `ilab checkpoint evaluate`, which will give the user the best checkpoint to run the next phase on.
Keep in mind, though, this is the community CLI! I feel as though we should try to find a middle ground between server use cases and community use cases. Having the default path in the InstructLab CLI be torch+transformers makes sense for the following use cases:

1. Developer with a gaming PC:
   * Transformers+PyTorch support QLoRA and FSDP. While DeepSpeed might be more of a "server-rack" use case, having multi-phase training in the CLI for anyone with a consumer GPU makes sense.
2. Someone interested in ML, with a homelab, or *anything with 2 GPUs*:
   * Transformers+PyTorch supports DeepSpeed on a single system, spreading the training over the GPUs. Any professional or hobbyist with 2 GPUs will be looking for this option.
3. The laptop use case:
   * Maintaining QLoRA as the performant training mode for the laptop is crucial, as most people cannot handle the full models. However, unlocking better results by using FSDP+QLoRA could improve local results and get people more interested in InstructLab.

The above use cases create a spectrum of possibilities for what users can do with ilab! Adding `ilab model train integrated/phased`, each with different options for optimizers: Adam (default, QLoRA), FSDP (QLoRA or non-QLoRA train), and DeepSpeed (non-QLoRA, multi-GPU), increases the number of situations where ilab is viable. Adding other options like --multi-phase, --learning-rate, etc. gives the user granular control over this new multi-phased training approach.
### ilab model train phased

This command would take roughly the following arguments:

* `--optimizer=str` (deepspeed, fsdp), the optimizer framework to use during training
* `--device=str`, the accelerator to use: cpu, cuda, rocm, mlx, mps
* `--model-dir=path`, the dir where the model to be used in this phase is located
* `--data-dir=path`, the dir where the data for this phase is located

Note there is no LoRA in this command, and there is no quantization. The `ilab model train integrated` command will use the transformers library with PyTorch because those have excellent plugins for DeepSpeed and FSDP *with* LoRA and QLoRA. Those are absolute necessities for the community use case. However, they will be mostly unused in the "High Fidelity" use case.
### ilab data generate hifi

This command would take something like the following arguments:

* `--num-samples=int`
* `--num-grounded-questions=int`
* `--num-gen-proc=int` (see the sketch below)
* `--num-util-proc=int` (or is this for mixing?)
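
The generation backend's internals are not specified here, but as a hedged sketch of what `--num-gen-proc` might control, seed examples could be fanned out over a process pool; `generate_for_seed` is a hypothetical stand-in for the real backend call:

```python
# Hypothetical sketch of --num-gen-proc: spread generation work across
# worker processes. generate_for_seed stands in for the real backend
# call, which this proposal does not define.
from concurrent.futures import ProcessPoolExecutor


def generate_for_seed(seed_example: dict, num_samples: int) -> list[dict]:
    """Stand-in: ask the teacher model for num_samples synthetic examples."""
    raise NotImplementedError


def generate(seed_examples: list[dict], num_samples: int, num_gen_proc: int) -> list[dict]:
    results: list[dict] = []
    with ProcessPoolExecutor(max_workers=num_gen_proc) as pool:
        futures = [
            pool.submit(generate_for_seed, seed, num_samples)
            for seed in seed_examples
        ]
        for future in futures:
            results.extend(future.result())
    return results
```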
### ilab data generate lofi

This command would be the same as the existing `ilab generate`.

### ilab data mix

This command would take something like the following arguments:

* `--num-util-proc=int`
* `--output-dir=str` (defaults to generated/mixed)
* `--knowledge-recipes=[]str` (path to yaml)
* `--skill-recipes=[]str` (path to yaml)

*Do we need an `ilab recipe` cmd?*
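
The recipe format is an open question (see above), but as a purely hypothetical sketch, assuming a recipe YAML that lists dataset paths with sampling ratios, mixing could look like this (`--num-util-proc` would presumably parallelize the dataset reads, omitted here for brevity):

```python
# Hypothetical sketch of `ilab data mix`. The recipe schema used here
# (datasets: [{path, sampling_ratio}]) is an assumption; this proposal
# does not define the real format.
import json
import random

import yaml  # PyYAML


def mix(recipe_paths: list[str], output_path: str) -> None:
    mixed: list[dict] = []
    for recipe_path in recipe_paths:
        with open(recipe_path) as f:
            recipe = yaml.safe_load(f)
        for entry in recipe["datasets"]:
            with open(entry["path"]) as f:
                rows = [json.loads(line) for line in f]
            # sampling_ratio assumed to be in [0, 1]
            keep = int(len(rows) * entry["sampling_ratio"])
            mixed.extend(random.sample(rows, keep))
    random.shuffle(mixed)
    with open(output_path, "w") as f:
        for row in mixed:
            f.write(json.dumps(row) + "\n")
```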
## Workflows
### ilab model train integrated

A user on a desktop with a consumer GPU (assume RTX 20/30 series) would run something like:

`ilab model train integrated --device=cuda --quantize --optimizer=deepspeed --num-epochs=5`

This would:

1. load the model in 4-bit quantized form into the GPU's VRAM (sketched below)
2. set up the transformers trainer with this model, and with a hardcoded DeepSpeed config ilab would come with
3. train for 5 epochs using FusedAdam as the optimizer, with DeepSpeed on top of that
4. give you a model in safetensors format (we cannot convert a quantized safetensors model)
The big advantage here is more "fine tuned" training than currently exists in the CLI, because of DeepSpeed (or FSDP). The user could even set this up for multi-GPU or multi-system support with future ilab enhancements.
### ilab model train phased

A user would run something like the following (assuming phase00 has already run) on a GPU-enabled server:

`ilab model train phased --device=cuda --model-dir=./phase00/model --data-dir=./phase00/data`
`ilab checkpoint evaluate ./phase05/checkpoints --output-dir=./phase10`
`ilab model train phased --device=cuda --model-dir=./phase10/model --data-dir=./phase10/data`
....

Basically, they would run phased training with an eval in between. The eval looks at the checkpoints output by the previous phase and outputs a model dir in the next phase's working directory.
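
A minimal sketch of that handoff, with `score_checkpoint` as a hypothetical stand-in for whatever evaluation the backend actually runs:

```python
# Hypothetical sketch of `ilab checkpoint evaluate`: score the previous
# phase's checkpoints and promote the best one into the next phase's
# working directory. score_checkpoint stands in for the real evaluation,
# which this proposal does not define.
import shutil
from pathlib import Path


def score_checkpoint(checkpoint_dir: Path) -> float:
    """Stand-in: higher is better (e.g. an eval-set metric)."""
    raise NotImplementedError


def evaluate(checkpoints_dir: Path, output_dir: Path) -> Path:
    checkpoints = [p for p in checkpoints_dir.iterdir() if p.is_dir()]
    best = max(checkpoints, key=score_checkpoint)
    dest = output_dir / "model"
    shutil.copytree(best, dest)  # becomes the next phase's --model-dir
    return dest
```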
## Alternatives
The alternative is to keep the same train and generate commands and instead add a --backend or --hifi flag to trigger the high fidelity code. The issue is that ilab train is already overloaded with PyTorch, MLX, etc. Adding more switches and dials to the main train code would make it hard to maintain.