
New model export [call for feedback and help] #322

Merged: 6 commits from feature/export into master on Aug 20, 2023
Conversation

karpathy (Owner)

I took the new grouped int8 export from my int8 PR #312 and made it just a new "version 1". I think we'll be in a world with a few output .bin formats, so I'm centralizing things to a new export.py file. And this will allow us to slowly introduce new formats and deprecate the old ones over time, too.

@kroggen I think you might want to include your own export here, since it only quantizes at the level of tensors?

We'd also want to move over the Llama export and the HF export into this file, I think, and then delete those files. I have to think this through a little bit more still.
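As a concrete illustration of the centralization idea, one way export.py could be organized is a single entry point that dispatches on a format version. This is only a sketch under assumed names (model_export, legacy_export, version1_export), not necessarily how the file is actually laid out:

    def legacy_export(model, filepath):
        ...  # writer for the original model.bin layout ("version 0")

    def version1_export(model, filepath):
        ...  # writer for a newer format, e.g. the grouped int8 export from #312

    def model_export(model, filepath, version=0):
        """ hypothetical single entry point: pick the writer for the requested .bin version """
        writers = {0: legacy_export, 1: version1_export}
        if version not in writers:
            raise ValueError(f"unsupported export version: {version}")
        writers[version](model, filepath)

With a dispatcher like this, new formats can be added and old ones deprecated by editing the table rather than scattering writers across files.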

@karpathy (Owner, Author)

A few more questions on my mind, as exports are very annoying to change in the future:

  • Think through the header design (a rough sketch of one possibility follows below)
  • Think through cache alignment: is version 1 well-aligned?
  • Tokenizer inclusion? I noticed that in llama.cpp the tokenizer is included in the .bin file. This is nice because the .bin file becomes the only self-contained artifact you need. But it's a bit more annoying to pipe the tokenizer through, it bloats the files a little, and aesthetically I feel like the "integer inference engine" is what a Llama model fundamentally is; the Tokenizer is a whole separate artifact that translates strings <-> ints. So I like that separation.

My plan is to merge this first, and then think through how to merge the int8 quantization version, and maybe the other versions (cuda, etc.).
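On the header and alignment questions, here is a minimal sketch of what a versioned header padded to 256 bytes could look like. The magic value, field order, and the config fields used are assumptions for illustration (config is assumed to carry dim, hidden_dim, n_layers, n_heads, n_kv_heads, vocab_size, max_seq_len), not the actual version 1 layout:

    import struct

    def write_header(out_file, config, version=1):
        # hypothetical header writer: pad so the weights that follow start on a 256-byte boundary
        out_file.write(struct.pack('I', 0x616b3432))   # magic number (illustrative value)
        out_file.write(struct.pack('i', version))      # format version
        out_file.write(struct.pack('7i', config.dim, config.hidden_dim, config.n_layers,
                                   config.n_heads, config.n_kv_heads, config.vocab_size,
                                   config.max_seq_len))
        out_file.write(b'\0' * (-out_file.tell() % 256))  # zero-pad up to the next multiple of 256

A fixed-size header with a magic number and version lets readers reject files they don't understand, and the padding puts the first weights at a cache-friendly, mmap-friendly offset.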

@ozabluda (Contributor)

Probably a lot can be learned from

https://github.com/philpax/ggml/blob/gguf-spec/docs/gguf.md
ggerganov/ggml#302

nbytes += 1
pad = 256 - nbytes # pad the rest with zeros
assert pad >= 0
out_file.write(b'\0' * pad)
A Contributor commented on the lines above:

If you do this, you don't have to keep track of nbytes, which is error-prone.

Suggested change
out_file.write(b'\0' * pad)
out_file.write(b'\0' * (-out_file.tell() % 256) )
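For anyone puzzling over the one-liner: -out_file.tell() % 256 is exactly the number of zero bytes needed to reach the next multiple of 256 (and 0 if the position is already aligned), so the pad amount comes straight from the file position instead of a separately tracked byte count. A quick self-contained check of the arithmetic (illustrative only, not code from the PR):

    import io

    buf = io.BytesIO()
    buf.write(b'x' * 300)                     # pretend 300 bytes of header have been written
    buf.write(b'\0' * (-buf.tell() % 256))    # pad up to the next 256-byte boundary
    assert buf.tell() == 512                  # 300 rounds up to 512
    assert -512 % 256 == 0                    # an already-aligned position needs no padding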

karpathy (Owner, Author) replied:

TIL thank you!

@atamurad (Contributor)

atamurad commented Aug 19, 2023

> We'd also want to move over the Llama export and the HF export into this file, I think, and then delete those files. I have to think this through a little bit more still.

We can load/map Meta and HF models into the model.py format and let model_export() handle the quantization and serialization part. But before that, it would be good to have optional (non-shared) final classifier weights in model.py. I can submit a PR if you're interested (1. separate wcls in model.py, 2. export wcls, 3. update the old export scripts).

UPDATE: I noticed that the model.py Transformer class has an output attribute, so no changes required in model.py.

@karpathy (Owner, Author)

@atamurad yes, exactly. I think we should load the HF/Meta models into Transformer and then export as usual. I haven't yet looked at how this could be achieved. Maybe the export.py file could also gain some import utilities like this.

@ozabluda great links!
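A rough sketch of what such an import utility might look like, assuming the Hugging Face LlamaConfig attribute names and that model.py's ModelArgs/Transformer accept roughly these fields; the key renaming and permutation handling are only described in comments, so treat this as the shape of the idea rather than working import code:

    from transformers import AutoModelForCausalLM   # assumes the HF `transformers` package
    from model import ModelArgs, Transformer        # this repo's model.py

    def load_hf_into_transformer(hf_path):
        """ hypothetical import utility: build a model.py Transformer from an HF Llama checkpoint """
        hf = AutoModelForCausalLM.from_pretrained(hf_path)
        cfg = hf.config
        args = ModelArgs(dim=cfg.hidden_size, n_layers=cfg.num_hidden_layers,
                         n_heads=cfg.num_attention_heads, n_kv_heads=cfg.num_key_value_heads,
                         vocab_size=cfg.vocab_size, max_seq_len=cfg.max_position_embeddings)
        model = Transformer(args)
        # remaining work (omitted): rename each HF key, e.g.
        #   model.layers.0.self_attn.q_proj.weight -> layers.0.attention.wq.weight,
        # undo the wq/wk permutation HF applies for its RoPE layout, load the renamed
        # state dict into `model`, and then hand the model to model_export() as usual.
        return model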

@karpathy (Owner, Author)

I think I am going to merge this branch to master because the previous functionality around the old model.bin files (now termed "version 0") works without issues. There is a bunch of new code for the version 1 and 2 exports that is currently unused, but it can be tweaked in further PRs, and we can slowly migrate the run files (including run.c) to use it instead. Eventually, when we're happy with v1, v2, ..., we'll deprecate v0 and delete the legacy code.

@karpathy karpathy marked this pull request as ready for review August 20, 2023 17:06
@karpathy karpathy merged commit 8c93c7a into master Aug 20, 2023
6 checks passed
""" writes one int8 tensor to file that is open in wb mode """
d = tensor.detach().cpu().view(-1).numpy().astype(np.int8)
b = struct.pack(f'{len(d)}b', *d)
file.write(b)
A Contributor commented on the lines above:

Copied from #312 (comment)

Pad with zeroes to align on a 64-byte boundary (128 would be better, see #312 (comment)); maybe 256 (the current alignment of the first weights after the header) would be even better. This is a sketch. The same thing should be done in serialize_fp32() above and in the C files.

    file.write(b'\0' * (-file.tell() % 64) )
    file.write(b)
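Putting that review suggestion together with the serialize_int8 body above, a padded writer could look like the following. This is just a sketch: the function name, the 64-byte default, and the align parameter are assumptions, and the C-side readers would have to skip the same padding:

    import struct
    import numpy as np

    def serialize_int8_aligned(file, tensor, align=64):
        """ hypothetical variant: zero-pad to the next `align`-byte boundary, then write the int8 tensor """
        file.write(b'\0' * (-file.tell() % align))   # no-op when the position is already aligned
        d = tensor.detach().cpu().view(-1).numpy().astype(np.int8)
        file.write(struct.pack(f'{len(d)}b', *d))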

@kroggen (Contributor)

kroggen commented Aug 22, 2023

> @kroggen I think you might want to include your own export here, since it only quantizes at the level of tensors?

I'm sorry, I am not well versed in Python/numpy/pytorch

@karpathy karpathy deleted the feature/export branch August 24, 2023 03:16
vinhtran2611 pushed a commit to vinhtran2611/llama2.c that referenced this pull request Jan 20, 2024
New model export (the code remains "dead" and legacy version is still the default behavior, so no breaking changes are introduced). The major benefit is a new export.py file, which we can use to centralize work on formatting: both imports and exports.