Do parameter count calculations in 64 bits to not overflow in case of… #367

janimo · 2023-08-27T09:47:57Z

… very large models

This is based on the discussions and code in #111, #154 and #164, so all credit goes to their authors. It is tested with the 13B and 34B Code Llama models, and it seems to be the minimal conceptual change needed.
The alternative to declaring the new n_layers variable is an explicit (unsigned long)p->n_layers cast in each of the 5 multiplications.

The p->n_layers*p->n_dim product could also be precomputed in unsigned long and reused, since it appears in all of the multiplications, but that would lose the pedagogical value of having the order of operations reflect the order of the matrix dimensions in the cases where dim comes last.
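
To illustrate the overflow being discussed, here is a minimal sketch rather than the actual patch. The trimmed-down `Config` struct and the helper names `ffn_weight_count` and `exceeds_int` are hypothetical, and the shape in the usage note (dim 5120, hidden_dim 13824, 40 layers) is assumed as a 13B-sized example. The point is that once the leftmost operand is 64 bits wide, every later `int` operand is converted to that type before its multiplication.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical, trimmed-down config; field names are assumptions for this sketch. */
typedef struct {
    int dim;        /* transformer dimension */
    int hidden_dim; /* feed-forward hidden dimension */
    int n_layers;   /* number of layers */
} Config;

/* Number of floats in one stacked (n_layers, hidden_dim, dim) weight tensor. */
unsigned long long ffn_weight_count(const Config *p) {
    assert(p->dim > 0 && p->hidden_dim > 0 && p->n_layers > 0);
    /* Widen the first operand once; multiplication is evaluated left to
       right, so each following int is converted to unsigned long long. */
    unsigned long long n_layers = (unsigned long long)p->n_layers;
    return n_layers * p->hidden_dim * p->dim;
}

/* Returns 1 if the count cannot be represented in a plain int. */
int exceeds_int(unsigned long long count) {
    return count > (unsigned long long)INT_MAX;
}
```

With the assumed 13B-sized shape, the product is 40 * 13824 * 5120 = 2,831,155,200, which is larger than a 32-bit INT_MAX of 2,147,483,647, so the same expression evaluated purely in int would overflow.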

kroggen · 2023-08-29T14:39:52Z

Just a reminder that long is only 32 bits wide on MSVC and MinGW (sizeof(long) is 4), even on 64-bit targets

janimo · 2023-08-29T14:48:36Z

@kroggen thank you, updated to unsigned long long.
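
The portability point can be checked with a small sketch like the one below. The helper names `unsigned_long_bits` and `unsigned_long_long_bits` are hypothetical and not part of the patch. The C standard only guarantees that unsigned long has at least 32 bits, while unsigned long long has at least 64, which is why the wider type is the safer choice here.

```c
#include <limits.h>

/* Width in bits of unsigned long on the current compiler and target.
   Typically 64 on 64-bit Linux/macOS, 32 on MSVC and MinGW. */
int unsigned_long_bits(void) {
    return (int)(sizeof(unsigned long) * CHAR_BIT);
}

/* Width in bits of unsigned long long; at least 64 on any conforming compiler. */
int unsigned_long_long_bits(void) {
    return (int)(sizeof(unsigned long long) * CHAR_BIT);
}

/* C11 compile-time check that the type used for the counts is wide enough. */
_Static_assert(sizeof(unsigned long long) * CHAR_BIT >= 64,
               "unsigned long long must hold at least 64 bits");
```

Printing both widths on each target platform is a quick way to confirm whether a plain long would have been sufficient there.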

karpathy · 2023-09-01T17:07:53Z

thank you @janimo

Do parameter count calculations in 64 bits to not overflow in case of…

1ebb27f

… very large models

Use long long so it works with MSVC

c5ec6e2

karpathy merged commit b9fb861 into karpathy:master Sep 1, 2023
6 checks passed

This was referenced Sep 1, 2023

Fix 13B #111

Closed

fix overflow with 13B model #154

Closed

13B doesn't work for unknown reasons #164

Closed

vinhtran2611 pushed a commit to vinhtran2611/llama2.c that referenced this pull request Jan 20, 2024

Merge pull request karpathy#367 from janimo/long-multiply

e1e3552

Do parameter count calculations in 64 bits to not overflow in case of…
