-
Notifications
You must be signed in to change notification settings - Fork 10.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
llama : refactor src/llama.cpp
#10902
Conversation
I think |
524886b
to
7ab08d5
Compare
be8f568
to
dcbfda1
Compare
ba48e37
to
0ccae21
Compare
1e7e338
to
597ae05
Compare
ggml-ci
1521f9e
to
c16630e
Compare
ggml-ci
391a111
to
089cf4a
Compare
ggml-ci
089cf4a
to
e06d267
Compare
ggml-ci
I think this is a good place to merge this change. The project builds faster now and hopefully the code is organized a bit better. Will continue refactoring in follow-up PRs and any suggestions and recommendations are welcome. I've left some TODOs around the code and will try to address those next. After that will be looking for ways to separate the KV cache from the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, thanks for taking time during the holidays to finish this. Happy new year btw 🎉
common/common.h
Outdated
@@ -24,13 +24,12 @@ | |||
|
|||
#define DEFAULT_MODEL_PATH "models/7B/ggml-model-f16.gguf" | |||
|
|||
// TODO: "lora_adapter" is tautology |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note sure what do you mean by this. I think "lora_adapter" is not tautology because there can be multiple types of adapter, and there can also be "lora_a", "lora_b", "lora_scale"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that "lora" already implies "adapter", since it means comes from "LOw-Rank Adapter". So it seems to me that common_lora_adapter_info
should be simply called common_lora_info
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm no, the "A" means "adaptation", not "adapter". Quoting from this article:
LoRA, which stands for “Low-Rank Adaptation”, distinguishes itself by training and storing the additional weight changes in a matrix while freezing all the pre-trained model weights. LoRA is not called an “adapter” because it does not add adapters. Instead, it is referred to as “adaptation” to describe the process of fine-tuning the domain data and tasks.
Funny enough, I've just found out that "adapter" is technically a different technique than LoRA, firstly introduced in this paper. But the way they work are quite similar, adding nodes to the existing cgraph. So, I guess the term "adapter" is being used correctly in our context in llama.cpp, since both LoRA and cvector are just additions on top of model's cgraph.
After this PR, What change caused this? I don't see changes to |
There should be no functional changes. Try to clean your build folder. |
There's definitely a change, I see
but later:
|
There is a slight change in Please note that we don't provide stable interface for |
That's already done according to the changes in
That's exactly what I'm trying to find. |
Be careful that with unique ptr, the ctx maybe deallocated when it goes out of scope. You should make sure that |
Yes, looks like that's what happens here. I see that Although, it's a bit weird that this wasn't needed previously. |
Another way is: (but remember to free it manually afterwards) common_init_result llama_init = common_init_from_params(params);
llama_model * model = llama_init.model.release();
llama_context * ctx = llama_init.context.release();
llama_free(ctx);
llama_free_model(model); |
Yes, thank you, that's exactly what I needed! So, |
In cpp, you have notion of "ownership". For example, Same for
|
I understand that, but the problem is that |
Attempting to split the
src/llama.cpp
into a few separate modules. Very work-in-progress, mainly opening this PR for people to keep track and suggest improvements as we move along. This part does not involve functional changes, just code reorganization and decoupling to make it easier to work with the codebase. The batch and KV cache abstractions and reimplementations will be done in follow-up PRs.TODO
move the(no)llama_mmaps
andllama_mlocks
fromllama_model
tollama_context
?_internal
suffix to_impl
(next PR)llama_tensor_loader
?Conflicts