Add llama compatibility with new ggml quantization #642
manyoso merged 8 commits into nomic-ai:dlopen_backend from
Conversation
Closes #541 once |
Nice! Thank you! :) I looked at the code and it looks like the temperature sampling is the same. You added |
That's why I put them into the |
Looks like ggerganov made another breaking change.
Thanks to @imaami for helping with the commit history :-) |
I did something like this, or at least I hope. If you're interested: imaami@fea4747 |
Please feel free to PR this once this whole thing is merged into master! We should add as few unrelated things as possible to the |
This is great! I like it. :) |
Sure thing, let's do things in order. |
Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>
manyoso
left a comment
Please address comments...
# Add each individual implementation
add_library(llamamodel-${BUILD_VARIANT} SHARED
# Add each individual implementations
nitpick, you don't want the plural here
I noticed that as well, but decided to leave it as is since it's not worth a commit. Will batch this with further things that may come up.
url = https://github.com/manyoso/llama.cpp.git
[submodule "llama.cpp-mainline"]
path = gpt4all-backend/llama.cpp-mainline
url = https://github.com/ggerganov/llama.cpp.git
Ok, ok, I get ya, but this isn't actually pinning them. Also, I think I still want all of them to use the 'manyoso' fork as this gives us further control, right?
Not sure what you mean, the manyoso fork hasn't been updated to latest llama.cpp, it's 132 commits behind...
Also that fork only adds alibi, which is only needed for MPT
I mean we should update that fork and point to it, I believe. Lemme do that now.
    llamamodel.cpp)
prepare_target(llamamodel llama)
target_compile_definitions(llamamodel-mainline-${BUILD_VARIANT} PRIVATE
    LLAMA_VERSIONS=>=3 LLAMA_DATE=999999)
=>= oh man cmake.. you're killing me
Haha, yup. Looks confusing, is confusing, but does what we need quite flexibly.
That conditional should probably be changed to a slightly less cursed variant:
#if LLAMA_VERSION <= 123456
// ...
#elif LLAMA_VERSION >= 654321
// ...
#endif
At least then it would be a readily recognizable pattern of tragic stylistic compromise instead of a confusing entirely new way to crush one's hopes and dreams. Would also shrink the cmake side a little.
Pardon the gallows humour, can't help it whenever pre-processor macros seem necessary. ;)
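For anyone puzzled by the =>=, here is a minimal sketch of how a definition like LLAMA_VERSIONS=>=3 can be consumed on the C++ side. The guard and macro names below are assumptions for illustration, not necessarily what the PR uses:

// Illustration only: with -DLLAMA_VERSIONS=>=3 on the compile line, the macro
// expands to the token sequence ">= 3", so a guard written as
//   #if LLAMA_FILE_VERSION LLAMA_VERSIONS
// becomes "#if 3 >= 3" after expansion. Hence the odd-looking "=>=" in CMake.
#define LLAMA_FILE_VERSION 3   // normally provided by llama.cpp headers (assumed name)
#define LLAMA_VERSIONS >=3     // normally provided via target_compile_definitions

#if LLAMA_FILE_VERSION LLAMA_VERSIONS
// code paths for file format version 3 and newer
#endif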
add_library(gptj-${BUILD_VARIANT} SHARED
    gptj.cpp)
prepare_target(gptj ggml)
prepare_target(gptj ggml-230511)
wait, where are you tagging the actual ggml with this?
llama.cpp.cmake adds the given suffix to ggml as well.
int32_t n_keep = 0; // number of tokens to keep from initial prompt
#if LLAMA_DATE <= 230511
int32_t n_parts = -1; // amount of model parts (-1 = determine from model dimensions)
#endif
The crux of it. We're going to use macros...
Our other option would be to have an extensive collection of almost-identical llamamodel.cpp files for different llama.cpp versions.
No, I think this is the right choice out of a bunch of bad choices.
There's also CRTP and C++ template magic, but I agree it's not the time to go there yet.
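As a purely illustrative aside on that last point, a CRTP split might look roughly like the sketch below; the class and member names are invented for the example and nothing here is part of this PR:

#include <cstdint>

// Shared behaviour lives in a CRTP base; each llama.cpp vintage supplies only what differs.
template <typename Impl>
struct LLamaModelBase {
    int32_t n_keep = 0; // number of tokens to keep from initial prompt

    void configure() {
        static_cast<Impl *>(this)->applyVersionSpecificParams();
    }
};

// Variant built against the 2023-05-11 era file format.
struct LLamaModel230511 : LLamaModelBase<LLamaModel230511> {
    int32_t n_parts = -1; // model parts only exist in the old format
    void applyVersionSpecificParams() { /* old quantization handling */ }
};

// Variant built against mainline llama.cpp.
struct LLamaModelMainline : LLamaModelBase<LLamaModelMainline> {
    void applyVersionSpecificParams() { /* new quantization handling */ }
};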
${DIRECTORY}/llama.cpp
${DIRECTORY}/llama.h
${DIRECTORY}/llama_util.h)
${DIRECTORY}/${LLAMA_UTIL_SOURCE_FILE})
This branch doesn't actually introduce this file, right? It exists upstream in one of the pinned submodules?
The filename was changed.
llama_sample_top_p(ctx, &candidates_p, top_p, 1);
llama_sample_temperature(ctx, &candidates_p, temp);
return llama_sample_token(ctx, &candidates_p);
}
Going to assume this is giving you sane results? Have you made sure to go through and test models with each of the pinned variants and file formats? Man, we almost want regression or unit tests here...
Yup! I did. Man, was my hard drive full..
This is also how it's done in the llama.cpp main example.
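For context, the candidate-list setup around those calls, as done in the llama.cpp main example of that era, looks roughly like the following sketch; the wrapping function and parameter names are assumptions, not the PR's code:

#include <vector>
#include <llama.h>

// Sketch of the candidate-list sampling pattern: build a token_data array from
// the logits, then run the top-k / top-p / temperature samplers over it.
static llama_token sample_next_token(llama_context *ctx, int top_k, float top_p, float temp) {
    const float *logits = llama_get_logits(ctx);
    const int n_vocab = llama_n_vocab(ctx);

    std::vector<llama_token_data> candidates;
    candidates.reserve(n_vocab);
    for (llama_token token_id = 0; token_id < n_vocab; token_id++) {
        candidates.emplace_back(llama_token_data{token_id, logits[token_id], 0.0f});
    }
    llama_token_data_array candidates_p = { candidates.data(), candidates.size(), false };

    llama_sample_top_k(ctx, &candidates_p, top_k, 1);
    llama_sample_top_p(ctx, &candidates_p, top_p, 1);
    llama_sample_temperature(ctx, &candidates_p, temp);
    return llama_sample_token(ctx, &candidates_p);
}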
Thank god someone is handling this migration the sane way.
Describe your changes
This change introduces full compatibility with the new ggml quantization without killing the old one (which is renamed to {llama,ggml}-old). The API is kept unchanged and these changes are completely invisible to it.
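As a rough sketch of how a dlopen-based frontend can keep the API unchanged while picking between the old and new builds at runtime (the library and symbol names below are hypothetical, not the actual gpt4all ones):

#include <dlfcn.h>

// Hypothetical illustration: each llama.cpp vintage is built into its own
// shared object, and the frontend loads whichever one accepts the model file.
typedef bool (*magic_match_fn)(const char *path);

static void *try_backend(const char *lib, const char *model_path) {
    void *handle = dlopen(lib, RTLD_NOW | RTLD_LOCAL);
    if (!handle) return nullptr;
    auto matches = (magic_match_fn) dlsym(handle, "magic_match"); // hypothetical symbol
    if (matches && matches(model_path)) return handle;            // keep this implementation
    dlclose(handle);
    return nullptr;
}

// Usage: probe the mainline build first, then fall back to the old-quantization build.
// if (void *h = try_backend("libllamamodel-mainline.so", path)) { /* ... */ }
// else if (void *h = try_backend("libllamamodel-230511.so", path)) { /* ... */ }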
Issue ticket number and link
Every single one that complains about new llama models not working :-)