WIP: Replace llama.cpp with ollama #3542

Draft: wants to merge 40 commits into main
Conversation

cebtenzzre (Member)

Summary of changes as of 3/19

new directories:

  • gpt4all-backend: an entirely new backend for GPT4All which includes a REST client for ollama (a request sketch follows this list).
  • gpt4all-backend-test: a test for ollama client functionality written before the gpt4all-chat changes.
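
For orientation, here is a minimal standalone sketch of the kind of request the new client issues, targeting ollama's documented REST endpoint GET /api/tags (model listing) on the default port. It uses QJson for brevity; the actual client listed below is built on Boost.JSON and has its own class names, which are not shown here.

```cpp
// Minimal sketch, not the PR's client API: list the models an ollama server
// exposes via its documented REST endpoint, GET /api/tags.
#include <QCoreApplication>
#include <QDebug>
#include <QJsonArray>
#include <QJsonDocument>
#include <QJsonObject>
#include <QNetworkAccessManager>
#include <QNetworkReply>

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);
    QNetworkAccessManager nam;
    // 11434 is ollama's default listen port; adjust for a remote server.
    auto *reply = nam.get(QNetworkRequest(QUrl("http://localhost:11434/api/tags")));
    QObject::connect(reply, &QNetworkReply::finished, [&] {
        const auto doc = QJsonDocument::fromJson(reply->readAll());
        for (const auto &model : doc["models"].toArray())
            qInfo() << model.toObject()["name"].toString();
        reply->deleteLater();
        app.quit();
    });
    return app.exec();
}
```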

removed directories:

  • gpt4all-bindings: there is no longer a llama.cpp backend for them to use. no plan for Embed4All users yet.

moved directories:

  • gpt4all-backend -> gpt4all-backend-old: renaming this directory while we transition gpt4all-chat off of it.

new files:

  • deps/CMakeLists.txt: cmake configuration for shared dependencies between gpt4all-chat and gpt4all-backend.
  • gpt4all-backend/deps/CMakeLists.txt: cmake configuration for backend dependencies.
  • gpt4all-backend/include/gpt4all-backend/formatters.h: fmt helpers split off from gpt4all-chat/utils.h.
  • gpt4all-backend/include/gpt4all-backend/generation-params.h: obsolete; these are now defined in gpt4all-chat.
  • gpt4all-backend/include/gpt4all-backend/json-helpers.h: helpers for (de-)serializing Qt types with Boost.JSON (sketched below the file list).
  • gpt4all-backend/include/gpt4all-backend/ollama-client.h: a REST client for ollama.
  • gpt4all-backend/include/gpt4all-backend/ollama-model.h: obsolete; this is now defined in gpt4all-chat.
  • gpt4all-backend/include/gpt4all-backend/ollama-types.h: (de-)serializable types used in ollama client requests/responses.
  • gpt4all-backend/include/gpt4all-backend/rest.h: helpers for working with REST APIs. used by ollama-client and gpt4all-chat.
  • gpt4all-backend/src/CMakeLists.txt: cmake configuration for backend sources.
  • gpt4all-backend/src/json-helpers.cpp: json-helpers.h implementation.
  • gpt4all-backend/src/ollama-client.cpp: ollama-client.h implementation.
  • gpt4all-backend/src/ollama-types.cpp: ollama-types.h implementation.
  • gpt4all-backend/src/qt-json-stream.{cpp,h}: a QIODevice wrapping a boost::json::value, similar in concept to a std::ostringstream (sketched below the file list).
  • gpt4all-backend/src/rest.cpp: implementation for rest.h.
  • gpt4all-chat/qml/AddCustomProviderView.qml: a new tab of the "add model" page for adding new custom OpenAI or ollama providers.
  • gpt4all-chat/qml/CustomProviderCard.qml: a card in AddCustomProviderView representing an ollama or OpenAI provider.
  • gpt4all-chat/src/creatable.h: helper for classes derived from std::enable_shared_from_this.
  • gpt4all-chat/src/json-helpers.{cpp,h}: helpers for (de-)serializing Qt types with Boost.JSON which only gpt4all-chat needs.
  • gpt4all-chat/src/llmodel_chat.{cpp,h}: base class for working with an LLM that can generate text.
  • gpt4all-chat/src/llmodel_description.{cpp,h}: base class for working with the description of an ollama or OpenAI model.
  • gpt4all-chat/src/llmodel_ollama.{cpp,h}: classes for working with ollama providers and models.
  • gpt4all-chat/src/llmodel_openai.{cpp,h}: classes for working with OpenAI providers and models. derived from the obsolete chatapi.h.
  • gpt4all-chat/src/llmodel_provider.{cpp,h,inl}: class for representing a model provider (type + name + base URL) which is either builtin or custom, and is serialized to the models directory if needed.
  • gpt4all-chat/src/llmodel_provider_builtins.cpp: hardcoded, ordered list of built-in model providers, taken from qml.
  • gpt4all-chat/src/main.h: exposes a singleton QNetworkAccessManager which should be used for all network requests on the main thread.
  • gpt4all-chat/src/qmlfunctions.{cpp,h}: free functions needed by QML go here, since QML can only call instance methods.
  • gpt4all-chat/src/qmlsharedptr.{cpp,h}: a basic shared pointer for QML. QSharedPointer has no built-in QML equivalent.
  • gpt4all-chat/src/store_base.{cpp,h,inl}: base class for managing a collection of (de-)serialized objects (such as providers) using Boost.JSON.
  • gpt4all-chat/src/store_provider.{cpp,h}: a class that manages a collection of (de-)serialized providers.
  • requirements-docs.txt: defines the dependencies required to build the docs. extracted from the python bindings' pyproject.toml.
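
Two of the files above deserve a concrete illustration. First, the json-helpers listed for both the backend and the chat app hook Qt types into Boost.JSON through its tag_invoke customization point. The PR's exact overload set isn't shown here, so this is a minimal sketch of what such a helper typically looks like for QString:

```cpp
// Minimal sketch of Boost.JSON (de-)serialization helpers for a Qt type.
// These exact overloads are an assumption; the PR's json-helpers.h may differ.
#include <boost/json.hpp>
#include <QString>

void tag_invoke(const boost::json::value_from_tag &, boost::json::value &jv, const QString &s)
{
    const QByteArray utf8 = s.toUtf8();
    jv = boost::json::string_view(utf8.constData(), std::size_t(utf8.size()));
}

QString tag_invoke(const boost::json::value_to_tag<QString> &, const boost::json::value &jv)
{
    const auto &str = jv.as_string();
    return QString::fromUtf8(str.data(), int(str.size()));
}
```

With overloads like these in scope, boost::json::value_from(someQString) and boost::json::value_to<QString>(jv) resolve through argument-dependent lookup, which is presumably how the store and client code round-trip Qt types.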
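Second, qt-json-stream is described as a QIODevice over a boost::json::value. The core idea can be sketched by driving boost::json::serializer from QIODevice::readData, so the JSON is produced incrementally rather than materialized up front; the class name and details here are assumptions, not the PR's code:

```cpp
#include <QIODevice>
#include <boost/json.hpp>

// Sketch: stream a boost::json::value through the QIODevice interface by
// serializing it chunk by chunk instead of building the whole string at once.
class JsonStream : public QIODevice // hypothetical name
{
public:
    explicit JsonStream(boost::json::value value, QObject *parent = nullptr)
        : QIODevice(parent), m_value(std::move(value))
    {
        m_serializer.reset(&m_value);
        open(QIODevice::ReadOnly);
    }

    bool isSequential() const override { return true; }

protected:
    qint64 readData(char *data, qint64 maxSize) override
    {
        if (m_serializer.done())
            return -1; // end of stream
        // serializer::read() fills the buffer and returns the chunk written
        return qint64(m_serializer.read(data, std::size_t(maxSize)).size());
    }

    qint64 writeData(const char *, qint64) override { return -1; } // read-only

private:
    boost::json::value m_value;
    boost::json::serializer m_serializer;
};
```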

removed files:

  • .github/ISSUE_TEMPLATE/bindings-bug.md: removed as the bindings are now gone.
  • gpt4all-chat/src/chatapi.{cpp,h}: replaced by llmodel_openai.{cpp,h}.

changed files:

  • .circleci/config.yml: removed references to the bindings.
  • .circleci/continue_config.yml: removed references to the bindings.
  • .codespellrc: updated whitelist.
  • MAINTAINERS.md: removed references to the bindings.
  • README.md: removed references to llama.cpp and the bindings.
  • common/common.cmake: added color diagnostics for ninja.
  • gpt4all-chat/CMakeLists.txt: modified lists of source files and dependencies.
  • gpt4all-chat/deps/CMakeLists.txt: modified dependencies.
  • gpt4all-chat/qml/AddModelView.qml: added custom providers tab.
  • gpt4all-chat/qml/AddRemoteModelView.qml: replaced static providers with a dynamic list.
  • gpt4all-chat/qml/ApplicationSettings.qml: removed llama.cpp-specific settings.
  • gpt4all-chat/qml/ModelSettings.qml: removed llama.cpp-specific settings.
  • gpt4all-chat/qml/RemoteModelCard.qml: remote providers now use the classes in gpt4all-chat.
  • gpt4all-chat/src/chat.{cpp,h}: removed llama.cpp-specific state.
  • gpt4all-chat/src/chatlistmodel.cpp: bumped .chat file version.
  • gpt4all-chat/src/chatllm.{cpp,h}: partway through replacing llama.cpp use with ChatLLMInstance use.
  • gpt4all-chat/src/chatmodel.h: replaced an #include since the referenced code moved.
  • gpt4all-chat/src/database.cpp: replaced an #include since the referenced code moved.
  • gpt4all-chat/src/embllm.cpp: partway through replacing llama.cpp-specific code.
  • gpt4all-chat/src/jinja_helpers.cpp: unnecessary #include change.
  • gpt4all-chat/src/main.cpp: added a global QNetworkAccessManager, and exposed the ProviderRegistry to QML.
  • gpt4all-chat/src/modellist.{cpp,h}: moved out OpenAI-specific code and started integrating ModelDescription.
  • gpt4all-chat/src/mysettings.{cpp,h}: removed llama.cpp-specific settings, added a hardcoded user agent, and started changing generation params.
  • gpt4all-chat/src/network.cpp: removed llama.cpp-specific analytics.
  • gpt4all-chat/src/server.{cpp,h}: started adapting to chatllm.{cpp,h} changes.
  • gpt4all-chat/src/utils.{h,inl}: added more simple utilities.
  • gpt4all-training/old-README.md: removed references to the bindings.

new deps:

  • deps/qcoro: a library for building C++20 coroutines (async functions) using Qt's event loop; a usage sketch follows this list.
  • gpt4all-backend/deps/date: time zone aware date parsing required for parsing timestamps in ollama responses. will drop when all platforms support __cpp_lib_chrono >= 201907L.
  • gpt4all-chat/deps/generator: third-party implementation of std::generator. will drop when all platforms support __cpp_lib_generator >= 202207L.
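
qcoro is what lets the REST client read as straight-line async code. A minimal sketch of the pattern, using QCoro's stock QNetworkReply wrapper (the helper name and URL handling here are placeholders, not the PR's code):

```cpp
#include <QCoroNetworkReply>
#include <QCoroTask>
#include <QNetworkAccessManager>
#include <QNetworkReply>
#include <memory>

// Sketch: a C++20 coroutine that suspends on Qt's event loop while an HTTP
// request is in flight, instead of blocking or chaining signal handlers.
QCoro::Task<QByteArray> fetchBody(QNetworkAccessManager &nam, const QUrl &url) // hypothetical helper
{
    std::unique_ptr<QNetworkReply> reply(nam.get(QNetworkRequest(url)));
    co_await qCoro(reply.get()).waitForFinished(); // non-blocking wait
    co_return reply->readAll();
}
```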

moved deps:

  • gpt4all-chat/deps/fmt -> deps/fmt: fmt is now used by the backend as well.

changed deps:

  • gpt4all-chat/deps/minja: changed to fix #include path of nlohmann/json.

Signed-off-by: Jared Van Bortel <[email protected]>
2025 is too soon to use C++ features from 2020 without running into bugs
in every build tool that touches the project.
@Titaniumtown

Is this simply going to use an ollama API endpoint? Or is ollama actually integrated inside of gpt4all in this PR?

@iwr-redmond

iwr-redmond commented Mar 22, 2025

I have the same question as @Titaniumtown. I recently catalogued Ollama's recurring issues with non-standard installation processes, and wouldn't like to see GPT4all jump into that quagmire.

@Titaniumtown

Thank you @iwr-redmond, issues such as those were going to be my follow-up. If ollama can somehow be integrated inside of gpt4all so that it is seamless to the user, I would be in favor, as long as it is used simply as an abstraction layer over llama.cpp and not an external server you need to connect to.

@iwr-redmond

The Ollama devs have decided to shoot a hole in the screen door and abandon llama.cpp in favor of a custom inference engine. I reckon that pushes this PR into wet shoe territory.

@Titaniumtown

Oh yikes. Ollama is really going down the drain.
