Add Ollama as a supported provider #10
Conversation
This was done by copying and adapting the existing Gemini provider. Tool usage and media were excised for now; I still need to research what Ollama offers and how it overlaps with this project.

Also, commit 9b387c1 disables all providers by default. The intention is to be able to work on Ollama (or any other single provider) without being forced to provide valid keys for EVERY provider, which also incurs live calls. This matters in particular because Ollama doesn't come with any default models. As mentioned in the commit message, this might be too intrusive for the intended usage of this project, so an alternative is to only do this when a specific env var puts ruby-llm into such a "default offline" mode. This might interfere with tests and/or the models update rake task, neither of which I touched yet since they also require valid keys for all providers.

As for tests, I'd like some guidance on how to implement unit testing and eventually integration testing specifically for Ollama, in a way that does not require configuring and using all APIs (and the attendant cost). Since Ollama does not come with default models, I suggest a separate test suite that first ensures that a tiny model is downloaded into the Ollama server via its API.
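For illustration, a minimal sketch of that "ensure a tiny test model is present" idea, calling Ollama's `/api/pull` endpoint directly. The model name and host are illustrative, and whatever this PR ends up doing in a rake task or spec hook may differ:

```ruby
require 'net/http'
require 'json'

# Ask the local Ollama server to download a small model if it isn't present yet.
uri = URI('http://localhost:11434/api/pull')
request = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
request.body = { name: 'qwen2.5:0.5b', stream: false }.to_json

response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
puts response.body # e.g. {"status":"success"} once the model has finished downloading
```

Whether this lives in a rake task or a `before(:suite)` hook is an open question.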
Was just about to file an issue for this — awesome!
Two new features to consider in your provider implementation:
In general, I understand this can be used to specify a provider when the model is served by more than one provider. But for that, we have to know for sure that the provider supports that model, and in the case of Ollama there are no guarantees that a given server will have any models installed, so we can't hardcode any static assumptions into the gem. Rather, the list of models is only available after querying the server at runtime.

I realize this deviates from the general pattern of the gem, which aims to provide a good out-of-the-box experience for well-known API providers, but by definition Ollama is not that. It can be used with arbitrary models even beyond Ollama's official library - you can pull from HuggingFace or even create your own models with arbitrary names. I still think supporting Ollama within reasonable limits is a valuable addition to the gem; we just have to figure out those limits.
Any guidance on this?
And on this? I didn't consider it originally, but I see now that the test suite has no support for being run without valid keys for all providers, which is a barrier for me. Is there any chance the test suite can be mocked as much as possible (requests for metadata, etc.) so that only tests that NEED to run API inference fail? This is the only way I can see to mitigate the key requirement problem while still running as much of the test suite as possible. Another option is to create an entirely separate suite of tests just for the Ollama provider, but the reduced integration test coverage is obviously problematic even on paper.
Because of the recently added VCR support, I was able to get the tests to pass (with empty strings in all the API key environment variables). I wish the tests didn't require these, but it's an easy fix.
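For illustration, a cassette-backed spec along these lines is the kind of thing VCR enables here; this is a generic VCR + WebMock sketch with illustrative names, not necessarily how this repo's `spec_helper` is actually wired up:

```ruby
require 'ruby_llm'
require 'vcr'
require 'webmock/rspec'

VCR.configure do |vcr|
  vcr.cassette_library_dir = 'spec/fixtures/vcr_cassettes'
  vcr.hook_into :webmock
  # Keep real keys out of recorded cassettes
  vcr.filter_sensitive_data('<OPENAI_API_KEY>') { ENV['OPENAI_API_KEY'] }
end

RubyLLM.configure do |config|
  # An empty string is enough when replaying existing cassettes
  config.openai_api_key = ENV.fetch('OPENAI_API_KEY', '')
end

RSpec.describe 'chat' do
  it 'answers a simple prompt from a recorded cassette' do
    VCR.use_cassette('chat_simple_prompt') do
      chat = RubyLLM.chat(model: 'gpt-4.1-nano')
      response = chat.ask('Say hello')
      expect(response.content).not_to be_empty
    end
  end
end
```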
Excellent! I somehow missed that commit earlier. I'm now unblocked and will work on tests for this provider.
Branch force-pushed from `6bcdfa2` to `31034be`.
Tests added, but a few provisional hacks are still in place and will likely need revisiting.

Taking a step back, all of these hacks have to do with the requirement for valid API keys for every provider. It's a bit of a rough marriage, but we can make it work :D We just need to figure out a sane way to conditionally turn off all but one provider.

All current tests pass when relying on VCR cassettes.
Valid point @ldmosquera! I'm working on a solution and will commit it shortly.
Added configuration requirements handling in 75f99a1. Each provider now specifies which configuration keys it requires via a simple declaration.
Example of the new error messages:

    RubyLLM::ConfigurationError: anthropic provider is not configured. Add this to your initialization:

    RubyLLM.configure do |config|
      config.anthropic_api_key = ENV['ANTHROPIC_API_KEY']
    end
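For illustration, the per-provider declaration could look something like the sketch below; the method names (`configuration_requirements`, `configured?`) and the `ollama_api_base` key are assumptions, not necessarily what 75f99a1 actually implements:

```ruby
module RubyLLM
  module Providers
    module Ollama
      module_function

      # Config keys that must be set in RubyLLM.configure before this provider is usable
      def configuration_requirements
        %i[ollama_api_base]
      end

      def configured?(config)
        configuration_requirements.all? { |key| !config.send(key).to_s.empty? }
      end
    end
  end
end
```

Whatever the exact shape, the nice property is that only the providers you actually use need real keys.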
Branch force-pushed from `31034be` to `c8cd658`.
Smashing 😎 All major hacks dropped, save for one.
Branch force-pushed from `62f6859` to `18978b5`.
The implementation path is straightforward - Ollama is just another provider that needs to:
- Implement the standard provider interface
- Override specific methods where needed
- Use the existing test infrastructure
- Get documented like other providers
I think the only change w.r.t. other providers is that the documentation should note that models must be refreshed prior to using the API.
Happy to help out. The goal is making Ollama work cleanly within RubyLLM's existing patterns.
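As a sketch of that documented flow, assuming an `ollama_api_base` setting and a model that has already been pulled into the local server (both names are illustrative):

```ruby
RubyLLM.configure do |config|
  config.ollama_api_base = 'http://localhost:11434' # assumed config key; the PR may name it differently
end

# Ollama has no static model list, so ask the running server before chatting
RubyLLM.models.refresh!

chat = RubyLLM.chat(model: 'llama3.1')
puts chat.ask('Hello!').content
```

The `refresh!` call is the piece the documentation note above would call out, since unlike other providers there is no useful static model registry to fall back on.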
Conflicts:
- `docs/guides/getting-started.md`
- `lib/ruby_llm.rb`
- `lib/ruby_llm/configuration.rb`
- `spec/fixtures/vcr_cassettes/models_refresh_updates_models_and_returns_a_chainable_models_instance.yml`
- `spec/fixtures/vcr_cassettes/models_refresh_works_as_a_class_method_too.yml`
- `spec/ruby_llm/chat_spec.rb`
- `spec/ruby_llm/chat_streaming_spec.rb`
- `spec/ruby_llm/chat_tools_spec.rb`
All cassettes for Ollama are up to date; I manually copied the new Bedrock responses for model listings. VCR data will be synthetic until someone can run the tests with all providers properly enabled, including a local Ollama server that has run the rake task to download the test models.
Merged main and resolved conflicts; echoing the commit message in ba56fa2 for visibility.
@crmne @ldmosquera just dropping appreciation! 🙇 🙏
Conflicts:
- `spec/fixtures/vcr_cassettes/models_refresh_updates_models_and_returns_a_chainable_models_instance.yml`
- `spec/fixtures/vcr_cassettes/models_refresh_works_as_a_class_method_too.yml`
- `spec/ruby_llm/chat_content_spec.rb`
- `spec/ruby_llm/chat_spec.rb`
- `spec/ruby_llm/chat_streaming_spec.rb`
- `spec/ruby_llm/chat_tools_spec.rb`
- `spec/ruby_llm/embeddings_spec.rb`
- `spec/spec_helper.rb`
In retrospect, this is too automagic and ultimately not needed at the library level. This reverts commit e810a7a.
@ldmosquera I'd love to see Ollama support merged soon. Do you think you can resolve the conflicts? Once you're ready, set this PR as ready for review and I'll take a look.
Conflicts:
- `docs/guides/getting-started.md`
- `spec/ruby_llm/chat_content_spec.rb`
- `spec/ruby_llm/chat_spec.rb`
- `spec/ruby_llm/chat_streaming_spec.rb`
- `spec/ruby_llm/chat_tools_spec.rb`
After tests were changed to use `gpt-4.1-nano` in 7017dcf, these cassettes were not refreshed:

- `spec/fixtures/vcr_cassettes/models_refresh_updates_models_and_returns_a_chainable_models_instance.yml`
- `spec/fixtures/vcr_cassettes/models_refresh_works_as_a_class_method_too.yml`

My branch performs `models.refresh!` under a cassette named `spec/fixtures/vcr_cassettes/initial_model_refresh.yml`, into which I have been copying responses from the original files above, but right now I don't have model-list responses that include the new `gpt-4.1-nano`, and tests can't run with unknown models. The proper solution is to delete all these files and re-run the tests to get fresh data from each source; meanwhile, this commit jury-rigs nano into the response.
Llama-3.1 has trouble with tool usage at default temperatures: it will sometimes use the available tools, but most of the time it opts not to and hallucinates instead, making the tests brittle. A lower temperature mitigates this and will probably reflect real-world usage anyway.
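For context, the kind of spec setup this affects would look roughly like the following sketch; `Weather` is a made-up tool for illustration, and `with_temperature`/`with_tool` are assumed to be available on the chat object:

```ruby
# Made-up tool, used only to exercise tool calling in specs
class Weather < RubyLLM::Tool
  description 'Returns a canned weather report'
  param :city, desc: 'City name'

  def execute(city:)
    "It is sunny in #{city}."
  end
end

chat = RubyLLM.chat(model: 'llama3.1')
              .with_temperature(0.2) # low temperature => more consistent tool use
              .with_tool(Weather)

response = chat.ask("What's the weather in Berlin?")
puts response.content
```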
Ready for review; I imagine there will be some more back and forth before it's mergeable. Apologies and thanks in advance 😬
Conflicts: spec/ruby_llm/chat_tools_spec.rb
Conflicts: spec/ruby_llm/chat_tools_spec.rb
Branch force-pushed from `bf80ff9` to `591668c`.
Conflicts:
- `docs/installation.md`
- `lib/ruby_llm.rb`
- `lib/tasks/model_updater.rb`
- `spec/ruby_llm/chat_content_spec.rb`
- `spec/ruby_llm/chat_spec.rb`
- `spec/ruby_llm/chat_streaming_spec.rb`
- `spec/ruby_llm/chat_tools_spec.rb`
Merged main; diff to yesterday here.
Just curious - wouldn't it be easier to use OpenAI's compatibility layer in Ollama? https://ollama.com/blog/openai-compatibility The same approach worked for OpenRouter, so maybe it's also the case here.
Hey @khasinski, I'm pondering the same. The reasons why I preferred a true Ollama implementation are summed up perfectly by Ollama themselves:
However, we could simply test these assumptions and roll it out as OpenAI-compatible if everything works. Case in point: OpenRouter claims in their documentation that they don't support PDFs, but you can check out the tests I just committed in RubyLLM.
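For reference, a rough sketch of what routing through the compatibility layer could look like, assuming the gem exposes a configurable OpenAI base URL (called `openai_api_base` here purely for illustration):

```ruby
RubyLLM.configure do |config|
  config.openai_api_key  = 'ollama'                    # Ollama ignores the key, but an OpenAI-style client expects one
  config.openai_api_base = 'http://localhost:11434/v1' # Ollama's OpenAI-compatible endpoint (assumed setting)
end

# Model names still come from the local server rather than a static registry,
# so a refresh (or an explicit override) would likely still be needed.
RubyLLM.models.refresh!
chat = RubyLLM.chat(model: 'llama3.1', provider: :openai)
puts chat.ask('Why is the sky blue?').content
```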
At this point the PR has full coverage for the Ollama API, including repeatable tests plus a rake task that automates downloading the models needed for those tests. If we were to scale back to simply using Ollama's OpenAI compatibility layer, some Ollama-specific code for parsing responses could be dropped, but other code for querying models would still need to remain, and the tests would need to be rethought. That, along with the fact that Ollama's OpenAI compatibility is not yet complete as mentioned above, makes me think we should continue targeting the Ollama API, at least for now. I can commit to watching for incoming Ollama-related issues and providing support.

This could be revisited in the future; over time it would be great to have generic live model-querying capabilities, so that OpenAI compatibility can be used to point at any compatible self-hosted engine like Ollama, KoboldCPP, LlamaCPP, TabbyAPI and so on. The messy part is still dealing with uncertain self-hosted model capabilities - this is inherent, as people can stand up models with any context size, with or without a media projector in the case of GGUF-based vision models, etc.
As mentioned in issue #2, Ollama support is valuable for users interested in self-hosted/offline inference.
This PR adds initial support for Ollama, including chat completions, streaming, and embeddings. No tool support yet; it needs further investigation.
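A quick usage sketch of those pieces, assuming the provider is configured and the models are present locally as in the earlier examples (model names are illustrative):

```ruby
chat = RubyLLM.chat(model: 'llama3.1')

# Chat completion
puts chat.ask('Give me a one-line Ruby tip').content

# Streaming: the block receives chunks as they arrive
chat.ask('Tell me a short story') do |chunk|
  print chunk.content
end

# Embeddings
embedding = RubyLLM.embed('Ruby is red', model: 'nomic-embed-text')
puts embedding.vectors.length
```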
PR is a rough draft and will likely take some back and forth to get it merge-ready; more comments to follow.
Closes #2