Add Ollama as a supported provider #10

Open · ldmosquera wants to merge 62 commits into main from ollama-provider

Conversation

ldmosquera commented Mar 12, 2025

As mentioned in issue #2, Ollama support is valuable for users interested in self-hosted/offline inference.

This PR adds initial support for Ollama, including chat completions, streaming, and embeddings. No tool support yet; that needs further investigation.

This PR is a rough draft and will likely take some back and forth to get merge-ready; more comments to follow.

Closes #2
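
For illustration, the intended end-to-end usage might look roughly like the sketch below. The `ollama` provider id, the `ollama_api_base` config key, and the model name are placeholders for this draft, not a settled API:

    require 'ruby_llm'

    # Point RubyLLM at a local Ollama server (config key name is an assumption).
    RubyLLM.configure do |config|
      config.ollama_api_base = 'http://localhost:11434'
    end

    # Ollama ships with no default model list, so populate it from the server first.
    RubyLLM.models.refresh!

    # Chat against whatever model the server has pulled (model name is illustrative).
    chat = RubyLLM.chat(model: 'llama3.1:8b', provider: 'ollama')
    puts chat.ask('Say hello from a local model').content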

ldmosquera (Author) commented Mar 12, 2025

This was done by copying and adapting the existing Gemini provider. Tool usage and media were excised for now; I still need to research what Ollama offers and how it overlaps with this project. Also, capabilities.rb is full of placeholders until we figure out sane defaults, since Ollama allows running arbitrary models rather than a well-known list.


Commit 9b387c1 disables all providers by default in models.refresh! unless they are explicitly configured.

The intention is to be able to work on Ollama (or any other single provider) without being forced to provide valid keys for EVERY provider and incur live calls against all of them. This matters especially because Ollama doesn't come with any default models, so models.refresh! is mandatory before any usage in order to discover the models available on the server.

As mentioned in the commit message, this might be too intrusive for the intended usage of this project, so an alternative is to only do this when given a specific env var that puts ruby-llm into such a "default offline" mode.

This might interfere with tests and/or the models update rake task, neither of which I have touched yet since they also require valid keys for all providers.


As for tests, I'd like some guidance on how to implement unit testing and eventually integration testing specifically for Ollama, in a way that does not require configuring and using all APIs (and the attendant cost).

Since Ollama does not come with default models, I suggest a separate test suite that first ensures that a tiny model is downloaded into the Ollama server via its API.
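
A minimal sketch of that setup step, assuming a local server on Ollama's default port (the helper name and model choice are illustrative):

    require 'net/http'
    require 'json'

    # Ensure a tiny model is present on the local Ollama server before the
    # Ollama-specific suite runs, using Ollama's /api/pull endpoint.
    def ensure_test_model!(name, host: 'http://localhost:11434')
      response = Net::HTTP.post(
        URI("#{host}/api/pull"),
        { name: name, stream: false }.to_json,
        'Content-Type' => 'application/json'
      )
      raise "failed to pull #{name}: #{response.body}" unless response.is_a?(Net::HTTPSuccess)
    end

    ensure_test_model!('qwen2.5:0.5b') # any small model would do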

jm3 commented Mar 13, 2025

was just about to file an issue for this — awesome!

hemanth mentioned this pull request on Mar 15, 2025
crmne added the 'new provider' label on Mar 21, 2025
crmne (Owner) commented Mar 23, 2025

Two new features to consider in your provider implementation:

  1. Model Aliases: Please add entries for your provider in aliases.json:

    "claude-3-5-sonnet": {
      "anthropic": "claude-3-5-sonnet-20241022",
      "your-provider": "your-provider-specific-id"
    }
  2. Provider Selection: Users will be able to specify your provider:

    chat = RubyLLM.chat(model: 'claude-3-5-sonnet', provider: 'your-provider')

Docs: https://rubyllm.com/guides/models#using-model-aliases

ldmosquera (Author) commented

In general, I understand this can be used to specify a provider when the model is served by more than one provider.

But for that we have to know for sure that the provider supports that model, and in the case of Ollama there are no guarantees that a given server will have any models installed at all, so we can't hardcode static assumptions into the gem. Rather, the list of models only becomes available after querying the server at runtime.

I realize this deviates from the general pattern of the gem, which aims to provide a good out-of-the-box experience for well-known API providers, but by definition Ollama is not that. It can be used with arbitrary models even beyond Ollama's official library - you can pull from HuggingFace or even create your own models with arbitrary names.

I still think supporting Ollama within reasonable limits is a valuable addition to the gem; we just have to figure out those limits.
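
To make the runtime-discovery point concrete, everything the gem can know about an Ollama server's models comes from asking that server. A sketch against Ollama's /api/tags endpoint (a provider implementation would map this into the gem's model registry):

    require 'net/http'
    require 'json'

    # List whatever models happen to be installed on this particular server.
    uri = URI('http://localhost:11434/api/tags')
    installed = JSON.parse(Net::HTTP.get(uri)).fetch('models', [])
    installed.each { |model| puts model['name'] }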

ldmosquera (Author) commented

    Commit 9b387c1 disables all providers by default in models.refresh! unless they are explicitly configured.
    As mentioned in the commit message, this might be too intrusive for the intended usage of this project, so an alternative is to only do this when given a specific env var that puts ruby-llm into such a "default offline" mode.

Any guidance on this?

    As for tests, I'd like some guidance on how to implement unit testing and eventually integration testing specifically for Ollama, in a way that does not require configuring and using all APIs (and the attendant cost).

And on this?

I didn't consider it originally, but I see now that the test suite has no support for being run without valid keys for all providers, which is a barrier for me.

Is there any chance that the test suite can be mocked as much as possible (requests for metadata, etc) such that only tests that NEED to run API inference fail? This is the only way I can see to mitigate the key requirement problem while still running as much of the test suite as possible.

Another option is to create an entirely separate suite of tests just for the Ollama provider, but the reduced integration test coverage is obviously problematic even on paper.

tpaulshippy (Contributor) commented Mar 24, 2025

Because of the recently added VCR support I was able to get the tests to pass (with empty strings in all the API key environment variables). Wish the tests did not require these, but it's an easy fix.

ldmosquera (Author) commented

Excellent! I somehow missed that commit earlier.

I'm now unblocked and will work on tests for this provider.

ldmosquera (Author) commented Mar 25, 2025

Tests added, but a few provisional hacks are still in place and will likely need revisiting.

Taking a step back, all of these hacks have to do with the requirement for valid API keys when using models.refresh!. This is a blocker for developers without keys for everything (an onus that will only grow with each new provider contributed), and it also creates friction for privacy-minded users who want to use Ollama locally with ZERO traffic to cloud providers.

It's a bit of a rough marriage, but we can make it work :D We just need to figure out a sane way to conditionally turn off all but one provider.


All current tests pass when relying on VCR cassettes.

crmne (Owner) commented Mar 25, 2025

Valid point @ldmosquera! I'm working on a solution, will commit it on main soon.

crmne (Owner) commented Mar 25, 2025

Added configuration requirements handling in 75f99a1

Each provider now specifies what configuration is required via a simple configuration_requirements method (you will need to implement this in your main provider file) that returns an array of config keys as symbols. The Provider module uses this to:

  1. Determine if a provider is properly configured
  2. Throw an error if you're trying to use that provider without configuration
  3. Include ready-to-paste configuration code in the error message
  4. Skip unconfigured providers during model refresh while preserving their models

Example of the new error messages:

RubyLLM::ConfigurationError: anthropic provider is not configured. Add this to your initialization:

RubyLLM.configure do |config|
  config.anthropic_api_key = ENV['ANTHROPIC_API_KEY']
end
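
For the Ollama provider, that would boil down to something like the sketch below; the module layout loosely mirrors the other providers, and :ollama_api_base is a hypothetical key name:

    module RubyLLM
      module Providers
        module Ollama
          module_function

          # Config keys that must be set before this provider counts as
          # configured; RubyLLM skips unconfigured providers during
          # models.refresh! and raises a ConfigurationError on use.
          def configuration_requirements
            %i[ollama_api_base]
          end
        end
      end
    end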

ldmosquera (Author) commented Mar 25, 2025

Smashing 😎

All major hacks dropped, save for RubyLLM::Provider.disable_all_providers (see below).

ldmosquera force-pushed the ollama-provider branch 3 times, most recently from 62f6859 to 18978b5 on March 25, 2025 at 14:43
crmne (Owner) left a comment

The implementation path is straightforward - Ollama is just another provider that needs to:

  1. Implement the standard provider interface
  2. Override specific methods where needed
  3. Use the existing test infrastructure
  4. Get documented like other providers

I think the only change with respect to other providers is that the documentation should note that models must be refreshed prior to using the API.

Happy to help out. The goal is making Ollama work cleanly within RubyLLM's existing patterns.

Merge commit (conflicts resolved):
  docs/guides/getting-started.md
  lib/ruby_llm.rb
  lib/ruby_llm/configuration.rb
  spec/fixtures/vcr_cassettes/models_refresh_updates_models_and_returns_a_chainable_models_instance.yml
  spec/fixtures/vcr_cassettes/models_refresh_works_as_a_class_method_too.yml
  spec/ruby_llm/chat_spec.rb
  spec/ruby_llm/chat_streaming_spec.rb
  spec/ruby_llm/chat_tools_spec.rb
Commit ba56fa2:
  All cassettes for Ollama are up to date; I manually copied the new Bedrock responses for model listings.
  VCR data will be synthetic until someone can run tests with all providers properly enabled, including a local Ollama server having run the rake task to get test models.
ldmosquera (Author) commented Apr 2, 2025

Merged main and resolved conflicts.

Echoing the commit message in ba56fa2 for visibility:

    All cassettes for Ollama are up to date; I manually copied the new Bedrock responses for model listings.
    VCR data will be synthetic until someone can run tests with all providers properly enabled, including a local Ollama server having run the rake task to get test models.

frimmy commented Apr 6, 2025

@crmne @ldmosquera just dropping appreciation! 🙇 🙏

Merge commit (conflicts resolved):
  spec/fixtures/vcr_cassettes/models_refresh_updates_models_and_returns_a_chainable_models_instance.yml
  spec/fixtures/vcr_cassettes/models_refresh_works_as_a_class_method_too.yml
  spec/ruby_llm/chat_content_spec.rb
  spec/ruby_llm/chat_spec.rb
  spec/ruby_llm/chat_streaming_spec.rb
  spec/ruby_llm/chat_tools_spec.rb
  spec/ruby_llm/embeddings_spec.rb
  spec/spec_helper.rb
Revert commit:
  In retrospect, this is too automagic and ultimately not needed at the library level. This reverts commit e810a7a.
crmne (Owner) commented Apr 17, 2025

@ldmosquera I'd love to see Ollama support merged soon. Do you think you can resolve the conflicts? Once you're ready, set this PR as ready for review and I'll take a look.

Merge commit (conflicts resolved):
  docs/guides/getting-started.md
  spec/ruby_llm/chat_content_spec.rb
  spec/ruby_llm/chat_spec.rb
  spec/ruby_llm/chat_streaming_spec.rb
  spec/ruby_llm/chat_tools_spec.rb
Commit:
  After tests were changed to use `gpt-4.1-nano` in 7017dcf, these cassettes were not refreshed:
  - `spec/fixtures/vcr_cassettes/models_refresh_updates_models_and_returns_a_chainable_models_instance.yml`
  - `spec/fixtures/vcr_cassettes/models_refresh_works_as_a_class_method_too.yml`

  My branch does `models.refresh!` into a cassette named `spec/fixtures/vcr_cassettes/initial_model_refresh.yml`, where I have been copying responses from the above original files, but right now I don't have the model list responses for the new `gpt-4.1-nano` and tests can't run with unknown models.

  The proper solution is to delete all these files then re-run the tests to get fresh data from each source, but meanwhile this commit jury-rigs nano into the response.
Commit:
  Llama-3.1 has trouble with tool usage at default temperatures; it will sometimes use the available tools, but most of the time it opts not to and hallucinates instead, making the tests brittle. A lower temperature mitigates this and probably reflects real-world usage anyway.
ldmosquera marked this pull request as ready for review on April 17, 2025 at 14:10
ldmosquera (Author) commented

Ready for review; I imagine there will be some more back and forth before it's mergeable. Apologies and thanks in advance 😬

Merge commit (conflicts resolved):
  docs/installation.md
  lib/ruby_llm.rb
  lib/tasks/model_updater.rb
  spec/ruby_llm/chat_content_spec.rb
  spec/ruby_llm/chat_spec.rb
  spec/ruby_llm/chat_streaming_spec.rb
  spec/ruby_llm/chat_tools_spec.rb
ldmosquera (Author) commented

Merged main; diff to yesterday here.

khasinski commented

Just curious - wouldn't it be easier to use OpenAI's compatibility layer in Ollama? https://ollama.com/blog/openai-compatibility

The same worked for OpenRouter, so maybe it's also the case here.
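
For reference, Ollama exposes an OpenAI-style API under /v1, so in principle any OpenAI-compatible client can talk to it. A plain HTTP sketch, independent of RubyLLM, with an illustrative model name assumed to be already pulled:

    require 'net/http'
    require 'json'

    # Call Ollama's OpenAI-compatible chat completions endpoint directly.
    uri = URI('http://localhost:11434/v1/chat/completions')
    payload = {
      model: 'llama3.1:8b',
      messages: [{ role: 'user', content: 'Hello from the compatibility layer!' }]
    }
    response = Net::HTTP.post(uri, payload.to_json, 'Content-Type' => 'application/json')
    puts JSON.parse(response.body).dig('choices', 0, 'message', 'content')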

crmne (Owner) commented Apr 22, 2025

Hey @khasinski I'm pondering the same. The reasons why I preferred a true Ollama implementation are summed up perfectly by Ollama themselves:

    This is initial experimental support for the OpenAI API. Future improvements under consideration include:
    - Embeddings API
    - Function calling
    - Vision support

However, we could simply test these assumptions and roll it out as OpenAI compatible if everything works.

Case in point: OpenRouter claims in their documentation that they don't support PDFs, but you can check out the tests I just committed to RubyLLM main - they do work!

ldmosquera (Author) commented Apr 22, 2025

At this point the PR has full coverage for the Ollama API, including repeatable tests plus a rake task that automates downloading the models needed for those tests.

If we were to scale back to simply using Ollama's OpenAI compatibility layer, some Ollama-specific code for parsing responses could be dropped, but other code for querying models would still need to remain. Tests would need to be rethought.

That, along with the fact that Ollama's OpenAI compatibility is not yet complete as mentioned above, makes me think we should continue targeting the Ollama API, at least for now. I can commit to watching for incoming Ollama-related issues and providing support.


This could be revisited in the future; over time it would be great to have generic live model querying capabilities, so that OpenAI compatibility could be used to point at any compatible self-hosted engine like Ollama, KoboldCPP, LlamaCPP, TabbyAPI, and so on.

The messy part is still dealing with uncertain self-hosted model capabilities - this is inherent, as people can stand up models with any context size, with or without a media projector in the case of GGUF-based vision models, etc.

Labels: new provider (New provider integration)
Successfully merging this pull request may close this issue: Add Local Model Support via Ollama Integration
6 participants