
Letta ignores the "OLLAMA_BASE_URL" env variable and tries to use OpenAI instead #2388

NeilSCGH opened this issue Jan 24, 2025 · 5 comments

@NeilSCGH commented Jan 24, 2025

Describe the bug
Letta ignores the OLLAMA_BASE_URL environment variable and tries to use OpenAI instead.
The only model available in the new ADE is "letta-free", despite the fact that I use -e OLLAMA_BASE_URL="http://host.docker.internal:11434" when running Letta with Docker.

Is it possible to run Letta with Ollama only? Without an OpenAI API key?

Please describe your setup

  • Describe your setup
    • OS => Ubuntu 24.04.1
  • How did you install Letta?
    • I've followed the documentation here and ran:
sudo docker run \
 -v ~/.letta/.persist/pgdata:/var/lib/postgresql/data \
 -p 8283:8283 \
 -e OLLAMA_BASE_URL="http://host.docker.internal:11434" \
 letta/letta:latest

Logs
No interesting logs when I start the server, but when I try to talk to the chatbot after creating an agent with the "letta-free" model, I get these errors:

Letta.letta.agent - ERROR - step() failed with an unrecognized exception: 'The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable'
Letta.letta.server.server - ERROR - Error in server._step: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
Traceback (most recent call last):

...

  File "/app/.venv/lib/python3.11/site-packages/openai/_client.py", line 110, in __init__
    raise OpenAIError(
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
None
Traceback (most recent call last):

...

  File "/app/.venv/lib/python3.11/site-packages/openai/_client.py", line 110, in __init__
    raise OpenAIError(
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
/app/letta/server/rest_api/utils.py:112: UserWarning: SSE stream generator failed: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
  warnings.warn(f"SSE stream generator failed: {e}")

Additional context
When I run

sudo docker run \
 -v ~/.letta/.persist/pgdata:/var/lib/postgresql/data \
 -p 8283:8283 \
 -e OPENAI_API_KEY="aaaaaaaaaaaaaaaaaaaaaaaa" \
 letta/letta:latest

With a dummy OpenAI token, I now get the following error in the logs:

/app/letta/server/server.py:1049: UserWarning: An error occurred while listing LLM models for provider id=None name='openai' api_key='aaaaaaaaaaaaaaaaaaaaaaaa' organization_id=None updated_at=None base_url='https://api.openai.com/v1': 401 Client Error: Unauthorized for url: https://api.openai.com/v1/models
  warnings.warn(f"An error occurred while listing LLM models for provider {provider}: {e}")

So Letta can see my environment variables, but the OLLAMA_BASE_URL one is still ignored.
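For reference, a quick way to check which variables actually reach the container (the container ID below is a placeholder; use docker ps to find it):

# List running containers to find the Letta container ID
docker ps

# Print the environment visible inside the container
docker exec <container_id> env | grep -iE 'ollama|openai'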

Letta Config
Default initial config; ran Letta for the first time.


Local LLM details

  • I have a working Ollama instance on Ubuntu 24.04 (http://localhost:11434/ works; see the quick check below)
  • I want to use llama3.2, but that's not the problem here
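A quick way to confirm the Ollama API itself answers on the host (this is the same /api/tags endpoint Letta queries when listing models):

# Should return a JSON list of the locally installed models
curl http://localhost:11434/api/tags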
@sarahwooders (Collaborator)
Thanks for reporting this; we are looking into it now!

@cpacker (Collaborator) commented Jan 24, 2025

Hi @NeilSCGH - thank you so much for trying Letta and your bug report!!

Can you confirm what version of Letta you're on? You should see a version message print out at the top of the server logs.

I'm on the latest version and am not able to reproduce your bug.

For reference, this is the output of ollama list on my test device:

ollama list
NAME                          ID              SIZE      MODIFIED      
deepseek-r1:7b                0a8c26691023    4.7 GB    3 days ago       
nous-hermes:latest            4bfb8ab0bd02    3.8 GB    3 months ago     
...

To grab the latest version:

docker pull letta/letta:latest

Next, I run this:

docker run \
  -v ~/.letta/.persist/pgdata:/var/lib/postgresql/data \
  -p 8283:8283 \
  -e OLLAMA_BASE_URL="http://host.docker.internal:11434" \
  letta/letta:latest

My server logs look like this:

...

Using internal PostgreSQL at: postgresql://letta:letta@localhost:5432/letta
Attempting to migrate database...
Using database:  postgresql://letta:letta@localhost:5432/letta
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
Database migration completed successfully.
Starting Letta server at http://0.0.0.0:8283...
Executing: letta server --host 0.0.0.0 --port 8283
Creating engine postgresql://letta:letta@localhost:5432/letta

[[ Letta server // v0.6.15 ]]
▶ Server running at: http://0.0.0.0:8283
▶ View using ADE at: https://app.letta.com/development-servers/local/dashboard

INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8283 (Press CTRL+C to quit)

...


And when I create an agent in the ADE and click on the model dropdown, I can see all my Ollama models (as well as the free endpoint that we host):

[Image: ADE model dropdown showing local Ollama models alongside the letta-free endpoint]

@cpacker cpacker self-assigned this Jan 24, 2025
@NeilSCGH (Author)

Hello,
Thanks for your quick response.

Letta version: ok

When running Docker, it uses the latest version of the Letta server: v0.6.15.
Here are the startup logs:

No external Postgres configuration detected, starting internal PostgreSQL...

PostgreSQL Database directory appears to contain a database; Skipping initialization

localhost:5432 - no response
Waiting for PostgreSQL to be ready...
2025-01-25 13:42:22.661 UTC [7] LOG:  starting PostgreSQL 15.4 (Debian 15.4-2.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2025-01-25 13:42:22.661 UTC [7] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2025-01-25 13:42:22.661 UTC [7] LOG:  listening on IPv6 address "::", port 5432
2025-01-25 13:42:22.663 UTC [7] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2025-01-25 13:42:22.667 UTC [32] LOG:  database system was interrupted; last known up at 2025-01-25 13:40:07 UTC
2025-01-25 13:42:23.470 UTC [32] LOG:  database system was not properly shut down; automatic recovery in progress
2025-01-25 13:42:23.472 UTC [32] LOG:  redo starts at 0/1B22410
2025-01-25 13:42:23.472 UTC [32] LOG:  invalid record length at 0/1B31670: wanted 24, got 0
2025-01-25 13:42:23.472 UTC [32] LOG:  redo done at 0/1B31638 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2025-01-25 13:42:23.476 UTC [30] LOG:  checkpoint starting: end-of-recovery immediate wait
2025-01-25 13:42:23.490 UTC [30] LOG:  checkpoint complete: wrote 16 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.003 s, sync=0.006 s, total=0.016 s; sync files=10, longest=0.002 s, average=0.001 s; distance=60 kB, estimate=60 kB
2025-01-25 13:42:23.494 UTC [7] LOG:  database system is ready to accept connections
localhost:5432 - accepting connections
Using internal PostgreSQL at: postgresql://letta:letta@localhost:5432/letta
Attempting to migrate database...
Using database:  postgresql://letta:letta@localhost:5432/letta
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
Database migration completed successfully.
Starting Letta server at http://0.0.0.0:8283...
Executing: letta server --host 0.0.0.0 --port 8283
Creating engine postgresql://letta:letta@localhost:5432/letta

[[ Letta server // v0.6.15 ]]
▶ Server running at: http://0.0.0.0:8283
▶ View using ADE at: https://app.letta.com/development-servers/local/dashboard

INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8283 (Press CTRL+C to quit)

Problem reaching the Ollama server

I noticed something: when loading https://app.letta.com/, I see this in the logs:

/app/letta/server/server.py:1049: UserWarning: An error occurred while listing LLM models for provider id=None name='ollama' api_key=None organization_id=None updated_at=None base_url='http://host.docker.internal:11434' default_prompt_formatter='chatml': HTTPConnectionPool(host='host.docker.internal', port=11434): Max retries exceeded with url: /api/tags (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7a5a061bbb50>: Failed to resolve 'host.docker.internal' ([Errno -2] Name or service not known)"))
  warnings.warn(f"An error occurred while listing LLM models for provider {provider}: {e}")
INFO:     172.17.0.1:47698 - "GET /v1/models/ HTTP/1.1" 200 OK

So Letta tries to use Ollama, but it fails to resolve 'host.docker.internal'. Searching for this issue on Google, it seems that host.docker.internal only works on Windows and Mac, not on Ubuntu.

I then tried changing http://host.docker.internal:11434 to http://localhost:11434 and got this in the logs:

/app/letta/server/server.py:1061: UserWarning: An error occurred while listing embedding models for provider id=None name='ollama' api_key=None organization_id=None updated_at=None base_url='http://localhost:11434' default_prompt_formatter='chatml': HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/tags (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7b2ed680f050>: Failed to establish a new connection: [Errno 111] Connection refused'))
  warnings.warn(f"An error occurred while listing embedding models for provider {provider}: {e}")

At least a Connection refused error means that Letta can reach Ollama.
I realized that Ollama was only allowing requests from localhost (my phone also got connection refused when requesting Ollama on my computer), so I followed these steps to allow all interfaces: ollama/ollama#703 (comment).
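For reference, the steps in that comment boil down to something like this (a sketch, assuming a systemd-managed Ollama install):

# Add an override so Ollama listens on all interfaces instead of only 127.0.0.1
sudo systemctl edit ollama.service

# In the editor, add:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0"

# Reload the unit files and restart Ollama
sudo systemctl daemon-reload
sudo systemctl restart ollama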

It solved the problem for my phone, but Letta still gets the 'Connection refused' error.

If I'm correct that host.docker.internal doesn't work out of the box on Ubuntu, that would be a good point to add to the docs. Option 1: make host.docker.internal work on Ubuntu, maybe by editing the hosts file, or Option 2: use localhost, but tell people to configure Ollama to listen on all interfaces.

I'm still stuck here at the moment.

Letta falls back to OpenAI, even when no OpenAI token is given

Regarding the OpenAI error, I still get this error when trying to talk to the agent:

openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
/app/letta/server/rest_api/utils.py:112: UserWarning: SSE stream generator failed: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
  warnings.warn(f"SSE stream generator failed: {e}")

So it seems that Letta tries to use OpenAI if the Ollama connection fails. I think it would be better not to fall back to OpenAI: if I haven't provided an OpenAI token when running Letta, trying OpenAI is pointless and the resulting error is confusing.
For me, the correct error to surface when talking to an agent in this case would be the Connection refused error from Ollama, because that is the real error.

@NeilSCGH (Author) commented Jan 25, 2025

Edit:
I've checked the Ollama logs with journalctl -u ollama --no-pager, and Ollama was not receiving any requests from Letta. So the http://localhost:11434 address wasn't working, but with the direct IP of my computer, it worked.

sudo docker run \
 -v ~/.letta/.persist/pgdata:/var/lib/postgresql/data \
 -p 8283:8283 \
 -e OLLAMA_BASE_URL="http://192.168.1.15:11434" \
 letta/letta:latest

And now I can see the list of my models in the Letta ADE page :)

The proper solution is probably to make host.docker.internal point to the private IP of the Ubuntu host running Ollama.
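One way to do that (a sketch, assuming Docker 20.10+ on Linux) is to map host.docker.internal to the host gateway when starting the container:

sudo docker run \
 --add-host=host.docker.internal:host-gateway \
 -v ~/.letta/.persist/pgdata:/var/lib/postgresql/data \
 -p 8283:8283 \
 -e OLLAMA_BASE_URL="http://host.docker.internal:11434" \
 letta/letta:latest

Note that Ollama still has to listen on more than 127.0.0.1 (e.g. the OLLAMA_HOST=0.0.0.0 override above), otherwise the container will still get Connection refused.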

Now I have a new problem, haha: when talking to the agent, it makes my Ollama server crash. Here are the Letta logs:

Exception: API call got non-200 response code (code=500, msg={"error":"model requires more system memory (16.0 GiB) than is available (13.1 GiB)"}) for address: http://192.168.1.15:11434/api/generate. Make sure that the ollama API server is running and reachable at http://192.168.1.15:11434/api/generate.
/app/letta/server/rest_api/utils.py:112: UserWarning: SSE stream generator failed: API call got non-200 response code (code=500, msg={"error":"model requires more system memory (16.0 GiB) than is available (13.1 GiB)"}) for address: http://192.168.1.15:11434/api/generate. Make sure that the ollama API server is running and reachable at http://192.168.1.15:11434/api/generate.
  warnings.warn(f"SSE stream generator failed: {e}")

Here are the Ollama logs:

time=2025-01-25T15:38:16.254+01:00 level=INFO source=sched.go:428 msg="NewLlamaServer failed" model=/usr/share/ollama/.ollama/models/blobs/sha256-dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff error="model requires more system memory (16.0 GiB) than is available (13.1 GiB)"

The llama3.2 model works perfectly from the terminal and with Open WebUI.
Does Letta need more RAM to run the models? Any ideas?

@cpacker (Collaborator) commented Jan 25, 2025

Amazing, thank you for all the debugging notes!! Looks like we should definitely update our Ollama docs to add special instructions for Ubuntu/Linux.

The llama3.2 model works perfectly from the terminal and with Open WebUI. Does Letta need more RAM to run the models? Any ideas?

The minimum payload / context window of a Letta agent will be significantly more tokens than the initial state of an Ollama chat in the CLI. You can see the token count at the top right of the ADE at any given point in time:

[Image: context window token count shown in the top right of the ADE]

So what's likely happening is: when you use llama3.2 (whatever quant Ollama downloaded for you) in the Ollama CLI, the prompt is something like <500 tokens, and more tokens = more RAM required. It happens that <500 tokens is small enough not to cause "spillover" / RAM swapping (which makes things appear super slow / sluggish).

Whereas with the Letta agent, you're sending a payload of ~2000 tokens into Ollama, which requires significantly more RAM, and it happens to cross a threshold that causes spillover. With Letta we try to make it super clear what's going on in the context window, which is why we have the context window viewer. With Ollama, you can probably inject a dump somewhere (or run in debug mode) to see the final formatted prompt (e.g. in llama.cpp you'll see this by default in the server trace).

You can "fix" this with more memory or a lower model quant (so it'll require less memory). Hope this helps! LMK if you have any other questions.
