Replies: 2 comments
- I am also experiencing this with my M2 Air (16 GB RAM). It uses double the amount of RAM vs when I use …
- Did any of you fix it? I'm having the same problem: with ollama run I get around 20-30 tokens per second with high GPU usage, but Avante is doing maybe 2 tokens per second with low GPU usage.
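  Not a fix, but a way to compare the two setups on equal footing: Ollama's /api/generate response includes its own eval_count and eval_duration stats, so you can measure raw tokens per second outside of any editor plugin. A minimal sketch, assuming a local Ollama on the default port; the model name and prompt are just placeholders:

  ```python
  # Rough tokens/sec measurement against a local Ollama instance.
  # Assumes Ollama is serving on the default port 11434 and the
  # model below has already been pulled (`ollama pull codellama`).
  import requests

  resp = requests.post(
      "http://localhost:11434/api/generate",
      json={
          "model": "codellama",  # swap in whatever model Avante points at
          "prompt": "Write a hello-world function in Lua.",
          "stream": False,
      },
      timeout=300,
  )
  data = resp.json()

  # eval_count / eval_duration are Ollama's own generation stats;
  # eval_duration is reported in nanoseconds.
  tps = data["eval_count"] / (data["eval_duration"] / 1e9)
  print(f"{data['eval_count']} tokens at {tps:.1f} tok/s")
  ```

  If this raw number matches what you see with ollama run while Avante stays slow, the difference is likely in the prompt/context Avante ships rather than in the model itself.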
- I'm testing Avante locally with Ollama, similar to how I've previously used gen.nvim, except that with Avante the model's memory balloons out of VRAM as soon as I work with it, making it VERY slow and killing the "vibe", as they say.
Debugging this, I see a gnarly context prompt (not sure if that comes from Avante or CodeLlama, but the content, just wow), and I figure it's shipping a snippet of the file, or the whole thing, too. Is there a way I can scope it to limit the context / the amount of the file that is sent, and reduce the context size? I suspect this is causing an excess of tokens to be loaded into the model, making it expand out of VRAM (see the num_ctx sketch below).
Here's an example. Before: 5.7 GB, 100% GPU. After: [screenshot]
Ref system: [screenshot]
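  On the context-size suspicion: Ollama allocates the KV cache for the whole requested context window (num_ctx), so a client that asks for a big window can push the model out of VRAM even with a short prompt. What avante.nvim actually requests here is an assumption on my part, but the effect is easy to reproduce against the raw Ollama API. A minimal sketch, assuming the default local endpoint; the model name and prompt are placeholders:

  ```python
  # Show how the requested context window (num_ctx) affects Ollama.
  # The KV cache is allocated for the full window, so a large num_ctx
  # can push the model out of VRAM regardless of the prompt length.
  import requests

  def generate(prompt: str, num_ctx: int) -> dict:
      resp = requests.post(
          "http://localhost:11434/api/generate",
          json={
              "model": "codellama",
              "prompt": prompt,
              "stream": False,
              # Per-request override; this can also be baked into a
              # Modelfile with `PARAMETER num_ctx 2048`.
              "options": {"num_ctx": num_ctx},
          },
          timeout=300,
      )
      return resp.json()

  # Compare a small vs large context window; watch `ollama ps`
  # or GPU memory while these run to see the allocation jump.
  for ctx in (2048, 16384):
      data = generate("Summarize this function.", num_ctx=ctx)
      tps = data["eval_count"] / (data["eval_duration"] / 1e9)
      print(f"num_ctx={ctx}: {tps:.1f} tok/s")
  ```

  If the large-window call alone spills out of VRAM, capping num_ctx (or trimming what the plugin sends) is the lever to pull.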