Replies: 1 comment
This is such a clean proposal, and honestly you're not wrong: being able to explicitly control `num_thread` per call (rather than relying on the global default) has a real impact on indexing performance, especially when running mixed workloads or testing different quantized models. I've run into similar bottlenecks before where LLM calls saturated CPU threads unintentionally, just because the defaults weren't adaptive, and patching this manually every time got old fast.

One small note: depending on how LightRAG wires up the config layer, you might need to guard against `.env` getting loaded after the internal Ollama client gets instantiated. We once hit a silent override because the thread count was frozen too early in the call graph; a rough sketch of that pitfall follows. Just tossing that in, in case it saves someone a few hours of head-scratching later.

If this ends up moving forward, I'd be curious to test it on a few fringe setups.
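A minimal sketch of the load-order pitfall, assuming python-dotenv and the official `ollama` Python client; the variable name `OLLAMA_NUM_THREAD` mirrors the proposal below, and none of this is LightRAG's actual code:

```python
import os

from dotenv import load_dotenv  # python-dotenv
import ollama

# Pitfall: if the env var is read (or the client/config is built) before
# load_dotenv() runs, the value from .env is silently ignored and Ollama
# keeps its default thread count.

load_dotenv()  # load .env FIRST, before any config is frozen

raw = os.getenv("OLLAMA_NUM_THREAD")
NUM_THREAD = int(raw) if raw else None  # None means "leave Ollama's default"

client = ollama.Client()  # instantiate only after the config is final
```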
It would be helpful to add a setting to the LightRAG `.env` file that allows users to specify the number of CPU threads passed to Ollama in LLM requests (e.g., for generating completions and embeddings). Ollama can take a `num_thread` parameter in the API that controls how many cores it uses for each request. Right now, there's no way to set this from the LightRAG configuration, so it always defaults to whatever system or global setting is in place. For high-load indexing or performance tuning, it's useful to be able to control this per run or workspace.

Proposal

• Introduce a new optional setting in `.env`, something like `OLLAMA_NUM_THREAD=8`.
• On each request to Ollama, if this variable is set, LightRAG includes `"num_thread": <value>` in the API call (see the sketch after this list).
• If it's left unset, default behavior should be unchanged.
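A minimal sketch of what the per-request wiring could look like, using the official `ollama` Python client (whose API does accept `num_thread` inside the `options` object); the helper `ollama_options()` and the model name are assumptions for illustration, not LightRAG's actual code:

```python
import os
import ollama

def ollama_options() -> dict:
    """Hypothetical helper: build per-request options; empty when unset."""
    raw = os.getenv("OLLAMA_NUM_THREAD")
    return {"num_thread": int(raw)} if raw else {}

response = ollama.chat(
    model="qwen2.5",  # illustrative model name
    messages=[{"role": "user", "content": "hello"}],
    options=ollama_options(),  # {} leaves Ollama's default threading unchanged
)
```

Passing an empty `options` dict keeps the unset case identical to today's behavior, which satisfies the third bullet above.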
Benefits
• Easier performance tuning for different hardware (especially multi-core CPUs like M1/M2/M3).
• More predictable and efficient CPU allocation when running multiple jobs or sharing a system.
• Avoids needing to manually patch code or set shell variables every time.