
Conversation

shanbady (Contributor)

What are the relevant tickets?

Closes https://github.com/mitodl/hq/issues/8429

Description (What does it do?)

This PR caches the dense encoder instance, which avoids unnecessary calls to litellm endpoints and also frees us from passing a dense encoder instance around, since calling the `dense_encoder` method now has no performance hit.
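For context, the caching amounts to memoizing the constructor, along the lines of this sketch (`functools.lru_cache` and `_construct_encoder` are illustrative assumptions here; see the diff for the actual implementation):

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def dense_encoder():
    """Return the dense encoder, constructing it at most once per process."""
    # Construction is what triggers the litellm/ollama/openai call seen in
    # the logs below; every later call returns the same cached instance.
    return _construct_encoder()  # hypothetical constructor, for illustration
```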

How can this be tested?

  1. Checkout `main`.
  2. Try instantiating the dense encoder and note that every call hits litellm/ollama or openai, depending on your local setup:

     ```python
     from vector_search.utils import dense_encoder

     encoder = dense_encoder()
     # [2025-10-20 17:47:16] WARNING 7118 [root] litellm.py:25 - [0c41fc84b062] - Model nomic-embed-text not found in tiktoken. defaulting to None

     encoder = dense_encoder()
     # [2025-10-20 17:47:20] WARNING 7118 [root] litellm.py:25 - [0c41fc84b062] - Model nomic-embed-text not found in tiktoken. defaulting to None

     encoder = dense_encoder()
     # [2025-10-20 17:47:20] WARNING 7118 [root] litellm.py:25 - [0c41fc84b062] - Model nomic-embed-text not found in tiktoken. defaulting to None
     ```

  3. Checkout this branch, repeat the same calls, and note that the endpoint is called only once (a quick identity check is sketched after this list).
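Optionally, you can confirm the instance is reused by checking object identity (this assumes the cache returns the same object on every call, which is what caching the instance implies):

```python
from vector_search.utils import dense_encoder

# Both calls should return the cached instance, with no second endpoint call.
assert dense_encoder() is dense_encoder()
```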

@shanbady shanbady added the Needs Review label Oct 20, 2025
@shanbady shanbady marked this pull request as ready for review October 20, 2025 17:55
