Description
crawl4ai version
0.7.7
Expected Behavior
When EmbeddingStrategy is initialized without an explicit embedding_llm_config (or with embedding_llm_config=None), I expect _get_embedding_llm_config_dict() to return None. That lets the downstream get_text_embeddings utility fall back to the local sentence-transformers implementation, as intended.
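A minimal sketch of the expected contract follows. It uses standalone functions rather than the real crawl4ai methods: `FakeAdaptiveConfig`, `resolve_embedding_llm_config` and `choose_embedding_backend` are hypothetical names, and representing the config as a plain dict is a simplification. The point is that the resolver passes `None` through unchanged, so the dispatch reaches the local branch.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class FakeAdaptiveConfig:
    # Hypothetical stand-in for AdaptiveConfig; only the field relevant here.
    embedding_llm_config: Optional[Dict[str, Any]] = None


def resolve_embedding_llm_config(
    config: Optional[FakeAdaptiveConfig],
) -> Optional[Dict[str, Any]]:
    """Expected behavior: pass the user's setting through, with no OpenAI fallback."""
    if config is None or config.embedding_llm_config is None:
        return None  # None signals "use local sentence-transformers"
    return config.embedding_llm_config


def choose_embedding_backend(llm_config: Optional[Dict[str, Any]]) -> str:
    # Simplified version of the if/else branch in get_text_embeddings.
    if llm_config is not None:
        return "litellm"
    return "sentence-transformers"


print(choose_embedding_backend(resolve_embedding_llm_config(FakeAdaptiveConfig())))
# prints "sentence-transformers"
print(choose_embedding_backend(resolve_embedding_llm_config(
    FakeAdaptiveConfig(embedding_llm_config={"provider": "openai/text-embedding-3-small"})
)))
# prints "litellm"
```

With this contract, an explicit remote config still routes through LiteLLM, while omitting it selects the local model.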
Current Behavior
The method _get_embedding_llm_config_dict contains a fallback mechanism that forces an OpenAI configuration if self.config is present but has no LLM config.
Relevant code in adaptive_crawler.py (lines 633-644):
```python
def _get_embedding_llm_config_dict(self) -> Dict:
    # ... check self.config ...
    # Fallback to default if no config provided
    return {
        'provider': 'openai/text-embedding-3-small',
        'api_token': os.getenv('OPENAI_API_KEY')
    }
```
Because this method always returns a dict, the logic in utils.py's get_text_embeddings never reaches the else block for local embeddings:
```python
# utils.py
if llm_config is not None:
    # Uses LiteLLM (calls OpenAI)
    ...
else:
    # Uses sentence-transformers (UNREACHABLE via EmbeddingStrategy)
    ...
```
This effectively makes local embeddings dead code when using AdaptiveCrawler.
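The dead-code argument can be checked with a self-contained simulation. `current_resolver` copies the fallback dict quoted above, but simplifies the real method's checks on `self.config` into a single `None` test; `choose_embedding_backend` is a hypothetical reduction of the branch in `get_text_embeddings`. Whatever the user passes, the resolver returns a dict, so the local branch is never selected.

```python
import os
from typing import Any, Dict, Optional


def current_resolver(embedding_llm_config: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Simulates the reported behavior of _get_embedding_llm_config_dict."""
    if embedding_llm_config is not None:
        return embedding_llm_config
    # Fallback quoted in the issue: always an OpenAI config, even with no API key.
    return {
        "provider": "openai/text-embedding-3-small",
        "api_token": os.getenv("OPENAI_API_KEY"),  # None if the variable is unset
    }


def choose_embedding_backend(llm_config: Optional[Dict[str, Any]]) -> str:
    # Simplified version of the if/else branch in get_text_embeddings.
    if llm_config is not None:
        return "litellm"
    return "sentence-transformers"


# User intent: local embeddings (embedding_llm_config=None).
resolved = current_resolver(None)
print(resolved["provider"])               # prints "openai/text-embedding-3-small"
print(choose_embedding_backend(resolved))  # prints "litellm", never "sentence-transformers"
```

When `OPENAI_API_KEY` is unset, `api_token` is `None`, which is consistent with the authentication error reported in the reproduction steps.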
Is this reproducible?
Yes
Inputs Causing the Bug
- **Config**: `AdaptiveConfig(strategy="embedding", embedding_llm_config=None)` (intended for local mode).
Steps to Reproduce
1. Install `crawl4ai` with `sentence-transformers`.
2. Configure `AdaptiveCrawler` to use `"embedding"` strategy without providing `embedding_llm_config`.
3. Run `crawler.digest(...)`.
4. **Expected**: Uses local model.
5. **Actual**: Fails with an OpenAI authentication error (or, if `OPENAI_API_KEY` is set in the environment, calls OpenAI instead), ignoring the user's intent to use local embeddings.
Code snippets
OS
Windows 11
Python version
3.11
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)
No response