
[Bug]: Fix EmbeddingStrategy._get_embedding_llm_config_dict ignoring local embeddings #1658

@blghtr

Description

crawl4ai version

0.7.7

Expected Behavior

When `EmbeddingStrategy` is initialized without an explicit `embedding_llm_config` (or with `None`), I expect `_get_embedding_llm_config_dict()` to return `None`. This would allow the downstream `get_text_embeddings` utility to correctly switch to the local sentence-transformers implementation.
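A minimal sketch of the expected behavior, assuming the user's setting lives in a `config.embedding_llm_config` attribute. `SketchConfig` and `SketchStrategy` are hypothetical stand-ins written for this issue, not the real crawl4ai classes, and the real method may convert the config differently:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in for AdaptiveConfig; only the field relevant here."""
    embedding_llm_config: Optional[Dict] = None


class SketchStrategy:
    """Hypothetical stand-in for EmbeddingStrategy."""

    def __init__(self, config: Optional[SketchConfig] = None):
        self.config = config

    def _get_embedding_llm_config_dict(self) -> Optional[Dict]:
        # Read the user's setting; a missing config object also yields None.
        cfg = getattr(self.config, "embedding_llm_config", None)
        if cfg is None:
            # No LLM config requested: report that instead of forcing OpenAI.
            return None
        return dict(cfg)


local = SketchStrategy(SketchConfig())
remote = SketchStrategy(SketchConfig({"provider": "openai/text-embedding-3-small"}))
```

With this shape, `local` reports `None` and `remote` passes its provider dict through unchanged, so the caller can tell the two modes apart.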

Current Behavior

The method _get_embedding_llm_config_dict contains a fallback mechanism that forces an OpenAI configuration if self.config is present but has no LLM config.

Relevant code in adaptive_crawler.py (lines 633-644):

    def _get_embedding_llm_config_dict(self) -> Dict:
        # ... check self.config ...
        
        # Fallback to default if no config provided
        return {
            'provider': 'openai/text-embedding-3-small',
            'api_token': os.getenv('OPENAI_API_KEY')
        }

Because this method always returns a dict, the logic in utils.py's get_text_embeddings never reaches the else block for local embeddings:

    # utils.py
    if llm_config is not None:
        # Uses LiteLLM (calls OpenAI)
        ...
    else:
        # Uses sentence-transformers (UNREACHABLE via EmbeddingStrategy)
        ...

This effectively makes local embeddings dead code when using AdaptiveCrawler.
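The interaction can be modeled in a few lines. This is only a sketch: `choose_backend` and `current_fallback` are hypothetical names that mirror the two snippets quoted above, and they return labels instead of computing embeddings:

```python
import os
from typing import Dict, Optional


def choose_backend(llm_config: Optional[Dict]) -> str:
    """Mirror of the if/else quoted from utils.py; returns a label only."""
    if llm_config is not None:
        return "litellm"
    return "sentence-transformers"


def current_fallback(user_config: Optional[Dict]) -> Dict:
    """Model of the current behavior: a dict is returned in every case."""
    if user_config is not None:
        return user_config
    # Fallback described in this issue: OpenAI is chosen when nothing is set.
    return {
        "provider": "openai/text-embedding-3-small",
        "api_token": os.getenv("OPENAI_API_KEY"),
    }


# Both the "local" case (None) and an explicit config end up on the LiteLLM path.
for user_config in (None, {"provider": "openai/text-embedding-3-small"}):
    print(choose_backend(current_fallback(user_config)))
```

Both iterations print `litellm`, while `choose_backend(None)` on its own selects `sentence-transformers`. The local branch works in isolation but is never selected once the fallback sits in front of it.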

Is this reproducible?

Yes

Inputs Causing the Bug

- **Config**: `AdaptiveConfig(strategy="embedding", embedding_llm_config=None)` (intended for local mode).

Steps to Reproduce

1. Install `crawl4ai` with `sentence-transformers`.
2. Configure `AdaptiveCrawler` to use `"embedding"` strategy without providing `embedding_llm_config`.
3. Run `crawler.digest(...)`.
4. **Expected**: Uses local model.
5. **Actual**: Fails with OpenAI authentication error (or attempts to use OpenAI if key is in env), ignoring the user's intent to use local embeddings.

Code snippets

OS

Windows 11

Python version

3.11

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

No response

Metadata

Assignees

No one assigned

Labels

🐞 Bug (Something isn't working), 📌 Root caused (identified the root cause of bug)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
