
Conversation

@edwinjosechittilappilly
Collaborator

@edwinjosechittilappilly edwinjosechittilappilly commented Jan 15, 2026

Adds available-models support to the Embedding Model component, so that all available models from the selected provider are exposed by the component alongside the primary embedding instance.

Summary by CodeRabbit

Release Notes

  • New Features

    • Embedding Model component now exposes a collection of available embedding models for each provider alongside the primary embedding instance.
    • Added support for Google Generative AI embedding models.
  • Tests

    • Enhanced test coverage for embedding model functionality with new test cases for multiple model availability.


Eliminated the 'markitdown' dependency and Markdown output option from the URLComponent in Blog Writer, Knowledge Ingestion, and Simple Agent starter projects. Updated the code and configuration to only support 'Text' and 'HTML' output formats. Also added a 'Local' storage option to Document Q&A, News Aggregator, and Portfolio Website Code Generator starter projects.
@coderabbitai
Contributor

coderabbitai bot commented Jan 15, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting reviews.review_status to false in the CodeRabbit configuration file.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

The changes expand the EmbeddingModelComponent to support building multiple embedding models for a given provider. A new EmbeddingsWithModels wrapper is introduced to return both the primary embedding instance and a dictionary of available models keyed by model name. Helper methods are added for constructing model-specific kwargs and enumerating provider-specific embedding models from unified model data.
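For orientation, here is a minimal sketch of the wrapper shape the walkthrough describes, assuming it delegates the standard Embeddings interface to the primary instance; the real class lives in lfx.base.embeddings.embeddings_class and may differ in detail:

```python
# Minimal sketch of an EmbeddingsWithModels-style wrapper (assumed shape;
# the real class is lfx.base.embeddings.embeddings_class.EmbeddingsWithModels).
from langchain_core.embeddings import Embeddings


class EmbeddingsWithModels(Embeddings):
    """Primary embedding instance plus all available sibling models, keyed by model name."""

    def __init__(self, embeddings: Embeddings, available_models: dict[str, Embeddings]) -> None:
        self.embeddings = embeddings
        self.available_models = available_models

    # Delegate the standard Embeddings interface to the primary instance.
    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return self.embeddings.embed_documents(texts)

    def embed_query(self, text: str) -> list[float]:
        return self.embeddings.embed_query(text)
```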

Changes

  • EmbeddingModelComponent Implementation
    Files: src/lfx/src/lfx/components/models_and_agents/embedding_model.py, src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json, src/lfx/src/lfx/_assets/component_index.json
    Summary: Substantial refactor of EmbeddingModelComponent to return an EmbeddingsWithModels wrapper instead of a single embedding instance. Adds two internal helper methods, _build_available_models and _build_kwargs_for_model, and updates _build_kwargs. Enhances provider-specific handling for IBM WatsonX, Ollama, and Google Generative AI, including timeout support.
  • Test Coverage Updates
    Files: src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py
    Summary: Extends tests to verify EmbeddingsWithModels wrapper behavior, adds a new test for available_models population from unified models, and updates existing assertions to account for the new composite return type.
  • Google Generative AI Embedding Models
    Files: src/lfx/src/lfx/base/models/google_generative_ai_constants.py
    Summary: Adds new public constants GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS and GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS_DETAILED to expose embedding model metadata (a hedged sketch follows this list).
  • Unified Models Integration
    Files: src/lfx/src/lfx/base/models/unified_models.py
    Summary: Imports and integrates GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS_DETAILED into get_models_detailed to expose Google embedding models alongside standard models.
  • Hash and Metadata Updates
    Files: src/lfx/src/lfx/_assets/stable_hash_history.json
    Summary: Updates the code hash for the EmbeddingModel component from 277f5f28215b to 0b1313e6065f to reflect the implementation changes.
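Based on the constants named above, the new Google entries presumably look roughly like the following; the model names come from the Copilot summary later in this thread, and create_model_metadata plus its import path and signature are assumptions inferred from the review comments:

```python
# Hedged sketch of the additions to google_generative_ai_constants.py.
from lfx.base.models.model_metadata import create_model_metadata  # assumed import path

GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS = [
    "models/text-embedding-004",
    "models/embedding-001",
]

# Detailed metadata derived from the plain list, per the review discussion.
GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS_DETAILED = [
    create_model_metadata(provider="Google Generative AI", name=name, model_type="embeddings")
    for name in GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS
]
```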

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant EmbeddingModelComponent
    participant GetUnifiedModels
    participant EmbeddingClass
    participant EmbeddingsWithModels as EmbeddingsWithModels<br/>(Wrapper)

    Client->>EmbeddingModelComponent: build_embeddings()
    EmbeddingModelComponent->>EmbeddingModelComponent: Extract provider, model, api_key
    EmbeddingModelComponent->>EmbeddingClass: Instantiate primary embedding<br/>via _build_kwargs()
    EmbeddingClass-->>EmbeddingModelComponent: primary_embedding instance
    
    EmbeddingModelComponent->>GetUnifiedModels: get_unified_models_detailed(provider)
    GetUnifiedModels-->>EmbeddingModelComponent: List of all provider models
    
    EmbeddingModelComponent->>EmbeddingModelComponent: _build_available_models()
    loop For each provider model
        EmbeddingModelComponent->>EmbeddingModelComponent: _build_kwargs_for_model(model)
        EmbeddingModelComponent->>EmbeddingClass: Instantiate embedding for model
        EmbeddingClass-->>EmbeddingModelComponent: model_embedding instance
    end
    
    EmbeddingModelComponent->>EmbeddingsWithModels: Create wrapper with<br/>embeddings + available_models dict
    EmbeddingsWithModels-->>Client: Return EmbeddingsWithModels

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • phact

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 3 warnings)
  • Test Coverage For New Implementations: ❌ Error. Test coverage is insufficient to validate the new functionality, including private methods, provider-specific handling, and per-model metadata/embedding_class configurations; bugs in the implementation were not caught by tests. Resolution: expand coverage to directly test the private methods, add provider-specific tests, verify per-model metadata/embedding_class handling, and add error-handling tests using parametrized approaches (a sketch follows the passed checks below).
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 54.55%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Test Quality And Coverage: ⚠️ Warning. Test coverage is incomplete: critical helper methods lack unit tests, provider-specific parameter construction is untested, and a type-annotation mismatch went undetected. Resolution: add dedicated unit tests for _build_available_models, _build_kwargs_for_model, per-model metadata handling, provider-specific params, and error scenarios; correct the return type annotation.
  • Excessive Mock Usage Warning: ⚠️ Warning. Tests use 7-8+ mocks per test case, masking real bugs: WatsonX parameters not wired correctly, metadata reused across models, and parameter validation bypassed by permissive mocks. Resolution: add integration tests validating actual parameter mapping with real embedding instances; reduce mocks by testing the _build_kwargs methods directly rather than mocking them internally.
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately describes the main change: adding available-models support to the Embeddings component, the core feature across all modified files.
  • Test File Naming And Structure: ✅ Passed. The test file follows pytest conventions with appropriate fixtures, mocking, and comprehensive coverage of positive and negative scenarios.
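To illustrate the parametrized approach the failed check asks for, here is a hedged pytest sketch; constructing the component bare and the metadata shapes used are assumptions, not the repository's real fixtures:

```python
# Hypothetical parametrized test for _build_kwargs_for_model. Assumes
# EmbeddingModelComponent can be built bare with its input defaults populated.
import pytest

from lfx.components.models_and_agents.embedding_model import EmbeddingModelComponent


@pytest.mark.parametrize(
    ("provider", "param_mapping", "expected"),
    [
        ("OpenAI", {"model": "model", "api_key": "openai_api_key"},
         {"model": "m", "openai_api_key": "sk-test"}),
        # Ollama maps no api_key and falls back to the default local base_url.
        ("Ollama", {"model": "model", "base_url": "base_url"},
         {"model": "m", "base_url": "http://localhost:11434"}),
    ],
)
def test_build_kwargs_for_model(provider, param_mapping, expected):
    component = EmbeddingModelComponent()
    model = {"name": "m", "provider": provider}
    kwargs = component._build_kwargs_for_model(model, {"param_mapping": param_mapping}, api_key="sk-test")
    for key, value in expected.items():
        assert kwargs[key] == value
```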

@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Jan 15, 2026
Collaborator

@lucaseduoli lucaseduoli left a comment


LGTM, just ruff fixes

@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Jan 21, 2026
Copilot AI review requested due to automatic review settings January 21, 2026 17:25
@github-actions
Contributor

github-actions bot commented Jan 21, 2026

Frontend Unit Test Coverage Report

Coverage Summary

Lines: 17% | Statements: 17.55% (5029/28643) | Branches: 10.89% (2403/22050) | Functions: 11.65% (731/6274)

Unit Test Results

Tests: 2006 | Skipped: 0 💤 | Failures: 0 ❌ | Errors: 0 🔥 | Time: 27.328s ⏱️

@codecov

codecov bot commented Jan 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 34.57%. Comparing base (e8753a3) to head (66a036d).
⚠️ Report is 1 commit behind head on main.

❌ Your project status has failed because the head coverage (41.63%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main   #11320      +/-   ##
==========================================
+ Coverage   34.55%   34.57%   +0.02%     
==========================================
  Files        1416     1416              
  Lines       67422    67424       +2     
  Branches     9931     9931              
==========================================
+ Hits        23296    23311      +15     
+ Misses      42902    42888      -14     
- Partials     1224     1225       +1     
Flag Coverage Δ
backend 53.53% <ø> (+0.02%) ⬆️
frontend 16.07% <ø> (ø)
lfx 41.63% <100.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
.../lfx/base/models/google_generative_ai_constants.py 100.00% <100.00%> (ø)
src/lfx/src/lfx/base/models/unified_models.py 23.74% <ø> (ø)

... and 5 files with indirect coverage changes


Contributor

Copilot AI left a comment


Pull request overview

This pull request adds support for available models metadata to the Embeddings component, enabling multi-model support by providing a wrapper that contains both the primary embedding instance and a dictionary of all available models from the same provider.

Changes:

  • Modified EmbeddingModelComponent to return an EmbeddingsWithModels wrapper containing both the primary embedding instance and all available model instances for the provider (usage is sketched after this list)
  • Added Google Generative AI embedding models to the unified models constants
  • Updated tests to verify the new wrapper behavior and available models functionality
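To make the new contract concrete, here is a hedged sketch of downstream usage; `component` stands in for a configured EmbeddingModelComponent, and the attribute names follow the summaries in this thread rather than a verified API:

```python
# Hypothetical downstream usage of the EmbeddingsWithModels wrapper.
result = component.build_embeddings()  # EmbeddingsWithModels per this PR

# The primary instance behaves like any Embeddings object:
vector = result.embeddings.embed_query("hello world")

# Sibling models from the same provider are available without rebuilding the component:
large = result.available_models.get("text-embedding-3-large")
if large is not None:
    vector_large = large.embed_query("hello world")
```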

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Summary per file

  • src/lfx/src/lfx/components/models_and_agents/embedding_model.py: core implementation; added the _build_available_models and _build_kwargs_for_model methods, modified build_embeddings to return an EmbeddingsWithModels wrapper, and fixed Google provider-name consistency
  • src/lfx/src/lfx/base/models/unified_models.py: added GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS_DETAILED to the detailed models list
  • src/lfx/src/lfx/base/models/google_generative_ai_constants.py: added embedding model constants for Google Generative AI (text-embedding-004, embedding-001)
  • src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py: updated existing tests to verify the EmbeddingsWithModels wrapper and added a new test for available-models population
  • src/lfx/src/lfx/_assets/stable_hash_history.json: updated the component hash for the EmbeddingModel component
  • src/lfx/src/lfx/_assets/component_index.json: updated component metadata, including the code hash and the full embedded component code
Comments suppressed due to low confidence (2)

src/lfx/src/lfx/components/models_and_agents/embedding_model.py:168

  • When an Embeddings object is directly connected (line 168), it is returned as-is without wrapping it in EmbeddingsWithModels. This creates an inconsistency with the documented return type behavior. All return paths should consistently return an EmbeddingsWithModels instance to ensure uniform handling downstream. Consider wrapping the directly connected embeddings in an EmbeddingsWithModels instance with an empty available_models dict.
        try:
            from langchain_core.embeddings import Embeddings as BaseEmbeddings

            if isinstance(self.model, BaseEmbeddings):
                return self.model
        except ImportError:

src/lfx/src/lfx/components/models_and_agents/embedding_model.py:15

  • The logger initialization should be moved below the remaining imports to follow Python's conventional module layout. Logger setup typically appears after all imports, including imports from other modules in the same package, not between import blocks.
from lfx.base.models.watsonx_constants import IBM_WATSONX_URLS
from lfx.field_typing import Embeddings
from lfx.io import (


Comment on lines +288 to 368
        model: dict[str, Any],
        metadata: dict[str, Any],
        api_key: str | None,
    ) -> dict[str, Any]:
        """Build kwargs dictionary for a specific model using parameter mapping.

        This is similar to _build_kwargs but uses the provided api_key directly
        instead of looking it up again.

        Args:
            model: Model dict with name and provider
            metadata: Metadata containing param_mapping
            api_key: The API key to use

        Returns:
            kwargs dict for embedding class instantiation
        """
        param_mapping = metadata.get("param_mapping", {})
        if not param_mapping:
            msg = "Parameter mapping not found in metadata"
            raise ValueError(msg)

        kwargs = {}
        provider = model.get("provider")

        # Required parameters - handle both "model" and "model_id" (for watsonx)
        if "model" in param_mapping:
            kwargs[param_mapping["model"]] = model.get("name")
        elif "model_id" in param_mapping:
            kwargs[param_mapping["model_id"]] = model.get("name")

        # Add API key if mapped
        if "api_key" in param_mapping and api_key:
            kwargs[param_mapping["api_key"]] = api_key

        # Optional parameters with their values
        optional_params = {
            "api_base": self.api_base if self.api_base else None,
            "dimensions": int(self.dimensions) if self.dimensions else None,
            "chunk_size": int(self.chunk_size) if self.chunk_size else None,
            "request_timeout": float(self.request_timeout) if self.request_timeout else None,
            "max_retries": int(self.max_retries) if self.max_retries else None,
            "show_progress_bar": self.show_progress_bar if hasattr(self, "show_progress_bar") else None,
            "model_kwargs": self.model_kwargs if self.model_kwargs else None,
        }

        # Watson-specific parameters
        if provider in {"IBM WatsonX", "IBM watsonx.ai"}:
            # Map base_url_ibm_watsonx to "url" parameter for watsonx
            if "url" in param_mapping:
                url_value = (
                    self.base_url_ibm_watsonx
                    if hasattr(self, "base_url_ibm_watsonx") and self.base_url_ibm_watsonx
                    else "https://us-south.ml.cloud.ibm.com"
                )
                kwargs[param_mapping["url"]] = url_value
            # Map project_id for watsonx
            if hasattr(self, "project_id") and self.project_id and "project_id" in param_mapping:
                kwargs[param_mapping["project_id"]] = self.project_id

        # Ollama-specific parameters
        if provider == "Ollama" and "base_url" in param_mapping:
            # Map api_base to "base_url" parameter for Ollama
            base_url_value = self.api_base if hasattr(self, "api_base") and self.api_base else "http://localhost:11434"
            kwargs[param_mapping["base_url"]] = base_url_value

        # Add optional parameters if they have values and are mapped
        for param_name, param_value in optional_params.items():
            if param_value is not None and param_name in param_mapping:
                # Special handling for request_timeout with Google provider
                if param_name == "request_timeout":
                    if provider == "Google Generative AI" and isinstance(param_value, (int, float)):
                        kwargs[param_mapping[param_name]] = {"timeout": param_value}
                    else:
                        kwargs[param_mapping[param_name]] = param_value
                else:
                    kwargs[param_mapping[param_name]] = param_value

        return kwargs

    def _build_kwargs(self, model: dict[str, Any], metadata: dict[str, Any]) -> dict[str, Any]:

Copilot AI Jan 21, 2026


There is significant code duplication between the _build_kwargs_for_model and _build_kwargs methods. Both methods share identical logic for handling Watson-specific parameters, Ollama-specific parameters, and the Google Generative AI timeout handling. Consider refactoring this shared logic into a common helper method to improve maintainability and reduce the risk of inconsistencies when making future updates.

@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Jan 21, 2026
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/lfx/src/lfx/components/models_and_agents/embedding_model.py (1)

156-169: Keep the return type consistent with the new EmbeddingsWithModels contract.

The docstring now promises an EmbeddingsWithModels, but the early return still returns a raw Embeddings. Consider wrapping direct inputs (or adjust the docstring) to avoid downstream surprises.

✅ Suggested adjustment
-            if isinstance(self.model, BaseEmbeddings):
-                return self.model
+            if isinstance(self.model, BaseEmbeddings):
+                if isinstance(self.model, EmbeddingsWithModels):
+                    return self.model
+                return EmbeddingsWithModels(embeddings=self.model, available_models={})
🤖 Fix all issues with AI agents
In `@src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json`:
- Line 1821: The WatsonX-specific inputs truncate_input_tokens and input_text
are not being forwarded to the Watsonx embeddings because they aren’t present in
param_mapping and must be passed inside a special params dict using IBM SDK
meta-names; update the IBM watsonx param_mapping to include a mapping for a
params/metadata key (e.g., "params") and then in both _build_kwargs and
_build_kwargs_for_model (and where provider in {"IBM WatsonX","IBM watsonx.ai"}
is checked) construct a params_dict that sets
EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS from self.truncate_input_tokens
and EmbedTextParamsMetaNames.RETURN_OPTIONS based on self.input_text, then
attach that params_dict to kwargs using the param_mapping entry (or fallback to
kwargs["params"]) so WatsonxEmbeddings receives the proper params payload.
Ensure you reference EmbedTextParamsMetaNames, truncate_input_tokens,
input_text, _build_kwargs, and _build_kwargs_for_model in the change.

In `@src/lfx/src/lfx/_assets/component_index.json`:
- Line 89139: The build_embeddings method currently annotates its return as "->
Embeddings" but returns an EmbeddingsWithModels instance; update the signature
of EmbeddingModelComponent.build_embeddings to return EmbeddingsWithModels (or a
union like Embeddings | EmbeddingsWithModels) to match the actual return value,
and adjust the docstring if needed; reference the method name build_embeddings
and the class EmbeddingsWithModels so you update the annotation where the method
is defined.

In `@src/lfx/src/lfx/base/models/google_generative_ai_constants.py`:
- Around line 96-111: The embedding model list uses deprecated Google models;
update GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS to contain the current supported
model "models/gemini-embedding-001" and ensure
GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS_DETAILED (which uses
create_model_metadata) will automatically reflect that change by iterating over
the updated list; replace the existing two entries with a single
"models/gemini-embedding-001".

In `@src/lfx/src/lfx/components/models_and_agents/embedding_model.py`:
- Around line 1-15: Move the module-level logger initialization so it appears
after the full import block to satisfy Ruff E402: relocate the line "logger =
logging.getLogger(__name__)" to below the last import (e.g., after the reference
to IBM_WATSONX_URLS) in embedding_model.py and ensure no other executable
statements intervene between imports and that logger assignment.
- Around line 263-283: The loop currently reuses the provider-level metadata and
embedding_class for every model; change it to extract per-model metadata and
embedding_class from model_data (falling back to provider-level values if
absent), then call _build_kwargs_for_model with that model-specific metadata and
api_key and instantiate using the model-specific embedding_class when populating
available_models_dict[model_name]; keep the try/except and logging but ensure
the correct per-model symbols (model_data, metadata_from_model,
embedding_class_from_model, _build_kwargs_for_model, available_models_dict) are
used so models with bespoke param_mapping or classes are configured and
instantiated correctly.
🧹 Nitpick comments (4)
src/lfx/src/lfx/_assets/component_index.json (2)

89139-89139: Import ordering: logger initialization misplaced between imports.

The logger = logging.getLogger(__name__) statement is placed between import blocks, which violates PEP 8 style guidelines. All imports should be grouped together before any module-level code.

Suggested fix (within the embedded code)
 from lfx.base.models.unified_models import (
     get_api_key_for_provider,
     get_embedding_classes,
     get_embedding_model_options,
     get_unified_models_detailed,
     update_model_options_in_build_config,
 )
-
-logger = logging.getLogger(__name__)
 from lfx.base.models.watsonx_constants import IBM_WATSONX_URLS
 from lfx.field_typing import Embeddings
 from lfx.io import (
     BoolInput,
     DictInput,
     DropdownInput,
     FloatInput,
     IntInput,
     MessageTextInput,
     ModelInput,
     SecretStrInput,
 )
+
+logger = logging.getLogger(__name__)

89139-89139: Significant code duplication between _build_kwargs and _build_kwargs_for_model.

These two methods share ~90% identical code for handling optional parameters, Watson-specific parameters, Ollama-specific parameters, and Google-specific timeout handling. The only difference is that _build_kwargs_for_model receives api_key as a parameter while _build_kwargs calls get_api_key_for_provider internally.

Consider refactoring to a single private method that accepts an optional api_key parameter, eliminating the duplication.

Suggested refactor approach
def _build_kwargs(
    self,
    model: dict[str, Any],
    metadata: dict[str, Any],
    api_key: str | None = None,
) -> dict[str, Any]:
    """Build kwargs dictionary using parameter mapping.
    
    Args:
        model: Model dict with name and provider
        metadata: Metadata containing param_mapping
        api_key: Optional API key. If not provided, will be fetched via get_api_key_for_provider.
    """
    param_mapping = metadata.get("param_mapping", {})
    if not param_mapping:
        msg = "Parameter mapping not found in metadata"
        raise ValueError(msg)

    kwargs = {}
    provider = model.get("provider")

    # Required parameters
    if "model" in param_mapping:
        kwargs[param_mapping["model"]] = model.get("name")
    elif "model_id" in param_mapping:
        kwargs[param_mapping["model_id"]] = model.get("name")

    # API key - use provided or fetch
    if "api_key" in param_mapping:
        resolved_api_key = api_key if api_key is not None else get_api_key_for_provider(
            self.user_id, provider, self.api_key
        )
        if resolved_api_key:
            kwargs[param_mapping["api_key"]] = resolved_api_key

    # ... rest of the shared logic (optional params, provider-specific handling)

Then remove _build_kwargs_for_model and update calls to pass api_key when available.

src/lfx/src/lfx/components/models_and_agents/embedding_model.py (1)

288-368: Optional: reduce duplication with _build_kwargs to avoid drift.

_build_kwargs_for_model largely mirrors _build_kwargs. A small helper or an api_key override in _build_kwargs would simplify maintenance.

src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py (1)

210-267: Strengthen available_models verification with instance checks.

Right now the test only asserts keys; adding value assertions will confirm each model maps to its dedicated instance.

✅ Suggested test tightening
         assert "text-embedding-3-small" in embeddings.available_models
         assert "text-embedding-3-large" in embeddings.available_models
         assert "text-embedding-ada-002" in embeddings.available_models
         assert len(embeddings.available_models) == 3
+        assert (
+            embeddings.available_models["text-embedding-3-small"]
+            is mock_instances["text-embedding-3-small"]
+        )
+        assert (
+            embeddings.available_models["text-embedding-3-large"]
+            is mock_instances["text-embedding-3-large"]
+        )
+        assert (
+            embeddings.available_models["text-embedding-ada-002"]
+            is mock_instances["text-embedding-ada-002"]
+        )

"title_case": false,
"type": "code",
"value": "from typing import Any\n\nfrom lfx.base.embeddings.model import LCEmbeddingsModel\nfrom lfx.base.models.unified_models import (\n get_api_key_for_provider,\n get_embedding_classes,\n get_embedding_model_options,\n update_model_options_in_build_config,\n)\nfrom lfx.base.models.watsonx_constants import IBM_WATSONX_URLS\nfrom lfx.field_typing import Embeddings\nfrom lfx.io import (\n BoolInput,\n DictInput,\n DropdownInput,\n FloatInput,\n IntInput,\n MessageTextInput,\n ModelInput,\n SecretStrInput,\n)\n\n\nclass EmbeddingModelComponent(LCEmbeddingsModel):\n display_name = \"Embedding Model\"\n description = \"Generate embeddings using a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-embedding-models\"\n icon = \"binary\"\n name = \"EmbeddingModel\"\n category = \"models\"\n\n def update_build_config(self, build_config: dict, field_value: str, field_name: str | None = None):\n \"\"\"Dynamically update build config with user-filtered model options.\"\"\"\n # Update model options\n build_config = update_model_options_in_build_config(\n component=self,\n build_config=build_config,\n cache_key_prefix=\"embedding_model_options\",\n get_options_func=get_embedding_model_options,\n field_name=field_name,\n field_value=field_value,\n )\n\n # Show/hide provider-specific fields based on selected model\n if field_name == \"model\" and isinstance(field_value, list) and len(field_value) > 0:\n selected_model = field_value[0]\n provider = selected_model.get(\"provider\", \"\")\n\n # Show/hide watsonx fields\n is_watsonx = provider == \"IBM WatsonX\"\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = is_watsonx\n build_config[\"project_id\"][\"show\"] = is_watsonx\n build_config[\"truncate_input_tokens\"][\"show\"] = is_watsonx\n build_config[\"input_text\"][\"show\"] = is_watsonx\n if is_watsonx:\n build_config[\"base_url_ibm_watsonx\"][\"required\"] = True\n build_config[\"project_id\"][\"required\"] = True\n\n return build_config\n\n inputs = [\n ModelInput(\n name=\"model\",\n display_name=\"Embedding Model\",\n info=\"Select your model provider\",\n real_time_refresh=True,\n required=True,\n model_type=\"embedding\",\n input_types=[\"Embeddings\"], # Override default to accept Embeddings instead of LanguageModel\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"API Key\",\n info=\"Model Provider API key\",\n real_time_refresh=True,\n advanced=True,\n ),\n MessageTextInput(\n name=\"api_base\",\n display_name=\"API Base URL\",\n info=\"Base URL for the API. Leave empty for default.\",\n advanced=True,\n ),\n # Watson-specific inputs\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n MessageTextInput(\n name=\"project_id\",\n display_name=\"Project ID\",\n info=\"IBM watsonx.ai Project ID (required for IBM watsonx.ai)\",\n show=False,\n ),\n IntInput(\n name=\"dimensions\",\n display_name=\"Dimensions\",\n info=\"The number of dimensions the resulting output embeddings should have. 
\"\n \"Only supported by certain models.\",\n advanced=True,\n ),\n IntInput(\n name=\"chunk_size\",\n display_name=\"Chunk Size\",\n advanced=True,\n value=1000,\n ),\n FloatInput(\n name=\"request_timeout\",\n display_name=\"Request Timeout\",\n advanced=True,\n ),\n IntInput(\n name=\"max_retries\",\n display_name=\"Max Retries\",\n advanced=True,\n value=3,\n ),\n BoolInput(\n name=\"show_progress_bar\",\n display_name=\"Show Progress Bar\",\n advanced=True,\n ),\n DictInput(\n name=\"model_kwargs\",\n display_name=\"Model Kwargs\",\n advanced=True,\n info=\"Additional keyword arguments to pass to the model.\",\n ),\n IntInput(\n name=\"truncate_input_tokens\",\n display_name=\"Truncate Input Tokens\",\n advanced=True,\n value=200,\n show=False,\n ),\n BoolInput(\n name=\"input_text\",\n display_name=\"Include the original text in the output\",\n value=True,\n advanced=True,\n show=False,\n ),\n ]\n\n def build_embeddings(self) -> Embeddings:\n \"\"\"Build and return an embeddings instance based on the selected model.\"\"\"\n # If an Embeddings object is directly connected, return it\n try:\n from langchain_core.embeddings import Embeddings as BaseEmbeddings\n\n if isinstance(self.model, BaseEmbeddings):\n return self.model\n except ImportError:\n pass\n\n # Safely extract model configuration\n if not self.model or not isinstance(self.model, list):\n msg = \"Model must be a non-empty list\"\n raise ValueError(msg)\n\n model = self.model[0]\n model_name = model.get(\"name\")\n provider = model.get(\"provider\")\n metadata = model.get(\"metadata\", {})\n\n # Get API key from user input or global variables\n api_key = get_api_key_for_provider(self.user_id, provider, self.api_key)\n\n # Validate required fields (Ollama doesn't require API key)\n if not api_key and provider != \"Ollama\":\n msg = (\n f\"{provider} API key is required. 
\"\n f\"Please provide it in the component or configure it globally as \"\n f\"{provider.upper().replace(' ', '_')}_API_KEY.\"\n )\n raise ValueError(msg)\n\n if not model_name:\n msg = \"Model name is required\"\n raise ValueError(msg)\n\n # Get embedding class\n embedding_class_name = metadata.get(\"embedding_class\")\n if not embedding_class_name:\n msg = f\"No embedding class defined in metadata for {model_name}\"\n raise ValueError(msg)\n\n embedding_class = get_embedding_classes().get(embedding_class_name)\n if not embedding_class:\n msg = f\"Unknown embedding class: {embedding_class_name}\"\n raise ValueError(msg)\n\n # Build kwargs using parameter mapping\n kwargs = self._build_kwargs(model, metadata)\n\n return embedding_class(**kwargs)\n\n def _build_kwargs(self, model: dict[str, Any], metadata: dict[str, Any]) -> dict[str, Any]:\n \"\"\"Build kwargs dictionary using parameter mapping.\"\"\"\n param_mapping = metadata.get(\"param_mapping\", {})\n if not param_mapping:\n msg = \"Parameter mapping not found in metadata\"\n raise ValueError(msg)\n\n kwargs = {}\n\n # Required parameters - handle both \"model\" and \"model_id\" (for watsonx)\n if \"model\" in param_mapping:\n kwargs[param_mapping[\"model\"]] = model.get(\"name\")\n elif \"model_id\" in param_mapping:\n kwargs[param_mapping[\"model_id\"]] = model.get(\"name\")\n if \"api_key\" in param_mapping:\n kwargs[param_mapping[\"api_key\"]] = get_api_key_for_provider(\n self.user_id,\n model.get(\"provider\"),\n self.api_key,\n )\n\n # Optional parameters with their values\n provider = model.get(\"provider\")\n optional_params = {\n \"api_base\": self.api_base if self.api_base else None,\n \"dimensions\": int(self.dimensions) if self.dimensions else None,\n \"chunk_size\": int(self.chunk_size) if self.chunk_size else None,\n \"request_timeout\": float(self.request_timeout) if self.request_timeout else None,\n \"max_retries\": int(self.max_retries) if self.max_retries else None,\n \"show_progress_bar\": self.show_progress_bar if hasattr(self, \"show_progress_bar\") else None,\n \"model_kwargs\": self.model_kwargs if self.model_kwargs else None,\n }\n\n # Watson-specific parameters\n if provider in {\"IBM WatsonX\", \"IBM watsonx.ai\"}:\n # Map base_url_ibm_watsonx to \"url\" parameter for watsonx\n if \"url\" in param_mapping:\n url_value = (\n self.base_url_ibm_watsonx\n if hasattr(self, \"base_url_ibm_watsonx\") and self.base_url_ibm_watsonx\n else \"https://us-south.ml.cloud.ibm.com\"\n )\n kwargs[param_mapping[\"url\"]] = url_value\n # Map project_id for watsonx\n if hasattr(self, \"project_id\") and self.project_id and \"project_id\" in param_mapping:\n kwargs[param_mapping[\"project_id\"]] = self.project_id\n\n # Ollama-specific parameters\n if provider == \"Ollama\" and \"base_url\" in param_mapping:\n # Map api_base to \"base_url\" parameter for Ollama\n base_url_value = self.api_base if hasattr(self, \"api_base\") and self.api_base else \"http://localhost:11434\"\n kwargs[param_mapping[\"base_url\"]] = base_url_value\n\n # Add optional parameters if they have values and are mapped\n for param_name, param_value in optional_params.items():\n if param_value is not None and param_name in param_mapping:\n # Special handling for request_timeout with Google provider\n if param_name == \"request_timeout\":\n if provider == \"Google\" and isinstance(param_value, (int, float)):\n kwargs[param_mapping[param_name]] = {\"timeout\": param_value}\n else:\n kwargs[param_mapping[param_name]] = param_value\n else:\n 
kwargs[param_mapping[param_name]] = param_value\n\n return kwargs\n"
"value": "import logging\nfrom typing import Any\n\nfrom lfx.base.embeddings.embeddings_class import EmbeddingsWithModels\nfrom lfx.base.embeddings.model import LCEmbeddingsModel\nfrom lfx.base.models.unified_models import (\n get_api_key_for_provider,\n get_embedding_classes,\n get_embedding_model_options,\n get_unified_models_detailed,\n update_model_options_in_build_config,\n)\n\nlogger = logging.getLogger(__name__)\nfrom lfx.base.models.watsonx_constants import IBM_WATSONX_URLS\nfrom lfx.field_typing import Embeddings\nfrom lfx.io import (\n BoolInput,\n DictInput,\n DropdownInput,\n FloatInput,\n IntInput,\n MessageTextInput,\n ModelInput,\n SecretStrInput,\n)\n\n\nclass EmbeddingModelComponent(LCEmbeddingsModel):\n display_name = \"Embedding Model\"\n description = \"Generate embeddings using a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-embedding-models\"\n icon = \"binary\"\n name = \"EmbeddingModel\"\n category = \"models\"\n\n def update_build_config(self, build_config: dict, field_value: str, field_name: str | None = None):\n \"\"\"Dynamically update build config with user-filtered model options.\"\"\"\n # Update model options\n build_config = update_model_options_in_build_config(\n component=self,\n build_config=build_config,\n cache_key_prefix=\"embedding_model_options\",\n get_options_func=get_embedding_model_options,\n field_name=field_name,\n field_value=field_value,\n )\n\n # Show/hide provider-specific fields based on selected model\n if field_name == \"model\" and isinstance(field_value, list) and len(field_value) > 0:\n selected_model = field_value[0]\n provider = selected_model.get(\"provider\", \"\")\n\n # Show/hide watsonx fields\n is_watsonx = provider == \"IBM WatsonX\"\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = is_watsonx\n build_config[\"project_id\"][\"show\"] = is_watsonx\n build_config[\"truncate_input_tokens\"][\"show\"] = is_watsonx\n build_config[\"input_text\"][\"show\"] = is_watsonx\n if is_watsonx:\n build_config[\"base_url_ibm_watsonx\"][\"required\"] = True\n build_config[\"project_id\"][\"required\"] = True\n\n return build_config\n\n inputs = [\n ModelInput(\n name=\"model\",\n display_name=\"Embedding Model\",\n info=\"Select your model provider\",\n real_time_refresh=True,\n required=True,\n model_type=\"embedding\",\n input_types=[\"Embeddings\"], # Override default to accept Embeddings instead of LanguageModel\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"API Key\",\n info=\"Model Provider API key\",\n real_time_refresh=True,\n advanced=True,\n ),\n MessageTextInput(\n name=\"api_base\",\n display_name=\"API Base URL\",\n info=\"Base URL for the API. Leave empty for default.\",\n advanced=True,\n ),\n # Watson-specific inputs\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n MessageTextInput(\n name=\"project_id\",\n display_name=\"Project ID\",\n info=\"IBM watsonx.ai Project ID (required for IBM watsonx.ai)\",\n show=False,\n ),\n IntInput(\n name=\"dimensions\",\n display_name=\"Dimensions\",\n info=\"The number of dimensions the resulting output embeddings should have. 
\"\n \"Only supported by certain models.\",\n advanced=True,\n ),\n IntInput(\n name=\"chunk_size\",\n display_name=\"Chunk Size\",\n advanced=True,\n value=1000,\n ),\n FloatInput(\n name=\"request_timeout\",\n display_name=\"Request Timeout\",\n advanced=True,\n ),\n IntInput(\n name=\"max_retries\",\n display_name=\"Max Retries\",\n advanced=True,\n value=3,\n ),\n BoolInput(\n name=\"show_progress_bar\",\n display_name=\"Show Progress Bar\",\n advanced=True,\n ),\n DictInput(\n name=\"model_kwargs\",\n display_name=\"Model Kwargs\",\n advanced=True,\n info=\"Additional keyword arguments to pass to the model.\",\n ),\n IntInput(\n name=\"truncate_input_tokens\",\n display_name=\"Truncate Input Tokens\",\n advanced=True,\n value=200,\n show=False,\n ),\n BoolInput(\n name=\"input_text\",\n display_name=\"Include the original text in the output\",\n value=True,\n advanced=True,\n show=False,\n ),\n ]\n\n def build_embeddings(self) -> Embeddings:\n \"\"\"Build and return an embeddings instance based on the selected model.\n\n Returns an EmbeddingsWithModels wrapper that contains:\n - The primary embedding instance (for the selected model)\n - available_models dict mapping all available model names to their instances\n \"\"\"\n # If an Embeddings object is directly connected, return it\n try:\n from langchain_core.embeddings import Embeddings as BaseEmbeddings\n\n if isinstance(self.model, BaseEmbeddings):\n return self.model\n except ImportError:\n pass\n\n # Safely extract model configuration\n if not self.model or not isinstance(self.model, list):\n msg = \"Model must be a non-empty list\"\n raise ValueError(msg)\n\n model = self.model[0]\n model_name = model.get(\"name\")\n provider = model.get(\"provider\")\n metadata = model.get(\"metadata\", {})\n\n # Get API key from user input or global variables\n api_key = get_api_key_for_provider(self.user_id, provider, self.api_key)\n\n # Validate required fields (Ollama doesn't require API key)\n if not api_key and provider != \"Ollama\":\n msg = (\n f\"{provider} API key is required. 
\"\n f\"Please provide it in the component or configure it globally as \"\n f\"{provider.upper().replace(' ', '_')}_API_KEY.\"\n )\n raise ValueError(msg)\n\n if not model_name:\n msg = \"Model name is required\"\n raise ValueError(msg)\n\n # Get embedding class\n embedding_class_name = metadata.get(\"embedding_class\")\n if not embedding_class_name:\n msg = f\"No embedding class defined in metadata for {model_name}\"\n raise ValueError(msg)\n\n embedding_class = get_embedding_classes().get(embedding_class_name)\n if not embedding_class:\n msg = f\"Unknown embedding class: {embedding_class_name}\"\n raise ValueError(msg)\n\n # Build kwargs using parameter mapping for primary instance\n kwargs = self._build_kwargs(model, metadata)\n primary_instance = embedding_class(**kwargs)\n\n # Get all available embedding models for this provider\n available_models_dict = self._build_available_models(\n provider=provider,\n embedding_class=embedding_class,\n metadata=metadata,\n api_key=api_key,\n )\n\n # Wrap with EmbeddingsWithModels to provide available_models metadata\n return EmbeddingsWithModels(\n embeddings=primary_instance,\n available_models=available_models_dict,\n )\n\n def _build_available_models(\n self,\n provider: str,\n embedding_class: type,\n metadata: dict[str, Any],\n api_key: str | None,\n ) -> dict[str, Embeddings]:\n \"\"\"Build a dictionary of all available embedding model instances for the provider.\n\n Args:\n provider: The provider name (e.g., \"OpenAI\", \"Ollama\")\n embedding_class: The embedding class to instantiate\n metadata: Metadata containing param_mapping\n api_key: The API key for the provider\n\n Returns:\n Dict mapping model names to their embedding instances\n \"\"\"\n available_models_dict: dict[str, Embeddings] = {}\n\n # Get all embedding models for this provider from unified models\n all_embedding_models = get_unified_models_detailed(\n providers=[provider],\n model_type=\"embeddings\",\n include_deprecated=False,\n include_unsupported=False,\n )\n\n if not all_embedding_models:\n return available_models_dict\n\n # Extract models from the provider data\n for provider_data in all_embedding_models:\n if provider_data.get(\"provider\") != provider:\n continue\n\n for model_data in provider_data.get(\"models\", []):\n model_name = model_data.get(\"model_name\")\n if not model_name:\n continue\n\n # Create a model dict compatible with _build_kwargs\n model_dict = {\n \"name\": model_name,\n \"provider\": provider,\n \"metadata\": metadata, # Reuse the same metadata/param_mapping\n }\n\n try:\n # Build kwargs for this model\n model_kwargs = self._build_kwargs_for_model(model_dict, metadata, api_key)\n # Create the embedding instance\n available_models_dict[model_name] = embedding_class(**model_kwargs)\n except Exception: # noqa: BLE001\n # Skip models that fail to instantiate\n # This handles cases where specific models have incompatible parameters\n logger.debug(\"Failed to instantiate embedding model %s: skipping\", model_name, exc_info=True)\n continue\n\n return available_models_dict\n\n def _build_kwargs_for_model(\n self,\n model: dict[str, Any],\n metadata: dict[str, Any],\n api_key: str | None,\n ) -> dict[str, Any]:\n \"\"\"Build kwargs dictionary for a specific model using parameter mapping.\n\n This is similar to _build_kwargs but uses the provided api_key directly\n instead of looking it up again.\n\n Args:\n model: Model dict with name and provider\n metadata: Metadata containing param_mapping\n api_key: The API key to use\n\n Returns:\n kwargs dict 
for embedding class instantiation\n \"\"\"\n param_mapping = metadata.get(\"param_mapping\", {})\n if not param_mapping:\n msg = \"Parameter mapping not found in metadata\"\n raise ValueError(msg)\n\n kwargs = {}\n provider = model.get(\"provider\")\n\n # Required parameters - handle both \"model\" and \"model_id\" (for watsonx)\n if \"model\" in param_mapping:\n kwargs[param_mapping[\"model\"]] = model.get(\"name\")\n elif \"model_id\" in param_mapping:\n kwargs[param_mapping[\"model_id\"]] = model.get(\"name\")\n\n # Add API key if mapped\n if \"api_key\" in param_mapping and api_key:\n kwargs[param_mapping[\"api_key\"]] = api_key\n\n # Optional parameters with their values\n optional_params = {\n \"api_base\": self.api_base if self.api_base else None,\n \"dimensions\": int(self.dimensions) if self.dimensions else None,\n \"chunk_size\": int(self.chunk_size) if self.chunk_size else None,\n \"request_timeout\": float(self.request_timeout) if self.request_timeout else None,\n \"max_retries\": int(self.max_retries) if self.max_retries else None,\n \"show_progress_bar\": self.show_progress_bar if hasattr(self, \"show_progress_bar\") else None,\n \"model_kwargs\": self.model_kwargs if self.model_kwargs else None,\n }\n\n # Watson-specific parameters\n if provider in {\"IBM WatsonX\", \"IBM watsonx.ai\"}:\n # Map base_url_ibm_watsonx to \"url\" parameter for watsonx\n if \"url\" in param_mapping:\n url_value = (\n self.base_url_ibm_watsonx\n if hasattr(self, \"base_url_ibm_watsonx\") and self.base_url_ibm_watsonx\n else \"https://us-south.ml.cloud.ibm.com\"\n )\n kwargs[param_mapping[\"url\"]] = url_value\n # Map project_id for watsonx\n if hasattr(self, \"project_id\") and self.project_id and \"project_id\" in param_mapping:\n kwargs[param_mapping[\"project_id\"]] = self.project_id\n\n # Ollama-specific parameters\n if provider == \"Ollama\" and \"base_url\" in param_mapping:\n # Map api_base to \"base_url\" parameter for Ollama\n base_url_value = self.api_base if hasattr(self, \"api_base\") and self.api_base else \"http://localhost:11434\"\n kwargs[param_mapping[\"base_url\"]] = base_url_value\n\n # Add optional parameters if they have values and are mapped\n for param_name, param_value in optional_params.items():\n if param_value is not None and param_name in param_mapping:\n # Special handling for request_timeout with Google provider\n if param_name == \"request_timeout\":\n if provider == \"Google Generative AI\" and isinstance(param_value, (int, float)):\n kwargs[param_mapping[param_name]] = {\"timeout\": param_value}\n else:\n kwargs[param_mapping[param_name]] = param_value\n else:\n kwargs[param_mapping[param_name]] = param_value\n\n return kwargs\n\n def _build_kwargs(self, model: dict[str, Any], metadata: dict[str, Any]) -> dict[str, Any]:\n \"\"\"Build kwargs dictionary using parameter mapping.\"\"\"\n param_mapping = metadata.get(\"param_mapping\", {})\n if not param_mapping:\n msg = \"Parameter mapping not found in metadata\"\n raise ValueError(msg)\n\n kwargs = {}\n\n # Required parameters - handle both \"model\" and \"model_id\" (for watsonx)\n if \"model\" in param_mapping:\n kwargs[param_mapping[\"model\"]] = model.get(\"name\")\n elif \"model_id\" in param_mapping:\n kwargs[param_mapping[\"model_id\"]] = model.get(\"name\")\n if \"api_key\" in param_mapping:\n kwargs[param_mapping[\"api_key\"]] = get_api_key_for_provider(\n self.user_id,\n model.get(\"provider\"),\n self.api_key,\n )\n\n # Optional parameters with their values\n provider = model.get(\"provider\")\n 
optional_params = {\n \"api_base\": self.api_base if self.api_base else None,\n \"dimensions\": int(self.dimensions) if self.dimensions else None,\n \"chunk_size\": int(self.chunk_size) if self.chunk_size else None,\n \"request_timeout\": float(self.request_timeout) if self.request_timeout else None,\n \"max_retries\": int(self.max_retries) if self.max_retries else None,\n \"show_progress_bar\": self.show_progress_bar if hasattr(self, \"show_progress_bar\") else None,\n \"model_kwargs\": self.model_kwargs if self.model_kwargs else None,\n }\n\n # Watson-specific parameters\n if provider in {\"IBM WatsonX\", \"IBM watsonx.ai\"}:\n # Map base_url_ibm_watsonx to \"url\" parameter for watsonx\n if \"url\" in param_mapping:\n url_value = (\n self.base_url_ibm_watsonx\n if hasattr(self, \"base_url_ibm_watsonx\") and self.base_url_ibm_watsonx\n else \"https://us-south.ml.cloud.ibm.com\"\n )\n kwargs[param_mapping[\"url\"]] = url_value\n # Map project_id for watsonx\n if hasattr(self, \"project_id\") and self.project_id and \"project_id\" in param_mapping:\n kwargs[param_mapping[\"project_id\"]] = self.project_id\n\n # Ollama-specific parameters\n if provider == \"Ollama\" and \"base_url\" in param_mapping:\n # Map api_base to \"base_url\" parameter for Ollama\n base_url_value = self.api_base if hasattr(self, \"api_base\") and self.api_base else \"http://localhost:11434\"\n kwargs[param_mapping[\"base_url\"]] = base_url_value\n\n # Add optional parameters if they have values and are mapped\n for param_name, param_value in optional_params.items():\n if param_value is not None and param_name in param_mapping:\n # Special handling for request_timeout with Google provider\n if param_name == \"request_timeout\":\n if provider == \"Google Generative AI\" and isinstance(param_value, (int, float)):\n kwargs[param_mapping[param_name]] = {\"timeout\": param_value}\n else:\n kwargs[param_mapping[param_name]] = param_value\n else:\n kwargs[param_mapping[param_name]] = param_value\n\n return kwargs\n"
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's search for watsonx-related code and parameter mappings
rg "truncate_input_tokens|watsonx" --type json --type py -i -A 3 -B 3 | head -100

Repository: langflow-ai/langflow

Length of output: 9659


🏁 Script executed:

# Search for embedding class definitions and parameter mappings
rg "param_mapping" --type json --type py -l | head -20

Repository: langflow-ai/langflow

Length of output: 379


🏁 Script executed:

# Look for IBM WatsonX embedding class references
rg "IBMWatsonX|WatsonxEmbeddings" --type py -A 5 -B 2 | head -80

Repository: langflow-ai/langflow

Length of output: 6375


🏁 Script executed:

# Let's look at the param_mappings for IBM WatsonX in unified_models.py
rg "IBM WatsonX|param_mappings" src/lfx/src/lfx/base/models/unified_models.py -A 30 | head -150

Repository: langflow-ai/langflow

Length of output: 4934


🏁 Script executed:

# Also let's check the actual param_mapping for IBM WatsonX
rg "IBM WatsonX" src/lfx/src/lfx/base/models/unified_models.py -A 50 -B 5 | head -200

Repository: langflow-ai/langflow

Length of output: 7052


🏁 Script executed:

# Let's look at the entire WatsonxEmbeddings component to see how it uses parameters
cat src/lfx/src/lfx/components/ibm/watsonx_embeddings.py

Repository: langflow-ai/langflow

Length of output: 4887


🏁 Script executed:

# Search for WATSONX embedding models configuration with param_mapping
rg "WATSONX.*EMBEDDING|param_mapping.*IBM|truncate_input" src/lfx/src/lfx/base/models/unified_models.py -B 5 -A 15 | head -200

Repository: langflow-ai/langflow

Length of output: 46


🏁 Script executed:

# Look for the detailed embedding models definitions
fd "unified_models.py" -exec grep -n "WATSONX_EMBEDDING" {} +

Repository: langflow-ai/langflow

Length of output: 294


🏁 Script executed:

# Search for all embedding model detailed definitions in unified_models.py
rg "EMBEDDING_MODELS_DETAILED|param_mapping" src/lfx/src/lfx/base/models/unified_models.py -B 3 -A 20 | head -300

Repository: langflow-ai/langflow

Length of output: 7821


🏁 Script executed:

# Get the full param_mappings dictionary, specifically looking for IBM WatsonX
rg "IBM WatsonX.*{|\"IBM WatsonX\":" src/lfx/src/lfx/base/models/unified_models.py -A 20 | grep -A 20 "param_mappings"

Repository: langflow-ai/langflow

Length of output: 189


🏁 Script executed:

# Let's view the watsonx_constants file directly to see the embedding models config
cat src/lfx/src/lfx/base/models/watsonx_constants.py

Repository: langflow-ai/langflow

Length of output: 2225


🏁 Script executed:

# Search for param_mappings dictionary definition, especially IBM WatsonX section
rg -A 150 "param_mappings = \{" src/lfx/src/lfx/base/models/unified_models.py | head -200

Repository: langflow-ai/langflow

Length of output: 5559


🏁 Script executed:

# Let's check if there's a specific IBM WatsonX embedding param_mapping
rg "IBM WatsonX" src/lfx/src/lfx/base/models/unified_models.py -B 5 -A 15 | grep -A 10 "param_mappings"

Repository: langflow-ai/langflow

Length of output: 468


🏁 Script executed:

# Let's verify the exact flow - check how optional_params are actually used in the kwargs building
rg -A 20 "for param_name, param_value in optional_params" src/lfx/src/lfx/components/models_and_agents/embedding_model.py

Repository: langflow-ai/langflow

Length of output: 1751


🏁 Script executed:

# Let's also search for how WatsonxEmbeddings expects these parameters - check langchain_ibm documentation or usage
web_search WatsonxEmbeddings truncate_input_tokens input_text params

Repository: langflow-ai/langflow

Length of output: 114


🌐 Web query:

WatsonxEmbeddings langchain_ibm truncate_input_tokens input_text parameters

💡 Result:

Short answer: In the langchain_ibm WatsonxEmbeddings integration you pass IBM embedding model parameters via the IBM SDK meta-names. TRUNCATE_INPUT_TOKENS (truncate_input_tokens) tells the service how many tokens to keep from the input (useful to trim long texts before embedding). RETURN_OPTIONS with {"input_text": True} returns the original input text in the API response. Example (Python):

from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames

embed_params = {
    EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: 3,  # or set truncate_input_tokens=512 in other wrappers
    EmbedTextParamsMetaNames.RETURN_OPTIONS: {"input_text": True},
}

then pass embed_params when constructing/using WatsonxEmbeddings.

References: LangChain docs showing TRUNCATE_INPUT_TOKENS and RETURN_OPTIONS usage and LangChain/IBM wrappers that expose truncate_input_tokens parameter. [1][2]

Sources
[1] LangChain WatsonxEmbeddings docs (example using EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS and RETURN_OPTIONS).
[2] LangChain / langchain-ibm / langchain-community reference pages (truncateInputTokens / truncate_input_tokens parameter in JS/Python wrappers).


🏁 Script executed:

# Let's check the test file to see how WatsonX embedding params are expected
cat src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py | head -100

Repository: langflow-ai/langflow

Length of output: 4137


WatsonX-specific input parameters are not wired to the embeddings instance.

truncate_input_tokens and input_text are conditionally shown for IBM WatsonX, but they're never passed to WatsonxEmbeddings. The param_mapping for IBM WatsonX omits these parameters, so they're filtered out by the if param_value is not None and param_name in param_mapping check. Additionally, WatsonX requires special handling: these parameters must be passed via a params dict using IBM SDK meta-names (EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS and EmbedTextParamsMetaNames.RETURN_OPTIONS), not as direct kwargs like other providers. Update the param_mapping and add WatsonX-specific logic to construct the params dict correctly in both _build_kwargs and _build_kwargs_for_model.
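A hedged sketch of the wiring this suggests, using the SDK meta-names confirmed by the web query above; the helper name and the "params" kwarg key are assumptions, not the repository's implementation:

```python
# Hypothetical helper packing truncate_input_tokens and input_text into the
# params payload WatsonxEmbeddings expects via the IBM SDK meta-names.
from typing import Any

from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames


def build_watsonx_embed_params(truncate_input_tokens: int | None, input_text: bool | None) -> dict[str, Any]:
    params: dict[str, Any] = {}
    if truncate_input_tokens:
        params[EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS] = int(truncate_input_tokens)
    if input_text is not None:
        params[EmbedTextParamsMetaNames.RETURN_OPTIONS] = {"input_text": bool(input_text)}
    return params


# Inside _build_kwargs / _build_kwargs_for_model, for the WatsonX provider
# (the "params" mapping key is an assumed addition to param_mapping):
# kwargs[param_mapping.get("params", "params")] = build_watsonx_embed_params(
#     self.truncate_input_tokens, self.input_text
# )
```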

🤖 Prompt for AI Agents
In `@src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json`
at line 1821: The WatsonX-specific inputs truncate_input_tokens and input_text
are not being forwarded to the Watsonx embeddings because they aren’t present in
param_mapping and must be passed inside a special params dict using IBM SDK
meta-names; update the IBM watsonx param_mapping to include a mapping for a
params/metadata key (e.g., "params") and then in both _build_kwargs and
_build_kwargs_for_model (and where provider in {"IBM WatsonX","IBM watsonx.ai"}
is checked) construct a params_dict that sets
EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS from self.truncate_input_tokens
and EmbedTextParamsMetaNames.RETURN_OPTIONS based on self.input_text, then
attach that params_dict to kwargs using the param_mapping entry (or fallback to
kwargs["params"]) so WatsonxEmbeddings receives the proper params payload.
Ensure you reference EmbedTextParamsMetaNames, truncate_input_tokens,
input_text, _build_kwargs, and _build_kwargs_for_model in the change.

"title_case": false,
"type": "code",
"value": "from typing import Any\n\nfrom lfx.base.embeddings.model import LCEmbeddingsModel\nfrom lfx.base.models.unified_models import (\n get_api_key_for_provider,\n get_embedding_classes,\n get_embedding_model_options,\n update_model_options_in_build_config,\n)\nfrom lfx.base.models.watsonx_constants import IBM_WATSONX_URLS\nfrom lfx.field_typing import Embeddings\nfrom lfx.io import (\n BoolInput,\n DictInput,\n DropdownInput,\n FloatInput,\n IntInput,\n MessageTextInput,\n ModelInput,\n SecretStrInput,\n)\n\n\nclass EmbeddingModelComponent(LCEmbeddingsModel):\n display_name = \"Embedding Model\"\n description = \"Generate embeddings using a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-embedding-models\"\n icon = \"binary\"\n name = \"EmbeddingModel\"\n category = \"models\"\n\n def update_build_config(self, build_config: dict, field_value: str, field_name: str | None = None):\n \"\"\"Dynamically update build config with user-filtered model options.\"\"\"\n # Update model options\n build_config = update_model_options_in_build_config(\n component=self,\n build_config=build_config,\n cache_key_prefix=\"embedding_model_options\",\n get_options_func=get_embedding_model_options,\n field_name=field_name,\n field_value=field_value,\n )\n\n # Show/hide provider-specific fields based on selected model\n if field_name == \"model\" and isinstance(field_value, list) and len(field_value) > 0:\n selected_model = field_value[0]\n provider = selected_model.get(\"provider\", \"\")\n\n # Show/hide watsonx fields\n is_watsonx = provider == \"IBM WatsonX\"\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = is_watsonx\n build_config[\"project_id\"][\"show\"] = is_watsonx\n build_config[\"truncate_input_tokens\"][\"show\"] = is_watsonx\n build_config[\"input_text\"][\"show\"] = is_watsonx\n if is_watsonx:\n build_config[\"base_url_ibm_watsonx\"][\"required\"] = True\n build_config[\"project_id\"][\"required\"] = True\n\n return build_config\n\n inputs = [\n ModelInput(\n name=\"model\",\n display_name=\"Embedding Model\",\n info=\"Select your model provider\",\n real_time_refresh=True,\n required=True,\n model_type=\"embedding\",\n input_types=[\"Embeddings\"], # Override default to accept Embeddings instead of LanguageModel\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"API Key\",\n info=\"Model Provider API key\",\n real_time_refresh=True,\n advanced=True,\n ),\n MessageTextInput(\n name=\"api_base\",\n display_name=\"API Base URL\",\n info=\"Base URL for the API. Leave empty for default.\",\n advanced=True,\n ),\n # Watson-specific inputs\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n MessageTextInput(\n name=\"project_id\",\n display_name=\"Project ID\",\n info=\"IBM watsonx.ai Project ID (required for IBM watsonx.ai)\",\n show=False,\n ),\n IntInput(\n name=\"dimensions\",\n display_name=\"Dimensions\",\n info=\"The number of dimensions the resulting output embeddings should have. 
\"\n \"Only supported by certain models.\",\n advanced=True,\n ),\n IntInput(\n name=\"chunk_size\",\n display_name=\"Chunk Size\",\n advanced=True,\n value=1000,\n ),\n FloatInput(\n name=\"request_timeout\",\n display_name=\"Request Timeout\",\n advanced=True,\n ),\n IntInput(\n name=\"max_retries\",\n display_name=\"Max Retries\",\n advanced=True,\n value=3,\n ),\n BoolInput(\n name=\"show_progress_bar\",\n display_name=\"Show Progress Bar\",\n advanced=True,\n ),\n DictInput(\n name=\"model_kwargs\",\n display_name=\"Model Kwargs\",\n advanced=True,\n info=\"Additional keyword arguments to pass to the model.\",\n ),\n IntInput(\n name=\"truncate_input_tokens\",\n display_name=\"Truncate Input Tokens\",\n advanced=True,\n value=200,\n show=False,\n ),\n BoolInput(\n name=\"input_text\",\n display_name=\"Include the original text in the output\",\n value=True,\n advanced=True,\n show=False,\n ),\n ]\n\n def build_embeddings(self) -> Embeddings:\n \"\"\"Build and return an embeddings instance based on the selected model.\"\"\"\n # If an Embeddings object is directly connected, return it\n try:\n from langchain_core.embeddings import Embeddings as BaseEmbeddings\n\n if isinstance(self.model, BaseEmbeddings):\n return self.model\n except ImportError:\n pass\n\n # Safely extract model configuration\n if not self.model or not isinstance(self.model, list):\n msg = \"Model must be a non-empty list\"\n raise ValueError(msg)\n\n model = self.model[0]\n model_name = model.get(\"name\")\n provider = model.get(\"provider\")\n metadata = model.get(\"metadata\", {})\n\n # Get API key from user input or global variables\n api_key = get_api_key_for_provider(self.user_id, provider, self.api_key)\n\n # Validate required fields (Ollama doesn't require API key)\n if not api_key and provider != \"Ollama\":\n msg = (\n f\"{provider} API key is required. 
\"\n f\"Please provide it in the component or configure it globally as \"\n f\"{provider.upper().replace(' ', '_')}_API_KEY.\"\n )\n raise ValueError(msg)\n\n if not model_name:\n msg = \"Model name is required\"\n raise ValueError(msg)\n\n # Get embedding class\n embedding_class_name = metadata.get(\"embedding_class\")\n if not embedding_class_name:\n msg = f\"No embedding class defined in metadata for {model_name}\"\n raise ValueError(msg)\n\n embedding_class = get_embedding_classes().get(embedding_class_name)\n if not embedding_class:\n msg = f\"Unknown embedding class: {embedding_class_name}\"\n raise ValueError(msg)\n\n # Build kwargs using parameter mapping\n kwargs = self._build_kwargs(model, metadata)\n\n return embedding_class(**kwargs)\n\n def _build_kwargs(self, model: dict[str, Any], metadata: dict[str, Any]) -> dict[str, Any]:\n \"\"\"Build kwargs dictionary using parameter mapping.\"\"\"\n param_mapping = metadata.get(\"param_mapping\", {})\n if not param_mapping:\n msg = \"Parameter mapping not found in metadata\"\n raise ValueError(msg)\n\n kwargs = {}\n\n # Required parameters - handle both \"model\" and \"model_id\" (for watsonx)\n if \"model\" in param_mapping:\n kwargs[param_mapping[\"model\"]] = model.get(\"name\")\n elif \"model_id\" in param_mapping:\n kwargs[param_mapping[\"model_id\"]] = model.get(\"name\")\n if \"api_key\" in param_mapping:\n kwargs[param_mapping[\"api_key\"]] = get_api_key_for_provider(\n self.user_id,\n model.get(\"provider\"),\n self.api_key,\n )\n\n # Optional parameters with their values\n provider = model.get(\"provider\")\n optional_params = {\n \"api_base\": self.api_base if self.api_base else None,\n \"dimensions\": int(self.dimensions) if self.dimensions else None,\n \"chunk_size\": int(self.chunk_size) if self.chunk_size else None,\n \"request_timeout\": float(self.request_timeout) if self.request_timeout else None,\n \"max_retries\": int(self.max_retries) if self.max_retries else None,\n \"show_progress_bar\": self.show_progress_bar if hasattr(self, \"show_progress_bar\") else None,\n \"model_kwargs\": self.model_kwargs if self.model_kwargs else None,\n }\n\n # Watson-specific parameters\n if provider in {\"IBM WatsonX\", \"IBM watsonx.ai\"}:\n # Map base_url_ibm_watsonx to \"url\" parameter for watsonx\n if \"url\" in param_mapping:\n url_value = (\n self.base_url_ibm_watsonx\n if hasattr(self, \"base_url_ibm_watsonx\") and self.base_url_ibm_watsonx\n else \"https://us-south.ml.cloud.ibm.com\"\n )\n kwargs[param_mapping[\"url\"]] = url_value\n # Map project_id for watsonx\n if hasattr(self, \"project_id\") and self.project_id and \"project_id\" in param_mapping:\n kwargs[param_mapping[\"project_id\"]] = self.project_id\n\n # Ollama-specific parameters\n if provider == \"Ollama\" and \"base_url\" in param_mapping:\n # Map api_base to \"base_url\" parameter for Ollama\n base_url_value = self.api_base if hasattr(self, \"api_base\") and self.api_base else \"http://localhost:11434\"\n kwargs[param_mapping[\"base_url\"]] = base_url_value\n\n # Add optional parameters if they have values and are mapped\n for param_name, param_value in optional_params.items():\n if param_value is not None and param_name in param_mapping:\n # Special handling for request_timeout with Google provider\n if param_name == \"request_timeout\":\n if provider == \"Google\" and isinstance(param_value, (int, float)):\n kwargs[param_mapping[param_name]] = {\"timeout\": param_value}\n else:\n kwargs[param_mapping[param_name]] = param_value\n else:\n 
kwargs[param_mapping[param_name]] = param_value\n\n return kwargs\n"
"value": "import logging\nfrom typing import Any\n\nfrom lfx.base.embeddings.embeddings_class import EmbeddingsWithModels\nfrom lfx.base.embeddings.model import LCEmbeddingsModel\nfrom lfx.base.models.unified_models import (\n get_api_key_for_provider,\n get_embedding_classes,\n get_embedding_model_options,\n get_unified_models_detailed,\n update_model_options_in_build_config,\n)\n\nlogger = logging.getLogger(__name__)\nfrom lfx.base.models.watsonx_constants import IBM_WATSONX_URLS\nfrom lfx.field_typing import Embeddings\nfrom lfx.io import (\n BoolInput,\n DictInput,\n DropdownInput,\n FloatInput,\n IntInput,\n MessageTextInput,\n ModelInput,\n SecretStrInput,\n)\n\n\nclass EmbeddingModelComponent(LCEmbeddingsModel):\n display_name = \"Embedding Model\"\n description = \"Generate embeddings using a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-embedding-models\"\n icon = \"binary\"\n name = \"EmbeddingModel\"\n category = \"models\"\n\n def update_build_config(self, build_config: dict, field_value: str, field_name: str | None = None):\n \"\"\"Dynamically update build config with user-filtered model options.\"\"\"\n # Update model options\n build_config = update_model_options_in_build_config(\n component=self,\n build_config=build_config,\n cache_key_prefix=\"embedding_model_options\",\n get_options_func=get_embedding_model_options,\n field_name=field_name,\n field_value=field_value,\n )\n\n # Show/hide provider-specific fields based on selected model\n if field_name == \"model\" and isinstance(field_value, list) and len(field_value) > 0:\n selected_model = field_value[0]\n provider = selected_model.get(\"provider\", \"\")\n\n # Show/hide watsonx fields\n is_watsonx = provider == \"IBM WatsonX\"\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = is_watsonx\n build_config[\"project_id\"][\"show\"] = is_watsonx\n build_config[\"truncate_input_tokens\"][\"show\"] = is_watsonx\n build_config[\"input_text\"][\"show\"] = is_watsonx\n if is_watsonx:\n build_config[\"base_url_ibm_watsonx\"][\"required\"] = True\n build_config[\"project_id\"][\"required\"] = True\n\n return build_config\n\n inputs = [\n ModelInput(\n name=\"model\",\n display_name=\"Embedding Model\",\n info=\"Select your model provider\",\n real_time_refresh=True,\n required=True,\n model_type=\"embedding\",\n input_types=[\"Embeddings\"], # Override default to accept Embeddings instead of LanguageModel\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"API Key\",\n info=\"Model Provider API key\",\n real_time_refresh=True,\n advanced=True,\n ),\n MessageTextInput(\n name=\"api_base\",\n display_name=\"API Base URL\",\n info=\"Base URL for the API. Leave empty for default.\",\n advanced=True,\n ),\n # Watson-specific inputs\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n MessageTextInput(\n name=\"project_id\",\n display_name=\"Project ID\",\n info=\"IBM watsonx.ai Project ID (required for IBM watsonx.ai)\",\n show=False,\n ),\n IntInput(\n name=\"dimensions\",\n display_name=\"Dimensions\",\n info=\"The number of dimensions the resulting output embeddings should have. 
\"\n \"Only supported by certain models.\",\n advanced=True,\n ),\n IntInput(\n name=\"chunk_size\",\n display_name=\"Chunk Size\",\n advanced=True,\n value=1000,\n ),\n FloatInput(\n name=\"request_timeout\",\n display_name=\"Request Timeout\",\n advanced=True,\n ),\n IntInput(\n name=\"max_retries\",\n display_name=\"Max Retries\",\n advanced=True,\n value=3,\n ),\n BoolInput(\n name=\"show_progress_bar\",\n display_name=\"Show Progress Bar\",\n advanced=True,\n ),\n DictInput(\n name=\"model_kwargs\",\n display_name=\"Model Kwargs\",\n advanced=True,\n info=\"Additional keyword arguments to pass to the model.\",\n ),\n IntInput(\n name=\"truncate_input_tokens\",\n display_name=\"Truncate Input Tokens\",\n advanced=True,\n value=200,\n show=False,\n ),\n BoolInput(\n name=\"input_text\",\n display_name=\"Include the original text in the output\",\n value=True,\n advanced=True,\n show=False,\n ),\n ]\n\n def build_embeddings(self) -> Embeddings:\n \"\"\"Build and return an embeddings instance based on the selected model.\n\n Returns an EmbeddingsWithModels wrapper that contains:\n - The primary embedding instance (for the selected model)\n - available_models dict mapping all available model names to their instances\n \"\"\"\n # If an Embeddings object is directly connected, return it\n try:\n from langchain_core.embeddings import Embeddings as BaseEmbeddings\n\n if isinstance(self.model, BaseEmbeddings):\n return self.model\n except ImportError:\n pass\n\n # Safely extract model configuration\n if not self.model or not isinstance(self.model, list):\n msg = \"Model must be a non-empty list\"\n raise ValueError(msg)\n\n model = self.model[0]\n model_name = model.get(\"name\")\n provider = model.get(\"provider\")\n metadata = model.get(\"metadata\", {})\n\n # Get API key from user input or global variables\n api_key = get_api_key_for_provider(self.user_id, provider, self.api_key)\n\n # Validate required fields (Ollama doesn't require API key)\n if not api_key and provider != \"Ollama\":\n msg = (\n f\"{provider} API key is required. 
\"\n f\"Please provide it in the component or configure it globally as \"\n f\"{provider.upper().replace(' ', '_')}_API_KEY.\"\n )\n raise ValueError(msg)\n\n if not model_name:\n msg = \"Model name is required\"\n raise ValueError(msg)\n\n # Get embedding class\n embedding_class_name = metadata.get(\"embedding_class\")\n if not embedding_class_name:\n msg = f\"No embedding class defined in metadata for {model_name}\"\n raise ValueError(msg)\n\n embedding_class = get_embedding_classes().get(embedding_class_name)\n if not embedding_class:\n msg = f\"Unknown embedding class: {embedding_class_name}\"\n raise ValueError(msg)\n\n # Build kwargs using parameter mapping for primary instance\n kwargs = self._build_kwargs(model, metadata)\n primary_instance = embedding_class(**kwargs)\n\n # Get all available embedding models for this provider\n available_models_dict = self._build_available_models(\n provider=provider,\n embedding_class=embedding_class,\n metadata=metadata,\n api_key=api_key,\n )\n\n # Wrap with EmbeddingsWithModels to provide available_models metadata\n return EmbeddingsWithModels(\n embeddings=primary_instance,\n available_models=available_models_dict,\n )\n\n def _build_available_models(\n self,\n provider: str,\n embedding_class: type,\n metadata: dict[str, Any],\n api_key: str | None,\n ) -> dict[str, Embeddings]:\n \"\"\"Build a dictionary of all available embedding model instances for the provider.\n\n Args:\n provider: The provider name (e.g., \"OpenAI\", \"Ollama\")\n embedding_class: The embedding class to instantiate\n metadata: Metadata containing param_mapping\n api_key: The API key for the provider\n\n Returns:\n Dict mapping model names to their embedding instances\n \"\"\"\n available_models_dict: dict[str, Embeddings] = {}\n\n # Get all embedding models for this provider from unified models\n all_embedding_models = get_unified_models_detailed(\n providers=[provider],\n model_type=\"embeddings\",\n include_deprecated=False,\n include_unsupported=False,\n )\n\n if not all_embedding_models:\n return available_models_dict\n\n # Extract models from the provider data\n for provider_data in all_embedding_models:\n if provider_data.get(\"provider\") != provider:\n continue\n\n for model_data in provider_data.get(\"models\", []):\n model_name = model_data.get(\"model_name\")\n if not model_name:\n continue\n\n # Create a model dict compatible with _build_kwargs\n model_dict = {\n \"name\": model_name,\n \"provider\": provider,\n \"metadata\": metadata, # Reuse the same metadata/param_mapping\n }\n\n try:\n # Build kwargs for this model\n model_kwargs = self._build_kwargs_for_model(model_dict, metadata, api_key)\n # Create the embedding instance\n available_models_dict[model_name] = embedding_class(**model_kwargs)\n except Exception: # noqa: BLE001\n # Skip models that fail to instantiate\n # This handles cases where specific models have incompatible parameters\n logger.debug(\"Failed to instantiate embedding model %s: skipping\", model_name, exc_info=True)\n continue\n\n return available_models_dict\n\n def _build_kwargs_for_model(\n self,\n model: dict[str, Any],\n metadata: dict[str, Any],\n api_key: str | None,\n ) -> dict[str, Any]:\n \"\"\"Build kwargs dictionary for a specific model using parameter mapping.\n\n This is similar to _build_kwargs but uses the provided api_key directly\n instead of looking it up again.\n\n Args:\n model: Model dict with name and provider\n metadata: Metadata containing param_mapping\n api_key: The API key to use\n\n Returns:\n kwargs dict 
for embedding class instantiation\n \"\"\"\n param_mapping = metadata.get(\"param_mapping\", {})\n if not param_mapping:\n msg = \"Parameter mapping not found in metadata\"\n raise ValueError(msg)\n\n kwargs = {}\n provider = model.get(\"provider\")\n\n # Required parameters - handle both \"model\" and \"model_id\" (for watsonx)\n if \"model\" in param_mapping:\n kwargs[param_mapping[\"model\"]] = model.get(\"name\")\n elif \"model_id\" in param_mapping:\n kwargs[param_mapping[\"model_id\"]] = model.get(\"name\")\n\n # Add API key if mapped\n if \"api_key\" in param_mapping and api_key:\n kwargs[param_mapping[\"api_key\"]] = api_key\n\n # Optional parameters with their values\n optional_params = {\n \"api_base\": self.api_base if self.api_base else None,\n \"dimensions\": int(self.dimensions) if self.dimensions else None,\n \"chunk_size\": int(self.chunk_size) if self.chunk_size else None,\n \"request_timeout\": float(self.request_timeout) if self.request_timeout else None,\n \"max_retries\": int(self.max_retries) if self.max_retries else None,\n \"show_progress_bar\": self.show_progress_bar if hasattr(self, \"show_progress_bar\") else None,\n \"model_kwargs\": self.model_kwargs if self.model_kwargs else None,\n }\n\n # Watson-specific parameters\n if provider in {\"IBM WatsonX\", \"IBM watsonx.ai\"}:\n # Map base_url_ibm_watsonx to \"url\" parameter for watsonx\n if \"url\" in param_mapping:\n url_value = (\n self.base_url_ibm_watsonx\n if hasattr(self, \"base_url_ibm_watsonx\") and self.base_url_ibm_watsonx\n else \"https://us-south.ml.cloud.ibm.com\"\n )\n kwargs[param_mapping[\"url\"]] = url_value\n # Map project_id for watsonx\n if hasattr(self, \"project_id\") and self.project_id and \"project_id\" in param_mapping:\n kwargs[param_mapping[\"project_id\"]] = self.project_id\n\n # Ollama-specific parameters\n if provider == \"Ollama\" and \"base_url\" in param_mapping:\n # Map api_base to \"base_url\" parameter for Ollama\n base_url_value = self.api_base if hasattr(self, \"api_base\") and self.api_base else \"http://localhost:11434\"\n kwargs[param_mapping[\"base_url\"]] = base_url_value\n\n # Add optional parameters if they have values and are mapped\n for param_name, param_value in optional_params.items():\n if param_value is not None and param_name in param_mapping:\n # Special handling for request_timeout with Google provider\n if param_name == \"request_timeout\":\n if provider == \"Google Generative AI\" and isinstance(param_value, (int, float)):\n kwargs[param_mapping[param_name]] = {\"timeout\": param_value}\n else:\n kwargs[param_mapping[param_name]] = param_value\n else:\n kwargs[param_mapping[param_name]] = param_value\n\n return kwargs\n\n def _build_kwargs(self, model: dict[str, Any], metadata: dict[str, Any]) -> dict[str, Any]:\n \"\"\"Build kwargs dictionary using parameter mapping.\"\"\"\n param_mapping = metadata.get(\"param_mapping\", {})\n if not param_mapping:\n msg = \"Parameter mapping not found in metadata\"\n raise ValueError(msg)\n\n kwargs = {}\n\n # Required parameters - handle both \"model\" and \"model_id\" (for watsonx)\n if \"model\" in param_mapping:\n kwargs[param_mapping[\"model\"]] = model.get(\"name\")\n elif \"model_id\" in param_mapping:\n kwargs[param_mapping[\"model_id\"]] = model.get(\"name\")\n if \"api_key\" in param_mapping:\n kwargs[param_mapping[\"api_key\"]] = get_api_key_for_provider(\n self.user_id,\n model.get(\"provider\"),\n self.api_key,\n )\n\n # Optional parameters with their values\n provider = model.get(\"provider\")\n 
optional_params = {\n \"api_base\": self.api_base if self.api_base else None,\n \"dimensions\": int(self.dimensions) if self.dimensions else None,\n \"chunk_size\": int(self.chunk_size) if self.chunk_size else None,\n \"request_timeout\": float(self.request_timeout) if self.request_timeout else None,\n \"max_retries\": int(self.max_retries) if self.max_retries else None,\n \"show_progress_bar\": self.show_progress_bar if hasattr(self, \"show_progress_bar\") else None,\n \"model_kwargs\": self.model_kwargs if self.model_kwargs else None,\n }\n\n # Watson-specific parameters\n if provider in {\"IBM WatsonX\", \"IBM watsonx.ai\"}:\n # Map base_url_ibm_watsonx to \"url\" parameter for watsonx\n if \"url\" in param_mapping:\n url_value = (\n self.base_url_ibm_watsonx\n if hasattr(self, \"base_url_ibm_watsonx\") and self.base_url_ibm_watsonx\n else \"https://us-south.ml.cloud.ibm.com\"\n )\n kwargs[param_mapping[\"url\"]] = url_value\n # Map project_id for watsonx\n if hasattr(self, \"project_id\") and self.project_id and \"project_id\" in param_mapping:\n kwargs[param_mapping[\"project_id\"]] = self.project_id\n\n # Ollama-specific parameters\n if provider == \"Ollama\" and \"base_url\" in param_mapping:\n # Map api_base to \"base_url\" parameter for Ollama\n base_url_value = self.api_base if hasattr(self, \"api_base\") and self.api_base else \"http://localhost:11434\"\n kwargs[param_mapping[\"base_url\"]] = base_url_value\n\n # Add optional parameters if they have values and are mapped\n for param_name, param_value in optional_params.items():\n if param_value is not None and param_name in param_mapping:\n # Special handling for request_timeout with Google provider\n if param_name == \"request_timeout\":\n if provider == \"Google Generative AI\" and isinstance(param_value, (int, float)):\n kwargs[param_mapping[param_name]] = {\"timeout\": param_value}\n else:\n kwargs[param_mapping[param_name]] = param_value\n else:\n kwargs[param_mapping[param_name]] = param_value\n\n return kwargs\n"

⚠️ Potential issue | 🟡 Minor

Return type annotation mismatch.

The build_embeddings method signature declares -> Embeddings but actually returns EmbeddingsWithModels. This type mismatch can cause issues with static type checkers and mislead developers about the actual return type.

Suggested fix (within the embedded code)
-    def build_embeddings(self) -> Embeddings:
+    def build_embeddings(self) -> EmbeddingsWithModels:
🤖 Prompt for AI Agents
In `@src/lfx/src/lfx/_assets/component_index.json` at line 89139, The
build_embeddings method currently annotates its return as "-> Embeddings" but
returns an EmbeddingsWithModels instance; update the signature of
EmbeddingModelComponent.build_embeddings to return EmbeddingsWithModels (or a
union like Embeddings | EmbeddingsWithModels) to match the actual return value,
and adjust the docstring if needed; reference the method name build_embeddings
and the class EmbeddingsWithModels so you update the annotation where the method
is defined.

Comment on lines +96 to +111
# Google Generative AI Embedding Models
GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS = [
    "models/text-embedding-004",
    "models/embedding-001",
]

# Embedding models as detailed metadata
GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS_DETAILED = [
    create_model_metadata(
        provider="Google Generative AI",
        name=name,
        icon="GoogleGenerativeAI",
        model_type="embeddings",
    )
    for name in GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS
]

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Google Generative AI embedding models text-embedding-004 embedding-001 current

💡 Result:

Short answer: Google’s current recommended text embedding model is gemini-embedding-001 (Gemini Embedding). The legacy embedding models (embedding-001 and text-embedding-004) are deprecated, and Google recommends migrating to gemini-embedding-001. [1][2][3]

Key sources:

  • Gemini Embeddings docs (model name, defaults, dims, usage examples). [1]
  • Google Developers blog announcing Gemini Embedding GA and deprecation schedule (text-embedding-004 retirement Jan 14, 2026; embedding-001 retirement Aug 14, 2025; guidance to migrate to gemini-embedding-001). [2]
  • Research / paper describing Gemini Embedding and its MTEB/MMTEB results. [3]

If you want, I can show example API calls (Python/REST) to migrate embeddings to gemini-embedding-001 and recommended output dimensionalities.


Update embedding models to use current Google Generative AI API.

The models embedding-001 and text-embedding-004 are deprecated and have been retired by Google:

  • embedding-001: retired August 14, 2025
  • text-embedding-004: retired January 14, 2026

Replace them with Google's currently supported embedding model, gemini-embedding-001.

Update:

GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS = [
    "models/gemini-embedding-001",
]
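On the consumer side, the migration is typically just the model string. A sketch (assuming langchain_google_genai is installed and GOOGLE_API_KEY is set in the environment):

from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001")
vector = embeddings.embed_query("hello world")
print(len(vector))  # gemini-embedding-001 returns 3072 dimensions by default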
🤖 Prompt for AI Agents
In `@src/lfx/src/lfx/base/models/google_generative_ai_constants.py` around lines
96 - 111, The embedding model list uses deprecated Google models; update
GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS to contain the current supported model
"models/gemini-embedding-001" and ensure
GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS_DETAILED (which uses
create_model_metadata) will automatically reflect that change by iterating over
the updated list; replace the existing two entries with a single
"models/gemini-embedding-001".

Comment on lines 1 to 15
import logging
from typing import Any

from lfx.base.embeddings.embeddings_class import EmbeddingsWithModels
from lfx.base.embeddings.model import LCEmbeddingsModel
from lfx.base.models.unified_models import (
    get_api_key_for_provider,
    get_embedding_classes,
    get_embedding_model_options,
    get_unified_models_detailed,
    update_model_options_in_build_config,
)

logger = logging.getLogger(__name__)
from lfx.base.models.watsonx_constants import IBM_WATSONX_URLS

⚠️ Potential issue | 🟠 Major

Move logger initialization below imports to fix Ruff E402.

Ruff fails because a module-level statement appears before imports. Move logger = logging.getLogger(__name__) after the full import block to satisfy E402.

🧹 Proposed fix
-import logging
-from typing import Any
-
-from lfx.base.embeddings.embeddings_class import EmbeddingsWithModels
-from lfx.base.embeddings.model import LCEmbeddingsModel
-from lfx.base.models.unified_models import (
-    get_api_key_for_provider,
-    get_embedding_classes,
-    get_embedding_model_options,
-    get_unified_models_detailed,
-    update_model_options_in_build_config,
-)
-
-logger = logging.getLogger(__name__)
-from lfx.base.models.watsonx_constants import IBM_WATSONX_URLS
-from lfx.field_typing import Embeddings
-from lfx.io import (
+import logging
+from typing import Any
+
+from lfx.base.embeddings.embeddings_class import EmbeddingsWithModels
+from lfx.base.embeddings.model import LCEmbeddingsModel
+from lfx.base.models.unified_models import (
+    get_api_key_for_provider,
+    get_embedding_classes,
+    get_embedding_model_options,
+    get_unified_models_detailed,
+    update_model_options_in_build_config,
+)
+from lfx.base.models.watsonx_constants import IBM_WATSONX_URLS
+from lfx.field_typing import Embeddings
+from lfx.io import (
     BoolInput,
     DictInput,
     DropdownInput,
     FloatInput,
     IntInput,
     MessageTextInput,
     ModelInput,
     SecretStrInput,
 )
+
+logger = logging.getLogger(__name__)
🧰 Tools
🪛 GitHub Actions: Ruff Style Check

[error] 15-15: Ruff check failed. E402: Module level import not at top of file. Move imports to the top of the file. Command: uv run --only-dev ruff check --output-format=github .

🪛 GitHub Check: Ruff Style Check (3.13)

[failure] 15-15: Ruff (E402)
src/lfx/src/lfx/components/models_and_agents/embedding_model.py:15:1: E402 Module level import not at top of file

🤖 Prompt for AI Agents
In `@src/lfx/src/lfx/components/models_and_agents/embedding_model.py` around lines
1 - 15, Move the module-level logger initialization so it appears after the full
import block to satisfy Ruff E402: relocate the line "logger =
logging.getLogger(__name__)" to below the last import (e.g., after the reference
to IBM_WATSONX_URLS) in embedding_model.py and ensure no other executable
statements intervene between imports and that logger assignment.

Comment on lines +263 to +283
            for model_data in provider_data.get("models", []):
                model_name = model_data.get("model_name")
                if not model_name:
                    continue

                # Create a model dict compatible with _build_kwargs
                model_dict = {
                    "name": model_name,
                    "provider": provider,
                    "metadata": metadata,  # Reuse the same metadata/param_mapping
                }

                try:
                    # Build kwargs for this model
                    model_kwargs = self._build_kwargs_for_model(model_dict, metadata, api_key)
                    # Create the embedding instance
                    available_models_dict[model_name] = embedding_class(**model_kwargs)
                except Exception:  # noqa: BLE001
                    # Skip models that fail to instantiate
                    # This handles cases where specific models have incompatible parameters
                    logger.debug("Failed to instantiate embedding model %s: skipping", model_name, exc_info=True)

⚠️ Potential issue | 🟠 Major

Use per-model metadata (and embedding_class) when building available models.

Right now every model reuses the selected model’s metadata/embedding_class. If a provider has model-specific param_mapping or a different embedding_class, those models will either be misconfigured or silently skipped. Consider deriving metadata and class from each model_data.

🔧 Proposed fix
-            for model_data in provider_data.get("models", []):
+            for model_data in provider_data.get("models", []):
                 model_name = model_data.get("model_name")
                 if not model_name:
                     continue
+
+                model_metadata = model_data.get("metadata") or {}
+                effective_metadata = model_metadata or metadata
+                embedding_class_name = (
+                    effective_metadata.get("embedding_class") or metadata.get("embedding_class")
+                )
+                model_embedding_class = (
+                    get_embedding_classes().get(embedding_class_name) or embedding_class
+                )

                 # Create a model dict compatible with _build_kwargs
                 model_dict = {
                     "name": model_name,
                     "provider": provider,
-                    "metadata": metadata,  # Reuse the same metadata/param_mapping
+                    "metadata": effective_metadata,
                 }

                 try:
                     # Build kwargs for this model
-                    model_kwargs = self._build_kwargs_for_model(model_dict, metadata, api_key)
+                    model_kwargs = self._build_kwargs_for_model(model_dict, effective_metadata, api_key)
                     # Create the embedding instance
-                    available_models_dict[model_name] = embedding_class(**model_kwargs)
+                    available_models_dict[model_name] = model_embedding_class(**model_kwargs)
🤖 Prompt for AI Agents
In `@src/lfx/src/lfx/components/models_and_agents/embedding_model.py` around lines
263 - 283, The loop currently reuses the provider-level metadata and
embedding_class for every model; change it to extract per-model metadata and
embedding_class from model_data (falling back to provider-level values if
absent), then call _build_kwargs_for_model with that model-specific metadata and
api_key and instantiate using the model-specific embedding_class when populating
available_models_dict[model_name]; keep the try/except and logging but ensure
the correct per-model symbols (model_data, metadata_from_model,
embedding_class_from_model, _build_kwargs_for_model, available_models_dict) are
used so models with bespoke param_mapping or classes are configured and
instantiated correctly.

@edwinjosechittilappilly edwinjosechittilappilly added this pull request to the merge queue Jan 21, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 21, 2026
@edwinjosechittilappilly edwinjosechittilappilly added this pull request to the merge queue Jan 21, 2026
Merged via the queue into main with commit 3d53437 Jan 21, 2026
92 of 93 checks passed
@edwinjosechittilappilly edwinjosechittilappilly deleted the EJ/embedding_models_update branch January 21, 2026 18:29
Labels

enhancement New feature or request
