
Conversation

@edwinjosechittilappilly
Collaborator

@edwinjosechittilappilly edwinjosechittilappilly commented Jan 15, 2026

Adds available-models support to the Embedding Model component, so that all available models from the selected provider are exposed by the component alongside the primary embedding instance.

Summary by CodeRabbit

Release Notes

  • New Features

    • Embedding Model component now exposes a collection of available embedding models for each provider alongside the primary embedding instance.
    • Added support for Google Generative AI embedding models.
  • Tests

    • Enhanced test coverage for embedding model functionality with new test cases for multiple model availability.


Eliminated the 'markitdown' dependency and Markdown output option from the URLComponent in Blog Writer, Knowledge Ingestion, and Simple Agent starter projects. Updated the code and configuration to only support 'Text' and 'HTML' output formats. Also added a 'Local' storage option to Document Q&A, News Aggregator, and Portfolio Website Code Generator starter projects.
@coderabbitai
Contributor

coderabbitai bot commented Jan 15, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting reviews.review_status to false in the CodeRabbit configuration file.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

The changes expand the EmbeddingModelComponent to support building multiple embedding models for a given provider. A new EmbeddingsWithModels wrapper is introduced to return both the primary embedding instance and a dictionary of available models keyed by model name. Helper methods are added for constructing model-specific kwargs and enumerating provider-specific embedding models from unified model data.
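For orientation, here is a minimal sketch of the wrapper shape the walkthrough describes, assuming it delegates the standard Embeddings interface to the primary instance; the real class lives in lfx.base.embeddings.embeddings_class and may differ in detail:

```python
# Minimal sketch of an EmbeddingsWithModels-style wrapper (assumed shape;
# the real class is lfx.base.embeddings.embeddings_class.EmbeddingsWithModels).
from langchain_core.embeddings import Embeddings


class EmbeddingsWithModels(Embeddings):
    """Primary embedding instance plus all available sibling models, keyed by model name."""

    def __init__(self, embeddings: Embeddings, available_models: dict[str, Embeddings]) -> None:
        self.embeddings = embeddings
        self.available_models = available_models

    # Delegate the standard Embeddings interface to the primary instance.
    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return self.embeddings.embed_documents(texts)

    def embed_query(self, text: str) -> list[float]:
        return self.embeddings.embed_query(text)
```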

Changes

  • EmbeddingModelComponent Implementation
    Files: src/lfx/src/lfx/components/models_and_agents/embedding_model.py, src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json, src/lfx/src/lfx/_assets/component_index.json
    Summary: Substantial refactor of EmbeddingModelComponent to return an EmbeddingsWithModels wrapper instead of a single embedding instance. Adds two internal helper methods, _build_available_models and _build_kwargs_for_model, and updates _build_kwargs. Enhances provider-specific handling for IBM WatsonX, Ollama, and Google Generative AI, including timeout support.
  • Test Coverage Updates
    Files: src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py
    Summary: Extends tests to verify EmbeddingsWithModels wrapper behavior, adds a new test for available_models population from unified models, and updates existing assertions to account for the new composite return type.
  • Google Generative AI Embedding Models
    Files: src/lfx/src/lfx/base/models/google_generative_ai_constants.py
    Summary: Adds new public constants GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS and GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS_DETAILED to expose embedding model metadata (a hedged sketch follows this list).
  • Unified Models Integration
    Files: src/lfx/src/lfx/base/models/unified_models.py
    Summary: Imports and integrates GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS_DETAILED into get_models_detailed to expose Google embedding models alongside standard models.
  • Hash and Metadata Updates
    Files: src/lfx/src/lfx/_assets/stable_hash_history.json
    Summary: Updates the code hash for the EmbeddingModel component from 277f5f28215b to 0b1313e6065f to reflect the implementation changes.
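Based on the constants named above, the new Google entries presumably look roughly like the following; the model names come from the Copilot summary later in this thread, and create_model_metadata plus its import path and signature are assumptions inferred from the review comments:

```python
# Hedged sketch of the additions to google_generative_ai_constants.py.
from lfx.base.models.model_metadata import create_model_metadata  # assumed import path

GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS = [
    "models/text-embedding-004",
    "models/embedding-001",
]

# Detailed metadata derived from the plain list, per the review discussion.
GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS_DETAILED = [
    create_model_metadata(provider="Google Generative AI", name=name, model_type="embeddings")
    for name in GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS
]
```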

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant EmbeddingModelComponent
    participant GetUnifiedModels
    participant EmbeddingClass
    participant EmbeddingsWithModels as EmbeddingsWithModels<br/>(Wrapper)

    Client->>EmbeddingModelComponent: build_embeddings()
    EmbeddingModelComponent->>EmbeddingModelComponent: Extract provider, model, api_key
    EmbeddingModelComponent->>EmbeddingClass: Instantiate primary embedding<br/>via _build_kwargs()
    EmbeddingClass-->>EmbeddingModelComponent: primary_embedding instance
    
    EmbeddingModelComponent->>GetUnifiedModels: get_unified_models_detailed(provider)
    GetUnifiedModels-->>EmbeddingModelComponent: List of all provider models
    
    EmbeddingModelComponent->>EmbeddingModelComponent: _build_available_models()
    loop For each provider model
        EmbeddingModelComponent->>EmbeddingModelComponent: _build_kwargs_for_model(model)
        EmbeddingModelComponent->>EmbeddingClass: Instantiate embedding for model
        EmbeddingClass-->>EmbeddingModelComponent: model_embedding instance
    end
    
    EmbeddingModelComponent->>EmbeddingsWithModels: Create wrapper with<br/>embeddings + available_models dict
    EmbeddingsWithModels-->>Client: Return EmbeddingsWithModels

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • phact

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 3 warnings)
  • Test Coverage For New Implementations: ❌ Error. Test coverage is insufficient to validate the new functionality, including private methods, provider-specific handling, and per-model metadata/embedding_class configurations; bugs in the implementation were not caught by tests. Resolution: expand coverage to directly test the private methods, add provider-specific tests, verify per-model metadata/embedding_class handling, and add error-handling tests using parametrized approaches (a sketch follows the passed checks below).
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 54.55%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Test Quality And Coverage: ⚠️ Warning. Test coverage is incomplete: critical helper methods lack unit tests, provider-specific parameter construction is untested, and a type-annotation mismatch went undetected. Resolution: add dedicated unit tests for _build_available_models, _build_kwargs_for_model, per-model metadata handling, provider-specific params, and error scenarios; correct the return type annotation.
  • Excessive Mock Usage Warning: ⚠️ Warning. Tests use 7-8+ mocks per test case, masking real bugs: WatsonX parameters not wired correctly, metadata reused across models, and parameter validation bypassed by permissive mocks. Resolution: add integration tests validating actual parameter mapping with real embedding instances; reduce mocks by testing the _build_kwargs methods directly rather than mocking them internally.
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately describes the main change: adding available-models support to the Embeddings component, the core feature across all modified files.
  • Test File Naming And Structure: ✅ Passed. The test file follows pytest conventions with appropriate fixtures, mocking, and comprehensive coverage of positive and negative scenarios.
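To illustrate the parametrized approach the failed check asks for, here is a hedged pytest sketch; constructing the component bare and the metadata shapes used are assumptions, not the repository's real fixtures:

```python
# Hypothetical parametrized test for _build_kwargs_for_model. Assumes
# EmbeddingModelComponent can be built bare with its input defaults populated.
import pytest

from lfx.components.models_and_agents.embedding_model import EmbeddingModelComponent


@pytest.mark.parametrize(
    ("provider", "param_mapping", "expected"),
    [
        ("OpenAI", {"model": "model", "api_key": "openai_api_key"},
         {"model": "m", "openai_api_key": "sk-test"}),
        # Ollama maps no api_key and falls back to the default local base_url.
        ("Ollama", {"model": "model", "base_url": "base_url"},
         {"model": "m", "base_url": "http://localhost:11434"}),
    ],
)
def test_build_kwargs_for_model(provider, param_mapping, expected):
    component = EmbeddingModelComponent()
    model = {"name": "m", "provider": provider}
    kwargs = component._build_kwargs_for_model(model, {"param_mapping": param_mapping}, api_key="sk-test")
    for key, value in expected.items():
        assert kwargs[key] == value
```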

@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Jan 15, 2026
Collaborator

@lucaseduoli lucaseduoli left a comment


LGTM, just ruff fixes

@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Jan 21, 2026
Copilot AI review requested due to automatic review settings January 21, 2026 17:25
@github-actions
Contributor

github-actions bot commented Jan 21, 2026

Frontend Unit Test Coverage Report

Coverage Summary

Lines: 17% | Statements: 17.55% (5029/28643) | Branches: 10.89% (2403/22050) | Functions: 11.65% (731/6274)

Unit Test Results

Tests: 2006 | Skipped: 0 💤 | Failures: 0 ❌ | Errors: 0 🔥 | Time: 27.328s ⏱️

@codecov

codecov bot commented Jan 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 34.57%. Comparing base (e8753a3) to head (66a036d).
⚠️ Report is 1 commit behind head on main.

❌ Your project status has failed because the head coverage (41.63%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main   #11320      +/-   ##
==========================================
+ Coverage   34.55%   34.57%   +0.02%     
==========================================
  Files        1416     1416              
  Lines       67422    67424       +2     
  Branches     9931     9931              
==========================================
+ Hits        23296    23311      +15     
+ Misses      42902    42888      -14     
- Partials     1224     1225       +1     
Flag Coverage Δ
backend 53.53% <ø> (+0.02%) ⬆️
frontend 16.07% <ø> (ø)
lfx 41.63% <100.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
.../lfx/base/models/google_generative_ai_constants.py 100.00% <100.00%> (ø)
src/lfx/src/lfx/base/models/unified_models.py 23.74% <ø> (ø)

... and 5 files with indirect coverage changes


Contributor

Copilot AI left a comment


Pull request overview

This pull request adds support for available models metadata to the Embeddings component, enabling multi-model support by providing a wrapper that contains both the primary embedding instance and a dictionary of all available models from the same provider.

Changes:

  • Modified EmbeddingModelComponent to return an EmbeddingsWithModels wrapper containing both the primary embedding instance and all available model instances for the provider (usage is sketched after this list)
  • Added Google Generative AI embedding models to the unified models constants
  • Updated tests to verify the new wrapper behavior and available models functionality
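To make the new contract concrete, here is a hedged sketch of downstream usage; `component` stands in for a configured EmbeddingModelComponent, and the attribute names follow the summaries in this thread rather than a verified API:

```python
# Hypothetical downstream usage of the EmbeddingsWithModels wrapper.
result = component.build_embeddings()  # EmbeddingsWithModels per this PR

# The primary instance behaves like any Embeddings object:
vector = result.embeddings.embed_query("hello world")

# Sibling models from the same provider are available without rebuilding the component:
large = result.available_models.get("text-embedding-3-large")
if large is not None:
    vector_large = large.embed_query("hello world")
```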

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Summary per file

  • src/lfx/src/lfx/components/models_and_agents/embedding_model.py: core implementation; added the _build_available_models and _build_kwargs_for_model methods, modified build_embeddings to return an EmbeddingsWithModels wrapper, and fixed Google provider-name consistency
  • src/lfx/src/lfx/base/models/unified_models.py: added GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS_DETAILED to the detailed models list
  • src/lfx/src/lfx/base/models/google_generative_ai_constants.py: added embedding model constants for Google Generative AI (text-embedding-004, embedding-001)
  • src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py: updated existing tests to verify the EmbeddingsWithModels wrapper and added a new test for available-models population
  • src/lfx/src/lfx/_assets/stable_hash_history.json: updated the component hash for the EmbeddingModel component
  • src/lfx/src/lfx/_assets/component_index.json: updated component metadata, including the code hash and the full embedded component code
Comments suppressed due to low confidence (2)

src/lfx/src/lfx/components/models_and_agents/embedding_model.py:168

  • When an Embeddings object is directly connected (line 168), it is returned as-is without wrapping it in EmbeddingsWithModels. This creates an inconsistency with the documented return type behavior. All return paths should consistently return an EmbeddingsWithModels instance to ensure uniform handling downstream. Consider wrapping the directly connected embeddings in an EmbeddingsWithModels instance with an empty available_models dict.
        try:
            from langchain_core.embeddings import Embeddings as BaseEmbeddings

            if isinstance(self.model, BaseEmbeddings):
                return self.model
        except ImportError:

src/lfx/src/lfx/components/models_and_agents/embedding_model.py:15

  • The logger initialization should be moved below the remaining imports to follow Python's conventional module layout. Logger setup typically appears after all imports, including imports from other modules in the same package, not between import blocks.
from lfx.base.models.watsonx_constants import IBM_WATSONX_URLS
from lfx.field_typing import Embeddings
from lfx.io import (


Comment on lines +288 to 368
        model: dict[str, Any],
        metadata: dict[str, Any],
        api_key: str | None,
    ) -> dict[str, Any]:
        """Build kwargs dictionary for a specific model using parameter mapping.

        This is similar to _build_kwargs but uses the provided api_key directly
        instead of looking it up again.

        Args:
            model: Model dict with name and provider
            metadata: Metadata containing param_mapping
            api_key: The API key to use

        Returns:
            kwargs dict for embedding class instantiation
        """
        param_mapping = metadata.get("param_mapping", {})
        if not param_mapping:
            msg = "Parameter mapping not found in metadata"
            raise ValueError(msg)

        kwargs = {}
        provider = model.get("provider")

        # Required parameters - handle both "model" and "model_id" (for watsonx)
        if "model" in param_mapping:
            kwargs[param_mapping["model"]] = model.get("name")
        elif "model_id" in param_mapping:
            kwargs[param_mapping["model_id"]] = model.get("name")

        # Add API key if mapped
        if "api_key" in param_mapping and api_key:
            kwargs[param_mapping["api_key"]] = api_key

        # Optional parameters with their values
        optional_params = {
            "api_base": self.api_base if self.api_base else None,
            "dimensions": int(self.dimensions) if self.dimensions else None,
            "chunk_size": int(self.chunk_size) if self.chunk_size else None,
            "request_timeout": float(self.request_timeout) if self.request_timeout else None,
            "max_retries": int(self.max_retries) if self.max_retries else None,
            "show_progress_bar": self.show_progress_bar if hasattr(self, "show_progress_bar") else None,
            "model_kwargs": self.model_kwargs if self.model_kwargs else None,
        }

        # Watson-specific parameters
        if provider in {"IBM WatsonX", "IBM watsonx.ai"}:
            # Map base_url_ibm_watsonx to "url" parameter for watsonx
            if "url" in param_mapping:
                url_value = (
                    self.base_url_ibm_watsonx
                    if hasattr(self, "base_url_ibm_watsonx") and self.base_url_ibm_watsonx
                    else "https://us-south.ml.cloud.ibm.com"
                )
                kwargs[param_mapping["url"]] = url_value
            # Map project_id for watsonx
            if hasattr(self, "project_id") and self.project_id and "project_id" in param_mapping:
                kwargs[param_mapping["project_id"]] = self.project_id

        # Ollama-specific parameters
        if provider == "Ollama" and "base_url" in param_mapping:
            # Map api_base to "base_url" parameter for Ollama
            base_url_value = self.api_base if hasattr(self, "api_base") and self.api_base else "http://localhost:11434"
            kwargs[param_mapping["base_url"]] = base_url_value

        # Add optional parameters if they have values and are mapped
        for param_name, param_value in optional_params.items():
            if param_value is not None and param_name in param_mapping:
                # Special handling for request_timeout with Google provider
                if param_name == "request_timeout":
                    if provider == "Google Generative AI" and isinstance(param_value, (int, float)):
                        kwargs[param_mapping[param_name]] = {"timeout": param_value}
                    else:
                        kwargs[param_mapping[param_name]] = param_value
                else:
                    kwargs[param_mapping[param_name]] = param_value

        return kwargs

    def _build_kwargs(self, model: dict[str, Any], metadata: dict[str, Any]) -> dict[str, Any]:

Copilot AI Jan 21, 2026


There is significant code duplication between the _build_kwargs_for_model and _build_kwargs methods. Both methods share identical logic for handling Watson-specific parameters, Ollama-specific parameters, and the Google Generative AI timeout handling. Consider refactoring this shared logic into a common helper method to improve maintainability and reduce the risk of inconsistencies when making future updates.

@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Jan 21, 2026
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/lfx/src/lfx/components/models_and_agents/embedding_model.py (1)

156-169: Keep the return type consistent with the new EmbeddingsWithModels contract.

The docstring now promises an EmbeddingsWithModels, but the early return still returns a raw Embeddings. Consider wrapping direct inputs (or adjust the docstring) to avoid downstream surprises.

✅ Suggested adjustment
-            if isinstance(self.model, BaseEmbeddings):
-                return self.model
+            if isinstance(self.model, BaseEmbeddings):
+                if isinstance(self.model, EmbeddingsWithModels):
+                    return self.model
+                return EmbeddingsWithModels(embeddings=self.model, available_models={})
🤖 Fix all issues with AI agents
In `@src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json`:
- Line 1821: The WatsonX-specific inputs truncate_input_tokens and input_text
are not being forwarded to the Watsonx embeddings because they aren’t present in
param_mapping and must be passed inside a special params dict using IBM SDK
meta-names; update the IBM watsonx param_mapping to include a mapping for a
params/metadata key (e.g., "params") and then in both _build_kwargs and
_build_kwargs_for_model (and where provider in {"IBM WatsonX","IBM watsonx.ai"}
is checked) construct a params_dict that sets
EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS from self.truncate_input_tokens
and EmbedTextParamsMetaNames.RETURN_OPTIONS based on self.input_text, then
attach that params_dict to kwargs using the param_mapping entry (or fallback to
kwargs["params"]) so WatsonxEmbeddings receives the proper params payload.
Ensure you reference EmbedTextParamsMetaNames, truncate_input_tokens,
input_text, _build_kwargs, and _build_kwargs_for_model in the change.

In `@src/lfx/src/lfx/_assets/component_index.json`:
- Line 89139: The build_embeddings method currently annotates its return as "->
Embeddings" but returns an EmbeddingsWithModels instance; update the signature
of EmbeddingModelComponent.build_embeddings to return EmbeddingsWithModels (or a
union like Embeddings | EmbeddingsWithModels) to match the actual return value,
and adjust the docstring if needed; reference the method name build_embeddings
and the class EmbeddingsWithModels so you update the annotation where the method
is defined.

In `@src/lfx/src/lfx/base/models/google_generative_ai_constants.py`:
- Around line 96-111: The embedding model list uses deprecated Google models;
update GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS to contain the current supported
model "models/gemini-embedding-001" and ensure
GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS_DETAILED (which uses
create_model_metadata) will automatically reflect that change by iterating over
the updated list; replace the existing two entries with a single
"models/gemini-embedding-001".

In `@src/lfx/src/lfx/components/models_and_agents/embedding_model.py`:
- Around line 1-15: Move the module-level logger initialization so it appears
after the full import block to satisfy Ruff E402: relocate the line "logger =
logging.getLogger(__name__)" to below the last import (e.g., after the reference
to IBM_WATSONX_URLS) in embedding_model.py and ensure no other executable
statements intervene between imports and that logger assignment.
- Around line 263-283: The loop currently reuses the provider-level metadata and
embedding_class for every model; change it to extract per-model metadata and
embedding_class from model_data (falling back to provider-level values if
absent), then call _build_kwargs_for_model with that model-specific metadata and
api_key and instantiate using the model-specific embedding_class when populating
available_models_dict[model_name]; keep the try/except and logging but ensure
the correct per-model symbols (model_data, metadata_from_model,
embedding_class_from_model, _build_kwargs_for_model, available_models_dict) are
used so models with bespoke param_mapping or classes are configured and
instantiated correctly.
🧹 Nitpick comments (4)
src/lfx/src/lfx/_assets/component_index.json (2)

89139-89139: Import ordering: logger initialization misplaced between imports.

The logger = logging.getLogger(__name__) statement is placed between import blocks, which violates PEP 8 style guidelines. All imports should be grouped together before any module-level code.

Suggested fix (within the embedded code)
 from lfx.base.models.unified_models import (
     get_api_key_for_provider,
     get_embedding_classes,
     get_embedding_model_options,
     get_unified_models_detailed,
     update_model_options_in_build_config,
 )
-
-logger = logging.getLogger(__name__)
 from lfx.base.models.watsonx_constants import IBM_WATSONX_URLS
 from lfx.field_typing import Embeddings
 from lfx.io import (
     BoolInput,
     DictInput,
     DropdownInput,
     FloatInput,
     IntInput,
     MessageTextInput,
     ModelInput,
     SecretStrInput,
 )
+
+logger = logging.getLogger(__name__)

89139-89139: Significant code duplication between _build_kwargs and _build_kwargs_for_model.

These two methods share ~90% identical code for handling optional parameters, Watson-specific parameters, Ollama-specific parameters, and Google-specific timeout handling. The only difference is that _build_kwargs_for_model receives api_key as a parameter while _build_kwargs calls get_api_key_for_provider internally.

Consider refactoring to a single private method that accepts an optional api_key parameter, eliminating the duplication.

Suggested refactor approach
def _build_kwargs(
    self,
    model: dict[str, Any],
    metadata: dict[str, Any],
    api_key: str | None = None,
) -> dict[str, Any]:
    """Build kwargs dictionary using parameter mapping.
    
    Args:
        model: Model dict with name and provider
        metadata: Metadata containing param_mapping
        api_key: Optional API key. If not provided, will be fetched via get_api_key_for_provider.
    """
    param_mapping = metadata.get("param_mapping", {})
    if not param_mapping:
        msg = "Parameter mapping not found in metadata"
        raise ValueError(msg)

    kwargs = {}
    provider = model.get("provider")

    # Required parameters
    if "model" in param_mapping:
        kwargs[param_mapping["model"]] = model.get("name")
    elif "model_id" in param_mapping:
        kwargs[param_mapping["model_id"]] = model.get("name")

    # API key - use provided or fetch
    if "api_key" in param_mapping:
        resolved_api_key = api_key if api_key is not None else get_api_key_for_provider(
            self.user_id, provider, self.api_key
        )
        if resolved_api_key:
            kwargs[param_mapping["api_key"]] = resolved_api_key

    # ... rest of the shared logic (optional params, provider-specific handling)

Then remove _build_kwargs_for_model and update calls to pass api_key when available.

src/lfx/src/lfx/components/models_and_agents/embedding_model.py (1)

288-368: Optional: reduce duplication with _build_kwargs to avoid drift.

_build_kwargs_for_model largely mirrors _build_kwargs. A small helper or an api_key override in _build_kwargs would simplify maintenance.

src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py (1)

210-267: Strengthen available_models verification with instance checks.

Right now the test only asserts keys; adding value assertions will confirm each model maps to its dedicated instance.

✅ Suggested test tightening
         assert "text-embedding-3-small" in embeddings.available_models
         assert "text-embedding-3-large" in embeddings.available_models
         assert "text-embedding-ada-002" in embeddings.available_models
         assert len(embeddings.available_models) == 3
+        assert (
+            embeddings.available_models["text-embedding-3-small"]
+            is mock_instances["text-embedding-3-small"]
+        )
+        assert (
+            embeddings.available_models["text-embedding-3-large"]
+            is mock_instances["text-embedding-3-large"]
+        )
+        assert (
+            embeddings.available_models["text-embedding-ada-002"]
+            is mock_instances["text-embedding-ada-002"]
+        )

"title_case": false,
"type": "code",
"value": "from typing import Any\n\nfrom lfx.base.embeddings.model import LCEmbeddingsModel\nfrom lfx.base.models.unified_models import (\n get_api_key_for_provider,\n get_embedding_classes,\n get_embedding_model_options,\n update_model_options_in_build_config,\n)\nfrom lfx.base.models.watsonx_constants import IBM_WATSONX_URLS\nfrom lfx.field_typing import Embeddings\nfrom lfx.io import (\n BoolInput,\n DictInput,\n DropdownInput,\n FloatInput,\n IntInput,\n MessageTextInput,\n ModelInput,\n SecretStrInput,\n)\n\n\nclass EmbeddingModelComponent(LCEmbeddingsModel):\n display_name = \"Embedding Model\"\n description = \"Generate embeddings using a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-embedding-models\"\n icon = \"binary\"\n name = \"EmbeddingModel\"\n category = \"models\"\n\n def update_build_config(self, build_config: dict, field_value: str, field_name: str | None = None):\n \"\"\"Dynamically update build config with user-filtered model options.\"\"\"\n # Update model options\n build_config = update_model_options_in_build_config(\n component=self,\n build_config=build_config,\n cache_key_prefix=\"embedding_model_options\",\n get_options_func=get_embedding_model_options,\n field_name=field_name,\n field_value=field_value,\n )\n\n # Show/hide provider-specific fields based on selected model\n if field_name == \"model\" and isinstance(field_value, list) and len(field_value) > 0:\n selected_model = field_value[0]\n provider = selected_model.get(\"provider\", \"\")\n\n # Show/hide watsonx fields\n is_watsonx = provider == \"IBM WatsonX\"\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = is_watsonx\n build_config[\"project_id\"][\"show\"] = is_watsonx\n build_config[\"truncate_input_tokens\"][\"show\"] = is_watsonx\n build_config[\"input_text\"][\"show\"] = is_watsonx\n if is_watsonx:\n build_config[\"base_url_ibm_watsonx\"][\"required\"] = True\n build_config[\"project_id\"][\"required\"] = True\n\n return build_config\n\n inputs = [\n ModelInput(\n name=\"model\",\n display_name=\"Embedding Model\",\n info=\"Select your model provider\",\n real_time_refresh=True,\n required=True,\n model_type=\"embedding\",\n input_types=[\"Embeddings\"], # Override default to accept Embeddings instead of LanguageModel\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"API Key\",\n info=\"Model Provider API key\",\n real_time_refresh=True,\n advanced=True,\n ),\n MessageTextInput(\n name=\"api_base\",\n display_name=\"API Base URL\",\n info=\"Base URL for the API. Leave empty for default.\",\n advanced=True,\n ),\n # Watson-specific inputs\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n MessageTextInput(\n name=\"project_id\",\n display_name=\"Project ID\",\n info=\"IBM watsonx.ai Project ID (required for IBM watsonx.ai)\",\n show=False,\n ),\n IntInput(\n name=\"dimensions\",\n display_name=\"Dimensions\",\n info=\"The number of dimensions the resulting output embeddings should have. 
\"\n \"Only supported by certain models.\",\n advanced=True,\n ),\n IntInput(\n name=\"chunk_size\",\n display_name=\"Chunk Size\",\n advanced=True,\n value=1000,\n ),\n FloatInput(\n name=\"request_timeout\",\n display_name=\"Request Timeout\",\n advanced=True,\n ),\n IntInput(\n name=\"max_retries\",\n display_name=\"Max Retries\",\n advanced=True,\n value=3,\n ),\n BoolInput(\n name=\"show_progress_bar\",\n display_name=\"Show Progress Bar\",\n advanced=True,\n ),\n DictInput(\n name=\"model_kwargs\",\n display_name=\"Model Kwargs\",\n advanced=True,\n info=\"Additional keyword arguments to pass to the model.\",\n ),\n IntInput(\n name=\"truncate_input_tokens\",\n display_name=\"Truncate Input Tokens\",\n advanced=True,\n value=200,\n show=False,\n ),\n BoolInput(\n name=\"input_text\",\n display_name=\"Include the original text in the output\",\n value=True,\n advanced=True,\n show=False,\n ),\n ]\n\n def build_embeddings(self) -> Embeddings:\n \"\"\"Build and return an embeddings instance based on the selected model.\"\"\"\n # If an Embeddings object is directly connected, return it\n try:\n from langchain_core.embeddings import Embeddings as BaseEmbeddings\n\n if isinstance(self.model, BaseEmbeddings):\n return self.model\n except ImportError:\n pass\n\n # Safely extract model configuration\n if not self.model or not isinstance(self.model, list):\n msg = \"Model must be a non-empty list\"\n raise ValueError(msg)\n\n model = self.model[0]\n model_name = model.get(\"name\")\n provider = model.get(\"provider\")\n metadata = model.get(\"metadata\", {})\n\n # Get API key from user input or global variables\n api_key = get_api_key_for_provider(self.user_id, provider, self.api_key)\n\n # Validate required fields (Ollama doesn't require API key)\n if not api_key and provider != \"Ollama\":\n msg = (\n f\"{provider} API key is required. 
\"\n f\"Please provide it in the component or configure it globally as \"\n f\"{provider.upper().replace(' ', '_')}_API_KEY.\"\n )\n raise ValueError(msg)\n\n if not model_name:\n msg = \"Model name is required\"\n raise ValueError(msg)\n\n # Get embedding class\n embedding_class_name = metadata.get(\"embedding_class\")\n if not embedding_class_name:\n msg = f\"No embedding class defined in metadata for {model_name}\"\n raise ValueError(msg)\n\n embedding_class = get_embedding_classes().get(embedding_class_name)\n if not embedding_class:\n msg = f\"Unknown embedding class: {embedding_class_name}\"\n raise ValueError(msg)\n\n # Build kwargs using parameter mapping\n kwargs = self._build_kwargs(model, metadata)\n\n return embedding_class(**kwargs)\n\n def _build_kwargs(self, model: dict[str, Any], metadata: dict[str, Any]) -> dict[str, Any]:\n \"\"\"Build kwargs dictionary using parameter mapping.\"\"\"\n param_mapping = metadata.get(\"param_mapping\", {})\n if not param_mapping:\n msg = \"Parameter mapping not found in metadata\"\n raise ValueError(msg)\n\n kwargs = {}\n\n # Required parameters - handle both \"model\" and \"model_id\" (for watsonx)\n if \"model\" in param_mapping:\n kwargs[param_mapping[\"model\"]] = model.get(\"name\")\n elif \"model_id\" in param_mapping:\n kwargs[param_mapping[\"model_id\"]] = model.get(\"name\")\n if \"api_key\" in param_mapping:\n kwargs[param_mapping[\"api_key\"]] = get_api_key_for_provider(\n self.user_id,\n model.get(\"provider\"),\n self.api_key,\n )\n\n # Optional parameters with their values\n provider = model.get(\"provider\")\n optional_params = {\n \"api_base\": self.api_base if self.api_base else None,\n \"dimensions\": int(self.dimensions) if self.dimensions else None,\n \"chunk_size\": int(self.chunk_size) if self.chunk_size else None,\n \"request_timeout\": float(self.request_timeout) if self.request_timeout else None,\n \"max_retries\": int(self.max_retries) if self.max_retries else None,\n \"show_progress_bar\": self.show_progress_bar if hasattr(self, \"show_progress_bar\") else None,\n \"model_kwargs\": self.model_kwargs if self.model_kwargs else None,\n }\n\n # Watson-specific parameters\n if provider in {\"IBM WatsonX\", \"IBM watsonx.ai\"}:\n # Map base_url_ibm_watsonx to \"url\" parameter for watsonx\n if \"url\" in param_mapping:\n url_value = (\n self.base_url_ibm_watsonx\n if hasattr(self, \"base_url_ibm_watsonx\") and self.base_url_ibm_watsonx\n else \"https://us-south.ml.cloud.ibm.com\"\n )\n kwargs[param_mapping[\"url\"]] = url_value\n # Map project_id for watsonx\n if hasattr(self, \"project_id\") and self.project_id and \"project_id\" in param_mapping:\n kwargs[param_mapping[\"project_id\"]] = self.project_id\n\n # Ollama-specific parameters\n if provider == \"Ollama\" and \"base_url\" in param_mapping:\n # Map api_base to \"base_url\" parameter for Ollama\n base_url_value = self.api_base if hasattr(self, \"api_base\") and self.api_base else \"http://localhost:11434\"\n kwargs[param_mapping[\"base_url\"]] = base_url_value\n\n # Add optional parameters if they have values and are mapped\n for param_name, param_value in optional_params.items():\n if param_value is not None and param_name in param_mapping:\n # Special handling for request_timeout with Google provider\n if param_name == \"request_timeout\":\n if provider == \"Google\" and isinstance(param_value, (int, float)):\n kwargs[param_mapping[param_name]] = {\"timeout\": param_value}\n else:\n kwargs[param_mapping[param_name]] = param_value\n else:\n 
kwargs[param_mapping[param_name]] = param_value\n\n return kwargs\n"
"value": "import logging\nfrom typing import Any\n\nfrom lfx.base.embeddings.embeddings_class import EmbeddingsWithModels\nfrom lfx.base.embeddings.model import LCEmbeddingsModel\nfrom lfx.base.models.unified_models import (\n get_api_key_for_provider,\n get_embedding_classes,\n get_embedding_model_options,\n get_unified_models_detailed,\n update_model_options_in_build_config,\n)\n\nlogger = logging.getLogger(__name__)\nfrom lfx.base.models.watsonx_constants import IBM_WATSONX_URLS\nfrom lfx.field_typing import Embeddings\nfrom lfx.io import (\n BoolInput,\n DictInput,\n DropdownInput,\n FloatInput,\n IntInput,\n MessageTextInput,\n ModelInput,\n SecretStrInput,\n)\n\n\nclass EmbeddingModelComponent(LCEmbeddingsModel):\n display_name = \"Embedding Model\"\n description = \"Generate embeddings using a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-embedding-models\"\n icon = \"binary\"\n name = \"EmbeddingModel\"\n category = \"models\"\n\n def update_build_config(self, build_config: dict, field_value: str, field_name: str | None = None):\n \"\"\"Dynamically update build config with user-filtered model options.\"\"\"\n # Update model options\n build_config = update_model_options_in_build_config(\n component=self,\n build_config=build_config,\n cache_key_prefix=\"embedding_model_options\",\n get_options_func=get_embedding_model_options,\n field_name=field_name,\n field_value=field_value,\n )\n\n # Show/hide provider-specific fields based on selected model\n if field_name == \"model\" and isinstance(field_value, list) and len(field_value) > 0:\n selected_model = field_value[0]\n provider = selected_model.get(\"provider\", \"\")\n\n # Show/hide watsonx fields\n is_watsonx = provider == \"IBM WatsonX\"\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = is_watsonx\n build_config[\"project_id\"][\"show\"] = is_watsonx\n build_config[\"truncate_input_tokens\"][\"show\"] = is_watsonx\n build_config[\"input_text\"][\"show\"] = is_watsonx\n if is_watsonx:\n build_config[\"base_url_ibm_watsonx\"][\"required\"] = True\n build_config[\"project_id\"][\"required\"] = True\n\n return build_config\n\n inputs = [\n ModelInput(\n name=\"model\",\n display_name=\"Embedding Model\",\n info=\"Select your model provider\",\n real_time_refresh=True,\n required=True,\n model_type=\"embedding\",\n input_types=[\"Embeddings\"], # Override default to accept Embeddings instead of LanguageModel\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"API Key\",\n info=\"Model Provider API key\",\n real_time_refresh=True,\n advanced=True,\n ),\n MessageTextInput(\n name=\"api_base\",\n display_name=\"API Base URL\",\n info=\"Base URL for the API. Leave empty for default.\",\n advanced=True,\n ),\n # Watson-specific inputs\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n MessageTextInput(\n name=\"project_id\",\n display_name=\"Project ID\",\n info=\"IBM watsonx.ai Project ID (required for IBM watsonx.ai)\",\n show=False,\n ),\n IntInput(\n name=\"dimensions\",\n display_name=\"Dimensions\",\n info=\"The number of dimensions the resulting output embeddings should have. 
\"\n \"Only supported by certain models.\",\n advanced=True,\n ),\n IntInput(\n name=\"chunk_size\",\n display_name=\"Chunk Size\",\n advanced=True,\n value=1000,\n ),\n FloatInput(\n name=\"request_timeout\",\n display_name=\"Request Timeout\",\n advanced=True,\n ),\n IntInput(\n name=\"max_retries\",\n display_name=\"Max Retries\",\n advanced=True,\n value=3,\n ),\n BoolInput(\n name=\"show_progress_bar\",\n display_name=\"Show Progress Bar\",\n advanced=True,\n ),\n DictInput(\n name=\"model_kwargs\",\n display_name=\"Model Kwargs\",\n advanced=True,\n info=\"Additional keyword arguments to pass to the model.\",\n ),\n IntInput(\n name=\"truncate_input_tokens\",\n display_name=\"Truncate Input Tokens\",\n advanced=True,\n value=200,\n show=False,\n ),\n BoolInput(\n name=\"input_text\",\n display_name=\"Include the original text in the output\",\n value=True,\n advanced=True,\n show=False,\n ),\n ]\n\n def build_embeddings(self) -> Embeddings:\n \"\"\"Build and return an embeddings instance based on the selected model.\n\n Returns an EmbeddingsWithModels wrapper that contains:\n - The primary embedding instance (for the selected model)\n - available_models dict mapping all available model names to their instances\n \"\"\"\n # If an Embeddings object is directly connected, return it\n try:\n from langchain_core.embeddings import Embeddings as BaseEmbeddings\n\n if isinstance(self.model, BaseEmbeddings):\n return self.model\n except ImportError:\n pass\n\n # Safely extract model configuration\n if not self.model or not isinstance(self.model, list):\n msg = \"Model must be a non-empty list\"\n raise ValueError(msg)\n\n model = self.model[0]\n model_name = model.get(\"name\")\n provider = model.get(\"provider\")\n metadata = model.get(\"metadata\", {})\n\n # Get API key from user input or global variables\n api_key = get_api_key_for_provider(self.user_id, provider, self.api_key)\n\n # Validate required fields (Ollama doesn't require API key)\n if not api_key and provider != \"Ollama\":\n msg = (\n f\"{provider} API key is required. 
\"\n f\"Please provide it in the component or configure it globally as \"\n f\"{provider.upper().replace(' ', '_')}_API_KEY.\"\n )\n raise ValueError(msg)\n\n if not model_name:\n msg = \"Model name is required\"\n raise ValueError(msg)\n\n # Get embedding class\n embedding_class_name = metadata.get(\"embedding_class\")\n if not embedding_class_name:\n msg = f\"No embedding class defined in metadata for {model_name}\"\n raise ValueError(msg)\n\n embedding_class = get_embedding_classes().get(embedding_class_name)\n if not embedding_class:\n msg = f\"Unknown embedding class: {embedding_class_name}\"\n raise ValueError(msg)\n\n # Build kwargs using parameter mapping for primary instance\n kwargs = self._build_kwargs(model, metadata)\n primary_instance = embedding_class(**kwargs)\n\n # Get all available embedding models for this provider\n available_models_dict = self._build_available_models(\n provider=provider,\n embedding_class=embedding_class,\n metadata=metadata,\n api_key=api_key,\n )\n\n # Wrap with EmbeddingsWithModels to provide available_models metadata\n return EmbeddingsWithModels(\n embeddings=primary_instance,\n available_models=available_models_dict,\n )\n\n def _build_available_models(\n self,\n provider: str,\n embedding_class: type,\n metadata: dict[str, Any],\n api_key: str | None,\n ) -> dict[str, Embeddings]:\n \"\"\"Build a dictionary of all available embedding model instances for the provider.\n\n Args:\n provider: The provider name (e.g., \"OpenAI\", \"Ollama\")\n embedding_class: The embedding class to instantiate\n metadata: Metadata containing param_mapping\n api_key: The API key for the provider\n\n Returns:\n Dict mapping model names to their embedding instances\n \"\"\"\n available_models_dict: dict[str, Embeddings] = {}\n\n # Get all embedding models for this provider from unified models\n all_embedding_models = get_unified_models_detailed(\n providers=[provider],\n model_type=\"embeddings\",\n include_deprecated=False,\n include_unsupported=False,\n )\n\n if not all_embedding_models:\n return available_models_dict\n\n # Extract models from the provider data\n for provider_data in all_embedding_models:\n if provider_data.get(\"provider\") != provider:\n continue\n\n for model_data in provider_data.get(\"models\", []):\n model_name = model_data.get(\"model_name\")\n if not model_name:\n continue\n\n # Create a model dict compatible with _build_kwargs\n model_dict = {\n \"name\": model_name,\n \"provider\": provider,\n \"metadata\": metadata, # Reuse the same metadata/param_mapping\n }\n\n try:\n # Build kwargs for this model\n model_kwargs = self._build_kwargs_for_model(model_dict, metadata, api_key)\n # Create the embedding instance\n available_models_dict[model_name] = embedding_class(**model_kwargs)\n except Exception: # noqa: BLE001\n # Skip models that fail to instantiate\n # This handles cases where specific models have incompatible parameters\n logger.debug(\"Failed to instantiate embedding model %s: skipping\", model_name, exc_info=True)\n continue\n\n return available_models_dict\n\n def _build_kwargs_for_model(\n self,\n model: dict[str, Any],\n metadata: dict[str, Any],\n api_key: str | None,\n ) -> dict[str, Any]:\n \"\"\"Build kwargs dictionary for a specific model using parameter mapping.\n\n This is similar to _build_kwargs but uses the provided api_key directly\n instead of looking it up again.\n\n Args:\n model: Model dict with name and provider\n metadata: Metadata containing param_mapping\n api_key: The API key to use\n\n Returns:\n kwargs dict 
for embedding class instantiation\n \"\"\"\n param_mapping = metadata.get(\"param_mapping\", {})\n if not param_mapping:\n msg = \"Parameter mapping not found in metadata\"\n raise ValueError(msg)\n\n kwargs = {}\n provider = model.get(\"provider\")\n\n # Required parameters - handle both \"model\" and \"model_id\" (for watsonx)\n if \"model\" in param_mapping:\n kwargs[param_mapping[\"model\"]] = model.get(\"name\")\n elif \"model_id\" in param_mapping:\n kwargs[param_mapping[\"model_id\"]] = model.get(\"name\")\n\n # Add API key if mapped\n if \"api_key\" in param_mapping and api_key:\n kwargs[param_mapping[\"api_key\"]] = api_key\n\n # Optional parameters with their values\n optional_params = {\n \"api_base\": self.api_base if self.api_base else None,\n \"dimensions\": int(self.dimensions) if self.dimensions else None,\n \"chunk_size\": int(self.chunk_size) if self.chunk_size else None,\n \"request_timeout\": float(self.request_timeout) if self.request_timeout else None,\n \"max_retries\": int(self.max_retries) if self.max_retries else None,\n \"show_progress_bar\": self.show_progress_bar if hasattr(self, \"show_progress_bar\") else None,\n \"model_kwargs\": self.model_kwargs if self.model_kwargs else None,\n }\n\n # Watson-specific parameters\n if provider in {\"IBM WatsonX\", \"IBM watsonx.ai\"}:\n # Map base_url_ibm_watsonx to \"url\" parameter for watsonx\n if \"url\" in param_mapping:\n url_value = (\n self.base_url_ibm_watsonx\n if hasattr(self, \"base_url_ibm_watsonx\") and self.base_url_ibm_watsonx\n else \"https://us-south.ml.cloud.ibm.com\"\n )\n kwargs[param_mapping[\"url\"]] = url_value\n # Map project_id for watsonx\n if hasattr(self, \"project_id\") and self.project_id and \"project_id\" in param_mapping:\n kwargs[param_mapping[\"project_id\"]] = self.project_id\n\n # Ollama-specific parameters\n if provider == \"Ollama\" and \"base_url\" in param_mapping:\n # Map api_base to \"base_url\" parameter for Ollama\n base_url_value = self.api_base if hasattr(self, \"api_base\") and self.api_base else \"http://localhost:11434\"\n kwargs[param_mapping[\"base_url\"]] = base_url_value\n\n # Add optional parameters if they have values and are mapped\n for param_name, param_value in optional_params.items():\n if param_value is not None and param_name in param_mapping:\n # Special handling for request_timeout with Google provider\n if param_name == \"request_timeout\":\n if provider == \"Google Generative AI\" and isinstance(param_value, (int, float)):\n kwargs[param_mapping[param_name]] = {\"timeout\": param_value}\n else:\n kwargs[param_mapping[param_name]] = param_value\n else:\n kwargs[param_mapping[param_name]] = param_value\n\n return kwargs\n\n def _build_kwargs(self, model: dict[str, Any], metadata: dict[str, Any]) -> dict[str, Any]:\n \"\"\"Build kwargs dictionary using parameter mapping.\"\"\"\n param_mapping = metadata.get(\"param_mapping\", {})\n if not param_mapping:\n msg = \"Parameter mapping not found in metadata\"\n raise ValueError(msg)\n\n kwargs = {}\n\n # Required parameters - handle both \"model\" and \"model_id\" (for watsonx)\n if \"model\" in param_mapping:\n kwargs[param_mapping[\"model\"]] = model.get(\"name\")\n elif \"model_id\" in param_mapping:\n kwargs[param_mapping[\"model_id\"]] = model.get(\"name\")\n if \"api_key\" in param_mapping:\n kwargs[param_mapping[\"api_key\"]] = get_api_key_for_provider(\n self.user_id,\n model.get(\"provider\"),\n self.api_key,\n )\n\n # Optional parameters with their values\n provider = model.get(\"provider\")\n 
optional_params = {\n \"api_base\": self.api_base if self.api_base else None,\n \"dimensions\": int(self.dimensions) if self.dimensions else None,\n \"chunk_size\": int(self.chunk_size) if self.chunk_size else None,\n \"request_timeout\": float(self.request_timeout) if self.request_timeout else None,\n \"max_retries\": int(self.max_retries) if self.max_retries else None,\n \"show_progress_bar\": self.show_progress_bar if hasattr(self, \"show_progress_bar\") else None,\n \"model_kwargs\": self.model_kwargs if self.model_kwargs else None,\n }\n\n # Watson-specific parameters\n if provider in {\"IBM WatsonX\", \"IBM watsonx.ai\"}:\n # Map base_url_ibm_watsonx to \"url\" parameter for watsonx\n if \"url\" in param_mapping:\n url_value = (\n self.base_url_ibm_watsonx\n if hasattr(self, \"base_url_ibm_watsonx\") and self.base_url_ibm_watsonx\n else \"https://us-south.ml.cloud.ibm.com\"\n )\n kwargs[param_mapping[\"url\"]] = url_value\n # Map project_id for watsonx\n if hasattr(self, \"project_id\") and self.project_id and \"project_id\" in param_mapping:\n kwargs[param_mapping[\"project_id\"]] = self.project_id\n\n # Ollama-specific parameters\n if provider == \"Ollama\" and \"base_url\" in param_mapping:\n # Map api_base to \"base_url\" parameter for Ollama\n base_url_value = self.api_base if hasattr(self, \"api_base\") and self.api_base else \"http://localhost:11434\"\n kwargs[param_mapping[\"base_url\"]] = base_url_value\n\n # Add optional parameters if they have values and are mapped\n for param_name, param_value in optional_params.items():\n if param_value is not None and param_name in param_mapping:\n # Special handling for request_timeout with Google provider\n if param_name == \"request_timeout\":\n if provider == \"Google Generative AI\" and isinstance(param_value, (int, float)):\n kwargs[param_mapping[param_name]] = {\"timeout\": param_value}\n else:\n kwargs[param_mapping[param_name]] = param_value\n else:\n kwargs[param_mapping[param_name]] = param_value\n\n return kwargs\n"
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's search for watsonx-related code and parameter mappings
rg "truncate_input_tokens|watsonx" --type json --type py -i -A 3 -B 3 | head -100

Repository: langflow-ai/langflow

Length of output: 9659


🏁 Script executed:

# Search for embedding class definitions and parameter mappings
rg "param_mapping" --type json --type py -l | head -20

Repository: langflow-ai/langflow

Length of output: 379


🏁 Script executed:

# Look for IBM WatsonX embedding class references
rg "IBMWatsonX|WatsonxEmbeddings" --type py -A 5 -B 2 | head -80

Repository: langflow-ai/langflow

Length of output: 6375


🏁 Script executed:

# Let's look at the param_mappings for IBM WatsonX in unified_models.py
rg "IBM WatsonX|param_mappings" src/lfx/src/lfx/base/models/unified_models.py -A 30 | head -150

Repository: langflow-ai/langflow

Length of output: 4934


🏁 Script executed:

# Also let's check the actual param_mapping for IBM WatsonX
rg "IBM WatsonX" src/lfx/src/lfx/base/models/unified_models.py -A 50 -B 5 | head -200

Repository: langflow-ai/langflow

Length of output: 7052


🏁 Script executed:

# Let's look at the entire WatsonxEmbeddings component to see how it uses parameters
cat src/lfx/src/lfx/components/ibm/watsonx_embeddings.py

Repository: langflow-ai/langflow

Length of output: 4887


🏁 Script executed:

# Search for WATSONX embedding models configuration with param_mapping
rg "WATSONX.*EMBEDDING|param_mapping.*IBM|truncate_input" src/lfx/src/lfx/base/models/unified_models.py -B 5 -A 15 | head -200

Repository: langflow-ai/langflow

Length of output: 46


🏁 Script executed:

# Look for the detailed embedding models definitions
fd "unified_models.py" -exec grep -n "WATSONX_EMBEDDING" {} +

Repository: langflow-ai/langflow

Length of output: 294


🏁 Script executed:

# Search for all embedding model detailed definitions in unified_models.py
rg "EMBEDDING_MODELS_DETAILED|param_mapping" src/lfx/src/lfx/base/models/unified_models.py -B 3 -A 20 | head -300

Repository: langflow-ai/langflow

Length of output: 7821


🏁 Script executed:

# Get the full param_mappings dictionary, specifically looking for IBM WatsonX
rg "IBM WatsonX.*{|\"IBM WatsonX\":" src/lfx/src/lfx/base/models/unified_models.py -A 20 | grep -A 20 "param_mappings"

Repository: langflow-ai/langflow

Length of output: 189


🏁 Script executed:

# Let's view the watsonx_constants file directly to see the embedding models config
cat src/lfx/src/lfx/base/models/watsonx_constants.py

Repository: langflow-ai/langflow

Length of output: 2225


🏁 Script executed:

# Search for param_mappings dictionary definition, especially IBM WatsonX section
rg -A 150 "param_mappings = \{" src/lfx/src/lfx/base/models/unified_models.py | head -200

Repository: langflow-ai/langflow

Length of output: 5559


🏁 Script executed:

# Let's check if there's a specific IBM WatsonX embedding param_mapping
rg "IBM WatsonX" src/lfx/src/lfx/base/models/unified_models.py -B 5 -A 15 | grep -A 10 "param_mappings"

Repository: langflow-ai/langflow

Length of output: 468


🏁 Script executed:

# Let's verify the exact flow - check how optional_params are actually used in the kwargs building
rg -A 20 "for param_name, param_value in optional_params" src/lfx/src/lfx/components/models_and_agents/embedding_model.py

Repository: langflow-ai/langflow

Length of output: 1751


🏁 Script executed:

# Let's also search for how WatsonxEmbeddings expects these parameters - check langchain_ibm documentation or usage
web_search WatsonxEmbeddings truncate_input_tokens input_text params

Repository: langflow-ai/langflow

Length of output: 114


🌐 Web query:

WatsonxEmbeddings langchain_ibm truncate_input_tokens input_text parameters

💡 Result:

Short answer: In the langchain_ibm WatsonxEmbeddings integration you pass IBM embedding model parameters via the IBM SDK meta-names. TRUNCATE_INPUT_TOKENS (truncate_input_tokens) tells the service how many tokens to keep from the input (useful to trim long texts before embedding). RETURN_OPTIONS with {"input_text": True} returns the original input text in the API response. Example (Python):

from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames

embed_params = {
    EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: 3,  # or set truncate_input_tokens=512 in other wrappers
    EmbedTextParamsMetaNames.RETURN_OPTIONS: {"input_text": True},
}

then pass embed_params when constructing/using WatsonxEmbeddings.

References: LangChain docs showing TRUNCATE_INPUT_TOKENS and RETURN_OPTIONS usage and LangChain/IBM wrappers that expose truncate_input_tokens parameter. [1][2]

Sources
[1] LangChain WatsonxEmbeddings docs (example using EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS and RETURN_OPTIONS).
[2] LangChain / langchain-ibm / langchain-community reference pages (truncateInputTokens / truncate_input_tokens parameter in JS/Python wrappers).


🏁 Script executed:

# Let's check the test file to see how WatsonX embedding params are expected
cat src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py | head -100

Repository: langflow-ai/langflow

Length of output: 4137


WatsonX-specific input parameters are not wired to the embeddings instance.

truncate_input_tokens and input_text are conditionally shown for IBM WatsonX, but they're never passed to WatsonxEmbeddings. The param_mapping for IBM WatsonX omits these parameters, so they're filtered out by the if param_value is not None and param_name in param_mapping check. Additionally, WatsonX requires special handling: these parameters must be passed via a params dict using IBM SDK meta-names (EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS and EmbedTextParamsMetaNames.RETURN_OPTIONS), not as direct kwargs like other providers. Update the param_mapping and add WatsonX-specific logic to construct the params dict correctly in both _build_kwargs and _build_kwargs_for_model.
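A hedged sketch of the wiring this suggests, using the SDK meta-names confirmed by the web query above; the helper name and the "params" kwarg key are assumptions, not the repository's implementation:

```python
# Hypothetical helper packing truncate_input_tokens and input_text into the
# params payload WatsonxEmbeddings expects via the IBM SDK meta-names.
from typing import Any

from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames


def build_watsonx_embed_params(truncate_input_tokens: int | None, input_text: bool | None) -> dict[str, Any]:
    params: dict[str, Any] = {}
    if truncate_input_tokens:
        params[EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS] = int(truncate_input_tokens)
    if input_text is not None:
        params[EmbedTextParamsMetaNames.RETURN_OPTIONS] = {"input_text": bool(input_text)}
    return params


# Inside _build_kwargs / _build_kwargs_for_model, for the WatsonX provider
# (the "params" mapping key is an assumed addition to param_mapping):
# kwargs[param_mapping.get("params", "params")] = build_watsonx_embed_params(
#     self.truncate_input_tokens, self.input_text
# )
```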

🤖 Prompt for AI Agents
In `@src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json`
at line 1821: The WatsonX-specific inputs truncate_input_tokens and input_text
are not being forwarded to the Watsonx embeddings because they aren’t present in
param_mapping and must be passed inside a special params dict using IBM SDK
meta-names; update the IBM watsonx param_mapping to include a mapping for a
params/metadata key (e.g., "params") and then in both _build_kwargs and
_build_kwargs_for_model (and where provider in {"IBM WatsonX","IBM watsonx.ai"}
is checked) construct a params_dict that sets
EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS from self.truncate_input_tokens
and EmbedTextParamsMetaNames.RETURN_OPTIONS based on self.input_text, then
attach that params_dict to kwargs using the param_mapping entry (or fallback to
kwargs["params"]) so WatsonxEmbeddings receives the proper params payload.
Ensure you reference EmbedTextParamsMetaNames, truncate_input_tokens,
input_text, _build_kwargs, and _build_kwargs_for_model in the change.

"title_case": false,
"type": "code",
"value": "from typing import Any\n\nfrom lfx.base.embeddings.model import LCEmbeddingsModel\nfrom lfx.base.models.unified_models import (\n get_api_key_for_provider,\n get_embedding_classes,\n get_embedding_model_options,\n update_model_options_in_build_config,\n)\nfrom lfx.base.models.watsonx_constants import IBM_WATSONX_URLS\nfrom lfx.field_typing import Embeddings\nfrom lfx.io import (\n BoolInput,\n DictInput,\n DropdownInput,\n FloatInput,\n IntInput,\n MessageTextInput,\n ModelInput,\n SecretStrInput,\n)\n\n\nclass EmbeddingModelComponent(LCEmbeddingsModel):\n display_name = \"Embedding Model\"\n description = \"Generate embeddings using a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-embedding-models\"\n icon = \"binary\"\n name = \"EmbeddingModel\"\n category = \"models\"\n\n def update_build_config(self, build_config: dict, field_value: str, field_name: str | None = None):\n \"\"\"Dynamically update build config with user-filtered model options.\"\"\"\n # Update model options\n build_config = update_model_options_in_build_config(\n component=self,\n build_config=build_config,\n cache_key_prefix=\"embedding_model_options\",\n get_options_func=get_embedding_model_options,\n field_name=field_name,\n field_value=field_value,\n )\n\n # Show/hide provider-specific fields based on selected model\n if field_name == \"model\" and isinstance(field_value, list) and len(field_value) > 0:\n selected_model = field_value[0]\n provider = selected_model.get(\"provider\", \"\")\n\n # Show/hide watsonx fields\n is_watsonx = provider == \"IBM WatsonX\"\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = is_watsonx\n build_config[\"project_id\"][\"show\"] = is_watsonx\n build_config[\"truncate_input_tokens\"][\"show\"] = is_watsonx\n build_config[\"input_text\"][\"show\"] = is_watsonx\n if is_watsonx:\n build_config[\"base_url_ibm_watsonx\"][\"required\"] = True\n build_config[\"project_id\"][\"required\"] = True\n\n return build_config\n\n inputs = [\n ModelInput(\n name=\"model\",\n display_name=\"Embedding Model\",\n info=\"Select your model provider\",\n real_time_refresh=True,\n required=True,\n model_type=\"embedding\",\n input_types=[\"Embeddings\"], # Override default to accept Embeddings instead of LanguageModel\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"API Key\",\n info=\"Model Provider API key\",\n real_time_refresh=True,\n advanced=True,\n ),\n MessageTextInput(\n name=\"api_base\",\n display_name=\"API Base URL\",\n info=\"Base URL for the API. Leave empty for default.\",\n advanced=True,\n ),\n # Watson-specific inputs\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n MessageTextInput(\n name=\"project_id\",\n display_name=\"Project ID\",\n info=\"IBM watsonx.ai Project ID (required for IBM watsonx.ai)\",\n show=False,\n ),\n IntInput(\n name=\"dimensions\",\n display_name=\"Dimensions\",\n info=\"The number of dimensions the resulting output embeddings should have. 
\"\n \"Only supported by certain models.\",\n advanced=True,\n ),\n IntInput(\n name=\"chunk_size\",\n display_name=\"Chunk Size\",\n advanced=True,\n value=1000,\n ),\n FloatInput(\n name=\"request_timeout\",\n display_name=\"Request Timeout\",\n advanced=True,\n ),\n IntInput(\n name=\"max_retries\",\n display_name=\"Max Retries\",\n advanced=True,\n value=3,\n ),\n BoolInput(\n name=\"show_progress_bar\",\n display_name=\"Show Progress Bar\",\n advanced=True,\n ),\n DictInput(\n name=\"model_kwargs\",\n display_name=\"Model Kwargs\",\n advanced=True,\n info=\"Additional keyword arguments to pass to the model.\",\n ),\n IntInput(\n name=\"truncate_input_tokens\",\n display_name=\"Truncate Input Tokens\",\n advanced=True,\n value=200,\n show=False,\n ),\n BoolInput(\n name=\"input_text\",\n display_name=\"Include the original text in the output\",\n value=True,\n advanced=True,\n show=False,\n ),\n ]\n\n def build_embeddings(self) -> Embeddings:\n \"\"\"Build and return an embeddings instance based on the selected model.\"\"\"\n # If an Embeddings object is directly connected, return it\n try:\n from langchain_core.embeddings import Embeddings as BaseEmbeddings\n\n if isinstance(self.model, BaseEmbeddings):\n return self.model\n except ImportError:\n pass\n\n # Safely extract model configuration\n if not self.model or not isinstance(self.model, list):\n msg = \"Model must be a non-empty list\"\n raise ValueError(msg)\n\n model = self.model[0]\n model_name = model.get(\"name\")\n provider = model.get(\"provider\")\n metadata = model.get(\"metadata\", {})\n\n # Get API key from user input or global variables\n api_key = get_api_key_for_provider(self.user_id, provider, self.api_key)\n\n # Validate required fields (Ollama doesn't require API key)\n if not api_key and provider != \"Ollama\":\n msg = (\n f\"{provider} API key is required. 
\"\n f\"Please provide it in the component or configure it globally as \"\n f\"{provider.upper().replace(' ', '_')}_API_KEY.\"\n )\n raise ValueError(msg)\n\n if not model_name:\n msg = \"Model name is required\"\n raise ValueError(msg)\n\n # Get embedding class\n embedding_class_name = metadata.get(\"embedding_class\")\n if not embedding_class_name:\n msg = f\"No embedding class defined in metadata for {model_name}\"\n raise ValueError(msg)\n\n embedding_class = get_embedding_classes().get(embedding_class_name)\n if not embedding_class:\n msg = f\"Unknown embedding class: {embedding_class_name}\"\n raise ValueError(msg)\n\n # Build kwargs using parameter mapping\n kwargs = self._build_kwargs(model, metadata)\n\n return embedding_class(**kwargs)\n\n def _build_kwargs(self, model: dict[str, Any], metadata: dict[str, Any]) -> dict[str, Any]:\n \"\"\"Build kwargs dictionary using parameter mapping.\"\"\"\n param_mapping = metadata.get(\"param_mapping\", {})\n if not param_mapping:\n msg = \"Parameter mapping not found in metadata\"\n raise ValueError(msg)\n\n kwargs = {}\n\n # Required parameters - handle both \"model\" and \"model_id\" (for watsonx)\n if \"model\" in param_mapping:\n kwargs[param_mapping[\"model\"]] = model.get(\"name\")\n elif \"model_id\" in param_mapping:\n kwargs[param_mapping[\"model_id\"]] = model.get(\"name\")\n if \"api_key\" in param_mapping:\n kwargs[param_mapping[\"api_key\"]] = get_api_key_for_provider(\n self.user_id,\n model.get(\"provider\"),\n self.api_key,\n )\n\n # Optional parameters with their values\n provider = model.get(\"provider\")\n optional_params = {\n \"api_base\": self.api_base if self.api_base else None,\n \"dimensions\": int(self.dimensions) if self.dimensions else None,\n \"chunk_size\": int(self.chunk_size) if self.chunk_size else None,\n \"request_timeout\": float(self.request_timeout) if self.request_timeout else None,\n \"max_retries\": int(self.max_retries) if self.max_retries else None,\n \"show_progress_bar\": self.show_progress_bar if hasattr(self, \"show_progress_bar\") else None,\n \"model_kwargs\": self.model_kwargs if self.model_kwargs else None,\n }\n\n # Watson-specific parameters\n if provider in {\"IBM WatsonX\", \"IBM watsonx.ai\"}:\n # Map base_url_ibm_watsonx to \"url\" parameter for watsonx\n if \"url\" in param_mapping:\n url_value = (\n self.base_url_ibm_watsonx\n if hasattr(self, \"base_url_ibm_watsonx\") and self.base_url_ibm_watsonx\n else \"https://us-south.ml.cloud.ibm.com\"\n )\n kwargs[param_mapping[\"url\"]] = url_value\n # Map project_id for watsonx\n if hasattr(self, \"project_id\") and self.project_id and \"project_id\" in param_mapping:\n kwargs[param_mapping[\"project_id\"]] = self.project_id\n\n # Ollama-specific parameters\n if provider == \"Ollama\" and \"base_url\" in param_mapping:\n # Map api_base to \"base_url\" parameter for Ollama\n base_url_value = self.api_base if hasattr(self, \"api_base\") and self.api_base else \"http://localhost:11434\"\n kwargs[param_mapping[\"base_url\"]] = base_url_value\n\n # Add optional parameters if they have values and are mapped\n for param_name, param_value in optional_params.items():\n if param_value is not None and param_name in param_mapping:\n # Special handling for request_timeout with Google provider\n if param_name == \"request_timeout\":\n if provider == \"Google\" and isinstance(param_value, (int, float)):\n kwargs[param_mapping[param_name]] = {\"timeout\": param_value}\n else:\n kwargs[param_mapping[param_name]] = param_value\n else:\n 
kwargs[param_mapping[param_name]] = param_value\n\n return kwargs\n"
"value": "import logging\nfrom typing import Any\n\nfrom lfx.base.embeddings.embeddings_class import EmbeddingsWithModels\nfrom lfx.base.embeddings.model import LCEmbeddingsModel\nfrom lfx.base.models.unified_models import (\n get_api_key_for_provider,\n get_embedding_classes,\n get_embedding_model_options,\n get_unified_models_detailed,\n update_model_options_in_build_config,\n)\n\nlogger = logging.getLogger(__name__)\nfrom lfx.base.models.watsonx_constants import IBM_WATSONX_URLS\nfrom lfx.field_typing import Embeddings\nfrom lfx.io import (\n BoolInput,\n DictInput,\n DropdownInput,\n FloatInput,\n IntInput,\n MessageTextInput,\n ModelInput,\n SecretStrInput,\n)\n\n\nclass EmbeddingModelComponent(LCEmbeddingsModel):\n display_name = \"Embedding Model\"\n description = \"Generate embeddings using a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-embedding-models\"\n icon = \"binary\"\n name = \"EmbeddingModel\"\n category = \"models\"\n\n def update_build_config(self, build_config: dict, field_value: str, field_name: str | None = None):\n \"\"\"Dynamically update build config with user-filtered model options.\"\"\"\n # Update model options\n build_config = update_model_options_in_build_config(\n component=self,\n build_config=build_config,\n cache_key_prefix=\"embedding_model_options\",\n get_options_func=get_embedding_model_options,\n field_name=field_name,\n field_value=field_value,\n )\n\n # Show/hide provider-specific fields based on selected model\n if field_name == \"model\" and isinstance(field_value, list) and len(field_value) > 0:\n selected_model = field_value[0]\n provider = selected_model.get(\"provider\", \"\")\n\n # Show/hide watsonx fields\n is_watsonx = provider == \"IBM WatsonX\"\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = is_watsonx\n build_config[\"project_id\"][\"show\"] = is_watsonx\n build_config[\"truncate_input_tokens\"][\"show\"] = is_watsonx\n build_config[\"input_text\"][\"show\"] = is_watsonx\n if is_watsonx:\n build_config[\"base_url_ibm_watsonx\"][\"required\"] = True\n build_config[\"project_id\"][\"required\"] = True\n\n return build_config\n\n inputs = [\n ModelInput(\n name=\"model\",\n display_name=\"Embedding Model\",\n info=\"Select your model provider\",\n real_time_refresh=True,\n required=True,\n model_type=\"embedding\",\n input_types=[\"Embeddings\"], # Override default to accept Embeddings instead of LanguageModel\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"API Key\",\n info=\"Model Provider API key\",\n real_time_refresh=True,\n advanced=True,\n ),\n MessageTextInput(\n name=\"api_base\",\n display_name=\"API Base URL\",\n info=\"Base URL for the API. Leave empty for default.\",\n advanced=True,\n ),\n # Watson-specific inputs\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n MessageTextInput(\n name=\"project_id\",\n display_name=\"Project ID\",\n info=\"IBM watsonx.ai Project ID (required for IBM watsonx.ai)\",\n show=False,\n ),\n IntInput(\n name=\"dimensions\",\n display_name=\"Dimensions\",\n info=\"The number of dimensions the resulting output embeddings should have. 
\"\n \"Only supported by certain models.\",\n advanced=True,\n ),\n IntInput(\n name=\"chunk_size\",\n display_name=\"Chunk Size\",\n advanced=True,\n value=1000,\n ),\n FloatInput(\n name=\"request_timeout\",\n display_name=\"Request Timeout\",\n advanced=True,\n ),\n IntInput(\n name=\"max_retries\",\n display_name=\"Max Retries\",\n advanced=True,\n value=3,\n ),\n BoolInput(\n name=\"show_progress_bar\",\n display_name=\"Show Progress Bar\",\n advanced=True,\n ),\n DictInput(\n name=\"model_kwargs\",\n display_name=\"Model Kwargs\",\n advanced=True,\n info=\"Additional keyword arguments to pass to the model.\",\n ),\n IntInput(\n name=\"truncate_input_tokens\",\n display_name=\"Truncate Input Tokens\",\n advanced=True,\n value=200,\n show=False,\n ),\n BoolInput(\n name=\"input_text\",\n display_name=\"Include the original text in the output\",\n value=True,\n advanced=True,\n show=False,\n ),\n ]\n\n def build_embeddings(self) -> Embeddings:\n \"\"\"Build and return an embeddings instance based on the selected model.\n\n Returns an EmbeddingsWithModels wrapper that contains:\n - The primary embedding instance (for the selected model)\n - available_models dict mapping all available model names to their instances\n \"\"\"\n # If an Embeddings object is directly connected, return it\n try:\n from langchain_core.embeddings import Embeddings as BaseEmbeddings\n\n if isinstance(self.model, BaseEmbeddings):\n return self.model\n except ImportError:\n pass\n\n # Safely extract model configuration\n if not self.model or not isinstance(self.model, list):\n msg = \"Model must be a non-empty list\"\n raise ValueError(msg)\n\n model = self.model[0]\n model_name = model.get(\"name\")\n provider = model.get(\"provider\")\n metadata = model.get(\"metadata\", {})\n\n # Get API key from user input or global variables\n api_key = get_api_key_for_provider(self.user_id, provider, self.api_key)\n\n # Validate required fields (Ollama doesn't require API key)\n if not api_key and provider != \"Ollama\":\n msg = (\n f\"{provider} API key is required. 
\"\n f\"Please provide it in the component or configure it globally as \"\n f\"{provider.upper().replace(' ', '_')}_API_KEY.\"\n )\n raise ValueError(msg)\n\n if not model_name:\n msg = \"Model name is required\"\n raise ValueError(msg)\n\n # Get embedding class\n embedding_class_name = metadata.get(\"embedding_class\")\n if not embedding_class_name:\n msg = f\"No embedding class defined in metadata for {model_name}\"\n raise ValueError(msg)\n\n embedding_class = get_embedding_classes().get(embedding_class_name)\n if not embedding_class:\n msg = f\"Unknown embedding class: {embedding_class_name}\"\n raise ValueError(msg)\n\n # Build kwargs using parameter mapping for primary instance\n kwargs = self._build_kwargs(model, metadata)\n primary_instance = embedding_class(**kwargs)\n\n # Get all available embedding models for this provider\n available_models_dict = self._build_available_models(\n provider=provider,\n embedding_class=embedding_class,\n metadata=metadata,\n api_key=api_key,\n )\n\n # Wrap with EmbeddingsWithModels to provide available_models metadata\n return EmbeddingsWithModels(\n embeddings=primary_instance,\n available_models=available_models_dict,\n )\n\n def _build_available_models(\n self,\n provider: str,\n embedding_class: type,\n metadata: dict[str, Any],\n api_key: str | None,\n ) -> dict[str, Embeddings]:\n \"\"\"Build a dictionary of all available embedding model instances for the provider.\n\n Args:\n provider: The provider name (e.g., \"OpenAI\", \"Ollama\")\n embedding_class: The embedding class to instantiate\n metadata: Metadata containing param_mapping\n api_key: The API key for the provider\n\n Returns:\n Dict mapping model names to their embedding instances\n \"\"\"\n available_models_dict: dict[str, Embeddings] = {}\n\n # Get all embedding models for this provider from unified models\n all_embedding_models = get_unified_models_detailed(\n providers=[provider],\n model_type=\"embeddings\",\n include_deprecated=False,\n include_unsupported=False,\n )\n\n if not all_embedding_models:\n return available_models_dict\n\n # Extract models from the provider data\n for provider_data in all_embedding_models:\n if provider_data.get(\"provider\") != provider:\n continue\n\n for model_data in provider_data.get(\"models\", []):\n model_name = model_data.get(\"model_name\")\n if not model_name:\n continue\n\n # Create a model dict compatible with _build_kwargs\n model_dict = {\n \"name\": model_name,\n \"provider\": provider,\n \"metadata\": metadata, # Reuse the same metadata/param_mapping\n }\n\n try:\n # Build kwargs for this model\n model_kwargs = self._build_kwargs_for_model(model_dict, metadata, api_key)\n # Create the embedding instance\n available_models_dict[model_name] = embedding_class(**model_kwargs)\n except Exception: # noqa: BLE001\n # Skip models that fail to instantiate\n # This handles cases where specific models have incompatible parameters\n logger.debug(\"Failed to instantiate embedding model %s: skipping\", model_name, exc_info=True)\n continue\n\n return available_models_dict\n\n def _build_kwargs_for_model(\n self,\n model: dict[str, Any],\n metadata: dict[str, Any],\n api_key: str | None,\n ) -> dict[str, Any]:\n \"\"\"Build kwargs dictionary for a specific model using parameter mapping.\n\n This is similar to _build_kwargs but uses the provided api_key directly\n instead of looking it up again.\n\n Args:\n model: Model dict with name and provider\n metadata: Metadata containing param_mapping\n api_key: The API key to use\n\n Returns:\n kwargs dict 
for embedding class instantiation\n \"\"\"\n param_mapping = metadata.get(\"param_mapping\", {})\n if not param_mapping:\n msg = \"Parameter mapping not found in metadata\"\n raise ValueError(msg)\n\n kwargs = {}\n provider = model.get(\"provider\")\n\n # Required parameters - handle both \"model\" and \"model_id\" (for watsonx)\n if \"model\" in param_mapping:\n kwargs[param_mapping[\"model\"]] = model.get(\"name\")\n elif \"model_id\" in param_mapping:\n kwargs[param_mapping[\"model_id\"]] = model.get(\"name\")\n\n # Add API key if mapped\n if \"api_key\" in param_mapping and api_key:\n kwargs[param_mapping[\"api_key\"]] = api_key\n\n # Optional parameters with their values\n optional_params = {\n \"api_base\": self.api_base if self.api_base else None,\n \"dimensions\": int(self.dimensions) if self.dimensions else None,\n \"chunk_size\": int(self.chunk_size) if self.chunk_size else None,\n \"request_timeout\": float(self.request_timeout) if self.request_timeout else None,\n \"max_retries\": int(self.max_retries) if self.max_retries else None,\n \"show_progress_bar\": self.show_progress_bar if hasattr(self, \"show_progress_bar\") else None,\n \"model_kwargs\": self.model_kwargs if self.model_kwargs else None,\n }\n\n # Watson-specific parameters\n if provider in {\"IBM WatsonX\", \"IBM watsonx.ai\"}:\n # Map base_url_ibm_watsonx to \"url\" parameter for watsonx\n if \"url\" in param_mapping:\n url_value = (\n self.base_url_ibm_watsonx\n if hasattr(self, \"base_url_ibm_watsonx\") and self.base_url_ibm_watsonx\n else \"https://us-south.ml.cloud.ibm.com\"\n )\n kwargs[param_mapping[\"url\"]] = url_value\n # Map project_id for watsonx\n if hasattr(self, \"project_id\") and self.project_id and \"project_id\" in param_mapping:\n kwargs[param_mapping[\"project_id\"]] = self.project_id\n\n # Ollama-specific parameters\n if provider == \"Ollama\" and \"base_url\" in param_mapping:\n # Map api_base to \"base_url\" parameter for Ollama\n base_url_value = self.api_base if hasattr(self, \"api_base\") and self.api_base else \"http://localhost:11434\"\n kwargs[param_mapping[\"base_url\"]] = base_url_value\n\n # Add optional parameters if they have values and are mapped\n for param_name, param_value in optional_params.items():\n if param_value is not None and param_name in param_mapping:\n # Special handling for request_timeout with Google provider\n if param_name == \"request_timeout\":\n if provider == \"Google Generative AI\" and isinstance(param_value, (int, float)):\n kwargs[param_mapping[param_name]] = {\"timeout\": param_value}\n else:\n kwargs[param_mapping[param_name]] = param_value\n else:\n kwargs[param_mapping[param_name]] = param_value\n\n return kwargs\n\n def _build_kwargs(self, model: dict[str, Any], metadata: dict[str, Any]) -> dict[str, Any]:\n \"\"\"Build kwargs dictionary using parameter mapping.\"\"\"\n param_mapping = metadata.get(\"param_mapping\", {})\n if not param_mapping:\n msg = \"Parameter mapping not found in metadata\"\n raise ValueError(msg)\n\n kwargs = {}\n\n # Required parameters - handle both \"model\" and \"model_id\" (for watsonx)\n if \"model\" in param_mapping:\n kwargs[param_mapping[\"model\"]] = model.get(\"name\")\n elif \"model_id\" in param_mapping:\n kwargs[param_mapping[\"model_id\"]] = model.get(\"name\")\n if \"api_key\" in param_mapping:\n kwargs[param_mapping[\"api_key\"]] = get_api_key_for_provider(\n self.user_id,\n model.get(\"provider\"),\n self.api_key,\n )\n\n # Optional parameters with their values\n provider = model.get(\"provider\")\n 
optional_params = {\n \"api_base\": self.api_base if self.api_base else None,\n \"dimensions\": int(self.dimensions) if self.dimensions else None,\n \"chunk_size\": int(self.chunk_size) if self.chunk_size else None,\n \"request_timeout\": float(self.request_timeout) if self.request_timeout else None,\n \"max_retries\": int(self.max_retries) if self.max_retries else None,\n \"show_progress_bar\": self.show_progress_bar if hasattr(self, \"show_progress_bar\") else None,\n \"model_kwargs\": self.model_kwargs if self.model_kwargs else None,\n }\n\n # Watson-specific parameters\n if provider in {\"IBM WatsonX\", \"IBM watsonx.ai\"}:\n # Map base_url_ibm_watsonx to \"url\" parameter for watsonx\n if \"url\" in param_mapping:\n url_value = (\n self.base_url_ibm_watsonx\n if hasattr(self, \"base_url_ibm_watsonx\") and self.base_url_ibm_watsonx\n else \"https://us-south.ml.cloud.ibm.com\"\n )\n kwargs[param_mapping[\"url\"]] = url_value\n # Map project_id for watsonx\n if hasattr(self, \"project_id\") and self.project_id and \"project_id\" in param_mapping:\n kwargs[param_mapping[\"project_id\"]] = self.project_id\n\n # Ollama-specific parameters\n if provider == \"Ollama\" and \"base_url\" in param_mapping:\n # Map api_base to \"base_url\" parameter for Ollama\n base_url_value = self.api_base if hasattr(self, \"api_base\") and self.api_base else \"http://localhost:11434\"\n kwargs[param_mapping[\"base_url\"]] = base_url_value\n\n # Add optional parameters if they have values and are mapped\n for param_name, param_value in optional_params.items():\n if param_value is not None and param_name in param_mapping:\n # Special handling for request_timeout with Google provider\n if param_name == \"request_timeout\":\n if provider == \"Google Generative AI\" and isinstance(param_value, (int, float)):\n kwargs[param_mapping[param_name]] = {\"timeout\": param_value}\n else:\n kwargs[param_mapping[param_name]] = param_value\n else:\n kwargs[param_mapping[param_name]] = param_value\n\n return kwargs\n"

⚠️ Potential issue | 🟡 Minor

Return type annotation mismatch.

The build_embeddings method signature declares -> Embeddings but actually returns EmbeddingsWithModels. This type mismatch can cause issues with static type checkers and mislead developers about the actual return type.

Suggested fix (within the embedded code)
-    def build_embeddings(self) -> Embeddings:
+    def build_embeddings(self) -> EmbeddingsWithModels:
🤖 Prompt for AI Agents
In `@src/lfx/src/lfx/_assets/component_index.json` at line 89139, The
build_embeddings method currently annotates its return as "-> Embeddings" but
returns an EmbeddingsWithModels instance; update the signature of
EmbeddingModelComponent.build_embeddings to return EmbeddingsWithModels (or a
union like Embeddings | EmbeddingsWithModels) to match the actual return value,
and adjust the docstring if needed; reference the method name build_embeddings
and the class EmbeddingsWithModels so you update the annotation where the method
is defined.

Comment on lines +96 to +111
# Google Generative AI Embedding Models
GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS = [
    "models/text-embedding-004",
    "models/embedding-001",
]

# Embedding models as detailed metadata
GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS_DETAILED = [
    create_model_metadata(
        provider="Google Generative AI",
        name=name,
        icon="GoogleGenerativeAI",
        model_type="embeddings",
    )
    for name in GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS
]

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Google Generative AI embedding models text-embedding-004 embedding-001 current

💡 Result:

Short answer: Google’s current recommended text embedding model is gemini-embedding-001 (Gemini Embedding). The legacy embedding models (embedding-001 and text-embedding-004) are deprecated, and Google recommends migrating to gemini-embedding-001. [1][2][3]

Key sources:

  • Gemini Embeddings docs (model name, defaults, dims, usage examples). [1]
  • Google Developers blog announcing Gemini Embedding GA and deprecation schedule (text-embedding-004 retirement Jan 14, 2026; embedding-001 retirement Aug 14, 2025; guidance to migrate to gemini-embedding-001). [2]
  • Research / paper describing Gemini Embedding and its MTEB/MMTEB results. [3]

If you want, I can show example API calls (Python/REST) to migrate embeddings to gemini-embedding-001 and recommended output dimensionalities.


Update embedding models to use current Google Generative AI API.

The models embedding-001 and text-embedding-004 are deprecated and have been retired by Google:

  • embedding-001: retired August 14, 2025
  • text-embedding-004: retired January 14, 2026

Replace them with Google's currently supported embedding model, gemini-embedding-001.

Update:

GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS = [
    "models/gemini-embedding-001",
]
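On the consumer side, the migration is typically just the model string. A sketch (assuming langchain_google_genai is installed and GOOGLE_API_KEY is set in the environment):

from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001")
vector = embeddings.embed_query("hello world")
print(len(vector))  # gemini-embedding-001 returns 3072 dimensions by default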
🤖 Prompt for AI Agents
In `@src/lfx/src/lfx/base/models/google_generative_ai_constants.py` around lines
96 - 111, The embedding model list uses deprecated Google models; update
GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS to contain the current supported model
"models/gemini-embedding-001" and ensure
GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS_DETAILED (which uses
create_model_metadata) will automatically reflect that change by iterating over
the updated list; replace the existing two entries with a single
"models/gemini-embedding-001".

Comment on lines 1 to 15
import logging
from typing import Any

from lfx.base.embeddings.embeddings_class import EmbeddingsWithModels
from lfx.base.embeddings.model import LCEmbeddingsModel
from lfx.base.models.unified_models import (
    get_api_key_for_provider,
    get_embedding_classes,
    get_embedding_model_options,
    get_unified_models_detailed,
    update_model_options_in_build_config,
)

logger = logging.getLogger(__name__)
from lfx.base.models.watsonx_constants import IBM_WATSONX_URLS

⚠️ Potential issue | 🟠 Major

Move logger initialization below imports to fix Ruff E402.

Ruff fails because a module-level statement appears before imports. Move logger = logging.getLogger(__name__) after the full import block to satisfy E402.

🧹 Proposed fix
-import logging
-from typing import Any
-
-from lfx.base.embeddings.embeddings_class import EmbeddingsWithModels
-from lfx.base.embeddings.model import LCEmbeddingsModel
-from lfx.base.models.unified_models import (
-    get_api_key_for_provider,
-    get_embedding_classes,
-    get_embedding_model_options,
-    get_unified_models_detailed,
-    update_model_options_in_build_config,
-)
-
-logger = logging.getLogger(__name__)
-from lfx.base.models.watsonx_constants import IBM_WATSONX_URLS
-from lfx.field_typing import Embeddings
-from lfx.io import (
+import logging
+from typing import Any
+
+from lfx.base.embeddings.embeddings_class import EmbeddingsWithModels
+from lfx.base.embeddings.model import LCEmbeddingsModel
+from lfx.base.models.unified_models import (
+    get_api_key_for_provider,
+    get_embedding_classes,
+    get_embedding_model_options,
+    get_unified_models_detailed,
+    update_model_options_in_build_config,
+)
+from lfx.base.models.watsonx_constants import IBM_WATSONX_URLS
+from lfx.field_typing import Embeddings
+from lfx.io import (
     BoolInput,
     DictInput,
     DropdownInput,
     FloatInput,
     IntInput,
     MessageTextInput,
     ModelInput,
     SecretStrInput,
 )
+
+logger = logging.getLogger(__name__)
🧰 Tools
🪛 GitHub Actions: Ruff Style Check

[error] 15-15: Ruff check failed. E402: Module level import not at top of file. Move imports to the top of the file. Command: uv run --only-dev ruff check --output-format=github .

🪛 GitHub Check: Ruff Style Check (3.13)

[failure] 15-15: Ruff (E402)
src/lfx/src/lfx/components/models_and_agents/embedding_model.py:15:1: E402 Module level import not at top of file

🤖 Prompt for AI Agents
In `@src/lfx/src/lfx/components/models_and_agents/embedding_model.py` around lines
1 - 15, Move the module-level logger initialization so it appears after the full
import block to satisfy Ruff E402: relocate the line "logger =
logging.getLogger(__name__)" to below the last import (e.g., after the reference
to IBM_WATSONX_URLS) in embedding_model.py and ensure no other executable
statements intervene between imports and that logger assignment.

Comment on lines +263 to +283
            for model_data in provider_data.get("models", []):
                model_name = model_data.get("model_name")
                if not model_name:
                    continue

                # Create a model dict compatible with _build_kwargs
                model_dict = {
                    "name": model_name,
                    "provider": provider,
                    "metadata": metadata,  # Reuse the same metadata/param_mapping
                }

                try:
                    # Build kwargs for this model
                    model_kwargs = self._build_kwargs_for_model(model_dict, metadata, api_key)
                    # Create the embedding instance
                    available_models_dict[model_name] = embedding_class(**model_kwargs)
                except Exception:  # noqa: BLE001
                    # Skip models that fail to instantiate
                    # This handles cases where specific models have incompatible parameters
                    logger.debug("Failed to instantiate embedding model %s: skipping", model_name, exc_info=True)

⚠️ Potential issue | 🟠 Major

Use per-model metadata (and embedding_class) when building available models.

Right now every model reuses the selected model’s metadata/embedding_class. If a provider has model-specific param_mapping or a different embedding_class, those models will either be misconfigured or silently skipped. Consider deriving metadata and class from each model_data.

🔧 Proposed fix
-            for model_data in provider_data.get("models", []):
+            for model_data in provider_data.get("models", []):
                 model_name = model_data.get("model_name")
                 if not model_name:
                     continue
+
+                model_metadata = model_data.get("metadata") or {}
+                effective_metadata = model_metadata or metadata
+                embedding_class_name = (
+                    effective_metadata.get("embedding_class") or metadata.get("embedding_class")
+                )
+                model_embedding_class = (
+                    get_embedding_classes().get(embedding_class_name) or embedding_class
+                )

                 # Create a model dict compatible with _build_kwargs
                 model_dict = {
                     "name": model_name,
                     "provider": provider,
-                    "metadata": metadata,  # Reuse the same metadata/param_mapping
+                    "metadata": effective_metadata,
                 }

                 try:
                     # Build kwargs for this model
-                    model_kwargs = self._build_kwargs_for_model(model_dict, metadata, api_key)
+                    model_kwargs = self._build_kwargs_for_model(model_dict, effective_metadata, api_key)
                     # Create the embedding instance
-                    available_models_dict[model_name] = embedding_class(**model_kwargs)
+                    available_models_dict[model_name] = model_embedding_class(**model_kwargs)
🤖 Prompt for AI Agents
In `@src/lfx/src/lfx/components/models_and_agents/embedding_model.py` around lines
263 - 283, The loop currently reuses the provider-level metadata and
embedding_class for every model; change it to extract per-model metadata and
embedding_class from model_data (falling back to provider-level values if
absent), then call _build_kwargs_for_model with that model-specific metadata and
api_key and instantiate using the model-specific embedding_class when populating
available_models_dict[model_name]; keep the try/except and logging but ensure
the correct per-model symbols (model_data, metadata_from_model,
embedding_class_from_model, _build_kwargs_for_model, available_models_dict) are
used so models with bespoke param_mapping or classes are configured and
instantiated correctly.

@edwinjosechittilappilly edwinjosechittilappilly added this pull request to the merge queue Jan 21, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 21, 2026
@edwinjosechittilappilly edwinjosechittilappilly added this pull request to the merge queue Jan 21, 2026
Merged via the queue into main with commit 3d53437 Jan 21, 2026
92 of 93 checks passed
@edwinjosechittilappilly edwinjosechittilappilly deleted the EJ/embedding_models_update branch January 21, 2026 18:29
Labels

enhancement New feature or request
