feat: Add Available Models Support for Embeddings component #11320
Conversation
Eliminated the 'markitdown' dependency and Markdown output option from the URLComponent in Blog Writer, Knowledge Ingestion, and Simple Agent starter projects. Updated the code and configuration to only support 'Text' and 'HTML' output formats. Also added a 'Local' storage option to Document Q&A, News Aggregator, and Portfolio Website Code Generator starter projects.
Walkthrough

The changes expand the `EmbeddingModelComponent` to support building multiple embedding models for a given provider. A new `EmbeddingsWithModels` wrapper carries the primary embedding instance together with an `available_models` dictionary that maps each of the provider's embedding model names to a ready-to-use instance.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant EmbeddingModelComponent
    participant GetUnifiedModels
    participant EmbeddingClass
    participant EmbeddingsWithModels as EmbeddingsWithModels<br/>(Wrapper)

    Client->>EmbeddingModelComponent: build_embeddings()
    EmbeddingModelComponent->>EmbeddingModelComponent: Extract provider, model, api_key
    EmbeddingModelComponent->>EmbeddingClass: Instantiate primary embedding<br/>via _build_kwargs()
    EmbeddingClass-->>EmbeddingModelComponent: primary_embedding instance
    EmbeddingModelComponent->>GetUnifiedModels: get_unified_models_detailed(provider)
    GetUnifiedModels-->>EmbeddingModelComponent: List of all provider models
    EmbeddingModelComponent->>EmbeddingModelComponent: _build_available_models()
    loop For each provider model
        EmbeddingModelComponent->>EmbeddingModelComponent: _build_kwargs_for_model(model)
        EmbeddingModelComponent->>EmbeddingClass: Instantiate embedding for model
        EmbeddingClass-->>EmbeddingModelComponent: model_embedding instance
    end
    EmbeddingModelComponent->>EmbeddingsWithModels: Create wrapper with<br/>embeddings + available_models dict
    EmbeddingsWithModels-->>Client: Return EmbeddingsWithModels
```
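The wrapper class itself is not shown in this conversation; as a rough sketch of the shape the walkthrough describes (anything beyond the `embeddings` and `available_models` fields, including the delegation behavior, is an assumption), it could look like:

```python
from dataclasses import dataclass, field

from langchain_core.embeddings import Embeddings


@dataclass
class EmbeddingsWithModels(Embeddings):
    """Primary embeddings instance plus every sibling model from the same provider."""

    embeddings: Embeddings
    # Maps model name (e.g. "text-embedding-3-small") to a ready embedding instance.
    available_models: dict[str, Embeddings] = field(default_factory=dict)

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        # Delegate to the primary instance so the wrapper remains a drop-in Embeddings.
        return self.embeddings.embed_documents(texts)

    def embed_query(self, text: str) -> list[float]:
        return self.embeddings.embed_query(text)
```

If the wrapper delegates like this, existing consumers keep working unchanged while new code can pick an alternate model from `available_models`.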
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Important: Pre-merge checks failed. Please resolve all errors before merging; addressing warnings is optional.

❌ Failed checks (1 error, 3 warnings)
✅ Passed checks (3 passed)
…low-ai/langflow into EJ/embedding_models_update
lucaseduoli left a comment:
LGTM, just ruff fixes
…low-ai/langflow into EJ/embedding_models_update
Codecov Report

✅ All modified and coverable lines are covered by tests.
❌ Your project status has failed because the head coverage (41.63%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main   #11320      +/-   ##
==========================================
+ Coverage   34.55%   34.57%   +0.02%
==========================================
  Files        1416     1416
  Lines       67422    67424       +2
  Branches     9931     9931
==========================================
+ Hits        23296    23311      +15
+ Misses      42902    42888      -14
- Partials     1224     1225       +1
```
Flags with carried forward coverage won't be shown.
Pull request overview
This pull request adds support for available models metadata to the Embeddings component, enabling multi-model support by providing a wrapper that contains both the primary embedding instance and a dictionary of all available models from the same provider.
Changes:
- Modified `EmbeddingModelComponent` to return an `EmbeddingsWithModels` wrapper containing both the primary embedding instance and all available model instances for the provider
- Added Google Generative AI embedding models to the unified models constants
- Updated tests to verify the new wrapper behavior and available models functionality
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/lfx/src/lfx/components/models_and_agents/embedding_model.py | Core implementation: added _build_available_models and _build_kwargs_for_model methods; modified build_embeddings to return EmbeddingsWithModels wrapper; fixed Google provider name consistency |
| src/lfx/src/lfx/base/models/unified_models.py | Added GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS_DETAILED to the detailed models list |
| src/lfx/src/lfx/base/models/google_generative_ai_constants.py | Added embedding models constants for Google Generative AI (text-embedding-004, embedding-001) |
| src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py | Updated existing tests to verify EmbeddingsWithModels wrapper and added new test for available models population |
| src/lfx/src/lfx/_assets/stable_hash_history.json | Updated component hash for EmbeddingModel component |
| src/lfx/src/lfx/_assets/component_index.json | Updated component metadata including code hash and full component code |
Comments suppressed due to low confidence (2)
src/lfx/src/lfx/components/models_and_agents/embedding_model.py:168
- When an Embeddings object is directly connected (line 168), it is returned as-is without wrapping it in EmbeddingsWithModels. This creates an inconsistency with the documented return type behavior. All return paths should consistently return an EmbeddingsWithModels instance to ensure uniform handling downstream. Consider wrapping the directly connected embeddings in an EmbeddingsWithModels instance with an empty available_models dict.
```python
        try:
            from langchain_core.embeddings import Embeddings as BaseEmbeddings

            if isinstance(self.model, BaseEmbeddings):
                return self.model
        except ImportError:
```
src/lfx/src/lfx/components/models_and_agents/embedding_model.py:15
- The logger initialization should be moved before the subsequent imports to follow Python's conventional import organization pattern. Logger setup typically appears after the imports from the standard library and third-party packages, but before importing from other modules in the same package.
```python
from lfx.base.models.watsonx_constants import IBM_WATSONX_URLS
from lfx.field_typing import Embeddings
from lfx.io import (
```
Diff context from src/lfx/src/lfx/components/models_and_agents/embedding_model.py (the new `_build_kwargs_for_model` method):

```python
        model: dict[str, Any],
        metadata: dict[str, Any],
        api_key: str | None,
    ) -> dict[str, Any]:
        """Build kwargs dictionary for a specific model using parameter mapping.

        This is similar to _build_kwargs but uses the provided api_key directly
        instead of looking it up again.

        Args:
            model: Model dict with name and provider
            metadata: Metadata containing param_mapping
            api_key: The API key to use

        Returns:
            kwargs dict for embedding class instantiation
        """
        param_mapping = metadata.get("param_mapping", {})
        if not param_mapping:
            msg = "Parameter mapping not found in metadata"
            raise ValueError(msg)

        kwargs = {}
        provider = model.get("provider")

        # Required parameters - handle both "model" and "model_id" (for watsonx)
        if "model" in param_mapping:
            kwargs[param_mapping["model"]] = model.get("name")
        elif "model_id" in param_mapping:
            kwargs[param_mapping["model_id"]] = model.get("name")

        # Add API key if mapped
        if "api_key" in param_mapping and api_key:
            kwargs[param_mapping["api_key"]] = api_key

        # Optional parameters with their values
        optional_params = {
            "api_base": self.api_base if self.api_base else None,
            "dimensions": int(self.dimensions) if self.dimensions else None,
            "chunk_size": int(self.chunk_size) if self.chunk_size else None,
            "request_timeout": float(self.request_timeout) if self.request_timeout else None,
            "max_retries": int(self.max_retries) if self.max_retries else None,
            "show_progress_bar": self.show_progress_bar if hasattr(self, "show_progress_bar") else None,
            "model_kwargs": self.model_kwargs if self.model_kwargs else None,
        }

        # Watson-specific parameters
        if provider in {"IBM WatsonX", "IBM watsonx.ai"}:
            # Map base_url_ibm_watsonx to "url" parameter for watsonx
            if "url" in param_mapping:
                url_value = (
                    self.base_url_ibm_watsonx
                    if hasattr(self, "base_url_ibm_watsonx") and self.base_url_ibm_watsonx
                    else "https://us-south.ml.cloud.ibm.com"
                )
                kwargs[param_mapping["url"]] = url_value
            # Map project_id for watsonx
            if hasattr(self, "project_id") and self.project_id and "project_id" in param_mapping:
                kwargs[param_mapping["project_id"]] = self.project_id

        # Ollama-specific parameters
        if provider == "Ollama" and "base_url" in param_mapping:
            # Map api_base to "base_url" parameter for Ollama
            base_url_value = self.api_base if hasattr(self, "api_base") and self.api_base else "http://localhost:11434"
            kwargs[param_mapping["base_url"]] = base_url_value

        # Add optional parameters if they have values and are mapped
        for param_name, param_value in optional_params.items():
            if param_value is not None and param_name in param_mapping:
                # Special handling for request_timeout with Google provider
                if param_name == "request_timeout":
                    if provider == "Google Generative AI" and isinstance(param_value, (int, float)):
                        kwargs[param_mapping[param_name]] = {"timeout": param_value}
                    else:
                        kwargs[param_mapping[param_name]] = param_value
                else:
                    kwargs[param_mapping[param_name]] = param_value

        return kwargs

    def _build_kwargs(self, model: dict[str, Any], metadata: dict[str, Any]) -> dict[str, Any]:
```
Copilot AI (Jan 21, 2026):
There is significant code duplication between the `_build_kwargs_for_model` and `_build_kwargs` methods. Both methods share identical logic for handling Watson-specific parameters, Ollama-specific parameters, and the Google Generative AI timeout handling. Consider refactoring this shared logic into a common helper method to improve maintainability and reduce the risk of inconsistencies when making future updates.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lfx/src/lfx/components/models_and_agents/embedding_model.py (1)
156-169: Keep the return type consistent with the new `EmbeddingsWithModels` contract.

The docstring now promises an `EmbeddingsWithModels`, but the early return still returns a raw `Embeddings`. Consider wrapping direct inputs (or adjust the docstring) to avoid downstream surprises.

✅ Suggested adjustment

```diff
-        if isinstance(self.model, BaseEmbeddings):
-            return self.model
+        if isinstance(self.model, BaseEmbeddings):
+            if isinstance(self.model, EmbeddingsWithModels):
+                return self.model
+            return EmbeddingsWithModels(embeddings=self.model, available_models={})
```
🤖 Fix all issues with AI agents
In `@src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json`:
- Line 1821: The WatsonX-specific inputs truncate_input_tokens and input_text
are not being forwarded to the Watsonx embeddings because they aren’t present in
param_mapping and must be passed inside a special params dict using IBM SDK
meta-names; update the IBM watsonx param_mapping to include a mapping for a
params/metadata key (e.g., "params") and then in both _build_kwargs and
_build_kwargs_for_model (and where provider in {"IBM WatsonX","IBM watsonx.ai"}
is checked) construct a params_dict that sets
EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS from self.truncate_input_tokens
and EmbedTextParamsMetaNames.RETURN_OPTIONS based on self.input_text, then
attach that params_dict to kwargs using the param_mapping entry (or fallback to
kwargs["params"]) so WatsonxEmbeddings receives the proper params payload.
Ensure you reference EmbedTextParamsMetaNames, truncate_input_tokens,
input_text, _build_kwargs, and _build_kwargs_for_model in the change.
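A minimal sketch of the fix this prompt describes: the helper name `build_watsonx_params` and the fallback key `"params"` are illustrative assumptions, while `EmbedTextParamsMetaNames` is the real IBM SDK meta-name enum cited in the verification chain further down this page.

```python
from typing import Any

from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames


def build_watsonx_params(truncate_input_tokens: int | None, input_text: bool) -> dict[str, Any]:
    """Assemble the `params` payload WatsonxEmbeddings expects for embed-text calls."""
    params: dict[str, Any] = {}
    if truncate_input_tokens:
        params[EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS] = int(truncate_input_tokens)
    # RETURN_OPTIONS controls whether the original input text is echoed back in the response.
    params[EmbedTextParamsMetaNames.RETURN_OPTIONS] = {"input_text": bool(input_text)}
    return params


# In _build_kwargs / _build_kwargs_for_model, when provider is one of the WatsonX names:
# kwargs[param_mapping.get("params", "params")] = build_watsonx_params(
#     self.truncate_input_tokens, self.input_text
# )
```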
In `@src/lfx/src/lfx/_assets/component_index.json`:
- Line 89139: The build_embeddings method currently annotates its return as "->
Embeddings" but returns an EmbeddingsWithModels instance; update the signature
of EmbeddingModelComponent.build_embeddings to return EmbeddingsWithModels (or a
union like Embeddings | EmbeddingsWithModels) to match the actual return value,
and adjust the docstring if needed; reference the method name build_embeddings
and the class EmbeddingsWithModels so you update the annotation where the method
is defined.
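A sketch of the corrected signature; the union form is one of the two options the prompt names, and the docstring wording is illustrative:

```python
def build_embeddings(self) -> Embeddings | EmbeddingsWithModels:
    """Build and return an embeddings instance based on the selected model.

    Returns an EmbeddingsWithModels wrapper containing the primary instance
    and the provider's available_models mapping.
    """
    ...
```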
In `@src/lfx/src/lfx/base/models/google_generative_ai_constants.py`:
- Around line 96-111: The embedding model list uses deprecated Google models;
update GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS to contain the current supported
model "models/gemini-embedding-001" and ensure
GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS_DETAILED (which uses
create_model_metadata) will automatically reflect that change by iterating over
the updated list; replace the existing two entries with a single
"models/gemini-embedding-001".
In `@src/lfx/src/lfx/components/models_and_agents/embedding_model.py`:
- Around line 1-15: Move the module-level logger initialization so it appears
after the full import block to satisfy Ruff E402: relocate the line "logger =
logging.getLogger(__name__)" to below the last import (e.g., after the reference
to IBM_WATSONX_URLS) in embedding_model.py and ensure no other executable
statements intervene between imports and that logger assignment (an import-order sketch follows this list).
- Around line 263-283: The loop currently reuses the provider-level metadata and
embedding_class for every model; change it to extract per-model metadata and
embedding_class from model_data (falling back to provider-level values if
absent), then call _build_kwargs_for_model with that model-specific metadata and
api_key and instantiate using the model-specific embedding_class when populating
available_models_dict[model_name]; keep the try/except and logging but ensure
the correct per-model symbols (model_data, metadata_from_model,
embedding_class_from_model, _build_kwargs_for_model, available_models_dict) are
used so models with bespoke param_mapping or classes are configured and
instantiated correctly.
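For the Ruff E402 item above, the target layout is simply imports first, logger after; the module names are taken from the embedded component source earlier on this page:

```python
import logging
from typing import Any

from lfx.base.embeddings.embeddings_class import EmbeddingsWithModels
from lfx.base.embeddings.model import LCEmbeddingsModel
from lfx.base.models.unified_models import (
    get_api_key_for_provider,
    get_embedding_classes,
    get_embedding_model_options,
    get_unified_models_detailed,
    update_model_options_in_build_config,
)
from lfx.base.models.watsonx_constants import IBM_WATSONX_URLS
from lfx.field_typing import Embeddings
from lfx.io import (
    BoolInput,
    DictInput,
    DropdownInput,
    FloatInput,
    IntInput,
    MessageTextInput,
    ModelInput,
    SecretStrInput,
)

# Logger assignment now follows the complete import block, satisfying Ruff E402.
logger = logging.getLogger(__name__)
```

And for the per-model metadata item, a hedged sketch of the corrected loop: the `metadata` key on `model_data` is an assumption about its shape, while the other names come from the existing `_build_available_models`:

```python
for model_data in provider_data.get("models", []):
    model_name = model_data.get("model_name")
    if not model_name:
        continue

    # Prefer per-model metadata and embedding class; fall back to provider-level values.
    metadata_from_model = model_data.get("metadata") or metadata
    class_name = metadata_from_model.get("embedding_class")
    embedding_class_from_model = get_embedding_classes().get(class_name, embedding_class)

    model_dict = {"name": model_name, "provider": provider, "metadata": metadata_from_model}
    try:
        model_kwargs = self._build_kwargs_for_model(model_dict, metadata_from_model, api_key)
        available_models_dict[model_name] = embedding_class_from_model(**model_kwargs)
    except Exception:  # noqa: BLE001
        # Skip models whose bespoke params or class fail to instantiate.
        logger.debug("Failed to instantiate embedding model %s: skipping", model_name, exc_info=True)
        continue
```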
🧹 Nitpick comments (4)
src/lfx/src/lfx/_assets/component_index.json (2)
89139-89139: Import ordering: logger initialization misplaced between imports.

The `logger = logging.getLogger(__name__)` statement is placed between import blocks, which violates PEP 8 style guidelines. All imports should be grouped together before any module-level code.

Suggested fix (within the embedded code):

```diff
 from lfx.base.models.unified_models import (
     get_api_key_for_provider,
     get_embedding_classes,
     get_embedding_model_options,
     get_unified_models_detailed,
     update_model_options_in_build_config,
 )
-
-logger = logging.getLogger(__name__)
 from lfx.base.models.watsonx_constants import IBM_WATSONX_URLS
 from lfx.field_typing import Embeddings
 from lfx.io import (
     BoolInput,
     DictInput,
     DropdownInput,
     FloatInput,
     IntInput,
     MessageTextInput,
     ModelInput,
     SecretStrInput,
 )
+
+logger = logging.getLogger(__name__)
```
89139-89139: Significant code duplication between `_build_kwargs` and `_build_kwargs_for_model`.

These two methods share ~90% identical code for handling optional parameters, Watson-specific parameters, Ollama-specific parameters, and Google-specific timeout handling. The only difference is that `_build_kwargs_for_model` receives `api_key` as a parameter while `_build_kwargs` calls `get_api_key_for_provider` internally.

Consider refactoring to a single private method that accepts an optional `api_key` parameter, eliminating the duplication.

Suggested refactor approach:

```python
def _build_kwargs(
    self,
    model: dict[str, Any],
    metadata: dict[str, Any],
    api_key: str | None = None,
) -> dict[str, Any]:
    """Build kwargs dictionary using parameter mapping.

    Args:
        model: Model dict with name and provider
        metadata: Metadata containing param_mapping
        api_key: Optional API key. If not provided, will be fetched
            via get_api_key_for_provider.
    """
    param_mapping = metadata.get("param_mapping", {})
    if not param_mapping:
        msg = "Parameter mapping not found in metadata"
        raise ValueError(msg)

    kwargs = {}
    provider = model.get("provider")

    # Required parameters
    if "model" in param_mapping:
        kwargs[param_mapping["model"]] = model.get("name")
    elif "model_id" in param_mapping:
        kwargs[param_mapping["model_id"]] = model.get("name")

    # API key - use provided or fetch
    if "api_key" in param_mapping:
        resolved_api_key = (
            api_key
            if api_key is not None
            else get_api_key_for_provider(self.user_id, provider, self.api_key)
        )
        if resolved_api_key:
            kwargs[param_mapping["api_key"]] = resolved_api_key

    # ... rest of the shared logic (optional params, provider-specific handling)
```

Then remove `_build_kwargs_for_model` and update calls to pass `api_key` when available.

src/lfx/src/lfx/components/models_and_agents/embedding_model.py (1)
288-368: Optional: reduce duplication with `_build_kwargs` to avoid drift.

`_build_kwargs_for_model` largely mirrors `_build_kwargs`. A small helper or an `api_key` override in `_build_kwargs` would simplify maintenance.

src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py (1)
210-267: Strengthen available_models verification with instance checks.

Right now the test only asserts keys; adding value assertions will confirm each model maps to its dedicated instance.

✅ Suggested test tightening

```diff
 assert "text-embedding-3-small" in embeddings.available_models
 assert "text-embedding-3-large" in embeddings.available_models
 assert "text-embedding-ada-002" in embeddings.available_models
 assert len(embeddings.available_models) == 3
+assert (
+    embeddings.available_models["text-embedding-3-small"]
+    is mock_instances["text-embedding-3-small"]
+)
+assert (
+    embeddings.available_models["text-embedding-3-large"]
+    is mock_instances["text-embedding-3-large"]
+)
+assert (
+    embeddings.available_models["text-embedding-ada-002"]
+    is mock_instances["text-embedding-ada-002"]
+)
```
| "title_case": false, | ||
| "type": "code", | ||
| "value": "from typing import Any\n\nfrom lfx.base.embeddings.model import LCEmbeddingsModel\nfrom lfx.base.models.unified_models import (\n get_api_key_for_provider,\n get_embedding_classes,\n get_embedding_model_options,\n update_model_options_in_build_config,\n)\nfrom lfx.base.models.watsonx_constants import IBM_WATSONX_URLS\nfrom lfx.field_typing import Embeddings\nfrom lfx.io import (\n BoolInput,\n DictInput,\n DropdownInput,\n FloatInput,\n IntInput,\n MessageTextInput,\n ModelInput,\n SecretStrInput,\n)\n\n\nclass EmbeddingModelComponent(LCEmbeddingsModel):\n display_name = \"Embedding Model\"\n description = \"Generate embeddings using a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-embedding-models\"\n icon = \"binary\"\n name = \"EmbeddingModel\"\n category = \"models\"\n\n def update_build_config(self, build_config: dict, field_value: str, field_name: str | None = None):\n \"\"\"Dynamically update build config with user-filtered model options.\"\"\"\n # Update model options\n build_config = update_model_options_in_build_config(\n component=self,\n build_config=build_config,\n cache_key_prefix=\"embedding_model_options\",\n get_options_func=get_embedding_model_options,\n field_name=field_name,\n field_value=field_value,\n )\n\n # Show/hide provider-specific fields based on selected model\n if field_name == \"model\" and isinstance(field_value, list) and len(field_value) > 0:\n selected_model = field_value[0]\n provider = selected_model.get(\"provider\", \"\")\n\n # Show/hide watsonx fields\n is_watsonx = provider == \"IBM WatsonX\"\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = is_watsonx\n build_config[\"project_id\"][\"show\"] = is_watsonx\n build_config[\"truncate_input_tokens\"][\"show\"] = is_watsonx\n build_config[\"input_text\"][\"show\"] = is_watsonx\n if is_watsonx:\n build_config[\"base_url_ibm_watsonx\"][\"required\"] = True\n build_config[\"project_id\"][\"required\"] = True\n\n return build_config\n\n inputs = [\n ModelInput(\n name=\"model\",\n display_name=\"Embedding Model\",\n info=\"Select your model provider\",\n real_time_refresh=True,\n required=True,\n model_type=\"embedding\",\n input_types=[\"Embeddings\"], # Override default to accept Embeddings instead of LanguageModel\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"API Key\",\n info=\"Model Provider API key\",\n real_time_refresh=True,\n advanced=True,\n ),\n MessageTextInput(\n name=\"api_base\",\n display_name=\"API Base URL\",\n info=\"Base URL for the API. Leave empty for default.\",\n advanced=True,\n ),\n # Watson-specific inputs\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n MessageTextInput(\n name=\"project_id\",\n display_name=\"Project ID\",\n info=\"IBM watsonx.ai Project ID (required for IBM watsonx.ai)\",\n show=False,\n ),\n IntInput(\n name=\"dimensions\",\n display_name=\"Dimensions\",\n info=\"The number of dimensions the resulting output embeddings should have. 
\"\n \"Only supported by certain models.\",\n advanced=True,\n ),\n IntInput(\n name=\"chunk_size\",\n display_name=\"Chunk Size\",\n advanced=True,\n value=1000,\n ),\n FloatInput(\n name=\"request_timeout\",\n display_name=\"Request Timeout\",\n advanced=True,\n ),\n IntInput(\n name=\"max_retries\",\n display_name=\"Max Retries\",\n advanced=True,\n value=3,\n ),\n BoolInput(\n name=\"show_progress_bar\",\n display_name=\"Show Progress Bar\",\n advanced=True,\n ),\n DictInput(\n name=\"model_kwargs\",\n display_name=\"Model Kwargs\",\n advanced=True,\n info=\"Additional keyword arguments to pass to the model.\",\n ),\n IntInput(\n name=\"truncate_input_tokens\",\n display_name=\"Truncate Input Tokens\",\n advanced=True,\n value=200,\n show=False,\n ),\n BoolInput(\n name=\"input_text\",\n display_name=\"Include the original text in the output\",\n value=True,\n advanced=True,\n show=False,\n ),\n ]\n\n def build_embeddings(self) -> Embeddings:\n \"\"\"Build and return an embeddings instance based on the selected model.\"\"\"\n # If an Embeddings object is directly connected, return it\n try:\n from langchain_core.embeddings import Embeddings as BaseEmbeddings\n\n if isinstance(self.model, BaseEmbeddings):\n return self.model\n except ImportError:\n pass\n\n # Safely extract model configuration\n if not self.model or not isinstance(self.model, list):\n msg = \"Model must be a non-empty list\"\n raise ValueError(msg)\n\n model = self.model[0]\n model_name = model.get(\"name\")\n provider = model.get(\"provider\")\n metadata = model.get(\"metadata\", {})\n\n # Get API key from user input or global variables\n api_key = get_api_key_for_provider(self.user_id, provider, self.api_key)\n\n # Validate required fields (Ollama doesn't require API key)\n if not api_key and provider != \"Ollama\":\n msg = (\n f\"{provider} API key is required. 
\"\n f\"Please provide it in the component or configure it globally as \"\n f\"{provider.upper().replace(' ', '_')}_API_KEY.\"\n )\n raise ValueError(msg)\n\n if not model_name:\n msg = \"Model name is required\"\n raise ValueError(msg)\n\n # Get embedding class\n embedding_class_name = metadata.get(\"embedding_class\")\n if not embedding_class_name:\n msg = f\"No embedding class defined in metadata for {model_name}\"\n raise ValueError(msg)\n\n embedding_class = get_embedding_classes().get(embedding_class_name)\n if not embedding_class:\n msg = f\"Unknown embedding class: {embedding_class_name}\"\n raise ValueError(msg)\n\n # Build kwargs using parameter mapping\n kwargs = self._build_kwargs(model, metadata)\n\n return embedding_class(**kwargs)\n\n def _build_kwargs(self, model: dict[str, Any], metadata: dict[str, Any]) -> dict[str, Any]:\n \"\"\"Build kwargs dictionary using parameter mapping.\"\"\"\n param_mapping = metadata.get(\"param_mapping\", {})\n if not param_mapping:\n msg = \"Parameter mapping not found in metadata\"\n raise ValueError(msg)\n\n kwargs = {}\n\n # Required parameters - handle both \"model\" and \"model_id\" (for watsonx)\n if \"model\" in param_mapping:\n kwargs[param_mapping[\"model\"]] = model.get(\"name\")\n elif \"model_id\" in param_mapping:\n kwargs[param_mapping[\"model_id\"]] = model.get(\"name\")\n if \"api_key\" in param_mapping:\n kwargs[param_mapping[\"api_key\"]] = get_api_key_for_provider(\n self.user_id,\n model.get(\"provider\"),\n self.api_key,\n )\n\n # Optional parameters with their values\n provider = model.get(\"provider\")\n optional_params = {\n \"api_base\": self.api_base if self.api_base else None,\n \"dimensions\": int(self.dimensions) if self.dimensions else None,\n \"chunk_size\": int(self.chunk_size) if self.chunk_size else None,\n \"request_timeout\": float(self.request_timeout) if self.request_timeout else None,\n \"max_retries\": int(self.max_retries) if self.max_retries else None,\n \"show_progress_bar\": self.show_progress_bar if hasattr(self, \"show_progress_bar\") else None,\n \"model_kwargs\": self.model_kwargs if self.model_kwargs else None,\n }\n\n # Watson-specific parameters\n if provider in {\"IBM WatsonX\", \"IBM watsonx.ai\"}:\n # Map base_url_ibm_watsonx to \"url\" parameter for watsonx\n if \"url\" in param_mapping:\n url_value = (\n self.base_url_ibm_watsonx\n if hasattr(self, \"base_url_ibm_watsonx\") and self.base_url_ibm_watsonx\n else \"https://us-south.ml.cloud.ibm.com\"\n )\n kwargs[param_mapping[\"url\"]] = url_value\n # Map project_id for watsonx\n if hasattr(self, \"project_id\") and self.project_id and \"project_id\" in param_mapping:\n kwargs[param_mapping[\"project_id\"]] = self.project_id\n\n # Ollama-specific parameters\n if provider == \"Ollama\" and \"base_url\" in param_mapping:\n # Map api_base to \"base_url\" parameter for Ollama\n base_url_value = self.api_base if hasattr(self, \"api_base\") and self.api_base else \"http://localhost:11434\"\n kwargs[param_mapping[\"base_url\"]] = base_url_value\n\n # Add optional parameters if they have values and are mapped\n for param_name, param_value in optional_params.items():\n if param_value is not None and param_name in param_mapping:\n # Special handling for request_timeout with Google provider\n if param_name == \"request_timeout\":\n if provider == \"Google\" and isinstance(param_value, (int, float)):\n kwargs[param_mapping[param_name]] = {\"timeout\": param_value}\n else:\n kwargs[param_mapping[param_name]] = param_value\n else:\n 
kwargs[param_mapping[param_name]] = param_value\n\n return kwargs\n" | ||
| "value": "import logging\nfrom typing import Any\n\nfrom lfx.base.embeddings.embeddings_class import EmbeddingsWithModels\nfrom lfx.base.embeddings.model import LCEmbeddingsModel\nfrom lfx.base.models.unified_models import (\n get_api_key_for_provider,\n get_embedding_classes,\n get_embedding_model_options,\n get_unified_models_detailed,\n update_model_options_in_build_config,\n)\n\nlogger = logging.getLogger(__name__)\nfrom lfx.base.models.watsonx_constants import IBM_WATSONX_URLS\nfrom lfx.field_typing import Embeddings\nfrom lfx.io import (\n BoolInput,\n DictInput,\n DropdownInput,\n FloatInput,\n IntInput,\n MessageTextInput,\n ModelInput,\n SecretStrInput,\n)\n\n\nclass EmbeddingModelComponent(LCEmbeddingsModel):\n display_name = \"Embedding Model\"\n description = \"Generate embeddings using a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-embedding-models\"\n icon = \"binary\"\n name = \"EmbeddingModel\"\n category = \"models\"\n\n def update_build_config(self, build_config: dict, field_value: str, field_name: str | None = None):\n \"\"\"Dynamically update build config with user-filtered model options.\"\"\"\n # Update model options\n build_config = update_model_options_in_build_config(\n component=self,\n build_config=build_config,\n cache_key_prefix=\"embedding_model_options\",\n get_options_func=get_embedding_model_options,\n field_name=field_name,\n field_value=field_value,\n )\n\n # Show/hide provider-specific fields based on selected model\n if field_name == \"model\" and isinstance(field_value, list) and len(field_value) > 0:\n selected_model = field_value[0]\n provider = selected_model.get(\"provider\", \"\")\n\n # Show/hide watsonx fields\n is_watsonx = provider == \"IBM WatsonX\"\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = is_watsonx\n build_config[\"project_id\"][\"show\"] = is_watsonx\n build_config[\"truncate_input_tokens\"][\"show\"] = is_watsonx\n build_config[\"input_text\"][\"show\"] = is_watsonx\n if is_watsonx:\n build_config[\"base_url_ibm_watsonx\"][\"required\"] = True\n build_config[\"project_id\"][\"required\"] = True\n\n return build_config\n\n inputs = [\n ModelInput(\n name=\"model\",\n display_name=\"Embedding Model\",\n info=\"Select your model provider\",\n real_time_refresh=True,\n required=True,\n model_type=\"embedding\",\n input_types=[\"Embeddings\"], # Override default to accept Embeddings instead of LanguageModel\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"API Key\",\n info=\"Model Provider API key\",\n real_time_refresh=True,\n advanced=True,\n ),\n MessageTextInput(\n name=\"api_base\",\n display_name=\"API Base URL\",\n info=\"Base URL for the API. Leave empty for default.\",\n advanced=True,\n ),\n # Watson-specific inputs\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n MessageTextInput(\n name=\"project_id\",\n display_name=\"Project ID\",\n info=\"IBM watsonx.ai Project ID (required for IBM watsonx.ai)\",\n show=False,\n ),\n IntInput(\n name=\"dimensions\",\n display_name=\"Dimensions\",\n info=\"The number of dimensions the resulting output embeddings should have. 
\"\n \"Only supported by certain models.\",\n advanced=True,\n ),\n IntInput(\n name=\"chunk_size\",\n display_name=\"Chunk Size\",\n advanced=True,\n value=1000,\n ),\n FloatInput(\n name=\"request_timeout\",\n display_name=\"Request Timeout\",\n advanced=True,\n ),\n IntInput(\n name=\"max_retries\",\n display_name=\"Max Retries\",\n advanced=True,\n value=3,\n ),\n BoolInput(\n name=\"show_progress_bar\",\n display_name=\"Show Progress Bar\",\n advanced=True,\n ),\n DictInput(\n name=\"model_kwargs\",\n display_name=\"Model Kwargs\",\n advanced=True,\n info=\"Additional keyword arguments to pass to the model.\",\n ),\n IntInput(\n name=\"truncate_input_tokens\",\n display_name=\"Truncate Input Tokens\",\n advanced=True,\n value=200,\n show=False,\n ),\n BoolInput(\n name=\"input_text\",\n display_name=\"Include the original text in the output\",\n value=True,\n advanced=True,\n show=False,\n ),\n ]\n\n def build_embeddings(self) -> Embeddings:\n \"\"\"Build and return an embeddings instance based on the selected model.\n\n Returns an EmbeddingsWithModels wrapper that contains:\n - The primary embedding instance (for the selected model)\n - available_models dict mapping all available model names to their instances\n \"\"\"\n # If an Embeddings object is directly connected, return it\n try:\n from langchain_core.embeddings import Embeddings as BaseEmbeddings\n\n if isinstance(self.model, BaseEmbeddings):\n return self.model\n except ImportError:\n pass\n\n # Safely extract model configuration\n if not self.model or not isinstance(self.model, list):\n msg = \"Model must be a non-empty list\"\n raise ValueError(msg)\n\n model = self.model[0]\n model_name = model.get(\"name\")\n provider = model.get(\"provider\")\n metadata = model.get(\"metadata\", {})\n\n # Get API key from user input or global variables\n api_key = get_api_key_for_provider(self.user_id, provider, self.api_key)\n\n # Validate required fields (Ollama doesn't require API key)\n if not api_key and provider != \"Ollama\":\n msg = (\n f\"{provider} API key is required. 
\"\n f\"Please provide it in the component or configure it globally as \"\n f\"{provider.upper().replace(' ', '_')}_API_KEY.\"\n )\n raise ValueError(msg)\n\n if not model_name:\n msg = \"Model name is required\"\n raise ValueError(msg)\n\n # Get embedding class\n embedding_class_name = metadata.get(\"embedding_class\")\n if not embedding_class_name:\n msg = f\"No embedding class defined in metadata for {model_name}\"\n raise ValueError(msg)\n\n embedding_class = get_embedding_classes().get(embedding_class_name)\n if not embedding_class:\n msg = f\"Unknown embedding class: {embedding_class_name}\"\n raise ValueError(msg)\n\n # Build kwargs using parameter mapping for primary instance\n kwargs = self._build_kwargs(model, metadata)\n primary_instance = embedding_class(**kwargs)\n\n # Get all available embedding models for this provider\n available_models_dict = self._build_available_models(\n provider=provider,\n embedding_class=embedding_class,\n metadata=metadata,\n api_key=api_key,\n )\n\n # Wrap with EmbeddingsWithModels to provide available_models metadata\n return EmbeddingsWithModels(\n embeddings=primary_instance,\n available_models=available_models_dict,\n )\n\n def _build_available_models(\n self,\n provider: str,\n embedding_class: type,\n metadata: dict[str, Any],\n api_key: str | None,\n ) -> dict[str, Embeddings]:\n \"\"\"Build a dictionary of all available embedding model instances for the provider.\n\n Args:\n provider: The provider name (e.g., \"OpenAI\", \"Ollama\")\n embedding_class: The embedding class to instantiate\n metadata: Metadata containing param_mapping\n api_key: The API key for the provider\n\n Returns:\n Dict mapping model names to their embedding instances\n \"\"\"\n available_models_dict: dict[str, Embeddings] = {}\n\n # Get all embedding models for this provider from unified models\n all_embedding_models = get_unified_models_detailed(\n providers=[provider],\n model_type=\"embeddings\",\n include_deprecated=False,\n include_unsupported=False,\n )\n\n if not all_embedding_models:\n return available_models_dict\n\n # Extract models from the provider data\n for provider_data in all_embedding_models:\n if provider_data.get(\"provider\") != provider:\n continue\n\n for model_data in provider_data.get(\"models\", []):\n model_name = model_data.get(\"model_name\")\n if not model_name:\n continue\n\n # Create a model dict compatible with _build_kwargs\n model_dict = {\n \"name\": model_name,\n \"provider\": provider,\n \"metadata\": metadata, # Reuse the same metadata/param_mapping\n }\n\n try:\n # Build kwargs for this model\n model_kwargs = self._build_kwargs_for_model(model_dict, metadata, api_key)\n # Create the embedding instance\n available_models_dict[model_name] = embedding_class(**model_kwargs)\n except Exception: # noqa: BLE001\n # Skip models that fail to instantiate\n # This handles cases where specific models have incompatible parameters\n logger.debug(\"Failed to instantiate embedding model %s: skipping\", model_name, exc_info=True)\n continue\n\n return available_models_dict\n\n def _build_kwargs_for_model(\n self,\n model: dict[str, Any],\n metadata: dict[str, Any],\n api_key: str | None,\n ) -> dict[str, Any]:\n \"\"\"Build kwargs dictionary for a specific model using parameter mapping.\n\n This is similar to _build_kwargs but uses the provided api_key directly\n instead of looking it up again.\n\n Args:\n model: Model dict with name and provider\n metadata: Metadata containing param_mapping\n api_key: The API key to use\n\n Returns:\n kwargs dict 
for embedding class instantiation\n \"\"\"\n param_mapping = metadata.get(\"param_mapping\", {})\n if not param_mapping:\n msg = \"Parameter mapping not found in metadata\"\n raise ValueError(msg)\n\n kwargs = {}\n provider = model.get(\"provider\")\n\n # Required parameters - handle both \"model\" and \"model_id\" (for watsonx)\n if \"model\" in param_mapping:\n kwargs[param_mapping[\"model\"]] = model.get(\"name\")\n elif \"model_id\" in param_mapping:\n kwargs[param_mapping[\"model_id\"]] = model.get(\"name\")\n\n # Add API key if mapped\n if \"api_key\" in param_mapping and api_key:\n kwargs[param_mapping[\"api_key\"]] = api_key\n\n # Optional parameters with their values\n optional_params = {\n \"api_base\": self.api_base if self.api_base else None,\n \"dimensions\": int(self.dimensions) if self.dimensions else None,\n \"chunk_size\": int(self.chunk_size) if self.chunk_size else None,\n \"request_timeout\": float(self.request_timeout) if self.request_timeout else None,\n \"max_retries\": int(self.max_retries) if self.max_retries else None,\n \"show_progress_bar\": self.show_progress_bar if hasattr(self, \"show_progress_bar\") else None,\n \"model_kwargs\": self.model_kwargs if self.model_kwargs else None,\n }\n\n # Watson-specific parameters\n if provider in {\"IBM WatsonX\", \"IBM watsonx.ai\"}:\n # Map base_url_ibm_watsonx to \"url\" parameter for watsonx\n if \"url\" in param_mapping:\n url_value = (\n self.base_url_ibm_watsonx\n if hasattr(self, \"base_url_ibm_watsonx\") and self.base_url_ibm_watsonx\n else \"https://us-south.ml.cloud.ibm.com\"\n )\n kwargs[param_mapping[\"url\"]] = url_value\n # Map project_id for watsonx\n if hasattr(self, \"project_id\") and self.project_id and \"project_id\" in param_mapping:\n kwargs[param_mapping[\"project_id\"]] = self.project_id\n\n # Ollama-specific parameters\n if provider == \"Ollama\" and \"base_url\" in param_mapping:\n # Map api_base to \"base_url\" parameter for Ollama\n base_url_value = self.api_base if hasattr(self, \"api_base\") and self.api_base else \"http://localhost:11434\"\n kwargs[param_mapping[\"base_url\"]] = base_url_value\n\n # Add optional parameters if they have values and are mapped\n for param_name, param_value in optional_params.items():\n if param_value is not None and param_name in param_mapping:\n # Special handling for request_timeout with Google provider\n if param_name == \"request_timeout\":\n if provider == \"Google Generative AI\" and isinstance(param_value, (int, float)):\n kwargs[param_mapping[param_name]] = {\"timeout\": param_value}\n else:\n kwargs[param_mapping[param_name]] = param_value\n else:\n kwargs[param_mapping[param_name]] = param_value\n\n return kwargs\n\n def _build_kwargs(self, model: dict[str, Any], metadata: dict[str, Any]) -> dict[str, Any]:\n \"\"\"Build kwargs dictionary using parameter mapping.\"\"\"\n param_mapping = metadata.get(\"param_mapping\", {})\n if not param_mapping:\n msg = \"Parameter mapping not found in metadata\"\n raise ValueError(msg)\n\n kwargs = {}\n\n # Required parameters - handle both \"model\" and \"model_id\" (for watsonx)\n if \"model\" in param_mapping:\n kwargs[param_mapping[\"model\"]] = model.get(\"name\")\n elif \"model_id\" in param_mapping:\n kwargs[param_mapping[\"model_id\"]] = model.get(\"name\")\n if \"api_key\" in param_mapping:\n kwargs[param_mapping[\"api_key\"]] = get_api_key_for_provider(\n self.user_id,\n model.get(\"provider\"),\n self.api_key,\n )\n\n # Optional parameters with their values\n provider = model.get(\"provider\")\n 
optional_params = {\n \"api_base\": self.api_base if self.api_base else None,\n \"dimensions\": int(self.dimensions) if self.dimensions else None,\n \"chunk_size\": int(self.chunk_size) if self.chunk_size else None,\n \"request_timeout\": float(self.request_timeout) if self.request_timeout else None,\n \"max_retries\": int(self.max_retries) if self.max_retries else None,\n \"show_progress_bar\": self.show_progress_bar if hasattr(self, \"show_progress_bar\") else None,\n \"model_kwargs\": self.model_kwargs if self.model_kwargs else None,\n }\n\n # Watson-specific parameters\n if provider in {\"IBM WatsonX\", \"IBM watsonx.ai\"}:\n # Map base_url_ibm_watsonx to \"url\" parameter for watsonx\n if \"url\" in param_mapping:\n url_value = (\n self.base_url_ibm_watsonx\n if hasattr(self, \"base_url_ibm_watsonx\") and self.base_url_ibm_watsonx\n else \"https://us-south.ml.cloud.ibm.com\"\n )\n kwargs[param_mapping[\"url\"]] = url_value\n # Map project_id for watsonx\n if hasattr(self, \"project_id\") and self.project_id and \"project_id\" in param_mapping:\n kwargs[param_mapping[\"project_id\"]] = self.project_id\n\n # Ollama-specific parameters\n if provider == \"Ollama\" and \"base_url\" in param_mapping:\n # Map api_base to \"base_url\" parameter for Ollama\n base_url_value = self.api_base if hasattr(self, \"api_base\") and self.api_base else \"http://localhost:11434\"\n kwargs[param_mapping[\"base_url\"]] = base_url_value\n\n # Add optional parameters if they have values and are mapped\n for param_name, param_value in optional_params.items():\n if param_value is not None and param_name in param_mapping:\n # Special handling for request_timeout with Google provider\n if param_name == \"request_timeout\":\n if provider == \"Google Generative AI\" and isinstance(param_value, (int, float)):\n kwargs[param_mapping[param_name]] = {\"timeout\": param_value}\n else:\n kwargs[param_mapping[param_name]] = param_value\n else:\n kwargs[param_mapping[param_name]] = param_value\n\n return kwargs\n" |
🧩 Analysis chain
🏁 Script executed:

```bash
# First, let's search for watsonx-related code and parameter mappings
rg "truncate_input_tokens|watsonx" --type json --type py -i -A 3 -B 3 | head -100
```

Repository: langflow-ai/langflow
Length of output: 9659

🏁 Script executed:

```bash
# Search for embedding class definitions and parameter mappings
rg "param_mapping" --type json --type py -l | head -20
```

Repository: langflow-ai/langflow
Length of output: 379

🏁 Script executed:

```bash
# Look for IBM WatsonX embedding class references
rg "IBMWatsonX|WatsonxEmbeddings" --type py -A 5 -B 2 | head -80
```

Repository: langflow-ai/langflow
Length of output: 6375

🏁 Script executed:

```bash
# Let's look at the param_mappings for IBM WatsonX in unified_models.py
rg "IBM WatsonX|param_mappings" src/lfx/src/lfx/base/models/unified_models.py -A 30 | head -150
```

Repository: langflow-ai/langflow
Length of output: 4934

🏁 Script executed:

```bash
# Also let's check the actual param_mapping for IBM WatsonX
rg "IBM WatsonX" src/lfx/src/lfx/base/models/unified_models.py -A 50 -B 5 | head -200
```

Repository: langflow-ai/langflow
Length of output: 7052

🏁 Script executed:

```bash
# Let's look at the entire WatsonxEmbeddings component to see how it uses parameters
cat src/lfx/src/lfx/components/ibm/watsonx_embeddings.py
```

Repository: langflow-ai/langflow
Length of output: 4887

🏁 Script executed:

```bash
# Search for WATSONX embedding models configuration with param_mapping
rg "WATSONX.*EMBEDDING|param_mapping.*IBM|truncate_input" src/lfx/src/lfx/base/models/unified_models.py -B 5 -A 15 | head -200
```

Repository: langflow-ai/langflow
Length of output: 46

🏁 Script executed:

```bash
# Look for the detailed embedding models definitions
fd "unified_models.py" -exec grep -n "WATSONX_EMBEDDING" {} +
```

Repository: langflow-ai/langflow
Length of output: 294

🏁 Script executed:

```bash
# Search for all embedding model detailed definitions in unified_models.py
rg "EMBEDDING_MODELS_DETAILED|param_mapping" src/lfx/src/lfx/base/models/unified_models.py -B 3 -A 20 | head -300
```

Repository: langflow-ai/langflow
Length of output: 7821

🏁 Script executed:

```bash
# Get the full param_mappings dictionary, specifically looking for IBM WatsonX
rg "IBM WatsonX.*{|\"IBM WatsonX\":" src/lfx/src/lfx/base/models/unified_models.py -A 20 | grep -A 20 "param_mappings"
```

Repository: langflow-ai/langflow
Length of output: 189

🏁 Script executed:

```bash
# Let's view the watsonx_constants file directly to see the embedding models config
cat src/lfx/src/lfx/base/models/watsonx_constants.py
```

Repository: langflow-ai/langflow
Length of output: 2225

🏁 Script executed:

```bash
# Search for param_mappings dictionary definition, especially IBM WatsonX section
rg -A 150 "param_mappings = \{" src/lfx/src/lfx/base/models/unified_models.py | head -200
```

Repository: langflow-ai/langflow
Length of output: 5559

🏁 Script executed:

```bash
# Let's check if there's a specific IBM WatsonX embedding param_mapping
rg "IBM WatsonX" src/lfx/src/lfx/base/models/unified_models.py -B 5 -A 15 | grep -A 10 "param_mappings"
```

Repository: langflow-ai/langflow
Length of output: 468

🏁 Script executed:

```bash
# Let's verify the exact flow - check how optional_params are actually used in the kwargs building
rg -A 20 "for param_name, param_value in optional_params" src/lfx/src/lfx/components/models_and_agents/embedding_model.py
```

Repository: langflow-ai/langflow
Length of output: 1751

🏁 Script executed:

```bash
# Let's also search for how WatsonxEmbeddings expects these parameters - check langchain_ibm documentation or usage
web_search WatsonxEmbeddings truncate_input_tokens input_text params
```

Repository: langflow-ai/langflow
Length of output: 114
🌐 Web query:
WatsonxEmbeddings langchain_ibm truncate_input_tokens input_text parameters
💡 Result:
Short answer: In the langchain_ibm WatsonxEmbeddings integration you pass IBM embedding model parameters via the IBM SDK meta-names. TRUNCATE_INPUT_TOKENS (truncate_input_tokens) tells the service how many tokens to keep from the input (useful to trim long texts before embedding). RETURN_OPTIONS with {"input_text": True} returns the original input text in the API response. Example (Python):
```python
from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames

embed_params = {
    EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: 3,  # or set truncate_input_tokens=512 in other wrappers
    EmbedTextParamsMetaNames.RETURN_OPTIONS: {"input_text": True},
}
```

then pass embed_params when constructing/using WatsonxEmbeddings.
References: LangChain docs showing TRUNCATE_INPUT_TOKENS and RETURN_OPTIONS usage and LangChain/IBM wrappers that expose truncate_input_tokens parameter. [1][2]
Sources
[1] LangChain WatsonxEmbeddings docs (example using EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS and RETURN_OPTIONS).
[2] LangChain / langchain-ibm / langchain-community reference pages (truncateInputTokens / truncate_input_tokens parameter in JS/Python wrappers).
🏁 Script executed:

```bash
# Let's check the test file to see how WatsonX embedding params are expected
cat src/backend/tests/unit/components/models_and_agents/test_embedding_model_component.py | head -100
```

Repository: langflow-ai/langflow
Length of output: 4137
WatsonX-specific input parameters are not wired to the embeddings instance.

`truncate_input_tokens` and `input_text` are conditionally shown for IBM WatsonX, but they're never passed to `WatsonxEmbeddings`. The param_mapping for IBM WatsonX omits these parameters, so they're filtered out by the `if param_value is not None and param_name in param_mapping` check. Additionally, WatsonX requires special handling: these parameters must be passed via a `params` dict using IBM SDK meta-names (`EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS` and `EmbedTextParamsMetaNames.RETURN_OPTIONS`), not as direct kwargs like other providers. Update the param_mapping and add WatsonX-specific logic to construct the params dict correctly in both `_build_kwargs` and `_build_kwargs_for_model`.
| "title_case": false, | ||
| "type": "code", | ||
| "value": "from typing import Any\n\nfrom lfx.base.embeddings.model import LCEmbeddingsModel\nfrom lfx.base.models.unified_models import (\n get_api_key_for_provider,\n get_embedding_classes,\n get_embedding_model_options,\n update_model_options_in_build_config,\n)\nfrom lfx.base.models.watsonx_constants import IBM_WATSONX_URLS\nfrom lfx.field_typing import Embeddings\nfrom lfx.io import (\n BoolInput,\n DictInput,\n DropdownInput,\n FloatInput,\n IntInput,\n MessageTextInput,\n ModelInput,\n SecretStrInput,\n)\n\n\nclass EmbeddingModelComponent(LCEmbeddingsModel):\n display_name = \"Embedding Model\"\n description = \"Generate embeddings using a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-embedding-models\"\n icon = \"binary\"\n name = \"EmbeddingModel\"\n category = \"models\"\n\n def update_build_config(self, build_config: dict, field_value: str, field_name: str | None = None):\n \"\"\"Dynamically update build config with user-filtered model options.\"\"\"\n # Update model options\n build_config = update_model_options_in_build_config(\n component=self,\n build_config=build_config,\n cache_key_prefix=\"embedding_model_options\",\n get_options_func=get_embedding_model_options,\n field_name=field_name,\n field_value=field_value,\n )\n\n # Show/hide provider-specific fields based on selected model\n if field_name == \"model\" and isinstance(field_value, list) and len(field_value) > 0:\n selected_model = field_value[0]\n provider = selected_model.get(\"provider\", \"\")\n\n # Show/hide watsonx fields\n is_watsonx = provider == \"IBM WatsonX\"\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = is_watsonx\n build_config[\"project_id\"][\"show\"] = is_watsonx\n build_config[\"truncate_input_tokens\"][\"show\"] = is_watsonx\n build_config[\"input_text\"][\"show\"] = is_watsonx\n if is_watsonx:\n build_config[\"base_url_ibm_watsonx\"][\"required\"] = True\n build_config[\"project_id\"][\"required\"] = True\n\n return build_config\n\n inputs = [\n ModelInput(\n name=\"model\",\n display_name=\"Embedding Model\",\n info=\"Select your model provider\",\n real_time_refresh=True,\n required=True,\n model_type=\"embedding\",\n input_types=[\"Embeddings\"], # Override default to accept Embeddings instead of LanguageModel\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"API Key\",\n info=\"Model Provider API key\",\n real_time_refresh=True,\n advanced=True,\n ),\n MessageTextInput(\n name=\"api_base\",\n display_name=\"API Base URL\",\n info=\"Base URL for the API. Leave empty for default.\",\n advanced=True,\n ),\n # Watson-specific inputs\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n MessageTextInput(\n name=\"project_id\",\n display_name=\"Project ID\",\n info=\"IBM watsonx.ai Project ID (required for IBM watsonx.ai)\",\n show=False,\n ),\n IntInput(\n name=\"dimensions\",\n display_name=\"Dimensions\",\n info=\"The number of dimensions the resulting output embeddings should have. 
\"\n \"Only supported by certain models.\",\n advanced=True,\n ),\n IntInput(\n name=\"chunk_size\",\n display_name=\"Chunk Size\",\n advanced=True,\n value=1000,\n ),\n FloatInput(\n name=\"request_timeout\",\n display_name=\"Request Timeout\",\n advanced=True,\n ),\n IntInput(\n name=\"max_retries\",\n display_name=\"Max Retries\",\n advanced=True,\n value=3,\n ),\n BoolInput(\n name=\"show_progress_bar\",\n display_name=\"Show Progress Bar\",\n advanced=True,\n ),\n DictInput(\n name=\"model_kwargs\",\n display_name=\"Model Kwargs\",\n advanced=True,\n info=\"Additional keyword arguments to pass to the model.\",\n ),\n IntInput(\n name=\"truncate_input_tokens\",\n display_name=\"Truncate Input Tokens\",\n advanced=True,\n value=200,\n show=False,\n ),\n BoolInput(\n name=\"input_text\",\n display_name=\"Include the original text in the output\",\n value=True,\n advanced=True,\n show=False,\n ),\n ]\n\n def build_embeddings(self) -> Embeddings:\n \"\"\"Build and return an embeddings instance based on the selected model.\"\"\"\n # If an Embeddings object is directly connected, return it\n try:\n from langchain_core.embeddings import Embeddings as BaseEmbeddings\n\n if isinstance(self.model, BaseEmbeddings):\n return self.model\n except ImportError:\n pass\n\n # Safely extract model configuration\n if not self.model or not isinstance(self.model, list):\n msg = \"Model must be a non-empty list\"\n raise ValueError(msg)\n\n model = self.model[0]\n model_name = model.get(\"name\")\n provider = model.get(\"provider\")\n metadata = model.get(\"metadata\", {})\n\n # Get API key from user input or global variables\n api_key = get_api_key_for_provider(self.user_id, provider, self.api_key)\n\n # Validate required fields (Ollama doesn't require API key)\n if not api_key and provider != \"Ollama\":\n msg = (\n f\"{provider} API key is required. 
\"\n f\"Please provide it in the component or configure it globally as \"\n f\"{provider.upper().replace(' ', '_')}_API_KEY.\"\n )\n raise ValueError(msg)\n\n if not model_name:\n msg = \"Model name is required\"\n raise ValueError(msg)\n\n # Get embedding class\n embedding_class_name = metadata.get(\"embedding_class\")\n if not embedding_class_name:\n msg = f\"No embedding class defined in metadata for {model_name}\"\n raise ValueError(msg)\n\n embedding_class = get_embedding_classes().get(embedding_class_name)\n if not embedding_class:\n msg = f\"Unknown embedding class: {embedding_class_name}\"\n raise ValueError(msg)\n\n # Build kwargs using parameter mapping\n kwargs = self._build_kwargs(model, metadata)\n\n return embedding_class(**kwargs)\n\n def _build_kwargs(self, model: dict[str, Any], metadata: dict[str, Any]) -> dict[str, Any]:\n \"\"\"Build kwargs dictionary using parameter mapping.\"\"\"\n param_mapping = metadata.get(\"param_mapping\", {})\n if not param_mapping:\n msg = \"Parameter mapping not found in metadata\"\n raise ValueError(msg)\n\n kwargs = {}\n\n # Required parameters - handle both \"model\" and \"model_id\" (for watsonx)\n if \"model\" in param_mapping:\n kwargs[param_mapping[\"model\"]] = model.get(\"name\")\n elif \"model_id\" in param_mapping:\n kwargs[param_mapping[\"model_id\"]] = model.get(\"name\")\n if \"api_key\" in param_mapping:\n kwargs[param_mapping[\"api_key\"]] = get_api_key_for_provider(\n self.user_id,\n model.get(\"provider\"),\n self.api_key,\n )\n\n # Optional parameters with their values\n provider = model.get(\"provider\")\n optional_params = {\n \"api_base\": self.api_base if self.api_base else None,\n \"dimensions\": int(self.dimensions) if self.dimensions else None,\n \"chunk_size\": int(self.chunk_size) if self.chunk_size else None,\n \"request_timeout\": float(self.request_timeout) if self.request_timeout else None,\n \"max_retries\": int(self.max_retries) if self.max_retries else None,\n \"show_progress_bar\": self.show_progress_bar if hasattr(self, \"show_progress_bar\") else None,\n \"model_kwargs\": self.model_kwargs if self.model_kwargs else None,\n }\n\n # Watson-specific parameters\n if provider in {\"IBM WatsonX\", \"IBM watsonx.ai\"}:\n # Map base_url_ibm_watsonx to \"url\" parameter for watsonx\n if \"url\" in param_mapping:\n url_value = (\n self.base_url_ibm_watsonx\n if hasattr(self, \"base_url_ibm_watsonx\") and self.base_url_ibm_watsonx\n else \"https://us-south.ml.cloud.ibm.com\"\n )\n kwargs[param_mapping[\"url\"]] = url_value\n # Map project_id for watsonx\n if hasattr(self, \"project_id\") and self.project_id and \"project_id\" in param_mapping:\n kwargs[param_mapping[\"project_id\"]] = self.project_id\n\n # Ollama-specific parameters\n if provider == \"Ollama\" and \"base_url\" in param_mapping:\n # Map api_base to \"base_url\" parameter for Ollama\n base_url_value = self.api_base if hasattr(self, \"api_base\") and self.api_base else \"http://localhost:11434\"\n kwargs[param_mapping[\"base_url\"]] = base_url_value\n\n # Add optional parameters if they have values and are mapped\n for param_name, param_value in optional_params.items():\n if param_value is not None and param_name in param_mapping:\n # Special handling for request_timeout with Google provider\n if param_name == \"request_timeout\":\n if provider == \"Google\" and isinstance(param_value, (int, float)):\n kwargs[param_mapping[param_name]] = {\"timeout\": param_value}\n else:\n kwargs[param_mapping[param_name]] = param_value\n else:\n 
kwargs[param_mapping[param_name]] = param_value\n\n return kwargs\n" | ||
| "value": "import logging\nfrom typing import Any\n\nfrom lfx.base.embeddings.embeddings_class import EmbeddingsWithModels\nfrom lfx.base.embeddings.model import LCEmbeddingsModel\nfrom lfx.base.models.unified_models import (\n get_api_key_for_provider,\n get_embedding_classes,\n get_embedding_model_options,\n get_unified_models_detailed,\n update_model_options_in_build_config,\n)\n\nlogger = logging.getLogger(__name__)\nfrom lfx.base.models.watsonx_constants import IBM_WATSONX_URLS\nfrom lfx.field_typing import Embeddings\nfrom lfx.io import (\n BoolInput,\n DictInput,\n DropdownInput,\n FloatInput,\n IntInput,\n MessageTextInput,\n ModelInput,\n SecretStrInput,\n)\n\n\nclass EmbeddingModelComponent(LCEmbeddingsModel):\n display_name = \"Embedding Model\"\n description = \"Generate embeddings using a specified provider.\"\n documentation: str = \"https://docs.langflow.org/components-embedding-models\"\n icon = \"binary\"\n name = \"EmbeddingModel\"\n category = \"models\"\n\n def update_build_config(self, build_config: dict, field_value: str, field_name: str | None = None):\n \"\"\"Dynamically update build config with user-filtered model options.\"\"\"\n # Update model options\n build_config = update_model_options_in_build_config(\n component=self,\n build_config=build_config,\n cache_key_prefix=\"embedding_model_options\",\n get_options_func=get_embedding_model_options,\n field_name=field_name,\n field_value=field_value,\n )\n\n # Show/hide provider-specific fields based on selected model\n if field_name == \"model\" and isinstance(field_value, list) and len(field_value) > 0:\n selected_model = field_value[0]\n provider = selected_model.get(\"provider\", \"\")\n\n # Show/hide watsonx fields\n is_watsonx = provider == \"IBM WatsonX\"\n build_config[\"base_url_ibm_watsonx\"][\"show\"] = is_watsonx\n build_config[\"project_id\"][\"show\"] = is_watsonx\n build_config[\"truncate_input_tokens\"][\"show\"] = is_watsonx\n build_config[\"input_text\"][\"show\"] = is_watsonx\n if is_watsonx:\n build_config[\"base_url_ibm_watsonx\"][\"required\"] = True\n build_config[\"project_id\"][\"required\"] = True\n\n return build_config\n\n inputs = [\n ModelInput(\n name=\"model\",\n display_name=\"Embedding Model\",\n info=\"Select your model provider\",\n real_time_refresh=True,\n required=True,\n model_type=\"embedding\",\n input_types=[\"Embeddings\"], # Override default to accept Embeddings instead of LanguageModel\n ),\n SecretStrInput(\n name=\"api_key\",\n display_name=\"API Key\",\n info=\"Model Provider API key\",\n real_time_refresh=True,\n advanced=True,\n ),\n MessageTextInput(\n name=\"api_base\",\n display_name=\"API Base URL\",\n info=\"Base URL for the API. Leave empty for default.\",\n advanced=True,\n ),\n # Watson-specific inputs\n DropdownInput(\n name=\"base_url_ibm_watsonx\",\n display_name=\"watsonx API Endpoint\",\n info=\"The base URL of the API (IBM watsonx.ai only)\",\n options=IBM_WATSONX_URLS,\n value=IBM_WATSONX_URLS[0],\n show=False,\n real_time_refresh=True,\n ),\n MessageTextInput(\n name=\"project_id\",\n display_name=\"Project ID\",\n info=\"IBM watsonx.ai Project ID (required for IBM watsonx.ai)\",\n show=False,\n ),\n IntInput(\n name=\"dimensions\",\n display_name=\"Dimensions\",\n info=\"The number of dimensions the resulting output embeddings should have. 
\"\n \"Only supported by certain models.\",\n advanced=True,\n ),\n IntInput(\n name=\"chunk_size\",\n display_name=\"Chunk Size\",\n advanced=True,\n value=1000,\n ),\n FloatInput(\n name=\"request_timeout\",\n display_name=\"Request Timeout\",\n advanced=True,\n ),\n IntInput(\n name=\"max_retries\",\n display_name=\"Max Retries\",\n advanced=True,\n value=3,\n ),\n BoolInput(\n name=\"show_progress_bar\",\n display_name=\"Show Progress Bar\",\n advanced=True,\n ),\n DictInput(\n name=\"model_kwargs\",\n display_name=\"Model Kwargs\",\n advanced=True,\n info=\"Additional keyword arguments to pass to the model.\",\n ),\n IntInput(\n name=\"truncate_input_tokens\",\n display_name=\"Truncate Input Tokens\",\n advanced=True,\n value=200,\n show=False,\n ),\n BoolInput(\n name=\"input_text\",\n display_name=\"Include the original text in the output\",\n value=True,\n advanced=True,\n show=False,\n ),\n ]\n\n def build_embeddings(self) -> Embeddings:\n \"\"\"Build and return an embeddings instance based on the selected model.\n\n Returns an EmbeddingsWithModels wrapper that contains:\n - The primary embedding instance (for the selected model)\n - available_models dict mapping all available model names to their instances\n \"\"\"\n # If an Embeddings object is directly connected, return it\n try:\n from langchain_core.embeddings import Embeddings as BaseEmbeddings\n\n if isinstance(self.model, BaseEmbeddings):\n return self.model\n except ImportError:\n pass\n\n # Safely extract model configuration\n if not self.model or not isinstance(self.model, list):\n msg = \"Model must be a non-empty list\"\n raise ValueError(msg)\n\n model = self.model[0]\n model_name = model.get(\"name\")\n provider = model.get(\"provider\")\n metadata = model.get(\"metadata\", {})\n\n # Get API key from user input or global variables\n api_key = get_api_key_for_provider(self.user_id, provider, self.api_key)\n\n # Validate required fields (Ollama doesn't require API key)\n if not api_key and provider != \"Ollama\":\n msg = (\n f\"{provider} API key is required. 
\"\n f\"Please provide it in the component or configure it globally as \"\n f\"{provider.upper().replace(' ', '_')}_API_KEY.\"\n )\n raise ValueError(msg)\n\n if not model_name:\n msg = \"Model name is required\"\n raise ValueError(msg)\n\n # Get embedding class\n embedding_class_name = metadata.get(\"embedding_class\")\n if not embedding_class_name:\n msg = f\"No embedding class defined in metadata for {model_name}\"\n raise ValueError(msg)\n\n embedding_class = get_embedding_classes().get(embedding_class_name)\n if not embedding_class:\n msg = f\"Unknown embedding class: {embedding_class_name}\"\n raise ValueError(msg)\n\n # Build kwargs using parameter mapping for primary instance\n kwargs = self._build_kwargs(model, metadata)\n primary_instance = embedding_class(**kwargs)\n\n # Get all available embedding models for this provider\n available_models_dict = self._build_available_models(\n provider=provider,\n embedding_class=embedding_class,\n metadata=metadata,\n api_key=api_key,\n )\n\n # Wrap with EmbeddingsWithModels to provide available_models metadata\n return EmbeddingsWithModels(\n embeddings=primary_instance,\n available_models=available_models_dict,\n )\n\n def _build_available_models(\n self,\n provider: str,\n embedding_class: type,\n metadata: dict[str, Any],\n api_key: str | None,\n ) -> dict[str, Embeddings]:\n \"\"\"Build a dictionary of all available embedding model instances for the provider.\n\n Args:\n provider: The provider name (e.g., \"OpenAI\", \"Ollama\")\n embedding_class: The embedding class to instantiate\n metadata: Metadata containing param_mapping\n api_key: The API key for the provider\n\n Returns:\n Dict mapping model names to their embedding instances\n \"\"\"\n available_models_dict: dict[str, Embeddings] = {}\n\n # Get all embedding models for this provider from unified models\n all_embedding_models = get_unified_models_detailed(\n providers=[provider],\n model_type=\"embeddings\",\n include_deprecated=False,\n include_unsupported=False,\n )\n\n if not all_embedding_models:\n return available_models_dict\n\n # Extract models from the provider data\n for provider_data in all_embedding_models:\n if provider_data.get(\"provider\") != provider:\n continue\n\n for model_data in provider_data.get(\"models\", []):\n model_name = model_data.get(\"model_name\")\n if not model_name:\n continue\n\n # Create a model dict compatible with _build_kwargs\n model_dict = {\n \"name\": model_name,\n \"provider\": provider,\n \"metadata\": metadata, # Reuse the same metadata/param_mapping\n }\n\n try:\n # Build kwargs for this model\n model_kwargs = self._build_kwargs_for_model(model_dict, metadata, api_key)\n # Create the embedding instance\n available_models_dict[model_name] = embedding_class(**model_kwargs)\n except Exception: # noqa: BLE001\n # Skip models that fail to instantiate\n # This handles cases where specific models have incompatible parameters\n logger.debug(\"Failed to instantiate embedding model %s: skipping\", model_name, exc_info=True)\n continue\n\n return available_models_dict\n\n def _build_kwargs_for_model(\n self,\n model: dict[str, Any],\n metadata: dict[str, Any],\n api_key: str | None,\n ) -> dict[str, Any]:\n \"\"\"Build kwargs dictionary for a specific model using parameter mapping.\n\n This is similar to _build_kwargs but uses the provided api_key directly\n instead of looking it up again.\n\n Args:\n model: Model dict with name and provider\n metadata: Metadata containing param_mapping\n api_key: The API key to use\n\n Returns:\n kwargs dict 
for embedding class instantiation\n \"\"\"\n param_mapping = metadata.get(\"param_mapping\", {})\n if not param_mapping:\n msg = \"Parameter mapping not found in metadata\"\n raise ValueError(msg)\n\n kwargs = {}\n provider = model.get(\"provider\")\n\n # Required parameters - handle both \"model\" and \"model_id\" (for watsonx)\n if \"model\" in param_mapping:\n kwargs[param_mapping[\"model\"]] = model.get(\"name\")\n elif \"model_id\" in param_mapping:\n kwargs[param_mapping[\"model_id\"]] = model.get(\"name\")\n\n # Add API key if mapped\n if \"api_key\" in param_mapping and api_key:\n kwargs[param_mapping[\"api_key\"]] = api_key\n\n # Optional parameters with their values\n optional_params = {\n \"api_base\": self.api_base if self.api_base else None,\n \"dimensions\": int(self.dimensions) if self.dimensions else None,\n \"chunk_size\": int(self.chunk_size) if self.chunk_size else None,\n \"request_timeout\": float(self.request_timeout) if self.request_timeout else None,\n \"max_retries\": int(self.max_retries) if self.max_retries else None,\n \"show_progress_bar\": self.show_progress_bar if hasattr(self, \"show_progress_bar\") else None,\n \"model_kwargs\": self.model_kwargs if self.model_kwargs else None,\n }\n\n # Watson-specific parameters\n if provider in {\"IBM WatsonX\", \"IBM watsonx.ai\"}:\n # Map base_url_ibm_watsonx to \"url\" parameter for watsonx\n if \"url\" in param_mapping:\n url_value = (\n self.base_url_ibm_watsonx\n if hasattr(self, \"base_url_ibm_watsonx\") and self.base_url_ibm_watsonx\n else \"https://us-south.ml.cloud.ibm.com\"\n )\n kwargs[param_mapping[\"url\"]] = url_value\n # Map project_id for watsonx\n if hasattr(self, \"project_id\") and self.project_id and \"project_id\" in param_mapping:\n kwargs[param_mapping[\"project_id\"]] = self.project_id\n\n # Ollama-specific parameters\n if provider == \"Ollama\" and \"base_url\" in param_mapping:\n # Map api_base to \"base_url\" parameter for Ollama\n base_url_value = self.api_base if hasattr(self, \"api_base\") and self.api_base else \"http://localhost:11434\"\n kwargs[param_mapping[\"base_url\"]] = base_url_value\n\n # Add optional parameters if they have values and are mapped\n for param_name, param_value in optional_params.items():\n if param_value is not None and param_name in param_mapping:\n # Special handling for request_timeout with Google provider\n if param_name == \"request_timeout\":\n if provider == \"Google Generative AI\" and isinstance(param_value, (int, float)):\n kwargs[param_mapping[param_name]] = {\"timeout\": param_value}\n else:\n kwargs[param_mapping[param_name]] = param_value\n else:\n kwargs[param_mapping[param_name]] = param_value\n\n return kwargs\n\n def _build_kwargs(self, model: dict[str, Any], metadata: dict[str, Any]) -> dict[str, Any]:\n \"\"\"Build kwargs dictionary using parameter mapping.\"\"\"\n param_mapping = metadata.get(\"param_mapping\", {})\n if not param_mapping:\n msg = \"Parameter mapping not found in metadata\"\n raise ValueError(msg)\n\n kwargs = {}\n\n # Required parameters - handle both \"model\" and \"model_id\" (for watsonx)\n if \"model\" in param_mapping:\n kwargs[param_mapping[\"model\"]] = model.get(\"name\")\n elif \"model_id\" in param_mapping:\n kwargs[param_mapping[\"model_id\"]] = model.get(\"name\")\n if \"api_key\" in param_mapping:\n kwargs[param_mapping[\"api_key\"]] = get_api_key_for_provider(\n self.user_id,\n model.get(\"provider\"),\n self.api_key,\n )\n\n # Optional parameters with their values\n provider = model.get(\"provider\")\n 
optional_params = {\n \"api_base\": self.api_base if self.api_base else None,\n \"dimensions\": int(self.dimensions) if self.dimensions else None,\n \"chunk_size\": int(self.chunk_size) if self.chunk_size else None,\n \"request_timeout\": float(self.request_timeout) if self.request_timeout else None,\n \"max_retries\": int(self.max_retries) if self.max_retries else None,\n \"show_progress_bar\": self.show_progress_bar if hasattr(self, \"show_progress_bar\") else None,\n \"model_kwargs\": self.model_kwargs if self.model_kwargs else None,\n }\n\n # Watson-specific parameters\n if provider in {\"IBM WatsonX\", \"IBM watsonx.ai\"}:\n # Map base_url_ibm_watsonx to \"url\" parameter for watsonx\n if \"url\" in param_mapping:\n url_value = (\n self.base_url_ibm_watsonx\n if hasattr(self, \"base_url_ibm_watsonx\") and self.base_url_ibm_watsonx\n else \"https://us-south.ml.cloud.ibm.com\"\n )\n kwargs[param_mapping[\"url\"]] = url_value\n # Map project_id for watsonx\n if hasattr(self, \"project_id\") and self.project_id and \"project_id\" in param_mapping:\n kwargs[param_mapping[\"project_id\"]] = self.project_id\n\n # Ollama-specific parameters\n if provider == \"Ollama\" and \"base_url\" in param_mapping:\n # Map api_base to \"base_url\" parameter for Ollama\n base_url_value = self.api_base if hasattr(self, \"api_base\") and self.api_base else \"http://localhost:11434\"\n kwargs[param_mapping[\"base_url\"]] = base_url_value\n\n # Add optional parameters if they have values and are mapped\n for param_name, param_value in optional_params.items():\n if param_value is not None and param_name in param_mapping:\n # Special handling for request_timeout with Google provider\n if param_name == \"request_timeout\":\n if provider == \"Google Generative AI\" and isinstance(param_value, (int, float)):\n kwargs[param_mapping[param_name]] = {\"timeout\": param_value}\n else:\n kwargs[param_mapping[param_name]] = param_value\n else:\n kwargs[param_mapping[param_name]] = param_value\n\n return kwargs\n" |
Return type annotation mismatch.
The build_embeddings method signature declares -> Embeddings but actually returns EmbeddingsWithModels. This type mismatch can cause issues with static type checkers and mislead developers about the actual return type.
Suggested fix (within the embedded code)
- def build_embeddings(self) -> Embeddings:
+ def build_embeddings(self) -> EmbeddingsWithModels:
🤖 Prompt for AI Agents
In `@src/lfx/src/lfx/_assets/component_index.json` at line 89139, the
build_embeddings method currently annotates its return as "-> Embeddings" but
returns an EmbeddingsWithModels instance. Update the signature of
EmbeddingModelComponent.build_embeddings to return EmbeddingsWithModels (or a
union like Embeddings | EmbeddingsWithModels) to match the actual return value,
and adjust the docstring if needed; reference the method name build_embeddings
and the class EmbeddingsWithModels so you update the annotation where the method
is defined.
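For context, here is how a downstream caller would lean on the corrected annotation. This is a minimal sketch, not code from the PR: it assumes the component's model and api_key inputs are already configured, that EmbeddingsWithModels exposes the embeddings and available_models attributes its constructor receives, and that the wrapped instance supports the standard embed_query call; the model name in the lookup is hypothetical.

    from lfx.base.embeddings.embeddings_class import EmbeddingsWithModels

    result = component.build_embeddings()  # component: a configured EmbeddingModelComponent

    if isinstance(result, EmbeddingsWithModels):
        # Primary model, used for ordinary embedding calls.
        vector = result.embeddings.embed_query("hello world")
        # Any other model the provider offers, keyed by name (hypothetical name below).
        alternate = result.available_models.get("text-embedding-3-small")

With the annotation fixed to EmbeddingsWithModels (or a union), type checkers accept the available_models access above instead of flagging it.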
# Google Generative AI Embedding Models
GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS = [
    "models/text-embedding-004",
    "models/embedding-001",
]

# Embedding models as detailed metadata
GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS_DETAILED = [
    create_model_metadata(
        provider="Google Generative AI",
        name=name,
        icon="GoogleGenerativeAI",
        model_type="embeddings",
    )
    for name in GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS
]
🧩 Analysis chain
🌐 Web query:
Google Generative AI embedding models text-embedding-004 embedding-001 current
💡 Result:
Short answer: Google’s current recommended text embedding model is gemini-embedding-001 (Gemini Embedding). Google is deprecating legacy embedding models (embedding-001 and text-embedding-004); Google recommends migrating to gemini-embedding-001. [1][2][3]
Key sources:
- Gemini Embeddings docs (model name, defaults, dims, usage examples). [1]
- Google Developers blog announcing Gemini Embedding GA and deprecation schedule (text-embedding-004 retirement Jan 14, 2026; embedding-001 retirement Aug 14, 2025; guidance to migrate to gemini-embedding-001). [2]
- Research / paper describing Gemini Embedding and its MTEB/MMTEB results. [3]
If you want, I can show example API calls (Python/REST) to migrate embeddings to gemini-embedding-001 and recommended output dimensionalities.
Update embedding models to use current Google Generative AI API.
The models embedding-001 and text-embedding-004 are deprecated and are being retired by Google:
embedding-001: Retired August 14, 2025
text-embedding-004: Retiring January 14, 2026
Replace these with Google's current recommended model, gemini-embedding-001, which is the officially supported embedding model.
Update:
GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS = [
"models/gemini-embedding-001",
]
🤖 Prompt for AI Agents
In `@src/lfx/src/lfx/base/models/google_generative_ai_constants.py` around lines
96-111, the embedding model list uses deprecated Google models. Update
GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS to contain the current supported model
"models/gemini-embedding-001" and ensure
GOOGLE_GENERATIVE_AI_EMBEDDING_MODELS_DETAILED (which uses
create_model_metadata) automatically reflects that change by iterating over the
updated list; replace the existing two entries with the single
"models/gemini-embedding-001".
import logging
from typing import Any

from lfx.base.embeddings.embeddings_class import EmbeddingsWithModels
from lfx.base.embeddings.model import LCEmbeddingsModel
from lfx.base.models.unified_models import (
    get_api_key_for_provider,
    get_embedding_classes,
    get_embedding_model_options,
    get_unified_models_detailed,
    update_model_options_in_build_config,
)

logger = logging.getLogger(__name__)
from lfx.base.models.watsonx_constants import IBM_WATSONX_URLS
Move logger initialization below imports to fix Ruff E402.
Ruff fails because an executable module-level statement appears before the remaining imports; moving logger = logging.getLogger(__name__) after the full import block satisfies E402.
🧹 Proposed fix
-import logging
-from typing import Any
-
-from lfx.base.embeddings.embeddings_class import EmbeddingsWithModels
-from lfx.base.embeddings.model import LCEmbeddingsModel
-from lfx.base.models.unified_models import (
-    get_api_key_for_provider,
-    get_embedding_classes,
-    get_embedding_model_options,
-    get_unified_models_detailed,
-    update_model_options_in_build_config,
-)
-
-logger = logging.getLogger(__name__)
-from lfx.base.models.watsonx_constants import IBM_WATSONX_URLS
-from lfx.field_typing import Embeddings
-from lfx.io import (
+import logging
+from typing import Any
+
+from lfx.base.embeddings.embeddings_class import EmbeddingsWithModels
+from lfx.base.embeddings.model import LCEmbeddingsModel
+from lfx.base.models.unified_models import (
+    get_api_key_for_provider,
+    get_embedding_classes,
+    get_embedding_model_options,
+    get_unified_models_detailed,
+    update_model_options_in_build_config,
+)
+from lfx.base.models.watsonx_constants import IBM_WATSONX_URLS
+from lfx.field_typing import Embeddings
+from lfx.io import (
     BoolInput,
     DictInput,
     DropdownInput,
     FloatInput,
     IntInput,
     MessageTextInput,
     ModelInput,
     SecretStrInput,
 )
+
+logger = logging.getLogger(__name__)
🧰 Tools
🪛 GitHub Actions: Ruff Style Check
[error] 15-15: Ruff check failed. E402: Module level import not at top of file. Move imports to the top of the file. Command: uv run --only-dev ruff check --output-format=github .
🪛 GitHub Check: Ruff Style Check (3.13)
[failure] 15-15: Ruff (E402)
src/lfx/src/lfx/components/models_and_agents/embedding_model.py:15:1: E402 Module level import not at top of file
🤖 Prompt for AI Agents
In `@src/lfx/src/lfx/components/models_and_agents/embedding_model.py` around lines
1-15, move the module-level logger initialization so it appears after the full
import block to satisfy Ruff E402: relocate the line "logger =
logging.getLogger(__name__)" to below the last import (i.e., after the
IBM_WATSONX_URLS import) and ensure no other executable statements intervene
between the imports and the logger assignment.
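For reference, a minimal reproduction of the lint failure; the file below is illustrative, not from the PR:

    import logging

    logger = logging.getLogger(__name__)  # executable statement before the import below

    from lfx.base.models.watsonx_constants import IBM_WATSONX_URLS  # Ruff reports E402 here

Running uv run --only-dev ruff check --output-format=github . (the pipeline command above) flags the trailing import, which is why the fix groups every import ahead of the logger assignment.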
for model_data in provider_data.get("models", []):
    model_name = model_data.get("model_name")
    if not model_name:
        continue

    # Create a model dict compatible with _build_kwargs
    model_dict = {
        "name": model_name,
        "provider": provider,
        "metadata": metadata,  # Reuse the same metadata/param_mapping
    }

    try:
        # Build kwargs for this model
        model_kwargs = self._build_kwargs_for_model(model_dict, metadata, api_key)
        # Create the embedding instance
        available_models_dict[model_name] = embedding_class(**model_kwargs)
    except Exception:  # noqa: BLE001
        # Skip models that fail to instantiate
        # This handles cases where specific models have incompatible parameters
        logger.debug("Failed to instantiate embedding model %s: skipping", model_name, exc_info=True)
Use per-model metadata (and embedding_class) when building available models.
Right now every model reuses the selected model’s metadata/embedding_class. If a provider has model-specific param_mapping or a different embedding_class, those models will either be misconfigured or silently skipped. Consider deriving metadata and class from each model_data.
🔧 Proposed fix
-        for model_data in provider_data.get("models", []):
+        for model_data in provider_data.get("models", []):
             model_name = model_data.get("model_name")
             if not model_name:
                 continue
+
+            model_metadata = model_data.get("metadata") or {}
+            effective_metadata = model_metadata or metadata
+            embedding_class_name = (
+                effective_metadata.get("embedding_class") or metadata.get("embedding_class")
+            )
+            model_embedding_class = (
+                get_embedding_classes().get(embedding_class_name) or embedding_class
+            )
             # Create a model dict compatible with _build_kwargs
             model_dict = {
                 "name": model_name,
                 "provider": provider,
-                "metadata": metadata,  # Reuse the same metadata/param_mapping
+                "metadata": effective_metadata,
             }
             try:
                 # Build kwargs for this model
-                model_kwargs = self._build_kwargs_for_model(model_dict, metadata, api_key)
+                model_kwargs = self._build_kwargs_for_model(model_dict, effective_metadata, api_key)
                 # Create the embedding instance
-                available_models_dict[model_name] = embedding_class(**model_kwargs)
+                available_models_dict[model_name] = model_embedding_class(**model_kwargs)
🤖 Prompt for AI Agents
In `@src/lfx/src/lfx/components/models_and_agents/embedding_model.py` around lines
263-283, the loop currently reuses the provider-level metadata and
embedding_class for every model. Change it to extract per-model metadata and
embedding_class from model_data (falling back to provider-level values if
absent), then call _build_kwargs_for_model with that model-specific metadata and
api_key, and instantiate using the model-specific embedding class when populating
available_models_dict[model_name]. Keep the try/except and logging, but ensure
the per-model values are used so models with bespoke param_mapping or classes
are configured and instantiated correctly.
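To see why per-model metadata matters, consider a hypothetical provider payload; the key names follow how the loop reads get_unified_models_detailed output, but the concrete values are invented for illustration:

    provider_data = {
        "provider": "OpenAI",
        "models": [
            {
                "model_name": "text-embedding-3-small",
                # Hypothetical per-model metadata that overrides the provider default.
                "metadata": {
                    "embedding_class": "OpenAIEmbeddings",
                    "param_mapping": {"model": "model", "api_key": "api_key", "dimensions": "dimensions"},
                },
            },
            # No metadata key: the proposed fix falls back to the selected model's metadata.
            {"model_name": "text-embedding-ada-002"},
        ],
    }

Under the current code the first entry's bespoke param_mapping is ignored; under the proposed fix it is honored, and the second entry still works via the fallback.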
Adds available-models support to the Embedding Model component, so that all of the selected provider's available models are exposed alongside the selected one.
Summary by CodeRabbit
Release Notes
New Features
- The Embedding Model component now exposes every available embedding model for the selected provider, in addition to the selected one.
Tests
- Added coverage for the new available-models behavior.