@swapniel99

Related Issues

None

Proposed Changes:

Added a revision parameter to all Sentence Transformers embedder components so that users can pin a specific model revision from the Hugging Face Hub. The parameter is passed through to the underlying Sentence Transformers backend.

How did you test it?

  • Added unit tests for all four embedder components to verify the revision parameter is correctly:
    • Initialized with default value (None)
    • Set when provided explicitly (e.g., "v1.0")
    • Serialized and deserialized in to_dict() and from_dict() methods
  • Updated existing backend initialization tests to include the revision parameter
  • All pre-commit hooks passed successfully
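
The checks listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual test code: `ToyEmbedder` is a hypothetical stand-in for an embedder component, and the dictionary layout is an assumption chosen only to show the default value, the explicit value, and the `to_dict()` / `from_dict()` round trip.

```python
from typing import Any, Optional


class ToyEmbedder:
    """Hypothetical stand-in for an embedder component (not the real class)."""

    def __init__(self, model: str = "toy-model", revision: Optional[str] = None):
        self.model = model
        self.revision = revision

    def to_dict(self) -> dict[str, Any]:
        # Store every init parameter so the component can be rebuilt later.
        return {
            "type": "ToyEmbedder",
            "init_parameters": {"model": self.model, "revision": self.revision},
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "ToyEmbedder":
        # Rebuild the component from the stored init parameters.
        return cls(**data["init_parameters"])


def test_revision_default_is_none() -> None:
    assert ToyEmbedder().revision is None


def test_revision_survives_round_trip() -> None:
    original = ToyEmbedder(revision="v1.0")
    restored = ToyEmbedder.from_dict(original.to_dict())
    assert restored.revision == "v1.0"


test_revision_default_is_none()
test_revision_survives_round_trip()
```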

Notes for the reviewer

The revision parameter follows the same pattern as other optional parameters like trust_remote_code and local_files_only. It's passed directly to the Sentence Transformers model initialization, allowing users to pin to specific model versions for reproducibility.
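
As a rough sketch of that pass-through pattern: the component stores the optional value at init time and forwards it, unchanged, when the model is loaded. Both `fake_load_model` and `ToyTextEmbedder` are hypothetical names used only for illustration; the real components and backend may be structured differently.

```python
from typing import Any, Optional


def fake_load_model(model: str, **kwargs: Any) -> dict[str, Any]:
    """Hypothetical stand-in for the backend's model loader; it only echoes its inputs."""
    return {"model": model, **kwargs}


class ToyTextEmbedder:
    """Hypothetical component that forwards optional settings to the loader."""

    def __init__(
        self,
        model: str,
        trust_remote_code: bool = False,
        local_files_only: bool = False,
        revision: Optional[str] = None,
    ):
        self.model = model
        self.trust_remote_code = trust_remote_code
        self.local_files_only = local_files_only
        self.revision = revision
        self._backend: Optional[dict[str, Any]] = None

    def warm_up(self) -> None:
        # Load lazily, forwarding revision next to the other optional settings.
        if self._backend is None:
            self._backend = fake_load_model(
                self.model,
                trust_remote_code=self.trust_remote_code,
                local_files_only=self.local_files_only,
                revision=self.revision,
            )


embedder = ToyTextEmbedder("toy-model", revision="v1.0")
embedder.warm_up()
print(embedder._backend["revision"])
```

Leaving `revision` at its default of `None` forwards `None` as well, so the loader is free to fall back to whatever it treats as the default revision.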

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes
  • I added unit tests and updated the docstrings
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I documented my code
  • I ran pre-commit hooks and fixed any issue

@swapniel99 requested a review from a team as a code owner November 1, 2025 08:58
@swapniel99 requested review from sjrl and removed request for a team November 1, 2025 08:58
@vercel

vercel bot commented Nov 1, 2025

@swapniel99 is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

@CLAassistant

CLAassistant commented Nov 1, 2025

CLA assistant check
All committers have signed the CLA.

@github-actions bot added the topic:tests and type:documentation (Improvements on the docs) labels Nov 1, 2025
meta_fields_to_embed: Optional[list[str]] = None,
embedding_separator: str = "\n",
trust_remote_code: bool = False,
revision: Optional[str] = None,
Contributor

To avoid this being a breaking change (since we are inserting a new positional argument in the middle of existing ones), put this as the last argument in the init.
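
A small sketch of why the position matters, using hypothetical functions rather than the real init signatures: a caller that passes arguments positionally keeps working when the new parameter is appended, but silently binds the wrong parameter when it is inserted in the middle.

```python
from typing import Any, Optional


def embed_original(text: str, prefix: str = "", normalize: bool = False) -> dict[str, Any]:
    """Signature before the change."""
    return {"prefix": prefix, "normalize": normalize}


def embed_inserted(
    text: str, prefix: str = "", revision: Optional[str] = None, normalize: bool = False
) -> dict[str, Any]:
    """New parameter inserted in the middle: shifts every later positional argument."""
    return {"prefix": prefix, "revision": revision, "normalize": normalize}


def embed_appended(
    text: str, prefix: str = "", normalize: bool = False, revision: Optional[str] = None
) -> dict[str, Any]:
    """New parameter appended at the end: existing positional calls are unaffected."""
    return {"prefix": prefix, "revision": revision, "normalize": normalize}


# An existing caller that relied on positional arguments.
args = ("hello", "query: ", True)

print(embed_original(*args))  # normalize is True, as the caller intended
print(embed_inserted(*args))  # True is bound to revision; normalize stays False
print(embed_appended(*args))  # normalize is still True; revision stays None
```

Making the new parameter keyword-only would be another way to avoid the problem, but appending it keeps the signature style consistent with the existing arguments.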

progress_bar: bool = True,
normalize_embeddings: bool = False,
trust_remote_code: bool = False,
revision: Optional[str] = None,
Contributor

Same here: let's move this one to the bottom of the init, since this init method accepts positional arguments.

@sjrl sjrl self-assigned this Nov 3, 2025
@coveralls
Collaborator

Pull Request Test Coverage Report for Build 18994388892

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 6 unchanged lines in 4 files lost coverage.
  • Overall coverage increased (+0.002%) to 92.246%

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| components/embedders/sentence_transformers_sparse_document_embedder.py | 1 | 98.41% |
| components/embedders/sentence_transformers_sparse_text_embedder.py | 1 | 98.08% |
| components/embedders/sentence_transformers_document_embedder.py | 2 | 97.01% |
| components/embedders/sentence_transformers_text_embedder.py | 2 | 96.49% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 18976263337: | 0.002% |
| Covered Lines: | 13502 |
| Relevant Lines: | 14637 |

💛 - Coveralls
