Hello and thanks for your work!
Context
I work on the Haystack LLM framework, which provides Embedders based on huggingface_hub.
The HuggingFaceAPITextEmbedder transforms a string into a vector. The HuggingFaceAPIDocumentEmbedder enriches a Document with its embedding.
These components allow using different HF APIs via huggingface_hub: Serverless Inference API, Inference Endpoints, and self-hosted Text Embeddings Inference container.
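For reference, all three backends are reached through the same client; only the model argument changes. A minimal sketch (the model id and endpoint URL below are placeholders):

```python
from huggingface_hub import InferenceClient

# Serverless Inference API: pass a model id from the Hub (placeholder id)
client = InferenceClient(model="BAAI/bge-small-en-v1.5")

# Inference Endpoints / self-hosted TEI container: pass the endpoint URL instead
client = InferenceClient(model="https://my-tei-endpoint.example.com")  # placeholder URL
```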
In the past, we used InferenceClient.post to query the API, mainly to support truncate and normalize (which were missing in the feature_extraction method before #2270).
Now that InferenceClient.post is deprecated and will be removed in 0.31.0, I'm migrating to InferenceClient.feature_extraction.
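The migrated call looks roughly like this, assuming a TEI-backed endpoint (truncate and normalize are forwarded to the server, so backends that don't implement them may reject the request):

```python
from huggingface_hub import InferenceClient

client = InferenceClient(model="https://my-tei-endpoint.example.com")  # placeholder URL

# Since #2270, truncate/normalize are regular parameters of feature_extraction,
# so there is no need to build a raw payload with InferenceClient.post anymore.
embedding = client.feature_extraction(
    "Hello, Haystack!",
    truncate=True,   # let the server truncate over-long inputs
    normalize=True,  # ask the server to L2-normalize the returned vector
)
print(embedding.shape)  # a numpy array; exact shape depends on the backend
```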
Issue
There's only one issue:
In HuggingFaceAPIDocumentEmbedder, we allow batching, which is explicitly supported by TEI (and Inference Endpoints).
Passing a list of inputs to the text parameter of feature_extraction seems to work correctly: I've tested it with the Serverless Inference API, Inference Endpoints, and self-hosted TEI.
However, the method signature of feature_extraction officially supports only str, not List[str].
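Concretely, this is the batched call that works in practice but falls outside the annotated signature (the type: ignore marks exactly the gap this issue is about):

```python
from huggingface_hub import InferenceClient

client = InferenceClient(model="https://my-tei-endpoint.example.com")  # placeholder URL

texts = ["first document", "second document", "third document"]

# Accepted in practice by TEI, Inference Endpoints, and the Serverless
# Inference API, even though the parameter is annotated as `str`.
embeddings = client.feature_extraction(texts)  # type: ignore[arg-type]
print(embeddings.shape)  # one embedding per input text, e.g. (3, embedding_dim)
```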
Questions
Given this, I'd like to ask:
1. Do you plan to support List[str] in feature_extraction?
2. Will this behavior continue to work in the future?
3. If not, what alternative approach would you recommend for batched embedding generation via InferenceClient? (A per-text fallback is sketched after this list.)
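For completeness, the fallback I'd rather avoid (reusing client and texts from the sketch above) is one request per text, which gives up the server-side batching that TEI offers:

```python
# Per-text fallback: one HTTP request per document instead of a single batch.
embeddings = [client.feature_extraction(text) for text in texts]
```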
Hi @anakin87, thanks for opening this issue and for describing your findings. I agree with you that supporting List[str] would be a great addition here. I've just opened huggingface/huggingface.js#1166 so that our official spec gets updated. Once merged, I'll make the change in huggingface_hub too.
For the record, and as mentioned in the PR above, I have pushed back on this feature request in the past (see #1745 and #1746 (comment)), but I think it's time to revisit. I'll let you know once it's done, but you can already consider it safe and future-proof to use it like this.
EDIT: and glad to see the deprecation warning is being carefully listened to ❤️
Wauplin added a commit to huggingface/huggingface.js that referenced this issue on Feb 4, 2025:
Related to huggingface/huggingface_hub#2824.
This PR makes it possible to send a `string[]` instead of a `string` as `feature-extraction` inputs. This is already possible in practice in the Inference API but not documented.
In the past, I've pushed back on this change (see huggingface/huggingface_hub#1745 and huggingface/huggingface_hub#1746 (comment)) but I think it's fine to revisit it now. The main reason I mentioned was that `feature-extraction`'s server-side implementation was mostly a for-loop over the text input, so accepting a `string[]` would not really improve performance. That being said, there have been quite a few improvements since then, especially the `text-embeddings-inference` framework.