Labels: P3 (Low priority, leave it in the backlog), type:feature (New feature or request)
Describe the Feature
It would be great to add support for tool calling when running HuggingFaceAPIChatGenerator in streaming mode.
As shown in haystack/haystack/components/generators/chat/hugging_face_api.py, lines 411 to 412 at commit 2ccdba3:

```python
text = choice.delta.content or ""
generated_text += text
```

we only process the generated text, and later store it as plain text content:

```python
message = ChatMessage.from_assistant(text=generated_text, meta=meta)
```
whereas we should properly populate the `tool_calls` param of `ChatMessage` when a tool call is present.
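For illustration, here is a rough sketch (not code from the repo) of the assistant message we would want to end up with, using Haystack's `ToolCall` dataclass:

```python
from haystack.dataclasses import ChatMessage, ToolCall

# Sketch of the desired result: the assistant message carries the tool call
# itself, not just the streamed text.
tool_call = ToolCall(tool_name="weather", arguments={"city": "Paris"})
message = ChatMessage.from_assistant(tool_calls=[tool_call])
```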
The underlying HuggingFace streaming chunk dataclass does contain tool call information:

```python
@dataclass_with_extra
class ChatCompletionStreamOutputDelta(BaseInferenceType):
    role: str
    content: Optional[str] = None
    tool_call_id: Optional[str] = None
    tool_calls: Optional[List[ChatCompletionStreamOutputDeltaToolCall]] = None
```

Additional context
It looks like `_run_streaming` would need to be updated to process streaming chunks that contain tool calls.
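As a starting point, here is a minimal sketch (not a proposed final implementation) of how the streaming deltas could be accumulated into `ToolCall` objects. It assumes the `huggingface_hub` delta tool calls expose `index`, `id`, `function.name`, and `function.arguments` (with arguments streamed as JSON string fragments), as in recent `huggingface_hub` versions:

```python
import json
from typing import Any, Dict

from haystack.dataclasses import ChatMessage, ToolCall


def build_message_from_stream(chunks) -> ChatMessage:
    """Accumulate text and tool-call fragments from huggingface_hub streaming chunks."""
    generated_text = ""
    # Fragments of the same tool call share an index, so key the buffer on it.
    pending: Dict[int, Dict[str, Any]] = {}

    for chunk in chunks:
        delta = chunk.choices[0].delta
        generated_text += delta.content or ""

        for tc in delta.tool_calls or []:
            entry = pending.setdefault(tc.index, {"id": None, "name": None, "arguments": ""})
            if tc.id:
                entry["id"] = tc.id
            if tc.function and tc.function.name:
                entry["name"] = tc.function.name
            if tc.function and tc.function.arguments:
                # arguments arrive as JSON string fragments; concatenate and parse at the end
                entry["arguments"] += tc.function.arguments

    tool_calls = [
        ToolCall(tool_name=e["name"], arguments=json.loads(e["arguments"] or "{}"), id=e["id"])
        for e in pending.values()
    ]
    return ChatMessage.from_assistant(text=generated_text or None, tool_calls=tool_calls or None)
```

The actual change would live inside `_run_streaming`, alongside the existing `generated_text` accumulation and the final `ChatMessage.from_assistant(...)` call.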
To Reproduce

```python
from haystack.tools import Tool
from haystack.dataclasses import ChatMessage
from haystack.components.generators.chat.hugging_face_api import HuggingFaceAPIChatGenerator
from haystack.components.generators.utils import print_streaming_chunk
from haystack.utils.hf import HFGenerationAPIType  # required for the api_type argument below


def get_weather(city: str) -> str:
    """Get weather information for a city."""
    return f"The weather in {city} is Sunny and 22 C"


tool_parameters = {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}

tool = Tool(
    name="weather",
    description="useful to determine the weather in a given location",
    parameters=tool_parameters,
    function=get_weather,
)

chat_messages = [ChatMessage.from_user("What's the weather like in Paris?")]

generator = HuggingFaceAPIChatGenerator(
    api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
    api_params={"model": "NousResearch/Hermes-3-Llama-3.1-8B"},
    generation_kwargs={"temperature": 0.5},
    streaming_callback=print_streaming_chunk,
)

results = generator.run(chat_messages, tools=[tool])
```