haystack/components/generators/hugging_face_local.py "stop_words" not behaving as expected #8672

Open
sephcodes opened this issue Dec 24, 2024 · 3 comments
Labels: P2 Medium priority, add to the next sprint if no P1 available

Comments

@sephcodes

Describe the bug
According to the documentation: "param stop_words: If the model generates a stop word, the generation stops."
In practice, however, stop_words only removes the listed words from the final output; it does not stop generation when a stop word is encountered.
If possible, a way to actually stop generation on stop words would be useful.
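
For reference, here is a minimal sketch of the behavior the documentation describes, written against plain transformers with a custom StoppingCriteria (an illustrative sketch, not Haystack's actual implementation; the model name and the "withdraw" stop word are borrowed from the reproduction below):

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
)


class StopOnWords(StoppingCriteria):
    """Stop generation as soon as the output ends with any stop-word token sequence."""

    def __init__(self, stop_ids_list):
        self.stop_ids_list = stop_ids_list  # one token-id list per stop word

    def __call__(self, input_ids, scores, **kwargs):
        return any(
            input_ids[0, -len(ids):].tolist() == ids for ids in self.stop_ids_list
        )


tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-1.7B-Instruct")
model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-1.7B-Instruct")

stop_words = ["withdraw"]
stop_ids_list = [tokenizer(w, add_special_tokens=False).input_ids for w in stop_words]

inputs = tokenizer("How do I withdraw a case?", return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=150,
    stopping_criteria=StoppingCriteriaList([StopOnWords(stop_ids_list)]),
)
# Generation halts at the stop word instead of merely stripping it from the output.
print(tokenizer.decode(output[0], skip_special_tokens=True))
```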

Error message
N/A

Expected behavior
The stop_words argument should stop generation of output as soon as a value from the stop_words list is produced.

Additional context
N/A

To Reproduce

```python
generator = self.generator(
    model="HuggingFaceTB/SmolLM-1.7B-Instruct",
    task="text-generation",
    stop_words=self.stop_words,
    generation_kwargs={
        "max_new_tokens": 150,
        # "do_sample": False,
        "do_sample": True,
        "temperature": 0.5,
        # "top_p": 0.9,
        # "eos_token_id": self.tokenizer.eos_token_id,
        # "stopping_criteria": self.stopping_criteria
    },
)
```

Whatever is defined in self.stop_words is simply dropped from the output; it does not stop text generation as expected.
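
As a possible workaround sketch (assuming generation_kwargs are forwarded unchanged to model.generate(), which the commented-out "stopping_criteria" line above hints at but which is not confirmed here), a custom StoppingCriteriaList could be passed directly; StopStringCriteria requires a recent transformers release:

```python
from transformers import AutoTokenizer, StoppingCriteriaList, StopStringCriteria
from haystack.components.generators import HuggingFaceLocalGenerator

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-1.7B-Instruct")
# StopStringCriteria matches stop strings at the text level, which sidesteps
# tokenization mismatches between a bare word and the same word in running text.
criteria = StoppingCriteriaList(
    [StopStringCriteria(tokenizer=tokenizer, stop_strings=["withdraw"])]
)

generator = HuggingFaceLocalGenerator(
    model="HuggingFaceTB/SmolLM-1.7B-Instruct",
    task="text-generation",
    generation_kwargs={
        "max_new_tokens": 150,
        # Assumption: this kwarg reaches model.generate() unchanged.
        "stopping_criteria": criteria,
    },
)
generator.warm_up()
print(generator.run("How do I withdraw a case?")["replies"])
```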

FAQ Check
N/A

System:
All

julian-risch added the "P2 Medium priority, add to the next sprint if no P1 available" label on Jan 2, 2025
@vblagoje (Member)

Can you share the smallest reproducible example, @sephcodes? A notebook, Python script, whatever suits you best.

@sephcodes (Author) commented Jan 16, 2025

Using the class below (note that I added "withdraw" as a stop word to show the effect of it being dropped rather than generation stopping on it, as the documentation says it should):

```python
class GenerateAnswer:
    def __init__(self, time, generator, tokenizer, stop_words=None):
        self.time = time
        self.generator = generator
        self.tokenizer = tokenizer
        self.stop_words = stop_words if stop_words is not None else [
            "withdraw", "system", "Human:", "Assistant:", "\n\n", "Question", "Answer"
        ]
        # Token-id sequences per stop word (computed here but never passed to
        # the generator below)
        self.stop_ids = [
            self.tokenizer(stop_word, add_special_tokens=False).input_ids
            for stop_word in self.stop_words
        ]

    def generate_answer(self, prompt_template, question, context):
        start_time = self.time.time()
        prompt = prompt_template.format(context=context, question=question)
        checkpoint_1 = self.time.time() - start_time
        print(f"Checkpoint 1: {checkpoint_1:.4f} seconds")

        start_time = self.time.time()
        generator = self.generator(
            model="HuggingFaceTB/SmolLM-1.7B-Instruct",
            task="text-generation",
            stop_words=self.stop_words,
            generation_kwargs={
                "max_new_tokens": 150,
                "do_sample": True,
                "temperature": 0.5,
            },
        )
        checkpoint_2 = self.time.time() - start_time
        print(f"Checkpoint 2: {checkpoint_2:.4f} seconds")

        start_time = self.time.time()
        generator.warm_up()
        checkpoint_3 = self.time.time() - start_time
        print(f"Checkpoint 3: {checkpoint_3:.4f} seconds")

        start_time = self.time.time()
        response = generator.run(prompt)
        checkpoint_4 = self.time.time() - start_time
        print(f"Checkpoint 4: {checkpoint_4:.4f} seconds")

        replies = response["replies"]

        generate_time = sum([checkpoint_1, checkpoint_2, checkpoint_3, checkpoint_4])
        print(f"Total answer generation time: {generate_time:.4f} seconds")
        print(replies)
        return replies
```

How it's called in my notebook:

```python
import time

from transformers import AutoTokenizer
from haystack.components.generators import HuggingFaceLocalGenerator

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-small-en")
generate_answer = GenerateAnswer(time, HuggingFaceLocalGenerator, tokenizer)

prompt_template = """
You are a helpful AI assistant. You are given context and a question.
You must answer the question briefly and concisely using the information given in the context only as reference.
Summarize the answer IN YOUR OWN WORDS and STOP IMMEDIATELY after answering the question.
If you do not know the answer say 'I do not know!'.

Context: {context}

Question: {question}

Answer:
"""

question = "How do I withdraw a case"

# get_relevant_context, embeddings, embedder, and dataset are defined elsewhere in the notebook
context = get_relevant_context.get_relevant_context(question, embeddings, embedder.embed_text, dataset)
```

Output:

'Can I withdraw a case?\nYes, you can withdraw a case by visiting the Case Details page in the Submitted Cases table. "Withdraw" is one of the case actions available for submitted cases that are in the following statuses:\nIn Process (all case types)\nRFI Issued (Prevailing Wage cases)\nNOD Issued (Temporary Labor cases)\nAccepted - Pending Recruitment (Temporary Labor cases)\n'

```python
context_text = " ".join(context)
answer = generate_answer.generate_answer(prompt_template, question, context_text)
```

Output:

['You can a case by visiting the Case Details page.....

Notice the extra space where the word 'withdraw' was in the context I gave the generator. This indicates that the model dropped it as a stop word rather than stopping generation upon encountering it.
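
A diagnostic that may help narrow this down (a minimal sketch; the model name matches the reproduction above): subword tokenizers usually encode a word differently at the start of a string than after a space, so stop-word matching based on token ids, like the stop_ids computed in __init__ above, can miss the in-text form:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-1.7B-Instruct")
# The leading space typically yields different token ids with BPE tokenizers.
print(tok("withdraw", add_special_tokens=False).input_ids)
print(tok(" withdraw", add_special_tokens=False).input_ids)
```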

Let me know if any other info is needed.

@vblagoje (Member) commented Jan 17, 2025

@sephcodes I don't see it. Would you please put together an isolated example, preferably in a notebook, that uses minimal code to demonstrate the issue? If there is an issue, I'll gladly solve it. 🙏
