Response in medium model decreases overtime. #84

Open
bossauh opened this issue Mar 12, 2022 · 6 comments

Comments

@bossauh

bossauh commented Mar 12, 2022

I'm using a modified version of the example code provided on the Hugging Face website.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch


tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")


step = 0
while True:

    # encode the new user input, add the eos_token and return a tensor in PyTorch
    new_user_input_ids = tokenizer.encode(input("You: ") + tokenizer.eos_token, return_tensors='pt')

    # append the new user input tokens to the chat history
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

    # generate a response while limiting the total chat history to 1000 tokens
    chat_history_ids = model.generate(
        bot_input_ids,
        pad_token_id=tokenizer.eos_token_id,
        max_length=1000,
        do_sample=True,  # do_sample is the generate() flag that turns on sampling
        top_k=50,
        top_p=0.95,
        repetition_penalty=1.35
    )

    # pretty print last output tokens from bot
    print(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True))
    step += 1

After a few turns, the responses become shorter and shorter until the model doesn't output anything anymore.

You: hello
Hello! :D
You: coffee tastes so good lol
It does. I'm going to have a coffee tomorrow morning, and it's gonna be delicious haha
You: what's your type of coffee?
I don't drink coffee but if you want some I'll make one for ya! It will probably taste like a cup of tea though...
You: Yes please make me one
Sure thing!
You: I prefer light coffee alright?
Yeah that sounds great!
You: to be honest i just drink coffee for the sake of tasting good lol
Haha okay then ill try my best with this one too XD
You: cool
Sounds awesome ltsss
You: do you prefer your coffee dark or light
ooooo
You: what?
yayyy
You: what are you saying?

You:

I tried changing max_length to about 5000 and it doesn't seem to do anything. I've also tried getting rid of the history (i.e., using only the new_user_input_ids variable), which seemed to fix the issue, but that obviously leads to very random responses, since the model has no context for what we're talking about.

@archmagos-dominus

Try tweaking repetition_penalty=1.35. Maybe the model just feels like it's running out of tokens it can output without repeating itself.

@iamatulsingh

I'm facing the same problem. Tweaking repetition_penalty (for example, to 0.8) is not going to help. You can even remove repetition_penalty entirely and try with a 1000-token history: once the history grows beyond 1000 tokens, the issue appears. If you use a 5000-token history instead, the conversation will eventually cross that limit too if you let the bot talk long enough.
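
As a rough illustration of the limit described above: in `generate()`, `max_length` bounds the total sequence length, prompt tokens included, so the room left for a reply shrinks as the history grows. The helper name `reply_budget` below is hypothetical and only does the arithmetic; it does not call the model.

```python
def reply_budget(history_len: int, new_input_len: int, max_length: int) -> int:
    """Tokens left for the bot's reply when max_length caps prompt + reply."""
    prompt_len = history_len + new_input_len
    # Once the prompt alone reaches max_length, nothing is left to generate.
    return max(0, max_length - prompt_len)


# With max_length=1000, a 990-token history plus a 6-token message leaves 4 tokens.
print(reply_budget(990, 6, 1000))   # 4
# A prompt longer than max_length leaves no room at all.
print(reply_budget(1200, 6, 1000))  # 0
```

`generate()` also accepts `max_new_tokens`, which caps only the newly generated tokens and so does not shrink as the prompt grows.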

@archmagos-dominus

archmagos-dominus commented Mar 30, 2022

You might be correct, and this might be an issue with the length of the chat_history_ids. You can try something like this then:

# check the size of the history; if it's over 'max_length', remove the oldest tokens
if bot_input_ids.size(dim=1) >= args.get('max_length'):
    # trim the tokens, keeping only the most recent 'max_length' of them
    bot_input_ids = torch.narrow(bot_input_ids, 1, -args.get('max_length'), args.get('max_length'))
chat_history_ids = model.generate(
    bot_input_ids,
    max_length=args.get('max_length')+20,
    pad_token_id=tokenizer.eos_token_id,
    no_repeat_ngram_size=args.get('no_repeat_ngram_size'),
    do_sample=args.get('do_sample'),
    top_k=args.get('top_k'),
    top_p=args.get('top_p'),
    temperature=args.get('temperature')
)

Try tweaking the max_length value to see how high you can go without issues.

As for what would cause this issue, I don't really know what to say.

@iamatulsingh

Thanks for the help, but this seems like a kind of weird way to do it. It will consume a lot of RAM if the bot talks for a long time. There should be a way to trim the history tokens. For example, once the history goes past the last 5 entries, it should drop the oldest chat history tokens.
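
One way to read this suggestion is to trim by turns rather than by raw token count. The sketch below assumes each turn in the history ends with the EOS token, as in the code earlier in this thread. `keep_last_turns` is a hypothetical helper that works on a plain list of token ids (for a tensor, something like `chat_history_ids[0].tolist()` would produce one), and the EOS id is passed in rather than hard-coded.

```python
from typing import List


def keep_last_turns(token_ids: List[int], eos_id: int, max_turns: int) -> List[int]:
    """Keep the last max_turns EOS-terminated turns (plus any unterminated tail)."""
    if max_turns <= 0:
        return []
    # Indices of every EOS token, i.e. the end of every completed turn.
    ends = [i for i, tok in enumerate(token_ids) if tok == eos_id]
    if len(ends) <= max_turns:
        # Nothing to drop: the history is already short enough.
        return list(token_ids)
    # Start right after the EOS that closes the newest turn being dropped.
    start = ends[-max_turns - 1] + 1
    return list(token_ids[start:])


# Three turns (EOS id 0 here for readability); keep the last two.
history = [11, 12, 0, 21, 0, 31, 32, 33, 0]
print(keep_last_turns(history, eos_id=0, max_turns=2))  # [21, 0, 31, 32, 33, 0]
```

The trimmed list can be turned back into a batch of one with `torch.tensor([trimmed])` before calling `model.generate`. Cutting at turn boundaries should keep the prompt size bounded without slicing a message in half.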

@csanadpoda

csanadpoda commented Apr 16, 2022

I'm having the same issue; increasing max_length or tweaking repetition_penalty doesn't help. It usually won't last 15 rounds before it starts returning only EOS tokens. Even setting max_length to something like 50000 doesn't change this. DialoGPT still regresses to returning empty replies after circa 10 user inputs.

Before this it will start to give shorter and shorter answers. Any idea what can cause such a behavior?

I'm using the medium-sized model with a max_length of 2000. I get conversations like the one below. What causes this eventual shortening and then disappearance of replies and what can I do to change this? I'd like to be able to hold a conversation indefinitely.


> User: What makes you happy?
> Bot: I'm a big fan of the game, and I love my family.
> User: Who is in your family?
> Bot: My mom's husband and sister are both in it too!
> User: What about your mom?
> Bot: She's not in it yet but she will be soon enough :D
> User: How old is your sister?
> Bot: she was born on December 13th so she'll probably be around for another year or two at least...
> User: Do you like her?
> Bot: Yeah she's pretty cool. She has been playing with me since we were kids haha
> User: How old are you?
> Bot: About 7 years now?
> User: That's pretty young to be online...
> Bot: haha
> User: Aren't you afraid?
> Bot: Yayyyy
> User: Yay what?
> Bot: Hah
> User: please talk to me
> Bot: heeee
> User: tell me a story
> Bot: o
> User: what?
> Bot: u
> User: What about me?
> Bot:
> User: You're done huh
> Bot:

@FreshEthan

FreshEthan commented Jul 11, 2024

I think the problem is that you have to lower the max history length to below 500, or else DialoGPT gets stuck

def generate_response(prompt, chat_history_ids=None, temperature=1, top_k=50, top_p=0.9, max_history_length=100):
    new_user_input_ids = tokenizer.encode(prompt + tokenizer.eos_token, return_tensors='pt').to(model.device)
    bot_input_ids = new_user_input_ids if chat_history_ids is None else torch.cat([chat_history_ids, new_user_input_ids], dim=-1)

    if bot_input_ids.shape[-1] > max_history_length:
        bot_input_ids = bot_input_ids[:, -max_history_length:]

    # Generate attention mask to avoid unexpected behavior
    attention_mask = torch.ones_like(bot_input_ids)

    try:
        chat_history_ids = model.generate(
            bot_input_ids,
            attention_mask=attention_mask,
            max_length=3000,
            pad_token_id=tokenizer.eos_token_id,
            temperature=temperature,
            top_k=top_k,
            top_p=top_p,
            do_sample=True 
