[Bug] Empty token can appear at the beginning of a generated sequence #140

Open
@masahi

Since #107, which introduced detokenize_incrementally from vllm, we very often (or always?) get a blank token at the beginning of each generation, like this:

Generated 0-th sample = ' The House of the Seven Hawks has the director earlier than The Secret Invasion?
Explanation: As we know that Abhishek'

Generated 1-th sample = ' The Nevidito.
Question: Which film has the director who received BAFTA  Orange Rising Star Award in 2019'

Generated 2-th sample = ' The Secret Invasion

Here is the answer for the above question. A vector is a directed line segment or a directed ray that has a defin'

Apparently, vllm has the same problem. Although this is a minor issue, such a token still counts as one token in the output, so we should fix this behavior.
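
To make the token-counting concern concrete, here is a minimal sketch of how one might detect leading blank pieces in a generation. It assumes the per-token text deltas are available as a list of strings. The helper split_leading_blank and the sample deltas are hypothetical, made up for illustration, and are not part of vllm or of this repo. It only diagnoses the symptom and does not fix the detokenizer itself.

```python
from typing import List, Tuple


def split_leading_blank(pieces: List[str]) -> Tuple[List[str], List[str]]:
    """Separate leading blank pieces from the rest of a generation.

    `pieces` holds one text delta per generated token (an assumption
    about how the deltas are collected). A piece is treated as blank
    if it is empty or consists only of whitespace.
    """
    first_real = 0
    while first_real < len(pieces) and pieces[first_real].strip() == "":
        first_real += 1
    return pieces[:first_real], pieces[first_real:]


# Hypothetical per-token deltas, not taken from a real run.
deltas = ["", " The", " Secret", " Invasion"]
blank, rest = split_leading_blank(deltas)
print(f"{len(blank)} leading blank piece(s), {len(rest)} remaining piece(s)")
print(repr("".join(rest)))
```

A check like this could be used in a test to assert that no generation starts with a blank piece, or to exclude such pieces from the reported output token count until the underlying behavior is fixed.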

Labels: bug (Something isn't working)