[Bug] Recovering logic of a long evicted request is broken #163

Open
@masahi

Description

https://github.com/octoml/mlc-llm/blob/batch-serving/serve/mlc_serve/engine/engine_common.py#L385-L399

In the streaming case, we cannot clamp the generated tokens and recompute them.
Moreover, since the clamping is done in the worker and not in the main process, a discrepancy arises between the main process and the worker process. See #158 and #164.
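
To make the discrepancy concrete, here is a minimal sketch. It is not the code in `engine_common.py`: `RequestState`, `clamp_for_recovery`, and `max_prefill_tokens` are hypothetical names, and the assumption that clamping drops generated tokens from the end is mine.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RequestState:
    # Hypothetical, simplified stand-in for the engine's per-request state.
    prompt_tokens: List[int]
    generated_tokens: List[int] = field(default_factory=list)


def clamp_for_recovery(state: RequestState, max_prefill_tokens: int) -> List[int]:
    """Return the tokens a worker would recompute for an evicted request.

    Assumption: generated tokens that do not fit in the budget are dropped
    from the end and would have to be generated again.
    """
    budget = max(max_prefill_tokens - len(state.prompt_tokens), 0)
    return state.prompt_tokens + state.generated_tokens[:budget]


# Main-process view: 4 prompt tokens and 6 generated tokens (already streamed).
main_view = RequestState(
    prompt_tokens=[1, 2, 3, 4],
    generated_tokens=[10, 11, 12, 13, 14, 15],
)

# Worker-side recovery with a budget of 8 tokens keeps only 4 generated tokens.
worker_tokens = clamp_for_recovery(main_view, max_prefill_tokens=8)

main_len = len(main_view.prompt_tokens) + len(main_view.generated_tokens)
worker_len = len(worker_tokens)
print(main_len, worker_len)  # 10 8
```

In this sketch the main process still counts 10 tokens while the worker has restored only 8, and the two dropped tokens were already sent to the client, so they cannot be recomputed without risking a different continuation.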

We need to either

@elvin-n @sunggg

Labels: bug
