Remove repeated concat of prompt and decode tokens in detokenization #139

Since the parallel sampling work, we maintain prompt and decode tokens separately. However, to preserve the detokenization output, I had to concatenate them on every step:
https://github.com/octoml/mlc-llm/blob/batch-serving/serve/mlc_serve/engine/engine_common.py#L97-L99

Concatenating and detokenizing the entire token sequence on every step seems very wasteful when all we really need is the new delta at the end of the output.
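One possible direction, sketched below under assumptions: keep only a short tail of already-emitted tokens as context and detokenize just that tail plus the new tokens, so each step costs time proportional to the window rather than to the full sequence. `IncrementalDetokenizer`, `toy_decode`, `VOCAB`, and `context_size` are hypothetical names for illustration and are not taken from `engine_common.py`; a real tokenizer's `decode` would replace `toy_decode`.

```python
from typing import Callable, List, Sequence

# Hypothetical toy vocabulary standing in for a real tokenizer.
VOCAB = {0: "Hello", 1: ",", 2: " world", 3: "!",
         4: " How", 5: " are", 6: " you", 7: "?"}


def toy_decode(token_ids: Sequence[int]) -> str:
    """Stand-in for a tokenizer's decode(); assumption for this sketch."""
    return "".join(VOCAB[t] for t in token_ids)


def _tail(tokens: List[int], n: int) -> List[int]:
    """Return the last n tokens (an empty list when n == 0)."""
    return tokens[-n:] if n > 0 else []


class IncrementalDetokenizer:
    """Emit only the text delta for newly generated tokens."""

    def __init__(self, decode: Callable[[Sequence[int]], str],
                 prompt_token_ids: Sequence[int], context_size: int = 4):
        self._decode = decode
        self._context_size = context_size
        # Only a short tail of the prompt is kept as decoding context.
        self._context: List[int] = _tail(list(prompt_token_ids), context_size)
        # Tokens held back because they ended in an incomplete character.
        self._pending: List[int] = []

    def append(self, new_token_ids: Sequence[int]) -> str:
        prefix_text = self._decode(self._context)
        window = self._context + self._pending + list(new_token_ids)
        full_text = self._decode(window)
        if full_text.endswith("\ufffd"):
            # Byte-level tokenizers may yield a replacement character for a
            # partial multi-byte sequence; wait for more tokens before emitting.
            self._pending.extend(new_token_ids)
            return ""
        self._pending = []
        self._context = _tail(window, self._context_size)
        # Everything after the already-emitted context text is the new delta.
        return full_text[len(prefix_text):]


detok = IncrementalDetokenizer(toy_decode, prompt_token_ids=[0, 1, 2, 3])
deltas = [detok.append(step) for step in ([4], [5], [6, 7])]
```

The sketch assumes that decoding a short window produces the same trailing text as decoding the full sequence, which may not hold for every tokenizer (for example, ones that treat the start of a sequence specially), so the context size would need to be validated against the actual tokenizer in use.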
