Description
Since the parallel sampling work, we maintain prompt tokens and decode tokens separately. However, to preserve the detokenization output, I had to concatenate them repeatedly:
https://github.com/octoml/mlc-llm/blob/batch-serving/serve/mlc_serve/engine/engine_common.py#L97-L99
Concatenating and detokenizing the entire token sequence over and over seems very wasteful when all we really need is the text delta for the newly generated tokens at the end of the sequence.
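
To make the idea concrete, here is a rough sketch of one way to detokenize only the suffix. It is not the code in `engine_common.py`: the class `IncrementalDetokenizer`, its `append` method, and the byte-level `toy_decode` function are all hypothetical names made up for illustration. It assumes the real tokenizer's decode function yields U+FFFD for an incomplete multi-byte character, which should be verified for the tokenizer actually in use.

```python
from typing import Callable, List


def toy_decode(token_ids: List[int]) -> str:
    """Hypothetical stand-in for a tokenizer: each token id is one UTF-8 byte."""
    return bytes(token_ids).decode("utf-8", errors="replace")


class IncrementalDetokenizer:
    """Keeps a small window of tokens instead of the full prompt + decode list."""

    def __init__(self, decode: Callable[[List[int]], str], prompt_tail: List[int]):
        self._decode = decode
        # Only the last few prompt tokens are kept as context for decoding.
        self._window = list(prompt_tail)
        # Number of tokens in the window whose text has already been emitted.
        self._read = len(self._window)

    def append(self, new_token_ids: List[int]) -> str:
        """Add newly generated tokens and return only the new text, if any."""
        self._window.extend(new_token_ids)
        prefix_text = self._decode(self._window[: self._read])
        full_text = self._decode(self._window)
        # Hold back output while the tail is an incomplete character (assumption:
        # the decode function marks it with U+FFFD).
        if len(full_text) > len(prefix_text) and not full_text.endswith("\ufffd"):
            delta = full_text[len(prefix_text):]
            # Slide the window: drop tokens that are no longer needed as context.
            self._window = self._window[self._read:]
            self._read = len(self._window)
            return delta
        return ""


detok = IncrementalDetokenizer(toy_decode, prompt_tail=list(b"Hi "))
deltas = [detok.append([t]) for t in "héllo".encode("utf-8")]
print(deltas)
```

With this shape, each step decodes a window of a few tokens rather than the whole sequence, and the prompt and decode token lists never need to be concatenated. How many prompt tokens to keep as `prompt_tail` depends on the tokenizer and is left open here.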