Why does the distributed sampler ensure that all data in a batch are of the same type? https://github.com/Alpha-VLLM/Lumina-mGPT/blob/104abe453ec1acca5863698629c4db2111b0b3fc/xllmx/data/sampler.py#L64
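
To make the behavior I am asking about concrete, here is a minimal sketch of what I understand "same type per batch" to mean. It is not the code from the linked `sampler.py`: the function names `same_type_batches` and `shard_for_rank` and the type labels are hypothetical, and the real sampler may differ in how it shuffles, pads, or drops items.

```python
import random
from collections import defaultdict


def same_type_batches(item_types, batch_size, seed=0):
    """Group dataset indices into batches whose items all share one type.

    item_types[i] is a (hypothetical) type label for sample i.
    """
    rng = random.Random(seed)

    # Bucket the sample indices by their type label.
    by_type = defaultdict(list)
    for idx, item_type in enumerate(item_types):
        by_type[item_type].append(idx)

    # Cut each bucket into batches, so a batch never mixes types.
    batches = []
    for indices in by_type.values():
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batches.append(indices[start:start + batch_size])

    # Shuffle the order of the batches, not their contents.
    rng.shuffle(batches)
    return batches


def shard_for_rank(batches, rank, world_size):
    """Give each rank every world_size-th batch (assumed sharding scheme)."""
    return batches[rank::world_size]


item_types = ["image", "text", "image", "text", "image", "text", "image", "image"]
batches = same_type_batches(item_types, batch_size=2)
for batch in batches:
    print(batch, {item_types[i] for i in batch})
```

In this sketch every printed batch carries exactly one type label, which is the property I see being enforced at the linked line.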