I get `AssertionError: Mask is silently ignored due to the use of a custom kernel` when training GPT-2 with `examples/pretrain_gpt.sh`. The assertion is raised at this line: https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/8387ae17c4704f6579f88a84500b535d19d7fbbf/megatron/model/fused_softmax.py#L191 Is this assertion necessary, and is it even correct?
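
To make the question concrete, here is a minimal pure-Python sketch of what I assume the message is guarding against: a causal softmax that applies only the upper-triangular mask and drops any extra mask passed to it. `causal_softmax` and `masked_causal_softmax` are hypothetical names for illustration, not Megatron functions, and I have not verified that this matches what the fused kernel actually does.

```python
import math


def causal_softmax(scores, mask=None):
    """Row-wise softmax with an upper-triangular (causal) mask only.

    Hypothetical stand-in for a fused causal kernel: `mask` is accepted
    but never used, which is the situation the assertion message describes.
    """
    out = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]  # query i may attend to keys 0..i
        m = max(visible)        # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in visible]
        total = sum(exps)
        out.append([e / total for e in exps] + [0.0] * (len(row) - i - 1))
    return out


def masked_causal_softmax(scores, mask):
    """Reference version: causal mask combined with an extra mask (True = hide).

    Assumes every row keeps at least one visible position.
    """
    out = []
    for i, row in enumerate(scores):
        keep = [j <= i and not mask[i][j] for j in range(len(row))]
        m = max(x for x, k in zip(row, keep) if k)
        exps = [math.exp(x - m) if k else 0.0 for x, k in zip(row, keep)]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


scores = [[0.0, 0.0, 0.0] for _ in range(3)]
# Extra mask (assumption: e.g. padding) hides key 0 for the last query.
extra_mask = [
    [False, False, False],
    [False, False, False],
    [True, False, False],
]

ignored = causal_softmax(scores, extra_mask)
respected = masked_causal_softmax(scores, extra_mask)

print(ignored[2])    # key 0 still receives attention
print(respected[2])  # key 0 is masked out
```

If the kernel behaves like `causal_softmax` here, then passing a non-`None` mask would change nothing, and the assertion would be a way of surfacing that instead of silently producing different attention weights than the caller expects.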