Hi there, loved your work! I wanted to ask why, in the diffusers training code for Sana-Sprint, the student's attention is converted from linear to softmax (https://github1s.com/huggingface/diffusers/blob/main/examples/research_projects/sana/train_sana_sprint_diffusers.py#L1026-L1027). Is this also done in the current Sana repo? Thanks