M2 model is applied on single image deraining model based on transformer

Hello! Your Monarch Mixer is extremely awesome work!  I'd like to ask if it's possible for me to apply your M2 model to my Transformer-based single-image deraining model. Can I use the M2 model in the attention layer or the MLP layer of my Transformer, or even apply it to both?