Encoder-decoder

Hi,
Can the Hyena model be made to work like a full vanilla Transformer encoder-decoder, where the encoder's last hidden state is passed to the decoder as memory (as with cross-attention)?
I was trying to build an OCR model with Hyena, so I tried prepending the image embeddings to the text embeddings, but the model doesn't seem to learn anything.
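For reference, here is a minimal sketch of the prefix (decoder-only) setup described above. It assumes the image features have already been projected to the same width `D` as the text embeddings, and that the Hyena operator is causal (as in its language-modeling configuration), so text positions can see the image prefix. The function names `build_prefix_batch` and `masked_cross_entropy` are hypothetical, not part of any Hyena codebase. The point it illustrates is that next-token targets are shifted so the last image position predicts the first text token, and the loss is computed only on text targets, never on image positions.

```python
import numpy as np

def build_prefix_batch(image_emb, text_emb, text_ids):
    """Prepend image embeddings to text embeddings (hypothetical helper).

    image_emb: (B, P, D) image features, assumed already projected to width D.
    text_emb:  (B, T, D) text token embeddings.
    text_ids:  (B, T) integer token ids of the transcription.

    Returns inputs (B, P+T, D), next-token targets (B, P+T) and a loss
    mask (B, P+T) that is 1 only where the target is a text token.
    """
    B, P, D = image_emb.shape
    _, T, _ = text_emb.shape
    inputs = np.concatenate([image_emb, text_emb], axis=1)  # image prefix first

    # Position i predicts the token at i + 1: the last image position
    # (P - 1) predicts the first text token, and earlier image positions
    # have no target (-1).
    targets = np.full((B, P + T), -1, dtype=np.int64)
    targets[:, P - 1 : P + T - 1] = text_ids
    mask = (targets >= 0).astype(np.float32)
    targets = np.where(targets >= 0, targets, 0)  # placeholder id where masked
    return inputs, targets, mask

def masked_cross_entropy(logits, targets, mask):
    """Mean cross-entropy over positions where mask == 1.

    logits: (B, L, V) unnormalized scores from the model.
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    nll = -np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    return (nll * mask).sum() / mask.sum()
```

If the loss is averaged over image positions as well, or the targets are not shifted relative to the inputs, the text signal can be diluted or trivially leaked, which is one thing worth ruling out before concluding that the prefix approach itself doesn't work.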

Thanks