I have a question about EVA-CLIP. I noticed that the patch embeddings output by the encoder have similar mean values for almost all patches, except for one patch (a different one for each image) whose mean is consistently much lower than the others'. Could someone explain why this happens?
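For anyone who wants to check the same thing, here is a minimal NumPy sketch of the measurement. It assumes the encoder output for one image is already available as an array of shape `(num_patches, embed_dim)`, with the class token removed. The function names, the robust z-score threshold, and the synthetic data are illustrative assumptions. Extracting the embeddings from EVA-CLIP itself is not shown.

```python
import numpy as np


def patch_means(embeddings: np.ndarray) -> np.ndarray:
    """Mean over the embedding dimension for each patch.

    Assumes (hypothetically) shape (num_patches, embed_dim) for one image,
    with any class token already removed.
    """
    return embeddings.mean(axis=1)


def find_low_mean_patch(embeddings: np.ndarray, z_thresh: float = 3.0):
    """Return (index, robust_z) for the lowest-mean patch if it is an
    outlier below -z_thresh, otherwise None."""
    means = patch_means(embeddings)
    median = np.median(means)
    # Median absolute deviation, scaled to approximate a standard deviation
    mad = np.median(np.abs(means - median))
    scale = 1.4826 * mad if mad > 0 else means.std()
    if scale == 0:
        return None  # all patch means identical: nothing stands out
    idx = int(np.argmin(means))
    z = (means[idx] - median) / scale
    return (idx, float(z)) if z < -z_thresh else None


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for encoder output; NOT real EVA-CLIP embeddings
    emb = rng.normal(0.0, 1.0, size=(256, 1024))
    emb[37] -= 5.0  # plant an artificial low-mean patch
    print(find_low_mean_patch(emb))
```

Using a median/MAD-based score rather than a plain mean and standard deviation keeps the single outlier patch from inflating the spread estimate it is being compared against.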