The implementation code for cosine attention is incorrect in swing-v2.

To compute the L2 norm in cosine similarity, F.normalize should be used. However, the last dimension (`dim=-1`) is a padded dimension of size 1. Using cosine similarity on `dim=-1` is incorrect, as this leads to all values becoming 1 or -1. The correct approach is to use `dim=-2`, to ensure the code aligns with the paper. 

```python
    def forward(self, x, mask=None):
        """
        Args:
            x: input features with shape of (num_windows*B, N, C)
            mask: (0/-inf) mask with shape of (num_windows, Wh*Ww, Wh*Ww) or None
        """
        B_, N, C = x.shape
        qkv_bias = None
        if self.q_bias is not None:
            qkv_bias = torch.cat((self.q_bias, torch.zeros_like(self.v_bias, requires_grad=False), self.v_bias))
        qkv = F.linear(input=x, weight=self.qkv.weight, bias=qkv_bias)
        qkv = qkv.reshape(B_, N, 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)    # shape (3, B_, heads, N, 1)
        q, k, v = qkv[0], qkv[1], qkv[2] 

        q_, k_ = F.normalize(q, dim=-1).detach(), F.normalize(k, dim=-1).detach()
        print('shape', q_.shape, k_.shape)                                            # print shape
        print('value', q_.numpy().reshape(-1), k_.numpy().reshape(-1))                # print normalized values (-1 or 1)
        
        # cosine attention
        # dim=-1 is incorrect, should be corrected to dim=-2
        attn = (F.normalize(q, dim=-1) @ F.normalize(k, dim=-1).transpose(-2, -1))

        logit_scale = torch.clamp(self.logit_scale, max=torch.log(torch.tensor(1. / 0.01))).exp()
        attn = attn * logit_scale
```

![Image](https://github.com/user-attachments/assets/61eb2086-4b70-42d4-8b82-c9f4d783640a)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

The implementation code for cosine attention is incorrect in swing-v2. #373

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

The implementation code for cosine attention is incorrect in swing-v2. #373

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions