I used 4 Tesla V100 (32 GB) GPUs with batch_size=14 (16 runs out of memory) to reproduce the Stanford Dogs result and kept the same config as in your paper, but the accuracy is only 90.5, well below the 92.3 reported in the paper.
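
One difference from the paper's setup here is the batch size. As a rough sketch, the snippet below works out the effective batch size for both settings and what a linearly scaled learning rate would look like. It assumes `batch_size` is a per-GPU value (the thread does not say) and uses a placeholder `base_lr`; the helper names are hypothetical, and the linear scaling rule is a common heuristic rather than anything the paper prescribes. It is not a claim that this accounts for the accuracy gap.

```python
def effective_batch_size(per_gpu_batch: int, num_gpus: int, accum_steps: int = 1) -> int:
    """Number of samples that contribute to one optimizer step."""
    return per_gpu_batch * num_gpus * accum_steps


def linearly_scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear scaling heuristic: keep lr / batch_size constant."""
    return base_lr * new_batch / base_batch


# Assumption: batch_size in the config is per GPU.
paper_batch = effective_batch_size(per_gpu_batch=16, num_gpus=4)
my_batch = effective_batch_size(per_gpu_batch=14, num_gpus=4)

# Placeholder value for illustration only; substitute the learning rate from the config.
base_lr = 0.01
scaled_lr = linearly_scaled_lr(base_lr, paper_batch, my_batch)

print(paper_batch, my_batch)  # 64 56
print(scaled_lr)
```

If memory is the only constraint, gradient accumulation (the `accum_steps` argument above) is another way to reach a target effective batch size, although no multiple of 56 equals 64, so an exact match would need a different per-GPU value.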

Hi Oliver, just wondering if you can share your pretrained model with me? Thanks in advance!

Thanks for your reply. I use the pretrained model "VIT_B16" downloaded from your link. By the way, I removed "part_select" and "part_layer" (so the model is like a pure ViT), and the performance is similar to the TransFG result I reproduced: 90.5.

```python
import copy

import torch.nn as nn
from torch.nn import LayerNorm

# Block (and the commented-out Part_Attention) come from the repo's modeling code.


class Encoder(nn.Module):
    def __init__(self, config):
        super(Encoder, self).__init__()
        self.layer = nn.ModuleList()
        # Note: with part_layer disabled below, only num_layers - 1 blocks run.
        for _ in range(config.transformer["num_layers"] - 1):
            layer = Block(config)
            self.layer.append(copy.deepcopy(layer))
        # self.part_select = Part_Attention()
        # self.part_layer = Block(config)
        self.part_norm = LayerNorm(config.hidden_size, eps=1e-6)

    def forward(self, hidden_states):
        # attn_weights = []
        for layer in self.layer:
            hidden_states, _ = layer(hidden_states)
            # attn_weights.append(weights)
        # part_num, part_inx = self.part_select(attn_weights)
        # part_inx = part_inx + 1
        # parts = []
        # B, num = part_inx.shape
        # for i in range(B):
        #     parts.append(hidden_states[i, part_inx[i,:]])
        # parts = torch.stack(parts).squeeze(1)
        # concat = torch.cat((hidden_states[:,0].unsqueeze(1), parts), dim=1)
        # part_states, part_weights = self.part_layer(concat)
        # part_encoded = self.part_norm(part_states)
        # Normalize all tokens directly, since part selection is disabled.
        part_encoded = self.part_norm(hidden_states)

        return part_encoded
```

About Stanford dogs accuracy #24