CRAYON-Attention | Fine-tune the Original model for 10 epochs with a batch size of 128, using the Adam optimizer with a learning rate of 5e-5 and a weight decay of 1e-4. We set the hyperparameters α and β to 1e7 and 2e5, respectively. |
CRAYON-Pruning | Prune 1,034 irrelevant neurons in the penultimate layer and train the last fully connected layer for 10 epochs with a learning rate of 5e-5. |
CRAYON-Attention+Pruning | Fine-tune the Original model for 10 epochs with a batch size of 128, using the Adam optimizer with a learning rate of 5e-5 and a weight decay of 1e-4. We set α to 1e7 and β to 2e5. |
JtT | Upweight the loss of the misclassified training data by 100 times. Train with Adam optimizer with a learning rate of 5e-5, a weight decay of 1e-4, and a batch size of 128 for 60 epochs. |
MaskTune | Train with Adam optimizer with a learning rate of 1e-4, a weight decay of 1e-4, and a batch size of 128 for 1 epoch. |
LfF | Train with SGD optimizer with a learning rate of 5e-3, a weight decay of 1e-4, and a batch size of 128 for 50 epochs. We set the GCE hyperparameter q to 0.7. |
SoftCon | Train an auxiliary BagNet18 model with Adam optimizer with a learning rate of 1e-3 for 20 epochs. Then, refine the Original model using Adam optimizer with a learning rate of 5e-5 and a batch size of 32 for 10 epochs. We set the temperature for the contrastive learning loss to 0.1, the cross-entropy loss weight to 1, and the clipping hyperparameter to 50. |
FLAC | Use the same BagNet18 auxiliary model as SoftCon. Then, we refine the Original model with Adam optimizer with a learning rate of 5e-5 for 20 epochs and set the FLAC loss weight to 1000. |
LC | Refine the Original model with SGD optimizer with a learning rate of 5e-3, a weight decay of 1e-3, and a batch size of 128 for 50 epochs. In parallel, train an auxiliary ResNet50 model with an SGD optimizer with a learning rate of 5e-4. The logit correction hyperparameter η is set to 1, GCE hyperparameter |
CnC | Train the model with Adam optimizer with a learning rate of 1e-5, a weight decay of 1e-4, and a batch size of 128 for 10 epochs. |
RRR | Train the model with Adam optimizer with a learning rate of 5e-5, a weight decay of 1e-4, and a batch size of 128 for 10 epochs. We set the RRR loss weight to 200. |
GradMask | Train the model with Adam optimizer with a learning rate of 5e-5, a weight decay of 1e-4, and a batch size of 128 for 10 epochs. |
ActDiff | Train the model with Adam optimizer with a learning rate of 5e-5, a weight decay of 1e-4, and a batch size of 128 for 10 epochs. The loss for the distance between masked and unmasked representations is multiplied by 0.1. |
GradIA | Train the model with Adam optimizer with a learning rate of 5e-5, a weight decay of 1e-4, and a batch size of 128 for 10 epochs. |
Bounding Box | Train the model with Adam optimizer with a learning rate of 5e-5, a weight decay of 1e-4, and a batch size of 128 for 10 epochs. |
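As a reference for the settings in this block, the following is a minimal sketch (not the authors' code) of how the CRAYON-Attention fine-tuning configuration could be instantiated in PyTorch. The functions attention_term and aux_term are hypothetical placeholders for the two CRAYON loss components weighted by α and β; only the optimizer, learning rate, weight decay, batch size, and epoch count come from the table.

import torch
import torch.nn as nn

def finetune_crayon_attention(model, loader, attention_term, aux_term,
                              alpha=1e7, beta=2e5, epochs=10,
                              lr=5e-5, weight_decay=1e-4):
    # Optimizer settings taken from the table; loader is assumed to use batch_size=128.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            logits = model(images)
            # Hypothetical placeholders: attention_term and aux_term stand in for the
            # two CRAYON loss components weighted by alpha and beta.
            loss = ce(logits, labels) + alpha * attention_term(model, images) \
                                      + beta * aux_term(model, images)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()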
CRAYON-Attention | Fine-tune the Original model for 10 epochs with a batch size of 64, using the Adam optimizer with a learning rate of 1e-5 and a weight decay of 1e-4. The hyperparameters α and β are set to 5e7 and 1e6, respectively. |
CRAYON-Pruning | We prune 1,871 irrelevant neurons and train the last layer for 50 epochs with a learning rate of 5e-6. |
CRAYON-Attention+Pruning | Fine-tune the Original model for 10 epochs with a batch size of 64, using the Adam optimizer with a learning rate of 1e-5 and a weight decay of 1e-4. The hyperparameters α and β are set to 5e7 and 1e6, respectively. |
JtT | Upweight the loss of the misclassified training data by 20 times. Train the model with the Adam optimizer with a learning rate of 1e-5, a weight decay of 1e-1, and a batch size of 64 for 30 epochs. |
MaskTune | Train the Original model with Adam optimizer with a learning rate of 1e-9, a weight decay of 1e-4, and a batch size of 64 for 1 epoch. |
LfF | Train the Original model with SGD optimizer with a learning rate of 5e-2, a weight decay of 1e-4, and a batch size of 64 for 50 epochs. We set the GCE hyperparameter q to 0.7. |
SoftCon | Train an auxiliary BagNet18 model with Adam optimizer with a learning rate of 1e-3 for 20 epochs. Then, refine the Original model using Adam optimizer with a learning rate of 5e-5 and a batch size of 32 for 10 epochs. We set the temperature for the contrastive learning loss to 0.1, the cross-entropy loss weight α to 1, and the clipping hyperparameter γ to 50. |
FLAC | Use the same BagNet18 auxiliary model as SoftCon. We refine the Original model with Adam optimizer with a learning rate of 5e-5 for 5 epochs and set the FLAC loss weight to 1000. |
LC | Refine the Original model with SGD optimizer with a learning rate of 1e-3, a weight decay of 1e-3, and a batch size of 64 for 50 epochs. In parallel, train an auxiliary ResNet50 model with an SGD optimizer with a learning rate of 1e-4. The logit correction hyperparameter η is set to 1, GCE hyperparameter q to 0.8, and temperature to 0.1. |
CnC | Train the model with Adam optimizer with a learning rate of 1e-5, a weight decay of 1e-1, and a batch size of 64 for 5 epochs. |
RRR | Train the model with Adam optimizer with a learning rate of 5e-6, a weight decay of 1e-4, and a batch size of 64 for 5 epochs. We set the RRR loss weight to 25000. |
GradMask | Train the model with Adam optimizer with a learning rate of 5e-6, a weight decay of 1e-4, and a batch size of 64 for 10 epochs. |
ActDiff | Train the model with Adam optimizer with a learning rate of 5e-5, a weight decay of 1e-4, and a batch size of 64 for 10 epochs. The loss for the distance between masked and unmasked representations is multiplied by 1e-5. |
GradIA | Train the model with Adam optimizer with a learning rate of 1e-3, a weight decay of 1e-4, and a batch size of 64 for 10 epochs. |
Bounding Box | Train the model with Adam optimizer with a learning rate of 5e-6, a weight decay of 1e-4, and a batch size of 64 for 10 epochs. |
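For the CRAYON-Pruning rows, the sketch below (assumptions noted in the comments) illustrates how pruning irrelevant penultimate-layer neurons and retraining only the last fully connected layer could be implemented with the settings of this block (1,871 pruned neurons, 50 epochs, learning rate 5e-6). The set of irrelevant neuron indices is assumed to be identified by CRAYON beforehand, and model.fc is assumed to be the final layer of a ResNet-style classifier.

import torch
import torch.nn as nn

def prune_and_retrain(model, loader, irrelevant_idx, epochs=50, lr=5e-6):
    # Zero out the final-layer weights that read from the pruned penultimate neurons.
    fc = model.fc  # last fully connected layer (ResNet-style classifier assumed)
    with torch.no_grad():
        fc.weight[:, irrelevant_idx] = 0
    for p in model.parameters():   # freeze everything except the last layer
        p.requires_grad = False
    for p in fc.parameters():
        p.requires_grad = True
    optimizer = torch.optim.Adam(fc.parameters(), lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            loss = ce(model(images), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            with torch.no_grad():  # keep the pruned connections at zero after each update
                fc.weight[:, irrelevant_idx] = 0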
CRAYON-Attention | Fine-tune the classifier for 10 epochs with a batch size of 256 using the SGD optimizer with a learning rate of 5e-6 and a weight decay of 1e-1. The hyperparameters α and β are set to 5000 and 500, respectively. |
CRAYON-Pruning | Prune 407 irrelevant neurons, and train the last layer for 10 epochs with a learning rate of 1e-6. |
CRAYON-Attention+Pruning | For the Backgrounds Challenge, we set α to 1000 and β to 50. We use the SGD optimizer with a learning rate of 5e-5 and a weight decay of 1e-1. |
JtT | Upweight the loss of the misclassified training data by 5 times. Train the model with the Adam optimizer with a learning rate of 1e-5, a weight decay of 1e-1, and a batch size of 256 for 10 epochs. |
MaskTune | Train the Original model with Adam optimizer with a learning rate of 1e-7, a weight decay of 1e-5, and a batch size of 256 for 1 epoch. |
LfF | Train the Original model with SGD optimizer with a learning rate of 1e-4, a weight decay of 1e-1, and a batch size of 256 for 1 epoch. We set the GCE hyperparameter q to 0.7. |
SoftCon | Train an auxiliary BagNet18 model with Adam optimizer with a learning rate of 1e-3 for 20 epochs. Then, refine the Original model using Adam optimizer with a learning rate of 5e-5 and a batch size of 128 for 10 epochs. We set the temperature for the contrastive learning loss to 0.07, the cross-entropy loss weight α to 1e4, and the clipping hyperparameter γ to 0. |
FLAC | Use the same BagNet18 auxiliary model as SoftCon. We refine the Original model with Adam optimizer with a learning rate of 5e-6, a weight decay of 0.1, and a batch size of 128 for 5 epochs and set the FLAC loss weight to 100. |
LC | Refine the Original model with SGD optimizer with a learning rate of 1e-4, a weight decay of 1e-1, and a batch size of 256 for 10 epochs. In parallel, train an auxiliary ResNet50 model with an SGD optimizer with a learning rate of 1e-5. The logit correction hyperparameter η is set to 1, GCE hyperparameter q to 0.8, and temperature to 0.1. |
CnC | Train the model with Adam optimizer with a learning rate of 1e-5, a weight decay of 1e-1, and a batch size of 256 for 10 epochs. |
RRR | Train the model with Adam optimizer with a learning rate of 1e-5, a weight decay of 1e-4, and a batch size of 256 for 10 epochs. We set the RRR loss weight to 0.1. |
GradMask | Train the model with Adam optimizer with a learning rate of 1e-5, a weight decay of 1e-4, and a batch size of 256 for 10 epochs. |
ActDiff | Train the model with Adam optimizer with a learning rate of 1e-5, a weight decay of 1e-4, and a batch size of 256 for 10 epochs. The loss for the distance between masked and unmasked representations is multiplied by 1e-2. |
GradIA | Train the model with Adam optimizer with a learning rate of 1e-6, a weight decay of 1e-4, and a batch size of 256 for 5 epochs. |
Bounding Box | Train the model with Adam optimizer with a learning rate of 1e-5, a weight decay of 1e-4, and a batch size of 256 for 10 epochs. |
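For the JtT rows, the following is a minimal sketch of the upweighted retraining step under the Backgrounds Challenge settings above (5x upweight, Adam, learning rate 1e-5, weight decay 1e-1, batch size 256, 10 epochs). The boolean mask of misclassified examples is assumed to come from the original model's predictions on the training set, and the loader is assumed to yield example indices alongside images and labels; both are assumptions for illustration rather than the authors' implementation.

import torch
import torch.nn as nn

def jtt_refine(model, loader, misclassified, upweight=5.0,
               epochs=10, lr=1e-5, weight_decay=1e-1):
    # misclassified: boolean tensor over training examples (assumed precomputed);
    # loader: assumed to yield (images, labels, indices) with batch_size=256.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    ce = nn.CrossEntropyLoss(reduction="none")
    for _ in range(epochs):
        for images, labels, idx in loader:
            per_example = ce(model(images), labels)
            # Upweight losses of examples the original model misclassified.
            weights = 1.0 + (upweight - 1.0) * misclassified[idx].float()
            loss = (weights * per_example).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()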