training_config.md
Waterbirds

CRAYON-Attention Fine-tune the Original model for 10 epochs with a batch size of 128, using the Adam optimizer with a learning rate of 5e-5 and a weight decay of 1e-4 (a sketch of this shared fine-tuning setup follows the list below). We set the hyperparameters α and β to 1e7 and 2e5, respectively.
CRAYON-Pruning Prune 1,034 irrelevant neurons in the penultimate layer and train the last fully connected layer for 10 epochs with a learning rate of 5e-5.
CRAYON-Attention+Pruning Fine-tune the Original model for 10 epochs with a batch size of 128, using the Adam optimizer with a learning rate of 5e-5 and a weight decay of 1e-4. We set α to 1e7 and β to 2e5.
JtT Upweight the loss of the misclassified training data by 100 times. Train with Adam optimizer with a learning rate of 5e-5, a weight decay of 1e-4, and a batch size of 128 for 60 epochs.
MaskTune Train with Adam optimizer with a learning rate of 1e-4, a weight decay of 1e-4, and a batch size of 128 for 1 epoch.
LfF Train with the SGD optimizer with a learning rate of 5e-3, a weight decay of 1e-4, and a batch size of 128 for 50 epochs. We set the GCE hyperparameter q to 0.7.
SoftCon Train an auxiliary BagNet18 model with the Adam optimizer with a learning rate of 1e-3 for 20 epochs. Then, refine the Original model using the Adam optimizer with a learning rate of 5e-5 and a batch size of 32 for 10 epochs. We set the temperature for the contrastive learning loss to 0.1, the cross-entropy loss weight to 1, and the clipping hyperparameter to 50.
FLAC Use the same BagNet18 auxiliary model as SoftCon. Then, refine the Original model with the Adam optimizer with a learning rate of 5e-5 for 20 epochs and set the FLAC loss weight to 1000.
LC Refine the Original model with the SGD optimizer with a learning rate of 5e-3, a weight decay of 1e-3, and a batch size of 128 for 50 epochs. In parallel, train an auxiliary ResNet50 model with an SGD optimizer with a learning rate of 5e-4. The logit correction hyperparameter η is set to 1, the GCE hyperparameter q to 0.8, and the temperature to 0.1.
CnC Train the model with Adam optimizer with a learning rate of 1e-5, a weight decay of 1e-4, and a batch size of 128 for 10 epochs.
RRR Train the model with Adam optimizer with a learning rate of 5e-5, a weight decay of 1e-4, and a batch size of 128 for 10 epochs. We set the RRR loss weight to 200.
GradMask Train the model with Adam optimizer with a learning rate of 5e-5, a weight decay of 1e-4, and a batch size of 128 for 10 epochs.
ActDiff Train the model with Adam optimizer with a learning rate of 5e-5, a weight decay of 1e-4, and a batch size of 128 for 10 epochs. The loss for the distance between masked and unmasked representations is weighted by 0.1.
GradIA Train the model with Adam optimizer with a learning rate of 5e-5, a weight decay of 1e-4, and a batch size of 128 for 10 epochs.
Bounding Box Train the model with Adam optimizer with a learning rate of 5e-5, a weight decay of 1e-4, and a batch size of 128 for 10 epochs.
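
Most Waterbirds entries above share the same base fine-tuning recipe (Adam, learning rate 5e-5, weight decay 1e-4, batch size 128, 10 epochs), with only the method-specific loss terms differing. The snippet below is a minimal PyTorch sketch of that shared setup with a plain cross-entropy loss; the `model` and `waterbirds_train` objects are hypothetical placeholders, not names from the released code.

```python
import torch
from torch.utils.data import DataLoader

def finetune(model, waterbirds_train, device="cuda"):
    # Shared recipe for Waterbirds: Adam, lr 5e-5, weight decay 1e-4,
    # batch size 128, 10 epochs. Method-specific losses would be added here.
    loader = DataLoader(waterbirds_train, batch_size=128, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=5e-5, weight_decay=1e-4)
    criterion = torch.nn.CrossEntropyLoss()
    model.to(device).train()
    for epoch in range(10):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```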

Biased CelebA

CRAYON-Attention Fine-tune the Original model for 10 epochs with a batch size of 64, using the Adam optimizer with a learning rate of 1e-5 and a weight decay of 1e-4. The hyperparameters α and β are set to 5e7 and 1e6, respectively.
CRAYON-Pruning Prune 1,871 irrelevant neurons in the penultimate layer and train the last layer for 50 epochs with a learning rate of 5e-6.
CRAYON-Attention+Pruning Fine-tune the Original model for 10 epochs with a batch size of 64, using the Adam optimizer with a learning rate of 1e-5 and a weight decay of 1e-4. The hyperparameters α and β are set to 5e7 and 1e6, respectively.
JtT Upweight the loss of the misclassified training data by 20 times (see the upweighting sketch after this list). Train the model with the Adam optimizer with a learning rate of 1e-5, a weight decay of 1e-1, and a batch size of 64 for 30 epochs.
MaskTune Train the Original model with Adam optimizer with a learning rate of 1e-9, a weight decay of 1e-4, and a batch size of 64 for 1 epoch.
LfF Train the Original model with the SGD optimizer with a learning rate of 5e-2, a weight decay of 1e-4, and a batch size of 64 for 50 epochs. We set the GCE hyperparameter q to 0.7.
SoftCon Train an auxiliary BagNet18 model with the Adam optimizer with a learning rate of 1e-3 for 20 epochs. Then, refine the Original model using the Adam optimizer with a learning rate of 5e-5 and a batch size of 32 for 10 epochs. We set the temperature for the contrastive learning loss to 0.1, the cross-entropy loss weight α to 1, and the clipping hyperparameter γ to 50.
FLAC Use the same BagNet18 auxiliary model as SoftCon. Refine the Original model with the Adam optimizer with a learning rate of 5e-5 for 5 epochs and set the FLAC loss weight to 1000.
LC Refine the Original model with the SGD optimizer with a learning rate of 1e-3, a weight decay of 1e-3, and a batch size of 64 for 50 epochs. In parallel, train an auxiliary ResNet50 model with an SGD optimizer with a learning rate of 1e-4. The logit correction hyperparameter η is set to 1, the GCE hyperparameter q to 0.8, and the temperature to 0.1.
CnC Train the model with Adam optimizer with a learning rate of 1e-5, a weight decay of 1e-1, and a batch size of 64 for 5 epochs.
RRR Train the model with Adam optimizer with a learning rate of 5e-6, a weight decay of 1e-4, and a batch size of 64 for 5 epochs. We set the RRR loss weight to 25000.
GradMask Train the model with Adam optimizer with a learning rate of 5e-6, a weight decay of 1e-4, and a batch size of 64 for 10 epochs.
ActDiff Train the model with Adam optimizer with a learning rate of 5e-5, a weight decay of 1e-4, and a batch size of 64 for 10 epochs. The loss for the distance between masked and unmasked representations is weighted by 1e-5.
GradIA Train the model with Adam optimizer with a learning rate of 1e-3, a weight decay of 1e-4, and a batch size of 64 for 10 epochs.
Bounding Box Train the model with Adam optimizer with a learning rate of 5e-6, a weight decay of 1e-4, and a batch size of 64 for 10 epochs.
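
As a concrete illustration of the JtT entry above, the sketch below shows one way to implement the 20x upweighting of misclassified training examples with a per-sample weighted cross-entropy loss in PyTorch. The two-pass structure follows the JtT description; `original_model`, `model`, `celeba_train`, and the `IndexedDataset` wrapper are assumptions for illustration, not names from the released code.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset

class IndexedDataset(Dataset):
    """Wrap an (image, label) dataset so each item also returns its index."""
    def __init__(self, base):
        self.base = base
    def __len__(self):
        return len(self.base)
    def __getitem__(self, i):
        x, y = self.base[i]
        return x, y, i

def jtt_finetune(original_model, model, celeba_train, device="cuda"):
    # Pass 1: flag the training points the frozen Original model misclassifies.
    id_loader = DataLoader(celeba_train, batch_size=64, shuffle=False)
    original_model.to(device).eval()
    flags = []
    with torch.no_grad():
        for images, labels in id_loader:
            preds = original_model(images.to(device)).argmax(dim=1).cpu()
            flags.append((preds != labels).float())
    weights = 1.0 + 19.0 * torch.cat(flags)  # misclassified -> 20x, correct -> 1x

    # Pass 2: retrain with per-sample weighted cross-entropy
    # (Adam, lr 1e-5, weight decay 1e-1, batch size 64, 30 epochs).
    loader = DataLoader(IndexedDataset(celeba_train), batch_size=64, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-5, weight_decay=1e-1)
    model.to(device).train()
    for epoch in range(30):
        for images, labels, idx in loader:
            images, labels = images.to(device), labels.to(device)
            per_sample = F.cross_entropy(model(images), labels, reduction="none")
            loss = (weights[idx].to(device) * per_sample).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```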

Backgrounds Challenge

CRAYON-Attention Fine-tune the classifier for 10 epochs with a batch size of 256, using the SGD optimizer with a learning rate of 5e-6 and a weight decay of 1e-1. The hyperparameters α and β are set to 5000 and 500, respectively.
CRAYON-Pruning Prune 407 irrelevant neurons in the penultimate layer and train the last layer for 10 epochs with a learning rate of 1e-6 (see the pruning sketch after this list).
CRAYON-Attention+Pruning We set α to 1000 and β to 50, and use the SGD optimizer with a learning rate of 5e-5 and a weight decay of 1e-1.
JtT Upweight the loss of the misclassified training data by 5 times. Train the model with the Adam optimizer with a learning rate of 1e-5, a weight decay of 1e-1, and a batch size of 256 for 10 epochs.
MaskTune Train the Original model with Adam optimizer with a learning rate of 1e-7, a weight decay of 1e-5, and a batch size of 256 for 1 epoch.
LfF Train the Original model with the SGD optimizer with a learning rate of 1e-4, a weight decay of 1e-1, and a batch size of 256 for 1 epoch. We set the GCE hyperparameter q to 0.7.
SoftCon Train an auxiliary BagNet18 model with the Adam optimizer with a learning rate of 1e-3 for 20 epochs. Then, refine the Original model using the Adam optimizer with a learning rate of 5e-5 and a batch size of 128 for 10 epochs. We set the temperature for the contrastive learning loss to 0.07, the cross-entropy loss weight α to 1e4, and the clipping hyperparameter γ to 0.
FLAC Use the same BagNet18 auxiliary model as SoftCon. Refine the Original model with the Adam optimizer with a learning rate of 5e-6, a weight decay of 0.1, and a batch size of 128 for 5 epochs and set the FLAC loss weight to 100.
LC Refine the Original model with the SGD optimizer with a learning rate of 1e-4, a weight decay of 1e-1, and a batch size of 256 for 10 epochs. In parallel, train an auxiliary ResNet50 model with an SGD optimizer with a learning rate of 1e-5. The logit correction hyperparameter η is set to 1, the GCE hyperparameter q to 0.8, and the temperature to 0.1.
CnC Train the model with Adam optimizer with a learning rate of 1e-5, a weight decay of 1e-1, and a batch size of 256 for 10 epochs.
RRR Train the model with Adam optimizer with a learning rate of 1e-5, a weight decay of 1e-4, and a batch size of 256 for 10 epochs. We set the RRR loss weight to 0.1.
GradMask Train the model with Adam optimizer with a learning rate of 1e-5, a weight decay of 1e-4, and a batch size of 256 for 10 epochs.
ActDiff Train the model with Adam optimizer with a learning rate of 1e-5, a weight decay of 1e-4, and a batch size of 256 for 10 epochs. The loss for the distance between masked and unmasked representations is weighted by 1e-2.
GradIA Train the model with Adam optimizer with a learning rate of 1e-6, a weight decay of 1e-4, and a batch size of 256 for 5 epochs.
Bounding Box Train the model with Adam optimizer with a learning rate of 1e-5, a weight decay of 1e-4, and a batch size of 256 for 10 epochs.
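
The sketch below illustrates one way the CRAYON-Pruning entry above could look in PyTorch: zero out the penultimate-layer neurons flagged as irrelevant and retrain only the last fully connected layer for 10 epochs with a learning rate of 1e-6. The `feature_extractor`/`classifier` split, the `irrelevant_idx` list, the batch size of 256, the 2048-dimensional penultimate layer, and the choice of Adam are assumptions for illustration; the list above does not specify them.

```python
import torch
from torch.utils.data import DataLoader

def prune_and_retrain(feature_extractor, classifier, irrelevant_idx,
                      train_set, num_features=2048, device="cuda"):
    # Binary mask over the penultimate features; pruned (irrelevant) neurons are zeroed.
    mask = torch.ones(num_features, device=device)
    mask[irrelevant_idx] = 0.0

    loader = DataLoader(train_set, batch_size=256, shuffle=True)
    optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-6)
    criterion = torch.nn.CrossEntropyLoss()

    feature_extractor.to(device).eval()   # backbone stays frozen
    classifier.to(device).train()         # only the last FC layer is trained
    for epoch in range(10):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            with torch.no_grad():
                feats = feature_extractor(images) * mask  # prune irrelevant neurons
            loss = criterion(classifier(feats), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return classifier
```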