Hello. First of all, thank you for sharing the code and experimental results.
Reading the code, I noticed that the model uses the fast weights for inference. According to the Lookahead paper, the fast weights (before synchronization) may perform worse than the slow weights. Synchronization only happens once every k steps, so if evaluation points fall evenly across the k-step cycle, we validate/test with unsynchronized fast weights with probability 1 - 1/k (80% when k = 5). Therefore, it would be better to manually synchronize before evaluation.
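To make the issue concrete, here is a minimal sketch in plain Python (not the repository's optimizer and not a PyTorch API). It assumes a toy Lookahead wrapper around SGD with the update from the paper: every k inner steps, slow <- slow + alpha * (fast - slow), then fast <- slow. The names `Lookahead`, `weights_for_eval`, and `unsynchronized_fraction`, along with the quadratic loss, are hypothetical and for illustration only. Here "synchronizing before evaluation" is read as evaluating with the slow weights while leaving the training state untouched; forcing an early interpolation step would be another option, but that would change the training trajectory.

```python
from typing import Callable, List

Vector = List[float]


class Lookahead:
    """Hypothetical toy Lookahead wrapper around plain SGD (illustration only)."""

    def __init__(self, params: Vector, lr: float = 0.1, k: int = 5, alpha: float = 0.5) -> None:
        self.fast: Vector = list(params)  # fast weights: updated on every inner step
        self.slow: Vector = list(params)  # slow weights: updated once every k steps
        self.lr, self.k, self.alpha = lr, k, alpha
        self.step_count = 0

    def step(self, grad_fn: Callable[[Vector], Vector]) -> None:
        # One inner SGD step on the fast weights.
        grads = grad_fn(self.fast)
        self.fast = [w - self.lr * g for w, g in zip(self.fast, grads)]
        self.step_count += 1
        if self.step_count % self.k == 0:
            self._synchronize()

    def _synchronize(self) -> None:
        # slow <- slow + alpha * (fast - slow), then reset fast <- slow.
        self.slow = [s + self.alpha * (f - s) for s, f in zip(self.slow, self.fast)]
        self.fast = list(self.slow)

    def is_synchronized(self) -> bool:
        # True right after a synchronization step (or before training starts).
        return self.step_count % self.k == 0

    def weights_for_eval(self) -> Vector:
        # Evaluate with the slow weights; fast weights and the step counter are
        # left untouched, so training resumes exactly where it stopped.
        return list(self.slow)


def unsynchronized_fraction(k: int, n_steps: int) -> float:
    # Fraction of steps 1..n_steps that are not a synchronization step
    # (i.e. not a multiple of k), so the fast weights have drifted from the slow ones.
    return sum(1 for t in range(1, n_steps + 1) if t % k != 0) / n_steps


def quadratic_grad(w: Vector) -> Vector:
    # Gradient of the toy loss f(w) = 0.5 * sum(w_i ** 2).
    return list(w)


if __name__ == "__main__":
    opt = Lookahead([1.0], lr=0.1, k=5, alpha=0.5)
    for _ in range(3):
        opt.step(quadratic_grad)
    # Mid-cycle: fast weights have moved, slow weights have not.
    print(opt.is_synchronized(), opt.fast, opt.weights_for_eval())
    print(unsynchronized_fraction(k=5, n_steps=100))  # 0.8
```

Evaluating mid-cycle with `opt.fast` would use the drifted fast weights, whereas `weights_for_eval()` returns the slow weights without disturbing training. The last line reproduces the 1 - 1/k figure for k = 5.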