super-convergence

Here are the Caffe files of our recent work: Smith, Leslie N. and Nicholay Topin "Super-Convergence: Very Fast Training of Residual Networks Using Large Learning Rates" arXiv preprint arXiv:1708.07120 (2017). Please read the paper for details. In addition, see the paper "Cyclical Learning Rates for Training Neural Networks" at https://arxiv.org/pdf/1506.01186.pdf for instructions on implementing cyclical learning rates in Caffe.

Note: if you have a better theoretical understanding of the cause for super-convergence than the ones described in the paper, please contact [email protected] about a collaboration on a follow up paper.

Instructions:

To simplify the replication of the figures in the paper, a shell script x.sh is included, which we used to replicate our experiments and create the figures in the paper. This execution script shows the changes to each file needed for each run. Below we spell out these changes.

From caffe home directory: ./build/tools/caffe train --solver=$SOLVER -gpu=all

As provided, this solver file trains the CLR network from Figure 1a. Changes must be made to reproduce other experiments, as listed below.

Fig. 1a:
	LR=0.35:
	$SOLVER should be the provided "solver.prototxt". 	
		net: ".../Resnet56Cifar.prototxt"
		test_iter: 200
		test_interval: 100
		display: 100
		lr_policy: "multistep"
		stepvalue: 50000
		stepvalue: 70000
		base_lr: 0.35
		gamma: 0.1
		max_iter: 80000
		weight_decay: 1e-4
		momentum: 0.9

	CLR=0.1-3.0:
	$SOLVER should be the provided "clrsolver.prototxt". 
		net: ".../Resnet56Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0.1
		max_lr: 3.0
		stepsize: 5000
		max_iter: 10000
		weight_decay: 1e-4
		momentum: 0.9

Fig. 1b:
$SOLVER should be the provided "clrsolver.prototxt". 
	Stepsize=10k:
		net: ".../Resnet56Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0.1
		max_lr: 3.0
		stepsize: 10000
		max_iter: 20000
	Stepsize=5k:
		net: ".../Resnet56Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0.1
		max_lr: 3.0
		stepsize: 5000
		max_iter: 10000
	Stepsize=3k:
		net: ".../Resnet56Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0.1
		max_lr: 3.0
		stepsize: 3000
		max_iter: 6000
	Stepsize=1k:
		net: ".../Resnet56Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0.1
		max_lr: 3.0
		stepsize: 1000
		max_iter: 2000
		
Fig. 2a:
	Figured reproduced from Smith [2017] with permission.

Fig. 2b:
$SOLVER should be the provided "lrRangeSolver.prototxt". 
	Max Iter=5k:
		net: ".../Resnet56Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0
		max_lr: 3.0
		stepsize: 5000
		max_iter: 5000
	Max Iter=20k:
		net: ".../Resnet56Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0
		max_lr: 3.0
		stepsize: 20000
		max_iter: 20000
	Max Iter=100k:
		net: ".../Resnet56Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0
		max_lr: 3.0
		stepsize: 100000
		max_iter: 100000

Fig. 3a:
	Figure reproduced from Goodfellow et al. [2014] with permission.
	
Fig. 3b:
	Figure reproduced from Goodfellow et al. [2014] with permission.

Fig. 4a:
$SOLVER should be the provided "lrRangeSolver.prototxt". 
	Single network:
		net: ".../Resnet56Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0
		max_lr: 1.5
		stepsize: 20000
		max_iter: 20000

Fig. 4b:
	Resnet-20:
		net: ".../Resnet20Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0
		max_lr: 1.5
		stepsize: 20000
		max_iter: 20000
	Resnet-110:
		net: ".../Resnet110Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0
		max_lr: 1.5
		stepsize: 20000
		max_iter: 20000

Fig. 5a:
	Need to write out snapshots every iteration for 300 iterations
	Modify L2TermComputation3.py:
		netProto = 
		presnapshot = 
		postsnapshot = 


Fig. 5b:
	Need to write out snapshots every 10th iteration for 10000 iterations
	Modify L2TermComputation4.py:
		netProto = 
		presnapshot = 
		postsnapshot = 

Fig. 6a:
	Same solver settings as Fig. 1a. 
	Training LMDB (or other source) listed within architecture must be re-made with fewer samples.

Fig. 6b:
	$SOLVER should be the provided "solver.prototxt". 	
	Resnet-110 LR=0.35:
		net: ".../Resnet110Cifar.prototxt"
		lr_policy: "multistep"
		stepvalue: 50000
		stepvalue: 70000
		base_lr: 0.35
		gamma: 0.1
		max_iter: 80000
	$SOLVER should be the provided "clrsolver.prototxt". 
	Resnet-110 CLR=0.1-3 SS=10k:
		net: ".../Resnet110Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0.1
		max_lr: 3
		stepsize: 10000
		max_iter: 20000
	$SOLVER should be the provided "solver.prototxt". 	
	Resnet-20 LR=0.35:
		net: ".../Resnet20Cifar.prototxt"
		lr_policy: "multistep"
		stepvalue: 50000
		stepvalue: 70000
		base_lr: 0.35
		gamma: 0.1
		max_iter: 80000
	$SOLVER should be the provided "clrsolver.prototxt". 
	Resnet-20 CLR=0.1-3 SS=10k:
		net: ".../Resnet20Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0.1
		max_lr: 3
		stepsize: 10000
		max_iter: 20000

Fig. 7a:
dataset used by network must be changed to CIFAR-100
$SOLVER should be the provided "lrRangeSolver.prototxt". 
	Single network:
		net: ".../Resnet56Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0
		max_lr: 3.0
		stepsize: 20000
		max_iter: 20000

Fig. 7b:
dataset used by network must be changed to CIFAR-100
	LR=0.35:
	$SOLVER should be the provided "solver.prototxt". 
		net: ".../Resnet56Cifar.prototxt" 
		lr_policy: "multistep"
		stepvalue: 50000
		stepvalue: 70000
		base_lr: 0.35
		gamma: 0.1
		max_iter: 80000
	CLR=0.1-3 SS=5k:
	$SOLVER should be the provided "clrsolver.prototxt". 
		net: ".../Resnet56Cifar.prototxt"
		lr_policy: "triangular"
		base_lr: 0.1
		max_lr: 3.0
		stepsize: 5000
		max_iter: 10000
	
Fig. 8a:
$SOLVER should be the provided "solver.prototxt". 
	LR=0.35:
	All use solver settings from LR=0.35 in Fig. 1a, but with solver type changed.
		type: "Nesterov"
		type: "AdaDelta"
		type: "AdaGrad"  and remove momentum
		type: "Adam"     and base_lr:  0.0035

Fig. 8b:
	LR=0.35:
	$SOLVER should be the provided "solver.prototxt". 
	Same solver settings as Fig. 9a, but with:
		type: "Nesterov"
	CLR=0.1-3 SS=5k:
	$SOLVER should be the provided "clrsolver.prototxt". 
	Same solver settings as Fig. 1a, but with:
		type: "Nesterov"

Fig. 9a:
	Same solver settings as CLR=0.1-3.0 in Fig. 1a, but with batchSize changed within architecture.
	
Fig. 9b:
	Same solver settings as CLR=0.1-3.0 in Fig. 1a, but with dropout ratio changed within architecture.

Fig. 10a:
	Same solver settings as CLR=0.1-3.0 in Fig. 1a, but with momentum changed.
	
Fig. 10b:
	Same solver settings as CLR=0.1-3.0 in Fig. 1a, but with weight_decay changed.
	
Fig. 11a:
$SOLVER should be the provided "clrsolver.prototxt". 
	Single network:
		net: ".../bottleneckResnet56.prototxt"
		lr_policy: "triangular"
		base_lr: 0
		max_lr: 0.61
		stepsize: 50000
		max_iter: 50000

Fig. 11b:
$SOLVER should be the provided "clrsolver.prototxt". 
	Single network:
		net: ".../ResNeXt56.prototxt"
		lr_policy: "triangular"
		base_lr: 0
		max_lr: 0.7
		stepsize: 50000
		max_iter: 50000

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

super-convergence

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
architectures		architectures
L2TermComputation3.py		L2TermComputation3.py
L2TermComputation4.py		L2TermComputation4.py
README.md		README.md
clrsolver.prototxt		clrsolver.prototxt
lrRangeSolver.prototxt		lrRangeSolver.prototxt
solver.prototxt		solver.prototxt
train.sh		train.sh
x.sh		x.sh

deepdad/super-convergence

Folders and files

Latest commit

History

Repository files navigation

super-convergence

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages