Use `train` to run training (`deepclrun` in v5.8.3 and below).
- Syntax is based on that specified in Ciresan et al's Multi-column Deep Neural Networks for Image Classification, section 3, first paragraph:
- network is defined by a string like: `100C5-MP2-100C5-MP2-100C4-MP2-300N-100N-6N`
  - `100c5` means: a convolutional layer, with 100 filters, each 5x5
  - adding `z` to a convolutional layer makes it zero-padded, eg `8c5z` is: a convolutional layer, with 8 filters, each 5x5, zero-padded
  - `mp2` means a max-pooling layer, over non-overlapping regions of 2x2
  - `300n` means a fully connected layer with 300 hidden units
  - `relu` means a relu layer
  - `tanh` means a tanh layer
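As an illustration, the example string above can be read layer by layer, using just the definitions given in the list:

```
# 100C5-MP2-100C5-MP2-100C4-MP2-300N-100N-6N, read left to right:
# 100C5 -> convolutional layer, 100 filters, each 5x5
# MP2   -> max-pooling over non-overlapping 2x2 regions
# 100C5 -> convolutional layer, 100 filters, each 5x5
# MP2   -> max-pooling over non-overlapping 2x2 regions
# 100C4 -> convolutional layer, 100 filters, each 4x4
# MP2   -> max-pooling over non-overlapping 2x2 regions
# 300N  -> fully connected layer, 300 hidden units
# 100N  -> fully connected layer, 100 hidden units
# 6N    -> fully connected layer, 6 output neurons
```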
- Thus, you can do, for example:
  `./train netdef=8c5z-relu-mp2-16c5z-relu-mp3-150n-tanh-10n learningrate=0.002 dataset=mnist`
  ... in order to learn mnist, using the same neural net architecture as used in the convnetjs mnist demo
- Similarly, you can learn NORB, using approximately the architecture specified in lecun-04, by doing:
  `./train netdef=8c5-relu-mp4-24c6-relu-mp3-80c6-relu-5n learningrate=0.0001 dataset=norb`
- Or, you can train NORB using the very deep, broad architecture specified by Ciresan et al in Flexible, High Performance Convolutional Neural Networks for Image Classification:
  `./train netdef=MP3-300C6-RELU-MP2-500C4-RELU-MP4-500N-TANH-5N learningrate=0.0001 dataset=norb`
- eg `-32c5` is a convolutional layer with 32 filters of 5x5; `-32c5z` is a convolutional layer with zero-padding, of 32 filters of 5x5
- eg `-150n` is a fully connected layer, with 150 neurons
- eg `-mp3` will add a max-pooling layer, over 3x3 non-overlapping regions. The number is the size of the regions, and can be modified
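As a sketch, these layer types combine into a single netdef; the filter counts and learning rate here are illustrative choices, not recommendations:

```
# illustrative: two zero-padded conv layers with max-pooling, followed by fully connected layers
./train netdef=32c5z-relu-mp2-64c5z-relu-mp3-150n-tanh-10n learningrate=0.002 dataset=mnist
```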
- Simply add `-drop` into the netdef string; this will use a dropout ratio of 0.5
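For example, a sketch adding dropout to the mnist architecture shown earlier; placing `-drop` just before the final layer is an illustrative choice, not a requirement:

```
# illustrative: same mnist net as above, with a dropout layer (ratio 0.5) before the output layer
./train netdef=8c5z-relu-mp2-16c5z-relu-mp3-150n-tanh-drop-10n learningrate=0.002 dataset=mnist
```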
- Simply add any of the following into the netdef string: `-tanh`, `-sigmoid`, `-relu`, `-elu`
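For instance, an illustrative variation of the earlier mnist example, using `elu` activations throughout:

```
# illustrative: replace the relu/tanh activations with elu
./train netdef=8c5z-elu-mp2-16c5z-elu-mp3-150n-elu-10n learningrate=0.002 dataset=mnist
```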
- `RP24` means a random patch layer, which will cut a 24x24 patch from a random position in each incoming image, and send that to its output. During testing, the patch will be cut from the centre of each image
- `RT2` means a random translations layer, which will translate the image randomly during training, up to 2 pixels, in either direction, along both axes. Can specify any non-negative integer, less than the image size
- During testing, no translation is done
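A sketch using these augmentation layers in a netdef; placing them before the first convolutional layer is an illustrative choice:

```
# illustrative: cut a random 24x24 patch from each incoming image during training
./train netdef=rp24-8c5z-relu-mp2-16c5z-relu-mp3-150n-tanh-10n learningrate=0.002 dataset=mnist
# illustrative: randomly translate each training image by up to 2 pixels along both axes
./train netdef=rt2-8c5z-relu-mp2-16c5z-relu-mp3-150n-tanh-10n learningrate=0.002 dataset=mnist
```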
- You can train several neural networks at the same time, and predict using the average output across all of them, using the `multinet` option
- Simply add eg `multinet=3` in the commandline, to train across 3 nets in parallel, or put a number of your choice
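For example, a sketch training three copies of the earlier mnist net in parallel (architecture and learning rate are illustrative):

```
# illustrative: train 3 nets at the same time, and predict using their averaged output
./train netdef=8c5z-relu-mp2-16c5z-relu-mp3-150n-tanh-10n learningrate=0.002 dataset=mnist multinet=3
```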
- simply prefix a layer with eg `3*` to repeat it. `3*` will repeat the layer 3 times, and similar for other numbers, eg:
  `./train netdef=6*(32c5z-relu)-500n-361n learningrate=0.0001 dataset=kgsgoall`
  ... will create 6 convolutional layers of 32 5x5 filters each.
- you can also use parentheses `(...)` to repeat multiple layers, eg:
  `./train netdef=3*(32c5z-relu-mp2)-150n-10n`
  ... will be expanded to:
  `./train netdef=32c5z-relu-mp2-32c5z-relu-mp2-32c5z-relu-mp2-150n-10n`
- Simply pass in the filename of the data file with the images in
- Filetype will be detected automatically
- See Loaders for information on available loaders
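For instance, a sketch using the data-file options from the table below; the directory and filenames here are placeholders, taken from the table's own examples:

```
# illustrative: point training at explicit data files instead of a dataset profile
./train datadir=../data/norb trainfile=train-dat.mat validationfile=validate-dat.mat netdef=8c5-relu-mp4-24c6-relu-mp3-80c6-relu-5n learningrate=0.0001
```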
- By default, weights will be written to `weights.dat`, after each epoch
- You can add option `writeweightsinterval=5` to write weights every 5 minutes, even if the epoch hasn't finished yet. Just replace `5` with the number of minutes between each write
- If you specify option `loadweights=1`, the weights will be loaded at the start
- You can change the weights filepath with option eg `weightsfile=somefilename.dat`
- If you specify option `loadweights=1`, the `netdef` will be compared to that used to generate the current weights file: if it is different, then DeepCL will ask you if you're sure you want to continue, to avoid corrupting the weights file
- Epoch number, batch number, batch loss, and batch numcorrect will all be loaded from where they left off, from the weights file, so you can freely stop and start training, without losing the training
- be sure to use the `writeweightsinterval=5` option if you are going to stop/start often, with long epochs, to avoid losing hours/days of training!
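A sketch of a stop/resume workflow using the options above (the filename and interval are illustrative):

```
# first run: store weights in a custom file, checkpointing every 5 minutes
./train netdef=8c5z-relu-mp2-16c5z-relu-mp3-150n-tanh-10n dataset=mnist weightsfile=mnist-weights.dat writeweightsinterval=5
# later run: resume from where training left off (netdef and data files must match the first run)
./train netdef=8c5z-relu-mp2-16c5z-relu-mp3-150n-tanh-10n dataset=mnist weightsfile=mnist-weights.dat writeweightsinterval=5 loadweights=1
```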
Option | Description |
---|---|
gpuindex=1 | choose which gpu device to use. Default -1 means first gpu, or else cpu. Otherwise, gpu index from 0 |
dataset=norb | sets datadir, trainfile and validatefile according to one of several dataset profiles. Current choices: mnist, norb, cifar10, kgsgo, kgsgoall |
datadir=../data/mnist | path to data files |
trainfile=train-dat.mat | name of training data file, the one with the images in. Note that the labels file will be determined automatically, based on the data filename and type, eg in this case train-cat.mat |
validationfile=validate-dat.mat | name of the validation data file, the one with the images in. Note that the labels file will be determined automatically, based on the data filename and type, eg in this case validate-cat.mat |
numtrain=1000 | only uses the first 1000 training samples |
numtest=1000 | only uses the first 1000 testing samples |
netdef=100c5-10n | provide the network definition, as documented in Commandline usage above |
weightsinitializer=uniform | choose weight initializer. valid choices: original, uniform (default: original) |
initialweights=10 | set size of initial weights, sampled uniformly from the range +/- initialweights divided by fanin. used by uniform initializer (default: 1.0) |
trainer=sgd | choose trainer. valid choices are sgd, anneal, nesterov, adagrad, or rmsprop. (default: sgd) |
learningrate=0.0001 | specify learning rate. works with any trainer, except adadelta |
momentum=0.1 | specify momentum (default: 0). works with sgd and nesterov trainers |
rho=0.9 | rho decay, from equation 1 of adadelta paper, http://arxiv.org/pdf/1212.5701v1.pdf (default: 0.9) |
weightdecay=0.001 | weight decay, 0 means no decay, 1 means complete decay (default:0). works with sgd trainer |
anneal=0.95 | anneal learning. 1 means no annealing. 0 means learningrate is 0 (default:1). works with anneal trainer |
numepochs=20 | train for this many epochs |
batchsize=128 | size of each mini-batch. Too big, and the learning rate will need to be reduced. Too small, and performance will decrease. 128 might be a reasonable compromise |
normalization=maxmin | can choose maxmin or stddev. Default is stddev |
normalizationnumstds=2 | how many standard deviations from mean should be +1/-1? Default is 2 |
normalizationexamples=50000 | how many examples to read, to determine normalization values |
multinet=3 | train 3 networks at the same time, and predict using average output from all 3, can put any integer greater than 1 |
loadondemand=1 | Load the file in chunks, as learning proceeds, to reduce memory requirements. Default 0 |
filebatchsize=50 | When loadondemand=1, load this many batches at a time. Numbers larger than 1 increase efficiency of disk reads, speeding up learning, but use up more memory |
weightsfile=weights.dat | file to store weights in, after each epoch. If blank, then weights not stored |
writeweightsinterval=5 | write the weights to file every 5 minutes of training, even if epoch hasn't finished yet. Default is 0, ie only write weights after each epoch |
loadweights=1 | load weights at start, from weightsfile. Current training config, ie netdef and trainfile, should match that used to create the weightsfile. Note that epoch number will continue from file, so make sure to increase numepochs sufficiently |
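As a final sketch, several of the options above combined into one command; all of the values are illustrative:

```
# illustrative: train on the first 10000 mnist samples, with the nesterov trainer, on gpu 0
./train dataset=mnist netdef=8c5z-relu-mp2-16c5z-relu-mp3-150n-tanh-10n trainer=nesterov learningrate=0.0001 momentum=0.9 numtrain=10000 batchsize=128 numepochs=20 gpuindex=0
```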
Use `predict` to run prediction (`deepclexec` in v5.8.3 and below)