Parameters And Constants
Creates a scalar, vector, matrix, or tensor of learnable parameters.
ParameterTensor {shape,
init='uniform'/*|gaussian|...*/, initOutputRank=1, initValueScale=1.0, randomSeed=-1,
initValue=0.0, initFromFilePath='',
learningRateMultiplier=1.0}
- shape: shape (dimensions) of the parameter as an array, e.g. (13:42) to create a matrix with 13 rows and 42 columns. For some operations, dimensions given as 0 are automatically inferred (see below).
- init (default 'uniform'): specifies random initialization, e.g. init='heNormal' (see below).
- initOutputRank (default 1): specifies the number of leading fan-out axes. If negative, its absolute value is the number of trailing fan-out axes (see below).
- initValueScale (default 1): additional scaling factor applied to the random initialization values.
- randomSeed (default -1): if positive, use this random seed for random initialization. If negative, use a counter that is incremented for each ParameterTensor{}.
- initValue: specifies initialization with a constant value, e.g. initValue=0.
- initFromFilePath: specifies initialization by loading initial values from a file, e.g. initFromFilePath="my_init_vals.txt".
- learningRateMultiplier (default 1.0): the system learning rate is scaled by this factor (0 disables learning for this parameter; see below).
Return value: a tensor of learnable parameters.
This factory function creates a scalar, vector, matrix, or tensor of learnable parameters, that is, a tensor that is recognized by the "train" action as containing parameters that shall be updated during training.
The values will be initialized, depending on which optional parameter is given, to
- random numbers, if init is given;
- a constant, if initValue is given; or
- a tensor read from an external input file, if initFromFilePath is given.
The default is init="uniform".
To create a scalar, vector, matrix, or tensor with rank>2, pass the following as the shape parameter:
- (1) for a scalar;
- (M) for a column vector with M elements;
- (1:N) for a row vector with N elements (row vectors are one-row matrices);
- (M:N) for a matrix with M rows and N columns; and
- (I:J:K...) for a tensor of arbitrary rank>2 (note: the maximum allowed rank is 12).
When a ParameterTensor is used for weights as an immediate input of specific operations, some of its dimensions may be specified as 0. For example, in the matrix product ParameterTensor{42:0} * x, the second dimension is automatically inferred to be equal to the dimension of x.
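The inference rule for the matrix product case can be illustrated with a small Python sketch. It covers only the shape logic for a two-axis weight times a one-axis input; infer_times_weight_shape is a hypothetical helper written for this illustration, not a CNTK function, and the input dimension 784 is an arbitrary example value.

```python
def infer_times_weight_shape(weight_shape, input_shape):
    """Fill in a 0 input dimension of an [M x N] weight used as W * x.

    Hypothetical helper: mirrors the rule that the second (input)
    dimension of W must equal the dimension of x.
    """
    if len(weight_shape) != 2 or len(input_shape) != 1:
        raise ValueError("sketch only handles a matrix times a vector")
    out_dim, in_dim = weight_shape
    if out_dim == 0:
        raise ValueError("the output dimension cannot be inferred from x")
    if in_dim == 0:
        in_dim = input_shape[0]  # inferred from the input vector
    elif in_dim != input_shape[0]:
        raise ValueError(f"dimension mismatch: {in_dim} vs {input_shape[0]}")
    return (out_dim, in_dim)


# ParameterTensor{42:0} * x, where x has (example) dimension 784
print(infer_times_weight_shape((42, 0), (784,)))  # (42, 784)
```

Only the output dimension (42) has to be written down; the input dimension comes from whatever x turns out to be.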
This is extremely handy for inputs of layers, as it frees the user's BrainScript code from the burden of passing around the input dimensions. Further, in some situations it is very difficult to know the precise input dimensions of a layer, for example for the first fully connected layer on top of a pyramid of convolution/pooling combinations without padding, where each convolution and pooling operation may drop rows or columns of boundary pixels, and strides scale the dimensions.
This feature is what allows CNTK's predefined layers to be specified by their output dimension only (e.g. DenseLayer{1024}).
Random initialization is selected by the init parameter, which chooses between uniform and normal distributions, where the range/standard deviation is computed as a function of fan-in and fan-out:
value of init | distribution | range/standard deviation |
---|---|---|
'heNormal' | normal | sqrt (2 / fanIn) |
'heUniform' | uniform | sqrt (6 / fanIn) |
'glorotNormal' | normal | sqrt (2 / (fanIn+fanOut)) |
'glorotUniform' | uniform | sqrt (6 / (fanIn+fanOut)) |
'xavier' | uniform | sqrt (3 / fanIn) |
'uniform' | uniform | 1/20 |
'gaussian' | normal | sqrt (0.04 / fanIn) |
'zero' | n/a | 0 |
(Here, 'zero' is a sometimes convenient alternative to specifying initValue=0.)
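The table can be read as a small lookup from the init value to a distribution and a scale. The Python sketch below transcribes it and also applies initValueScale as an extra factor. It assumes that a uniform "range" r means values drawn from [-r, r]; that interpretation, and the helper names init_scale and sample, are assumptions for illustration, not CNTK's implementation.

```python
import math
import random

# Transcription of the table above: init value -> (distribution, scale function).
# The scale is the range for uniform and the standard deviation for normal.
INIT_TABLE = {
    "heNormal":      ("normal",  lambda fan_in, fan_out: math.sqrt(2 / fan_in)),
    "heUniform":     ("uniform", lambda fan_in, fan_out: math.sqrt(6 / fan_in)),
    "glorotNormal":  ("normal",  lambda fan_in, fan_out: math.sqrt(2 / (fan_in + fan_out))),
    "glorotUniform": ("uniform", lambda fan_in, fan_out: math.sqrt(6 / (fan_in + fan_out))),
    "xavier":        ("uniform", lambda fan_in, fan_out: math.sqrt(3 / fan_in)),
    "uniform":       ("uniform", lambda fan_in, fan_out: 1 / 20),
    "gaussian":      ("normal",  lambda fan_in, fan_out: math.sqrt(0.04 / fan_in)),
    "zero":          (None,      lambda fan_in, fan_out: 0.0),
}


def init_scale(init, fan_in, fan_out, init_value_scale=1.0):
    """Range or standard deviation from the table, times initValueScale."""
    _, scale_fn = INIT_TABLE[init]
    return init_value_scale * scale_fn(fan_in, fan_out)


def sample(init, fan_in, fan_out, init_value_scale=1.0, rng=random):
    """Draw one initial value (assumes a uniform range r means [-r, r])."""
    dist, _ = INIT_TABLE[init]
    s = init_scale(init, fan_in, fan_out, init_value_scale)
    if dist == "uniform":
        return rng.uniform(-s, s)
    if dist == "normal":
        return rng.gauss(0.0, s)
    return 0.0  # 'zero'


print(init_scale("heNormal", 200, 10))  # approximately 0.1, i.e. sqrt(2 / 200)
```

Because the scale shrinks with fan-in, wider layers start with smaller weights, which is the point of computing fan-in and fan-out at all.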
Random initialization assumes that the parameters are part of some form of matrix-product like operation which has a well-defined fan-in and fan-out, which are used in determining the scaling of the random values per above table. By default, the first axis is considered fan-out, and the remaining axis/axes are fan-in, matching semantics of the regular matrix product.
The optional parameter initOutputRank can be used to specify the number of leading axes that should be considered fan-out.
For example, a matrix product in CNTK's extended tensor interpretation that maps a [K]-dimensional vector x to an [I x J]-dimensional rank-2 object can be written as Times (W, x, outputRank=2), where W has the shape [I x J x K].
Here, initOutputRank=2 specifies that, when scaling the random initialization values, the fan-out is I*J and the fan-in is K.
Negative values for initOutputRank indicate that the fan-out axes are trailing axes. For example, the filter kernel of the ConvolutionalLayer{} and the underlying Convolution() operation for a typical image-processing setup has the shape [W x H x C x K], where K is the fan-out and the fan-in is W*H*C.
This is specified by initOutputRank=-1.
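The split of a shape into fan-in and fan-out for positive and negative initOutputRank can be sketched in Python as below. The function name fan_in_out is hypothetical, and treating a shape whose axes are all fan-out as having fan-in 1 is an assumption of this sketch; the example shapes are illustrative values.

```python
from math import prod


def fan_in_out(shape, init_output_rank=1):
    """Split a parameter shape into (fanIn, fanOut).

    Positive initOutputRank: that many leading axes are fan-out.
    Negative initOutputRank: -initOutputRank trailing axes are fan-out.
    All remaining axes are fan-in (an empty product counts as 1).
    """
    if init_output_rank == 0 or abs(init_output_rank) > len(shape):
        raise ValueError("initOutputRank out of range for this shape")
    if init_output_rank > 0:
        fan_out_axes = shape[:init_output_rank]
        fan_in_axes = shape[init_output_rank:]
    else:
        fan_out_axes = shape[init_output_rank:]
        fan_in_axes = shape[:init_output_rank]
    return prod(fan_in_axes), prod(fan_out_axes)


print(fan_in_out((512, 256)))          # (256, 512): default, first axis is fan-out
print(fan_in_out((3, 4, 5), 2))        # (5, 12): [I x J x K] with initOutputRank=2
print(fan_in_out((5, 5, 3, 64), -1))   # (75, 64): [W x H x C x K] with initOutputRank=-1
```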
The initial values can be read from a text file. To do this, pass a pathname for the optional parameter initFromFilePath.
The text file is expected to consist of one line per matrix row, each consisting of space-separated numbers, one per column.
The row and column dimensions in the file must match shape.
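A file in the layout just described (one line per row, space-separated values, one per column) can be produced and checked with a short Python sketch. The helpers write_matrix and read_matrix are hypothetical and only handle the two-dimensional case; the sketch does not reproduce CNTK's own file reader, and which number formats CNTK accepts beyond plain decimals is not covered here.

```python
import os
import tempfile


def write_matrix(path, rows):
    """Write one line per matrix row, values separated by single spaces."""
    with open(path, "w") as f:
        for row in rows:
            f.write(" ".join(repr(float(v)) for v in row) + "\n")


def read_matrix(path, shape):
    """Read the file back and check it against the expected (rows, cols) shape."""
    with open(path) as f:
        rows = [[float(tok) for tok in line.split()] for line in f if line.strip()]
    n_rows, n_cols = shape
    if len(rows) != n_rows or any(len(r) != n_cols for r in rows):
        raise ValueError(f"file does not match shape {shape}")
    return rows


# Round-trip a 2 x 3 matrix (example values) through a temporary file.
path = os.path.join(tempfile.mkdtemp(), "my_init_vals.txt")
write_matrix(path, [[1, 2, 3], [4, 5, 6]])
print(read_matrix(path, (2, 3)))  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```

Such a file would then match a parameter declared with shape (2:3).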
Parameter-specific learning rates can be realized with the optional learningRateMultiplier parameter.
This factor is multiplied with the actual learning rate when performing parameter updates.
For example, if it is specified as 0, the parameter will not be updated; it is effectively constant.
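The effect of the multiplier can be seen in a plain stochastic-gradient-descent update, sketched in Python below. Plain SGD is chosen here only for illustration, and sgd_step is a hypothetical helper; CNTK's actual learners (with momentum and other options) may combine the factor with further terms, and the text above only states that it scales the learning rate.

```python
def sgd_step(params, grads, learning_rate, lr_multiplier=1.0):
    """Plain SGD update where the learning rate is scaled by lr_multiplier."""
    effective_lr = learning_rate * lr_multiplier  # 0 -> parameters stay fixed
    return [p - effective_lr * g for p, g in zip(params, grads)]


params = [1.0, 2.0]
grads = [0.5, -0.5]
print(sgd_step(params, grads, learning_rate=0.1))                     # both values move
print(sgd_step(params, grads, learning_rate=0.1, lr_multiplier=0.0))  # [1.0, 2.0]
```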
Create a constant tensor.
Constant {scalarValue, rows = 1, cols = 1}