
Times and TransposeTimes

A * B
Times (A, B, outputRank=1)
TransposeTimes (A, B, outputRank=1)

The Times() function implements the matrix product, with extensions for tensors. The * operator is a short-hand for it. TransposeTimes() computes the same product, but with its first argument transposed.

If A and B are matrices (rank-2 tensors) or column vectors (rank-1 tensors), A * B will compute the common matrix product, just as one would expect.
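For the plain case, a minimal sketch (the names W and x are placeholders, not defined on this page):

# W: an [M x N] weight matrix, x: an [N]-dimensional column vector
y = W * x    # ordinary matrix-vector product; y has dimension [M]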

To compute the matrix product A^T * B (with ^T denoting transposition), you could write Transpose (A) * B, but the special function TransposeTimes (A, B) is more efficient. Note that there is no corresponding efficient version of A * Transpose (B).
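For illustration, a small sketch of the two spellings (W and x are hypothetical names):

# W: an [N x M] matrix, x: an [N]-dimensional column vector
y1 = Transpose (W) * x        # forms the transposed matrix explicitly, then multiplies
y2 = TransposeTimes (W, x)    # same result, without materializing the transpose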

Time sequences

Both A and B can be either single matrices or time sequences. A common case for recurrent networks is that A is a weight matrix, while B is a sequence of inputs.

Note: If A is a time sequence, the operation is not efficient, as it will launch a separate GEMM invocation for every time step. The exception is TransposeTimes() where both inputs are column vectors, for which a special optimization exists.
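To make the common case concrete, a hedged sketch (W and features are hypothetical names):

# W: an [H x I] weight matrix, features: a time sequence of [I]-dimensional input vectors
h = W * features    # W is applied at every time step, yielding a sequence of [H]-dimensional vectors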

Sparse support

Times() and TransposeTimes() support sparse matrices. The result is a dense matrix unless both inputs are sparse. The two most important use cases are:

  • B being a one-hot representation of an input word. Then, A * B denotes a word embedding, where the columns of A are the embedding vectors of the words. The following is a function that embeds a word vector (a brief usage sketch follows this list):

    # Parameter (dim, 0): the embedding matrix; the input dimension 0 is inferred from x
    Embedding (x, dim) = Parameter (dim, 0) * x
    
  • A being a one-hot representation of a label word. The popular cross-entropy (CE) criterion and the error counter can be written using TransposeTimes() as follows, respectively, where z is the input to the top-level Softmax() classifier, and L is the label sequence, which may be sparse:

    CrossEntropyWithSoftmax (L, z) = ReduceLogSum (z) - TransposeTimes (L, z)
    ErrorPrediction         (L, z) = BS.Constants.One - TransposeTimes (L, Hardmax (z))
    
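As mentioned in the first bullet, a brief usage sketch of the Embedding function defined above (w is a hypothetical sparse one-hot input; 300 is just an example dimension):

e = Embedding (w, 300)    # e is a dense embedding vector of dimension [300]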

Multiplying with a scalar

The matrix product cannot be used to multiply a matrix with a scalar. You will get an error about mismatched dimensions. To multiply with a scalar, use the element-wise product .* instead. For example, the weighted average of two matrices could be written like this:

z = Constant (alpha) .* x + Constant (1-alpha) .* y

Extended interpretation of matrix product for tensors of rank > 2

If A and/or B are tensors of higher rank, the * operation denotes a generalized matrix product in which all but the first dimension of A must match the leading dimensions of B; these matched dimensions are interpreted by flattening. For example, a product of an [I x J x K] tensor and a [J x K x L] tensor (which we will abbreviate henceforth as [I x J x K] * [J x K x L]) gets reinterpreted by reshaping the two tensors as the matrices [I x (J * K)] * [(J * K) x L], for which the matrix product is defined and yields a result of dimension [I x L]. This makes sense if one considers the rows of a weight matrix to be patterns that activation vectors are matched against. The above generalization allows these patterns themselves to be multi-dimensional, such as images or running windows of speech features.
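A sketch of the example just described (W and x are hypothetical names; the bracketed shapes are only comments):

# W: an [I x J x K] tensor, x: a [J x K x L] tensor
# computed as the matrix product [I x (J * K)] * [(J * K) x L]
y = W * x    # result has dimensions [I x L]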

It is also possible to have more than one non-matched dimension in B. For example, [I x J] * [J x K x L] is interpreted as the matrix product [I x J] * [J x (K * L)], which yields a result of dimensions [I x K x L]. This allows, for example, applying a matrix to all vectors inside a rolling window of L speech features of dimension J.
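Correspondingly, a sketch of this case (W and window are hypothetical names):

# W: an [I x J] matrix, window: a [J x K x L] tensor, e.g. a rolling window of L frames
# computed as the matrix product [I x J] * [J x (K * L)]
y = W * window    # result has dimensions [I x K x L]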

If the result of the product should have multiple dimensions (such as arranging a layer's activations as a 2D field), then instead of using the * operator, one must say Times (A, B, outputRank=m) where m is the number of dimensions in which the 'patterns' are arranged, and which are kept in the output. For example, Times (tensor of dim [I x J x K], tensor of dim [K x L], outputRank=2) will be interpreted as the matrix product [(I * J) x K] * [K x L] and yield a result of dimensions [I x J x L].
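To illustrate the outputRank parameter with that example (W and x are hypothetical names):

# W: an [I x J x K] tensor, x: a [K x L] tensor
# outputRank=2 keeps the first two axes of W; computed as [(I * J) x K] * [K x L]
y = Times (W, x, outputRank=2)    # result has dimensions [I x J x L]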
