Times and TransposeTimes

Frank Seide edited this page Aug 5, 2016 · 20 revisions
A * B
Times (A, B, outputRank=1)
TransposeTimes (A, B, outputRank=1)

The A * B operation has rich semantics for matrices and tensors.

If A and B are rank-2 or rank-1 tensors, A * B computes the ordinary matrix product.

To compute the matrix product A^T * B (with ^T denoting transposition), you could write Transpose (A) * B, but the dedicated function TransposeTimes (A, B) is more efficient. There is no corresponding efficient version of A * Transpose (B).
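The equivalence between the two forms can be checked with a small sketch. This is plain Python, not BrainScript: nested lists stand in for CNTK tensors, and the helper names `transpose`, `matmul`, and `transpose_times` are hypothetical. It only illustrates the math, under the assumption that a fused operation can read A column-wise instead of building a transposed copy; it says nothing about how CNTK implements TransposeTimes internally.

```python
# Illustrative only: hypothetical helpers, nested lists instead of CNTK tensors.
def transpose(a):
    """Return a transposed copy of matrix a."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Ordinary matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose_times(a, b):
    """A^T * B computed directly: C[i][j] = sum_k A[k][i] * B[k][j]."""
    rows = range(len(a))
    return [[sum(a[k][i] * b[k][j] for k in rows) for j in range(len(b[0]))]
            for i in range(len(a[0]))]

A = [[1, 2], [3, 4], [5, 6]]  # 3 x 2
B = [[1, 0], [0, 1], [2, 2]]  # 3 x 2

# Both routes give the same 2 x 2 result.
print(transpose_times(A, B) == matmul(transpose(A), B))
```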

If A and/or B are tensors of higher rank, the * operation denotes a generalized matrix product in which all but the first dimension of A must match the leading dimensions of B; the matched dimensions are interpreted by flattening. For example, the product of an [I x J x K] tensor and a [J x K x L] tensor (abbreviated henceforth as [I x J x K] * [J x K x L]) is reinterpreted by reshaping the two tensors into matrices, [I x (J * K)] * [(J * K) x L], for which the matrix product is defined and yields a result of dimension [I x L]. This makes sense if one considers the rows of a weight matrix to be patterns that activation vectors are matched against. The generalization allows these patterns themselves to be multi-dimensional, such as images or running windows of speech features.
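As a rough illustration of the flattening rule, the sketch below computes [I x J x K] * [J x K x L] in two ways: by summing over the matched dimensions J and K directly, and by reshaping both operands into matrices first. It is plain Python with nested lists, and the function names are hypothetical. The flattening order used here is an arbitrary choice; the result only requires that A and B are flattened consistently, and CNTK's own memory layout may differ.

```python
from itertools import product

# Hypothetical helpers for illustration; nested lists stand in for CNTK tensors.
def generalized_times(a, b, dims):
    """[I x J x K] * [J x K x L] -> [I x L], summing over the matched J and K."""
    I, J, K, L = dims
    return [[sum(a[i][j][k] * b[j][k][l]
                 for j, k in product(range(J), range(K)))
             for l in range(L)]
            for i in range(I)]

def flatten_then_multiply(a, b, dims):
    """Same product, computed by reshaping to [I x (J*K)] * [(J*K) x L]."""
    I, J, K, L = dims
    # Any flattening order works as long as A and B use the same one.
    a_flat = [[a[i][j][k] for j in range(J) for k in range(K)] for i in range(I)]
    b_flat = [[b[j][k][l] for l in range(L)] for j in range(J) for k in range(K)]
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b_flat)]
            for row in a_flat]

dims = (2, 2, 3, 2)  # I, J, K, L
a = [[[0, 1, 2], [2, 3, 4]],
     [[1, 2, 3], [3, 4, 5]]]               # [I x J x K]
b = [[[0, 1], [-1, 0], [-2, -1]],
     [[1, 2], [0, 1], [-1, 0]]]            # [J x K x L]

result = generalized_times(a, b, dims)     # [I x L]
print(result == flatten_then_multiply(a, b, dims))
```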

It is also possible to have more than one non-matched dimension in B. For example, [I x J] * [J x K x L] is interpreted as the matrix product [I x J] * [J x (K * L)], which yields a result of dimensions [I x K x L]. This allows, for instance, applying a matrix to all vectors inside a rolling window of L speech features of dimension J.
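A minimal sketch of this case, again in plain Python with nested lists rather than BrainScript: the hypothetical helper `times_keep_trailing` applies the matrix A to every length-J vector of B and keeps the trailing dimensions K and L of B in the output.

```python
# Hypothetical helper for illustration; nested lists stand in for CNTK tensors.
def times_keep_trailing(a, b):
    """[I x J] * [J x K x L] -> [I x K x L]."""
    J, K, L = len(b), len(b[0]), len(b[0][0])
    # For each position (k, l), multiply A with the vector b[0..J-1][k][l].
    return [[[sum(a[i][j] * b[j][k][l] for j in range(J)) for l in range(L)]
             for k in range(K)]
            for i in range(len(a))]

A = [[1, 0, 0],
     [0, 1, 1]]                      # [I x J] = 2 x 3
B = [[[1, 2], [3, 4]],
     [[5, 6], [7, 8]],
     [[9, 10], [11, 12]]]            # [J x K x L] = 3 x 2 x 2

C = times_keep_trailing(A, B)        # [I x K x L] = 2 x 2 x 2
print(C)
```

The first row of A selects the first slice of B unchanged, and the second row adds the second and third slices, which makes the per-vector application easy to verify by hand.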

If the result of the product should have multiple dimensions (such as arranging a layer's activations as a 2D field), then instead of the * operator one must use Times (A, B, outputRank=m), where m is the number of leading dimensions of A in which the 'patterns' are arranged and which are kept in the output. For example, Times (tensor of dim [I x J x K], tensor of dim [K x L], outputRank=2) is interpreted as the matrix product [(I * J) x K] * [K x L] and yields a result of dimensions [I x J x L].
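The dimension rules described on this page can be summarized in a small shape calculator. This is a sketch in plain Python under the assumption that the first outputRank dimensions of A are kept and the remaining ones must match the leading dimensions of B; the function name `times_shape` is hypothetical and is not part of CNTK, and the real operation may accept or reject cases this sketch does not cover.

```python
from math import prod

# Hypothetical helper mirroring only the rules stated on this page.
def times_shape(a_shape, b_shape, output_rank=1):
    """Return (A as matrix, B as matrix, result shape) for Times(A, B, outputRank)."""
    kept = tuple(a_shape[:output_rank])      # dimensions of A kept in the output
    reduced = tuple(a_shape[output_rank:])   # dimensions of A matched against B
    if tuple(b_shape[:len(reduced)]) != reduced:
        raise ValueError(f"{reduced} does not match the leading dimensions of {b_shape}")
    rest = tuple(b_shape[len(reduced):])     # non-matched dimensions of B
    a_matrix = (prod(kept), prod(reduced))
    b_matrix = (prod(reduced), prod(rest))
    return a_matrix, b_matrix, kept + rest

# With I=2, J=3, K=4, L=5:
print(times_shape((2, 3, 4), (3, 4, 5)))               # [I x J x K] * [J x K x L]
print(times_shape((2, 3), (3, 4, 5)))                  # [I x J] * [J x K x L]
print(times_shape((2, 3, 4), (4, 5), output_rank=2))   # Times (..., outputRank=2)
```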
