The Beautiful Einstein Summations

Einstein summations are a powerful mechanism for array operations. Major scientific computing and neural network libraries like numpy, TensorFlow, and PyTorch have built-in support for performing linear algebraic array operations using the Einstein summation convention. This article is a gentle introduction to using Einstein summations for such operations. These libraries provide highly efficient implementations of Einstein summations, so einsum-based code typically runs much faster than hand-written alternatives.

Notation

Vectors will be denoted using lowercase letters \(u, v, w, x, y, z\). Scalars will also be denoted using lowercase letters \(a, b, c, m, n, s\), etc. The letters \(i, j, k, l\) will be used for indices. Matrices will be denoted using capital letters \(A, B, C\), etc.

Elements of a vector will be denoted using numpy notation \(v[i]\). Elements, rows and columns of a matrix will also be denoted using numpy notation \(A[i,j]\), \(A[i,:]\) and \(A[:, j]\).
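
For instance, with a small illustrative matrix (and numpy imported as np, as done later in this article):

>>> A = np.array([[1, 2], [3, 4]])
>>> A[0, 1]   # element in row 0, column 1
2
>>> A[0, :]   # row 0
array([1, 2])
>>> A[:, 1]   # column 1
array([2, 4])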

Einstein Notation

Consider a matrix multiplication operation \(C = A B\). Individual elements of \(C\) are computed by the summation:

\[C[i, k] = \sum_{j} A[i, j] B [j, k]\]

Einstein notation simply drops the summations from the equation:

\[C[i, k] = A[i, j] B [j, k]\]

The idea is simple: if an index appears on the R.H.S. but not on the L.H.S., then the expression on the R.H.S. is summed over that index. In the matrix multiplication example above, since the index \(j\) appears on the R.H.S. and not on the L.H.S., the expression \(A[i, j] B[j, k]\) is summed over \(j\).

The equation \(s = v[i]\) would mean that we need to sum over all the elements of a vector \(v\).

The equation \(s = A[i,j]\) would mean that we need to sum over all the elements of the matrix \(A\).

The equation \(C[i, j] = A [i, j] B [i, j]\) would mean that we need to multiply \(A\) and \(B\) element-wise to compute \(C\).

The equation \(v[i] = A[i, j]\) would mean that each element of the vector \(v\) is the sum of elements in corresponding row of matrix \(A\), i.e. row wise sum of \(A\).
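
As a concrete check of the last equation, take

\[A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\]

Then \(v[0] = A[0, 0] + A[0, 1] = 3\) and \(v[1] = A[1, 0] + A[1, 1] = 7\), i.e. \(v = (3, 7)\): the row-wise sums of \(A\).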

In this article, we will explore Einstein summations using numpy.

einsum

numpy.einsum is a general-purpose workhorse function in numpy for computing summations following the Einstein notation, a.k.a. Einstein summations. For example, the matrix multiplication \(C = AB\) is computed as einsum('ij,jk->ik', A, B).

Note

The explanation may seem heavy on a first read. Read it once, read all the examples in the following sections, then come back and read this section again.

To understand this function, we need to understand the subscript specification in the first argument, ij,jk->ik. Let’s revisit the matrix multiplication equation:

\[C[i, k] = A [i, j] B [j, k]\]
  • There are two arguments to matrix multiplication: \(A\) and \(B\).

  • \(A\) is indexed by \(i\) (for rows) and \(j\) (for columns).

  • \(B\) is indexed by \(j\) (for rows) and \(k\) (for columns).

  • The output of the operation is the matrix \(C\).

  • \(C\) is indexed by \(i\) (for rows) and \(k\) (for columns).

Now let’s parse ij,jk->ik.

  • The Einstein summation einsum('ij,jk->ik', A, B) has two arguments \(A\) and \(B\).

  • The specification ij,jk->ik describes how the summation is to proceed.

  • There are two parts: ij,jk and ik separated by ->. Input -> Output.

  • ik are the indices for the output of the summation.

  • ij,jk are the indices for the input arguments of the summation.

  • The comma separates indices for different input arguments. Thus, in this case, there are two arguments.

  • Splitting ij,jk at the comma: ij are the indices for the first argument \(A\), and jk are the indices for the second argument \(B\).

  • Since there are two letters for each argument, both arguments are assumed to be two-dimensional arrays (i.e. matrices).

  • The first letter refers to the index for the rows of a matrix and the second letter refers to the index for the columns of the same matrix.

  • Thus, it translates to A[i, j] * B[j, k] and the result is stored in C[i,k].
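
As a sanity check, here is a minimal sketch of the naive triple loop that the specification ij,jk->ik encodes (with numpy imported as np, as in the examples below; the loop variables mirror the subscript letters, and einsum's optimized implementation should of course be preferred in practice):

>>> def matmul_loops(A, B):
...     """Naive loops spelling out einsum('ij,jk->ik', A, B)."""
...     C = np.zeros((A.shape[0], B.shape[1]), dtype=A.dtype)
...     for i in range(A.shape[0]):          # free index i: rows of C
...         for k in range(B.shape[1]):      # free index k: columns of C
...             for j in range(A.shape[1]):  # repeated index j is summed over
...                 C[i, k] += A[i, j] * B[j, k]
...     return C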

We will now look at all kinds of linear algebraic array operations possible via einsum.

Load the numpy library first:

>>> import numpy as np

In the examples below:

  • For each linear algebraic operation, we first present the underlying mathematical equation in Einstein notation.

  • We then present its implementation using einsum.

  • We also contrast it with the standard numpy functions for the same operation.

Vector Operations

Let \(u, v\) be vectors of length \(n=3\).

>>> u = np.array([0, 1, 4])
>>> v = np.array([1, 2, 3])

Sum of All Elements of a Vector

In normal mathematical notation, we say \(s = \sum_{i=0}^{n-1} v[i]\). In Einstein notation, we say \(s = v[i]\).

>>> np.einsum("i->", u)
5
>>> np.einsum("i->", v)
6
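
For comparison, the standard numpy function:

>>> np.sum(u)
5
>>> np.sum(v)
6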

Element-wise Multiplication

\[w[i] = u[i] v[i]\]

In this case, no summation happens on the R.H.S., since every index on the R.H.S. also appears on the L.H.S.

>>> w = np.einsum("i,i->i", u, v)
>>> w
array([ 0,  2, 12])
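
For comparison, numpy's element-wise multiplication operator gives the same result:

>>> u * v
array([ 0,  2, 12])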

Inner Product

Normal mathematical notation:

\[s = \langle u, v \rangle = \sum_{i=0}^{n-1} u[i] v[i]\]

Einstein notation:

\[s = \langle u, v \rangle = u[i] v[i]\]
>>> np.einsum("i,i->", u, v)
14

>>> np.inner(u, v)
14

>>> np.dot(u, v)
14

Outer Product

\[A[i,j] = u[i] v[j]\]
>>> np.outer(u, v)
array([[ 0,  0,  0],
       [ 1,  2,  3],
       [ 4,  8, 12]])

>>> np.einsum('i,j->ij', u, v)
array([[ 0,  0,  0],
       [ 1,  2,  3],
       [ 4,  8, 12]])

Matrix Operations

Let’s construct two square matrices \(A, B\) for the examples below.

>>> A = np.array([[0,1,2],[3,4,5],[0,1,2]])
>>> B = np.array([[-1,1,-2],[3,2,-3],[0,-1,1]])
>>> A
array([[0, 1, 2],
       [3, 4, 5],
       [0, 1, 2]])
>>> B
array([[-1,  1, -2],
       [ 3,  2, -3],
       [ 0, -1,  1]])

Matrix Transpose

The expression \(C = A^T\) is computed as:

\[C[j, i] = A [i, j]\]
>>> np.einsum('ij->ji', A)
array([[0, 3, 0],
       [1, 4, 1],
       [2, 5, 2]])

>>> A.T
array([[0, 3, 0],
       [1, 4, 1],
       [2, 5, 2]])

Main Diagonal

The main diagonal of a matrix \(A\) can be extracted as:

\[v[i] = A[i,i]\]
>>> np.einsum('ii->i', A)
array([0, 4, 2])

>>> np.diag(A)
array([0, 4, 2])

Trace

The trace of a matrix is the sum of its diagonal elements.

\[\text{Trace}(A) = \sum_{i} A[i, i]\]

In Einstein notation:

\[\text{Trace}(A) = A[i, i]\]
>>> np.einsum('ii->', A)
6
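
For comparison, the standard numpy function:

>>> np.trace(A)
6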

Sum of Elements

The sum of all elements of a matrix can be expressed as \(\sum_{i} \sum_{j} A[i, j]\). In Einstein notation:

\[s = A[i, j]\]
>>> np.einsum('ij->', A)
18
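
For comparison, the standard numpy function:

>>> np.sum(A)
18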

Column-wise Sum

\[v[j] = A[i, j] = \sum_i A[i, j]\]
>>> np.sum(A, axis=0)
array([3, 6, 9])

>>> np.einsum('ij->j', A)
array([3, 6, 9])

Row-wise Sum

\[v[i] = A[i, j] = \sum_j A[i, j]\]
>>> np.sum(A, axis=1)
array([ 3, 12,  3])

>>> np.einsum('ij->i', A)
array([ 3, 12,  3])

Matrix Multiplication

We can consider four possibilities:

  1. \(C = A B\)

  2. \(C = A^T B\)

  3. \(C = A B^T\)

  4. \(C = A^T B^T = (BA)^T\)

\(C = A B\)

\[C[i, k] = A[i, j] B [j, k]\]
>>> A @ B
array([[  3,   0,  -1],
       [  9,   6, -13],
       [  3,   0,  -1]])

>>> np.einsum('ij,jk->ik', A, B)
array([[  3,   0,  -1],
       [  9,   6, -13],
       [  3,   0,  -1]])

\(C = A^T B\)

\[C[i, k] = A[j, i] B [j, k]\]
>>> A.T @ B
array([[  9,   6,  -9],
       [ 11,   8, -13],
       [ 13,  10, -17]])

>>> np.einsum('ji,jk->ik', A, B)
array([[  9,   6,  -9],
       [ 11,   8, -13],
       [ 13,  10, -17]])

\(C = A B^T\) (inner product)

\[C[i, k] = A[i, j] B [k, j]\]
>>> A @ B.T
array([[-3, -4,  1],
       [-9,  2,  1],
       [-3, -4,  1]])

>>> np.einsum('ij,kj->ik', A, B)
array([[-3, -4,  1],
       [-9,  2,  1],
       [-3, -4,  1]])

In numpy this is also known as the inner product of two matrices: np.inner sums over the last axes of its arguments, so for 2-D arrays it computes the pairwise inner products of the rows of \(A\) and the rows of \(B\), which is exactly \(A B^T\).

>>> np.inner(A, B)
array([[-3, -4,  1],
       [-9,  2,  1],
       [-3, -4,  1]])

\(C = A^T B^T\)

\[C[i, k] = A[j, i] B [k, j]\]
>>> A.T @ B.T
array([[ 3,  6, -3],
       [ 1,  8, -3],
       [-1, 10, -3]])

>>> np.einsum('ji,kj->ik', A, B)
array([[ 3,  6, -3],
       [ 1,  8, -3],
       [-1, 10, -3]])

Element-wise Multiplication

\[C[i, j] = A[i, j] B[i, j]\]

>>> A * B
array([[  0,   1,  -4],
       [  9,   8, -15],
       [  0,  -1,   2]])

>>> np.einsum('ij,ij->ij', A, B)
array([[  0,   1,  -4],
       [  9,   8, -15],
       [  0,  -1,   2]])

We can similarly construct element-wise multiplications with the transposes of either \(A\) or \(B\), or both.

\[C[i, j] = A[i, j] B[j, i]\]

>>> A * B.T
array([[ 0,  3,  0],
       [ 3,  8, -5],
       [ 0, -3,  2]])

>>> np.einsum('ij,ji->ij', A, B)
array([[ 0,  3,  0],
       [ 3,  8, -5],
       [ 0, -3,  2]])

Column-wise Dot Product

Sometimes we need to compute the inner product of each column of \(A\) with the corresponding column of \(B\).

Suppose we have \(n\) columns in both \(A\) and \(B\) with \(m\) rows each. Then the result will be a vector of dimension \(n\) such that:

\[v[j] = \langle A[:, j], B[:, j] \rangle\]

In effect, we multiply the matrices element-wise and then sum over each column. In Einstein notation:

\[v[j] = A[i, j] B[i, j]\]
>>> np.einsum('ij,ij->j', A, B)
array([  9,   8, -17])

>>> np.sum(A*B, axis=0)
array([  9,   8, -17])
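
Analogously, summing over each row instead gives a row-wise dot product, \(v[i] = A[i, j] B[i, j]\) with the output indexed by \(i\):

>>> np.einsum('ij,ij->i', A, B)
array([-3,  2,  1])

>>> np.sum(A*B, axis=1)
array([-3,  2,  1])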

General Array Operations

The following operations work on arrays of any number of dimensions.

A view of an array:

>>> np.einsum('...', A)
array([[0, 1, 2],
       [3, 4, 5],
       [0, 1, 2]])
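
The ellipsis stands in for any number of leading dimensions. As a small sketch (with a hypothetical 3-dimensional array T), transposing the last two axes of every slice:

>>> T = np.arange(8).reshape(2, 2, 2)
>>> np.einsum('...ij->...ji', T)
array([[[0, 2],
        [1, 3]],

       [[4, 6],
        [5, 7]]])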