The Beautiful Einstein Summations¶
Einstein summations are a powerful mechanism for expressing array operations. Major scientific computing and neural network libraries like numpy, TensorFlow, and PyTorch have built-in support for performing linear algebraic array operations using the Einstein summation convention. This article is a gentle introduction to using Einstein summations for such operations. These libraries ship highly efficient implementations of Einstein summations, so they run much faster than hand-written alternative code.
Notation¶
Vectors will be denoted using lowercase letters \(u, v, w, x, y, z\). Scalars will also be lowercase letters \(a, b, c, m, n, s\), etc. The letters \(i, j, k, l\) will be used for indices. Matrices will be denoted with capital letters \(A, B, C\), etc.
Elements of a vector will be denoted using numpy notation \(v[i]\). Elements, rows, and columns of a matrix will also be denoted using numpy notation \(A[i, j]\), \(A[i, :]\), and \(A[:, j]\).
Einstein Notation¶
Consider a matrix multiplication operation \(C = A B\). Individual elements of \(C\) are computed by the summation:
\[C[i, k] = \sum_{j} A[i, j] B[j, k]\]
Einstein notation simply drops the summation from the equation:
\[C[i, k] = A[i, j] B[j, k]\]
The idea is simple. If an index appears on the R.H.S. but not on the L.H.S., then we assume that the expression on the R.H.S. is to be summed over that index. In the matrix multiplication example above, since the index \(j\) appears on the R.H.S. but not on the L.H.S., we assume that the expression \(A[i, j] B[j, k]\) is to be summed over \(j\).
The equation \(s = v[i]\) would mean that we need to sum over all the elements of a vector \(v\).
The equation \(s = A[i,j]\) would mean that we need to sum over all the elements of the matrix \(A\).
The equation \(C[i, j] = A[i, j] B[i, j]\) would mean that we need to multiply \(A\) and \(B\) element-wise to compute \(C\).
The equation \(v[i] = A[i, j]\) would mean that each element of the vector \(v\) is the sum of the elements in the corresponding row of the matrix \(A\), i.e. the row-wise sum of \(A\).
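To make the convention concrete, here is a minimal sketch (with small, illustrative matrices) showing that the matrix multiplication convention is just an explicit loop over the summed index \(j\):
>>> import numpy as np
>>> A = np.array([[1, 2], [3, 4]])
>>> B = np.array([[5, 6], [7, 8]])
>>> C = np.zeros((2, 2), dtype=int)
>>> for i in range(2):
...     for k in range(2):
...         for j in range(2):  # j appears only on the R.H.S., so it is summed
...             C[i, k] += A[i, j] * B[j, k]
>>> C
array([[19, 22],
       [43, 50]])
>>> np.einsum('ij,jk->ik', A, B)
array([[19, 22],
       [43, 50]])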
In this article, we will explore Einstein summations using numpy.
einsum¶
numpy.einsum is a general-purpose workhorse function in numpy for computing summations following Einstein notation, a.k.a. Einstein summations. For example, the matrix multiplication \(C = AB\) is computed as einsum('ij,jk->ik', A, B).
Note
The explanation may seem heavy on the first read. Read it once, read all the examples in the following sections, then come back and read this section again.
To understand this function, we need to understand the subscript specification in the first argument, ij,jk->ik. Let's revisit the matrix multiplication equation:
\[C[i, k] = A[i, j] B[j, k]\]
There are two arguments to matrix multiplication: \(A\) and \(B\).
\(A\) is indexed by \(i\) (for rows) and \(j\) (for columns).
\(B\) is indexed by \(j\) (for rows) and \(k\) (for columns).
The output of the operation is the matrix \(C\).
\(C\) is indexed by \(i\) (for rows) and \(k\) (for columns).
Now let's parse ij,jk->ik.
The Einstein summation einsum('ij,jk->ik', A, B) has two arguments: \(A\) and \(B\).
The specification ij,jk->ik describes how the summation is to proceed.
There are two parts, ij,jk and ik, separated by ->. Input -> Output.
ik are the indices for the output of the summation.
ij,jk are the indices for the input arguments of the summation. The comma separates the indices for different input arguments. Thus, in this case, there are two arguments.
Splitting ij,jk: ij are the indices for the first argument \(A\), and jk are the indices for the second argument \(B\).
Since there are two letters for each argument, we assume that both arguments are two-dimensional arrays (i.e. matrices).
The first letter refers to the index for the rows of a matrix and the second letter refers to the index for the columns of the same matrix.
Thus, the specification translates to A[i, j] * B[j, k], and the result is stored in C[i, k].
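As a quick check of this parsing, note that the output indices fully determine the layout of the result: writing ki instead of ik yields the transposed product. A small illustration with arbitrary matrices:
>>> import numpy as np
>>> A = np.array([[1, 2], [3, 4]])
>>> B = np.array([[5, 6], [7, 8]])
>>> np.einsum('ij,jk->ki', A, B)
array([[19, 43],
       [22, 50]])
>>> (A @ B).T
array([[19, 43],
       [22, 50]])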
We will now look at all kinds of linear algebraic array operations possible via einsum.
Load the numpy library first:
>>> import numpy as np
In the examples below:
We first present the underlying mathematical equation in Einstein notation for each linear algebraic operation.
We then present its implementation using einsum.
We also contrast it with the standard numpy functions for the same operation.
Vector Operations¶
Let \(u, v\) be vectors of length \(n = 3\).
>>> u = np.array([0, 1, 4])
>>> v = np.array([1, 2, 3])
Sum of all elements of a vector¶
In normal mathematical notation, we say \(s = \sum_{i=0}^{n-1} v[i]\). In Einstein notation, we say \(s = v[i]\).
>>> np.einsum("i->", u)
5
>>> np.einsum("i->", v)
6
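For comparison, the standard numpy function:
>>> np.sum(u)
5
>>> np.sum(v)
6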
Element-wise Multiplication¶
In this case, there is no summation on the R.H.S.; the index \(i\) appears on both sides: \(w[i] = u[i] v[i]\).
>>> w = np.einsum("i,i->i", u, v)
>>> w
array([ 0, 2, 12])
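The standard numpy equivalent is the * operator:
>>> u * v
array([ 0,  2, 12])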
Inner Product¶
Normal mathematical notation: \(s = \sum_{i=0}^{n-1} u[i] v[i]\)
Einstein notation: \(s = u[i] v[i]\)
>>> np.einsum("i,i->", u, v)
14
>>> np.inner(u, v)
14
>>> np.dot(u, v)
14
Outer Product¶
In Einstein notation: \(C[i, j] = u[i] v[j]\). No summation occurs, since every index on the R.H.S. also appears on the L.H.S.
>>> np.outer(u, v)
array([[ 0, 0, 0],
[ 1, 2, 3],
[ 4, 8, 12]])
>>> np.einsum('i,j->ij', u, v)
array([[ 0, 0, 0],
[ 1, 2, 3],
[ 4, 8, 12]])
Matrix Operations¶
Let's construct two square matrices \(A, B\) for the examples below.
>>> A = np.array([[0,1,2],[3,4,5],[0,1,2]])
>>> B = np.array([[-1,1,-2],[3,2,-3],[0,-1,1]])
>>> A
array([[0, 1, 2],
[3, 4, 5],
[0, 1, 2]])
>>> B
array([[-1, 1, -2],
[ 3, 2, -3],
[ 0, -1, 1]])
Matrix Transpose¶
The transpose \(C = A^T\), i.e. \(C[j, i] = A[i, j]\), is computed as:
>>> np.einsum('ij->ji', A)
array([[0, 3, 0],
[1, 4, 1],
[2, 5, 2]])
>>> A.T
array([[0, 3, 0],
[1, 4, 1],
[2, 5, 2]])
Main Diagonal¶
The main diagonal of a matrix \(A\), i.e. the elements \(A[i, i]\), can be extracted as:
>>> np.einsum('ii->i', A)
array([0, 4, 2])
>>> np.diag(A)
array([0, 4, 2])
Trace¶
The trace of a matrix is the sum of its diagonal elements: \(s = \sum_{i} A[i, i]\).
In Einstein notation: \(s = A[i, i]\).
>>> np.einsum('ii->', A)
6
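For comparison, numpy provides np.trace:
>>> np.trace(A)
6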
Sum of Elements¶
The sum of all elements of a matrix can be expressed as \(\sum_{i} \sum_{j} A[i, j]\). In Einstein notation: \(s = A[i, j]\).
>>> np.einsum('ij->', A)
18
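For comparison, the standard numpy function:
>>> np.sum(A)
18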
Column-wise Sum¶
In Einstein notation: \(v[j] = A[i, j]\).
>>> np.sum(A, axis=0)
array([3, 6, 9])
>>> np.einsum('ij->j', A)
array([3, 6, 9])
Row-wise Sum¶
In Einstein notation: \(v[i] = A[i, j]\).
>>> np.sum(A, axis=1)
array([ 3, 12, 3])
>>> np.einsum('ij->i', A)
array([ 3, 12, 3])
Matrix Multiplication¶
We can consider four possibilities:
\(C = A B\)
\(C = A^T B\)
\(C = A B^T\)
\(C = A^T B^T = (BA)^T\)
\(C = A B\)
>>> A @ B
array([[ 3, 0, -1],
[ 9, 6, -13],
[ 3, 0, -1]])
>>> np.einsum('ij,jk->ik', A, B)
array([[ 3, 0, -1],
[ 9, 6, -13],
[ 3, 0, -1]])
\(C = A^T B\)
>>> A.T @ B
array([[ 9, 6, -9],
[ 11, 8, -13],
[ 13, 10, -17]])
>>> np.einsum('ji,jk->ik', A, B)
array([[ 9, 6, -9],
[ 11, 8, -13],
[ 13, 10, -17]])
\(C = A B^T\) (inner product)
>>> A @ B.T
array([[-3, -4, 1],
[-9, 2, 1],
[-3, -4, 1]])
>>> np.einsum('ij,kj->ik', A, B)
array([[-3, -4, 1],
[-9, 2, 1],
[-3, -4, 1]])
This is also known as the inner product of two matrices in numpy, since numpy is row major: np.inner takes the dot product of each row of \(A\) with each row of \(B\).
>>> np.inner(A, B)
array([[-3, -4, 1],
[-9, 2, 1],
[-3, -4, 1]])
\(C = A^T B^T\)
>>> A.T @ B.T
array([[ 3, 6, -3],
[ 1, 8, -3],
[-1, 10, -3]])
>>> np.einsum('ji,kj->ik', A, B)
array([[ 3, 6, -3],
[ 1, 8, -3],
[-1, 10, -3]])
Element-wise Multiplication¶
\(C[i, j] = A[i, j] B[i, j]\)
>>> A * B
array([[ 0, 1, -4],
[ 9, 8, -15],
[ 0, -1, 2]])
>>> np.einsum('ij,ij->ij', A, B)
array([[ 0, 1, -4],
[ 9, 8, -15],
[ 0, -1, 2]])
We can similarly construct element-wise multiplications with the transposes of either \(A\) or \(B\), or both (a second variant is sketched after this example).
\(C[i, j] = A[i, j] B[j, i]\)
>>> A * B.T
array([[ 0, 3, 0],
[ 3, 8, -5],
[ 0, -3, 2]])
>>> np.einsum('ij,ji->ij', A, B)
array([[ 0, 3, 0],
[ 3, 8, -5],
[ 0, -3, 2]])
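For instance, transposing \(A\) instead of \(B\), i.e. \(C[i, j] = A[j, i] B[i, j]\):
>>> A.T * B
array([[ 0,  3,  0],
       [ 3,  8, -3],
       [ 0, -5,  2]])
>>> np.einsum('ji,ij->ij', A, B)
array([[ 0,  3,  0],
       [ 3,  8, -3],
       [ 0, -5,  2]])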
Column-wise Dot Product¶
Sometimes we need to compute the inner product of each column of \(A\) with the corresponding column of \(B\).
Suppose both \(A\) and \(B\) have \(n\) columns and \(m\) rows. Then the result will be a vector of dimension \(n\) such that:
\[v[j] = \sum_{i=0}^{m-1} A[i, j] B[i, j]\]
In a way, we are multiplying the matrices element-wise and then summing over each column. In Einstein notation: \(v[j] = A[i, j] B[i, j]\).
>>> np.einsum('ij,ij->j', A, B)
array([ 9, 8, -17])
>>> np.sum(A*B, axis=0)
array([ 9, 8, -17])
General Array Operations¶
The following operations work on arrays of any dimension.
A view of an array:
>>> np.einsum('...', A)
array([[0, 1, 2],
[3, 4, 5],
[0, 1, 2]])
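The ellipsis stands for any number of leading dimensions, so one specification can handle batches of arrays. For example, a batched matrix multiplication over a stack of \(2 \times 2\) matrices (the arrays here are illustrative):
>>> X = np.arange(8).reshape(2, 2, 2)  # a stack of two 2x2 matrices
>>> Y = np.arange(8).reshape(2, 2, 2)
>>> np.einsum('...ij,...jk->...ik', X, Y)
array([[[ 2,  3],
        [ 6, 11]],

       [[46, 55],
        [66, 79]]])
>>> X @ Y
array([[[ 2,  3],
        [ 6, 11]],

       [[46, 55],
        [66, 79]]])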