2  Matrix operations

2.2 The dot/inner product

Definition 2.1 Let \(\mathbf{u}\) and \(\mathbf{v}\) be vectors in \(\mathcal{R}^n\). Then the inner product (also called the dot product, and sometimes written \(\mathbf{u} \cdot \mathbf{v}\)) of \(\mathbf{u}\) and \(\mathbf{v}\) is \(\mathbf{u}' \mathbf{v}\). The vectors \(\mathbf{u}\) and \(\mathbf{v}\) are \(n \times 1\) matrices (\(n\) rows and one column), \(\mathbf{u}'\) is a \(1 \times n\) matrix, and the inner product \(\mathbf{u}' \mathbf{v}\) is a scalar (a \(1 \times 1\) matrix).

If the vectors

\[ \begin{aligned} \mathbf{u} = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} & & \mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} \end{aligned} \]

then \(\mathbf{u}' \mathbf{v} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n\).

Example 2.1 Find the inner products \(\mathbf{u}'\mathbf{v}\) and \(\mathbf{v}'\mathbf{u}\) of

\[ \begin{aligned} \mathbf{u} = \begin{pmatrix} 2 \\ -3 \\ 1 \end{pmatrix} & & \mathbf{v} = \begin{pmatrix} 4 \\ -2 \\ 3 \end{pmatrix} \end{aligned} \]

  • do by hand, then check in R:
u <- c(2, -3, 1)
v <- c(4, -2, 3)
# u'v
sum(u*v)
[1] 17
t(u) %*% v
     [,1]
[1,]   17
# v'u
sum(v*u)
[1] 17
t(v) %*% u
     [,1]
[1,]   17

The properties of inner products are given by the following theorem.

Theorem 2.1 (Inner Product) Let \(\mathbf{u}\), \(\mathbf{v}\), and \(\mathbf{w}\) be vectors in \(\mathcal{R}^n\) and let \(c\) be a scalar. Then

  1. \(\mathbf{u}'\mathbf{v} = \mathbf{v}'\mathbf{u}\)

  2. \((\mathbf{u} + \mathbf{v})' \mathbf{w} = \mathbf{u}' \mathbf{w} + \mathbf{v}' \mathbf{w}\)

  3. \(( c \mathbf{u} )' \mathbf{v} = c ( \mathbf{u}'\mathbf{v} )\)

  4. \(\mathbf{u}'\mathbf{u} \geq 0\) with \(\mathbf{u}'\mathbf{u} = 0\) only when \(\mathbf{u} = \mathbf{0}\)

2.3 Properties of matrices

2.3.1 Matrix Addition

Matrix Addition: If the matrices \(\mathbf{A}\) and \(\mathbf{B}\) are of the same dimension (i.e., both \(\mathbf{A}\) and \(\mathbf{B}\) have the same number of rows \(m\) and the same number of columns \(n\)), then

\[ \begin{aligned} \mathbf{A} + \mathbf{B} & = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} + \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{m1} & b_{m2} & \cdots & b_{mn} \end{pmatrix} \\ & = \begin{pmatrix} a_{11} + b_{11} & a_{12} + b_{12} & \cdots & a_{1n} + b_{1n} \\ a_{21} + b_{21} & a_{22} + b_{22} & \cdots & a_{2n} + b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} + b_{m1} & a_{m2} + b_{m2} & \cdots & a_{mn} + b_{mn} \end{pmatrix} \\ & = \left\{ a_{ij} + b_{ij} \right\} \end{aligned} \tag{2.1}\]

If \(\mathbf{A}\) and \(\mathbf{B}\) are of the same dimension (same number of rows and columns), you can add the matrices together:

A <- matrix(c(4, 1, 33, 2, 0, -4), 3, 2)
B <- matrix(c(7, -24, 3, 9, 11, -9), 3, 2)
A
     [,1] [,2]
[1,]    4    2
[2,]    1    0
[3,]   33   -4
B
     [,1] [,2]
[1,]    7    9
[2,]  -24   11
[3,]    3   -9
A + B
     [,1] [,2]
[1,]   11   11
[2,]  -23   11
[3,]   36  -13

We can also write this using for loops:

# initialize an empty matrix to fill
C <- matrix(0, 3, 2)

for (i in 1:nrow(A)) {         # loop over the rows
    for (j in 1:ncol(A)) {     # loop over the columns
        C[i, j] <- A[i, j] + B[i, j]
    }
}
C
     [,1] [,2]
[1,]   11   11
[2,]  -23   11
[3,]   36  -13

If \(\mathbf{A}\) and \(\mathbf{B}\) are of different dimensions (they differ in either the number of rows or the number of columns), R will return an error warning you that the matrices are of different sizes and can’t be added:

A <- matrix(c(4, 1, 33, 2, 0, -4), 3, 2)
B <- matrix(c(7, -24, 3, 9), 2, 2)
A
     [,1] [,2]
[1,]    4    2
[2,]    1    0
[3,]   33   -4
B
     [,1] [,2]
[1,]    7    3
[2,]  -24    9
A + B
Error in A + B: non-conformable arrays

Theorem 2.2 Let \(\mathbf{A}\), \(\mathbf{B}\), and \(\mathbf{C}\) be \(m \times n\) matrices and let \(a\) and \(b\) be scalars, then:

  1. \(\mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A}\)

  2. \((\mathbf{A} + \mathbf{B}) + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C})\)

  3. \(\mathbf{A} + \mathbf{0} = \mathbf{A}\)

  4. \(a (\mathbf{A} + \mathbf{B}) = a \mathbf{A} + a \mathbf{B}\)

  5. \((a + b)\mathbf{A} = a \mathbf{A} + b \mathbf{A}\)

  6. \((ab)\mathbf{A} = a (b\mathbf{A})\)
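
These properties can also be checked numerically. A small sketch in R, reusing A and B from the first addition example, with scalars a and b chosen for illustration:

A <- matrix(c(4, 1, 33, 2, 0, -4), 3, 2)
B <- matrix(c(7, -24, 3, 9, 11, -9), 3, 2)
a <- 2
b <- -3
# 1. commutativity of matrix addition
all.equal(A + B, B + A)
[1] TRUE
# 4. a scalar distributes over matrix addition
all.equal(a * (A + B), a * A + a * B)
[1] TRUE
# 5. scalar addition distributes over a matrix
all.equal((a + b) * A, a * A + b * A)
[1] TRUE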

2.3.2 Matrix Multiplication

Matrix Multiplication: If \(\mathbf{A} = \left\{ a_{ij} \right\}\) is an \(m \times n\) matrix and \(\mathbf{B} = \left\{ b_{jk} \right\}\) is an \(n \times p\) matrix, then the matrix product \(\mathbf{C} = \mathbf{A} \mathbf{B}\) is an \(m \times p\) matrix where \(\mathbf{C} = \left\{ \sum_{j=1}^n a_{ij} b_{jk} \right\}\)

\[ \begin{aligned} \mathbf{A} \mathbf{B} & = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1p} \\ b_{21} & b_{22} & \cdots & b_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{np} \end{pmatrix} \\ & = \begin{pmatrix} \sum_{j=1}^n a_{1j} b_{j1} & \sum_{j=1}^n a_{1j} b_{j2} & \cdots & \sum_{j=1}^n a_{1j} b_{jp} \\ \sum_{j=1}^n a_{2j} b_{j1} &\sum_{j=1}^n a_{2j} b_{j2} & \cdots & \sum_{j=1}^n a_{2j} b_{jp} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{j=1}^n a_{mj} b_{j1} &\sum_{j=1}^n a_{mj} b_{j2} & \cdots & \sum_{j=1}^n a_{mj} b_{jp} \end{pmatrix} \end{aligned} \tag{2.2}\]

Another way to define matrix multiplication is through inner product notation. Partition the \(m \times n\) matrix \(\mathbf{A}\) into rows and the \(n \times p\) matrix \(\mathbf{B}\) into columns as

\[ \begin{aligned} \mathbf{A} & = \begin{pmatrix} \mathbf{a}_{1}' \\ \mathbf{a}_{2}' \\ \vdots \\ \mathbf{a}_{m}' \end{pmatrix} & \mbox{ and } && \mathbf{B} & = \begin{pmatrix} \mathbf{b}_{1} & \mathbf{b}_{2} & \cdots & \mathbf{b}_{p} \end{pmatrix} \end{aligned} \]

where \(\mathbf{a}_i\) and \(\mathbf{b}_k\) are all \(n\)-vectors. Then \(\mathbf{C} = \mathbf{A} \mathbf{B}\) can be written as

\[ \begin{aligned} \mathbf{A} \mathbf{B} = \mathbf{A} \begin{pmatrix} \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_p \end{pmatrix} = \begin{pmatrix} \mathbf{A} \mathbf{b}_1 & \mathbf{A} \mathbf{b}_2 & \cdots & \mathbf{A} \mathbf{b}_p \end{pmatrix} \end{aligned} \]

Note that in this representation, each column of the matrix \(\mathbf{A}\mathbf{B}\) is a linear combination of the columns of \(\mathbf{A}\), with coefficients given by the corresponding column of \(\mathbf{B}\). Equivalently, writing out all of the inner products,

\[ \begin{aligned} \mathbf{A} \mathbf{B} & = \begin{pmatrix} \mathbf{a}_1' \mathbf{b}_1 & \mathbf{a}_1' \mathbf{b}_2 & \cdots & \mathbf{a}_1' \mathbf{b}_p \\ \mathbf{a}_2' \mathbf{b}_1 & \mathbf{a}_2' \mathbf{b}_2 & \cdots & \mathbf{a}_2' \mathbf{b}_p \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{a}_m' \mathbf{b}_1 & \mathbf{a}_m' \mathbf{b}_2 & \cdots & \mathbf{a}_m' \mathbf{b}_p \end{pmatrix} \\ & = \left\{ \mathbf{a}_i' \mathbf{b}_k \right\}. \end{aligned} \]

Written in this notation, we arrive at the multiplication rule for \(\mathbf{C} = \mathbf{A} \mathbf{B}\): the \(ik\)th element \(c_{ik}\) of \(\mathbf{C}\) is the inner product of the \(i\)th row of \(\mathbf{A}\) and the \(k\)th column of \(\mathbf{B}\).
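
To make the multiplication rule concrete, here is a for-loop implementation in the style of the addition example above, checked against R’s built-in %*% operator. The matrix values are arbitrary choices for this sketch:

A <- matrix(c(4, 1, 33, 2, 0, -4), 3, 2)   # a 3 x 2 matrix
B <- matrix(c(1, 0, -2, 5), 2, 2)          # a 2 x 2 matrix

# initialize an empty matrix to fill
C <- matrix(0, nrow(A), ncol(B))

for (i in 1:nrow(A)) {         # loop over the rows of A
    for (k in 1:ncol(B)) {     # loop over the columns of B
        # c_ik is the inner product of row i of A and column k of B
        C[i, k] <- sum(A[i, ] * B[, k])
    }
}
all.equal(C, A %*% B)
[1] TRUE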

2.3.3 Properties of Matrix Multiplication

Define the \(m \times m\) identity matrix \(\mathbf{I}_m\) with ones on the diagonal and zeros elsewhere as

\[ \mathbf{I}_m = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & 0 & \cdots & 1\end{pmatrix} \]

Let \(\mathbf{A}\) be an \(m \times n\) matrix, then:

  1. Let \(\mathbf{B}\) be an \(n \times p\) matrix and \(\mathbf{C}\) a \(p \times q\) matrix. Then \(\mathbf{A}(\mathbf{B}\mathbf{C}) = (\mathbf{A}\mathbf{B})\mathbf{C}\) is an \(m \times q\) matrix.

  2. Let \(\mathbf{B}\) and \(\mathbf{C}\) be \(n \times p\) matrices. Then \(\mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{A}\mathbf{B} + \mathbf{A}\mathbf{C}\) is an \(m \times p\) matrix.

  3. Let \(\mathbf{B}\) and \(\mathbf{C}\) be \(p \times m\) matrices. Then \((\mathbf{B} + \mathbf{C})\mathbf{A} = \mathbf{B}\mathbf{A} + \mathbf{C}\mathbf{A}\) is a \(p \times n\) matrix.

  4. Let \(\mathbf{B}\) be an \(n \times p\) matrix and \(c\) a scalar. Then \(c(\mathbf{A} \mathbf{B}) = (c \mathbf{A}) \mathbf{B} = \mathbf{A}(c\mathbf{B})\) is an \(m \times p\) matrix.

  5. \(\mathbf{I}_m \mathbf{A} = \mathbf{A} \mathbf{I}_n = \mathbf{A}\)
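
A brief numeric check of properties 1 and 5 in R (the matrix values are arbitrary; diag(m) constructs the \(m \times m\) identity matrix):

A <- matrix(c(4, 1, 33, 2, 0, -4), 3, 2)   # 3 x 2
B <- matrix(c(1, 0, -2, 5, 3, 1), 2, 3)    # 2 x 3
C <- matrix(c(2, -1, 0, 4, 1, 1), 3, 2)    # 3 x 2
# 1. associativity
all.equal(A %*% (B %*% C), (A %*% B) %*% C)
[1] TRUE
# 5. multiplying by the identity changes nothing
all.equal(diag(3) %*% A, A)
[1] TRUE
all.equal(A %*% diag(2), A)
[1] TRUE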

Examples: in class

Note: Matrix multiplication violates some of the rules of multiplication that you might be used to. Watch out for the following:

  1. In general \(\mathbf{A} \mathbf{B} \neq \mathbf{B} \mathbf{A}\) (they are sometimes equal, but usually not; see the counterexample after this list)

  2. \(\mathbf{A}\mathbf{B} = \mathbf{A} \mathbf{C}\) does not imply \(\mathbf{B} = \mathbf{C}\)

  3. \(\mathbf{A}\mathbf{B} = \mathbf{0}\) does not imply that \(\mathbf{A} = \mathbf{0}\) or \(\mathbf{B} = \mathbf{0}\)
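
A small counterexample in R illustrating points 1 and 3. The two matrices below are chosen for this purpose: neither is the zero matrix, yet \(\mathbf{B}\mathbf{A} = \mathbf{0}\), and \(\mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A}\):

A <- matrix(c(1, 0, 0, 0), 2, 2)
B <- matrix(c(0, 0, 1, 0), 2, 2)
A %*% B
     [,1] [,2]
[1,]    0    1
[2,]    0    0
B %*% A
     [,1] [,2]
[1,]    0    0
[2,]    0    0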

2.3.4 Matrix multiplication complexity (big-O notation)

In the study of algorithms, big-\(O\) notation such as \(O(n)\) is used to describe how the number of calculations needed to evaluate an expression grows with the size of the inputs. As an example, consider \(\mathbf{A} = \begin{pmatrix}3 & 1 \\ 2 & -3 \end{pmatrix}\), \(\mathbf{B} = \begin{pmatrix} -2 & 4 \\ -1 & 2 \end{pmatrix}\), and \(\mathbf{x} = \begin{pmatrix} -3 \\ 1 \end{pmatrix}\).

By hand: Calculate

  1. \((\mathbf{A} \mathbf{B}) \mathbf{x}\)

  2. \(\mathbf{A} (\mathbf{B} \mathbf{x})\)

Which was easier? Which required less calculation?

  • Matrix-matrix multiplication of an \(m \times n\) matrix \(\mathbf{A}\) and an \(n \times p\) matrix \(\mathbf{B}\) has complexity \(O(m n p)\).

  • Matrix-vector multiplication of an \(m \times n\) matrix \(\mathbf{A}\) and an \(n\)-vector \(\mathbf{x}\) has complexity \(O(m n)\).

From the example above, the two orderings require:

  1. An \(O(m n p)\) matrix-matrix multiplication to form \(\mathbf{A} \mathbf{B}\), followed by an \(O(m p)\) matrix-vector multiplication \((\mathbf{A} \mathbf{B}) \mathbf{x}\), giving total computational complexity \(O(m n p) + O(m p)\)

  2. An \(O(n p)\) matrix-vector multiplication to form \(\mathbf{B} \mathbf{x}\), followed by an \(O(m n)\) matrix-vector multiplication \(\mathbf{A} (\mathbf{B} \mathbf{x})\), giving total computational complexity \(O(n p) + O(m n)\)
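
The difference is easy to see in R. The sketch below first verifies that both orderings give the same answer for the example above, then compares run times on larger random matrices; the size n is an arbitrary choice, and timings vary by machine, so no timing output is shown:

A <- matrix(c(3, 2, 1, -3), 2, 2)
B <- matrix(c(-2, -1, 4, 2), 2, 2)
x <- c(-3, 1)
(A %*% B) %*% x
     [,1]
[1,]   35
[2,]    5
A %*% (B %*% x)
     [,1]
[1,]   35
[2,]    5

# timing: O(n^3) + O(n^2) vs. O(n^2) + O(n^2) when m = n = p
n <- 2000
A <- matrix(rnorm(n^2), n, n)
B <- matrix(rnorm(n^2), n, n)
x <- rnorm(n)
system.time((A %*% B) %*% x)   # matrix-matrix product first: slow
system.time(A %*% (B %*% x))   # two matrix-vector products: fast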

2.3.5 Matrix powers

Powers of an \(n \times n\) (square) matrix \(\mathbf{A}\) are defined as the product of \(\mathbf{A}\) multiplied by itself \(k\) times

\[ \mathbf{A}^k = \underbrace{\mathbf{A} \cdots \mathbf{A}}_k \]
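
Note that in R the ^ operator acts elementwise, so matrix powers require repeated %*%. A small sketch; the helper matpow below is a hypothetical function written for this example, not a built-in:

A <- matrix(c(3, 2, 1, -3), 2, 2)
A %*% A %*% A   # A^3 by direct multiplication
     [,1] [,2]
[1,]   33   11
[2,]   22  -33
# multiply A by itself k times, starting from the identity
matpow <- function(A, k) {
    out <- diag(nrow(A))
    for (i in seq_len(k)) {
        out <- out %*% A
    }
    out
}
matpow(A, 3)
     [,1] [,2]
[1,]   33   11
[2,]   22  -33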

2.3.6 Matrix Transpose

The matrix transpose is an operator that swaps the rows and columns of a matrix. If \(\mathbf{A}\) is an \(m \times n\) matrix (\(m\) rows and \(n\) columns), then \(\mathbf{A}'\) is an \(n \times m\) matrix (\(n\) rows and \(m\) columns). Note: some authors use \(\mathbf{A}^T\) to denote the transpose; I prefer the \('\) notation as it is simpler and cleaner. The matrix

\[ \begin{aligned} \mathbf{A} & = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \end{aligned} \]

has transpose

\[ \begin{aligned} \mathbf{A}' & = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{pmatrix}. \end{aligned} \]
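
In R, the transpose is computed with t(), which we used earlier for vectors. A quick illustration with an arbitrary \(2 \times 3\) matrix:

A <- matrix(1:6, 2, 3)
A
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
t(A)
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6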

Theorem 2.3 Let \(\mathbf{A}\) be an \(m \times n\) matrix, then

  1. \((\mathbf{A}')' = \mathbf{A}\).

  2. Let \(\mathbf{B}\) be an \(m \times n\) matrix, then \((\mathbf{A} + \mathbf{B})' = \mathbf{A}' + \mathbf{B}'\).

  3. For any scalar \(c\), \((c \mathbf{A})' = c \mathbf{A}'\).

  4. Let \(\mathbf{B}\) be an \(n \times p\) matrix, then \(( \mathbf{A} \mathbf{B})' = \mathbf{B}' \mathbf{A}'\) is a \(p \times m\) matrix.
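
A quick numeric check of properties 1 and 4 in R (the matrix values are arbitrary choices):

A <- matrix(c(4, 1, 33, 2, 0, -4), 3, 2)   # 3 x 2
B <- matrix(c(1, -1, 0, 2), 2, 2)          # 2 x 2
# 1. transposing twice returns the original matrix
all.equal(t(t(A)), A)
[1] TRUE
# 4. the transpose of a product reverses the order
all.equal(t(A %*% B), t(B) %*% t(A))
[1] TRUE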

Note (the power of video games): GPUs and modern CPUs are becoming more and more parallelized. Because the \(ik\)th element of \(\mathbf{A}\mathbf{B}\) requires only the \(i\)th row of \(\mathbf{A}\) and the \(k\)th column of \(\mathbf{B}\), matrix multiplication is easily parallelized on modern computing architectures. Thanks to video games, this parallelization has become faster than ever.

Examples: in class