Chapter 2 Matrix operations
2.1 ToDo
Show matrix multiplication using dot products
move to chapter 1 or 2
Note: add examples:
2.2 The dot/inner product
Definition 2.1 Let \(\mathbf{u}\) and \(\mathbf{v}\) be vectors in \(\mathcal{R}^n\). Then, the dot product (also called the inner product) of \(\mathbf{u}\) and \(\mathbf{v}\) is \(\mathbf{u}' \mathbf{v}\). The vectors \(\mathbf{u}\) and \(\mathbf{v}\) are \(n \times 1\) matrices (\(n\) rows and one column) where \(\mathbf{u}'\) is a \(1 \times n\) matrix and the inner product \(\mathbf{u}' \mathbf{v}\) is a scalar (\(1 \times 1\) matrix). The inner product is also sometimes called the dot product and written as \(\mathbf{u} \cdot \mathbf{v}\).
If the vectors \[ \begin{aligned} \mathbf{u} = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} & & \mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} \end{aligned} \] then \(\mathbf{u}' \mathbf{v} = u_1 v_1 + u_2 v_2 + \cdots u_n v_n\)
Example 2.1 Find the inner product \(\mathbf{u}'\mathbf{v}\) and \(\mathbf{v}'\mathbf{u}\) of \[ \begin{aligned} \mathbf{u} = \begin{pmatrix} 2 \\ -3 \\ 1 \end{pmatrix} & & \mathbf{v} = \begin{pmatrix} 4 \\ -2 \\ 3 \end{pmatrix} \end{aligned} \]
- do by hand
## [1] 17
## [,1]
## [1,] 17
## [1] 17
## [,1]
## [1,] 17
The properties of inner products are defined with the following theorem.
Theorem 2.1 Let \(\mathbf{u}\), \(\mathbf{v}\), and \(\mathbf{w}\) be vectors in \(\mathcal{R}^n\) and let \(c\) be a scalar. Then
\(\mathbf{u}'\mathbf{v} = \mathbf{v}'\mathbf{u}\)
\((\mathbf{u} + \mathbf{v})' \mathbf{w} = \mathbf{u}' \mathbf{w} + \mathbf{v}' \mathbf{w}\)
\(( c \mathbf{u} )' \mathbf{v} = c ( \mathbf{v}'\mathbf{u} )\)
\(\mathbf{u}'\mathbf{u} \geq 0\) with \(\mathbf{u}'\mathbf{u} = 0\) only when \(\mathbf{u} = \mathbf{0}\)
2.3 Properties of matrices
2.3.1 Matrix Addition
Matrix Addition: If the matrices \(\mathbf{A}\) and \(\mathbf{B}\) are of the same dimension (e.g., both \(\mathbf{A}\) and \(\mathbf{B}\) have the same number of rows \(m\) and the same number of columns \(n\)), then
\[ \begin{aligned} \mathbf{A} + \mathbf{B} & = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} + \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{m1} & b_{m2} & \cdots & b_{mn} \end{pmatrix} \\ & = \begin{pmatrix} a_{11} + b_{11} & b_{12} + b_{12} & \cdots & a_{1n} + b_{1n} \\ a_{21} + b_{21} & a_{22} + b_{22} & \cdots & a_{2n} + b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} + b_{m1} & a_{m2} + b_{m2} & \cdots & a_{mn} + b_{mn} \end{pmatrix} \\ & = \left\{ a_{ij} + b_{ij} \right\} \end{aligned} \tag{2.1} \]
… Another way to
If \(\mathbf{A}\) and \(\mathbf{B}\) are of the same dimension (same number of rows and columns) you can add the matrices together
## [,1] [,2]
## [1,] 4 2
## [2,] 1 0
## [3,] 33 -4
## [,1] [,2]
## [1,] 7 9
## [2,] -24 11
## [3,] 3 -9
## [,1] [,2]
## [1,] 11 11
## [2,] -23 11
## [3,] 36 -13
We can also write this using for
loops
# initialize an empty matrix to fill
C <- matrix(0, 3, 2)
for (i in 1:nrow(A)) { # loop over the rows
for (j in 1:ncol(A)) { # loop over the columns
C[i, j] <- A[i, j] + B[i, j]
}
}
C
## [,1] [,2]
## [1,] 11 11
## [2,] -23 11
## [3,] 36 -13
If \(\mathbf{A}\) and \(\mathbf{B}\) are of different dimensions (they differ in either the number of rows or columns), R
will return an error warning you that the matrices are of different sizes and can’t be added
## [,1] [,2]
## [1,] 4 2
## [2,] 1 0
## [3,] 33 -4
## [,1] [,2]
## [1,] 7 3
## [2,] -24 9
## Error in A + B: non-conformable arrays
Theorem 2.2 Let \(\mathbf{A}\), \(\mathbf{B}\), and \(\mathbf{C}\) be \(m \times n\) matrices and let \(a\) and \(b\) be scalars, then:
\(\mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A}\)
\((\mathbf{A} + \mathbf{B}) + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C})\)
\(\mathbf{A} + \mathbf{0} = \mathbf{A}\)
\(a (\mathbf{A} + \mathbf{B}) = a \mathbf{A} + a \mathbf{B}\)
\((a + b)\mathbf{A} = a \mathbf{A} + b \mathbf{A}\)
\((ab)\mathbf{A} = a (b\mathbf{A})\)
2.3.2 Matrix Multipliation
Matrix Multiplication: If \(\mathbf{A} = \left\{ a_{ij} \right\}\) is an \(m \times n\) matrix and \(\mathbf{B} = \left\{ a_{jk} \right\}\) is a \(n \times p\) matrix, then the matrix product \(\mathbf{C} = \mathbf{A} \mathbf{B}\) is an \(m \times p\) matrix where \(\mathbf{C} = \left\{ \sum_{j=1}^p a_{ij} b{jk} \right\}\)
\[ \begin{aligned} \mathbf{A} \mathbf{B} & = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1p} \\ b_{21} & b_{22} & \cdots & b_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{np} \end{pmatrix} \\ & = \begin{pmatrix} \sum_{j=1}^n a_{1j} b_{j1} & \sum_{j=1}^n a_{1j} b_{j2} & \cdots & \sum_{j=1}^n a_{1j} b_{jp} \\ \sum_{j=1}^n a_{2j} b_{j1} &\sum_{j=1}^n a_{2j} b_{j2} & \cdots & \sum_{j=1}^n a_{2j} b_{jp} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{j=1}^n a_{mj} b_{j1} &\sum_{j=1}^n a_{nj} b_{j2} & \cdots & \sum_{j=1}^n a_{mj} b_{jp} \end{pmatrix} \end{aligned} \tag{2.2} \]
Another way to define matrix multiplication is through inner product notation. Define the \(m \times n\) matrix \(\mathbf{A}\) and the \(n \times p\) matrix \(\mathbf{B}\) as the partition
\[ \begin{aligned} \mathbf{A} & = \begin{pmatrix} \mathbf{a}_{1}' \\ \mathbf{a}_{2}' \\ \vdots \\ \mathbf{a}_{m}' \end{pmatrix} & \mbox{ and } && \mathbf{B} & = \begin{pmatrix} \mathbf{b}_{1} & \mathbf{b}_{2} & \cdots & \mathbf{b}_{p} \end{pmatrix} \end{aligned} \]
where \(\mathbf{a}_i\) and \(\mathbf{b}_k\) are both \(n\)-vectors. Then, we have \(\mathbf{C} = \mathbf{A} \mathbf{B}\) can be written as
\[ \begin{aligned} \mathbf{A} \mathbf{B} = \mathbf{A} \begin{pmatrix} \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_p \end{pmatrix} = \begin{pmatrix} \mathbf{A} \mathbf{b}_1 & \mathbf{A} \mathbf{b}_2 & \cdots & \mathbf{A} \mathbf{b}_p \end{pmatrix} \end{aligned} \]
Note that in this representation, each column of the matrix \(\mathbf{A}\mathbf{B}\) is a linear combination the the columns of \(\mathbf{A}\) with coefficients given by the corresponding column of \(\mathbf{B}\).
\[ \begin{aligned} \mathbf{A} \mathbf{B} & = \begin{pmatrix} \mathbf{a}_1' \mathbf{b}_1 & \mathbf{a}_1' \mathbf{b}_2 & \cdots & \mathbf{a}_1' \mathbf{b}_q \\ \mathbf{a}_2' \mathbf{b}_1 & \mathbf{a}_2' \mathbf{b}_2 & \cdots & \mathbf{a}_2' \mathbf{b}_q \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{a}_n' \mathbf{b}_1 & \mathbf{a}_n' \mathbf{b}_2 & \cdots & \mathbf{a}_n' \mathbf{b}_q \end{pmatrix} \\ & = \left\{ \mathbf{a}_i' \mathbf{b}_k \right\}. \end{aligned} \]
Written in this notation, we arrive at the multiplication rule for \(\mathbf{C} = \mathbf{A} \mathbf{B}\) – the \(ik\)th element \(c_{ik}\) of \(\mathbf{C}\) is the inner product of the \(i\)th row of \(\mathbf{A}\) and the \(j\)th column of \(\mathbf{B}\).
2.3.3 Properties of Matrix Multiplication
Define the \(m \times m\) identity matrix \(\mathbf{I}_m\) with ones on the diagonal and zeros off diagonal as \[ \mathbf{I}_m = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & 0 & \cdots & 1\end{pmatrix} \] Let \(\mathbf{A}\) be an \(m \times n\) matrix, then:
Let \(\mathbf{B}\) be an \(n \times p\) matrix and \(\mathbf{C}\) a \(p \times q\) matrix. Then \(\mathbf{A}(\mathbf{B}\mathbf{C}) = (\mathbf{A}\mathbf{B})\mathbf{C}\) is an \(m \times q\) matrix.
Let \(\mathbf{B}\) and \(\mathbf{C}\) be \(n \times p\) matrices. Then \(\mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{A}\mathbf{B} + \mathbf{A}\mathbf{C}\) is an \(m \times p\) matrix.
Let \(\mathbf{B}\) and \(\mathbf{C}\) be \(p \times m\) matrices. Then \((\mathbf{B} + \mathbf{C})\mathbf{A} = \mathbf{B}\mathbf{A} + \mathbf{C}\mathbf{A}\) is an \(p \times m\) matrix.
Let \(\mathbf{B}\) be an \(p \times m\) matrix and \(c\) a scalar. Then \(c(\mathbf{A} \mathbf{B}) = (c \mathbf{A}) \mathbf{B} = \mathbf{A}(c\mathbf{B})\) is an \(p \times m\) matrix.
\(\mathbf{I}_m \mathbf{A} = \mathbf{A} \mathbf{I}_n = \mathbf{A}\)
Examples: in class
Note: Matrix multiplication violates some of the rules of multiplication that you might be used to. Pay attention for the following:
In general \(\mathbf{A} \mathbf{B} \neq \mathbf{B} \mathbf{A}\) (sometimes these are equal, but usually are not)
\(\mathbf{A}\mathbf{B} = \mathbf{A} \mathbf{C}\) does not imply \(\mathbf{B} = \mathbf{C}\)
\(\mathbf{A}\mathbf{B} = \mathbf{0}\) does not imply that \(\mathbf{A} = \mathbf{0}\) or \(\mathbf{B} = \mathbf{0}\)
2.3.4 Matrix Multiplication complexity (Big O notation)
In the study of algorithms, the notation \(O(n)\) is used to describe the number of calculations that need to be done to evaluate the equation. As an example, consider \(\mathbf{A} = \begin{pmatrix}3 & 1 \\ 2 & -3 \end{pmatrix}\), \(\mathbf{B} = \begin{pmatrix} -2 & 4 \\ -1 & 2 \end{pmatrix}\), and \(\mathbf{x} = \begin{pmatrix} -3 \\ 1 \end{pmatrix}\).
By hand: Calculate
\((\mathbf{A} \mathbf{B}) \mathbf{x}\)
\(\mathbf{A} (\mathbf{B} \mathbf{x})\)
Which was easier? Which required less calculation?
Matrix-matrix multiplication of and \(m \times n\) matrix \(\mathbf{A}\) and an \(m \times p\) matrix \(\mathbf{B}\) has complexity \(O(m n p)\).
Matrix-vector multiplication of and \(m \times n\) matrix \(\mathbf{A}\) and an \(p\)-vector \(\mathbf{x}\) has complexity \(O(n m)\).
From example above:
\(O(m n p)\) matrix-matrix multiplication \((\mathbf{A} \mathbf{B})\) followed by \(O(m n)\) matrix-vector multiplication \((\mathbf{A} \mathbf{B}) \mathbf{x}\) which has computational complexity \(O(m n p) + O(m n)\)
\(O(m n)\) matrix-vector multiplication \((\mathbf{B} \mathbf{x})\) followed by \(O(m n)\) matrix-vector multiplication \(\mathbf{A} (\mathbf{B} \mathbf{x})\) which has computational complexity \(O(m n) + O(m n)\)
2.3.5 Matrix powers
Powers of a \(n \times n\) (square) matrix are defined as the product of \(\mathbf{A}\) multiplied \(k\) times \[ \mathbf{A}^k = \underbrace{\mathbf{A} \cdots \mathbf{A}}_k \]
2.3.6 Matrix Transpose
The matrix transpose is an operator that swaps the rows and columns of a matrix. If \(\mathbf{A}\) is an \(m \times n\) matrix (\(m\) rows and \(n\) columns), then \(\mathbf{A}'\) is a \(n \times m\) matrix (\(n\) rows and \(m\) columns. Note: some use \(\mathbf{A}^T\) to denote a transpose; I prefer the \('\) notation as it is much simpler and cleaner notation). The matrix
\[ \begin{aligned} \mathbf{A} & = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \end{aligned} \] has transpose
\[ \begin{aligned} \mathbf{A}' & = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{pmatrix}, \end{aligned} \]
Theorem 2.3 Let \(\mathbf{A}\) be an \(m \times n\) matrix, then
\((\mathbf{A}')' = \mathbf{A}\).
Let \(\mathbf{B}\) be an \(m \times n\) matrix, then \((\mathbf{A} + \mathbf{B})' = \mathbf{A}' + \mathbf{B}'\).
For any scalar \(c\), \((c \mathbf{A})' = c \mathbf{A}'\).
Let \(\mathbf{B}\) be an \(n \times p\) matrix, then \(( \mathbf{A} \mathbf{B})' = \mathbf{B}' \mathbf{A}'\) is an \(p \times m\) matrix.
Note: The power of video games: GPUs and modern CPUs are becoming more and more parallelized. Because the \(ij\)th element of \(\mathbf{A}\mathbf{B}\) requires only the \(i\)th row of \(\mathbf{A}\) and the \(j\)th column of \(\mathbf{B}\), matrix multiplication is easily parallelized under modern computing architectures. Thanks to video games, this parallelization has been made faster than ever.
Examples: in class