Multivariable Mathematics for Data Science

Author

John Tipton

Published

December 5, 2022

1 Preface

This book introduces students to the multivariable calculus and linear algebra methods and techniques needed to be successful in data science, statistics, computer science, and other data-driven, computational disciplines.

The motivation for this text is to provide both a theoretical understanding of important multivariable methods used in data science and hands-on experience using software. Throughout this text, we assume the reader has a solid foundation in univariate calculus (typically two semesters) as well as familiarity with a scripting language (e.g., R or Python).

1.1 Getting started in R

TBD

1.2 Some videos that explain useful concepts of linear algebra

1.3 Notation

For notation, we let lowercase Roman letters represent scalar numbers (e.g., n = 5, d = 7) and lowercase bold letters represent vectors

\[ \begin{aligned} \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \end{aligned} \]

where the elements \(x_1, \ldots, x_n\) are scalars written in lowercase Roman. Note that vectors are assumed to follow a vertical notation in which the elements of the vector (the \(x_i\)s) are stacked on top of one another and the order matters. For example, the vector

\[ \begin{aligned} \mathbf{x} & = \begin{pmatrix} 5 \\ 2 \\ 8 \end{pmatrix} \end{aligned} \]

has the first element \(x_1 = 5\), second element \(x_2 = 2\) and third element \(x_3 = 8\). Note that the vector \(\begin{pmatrix} 5 \\ 2 \\ 8 \end{pmatrix}\) is not the same as the vector \(\begin{pmatrix} 8 \\ 2 \\ 5 \end{pmatrix}\) because the order of the elements matters.

We can also write the vector as

\[ \begin{aligned} \mathbf{x} = \left( x_1, x_2, \ldots, x_n \right)', \end{aligned} \]

where the \('\) symbol represents the transpose operation. For our example vector, we have \(\begin{pmatrix} 5 \\ 2 \\ 8 \end{pmatrix}' = \begin{pmatrix} 5 & 2 & 8 \end{pmatrix}\), which is the original vector arranged in a row rather than a column. Likewise, the transpose of a row vector, \(\begin{pmatrix} 5 & 2 & 8 \end{pmatrix}' = \begin{pmatrix} 5 \\ 2 \\ 8 \end{pmatrix}\), is a column vector. If \(\mathbf{x}\) is a column vector, then \(\mathbf{x}'\) is a row vector, and if \(\mathbf{x}\) is a row vector, then \(\mathbf{x}'\) is a column vector.

To create a vector in R, we can use the concatenate function c(). For example, the vector \(\mathbf{x} = \begin{pmatrix} 5 \\ 2 \\ 8 \end{pmatrix}\) can be created as an R object using

x <- c(5, 2, 8)

where the assignment operator <- assigns the values in the vector c(5, 2, 8) to the object named x. To print the values of x, we can use

x
[1] 5 2 8

which prints the elements of x. Notice that R prints the elements of \(\mathbf{x}\) in a row even though we treat \(\mathbf{x}\) as a column vector. R prints vectors this way to make the output easier to read (more numbers fit on a row). If we put the vector into a data.frame, then it is displayed as a column

data.frame(x)
  x
1 5
2 2
3 8
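The point that the order of the elements matters can be checked directly in R. The following is a minimal sketch; the object y is introduced here only for illustration and is not used elsewhere in the text. The last line shows that a vector created with c() carries no row or column dimensions of its own, which is why R is free to print it in a row.

```r
x <- c(5, 2, 8)
y <- c(8, 2, 5)   # hypothetical vector: same elements, reversed order

length(x)         # number of elements in x
identical(x, y)   # FALSE: the order of the elements matters
x == y            # element-by-element comparison of x and y
dim(x)            # NULL: a plain R vector has no row/column dimensions
```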

One can use the index operator \([\ ]\) to select specific elements of the vector \(\mathbf{x}\). For example, the first element of \(\mathbf{x}\), \(x_1\), is

x[1]
[1] 5

and the third element of \(\mathbf{x}\), \(x_3\), is

x[3]
[1] 8

The transpose function t() turns a column vector into a row vector (or a row vector into a column vector). For example, the transpose \(\mathbf{x}'\) of \(\mathbf{x}\) is

tx <- t(x)
tx
     [,1] [,2] [,3]
[1,]    5    2    8

where tx is an R object that stores the transpose of \(\mathbf{x}\) and is a row vector. Notice the indices on the output of the row vector tx. The index operator [1, ] selects the first row of tx and the index operator [, 1] selects the first column of tx. Taking the transpose again gives us back the original column vector

t(tx)
     [,1]
[1,]    5
[2,]    2
[3,]    8
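The row and column index operators described above can be tried on tx directly. This is a short sketch that only uses the objects already defined in this section; dim() reports the number of rows and columns of an object.

```r
x  <- c(5, 2, 8)
tx <- t(x)

dim(tx)      # 1 3: one row and three columns
tx[1, ]      # first (and only) row of tx
tx[, 1]      # first column of tx, which holds the single value 5
dim(t(tx))   # 3 1: transposing again gives three rows and one column
```

Note that t(tx) is stored as a matrix with three rows and one column, whereas the original x is a plain vector without dimensions; the values and their order are the same.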

1.3.1 Matrices

We let uppercase bold letters \(\mathbf{A}\), \(\mathbf{B}\), etc., represent matrices. We define the matrix \(\mathbf{A}\) with \(m\) rows and \(n\) columns as

\[ \begin{aligned} \mathbf{A} & = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}, \end{aligned} \]

with \(a_{ij}\) being the value of the matrix \(\mathbf{A}\) in the \(i\)th row and the \(j\)th column.

If the matrix

\[ \begin{aligned} \mathbf{A} & = \begin{pmatrix} 5 & 7 & 1 \\ 5 & -22 & 2 \\ -14 & 5 & 99 \\ 42 & -3 & 0\end{pmatrix}, \end{aligned} \]

then the elements are \(a_{11} = 5\), \(a_{12} = 7\), \(a_{21} = 5\), \(a_{33} = 99\), and so on.

In R, we can define the matrix A using the matrix() function

A <- matrix(
    data = c(5, 5, -14, 42, 7, -22, 5, -3, 1, 2, 99, 0),
    nrow = 4,
    ncol = 3
)

A
     [,1] [,2] [,3]
[1,]    5    7    1
[2,]    5  -22    2
[3,]  -14    5   99
[4,]   42   -3    0

Notice that in the above creation of \(\mathbf{A}\), we defined the elements of \(\mathbf{A}\) column by column: the data vector lists the first column, then the second column, then the third. If we want to fill in the elements of \(\mathbf{A}\) row by row, we can add the option byrow = TRUE to the matrix() function

A <- matrix(
    data  = c(5, 7, 1, 5, -22, 2, -14, 5, 99, 42, -3, 0), 
    nrow  = 4,
    ncol  = 3,
    byrow = TRUE
)
A
     [,1] [,2] [,3]
[1,]    5    7    1
[2,]    5  -22    2
[3,]  -14    5   99
[4,]   42   -3    0
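To confirm that the two ways of filling the matrix produce the same object, the results can be compared. This is a minimal sketch; the names vals_by_col, vals_by_row, A_col, and A_row are introduced here only for illustration.

```r
# The same 12 values, listed column by column and row by row
vals_by_col <- c(5, 5, -14, 42, 7, -22, 5, -3, 1, 2, 99, 0)
vals_by_row <- c(5, 7, 1, 5, -22, 2, -14, 5, 99, 42, -3, 0)

A_col <- matrix(vals_by_col, nrow = 4, ncol = 3)                # default: fill by column
A_row <- matrix(vals_by_row, nrow = 4, ncol = 3, byrow = TRUE)  # fill by row

identical(A_col, A_row)   # TRUE: both calls build the same matrix
dim(A_col)                # 4 3: m = 4 rows and n = 3 columns
```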

To select the \(ij\)th element of \(\mathbf{A}\), we use the subset operator [ with the row index first and the column index second. For example, to get the element \(a_{11} = 5\) in the first row and first column of \(\mathbf{A}\), we use

A[1, 1]
[1] 5

The element \(a_{33} = 99\) in the third row and third column can be selected using

A[3, 3]
[1] 99
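Because the row index always comes before the column index, A[1, 2] and A[2, 1] refer to different elements. The sketch below rebuilds the example matrix so that it runs on its own and selects a few off-diagonal elements.

```r
A <- matrix(
    data  = c(5, 7, 1, 5, -22, 2, -14, 5, 99, 42, -3, 0),
    nrow  = 4,
    ncol  = 3,
    byrow = TRUE
)

A[1, 2]   # a_12 = 7: first row, second column
A[2, 1]   # a_21 = 5: second row, first column
A[4, 3]   # a_43 = 0: fourth row, third column
```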

The matrix \(\mathbf{A}\) can also be represented as a set of either column vectors \(\{\mathbf{c}_j \}_{j=1}^n\) or row vectors \(\{\mathbf{r}_i \}_{i=1}^m\). For example, the column vector representation is

\[ \begin{aligned} \mathbf{A} & = \left( \mathbf{c}_{1} \middle| \mathbf{c}_{2} \middle| \cdots \middle| \mathbf{c}_{n} \right), \end{aligned} \]

where the notation \(|\) is used to separate the vectors

\[ \begin{aligned} \mathbf{c}_1 & = \begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{pmatrix}, & \mathbf{c}_2 & = \begin{pmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{pmatrix}, & \cdots, & & \mathbf{c}_n & = \begin{pmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{pmatrix} \end{aligned} \]

In R you can extract the columns using the [ selection operator

c1 <- A[, 1] # first column
c2 <- A[, 2] # second column
c3 <- A[, 3] # third column

and you can build the column representation of the matrix A with the column bind function cbind()

cbind(c1, c2, c3)
      c1  c2 c3
[1,]   5   7  1
[2,]   5 -22  2
[3,] -14   5 99
[4,]  42  -3  0
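As a check, the matrix rebuilt from its columns can be compared entry by entry with the original. This is a sketch under the assumption that A is the 4 by 3 example matrix defined earlier; the name A_cols is introduced here only for illustration.

```r
A <- matrix(
    data  = c(5, 7, 1, 5, -22, 2, -14, 5, 99, 42, -3, 0),
    nrow  = 4,
    ncol  = 3,
    byrow = TRUE
)
c1 <- A[, 1]
c2 <- A[, 2]
c3 <- A[, 3]

A_cols <- cbind(c1, c2, c3)
dim(A_cols)         # 4 3: same shape as A
all(A_cols == A)    # TRUE: every entry matches the original matrix
colnames(A_cols)    # "c1" "c2" "c3": cbind() labels columns with the object names
```

The only difference from A is the column labels that cbind() adds, which is why the comparison uses all(A_cols == A) rather than identical().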

The row vector representation of \(\mathbf{A}\) is

\[ \begin{aligned} \mathbf{A} & = \begin{pmatrix} \mathbf{r}_{1} \\ \mathbf{r}_{2} \\ \vdots \\ \mathbf{r}_{m} \end{pmatrix}, \end{aligned} \]

where the row vectors \(\mathbf{r}_i\) are

\[ \begin{aligned} \mathbf{r}_1 & = \left( a_{11}, a_{12}, \ldots, a_{1n} \right) \\ \mathbf{r}_2 & = \left( a_{21}, a_{22}, \ldots, a_{2n} \right) \\ & \vdots \\ \mathbf{r}_m & = \left( a_{m1}, a_{m2}, \ldots, a_{mn} \right) \end{aligned} \]

In R you can extract the rows using the [ selection operator

r1 <- A[1, ] # first row
r2 <- A[2, ] # second row
r3 <- A[3, ] # third row
r4 <- A[4, ] # fourth row

and you can build the row representation of the matrix A with the row bind function rbind()

rbind(r1, r2, r3, r4)
   [,1] [,2] [,3]
r1    5    7    1
r2    5  -22    2
r3  -14    5   99
r4   42   -3    0
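One detail worth knowing when extracting rows is that A[1, ] returns a plain vector rather than a matrix with one row. The sketch below, which assumes A is the example matrix from this section, shows this and uses the drop = FALSE argument of the [ operator to keep the row as a 1 by 3 matrix. The name A_rows is introduced here only for illustration.

```r
A <- matrix(
    data  = c(5, 7, 1, 5, -22, 2, -14, 5, 99, 42, -3, 0),
    nrow  = 4,
    ncol  = 3,
    byrow = TRUE
)

r1 <- A[1, ]
dim(r1)                      # NULL: the extracted row is a plain vector
dim(A[1, , drop = FALSE])    # 1 3: drop = FALSE keeps a matrix with one row

# Stacking all four rows recovers the original entries
A_rows <- rbind(A[1, ], A[2, ], A[3, ], A[4, ])
all(A_rows == A)             # TRUE: every entry matches the original matrix
```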