Chapter 26 The chain rule
Recall the univariate chain rule: if \(y = f(x)\) is a function of \(x\) and \(z = g(y)\) is a function of \(y\), a question of interest is "What is the change in \(z\) relative to a change in \(x\)?"
We can write \(z = g(y) = g(f(x))\) and, using this notation, the change in \(z\) with respect to the variable \(x\) is \(\frac{dz}{dx} = \frac{dz}{dy}\frac{dy}{dx} = \frac{dg(y)}{dy}\frac{df(x)}{dx}\). Written in functional form, \[ \begin{aligned} (g(f(x)))' = (g \circ f)'(x) = g'(f(x)) f'(x) \end{aligned} \]
Example 26.1 Let \(z = g(y) = y^3\) and let \(y = f(x) = e^{x}\). What is \(\frac{dz}{dx}\)?
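As a sanity check on the example above, the following minimal sketch compares the chain-rule product with a finite-difference estimate of the derivative of the composition. The function names `inner`, `outer`, `dz_dx_chain`, and `dz_dx_numeric`, the evaluation point, and the step size `h` are illustrative assumptions, not part of the original notes.

```python
import math

def inner(x):
    # inner function: y = e^x
    return math.exp(x)

def outer(y):
    # outer function: z = y^3
    return y ** 3

def dz_dx_chain(x):
    # chain rule: (dz/dy) * (dy/dx) = 3 y^2 * e^x, evaluated at y = e^x
    y = inner(x)
    return 3 * y ** 2 * math.exp(x)

def dz_dx_numeric(x, h=1e-6):
    # central-difference approximation of the derivative of outer(inner(x))
    return (outer(inner(x + h)) - outer(inner(x - h))) / (2 * h)

x0 = 0.5  # arbitrary evaluation point
print(dz_dx_chain(x0), dz_dx_numeric(x0))
```

The two printed values should agree to several decimal places, which is consistent with the closed form \(\frac{dz}{dx} = 3e^{3x}\).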
26.1 The chain rule with one independent variable
Drawing in class
Definition 26.1 (Chain rule for one independent variable) Let \(z\) be a differentiable function of two variables \(x\) and \(y\) so that \(z = f(x, y)\) and let \(x=g(t)\) be a function of \(t\) and \(y=h(t)\) a function of \(t\). Written in functional form, \(z\) can be written as \(z = f(x, y) = f(g(t), h(t))\), with \(x=g(t)\) and \(y=h(t)\). Then we can define the derivative of \(z\) with respect to \(t\) as \[ \begin{aligned} \frac{dz}{dt} = \frac{\partial z}{ \partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt} \end{aligned} \]
In the definition above, \(z\) is the dependent variable, \(x\) and \(y\) are the intermediate variables, and \(t\) is the single independent variable.
Notice in the definition above that there is a mix of partial derivatives (\(\partial\)) and ordinary derivatives (\(d\)).
Example 26.2 Let \(z = x^2 + e^y\) and let \(x = \cos(t)\) and \(y = \sin(t)\). Find \(\frac{dz}{dt}\).
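Assuming the quantity of interest in this example is \(\frac{dz}{dt}\), the sketch below evaluates the chain-rule sum \(\frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt}\) and compares it with a finite-difference estimate. The function names and the step size `h` are illustrative assumptions.

```python
import math

def z_of_t(t):
    # z = x^2 + e^y with x = cos(t), y = sin(t)
    x, y = math.cos(t), math.sin(t)
    return x ** 2 + math.exp(y)

def dz_dt_chain(t):
    x, y = math.cos(t), math.sin(t)
    dz_dx = 2 * x          # partial derivative of z with respect to x
    dz_dy = math.exp(y)    # partial derivative of z with respect to y
    dx_dt = -math.sin(t)   # ordinary derivative of x with respect to t
    dy_dt = math.cos(t)    # ordinary derivative of y with respect to t
    return dz_dx * dx_dt + dz_dy * dy_dt

def dz_dt_numeric(t, h=1e-6):
    # central-difference approximation of dz/dt
    return (z_of_t(t + h) - z_of_t(t - h)) / (2 * h)

t0 = 0.7  # arbitrary evaluation point
print(dz_dt_chain(t0), dz_dt_numeric(t0))
```

In closed form the chain-rule sum is \(-2\cos(t)\sin(t) + e^{\sin(t)}\cos(t)\), which equals \(1\) at \(t = 0\).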
The results from above can also be extended to have more than two intermediate variables.
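As a sketch of that extension, suppose \(z = f(x_1, \ldots, x_n)\) is differentiable and each intermediate variable \(x_i = g_i(t)\) is a differentiable function of \(t\). The symbols \(x_i\), \(g_i\), and \(n\) are notation introduced here for illustration. The sum then has one term per intermediate variable: \[ \begin{aligned} \frac{dz}{dt} = \sum_{i=1}^{n} \frac{\partial z}{\partial x_i}\frac{dx_i}{dt} = \frac{\partial z}{\partial x_1}\frac{dx_1}{dt} + \cdots + \frac{\partial z}{\partial x_n}\frac{dx_n}{dt} \end{aligned} \]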
Draw picture
26.2 The chain rule with several independent variables
Often, functions will have more than one independent variable.
Definition 26.2 (Chain rule for two independent variables) Let \(z\) be a differentiable function of two variables \(x\) and \(y\) so that \(z = f(x, y)\) and let \(x=g(s, t)\) be a function of \(s\) and \(t\) and \(y=h(s, t)\) a function of \(s\) and \(t\). Written in functional form, \(z\) can be written as \(z = f(x, y) = f(g(s, t), h(s, t))\), with \(x=g(s, t)\) and \(y=h(s, t)\). Then we can define the partial derivative of \(z\) with respect to \(s\) as \[ \begin{aligned} \frac{\partial z}{\partial s} = \frac{\partial z}{ \partial x}\frac{\partial x}{\partial s} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial s} \end{aligned} \] and the partial derivative of \(z\) with respect to \(t\) as \[ \begin{aligned} \frac{\partial z}{\partial t} = \frac{\partial z}{ \partial x}\frac{\partial x}{\partial t} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial t} \end{aligned} \]
Example 26.3 Let \(z = f(x, y) = x^2 e^y\) and let \(x = 2s - t\) and \(y = 4s^3-3t^2\). Find \(\frac{\partial z}{\partial s}\) and \(\frac{\partial z}{\partial t}\).
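Assuming the goal of this example is to find \(\frac{\partial z}{\partial s}\) and \(\frac{\partial z}{\partial t}\), the sketch below evaluates both chain-rule sums and checks them against finite differences. The function names and the step size `h` are illustrative assumptions.

```python
import math

def z_of_st(s, t):
    # z = x^2 e^y with x = 2s - t and y = 4s^3 - 3t^2
    x = 2 * s - t
    y = 4 * s ** 3 - 3 * t ** 2
    return x ** 2 * math.exp(y)

def partials_chain(s, t):
    x = 2 * s - t
    y = 4 * s ** 3 - 3 * t ** 2
    dz_dx = 2 * x * math.exp(y)   # partial of z with respect to x
    dz_dy = x ** 2 * math.exp(y)  # partial of z with respect to y
    # partials of x and y: dx/ds = 2, dy/ds = 12 s^2, dx/dt = -1, dy/dt = -6 t
    dz_ds = dz_dx * 2 + dz_dy * 12 * s ** 2
    dz_dt = dz_dx * (-1) + dz_dy * (-6 * t)
    return dz_ds, dz_dt

def partials_numeric(s, t, h=1e-6):
    # central differences, varying one independent variable at a time
    dz_ds = (z_of_st(s + h, t) - z_of_st(s - h, t)) / (2 * h)
    dz_dt = (z_of_st(s, t + h) - z_of_st(s, t - h)) / (2 * h)
    return dz_ds, dz_dt

print(partials_chain(0.5, 0.3))
print(partials_numeric(0.5, 0.3))
```

At \(s = 1\), \(t = 1\) we have \(x = 1\) and \(y = 1\), so the sums reduce to \(\frac{\partial z}{\partial s} = 16e\) and \(\frac{\partial z}{\partial t} = -8e\).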
26.3 The chain rule in matrix notation
To get a better understanding of the chain rule, it helps to show the chain rule using matrix notation. Using the matrix notation will enable you to apply the chain rule to any number of intermediate variables. For example, consider the extension of the definition for the chain rule of a function with one independent variable.
Definition 26.3 (Matrix chain rule for one independent variable) Let \(z\) be a differentiable function of two variables \(x\) and \(y\) so that \(z = f(x, y)\) and let \(x=g(t)\) be a function of \(t\) and \(y=h(t)\) a function of \(t\). Written in functional form, \(z\) can be written as \(z = f(x, y) = f(g(t), h(t))\), with \(x=g(t)\) and \(y=h(t)\). Then we can define the derivative of \(z\) with respect to \(t\) as \[ \begin{aligned} \frac{dz}{dt} = \frac{\partial z}{ \partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt} \end{aligned} \]
Written in matrix notation, this is \[ \begin{aligned} \frac{dz}{dt} = \begin{pmatrix} \frac{\partial z}{ \partial x} & \frac{\partial z}{\partial y} \end{pmatrix} \begin{pmatrix} \frac{dx}{dt} \\ \frac{dy}{dt} \end{pmatrix} = \frac{\partial z}{ \partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt} \end{aligned} \]
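The row-times-column form is convenient because it does not depend on how many intermediate variables there are. The sketch below is a hypothetical illustration with three intermediate variables, \(z = xyw\) with \(x = t\), \(y = t^2\), and \(w = t^3\). This function, the variable \(w\), and the helper name `chain_rule_dot` are assumptions made for the example and do not come from the definitions above.

```python
def chain_rule_dot(partials, derivatives):
    # row vector of partial derivatives times column vector of ordinary derivatives
    if len(partials) != len(derivatives):
        raise ValueError("need one derivative per intermediate variable")
    return sum(p * d for p, d in zip(partials, derivatives))

def dz_dt(t):
    # intermediate variables as functions of t
    x, y, w = t, t ** 2, t ** 3
    # partials of z = x * y * w with respect to x, y, w
    partials = [y * w, x * w, x * y]
    # ordinary derivatives dx/dt, dy/dt, dw/dt
    derivatives = [1.0, 2 * t, 3 * t ** 2]
    return chain_rule_dot(partials, derivatives)

print(dz_dt(2.0))
```

Since \(z = t^6\) after substitution, the result can be compared with \(6t^5\), which is \(192\) at \(t = 2\).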
The chain rule for two independent variables, defined above, can be written in matrix notation in the same way.
Definition 26.4 (Chain rule for two independent variables) Let \(z\) be a differentiable function of two variables \(x\) and \(y\) so that \(z = f(x, y)\) and let \(x=g(s, t)\) be a function of \(s\) and \(t\) and \(y=h(s, t)\) a function of \(s\) and \(t\). Written in functional form, \(z\) can be written as \(z = f(x, y) = f(g(s, t), h(s, t))\), with \(x=g(s, t)\) and \(y=h(s, t)\). Then we can define the partial derivative of \(z\) with respect to \(s\) as \[ \begin{aligned} \frac{\partial z}{\partial s} = \frac{\partial z}{ \partial x}\frac{\partial x}{\partial s} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial s} \end{aligned} \] which, in matrix notation, is \[ \begin{aligned} \frac{\partial z}{\partial s} = \begin{pmatrix} \frac{\partial z}{ \partial x} & \frac{\partial z}{\partial y} \end{pmatrix} \begin{pmatrix} \frac{\partial x}{\partial s} \\ \frac{\partial y}{\partial s} \end{pmatrix} = \frac{\partial z}{ \partial x}\frac{\partial x}{\partial s} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial s} \end{aligned} \]
The partial derivative of \(z\) with respect to \(t\) is \[ \begin{aligned} \frac{\partial z}{\partial t} = \frac{\partial z}{ \partial x}\frac{\partial x}{\partial t} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial t} \end{aligned} \] which, in matrix notation, is \[ \begin{aligned} \frac{\partial z}{\partial t} = \begin{pmatrix} \frac{\partial z}{ \partial x} & \frac{\partial z}{\partial y} \end{pmatrix} \begin{pmatrix} \frac{\partial x}{\partial t} \\ \frac{\partial y}{\partial t} \end{pmatrix} = \frac{\partial z}{ \partial x}\frac{\partial x}{\partial t} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial t} \end{aligned} \]
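One way to collect the two partial derivatives into a single statement is to place the two column vectors side by side as the columns of a \(2 \times 2\) matrix. This arrangement is offered as a sketch and is not part of the definition itself: \[ \begin{aligned} \begin{pmatrix} \frac{\partial z}{\partial s} & \frac{\partial z}{\partial t} \end{pmatrix} = \begin{pmatrix} \frac{\partial z}{ \partial x} & \frac{\partial z}{\partial y} \end{pmatrix} \begin{pmatrix} \frac{\partial x}{\partial s} & \frac{\partial x}{\partial t} \\ \frac{\partial y}{\partial s} & \frac{\partial y}{\partial t} \end{pmatrix} \end{aligned} \] Multiplying the row vector by the first column gives the sum for \(\frac{\partial z}{\partial s}\), and multiplying it by the second column gives the sum for \(\frac{\partial z}{\partial t}\).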
This use of matrix notation for derivatives will be useful in understanding the gradient.