Directional Derivative and Its Relation to the Gradient Vector
The gradient vector is the basis of gradient descent, the heart of Deep Learning. And one mathematical concept, the directional derivative, is deeply related to the gradient vector. Once we understand the meaning of the directional derivative, it becomes quite easy to grasp the nuances of gradient descent.
Let’s understand how the directional derivative is related to the gradient vector. Consider a function of two independent variables x and y. Plotting this function in a 3-dimensional rectangular coordinate system produces a surface, with the output z along the z-axis and the inputs x and y along the x- and y-axes. For such a function, consider the surface plot shown below:
Consider any point P in the input plane (the XY plane) with coordinates (x, y). Let’s move a very small displacement, known as an elementary displacement, to reach a point Q with coordinates (x + dx, y + dy). Let R be the point reached from P by moving parallel to the Y-axis; the coordinates of R are therefore (x, y + dy). Clearly, the position vectors of P and Q are given as:
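In LaTeX form (reconstructing the equation from the description above, with \hat{i} and \hat{j} the unit vectors along the X- and Y-axes):

\vec{r}_P = x\,\hat{i} + y\,\hat{j}, \qquad \vec{r}_Q = (x + dx)\,\hat{i} + (y + dy)\,\hat{j}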
So, from the triangle law of vector addition,
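In LaTeX form (again a reconstruction), subtracting the position vector of P from that of Q gives the elementary displacement vector:

\vec{PQ} = \vec{r}_Q - \vec{r}_P = dx\,\hat{i} + dy\,\hat{j}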
Now, the output z corresponding to point P is the vertical height of the surface above P, as shown. Since the displacement from P to Q is very small, the vertical height corresponding to Q is z + dz, where dz is an elementary change. In other words, when we move an elementary displacement from P to Q, there is an elementary change dz in the height z.
So far, so good. Now let’s define the term directional derivative. As the name suggests, “derivative” refers to the slope (rate of change), and “directional” says that we are moving in a particular direction (here, from P to Q).
So the directional derivative, from the diagram, is defined as
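In LaTeX form (a reconstruction; writing D_{\hat{n}} z for the directional derivative), it is the elementary change in height divided by the length of the elementary displacement:

D_{\hat{n}} z = \frac{dz}{\lvert \vec{PQ} \rvert} = \frac{dz}{\sqrt{dx^2 + dy^2}}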
Also, the gradient vector, which we often encounter in Deep Learning to update the trainable parameters, is defined as:
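In LaTeX form (a reconstruction showing both the vector form and the column-matrix form, so that “the latter” below has something to refer to):

\nabla z = \frac{\partial z}{\partial x}\,\hat{i} + \frac{\partial z}{\partial y}\,\hat{j} \qquad \text{or, equivalently,} \qquad \nabla z = \begin{bmatrix} \partial z/\partial x \\ \partial z/\partial y \end{bmatrix}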
The latter representation, the column-matrix representation of a vector, is the one often encountered in Deep Learning.
Here, ∂z/∂x refers to the partial derivative of z with respect to (w.r.t.) x. It means the elementary change in z while moving an elementary displacement dx parallel to the X-axis, keeping y fixed. Similarly, ∂z/∂y refers to the partial derivative of z w.r.t. y.
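As a quick concrete example (not part of the original derivation, just to fix the notation): if z = x^2 y, then

\frac{\partial z}{\partial x} = 2xy, \qquad \frac{\partial z}{\partial y} = x^2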
Now it’s time to establish the relation between the directional derivative and the gradient vector, which is our final objective.
It can be intuitively observed from the diagram that the elementary change dz while moving from P to Q is equal to the sum of the elementary change in z while first moving from P to R (i.e., parallel to the Y-axis) and the elementary change while then moving from R to Q (i.e., parallel to the X-axis).
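In LaTeX form, this is the equation (1) referred to below (a reconstruction; the P-to-R move contributes the ∂z/∂y term and the R-to-Q move contributes the ∂z/∂x term):

dz = \frac{\partial z}{\partial y}\,dy + \frac{\partial z}{\partial x}\,dx \qquad (1)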
Note that ∂z/∂x in equation (1) is calculated at point R, whose coordinates are (x, y + dy). Interestingly, it is approximately equal to its value at point P, since the separation between the points P and R is very, very small.
Equation (1) can be written beautifully as the dot product of two vectors. So equation (1) becomes
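In LaTeX form (a reconstruction), grouping the partial derivatives into one vector and the displacements into the other:

dz = \left( \frac{\partial z}{\partial x}\,\hat{i} + \frac{\partial z}{\partial y}\,\hat{j} \right) \cdot \left( dx\,\hat{i} + dy\,\hat{j} \right) = \nabla z \cdot \vec{PQ}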
Dividing both sides by the magnitude of vector PQ, we get
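In LaTeX form (a reconstruction):

\frac{dz}{\lvert \vec{PQ} \rvert} = \nabla z \cdot \frac{\vec{PQ}}{\lvert \vec{PQ} \rvert}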
The left-hand side of this equation is nothing but the directional derivative. Also, we know that a vector divided by its magnitude gives a unit vector specifying its direction. So, the above equation becomes
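In LaTeX form (a reconstruction, using the same D_{\hat{n}} z notation as above):

D_{\hat{n}} z = \nabla z \cdot \hat{n}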
Here \hat{n} is the unit vector in the direction of the elementary displacement (i.e., from P to Q).
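One consequence worth spelling out, since it is what ties this relation to gradient descent (it follows directly from the relation above, using the angle θ between \nabla z and \hat{n}):

D_{\hat{n}} z = \lvert \nabla z \rvert \, \lvert \hat{n} \rvert \cos\theta = \lvert \nabla z \rvert \cos\theta

The directional derivative is therefore largest when \hat{n} points along \nabla z (θ = 0) and most negative when \hat{n} points opposite to \nabla z. That is exactly why gradient descent updates the trainable parameters in the direction of -\nabla z: it is the direction of steepest decrease of z.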
This is the sought relation between the directional derivative and the gradient vector. Now that we have this awesome relation, we will have fun understanding Gradient Descent, which forms the heart of Deep Learning! I have explained this concept in my YouTube video: https://youtu.be/n7_gFWyI-yQ