In technical interviews for data science, machine learning, or mathematical optimization roles, one of the most common questions concerns a property of the method of steepest descent, also called the gradient descent method: when the step size is chosen by exact line search, successive steepest descent directions are normal to one another. Interviewers ask for a proof or an example that explains this property. In this article, I start with the proof and then give a simple example to illustrate it.

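Let $f(x)$ be a differentiable function of $x \in \mathbb{R}^n$ whose extremum is $x^*$.

We update the iterative point as follows:

$$x_{k+1} = x_k - \alpha \nabla f(x_k)$$

Next successive point:

$$x_{k+2} = x_{k+1} - \alpha' \nabla f(x_{k+1})$$

The new point $x_{k+1}$ is a function of the step size $\alpha$, and so is the function value there: $\phi(\alpha) = f(x_k - \alpha \nabla f(x_k))$.

If $\alpha$ is chosen to minimize $\phi$ (exact line search), then

$$\frac{d\phi}{d\alpha} = 0$$

Note, the derivative is w.r.t. $\alpha$, so by the chain rule

$$\frac{d\phi}{d\alpha} = \nabla f(x_{k+1})^T \frac{d x_{k+1}}{d\alpha}$$

If we further simplify, using $\frac{d x_{k+1}}{d\alpha} = -\nabla f(x_k)$:

$$\frac{d\phi}{d\alpha} = -\nabla f(x_{k+1})^T \nabla f(x_k)$$

Now we have,

$$\nabla f(x_{k+1})^T \nabla f(x_k) = 0$$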

From the above, the dot product of successive gradients is zero. Hence, by the definition of orthogonality, successive gradients are orthogonal to each other, and since each steepest descent direction is the negative of the gradient, so are the successive directions.
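The property can also be checked numerically. The following is a minimal sketch, assuming a quadratic objective $f(x) = \tfrac{1}{2} x^T A x - b^T x$ with a symmetric positive definite matrix `A`; the matrix, vector, starting point, and the helper names `grad`, `dot`, and `steepest_descent_gradients` are illustrative choices, not part of the argument above. For such a quadratic, the exact line search step has the closed form $\alpha = g^T g / (g^T A g)$, where $g$ is the current gradient.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def grad(A, b, x):
    """Gradient of f(x) = 0.5 * x^T A x - b^T x, which is A x - b."""
    return [dot(row, x) - bi for row, bi in zip(A, b)]


def steepest_descent_gradients(A, b, x0, steps):
    """Run steepest descent with exact line search; return the gradients visited."""
    x = list(x0)
    grads = [grad(A, b, x)]
    for _ in range(steps):
        g = grads[-1]
        Ag = [dot(row, g) for row in A]
        gAg = dot(g, Ag)
        if gAg == 0:  # gradient is zero: already at the minimizer
            break
        alpha = dot(g, g) / gAg  # exact minimizer of f(x - alpha * g) for a quadratic
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
        grads.append(grad(A, b, x))
    return grads


# Assumed test problem (symmetric positive definite A).
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, -1.0]
grads = steepest_descent_gradients(A, b, x0=[2.0, 2.0], steps=5)

# Dot product of each pair of successive gradients; each should be ~0.
dots = [dot(g0, g1) for g0, g1 in zip(grads, grads[1:])]
print(dots)
```

Each printed value should be zero up to floating-point rounding. With a fixed step size instead of the exact line search step, the dot products are generally nonzero, which is why the property is tied to exact line search.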

Note:

## Understand with one simple example

From example,

Starting point

Now, gradient of

(1)

According to Steepest Descent rule, new update point

i.e.,

So,

Note, new point

Function value at new point will also be a function of

Now, gradient of

(2)

We set

Now new point we have is

From 1 and 2 we have,

The dot product of two vectors:
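The same sequence of steps can be traced by hand on a small hypothetical problem. The function $f(x, y) = x^2 + 2y^2$ and the starting point $(1, 1)$ below are assumptions chosen for illustration only, not taken from the example above; $\alpha$ denotes the step size and $\phi(\alpha)$ the function value along the descent direction.

```latex
\begin{aligned}
&\text{Starting point: } x_0 = (1, 1), \qquad
  \nabla f(x, y) = (2x,\ 4y), \qquad
  g_0 = \nabla f(x_0) = (2,\ 4) \\[4pt]
&\text{New point: } x_1(\alpha) = x_0 - \alpha g_0 = (1 - 2\alpha,\ 1 - 4\alpha) \\[4pt]
&\phi(\alpha) = f(x_1(\alpha)) = (1 - 2\alpha)^2 + 2(1 - 4\alpha)^2 \\[4pt]
&\frac{d\phi}{d\alpha} = -4(1 - 2\alpha) - 16(1 - 4\alpha) = 72\alpha - 20 = 0
  \;\Longrightarrow\; \alpha = \tfrac{5}{18} \\[4pt]
&x_1 = \left(1 - \tfrac{10}{18},\ 1 - \tfrac{20}{18}\right) = \left(\tfrac{4}{9},\ -\tfrac{1}{9}\right), \qquad
  g_1 = \nabla f(x_1) = \left(\tfrac{8}{9},\ -\tfrac{4}{9}\right) \\[4pt]
&g_0 \cdot g_1 = 2 \cdot \tfrac{8}{9} + 4 \cdot \left(-\tfrac{4}{9}\right) = \tfrac{16}{9} - \tfrac{16}{9} = 0
\end{aligned}
```

The two successive gradients, and therefore the two successive descent directions $-g_0$ and $-g_1$, are orthogonal in this assumed case.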
