Recall that gradient descent converges slowly if the condition number is large. For example, consider the quadratic $f(x) = \frac{1}{2}(x_1^2 + M x_2^2)$ with $M \gg 1$: the gradient steps zig-zag across the narrow valley, and the number of iterations needed grows with the condition number $M$.
More generally, recall Newton's method introduced before for finding roots, where we use a Taylor series to estimate the objective function. When we apply it to the optimality condition $\nabla f(x) = 0$, we obtain the Newton iteration
$$x_{k+1} = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k).$$
Note that if the objective function is strictly convex, then the Hessian $\nabla^2 f(x_k)$ is positive definite, so the Newton step is well defined. Recall our requirement for a descent direction $d$: we hope $\nabla f(x)^\top d < 0$. The Newton direction $d = -\nabla^2 f(x)^{-1}\nabla f(x)$ satisfies this whenever $\nabla f(x) \neq 0$, since $\nabla f(x)^\top d = -\nabla f(x)^\top \nabla^2 f(x)^{-1} \nabla f(x) < 0$ for a positive definite Hessian.
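As a concrete sketch, a minimal implementation of this iteration in Python might look as follows (the helper `newton`, the test quadratic, and the tolerance are illustrative choices, not from the notes):

```python
import numpy as np

def newton(grad, hess, x0, tol=1e-10, max_iter=50):
    """Minimal Newton's method: repeatedly solve the Newton system
    H(x) d = -g(x) and take the full step x <- x + d."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(hess(x), -g)  # Newton direction
        x = x + d                         # full Newton step
    return x

# Example: minimize f(x) = x1^2 + 10*x2^2 (an ill-conditioned quadratic);
# on any quadratic, Newton's method reaches the minimizer in one step.
grad = lambda x: np.array([2 * x[0], 20 * x[1]])
hess = lambda x: np.diag([2.0, 20.0])
print(newton(grad, hess, [5.0, 3.0]))  # -> [0. 0.]
```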
Does Newton's method always work well? Intuitively, no: we use the second-order Taylor series to approximate the function, and this approximation is only accurate locally.
If the second-order Taylor series estimates the function well near the current iterate, then the Newton step makes rapid progress; if not, the step can overshoot or even move away from the minimizer. The following examples make this precise.
Consider the function $f(x) = \sqrt{1 + x^2}$, which is strictly convex with unique minimizer $x^* = 0$. Here $f'(x) = x(1+x^2)^{-1/2}$ and $f''(x) = (1+x^2)^{-3/2}$, so the Newton iteration is
$$x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)} = x_k - x_k(1 + x_k^2) = -x_k^3.$$
If we set $|x_0| < 1$, the iterates converge rapidly to $0$; if $|x_0| = 1$, they oscillate between $\pm 1$; and if $|x_0| > 1$, they diverge, with $|x_k| \to \infty$. The reason why Newton's method does not converge in this example is that the second derivative of $f$ tends to zero as $|x| \to \infty$: far from the minimizer the quadratic model is nearly flat, so the Newton step overshoots wildly.
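A few iterations in Python make the dependence on $x_0$ visible (a quick illustrative check; the two starting points are arbitrary):

```python
import numpy as np

f1 = lambda x: x / np.sqrt(1 + x**2)      # f'(x)
f2 = lambda x: (1 + x**2) ** (-1.5)       # f''(x)

for x0 in [0.9, 1.1]:
    x = x0
    for k in range(6):
        x = x - f1(x) / f2(x)             # simplifies to -x**3
    print(f"x0 = {x0}: x6 = {x:.3e}")
# x0 = 0.9 converges toward 0; x0 = 1.1 blows up (|x6| ~ 1e30).
```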
However, just keeping the second derivative bounded away from zero is not sufficient either. Consider the function $f(x) = \frac{2}{3}|x|^{3/2}$, with $f'(x) = \operatorname{sign}(x)\sqrt{|x|}$ and $f''(x) = \frac{1}{2}|x|^{-1/2}$ for $x \neq 0$. In this case, the Newton iteration gives
$$x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)} = x_k - 2x_k = -x_k,$$
so the iterates oscillate between $x_0$ and $-x_0$ forever, no matter how close $x_0$ is to the minimizer $x^* = 0$. In this example, the reason for the failure is that the second derivative changes too rapidly near the minimizer: $f''$ is not Lipschitz continuous (it blows up at $0$), so the quadratic model is unreliable even arbitrarily close to $x^*$.
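The oscillation is just as easy to observe numerically (again an illustrative check; the starting point is arbitrary):

```python
import numpy as np

f1 = lambda x: np.sign(x) * np.sqrt(abs(x))  # f'(x)
f2 = lambda x: 0.5 / np.sqrt(abs(x))         # f''(x)

x = 0.01  # start arbitrarily close to the minimizer x* = 0
for k in range(6):
    x = x - f1(x) / f2(x)                    # simplifies to -x
    print(x)
# prints -0.01, 0.01, -0.01, ... : the iterates never settle
```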
We now give conditions that guarantee the convergence of the Newton iterates.
Given a twice continuously differentiable function $f: \mathbb{R}^n \to \mathbb{R}$, suppose its Hessian is Lipschitz continuous with constant $L$, i.e., $\|\nabla^2 f(x) - \nabla^2 f(y)\| \le L\|x - y\|$ for all $x, y$. Suppose $x^*$ is a local minimizer with $\nabla^2 f(x^*) \succeq mI$ for some $m > 0$. Let $x_0$ satisfy $\|x_0 - x^*\| \le \frac{m}{2L}$. Then the Newton iterates remain in this ball and converge to $x^*$, with
$$\|x_{k+1} - x^*\| \le \frac{L}{m}\|x_k - x^*\|^2.$$
This is called quadratic convergence, or convergence of order $2$: roughly speaking, the number of correct digits doubles at every iteration.
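One can watch the digits double numerically. The sketch below tracks the ratio $|x_{k+1} - x^*| / |x_k - x^*|^2$ for the illustrative choice $f(x) = e^x - x$ (our example, with minimizer $x^* = 0$), where the ratio should approach $f'''(0)/(2f''(0)) = 1/2$:

```python
import numpy as np

# f(x) = exp(x) - x: minimizer x* = 0, f'(x) = exp(x) - 1, f''(x) = exp(x)
x = 1.0
for k in range(5):
    x_new = x - (np.exp(x) - 1.0) / np.exp(x)  # Newton step
    print(f"k={k}: error={abs(x_new):.3e}, ratio={abs(x_new) / x**2:.3f}")
    x = x_new
# the ratio error_{k+1} / error_k^2 stabilizes near 1/2
```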
To prove the bound, fix $k$ and write, using $\nabla f(x^*) = 0$,
$$x_{k+1} - x^* = \nabla^2 f(x_k)^{-1}\big[\nabla^2 f(x_k)(x_k - x^*) - (\nabla f(x_k) - \nabla f(x^*))\big].$$
By the fundamental theorem of calculus, $\nabla f(x_k) - \nabla f(x^*) = \int_0^1 \nabla^2 f(x^* + t(x_k - x^*))(x_k - x^*)\,dt$, so the Lipschitz continuity of the Hessian bounds the bracketed term by $\frac{L}{2}\|x_k - x^*\|^2$. Moreover, inside the ball we have $\nabla^2 f(x_k) \succeq (m - L\|x_k - x^*\|)I \succeq \frac{m}{2}I$, hence $\|\nabla^2 f(x_k)^{-1}\| \le \frac{2}{m}$; combining the two bounds gives the claim.
Recall the gradient descent iteration $x_{k+1} = x_k - \alpha_k \nabla f(x_k)$. The second-order Taylor series locally approximates $f$ around $x_k$:
$$f(x) \approx f(x_k) + \nabla f(x_k)^\top (x - x_k) + \frac{1}{2}(x - x_k)^\top \nabla^2 f(x_k)(x - x_k).$$
Note that minimizing this quadratic model over $x$ (when the Hessian is positive definite) yields exactly the Newton update $x_{k+1} = x_k - \nabla^2 f(x_k)^{-1}\nabla f(x_k)$, whereas gradient descent corresponds to replacing $\nabla^2 f(x_k)$ by $\frac{1}{\alpha_k}I$. Newton's method thus uses curvature information to rescale the step in each direction, which is exactly what gradient descent lacks on ill-conditioned problems.
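To tie this back to the conditioning discussion at the start of the section, the following sketch compares one gradient step and one Newton step on an ill-conditioned quadratic (the matrix and the step size $1/L$ are illustrative choices):

```python
import numpy as np

Q = np.diag([1.0, 100.0])                 # condition number 100
grad = lambda x: Q @ x                    # f(x) = 0.5 * x^T Q x, minimizer 0
x0 = np.array([1.0, 1.0])

x_gd = x0 - (1.0 / 100.0) * grad(x0)      # step size 1/L with L = 100
x_nt = x0 - np.linalg.solve(Q, grad(x0))  # Newton: rescales by the Hessian
print(x_gd)  # [0.99 0.  ] -- barely moves along the flat direction
print(x_nt)  # [0. 0.]     -- exact minimizer in one step on a quadratic
```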
To make statements such as the Lipschitz condition on the Hessian precise, we need a norm on matrices. A simple idea is to view an $m \times n$ matrix as a vector in $\mathbb{R}^{mn}$ and take the Euclidean norm of its entries. The Frobenius norm, sometimes also called the Euclidean norm, is the matrix norm of an $m \times n$ matrix $A$ defined as
$$\|A\|_F = \Big(\sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|^2\Big)^{1/2}.$$
However, a more natural way is to consider the following definition. We may view an $m \times n$ matrix $A$ as a linear map from $\mathbb{R}^n$ to $\mathbb{R}^m$ and measure how much it can stretch a vector. Given a norm $\|\cdot\|$ on $\mathbb{R}^n$ and on $\mathbb{R}^m$, the operator norm of $A$ is defined as
$$\|A\| = \sup_{x \neq 0} \frac{\|Ax\|}{\|x\|} = \sup_{\|x\| = 1} \|Ax\|.$$
Unless specified in context, we use the operator norm induced by the Euclidean norms on both spaces. If we use the Euclidean norms, the operator norm equals the largest singular value of $A$, i.e., $\|A\|_2 = \sigma_{\max}(A) = \sqrt{\lambda_{\max}(A^\top A)}$; this is called the spectral norm.
The spectrum of a matrix is the set of all its eigenvalues. This proposition shows why this norm is called the "spectral" norm.
We have $\|A\|_2 \le \|A\|_F \le \sqrt{\operatorname{rank}(A)}\,\|A\|_2$, since $\|A\|_F^2 = \sum_i \sigma_i(A)^2$; in particular, for a symmetric matrix $A$, $\|A\|_2 = \max_i |\lambda_i(A)|$.
The advantage of using the operator norm is that we often need a Cauchy–Schwarz-type inequality, which is trivially true (by definition) under the operator norm: for all $x$,
$$\|Ax\| \le \|A\|\,\|x\|,$$
and consequently $\|AB\| \le \|A\|\,\|B\|$ for compatible matrices.
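A quick numerical sanity check of these definitions, using NumPy's built-in matrix norms (the random test matrix is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
x = rng.standard_normal(4)

fro  = np.linalg.norm(A, 'fro')  # Frobenius norm: sqrt of sum of squares
spec = np.linalg.norm(A, 2)      # spectral norm: largest singular value

assert np.isclose(spec, np.linalg.svd(A, compute_uv=False)[0])
assert spec <= fro + 1e-12                                  # ||A||_2 <= ||A||_F
assert np.linalg.norm(A @ x) <= spec * np.linalg.norm(x) + 1e-12
print(fro, spec)
```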
Unfortunately, Newton’s method does not guarantee descent of the function values even when the Hessian matrix is positive definite: the full Newton step may overshoot and increase $f$. Similar to gradient descent, we introduce a step size $\alpha_k > 0$ and use the damped Newton iteration
$$x_{k+1} = x_k - \alpha_k \nabla^2 f(x_k)^{-1} \nabla f(x_k),$$
where $\alpha_k$ is chosen, for example, by backtracking line search. Since the Newton direction is a descent direction whenever $\nabla^2 f(x_k) \succ 0$ and $\nabla f(x_k) \neq 0$, a sufficiently small step size always decreases the function value.
The convergence of the damped Newton’s method has two phases: a damped Newton phase and a quadratically convergent phase. We can show that there exist constants $\eta, \gamma > 0$ such that if $\|\nabla f(x_k)\| \ge \eta$, each step decreases the function value by at least $\gamma$; and once $\|\nabla f(x_k)\| < \eta$, the line search selects the full step $\alpha_k = 1$ and the iterates converge quadratically.
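Here is a minimal sketch of the damped Newton iteration with backtracking line search; the Armijo parameters $\alpha = 0.25$, $\beta = 0.5$ are conventional illustrative choices, and we reuse $f(x) = \sqrt{1 + x^2}$, for which the undamped method diverges from $x_0 = 2$:

```python
import numpy as np

def damped_newton(f, grad, hess, x0, alpha=0.25, beta=0.5, tol=1e-10):
    """Newton direction plus backtracking line search (Armijo condition)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(100):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(hess(x), -g)          # Newton direction
        t = 1.0                                   # try the full step first
        while f(x + t * d) > f(x) + alpha * t * (g @ d):
            t *= beta                             # backtrack until sufficient decrease
        x = x + t * d
    return x

f    = lambda x: float(np.sqrt(1 + x[0]**2))
grad = lambda x: np.array([x[0] / np.sqrt(1 + x[0]**2)])
hess = lambda x: np.array([[(1 + x[0]**2) ** (-1.5)]])
print(damped_newton(f, grad, hess, [2.0]))        # -> approximately [0.]
```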
Another way to control the Newton step is to regularize the Hessian, e.g., replacing $\nabla^2 f(x_k)$ by $\nabla^2 f(x_k) + \lambda_k I$ for some $\lambda_k > 0$: this interpolates between the Newton direction (as $\lambda_k \to 0$) and the gradient direction (as $\lambda_k \to \infty$), and keeps the step well defined even when the Hessian is singular or indefinite.
Moreover, Newton’s method is affinely invariant. Suppose $A \in \mathbb{R}^{n \times n}$ is nonsingular and define $g(y) = f(Ay)$. If $y_0 = A^{-1}x_0$, then the Newton iterates for $g$ satisfy $y_k = A^{-1}x_k$ for all $k$: a linear change of coordinates does not affect the method, in sharp contrast to gradient descent, whose behavior depends on the conditioning of the problem.
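The invariance is easy to verify numerically: below we run Newton's method on an illustrative convex $f$ and on $g(y) = f(Ay)$ from matched starting points, using the chain rule $\nabla g(y) = A^\top \nabla f(Ay)$ and $\nabla^2 g(y) = A^\top \nabla^2 f(Ay) A$, and check that $y_k = A^{-1}x_k$ (the test function and matrix are our own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 2)) + 3 * np.eye(2)  # a (generically) nonsingular matrix

def grad_f(x):
    # f(x) = exp(x1 + x2) + x1^2 + x2^2: smooth and strictly convex
    e = np.exp(x[0] + x[1])
    return np.array([e + 2 * x[0], e + 2 * x[1]])

def hess_f(x):
    e = np.exp(x[0] + x[1])
    return np.array([[e + 2.0, e], [e, e + 2.0]])

def newton_step(g, H, x):
    return x - np.linalg.solve(H, g)

x = np.array([1.0, -2.0])
y = np.linalg.solve(A, x)                 # matched start: y0 = A^{-1} x0
for _ in range(5):
    x = newton_step(grad_f(x), hess_f(x), x)
    Ay = A @ y
    y = newton_step(A.T @ grad_f(Ay), A.T @ hess_f(Ay) @ A, y)
print(np.allclose(A @ y, x))              # True: the iterates coincide
```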
A convex function $f: \mathbb{R} \to \mathbb{R}$ is called self-concordant if
$$|f'''(x)| \le 2 f''(x)^{3/2} \quad \text{for all } x \in \operatorname{dom} f.$$
For example, $f(x) = -\log x$ is self-concordant on $(0, \infty)$, with equality in the definition. More generally, a multivariate convex function $f: \mathbb{R}^n \to \mathbb{R}$ is self-concordant if it is self-concordant along every line, i.e., $g(t) = f(x + tv)$ is a self-concordant function of $t$ for every $x \in \operatorname{dom} f$ and every direction $v \in \mathbb{R}^n$.
Self-concordant functions include many of the logarithmic barrier functions that play an important role in barrier and interior-point methods for solving convex optimization problems.
In fact, the coefficient $2$ in the definition is chosen merely for convenience: if $f$ satisfies $|f'''(x)| \le k f''(x)^{3/2}$ for some constant $k > 0$, then the rescaled function $\tilde{f} = \frac{k^2}{4} f$ is self-concordant.
For a strictly convex self-concordant function, we obtain bounds in terms of the Newton decrement
$$\lambda(x) = \big(\nabla f(x)^\top \nabla^2 f(x)^{-1} \nabla f(x)\big)^{1/2}.$$
For example, whenever $\lambda(x) \le 0.68$ we have the suboptimality bound $f(x) - \inf_y f(y) \le \lambda(x)^2$, so $\lambda(x)^2$ serves as a computable stopping criterion for Newton's method.
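As an illustration, the sketch below evaluates the decrement for the self-concordant function $f(x) = \sum_i x_i - \sum_i \log x_i$ (our example, with minimizer $x^* = \mathbf{1}$ and optimal value $p^* = n$) and checks the quoted bound:

```python
import numpy as np

# f(x) = sum(x) - sum(log x) on x > 0: self-concordant, x* = 1, p* = n
n = 3
x = np.array([1.3, 0.8, 1.2])    # an arbitrary point near the minimizer

g = 1.0 - 1.0 / x                # gradient
H = np.diag(1.0 / x**2)          # Hessian
lam = np.sqrt(g @ np.linalg.solve(H, g))   # Newton decrement lambda(x)

f_gap = (x.sum() - np.log(x).sum()) - n    # f(x) - p*
print(lam, f_gap)
assert lam <= 0.68 and f_gap <= lam**2     # the quoted suboptimality bound
```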