So far we have only considered minimizing differentiable functions. If the objective function is not differentiable, clearly neither gradient descent nor Newton's method works. In this lecture, we focus on how to solve optimization problems over a much broader family of convex functions. We will generalize gradient descent to a method called proximal gradient descent, which handles nondifferentiable objectives.
Recall the gradient descent iteration
\[
x_{k+1} = x_k - \eta \nabla f(x_k),
\]
where $\eta > 0$ is the step size. Note that this update is exactly the minimizer of a quadratic model of $f$ around $x_k$:
\[
x_{k+1} = \arg\min_{x} \Big\{ f(x_k) + \langle \nabla f(x_k), x - x_k \rangle + \frac{1}{2\eta} \|x - x_k\|^2 \Big\}.
\]
Now we define the proximal operator as follows.
Given a convex function $h$ and a parameter $\eta > 0$, the proximal operator is
\[
\mathrm{prox}_{\eta h}(x) := \arg\min_{u} \Big\{ h(u) + \frac{1}{2\eta} \|u - x\|^2 \Big\}.
\]
Then we can rewrite the proximal gradient descent method as
\[
x_{k+1} = \mathrm{prox}_{\eta h}\big(x_k - \eta \nabla g(x_k)\big),
\]
where the objective is decomposed as $f = g + h$ with $g$ differentiable.
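To make the iteration concrete, here is a minimal Python sketch of proximal gradient descent, assuming the user supplies the gradient of the smooth part $g$ and the proximal operator of $h$; the names grad_g, prox_h, and the fixed iteration count are illustrative choices.

```python
import numpy as np

def proximal_gradient_descent(grad_g, prox_h, x0, eta, num_iters=1000):
    """Minimize f(x) = g(x) + h(x), where g is smooth and h has a cheap prox.

    grad_g : callable, returns the gradient of g at x
    prox_h : callable, (x, eta) -> prox_{eta h}(x)
    x0     : starting point (NumPy array)
    eta    : step size
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        # Gradient step on the smooth part g, then proximal step on h.
        x = prox_h(x - eta * grad_g(x), eta)
    return x
```

Note that when $h = 0$ the proximal operator is the identity, so the iteration reduces to plain gradient descent.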
Another viewpoint of the proximal operator is as a discretization of the gradient flow. Recall the gradient flow
\[
\frac{\mathrm{d}x(t)}{\mathrm{d}t} = -\nabla f(x(t)).
\]
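As a sketch of this connection, assuming $f$ is convex and differentiable, the backward (implicit) Euler discretization of the flow with step size $\eta$ is exactly a proximal step:
\[
\frac{x_{k+1} - x_k}{\eta} = -\nabla f(x_{k+1})
\quad\Longleftrightarrow\quad
x_{k+1} = \arg\min_{u} \Big\{ f(u) + \frac{1}{2\eta} \|u - x_k\|^2 \Big\} = \mathrm{prox}_{\eta f}(x_k).
\]
In contrast, the forward (explicit) Euler discretization recovers the ordinary gradient descent iteration.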
The key is that we need to decompose the objective as $f = g + h$ such that $g$ is convex and differentiable and the proximal operator of $h$ can be computed efficiently.
Fortunately, for some important problems we have a "good" decomposition. For example, we consider the problem of linear regression while avoiding overfitting. Suppose we have a data set
Let
The objective function
Given
Finally, recall that
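As a concrete illustration of this example, here is a short Python sketch assuming the $\ell_1$-regularized least-squares (lasso) objective $\frac{1}{2}\|Ax - b\|_2^2 + \lambda \|x\|_1$, for which the proximal operator of $\lambda\|\cdot\|_1$ is entrywise soft thresholding; the names A, b, and lam are placeholders for this sketch.

```python
import numpy as np

def soft_threshold(x, tau):
    # Prox of tau * ||.||_1: shrink every coordinate toward zero by tau.
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def lasso_proximal_gradient(A, b, lam, eta, num_iters=500):
    # Proximal gradient descent on (1/2)||Ax - b||^2 + lam * ||x||_1.
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)                        # gradient of the smooth part
        x = soft_threshold(x - eta * grad, eta * lam)   # prox step on the l1 part
    return x
```

A step size $\eta \le 1 / \|A\|_2^2$, the reciprocal of the Lipschitz constant of the gradient of the smooth part, keeps the iteration stable.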
We now show the correctness and convergence of proximal gradient descent.
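For orientation, the correctness claim can be summarized by the following sketch, assuming $g$ is convex and differentiable, $h$ is convex, and $\partial h$ denotes the subdifferential of $h$:
\[
x^\star = \mathrm{prox}_{\eta h}\big(x^\star - \eta \nabla g(x^\star)\big)
\;\Longleftrightarrow\;
0 \in \nabla g(x^\star) + \partial h(x^\star)
\;\Longleftrightarrow\;
x^\star \text{ minimizes } g + h,
\]
where the first equivalence is the optimality condition of the minimization defining the proximal operator, and the second follows from the convexity of $g + h$.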
Assume that
In the proximal gradient descent iterations,
The "only if" direction is easy, since we have
Now we would like to show that if
A reasonable method is to use convexity. Note that
Let
If
If
This lemma immediately implies that if
By the definition of
For the sake of contradiction, assume that there exists
Let
Finally, we present the convergence rate results.
Suppose
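One standard form of this guarantee, assuming $g$ is convex with an $L$-Lipschitz gradient, $h$ is convex, and the step size is fixed at $\eta = 1/L$, is the sublinear rate
\[
f(x_k) - f(x^\star) \le \frac{L \|x_0 - x^\star\|^2}{2k},
\]
where $x^\star$ is any minimizer of $f = g + h$. In particular, reaching an $\varepsilon$-accurate objective value requires $O(1/\varepsilon)$ iterations.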