Lecture 2. Optimality Condition
...


2.1 Existence of the optimal solution
...

Given an optimization problem $\min_{x \in X} f(x)$, the optimal solution is usually denoted by $x^* \in \operatorname{argmin}_{x \in X} f(x)$. The first question is: for which optimization problems does an optimal solution exist?
In general, this question is hard to answer. We only have the following conclusion for some special objective functions and feasible sets.

Theorem (Weierstrass extreme value theorem)

Given a compact set $X \subseteq \mathbb{R}^n$, if the function $f$ is continuous on $X$, then $f$ is bounded on $X$ and attains its extreme values (both min and max) on $X$.

We now review some definitions in analysis.

Definition (Open ball)

For a norm $\|\cdot\|$ on $\mathbb{R}^n$ and $r > 0$, an $n$-dimensional open ball of radius $r$ is the collection of points whose distance to the center is less than $r$. Explicitly, the open ball with center $x_0 \in \mathbb{R}^n$ and radius $r$ is defined by $B(x_0, r) = \{x \in \mathbb{R}^n : \|x - x_0\| < r\}$.
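For intuition, here is a minimal numerical sketch (not from the lecture; the helper `in_open_ball` is hypothetical) checking membership in open balls under different $\ell_p$ norms:

```python
# Minimal sketch: membership in the open ball B(x0, r) under different
# l_p norms, via numpy.linalg.norm. The helper name is ours, not the lecture's.
import numpy as np

def in_open_ball(x, x0, r, ord=2):
    """True iff x lies in the open ball of radius r around x0 w.r.t. the l_ord norm."""
    return np.linalg.norm(np.asarray(x) - np.asarray(x0), ord=ord) < r

x0 = np.zeros(2)
print(in_open_ball([0.6, 0.6], x0, 1.0, ord=2))  # True:  l2 distance ~ 0.85 < 1
print(in_open_ball([0.6, 0.6], x0, 1.0, ord=1))  # False: l1 distance = 1.2 >= 1
```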

Example

The following figure (omitted here) shows the open balls of two different norms; the shape of the ball depends on the chosen norm.

We can define open sets and closed sets.

Definition
  • (open set) A set $X \subseteq \mathbb{R}^n$ is open if for every $x \in X$, there exists $r > 0$ such that $B(x, r) \subseteq X$.
  • (closed set) A set $X \subseteq \mathbb{R}^n$ is closed if its complement $\mathbb{R}^n \setminus X$ is open.

For closed sets, there is a different but equivalent characterization.

Theorem

A set $X \subseteq \mathbb{R}^n$ is closed iff for every convergent sequence $\{x_k\}_{k \ge 1}$, where $x_k \in X$ for all $k$ and $\lim_{k \to \infty} x_k = x$, it holds that $x \in X$.

Example
  1. For $X = (0, 1)$: since for every $x \in X$ we have $r = \min\{x, 1 - x\} > 0$, there exists an open ball $B(x, r)$ where $B(x, r) \subseteq X$; hence, $X$ is an open set.
  2. For $X = (0, 1)$: since $x_k = \frac{1}{k+1} \in X$ but $\lim_{k \to \infty} x_k = 0 \notin X$, $X$ is not a closed set.

Then we define compact sets.

Definition (Compact sets)

A set $X \subseteq \mathbb{R}^n$ is compact if every open cover of it has a finite subcover.

In $\mathbb{R}^n$, there is another equivalent definition.

Theorem (Heine–Borel Theorem)

A set $X \subseteq \mathbb{R}^n$ is compact iff it is bounded and closed.

For optimization problems whose feasible sets are not compact, we usually do not have simple ways to determine whether optimal solutions exist. However, for a continuous function $f$ and a closed set $X \subseteq \mathbb{R}^n$, if $f(x) \to +\infty$ whenever $\|x\| \to \infty$, then for any $x_0 \in X$ the sublevel set $\{x \in X : f(x) \le f(x_0)\}$ is a compact set, and thus $f$ has a minimum value on $X$.

2.2 Global minimum and local minimum
...

Just like the P vs. NP problem, verifying a solution is believed to be easier than finding one. So we first study how to justify that a solution is indeed an optimal one.

We first define global minima and local minima.

Definition

Given a function $f : X \to \mathbb{R}$, where $X \subseteq \mathbb{R}^n$. A point $x^* \in X$ is said to be a

  • local minimum point, if there exists $\varepsilon > 0$ such that $f(x^*) \le f(x)$ for all $x \in X \cap B(x^*, \varepsilon)$;
  • global minimum point, if $f(x^*) \le f(x)$ for all $x \in X$.

The value $f(x^*)$ is called the local / global minimum value of $f$, respectively.

Similarly, we can also define strict global minima and strict local minima.

Unfortunately, verifying global minimality is too hard in general. This also provides evidence for why general optimization problems are difficult to solve. In this course we will study a special type of optimization problem in which local minima are also global minima.

We now give some criteria that can be used to prove local minima.

2.3 First-order optimality condition
...

Suppose $f : \mathbb{R} \to \mathbb{R}$ is continuous and differentiable. We know that $x$ is an extreme point only if $f'(x) = 0$. Can we have similar results in higher dimensions?

The generalization of the derivative to higher dimensions is the directional derivative.

Definition (Directional derivative)

Given $f : \mathbb{R}^n \to \mathbb{R}$, $x \in \mathbb{R}^n$, and a direction $d \in \mathbb{R}^n$, the directional derivative of $f$ at $x$ with respect to $d$ is defined by $$\partial_d f(x) = \lim_{t \to 0^+} \frac{f(x + t d) - f(x)}{t},$$ if the limit exists.
In particular, if $d = e_i$ (the $i$-th standard basis vector), the directional derivative is called the partial derivative $\frac{\partial f}{\partial x_i}(x)$.
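As a quick numerical illustration (a sketch with an assumed example function, not from the lecture), the directional derivative can be approximated by its defining difference quotient:

```python
# Sketch: one-sided finite-difference estimate of the directional derivative
# of f(x) = x1^2 + 3*x1*x2 at x = (1, 2) along a direction d.
import numpy as np

def directional_derivative(f, x, d, t=1e-6):
    # Approximates lim_{t -> 0+} (f(x + t d) - f(x)) / t.
    return (f(x + t * d) - f(x)) / t

f = lambda x: x[0]**2 + 3 * x[0] * x[1]
x = np.array([1.0, 2.0])
d = np.array([1.0, 0.0])                 # d = e_1 gives the partial derivative
print(directional_derivative(f, x, d))   # ~ 8.0, since df/dx1 = 2*x1 + 3*x2 = 8
```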

Given $f : \mathbb{R} \to \mathbb{R}$, we can use $f(x) \approx f(x_0) + f'(x_0)(x - x_0)$ to do a linear approximation of $f$ at $x_0$, where $x \mapsto f'(x_0)\, x$ can be seen as a linear mapping. It is natural to define the differential of a function $f : \mathbb{R}^n \to \mathbb{R}^m$ at $x_0$ by a linear mapping $A$ if $f(x) = f(x_0) + A(x - x_0) + o(\|x - x_0\|)$.

Definition (Differential)

Given $f : \mathbb{R}^n \to \mathbb{R}^m$, if there exists a matrix $A \in \mathbb{R}^{m \times n}$ (i.e., a linear mapping), such that $$\lim_{x \to x_0} \frac{\|f(x) - f(x_0) - A(x - x_0)\|}{\|x - x_0\|} = 0,$$ then we call $f$ differentiable at $x_0$, and $A$ is the differential of $f$ at $x_0$ (it is also known as the Jacobian matrix, denoted $J_f(x_0)$).
In particular, if $m = 1$, the vector $\nabla f(x_0) = A^\top \in \mathbb{R}^n$ is called the gradient of $f$ at $x_0$.
If $m > 1$, suppose $f = (f_1, \dots, f_m)^\top$. Then the Jacobian matrix is given by $$J_f(x) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(x) & \cdots & \frac{\partial f_1}{\partial x_n}(x) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(x) & \cdots & \frac{\partial f_m}{\partial x_n}(x) \end{pmatrix}.$$

Tip

If $f$ is differentiable at $x$, then the directional derivatives of $f$ at $x$ form a linear mapping with respect to the direction $d$. Thus it gives that $\partial_d f(x) = J_f(x)\, d$ immediately (for $m = 1$, $\partial_d f(x) = \nabla f(x)^\top d$).

Remark

The existence of directional derivatives does not imply the existence of the differential.
Consider the following function: $$f(x, y) = \begin{cases} \dfrac{x^2 y}{x^4 + y^2}, & (x, y) \ne (0, 0), \\[4pt] 0, & (x, y) = (0, 0). \end{cases}$$ Then $f$ has a directional derivative at $(0, 0)$ in every direction, but $f$ is not differentiable at $(0, 0)$. (Actually, $f$ is not even continuous at $(0, 0)$: along the parabola $y = x^2$, $f(x, x^2) = \frac{1}{2}$.)
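A quick numerical check of this counterexample (a sketch, plain Python):

```python
# Sketch: f stays at 1/2 along the parabola y = x^2, although f(0, 0) = 0,
# so f cannot be continuous at the origin.
def f(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x**2 * y / (x**4 + y**2)

for x in [1e-1, 1e-3, 1e-6]:
    print(f(x, x**2))   # prints 0.5 every time
```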

Now we give some examples and calculation rules of differentials.

Example
  • $f(x) = a^\top x + b$, where $a \in \mathbb{R}^n$ and $b \in \mathbb{R}$. Then $\nabla f(x) = a$.
  • $f(x) = x^\top A x$, where $A \in \mathbb{R}^{n \times n}$. Then $\nabla f(x) = (A + A^\top)\, x$, and $\nabla f(x) = 2 A x$ if $A$ is symmetric.
  • $f(x) = \|A x - b\|_2^2$, where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. Then $\nabla f(x) = 2 A^\top (A x - b)$.

Here is a simple proof of the last example: $f(x + \delta) = \|A x - b + A \delta\|_2^2 = \|A x - b\|_2^2 + 2 (A x - b)^\top A \delta + \|A \delta\|_2^2$, so $f(x + \delta) - f(x) = 2 (A x - b)^\top A \delta + o(\|\delta\|)$, which yields that $\nabla f(x) = 2 A^\top (A x - b)$.
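Such formulas are easy to sanity-check with finite differences; a minimal sketch, assuming the least-squares example above and randomly generated data:

```python
# Sketch: check grad f(x) = 2 A^T (A x - b) for f(x) = ||A x - b||^2
# against a central finite-difference approximation.
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.standard_normal((5, 3)), rng.standard_normal(5)
x = rng.standard_normal(3)

f = lambda x: np.sum((A @ x - b)**2)
grad_analytic = 2 * A.T @ (A @ x - b)

eps = 1e-6
grad_numeric = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)   # central difference
    for e in np.eye(3)
])
print(np.allclose(grad_analytic, grad_numeric, atol=1e-4))   # True
```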

Proposition
  • Multiplication: Given two functions $f, g : \mathbb{R}^n \to \mathbb{R}$, let $h(x) = f(x)\, g(x)$. Then $\nabla h(x) = g(x)\, \nabla f(x) + f(x)\, \nabla g(x)$.
  • Chain rule: Given $f : \mathbb{R}^n \to \mathbb{R}^m$ differentiable at $x$ and $g : \mathbb{R}^m \to \mathbb{R}^k$ differentiable at $f(x)$, let $h = g \circ f$ (i.e., $h(x) = g(f(x))$). Then $J_h(x) = J_g(f(x))\, J_f(x)$.
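Here is a small numerical sketch (the functions $f$ and $g$ below are assumed examples) checking the chain rule $J_h(x) = J_g(f(x))\, J_f(x)$ against finite differences:

```python
# Sketch: verify the chain rule numerically for f: R^2 -> R^2, g: R^2 -> R.
import numpy as np

f  = lambda x: np.array([x[0] * x[1], np.sin(x[0])])
Jf = lambda x: np.array([[x[1], x[0]], [np.cos(x[0]), 0.0]])   # Jacobian of f
g  = lambda y: y[0]**2 + y[1]
Jg = lambda y: np.array([[2 * y[0], 1.0]])                     # Jacobian of g

x = np.array([0.7, -1.3])
chain = Jg(f(x)) @ Jf(x)          # J_g(f(x)) J_f(x), a 1x2 matrix

h = lambda x: g(f(x))
eps = 1e-6
numeric = np.array([[(h(x + eps * e) - h(x - eps * e)) / (2 * eps)
                     for e in np.eye(2)]])
print(np.allclose(chain, numeric, atol=1e-5))   # True
```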

We are ready to give the first-order optimality condition.

Theorem (First-order necessary condition)

Suppose $f : X \to \mathbb{R}$ is a function differentiable at some $x^* \in X$ and continuous in a neighborhood of $x^*$. If $x^*$ is a local minimum point, then for any feasible direction $d$ at $x^*$ (i.e., $d$ such that $x^* + t d \in X$ for any $t \in [0, \bar t\,]$, for some $\bar t > 0$), $$\nabla f(x^*)^\top d \ge 0.$$

An important idea is to restrict a multivariate function to a line.

Proof

Fix a feasible direction $d$. Define $g : [0, \bar t\,] \to \mathbb{R}$ by $g(t) = f(x^* + t d)$. Then $g'(0) = \nabla f(x^*)^\top d$. Since $x^*$ is a local minimum point, it holds that $g(t) \ge g(0)$ for any sufficiently small $t > 0$. Therefore, $g'(0) = \lim_{t \to 0^+} \frac{g(t) - g(0)}{t} \ge 0$, which gives that $\nabla f(x^*)^\top d \ge 0$.

Corollary

Suppose $x^*$ is further an interior point of $X$ (i.e., there exists $r > 0$ such that $B(x^*, r) \subseteq X$). Then $\nabla f(x^*) = 0$.

Proof

At an interior point, every direction is feasible. Let $d = -\nabla f(x^*)$. Then $\nabla f(x^*)^\top d = -\|\nabla f(x^*)\|^2 \ge 0$. It implies that $\nabla f(x^*) = 0$.

In particular, if $X$ is an open set, any point $x \in X$ is an interior point. So every local minimum $x^*$ satisfies $\nabla f(x^*) = 0$.
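For example, on the open set $X = \mathbb{R}^2$ one can search for a point satisfying this first-order condition by plain gradient descent; a minimal sketch with an assumed quadratic objective and step size:

```python
# Sketch: gradient descent on f(x) = (x1 - 1)^2 + (x2 + 2)^2 over R^2.
# The iterates approach the stationary point (1, -2), where grad f = 0.
import numpy as np

grad = lambda x: np.array([2 * (x[0] - 1), 2 * (x[1] + 2)])

x = np.zeros(2)
for _ in range(200):          # fixed step size 0.1, enough for this quadratic
    x = x - 0.1 * grad(x)
print(x, np.linalg.norm(grad(x)))   # ~ [1. -2.], gradient norm ~ 0
```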

2.4 Second-order optimality condition
...

Unfortunately, the first-order condition is only a necessary condition. If $\nabla f(x^*) = 0$, we still do not know whether $x^*$ is a local minimum. A simple example is the function $f(x) = x^3$ and $x^* = 0$. For multivariate functions, there is another case called the saddle point.

Example (Saddle point)

Consider the function $f(x, y) = x^2 - y^2$. Clearly $\nabla f(0, 0) = (0, 0)^\top$. But $(0, 0)$ is a saddle point, neither a minimum nor a maximum: $f$ has a minimum at $0$ along the $x$-axis but a maximum at $0$ along the $y$-axis.

(Figure: the saddle surface.)

We can compute higher-order derivatives to rule out saddle points.

For a multivariate function $f : \mathbb{R}^n \to \mathbb{R}$, the gradient $\nabla f$ is a mapping $\mathbb{R}^n \to \mathbb{R}^n$. We can further compute the Jacobian matrix of $\nabla f$. The transpose of this Jacobian is called the Hessian matrix of $f$, and denoted by $\nabla^2 f(x)$ or $H_f(x)$. So $\nabla^2 f(x) = J_{\nabla f}(x)^\top$, i.e., $[\nabla^2 f(x)]_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}(x)$.

Theorem (Schwarz’s theorem, or Clairaut's theorem)

Given a function $f : \mathbb{R}^n \to \mathbb{R}$ and a point $x \in \mathbb{R}^n$, if $f$ has continuous second partial derivatives $\frac{\partial^2 f}{\partial x_i \partial x_j}$ for all $i, j$ in a neighborhood of $x$, then $\frac{\partial^2 f}{\partial x_i \partial x_j}(x) = \frac{\partial^2 f}{\partial x_j \partial x_i}(x)$ for all $i, j$, which yields that $\nabla^2 f(x)$ is a symmetric matrix.

We are ready to establish the second-order condition. Consider a function $f : \mathbb{R} \to \mathbb{R}$. Intuitively, if $x^*$ is a local minimum, then we have $f'(x^*) = 0$, and $f(x^* + t) \ge f(x^*)$ for sufficiently small $|t|$. Since $f(x^* + t) = f(x^*) + \frac{1}{2} f''(x^*)\, t^2 + o(t^2)$, it follows that $f''(x^*) \ge 0$.

Now let $f$ be a multivariate function $\mathbb{R}^n \to \mathbb{R}$. Fix a direction $d \in \mathbb{R}^n$ and consider the restriction of $f$ along $d$. Let $g(t) = f(x^* + t d)$. Using the chain rule, we have $$g'(t) = \nabla f(x^* + t d)^\top d, \qquad g''(t) = d^\top \nabla^2 f(x^* + t d)\, d.$$ In particular, we need $g''(0) = d^\top \nabla^2 f(x^*)\, d \ge 0$.

Another idea is to consider the second-order Taylor series: $$f(x^* + d) = f(x^*) + \nabla f(x^*)^\top d + \frac{1}{2}\, d^\top \nabla^2 f(x^*)\, d + o(\|d\|^2).$$ Hence we can reasonably guess that $d^\top \nabla^2 f(x^*)\, d \ge 0$, since $\nabla f(x^*) = 0$ and $f(x^* + d) \ge f(x^*)$ for all sufficiently small $\|d\|$.

Theorem (Second-order necessary condition)

Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is a twice continuously differentiable function, and $x^*$ is a local minimum. Then $\nabla f(x^*) = 0$, and $d^\top \nabla^2 f(x^*)\, d \ge 0$ for all $d \in \mathbb{R}^n$.
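For the saddle example above, this condition detects the problem; a minimal numpy sketch (the Hessian of $f(x, y) = x^2 - y^2$ is computed by hand):

```python
# Sketch: the Hessian of f(x, y) = x^2 - y^2 is constant, diag(2, -2).
# Its eigenvalues have mixed signs, so d^T H d < 0 for some d: the
# second-order necessary condition rules out (0, 0) as a local minimum.
import numpy as np

H = np.array([[2.0, 0.0],
              [0.0, -2.0]])
print(np.linalg.eigvalsh(H))   # [-2.  2.]
d = np.array([0.0, 1.0])
print(d @ H @ d)               # -2.0 < 0
```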

Definite matrix
...

In order to determine whether the Hessian of a function satisfies the above condition, we introduce the definition of definite matrices.

Definition (Definite matrix)

Let $A \in \mathbb{R}^{n \times n}$ be a symmetric matrix. Then $A$ is

  • positive definite (denoted by $A \succ 0$, or $A > 0$), if $x^\top A x > 0$ for all $x \ne 0$;
  • positive semidefinite (denoted by $A \succeq 0$, or $A \ge 0$), if $x^\top A x \ge 0$ for all $x$;
  • negative definite (denoted by $A \prec 0$, or $A < 0$), if $x^\top A x < 0$ for all $x \ne 0$;
  • negative semidefinite (denoted by $A \preceq 0$, or $A \le 0$), if $x^\top A x \le 0$ for all $x$;
  • indefinite, if there exist $x, y$ such that $x^\top A x > 0$ and $y^\top A y < 0$.

Proposition

Suppose $A$ is a real symmetric matrix. Then

  • $A \succeq 0$ iff all of its eigenvalues are non-negative,
  • $A \succ 0$ iff all of its eigenvalues are positive.

To prove this proposition, we first introduce the eigendecomposition, which is a simplified case of SVD (singular value decomposition).

Definition (Eigendecomposition)

Let $A \in \mathbb{R}^{n \times n}$ be a real symmetric matrix with eigenvalues $\lambda_1, \dots, \lambda_n$. Then $A$ can be decomposed as $A = Q \Lambda Q^\top$, where $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ is a diagonal matrix of eigenvalues, and $Q = (q_1, \dots, q_n)$ consists of orthonormal eigenvectors, namely $q_i$ is an orthonormal eigenvector of $A$ corresponding to $\lambda_i$ (i.e., $A q_i = \lambda_i q_i$, $\|q_i\| = 1$, and $q_i^\top q_j = 0$ for $i \ne j$, which implies that $Q^\top Q = I$).

For any eigenvector $q_i$, we have $A q_i = \lambda_i q_i$. So $q_i^\top A q_i = \lambda_i q_i^\top q_i = \lambda_i$. Thus if $A \succeq 0$, then $\lambda_i \ge 0$ for every $i$.

Proof of the proposition

We use the eigendecomposition of $A$. Since $A = Q \Lambda Q^\top$, we have $$x^\top A x = x^\top Q \Lambda Q^\top x = y^\top \Lambda y = \sum_{i=1}^n \lambda_i y_i^2, \quad \text{where } y = Q^\top x.$$ Note that $y = Q^\top x$ ranges over all of $\mathbb{R}^n$ as $x$ does, since $Q$ is invertible. So $x^\top A x \ge 0$ for all $x$ iff $\sum_{i} \lambda_i y_i^2 \ge 0$ for all $y$. Clearly the latter holds for all $y$ iff $\lambda_i \ge 0$ for all $i$ (just by letting $y = e_i$, i.e., $x = q_i$). The positive definite case is analogous.

Example

Consider the following matrix: $$A = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}.$$ Since $x^\top A x = 2x_1^2 - 2x_1 x_2 + 2x_2^2 = x_1^2 + x_2^2 + (x_1 - x_2)^2$, and this is $> 0$ if $x \ne 0$, $A$ is positive definite.
In addition, each eigenvalue $\lambda$ of $A$ satisfies $\det(A - \lambda I) = (2 - \lambda)^2 - 1 = 0$. By solving this equation, we obtain $\lambda_1 = 1$ and $\lambda_2 = 3$. Since all of the two eigenvalues are positive, $A$ is positive definite.
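The same check can be done numerically; a minimal numpy sketch for the matrix above:

```python
# Sketch: numerical positive-definiteness check via symmetric eigenvalues.
import numpy as np

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
eig = np.linalg.eigvalsh(A)   # eigenvalues of a symmetric matrix, ascending
print(eig)                    # [1. 3.]
print(bool((eig > 0).all()))  # True -> A is positive definite
```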

Sylvester’s criterion
...

Given a matrix $A \in \mathbb{R}^{n \times n}$, a principal submatrix of $A$ is a square submatrix $A_I$ of $A$, consisting of the rows and columns with the same index set $I = \{i_1, \dots, i_k\} \subseteq \{1, \dots, n\}$. The determinant of $A_I$ is called a principal minor of $A$. In particular, if $I = \{1, \dots, k\}$, the determinant $\Delta_k = \det A_I$ is called the $k$-th leading principal minor of $A$.

Theorem (Sylvester's criterion)

Suppose $A \in \mathbb{R}^{n \times n}$ is a symmetric matrix with leading principal minors $\Delta_1, \dots, \Delta_n$. Then

  • $A \succ 0$ iff $\Delta_k > 0$ for all $1 \le k \le n$,
  • $A \succeq 0$ iff $\det A_I \ge 0$ for all principal submatrices $A_I$,
  • $A \prec 0$ iff $\Delta_k < 0$ for odd $k$, and $\Delta_k > 0$ for even $k$.

Remark

We cannot get a criterion for positive semidefiniteness that uses only the leading principal minors, similar to the first criterion for positive definiteness. Consider the following example: $$B = \begin{pmatrix} 0 & 0 \\ 0 & -1 \end{pmatrix}.$$ It is easy to see that $\Delta_k \ge 0$ for all $k$ (indeed $\Delta_1 = \Delta_2 = 0$). However, $B$ is not positive semidefinite, since $e_2^\top B e_2 = -1 < 0$.
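A minimal numpy sketch illustrating this (the helper below is ours): the leading principal minors of $B$ are all zero, yet $B$ has a negative eigenvalue:

```python
# Sketch: leading principal minors vs. semidefiniteness.
import numpy as np

def leading_principal_minors(A):
    # Determinants of the top-left k x k submatrices, k = 1..n.
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

B = np.array([[0.0, 0.0],
              [0.0, -1.0]])
print(leading_principal_minors(B))   # [0.0, -0.0] -> all non-negative
print(np.linalg.eigvalsh(B))         # [-1.  0.] -> B is not PSD
```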

Second-order sufficient condition
...

Finally, we give a sufficient condition to assert a local minimum point.

Theorem (Second-order sufficient condition)

Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is a twice continuously differentiable function. Then $x^*$ is a local minimum if $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*) \succ 0$.
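A minimal sketch of applying this condition at a candidate point (the example $f(x, y) = x^2 + x y + y^2$ is assumed; gradient and Hessian are computed by hand):

```python
# Sketch: second-order sufficient condition for f(x, y) = x^2 + x*y + y^2
# at (0, 0): the gradient vanishes and the Hessian is positive definite.
import numpy as np

grad = lambda x: np.array([2 * x[0] + x[1], x[0] + 2 * x[1]])
H = np.array([[2.0, 1.0],
              [1.0, 2.0]])            # constant Hessian of f

x_star = np.zeros(2)
print(np.linalg.norm(grad(x_star)))   # 0.0 -> first-order condition holds
print(np.linalg.eigvalsh(H))          # [1. 3.] -> H > 0, so x_star is a local min
```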

Remark

Many minimum points do not satisfy this condition. Consider the function $f(x) = x^4$. Clearly $x^* = 0$ is a local minimum. But the Hessian of $f$ at $0$ is $f''(0) = 0$, which is not positive definite.