...

...

Given an optimization problem $\min_{x \in \mathcal{X}} f(x)$, an *optimal solution* is usually denoted by $x^* \in \operatorname*{arg\,min}_{x \in \mathcal{X}} f(x)$. Does an optimal solution always exist?

In general, the question is hard to answer. We only have the following conclusion for some special objective functions and feasible sets.

Theorem (*Weierstrass extreme value theorem*)

Given a compact set $\mathcal{X} \subseteq \mathbb{R}^n$ and a continuous function $f : \mathcal{X} \to \mathbb{R}$, the problem $\min_{x \in \mathcal{X}} f(x)$ has an optimal solution; that is, $f$ attains its minimum (and maximum) on $\mathcal{X}$.

We now review some definitions in analysis.

Definition (*Open ball*)

For a norm $\|\cdot\|$ on $\mathbb{R}^n$, a point $x \in \mathbb{R}^n$, and a radius $r > 0$, the *open ball* centered at $x$ with radius $r$ is $B(x, r) = \{ y \in \mathbb{R}^n : \|y - x\| < r \}$.

Example

The following figure shows the open balls of different norms (e.g., the $\ell_1$, $\ell_2$, and $\ell_\infty$ norms) in $\mathbb{R}^2$.
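As a quick numerical companion (a sketch, not part of the original notes; the helper `in_open_ball` is hypothetical), membership in open balls of different norms can be checked with `numpy.linalg.norm`:

```python
import numpy as np

def in_open_ball(y, center, r, ord):
    """Check whether y lies in the open ball B(center, r) under the given norm."""
    return np.linalg.norm(np.asarray(y) - np.asarray(center), ord=ord) < r

# A point inside the l_inf unit ball but outside the l_1 and l_2 unit balls:
p = [0.8, 0.8]
print(in_open_ball(p, [0, 0], 1, ord=1))       # l_1 norm of p is 1.6
print(in_open_ball(p, [0, 0], 1, ord=2))       # l_2 norm of p is about 1.13
print(in_open_ball(p, [0, 0], 1, ord=np.inf))  # l_inf norm of p is 0.8
```

This illustrates that the unit balls of the three norms are nested: $B_{\ell_1} \subseteq B_{\ell_2} \subseteq B_{\ell_\infty}$.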

We can define *open sets* and *closed sets*.

Definition

- (*open set*) A set $S \subseteq \mathbb{R}^n$ is *open* if for every $x \in S$ there exists $r > 0$ such that $B(x, r) \subseteq S$.
- (*closed set*) A set is *closed* if its complement is open.

For *closed sets*, there is a different but equivalent characterization.

Theorem

A set $S \subseteq \mathbb{R}^n$ is *closed* iff for every convergent sequence $\{x_k\} \subseteq S$ with $x_k \to x$, the limit satisfies $x \in S$.

Example

- For $S = (0, 1)$: for every $x \in S$, there exists an open ball $B(x, r) \subseteq S$ where $r = \min\{x, 1 - x\}$; hence, $S$ is an open set.
- For $S = (0, 1]$: the sequence $x_k = 1/k \in S$ converges to $0 \notin S$; hence, $S$ is not a closed set.

Then we define *compact sets*.

Definition (*Compact sets*)

A set $S \subseteq \mathbb{R}^n$ is *compact* if any open cover of it has a finite subcover.

In $\mathbb{R}^n$, compactness has a much simpler characterization.

Theorem (Heine–Borel Theorem)

A set $S \subseteq \mathbb{R}^n$ is compact iff it is *bounded* and *closed*.

For optimization problems whose feasible sets are not compact, we usually do not have simple ways to determine whether optimal solutions exist. However, for a continuous function $f$ that is coercive (i.e., $f(x) \to +\infty$ as $\|x\| \to \infty$), a minimum still exists over any nonempty closed feasible set.

...

Just like the one-dimensional case, we can characterize minima of multivariate functions through derivatives.

We first distinguish *global minima* from *local minima*.

Definition

Given a function $f : \mathbb{R}^n \to \mathbb{R}$, a point $x^* \in \mathbb{R}^n$ is called a

- *local minimum point*, if there exists $\varepsilon > 0$ such that $f(x^*) \le f(x)$ for all $x \in B(x^*, \varepsilon)$;
- *global minimum point*, if $f(x^*) \le f(x)$ for all $x \in \mathbb{R}^n$.

The value $f(x^*)$ is called the *global / local minimum value* of $f$.

Similarly, we can also define *strict global minima* and *strict local minima*, where the inequalities above are strict for $x \neq x^*$.

Unfortunately, verifying global minimality is too hard in general; this also suggests why general optimization problems are difficult to solve. In this course we will study a special type of optimization problem, where local minima are also global minima.

We now give some criteria that can be used to prove local minima.

...

Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is a function of several variables; how should we define its derivative?

The generalization of *derivative* in high dimensions is the *directional derivative*.

Definition (*Directional derivative*)

Given $f : \mathbb{R}^n \to \mathbb{R}$, a point $x \in \mathbb{R}^n$, and a direction $d \in \mathbb{R}^n$, the *directional derivative* of $f$ at $x$ along $d$ is
$$f'(x; d) = \lim_{t \to 0^+} \frac{f(x + t d) - f(x)}{t},$$
provided the limit exists.

In particular, if $d = e_i$ is the $i$-th standard basis vector, the *directional derivative* is called the *partial derivative* $\frac{\partial f}{\partial x_i}(x)$.

We next define the *differential* of a function of several variables, which generalizes the derivative in a different way.

Definition (*Differential*)

Given $f : \mathbb{R}^n \to \mathbb{R}^m$, we say $f$ is *differentiable* at $x$ if there exists a matrix $A \in \mathbb{R}^{m \times n}$ such that
$$\lim_{h \to 0} \frac{\|f(x + h) - f(x) - A h\|}{\|h\|} = 0.$$
The matrix $A$ is called the *differential* of $f$ at $x$, denoted $Df(x)$ (also called the *Jacobian matrix*).

In particular, if $m = 1$, the column vector $\nabla f(x) = Df(x)^\top \in \mathbb{R}^n$ is called the *gradient* of $f$ at $x$.

If $f$ is differentiable at $x$, then the directional derivative exists along every direction $d$, and $f'(x; d) = \nabla f(x)^\top d$.
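The identity $f'(x; d) = \nabla f(x)^\top d$ can be checked numerically; the sketch below assumes the example function $f(x_1, x_2) = x_1^2 + 3 x_1 x_2$ (not from the notes) and a one-sided finite difference:

```python
import numpy as np

def f(x):
    return x[0] ** 2 + 3 * x[0] * x[1]

def grad_f(x):
    # By direct computation: grad f = (2 x1 + 3 x2, 3 x1)
    return np.array([2 * x[0] + 3 * x[1], 3 * x[0]])

x = np.array([1.0, 2.0])
d = np.array([1.0, -1.0])

# One-sided finite-difference approximation of the directional derivative
t = 1e-6
fd = (f(x + t * d) - f(x)) / t

print(fd, grad_f(x) @ d)  # both close to 5.0
```

Here $\nabla f(1, 2) = (8, 3)^\top$, so the directional derivative along $d = (1, -1)^\top$ is $8 - 3 = 5$.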

Tip

If $f$ is differentiable, the gradient can be computed entry-wise via partial derivatives: $\nabla f(x) = \left( \frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_n}(x) \right)^\top$.

Remark

The existence of all directional derivatives does **not** imply the existence of the differential.

Consider the following function:
$$f(x, y) = \begin{cases} \dfrac{x^2 y}{x^4 + y^2}, & (x, y) \neq (0, 0), \\ 0, & (x, y) = (0, 0). \end{cases}$$
All directional derivatives of $f$ at the origin exist, but $f$ is not continuous at the origin (along the path $y = x^2$ we have $f \equiv 1/2 \neq f(0, 0)$), hence not differentiable there.
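A numerical sketch of this phenomenon, assuming the classical counterexample $f(x, y) = x^2 y / (x^4 + y^2)$ with $f(0, 0) = 0$ (the notes' original function may differ):

```python
def f(x, y):
    """All directional derivatives exist at the origin,
    yet f is not continuous there (hence not differentiable)."""
    if x == 0 and y == 0:
        return 0.0
    return x ** 2 * y / (x ** 4 + y ** 2)

# Along any straight line through the origin, f(t*d)/t converges:
t = 1e-8
d = (0.3, 0.7)
print(f(t * d[0], t * d[1]) / t)  # close to d[0]**2 / d[1] = 0.09 / 0.7

# But along the parabola y = x^2, f stays at 1/2, so f is discontinuous at 0:
for x in [1e-2, 1e-4, 1e-6]:
    print(f(x, x ** 2))  # always 0.5
```

Since $f(x, x^2) = x^4 / (2 x^4) = 1/2$ for all $x \neq 0$, no linear approximation at the origin can match $f$.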

Now we give some examples and calculation rules of differentials.

Example

- $f(x) = b^\top x + c$, where $b \in \mathbb{R}^n$ and $c \in \mathbb{R}$. Then $\nabla f(x) = b$.
- $f(x) = \|x\|_2^2 = x^\top x$, where $x \in \mathbb{R}^n$. Then $\nabla f(x) = 2x$ and $\nabla^2 f(x) = 2I$.
- $f(x) = \frac{1}{2} x^\top A x$, where $A \in \mathbb{R}^{n \times n}$ and $A = A^\top$. Then $\nabla f(x) = A x$.

Here is a simple proof of the last example, that $\nabla f(x) = A x$ for $f(x) = \frac{1}{2} x^\top A x$ with symmetric $A$: for any $h \in \mathbb{R}^n$,
$$f(x + h) - f(x) = \frac{1}{2}(x + h)^\top A (x + h) - \frac{1}{2} x^\top A x = (A x)^\top h + \frac{1}{2} h^\top A h,$$
where the first term is linear in $h$ and the second is $o(\|h\|)$; hence $\nabla f(x) = A x$.
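Gradient formulas for linear and quadratic functions are easy to check against finite differences; the sketch below assumes the standard examples $f(x) = b^\top x$ and $f(x) = \frac{1}{2} x^\top A x$ with symmetric $A$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
b = rng.standard_normal(n)
M = rng.standard_normal((n, n))
A = M + M.T                       # a random symmetric matrix
x = rng.standard_normal(n)

def num_grad(f, x, eps=1e-6):
    """Central finite-difference approximation of the gradient."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

# f(x) = b^T x          =>  grad f(x) = b
print(np.allclose(num_grad(lambda z: b @ z, x), b, atol=1e-5))
# f(x) = (1/2) x^T A x  =>  grad f(x) = A x  (A symmetric)
print(np.allclose(num_grad(lambda z: 0.5 * z @ A @ z, x), A @ x, atol=1e-5))
```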

Proposition

- *Multiplication*: Given two functions $f, g : \mathbb{R}^n \to \mathbb{R}$ differentiable at $x$, let $h(x) = f(x) g(x)$. Then $\nabla h(x) = g(x) \nabla f(x) + f(x) \nabla g(x)$.
- *Chain rule*: Given $g : \mathbb{R}^n \to \mathbb{R}^m$ differentiable at $x$ and $f : \mathbb{R}^m \to \mathbb{R}^k$ differentiable at $g(x)$, let $h = f \circ g$ (i.e., $h(x) = f(g(x))$). Then $Dh(x) = Df(g(x)) \, Dg(x)$.
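The chain rule can be verified on a small example; the functions $g(x) = (x_1^2, x_1 x_2)$ and $f(u) = \sin u_1 + u_2^2$ below are illustrative choices, not from the notes:

```python
import numpy as np

def g(x):
    return np.array([x[0] ** 2, x[0] * x[1]])

def Dg(x):
    # Jacobian of g, computed by hand
    return np.array([[2 * x[0], 0.0],
                     [x[1], x[0]]])

def f(u):
    return np.sin(u[0]) + u[1] ** 2

def grad_f(u):
    return np.array([np.cos(u[0]), 2 * u[1]])

x = np.array([0.7, -1.3])

# Chain rule for h = f o g (here k = 1): grad h(x) = Dg(x)^T grad f(g(x))
chain = Dg(x).T @ grad_f(g(x))

# Central finite-difference check
eps = 1e-6
num = np.zeros(2)
for i in range(2):
    e = np.zeros(2)
    e[i] = eps
    num[i] = (f(g(x + e)) - f(g(x - e))) / (2 * eps)

print(np.allclose(chain, num, atol=1e-4))  # True
```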

We are ready to give the *first-order optimality condition*.

Theorem (*First-order necessary condition*)

Suppose $f$ is differentiable at $x^*$. If $x^*$ is a local minimum point of $f$, then $\nabla f(x^*) = 0$.

An important idea is to restrict a multivariate function to a line.

Proof

Fix any direction $d \in \mathbb{R}^n$ and define $\phi(t) = f(x^* + t d)$. Since $x^*$ is a local minimum point, $\phi(t) \ge \phi(0)$ for all sufficiently small $t > 0$, so
$$f'(x^*; d) = \lim_{t \to 0^+} \frac{\phi(t) - \phi(0)}{t} \ge 0, \quad \text{i.e.,} \quad \nabla f(x^*)^\top d \ge 0.$$
Applying the same argument to $-d$ gives $-\nabla f(x^*)^\top d \ge 0$. Hence $\nabla f(x^*)^\top d = 0$ for every $d$, which forces $\nabla f(x^*) = 0$.

Corollary

Suppose $f$ is differentiable at $x^*$. If $x^*$ is a local maximum point of $f$, then $\nabla f(x^*) = 0$.

Proof

Let $g = -f$. Then $x^*$ is a local minimum point of $g$, so $\nabla g(x^*) = -\nabla f(x^*) = 0$.

In particular, if $x^*$ is any local extremum (minimum or maximum) point of a differentiable $f$, then $\nabla f(x^*) = 0$.

...

Unfortunately, the first-order condition is only a necessary condition. If $\nabla f(x^*) = 0$, the point $x^*$ is not necessarily a local minimum or maximum; it may be a *saddle point*.

Example (*Saddle point*)

Consider the function $f(x, y) = x^2 - y^2$. The origin satisfies $\nabla f(0, 0) = 0$, but it is a *saddle point*, neither a minimum nor a maximum: $f$ increases along the $x$-axis and decreases along the $y$-axis.
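A minimal numerical check of the saddle-point example $f(x, y) = x^2 - y^2$:

```python
def f(x, y):
    return x ** 2 - y ** 2

def grad_f(x, y):
    # grad f = (2x, -2y), computed by hand
    return (2 * x, -2 * y)

print(grad_f(0.0, 0.0))  # (0.0, -0.0): the origin is a stationary point
print(f(0.1, 0.0))       # 0.01  > f(0, 0): not a local maximum
print(f(0.0, 0.1))       # -0.01 < f(0, 0): not a local minimum
```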

We can compute the higher-order derivatives to refute saddle points.

For a multivariate function $f : \mathbb{R}^n \to \mathbb{R}$ with second-order partial derivatives, the *Hessian matrix* of $f$ at $x$ is the matrix $\nabla^2 f(x) \in \mathbb{R}^{n \times n}$ with entries $[\nabla^2 f(x)]_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}(x)$.

Theorem (*Schwarz’s theorem*, or *Clairaut's theorem*)

Given a function $f$ whose second-order partial derivatives are continuous at $x$, the order of differentiation can be exchanged: $\frac{\partial^2 f}{\partial x_i \partial x_j}(x) = \frac{\partial^2 f}{\partial x_j \partial x_i}(x)$. In other words, the Hessian matrix is symmetric.

We are ready to establish the second-order condition. Consider a function $f$ that is twice continuously differentiable, and again restrict it to a line: let $\phi(t) = f(x^* + t d)$, so that $\phi'(t) = \nabla f(x^* + t d)^\top d$ and $\phi''(t) = d^\top \nabla^2 f(x^* + t d) \, d$.

Now let $x^*$ be a local minimum point of $f$. Then $t = 0$ is a local minimum of $\phi$, so the univariate second-order condition gives $\phi''(0) = d^\top \nabla^2 f(x^*) \, d \ge 0$.

Another idea is to consider the second-order Taylor expansion:
$$f(x^* + h) = f(x^*) + \nabla f(x^*)^\top h + \frac{1}{2} h^\top \nabla^2 f(x^*) \, h + o(\|h\|^2).$$

Theorem (*Second-order necessary condition*)

Suppose $f$ is twice continuously differentiable at $x^*$. If $x^*$ is a local minimum point, then $\nabla f(x^*) = 0$ and $d^\top \nabla^2 f(x^*) \, d \ge 0$ for all $d \in \mathbb{R}^n$.

...

In order to determine whether the Hessian of a function satisfies the above condition, we introduce the definition of *definite matrices*.

Definition (*Definite matrix*)

Let $A \in \mathbb{R}^{n \times n}$ be a symmetric matrix. We say $A$ is

- *positive definite* (denoted by $A \succ 0$, or $A \in \mathbb{S}^n_{++}$), if for all $x \neq 0$, $x^\top A x > 0$;
- *positive semidefinite* (denoted by $A \succeq 0$, or $A \in \mathbb{S}^n_{+}$), if for all $x$, $x^\top A x \ge 0$;
- *negative definite* (denoted by $A \prec 0$, or $-A \in \mathbb{S}^n_{++}$), if for all $x \neq 0$, $x^\top A x < 0$;
- *negative semidefinite* (denoted by $A \preceq 0$, or $-A \in \mathbb{S}^n_{+}$), if for all $x$, $x^\top A x \le 0$;
- *indefinite*, if there exist $x, y$ such that $x^\top A x > 0$ and $y^\top A y < 0$.

Proposition

Suppose $A \in \mathbb{R}^{n \times n}$ is symmetric. Then:

- $A \succeq 0$ iff all of its eigenvalues are non-negative;
- $A \succ 0$ iff all of its eigenvalues are positive.
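This eigenvalue test is also how definiteness is checked in practice; a sketch using `numpy.linalg.eigvalsh` (the function `classify` and its tolerance are illustrative, not from the notes):

```python
import numpy as np

def classify(A, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    lam = np.linalg.eigvalsh(A)  # eigenvalues in ascending order
    if lam[0] > tol:
        return "positive definite"
    if lam[0] >= -tol:
        return "positive semidefinite"
    if lam[-1] < -tol:
        return "negative definite"
    if lam[-1] <= tol:
        return "negative semidefinite"
    return "indefinite"

print(classify(np.array([[2.0, 1.0], [1.0, 2.0]])))   # eigenvalues 1, 3
print(classify(np.array([[1.0, 0.0], [0.0, -1.0]])))  # eigenvalues -1, 1
```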

To prove this proposition, we first introduce the *eigendecomposition*, which is a simplified case of SVD (*singular value decomposition*).

Definition (*Eigendecomposition*)

Let $A \in \mathbb{R}^{n \times n}$ be a symmetric matrix. Then $A$ can be decomposed as $A = Q \Lambda Q^\top$, where $Q \in \mathbb{R}^{n \times n}$ is orthogonal ($Q^\top Q = I$) and $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ is a diagonal matrix of the eigenvalues of $A$.

For any eigenvector $q_i$ (the $i$-th column of $Q$), we have $A q_i = \lambda_i q_i$.
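The decomposition and the eigenvector relation can be verified numerically with `numpy.linalg.eigh` (a sketch on a random symmetric matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))
A = (M + M.T) / 2            # a random symmetric matrix

lam, Q = np.linalg.eigh(A)   # eigenvalues (ascending) and orthonormal eigenvectors

print(np.allclose(Q @ np.diag(lam) @ Q.T, A))       # A = Q Lambda Q^T
print(np.allclose(Q.T @ Q, np.eye(3)))              # Q is orthogonal
print(np.allclose(A @ Q[:, 0], lam[0] * Q[:, 0]))   # A q_i = lambda_i q_i
```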

Proof of the proposition

We use the eigendecomposition $A = Q \Lambda Q^\top$. For any $x \in \mathbb{R}^n$, let $y = Q^\top x$; then
$$x^\top A x = y^\top \Lambda y = \sum_{i=1}^n \lambda_i y_i^2.$$
Since $Q$ is invertible, $x \mapsto y$ is a bijection, so $x^\top A x \ge 0$ for all $x$ iff $\lambda_i \ge 0$ for all $i$, and $x^\top A x > 0$ for all $x \neq 0$ iff $\lambda_i > 0$ for all $i$.

Example

Consider the following matrix:
$$A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.$$
Its eigenvalues are $\lambda_1 = 1$ and $\lambda_2 = 3$, with eigenvectors $(1, -1)^\top$ and $(1, 1)^\top$; hence $A$ is positive definite.

In addition, each eigenvalue is a root of the characteristic polynomial $\det(A - \lambda I) = 0$.

...

Given a matrix $A \in \mathbb{R}^{n \times n}$, a *principal submatrix* of $A$ is obtained by deleting rows and columns with the same index set; the determinant of a principal submatrix is called a *principal minor* (主子式). In particular, if we keep exactly the first $k$ rows and columns, the resulting determinant is called a *leading principal minor* (顺序主子式).

Theorem (*Sylvester's criterion*)

Suppose $A \in \mathbb{R}^{n \times n}$ is symmetric, and let $\Delta_k$ denote its $k$-th leading principal minor. Then:

- $A \succ 0$ iff $\Delta_k > 0$ for all $k = 1, \dots, n$;
- $A \succeq 0$ iff all principal minors (not only the leading ones) are non-negative;
- $A \prec 0$ iff $(-1)^k \Delta_k > 0$ for all $k = 1, \dots, n$, i.e., the leading principal minors alternate in sign starting with $\Delta_1 < 0$.
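Leading principal minors are straightforward to compute with `numpy.linalg.det`; the sketch below applies Sylvester's criterion to an assumed example matrix (a tridiagonal matrix that happens to be positive definite) and cross-checks with the eigenvalue test:

```python
import numpy as np

def leading_principal_minors(A):
    """Determinants of the top-left k x k submatrices, k = 1..n."""
    n = A.shape[0]
    return [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])

minors = leading_principal_minors(A)
print(minors)                             # approximately [2.0, 3.0, 4.0]
print(all(m > 0 for m in minors))         # Sylvester: A is positive definite
print(np.all(np.linalg.eigvalsh(A) > 0))  # the eigenvalue test agrees
```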

Remark

We cannot obtain a criterion for positive semidefiniteness by merely relaxing the first criterion to "all leading principal minors are non-negative". Consider the following matrix:
$$A = \begin{pmatrix} 0 & 0 \\ 0 & -1 \end{pmatrix}.$$
Both of its leading principal minors are $0$, hence non-negative, yet $A$ is not positive semidefinite, since $e_2^\top A e_2 = -1 < 0$. (The non-leading principal minor obtained by deleting the first row and column equals $-1 < 0$, which is why the semidefinite criterion must check *all* principal minors.)
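A quick check of this counterexample, assuming the matrix $\operatorname{diag}(0, -1)$:

```python
import numpy as np

A = np.array([[0.0, 0.0],
              [0.0, -1.0]])

# Both leading principal minors are non-negative (both are 0) ...
print(np.linalg.det(A[:1, :1]), np.linalg.det(A))

# ... yet A is not positive semidefinite:
x = np.array([0.0, 1.0])
print(x @ A @ x)              # -1.0
print(np.linalg.eigvalsh(A))  # one negative eigenvalue
```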

...

Finally, we give a sufficient condition to assert a local minimum point.

Theorem (*Second-order sufficient condition*)

Suppose $f$ is twice continuously differentiable at $x^*$, $\nabla f(x^*) = 0$, and $\nabla^2 f(x^*) \succ 0$. Then $x^*$ is a strict local minimum point of $f$.

Remark

Many minimum points do not satisfy this condition. Consider the function $f(x) = x^4$: the point $x^* = 0$ is a strict global minimum point, but $f''(0) = 0$, so the second-order sufficient condition fails.
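A numerical sketch of the $f(x) = x^4$ example, using a finite-difference second derivative:

```python
def f(x):
    return x ** 4

# Central finite-difference second derivative at the minimizer x* = 0
h = 1e-4
second = (f(h) - 2 * f(0.0) + f(-h)) / h ** 2

print(second)  # about 2e-8: f''(0) = 0, so the sufficient condition fails,
               # even though x* = 0 is a strict global minimum:
print(f(0.0) < f(0.5) and f(0.0) < f(-0.5))  # True
```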