Lecture 6. Convex Functions (cont’d)
...


6.1 Convexity-preserving operations
...

For convex sets, we have already seen that if $S_1$ and $S_2$ are both convex sets, then $S_1 \cap S_2$, $S_1 + S_2 = \{x + y : x \in S_1, y \in S_2\}$, and $\alpha S_1 = \{\alpha x : x \in S_1\}$ are all convex sets. We wonder whether there exist similar convexity-preserving operations for functions.

Nonnegative Sums
...

Theorem

If $f_1, \dots, f_m$ are convex, and $w_1, \dots, w_m \ge 0$, then $\sum_{i=1}^{m} w_i f_i$ is convex.
Furthermore, if a two-variable function $f(x, y)$ is convex in $x$ for any fixed $y$, and the coefficients satisfy $w(y) \ge 0$ for every $y$, then $g(x) = \int w(y)\, f(x, y)\, \mathrm{d}y$ is also a convex function (whenever the integral is well defined).

We can verify the convexity of objective functions via Jensen's inequality.
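As a quick sanity check (not part of the original notes), the following Python sketch samples random chords and verifies Jensen's inequality for a nonnegative weighted sum of two convex functions; the particular functions and weights are arbitrary illustrative choices.

```python
import numpy as np

# Two convex functions on R and a nonnegative weighted sum of them.
f1 = lambda x: x ** 2
f2 = lambda x: abs(x)
w1, w2 = 0.3, 1.7                      # nonnegative weights (arbitrary)
f = lambda x: w1 * f1(x) + w2 * f2(x)

rng = np.random.default_rng(0)
for _ in range(10_000):
    x, y = 5 * rng.normal(size=2)
    lam = rng.uniform()
    # Jensen's inequality for the weighted sum:
    # f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y)
    assert f(lam * x + (1 - lam) * y) <= lam * f(x) + (1 - lam) * f(y) + 1e-9
print("Jensen's inequality held on all sampled chords.")
```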

Pointwise Maximum
...

Theorem

If $f_1, \dots, f_m$ are convex, then $f(x) = \max\{f_1(x), \dots, f_m(x)\}$ is also convex.
Furthermore, if $f(x, y)$ is convex in $x$ for any fixed $y \in \mathcal{Y}$, then $g(x) = \sup_{y \in \mathcal{Y}} f(x, y)$ is also a convex function.


The proof is immediate by noting that the epigraph of the pointwise maximum function is the intersection of the epigraphs of all the $f_i$'s.

Affine mapping
...

Theorem

Suppose $f$ is convex/concave, and $g(x) = f(Ax + b)$ for a matrix $A$ and a vector $b$, then $g$ is also convex or concave (the same as $f$).

Example

$f(x) = \|x\|$ is convex for any norm function $\|\cdot\|$.

By the triangle inequality and absolute homogeneity, for any $\lambda \in [0, 1]$ it holds that $\|\lambda x + (1 - \lambda) y\| \le \|\lambda x\| + \|(1 - \lambda) y\| = \lambda \|x\| + (1 - \lambda) \|y\|$. Therefore, $\|\cdot\|$ is convex, and its affine transformation $x \mapsto \|Ax + b\|$ is also convex due to the above theorem.
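A similar random-chord check for the affine pre-composition of the Euclidean norm, $x \mapsto \|Ax + b\|_2$ (a hypothetical $A$ and $b$ drawn at random just for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 4))
b = rng.normal(size=3)
g = lambda x: np.linalg.norm(A @ x + b)   # g(x) = ||Ax + b||_2

for _ in range(10_000):
    x, y = rng.normal(size=4), rng.normal(size=4)
    lam = rng.uniform()
    # Convexity chord condition for g
    assert g(lam * x + (1 - lam) * y) <= lam * g(x) + (1 - lam) * g(y) + 1e-9
```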

Scalar composition
...

Given two differentiable functions $h : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R}^n \to \mathbb{R}$, consider their scalar composition $f(x) = h(g(x))$. When will $f$ also be convex?
When $n = 1$, we first compute the second-order derivative of $f$: $f''(x) = h''(g(x))\, g'(x)^2 + h'(g(x))\, g''(x)$. With the help of this direct computation, we have the following theorem:

Theorem
  • $f$ is convex if $h$ is convex and one of the following propositions is true:
    • $h$ is increasing and $g$ is convex;
    • $h$ is decreasing and $g$ is concave.
  • $f$ is concave if $h$ is concave and one of the following propositions is true:
    • $h$ is increasing and $g$ is concave;
    • $h$ is decreasing and $g$ is convex.
Proof

We just show the proof of case 2 of the first bullet ($h$ convex and decreasing, $g$ concave); the other three cases can reuse this proof. If $g$ is concave and $h$ is decreasing, then for any $x, y$ and $\lambda \in [0, 1]$ we have $g(\lambda x + (1 - \lambda) y) \ge \lambda g(x) + (1 - \lambda) g(y)$, hence $h\big(g(\lambda x + (1 - \lambda) y)\big) \le h\big(\lambda g(x) + (1 - \lambda) g(y)\big)$. Let $z = \lambda x + (1 - \lambda) y$. We have $f(z) = h(g(z)) \le h\big(\lambda g(x) + (1 - \lambda) g(y)\big) \le \lambda h(g(x)) + (1 - \lambda) h(g(y)) = \lambda f(x) + (1 - \lambda) f(y)$, where the last inequality uses the convexity of $h$. Hence $f$ is convex.

Note that for the other four cases, we cannot determine whether $f$ is convex just from the monotonicity and convexity/concavity of $h$ and $g$.

Example
  • $e^{g(x)}$ is convex if $g$ is convex (but if $g$ is concave, we have no idea whether $e^{g(x)}$ is convex or not).
  • If $h(t) = e^{-t}$ and $g(x) = x^2$, then $f(x) = e^{-x^2}$ is neither convex nor concave.
  • If $h(t) = \sqrt{t}$ and $g(x) = 1 - x^2$, then $f(x) = \sqrt{1 - x^2}$ is concave.
  • (log-sum-exp) If $h(t) = \log t$ and $g(x) = \sum_{i=1}^{n} e^{x_i}$, then $f(x) = \log\left(\sum_{i=1}^{n} e^{x_i}\right)$ is convex, even though this case is not covered by the theorem (a numerical chord test of the last three examples is sketched below).
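The following sketch (not from the lecture) probes the chord gap $\lambda f(x) + (1-\lambda) f(y) - f(\lambda x + (1-\lambda) y)$ for the last three examples, using a one-dimensional instance of log-sum-exp: mixed signs indicate "neither", all-nonpositive indicates concave, all-nonnegative indicates convex. The helper chord_gaps is ad hoc.

```python
import numpy as np

rng = np.random.default_rng(2)

def chord_gaps(f, lo, hi, trials=20_000):
    """Min/max of lam*f(x) + (1-lam)*f(y) - f(lam*x + (1-lam)*y) over random chords in [lo, hi]."""
    gaps = []
    for _ in range(trials):
        x, y, lam = rng.uniform(lo, hi), rng.uniform(lo, hi), rng.uniform()
        gaps.append(lam * f(x) + (1 - lam) * f(y) - f(lam * x + (1 - lam) * y))
    return min(gaps), max(gaps)

print(chord_gaps(lambda x: np.exp(-x ** 2), -2, 2))                      # mixed signs: neither
print(chord_gaps(lambda x: np.sqrt(1 - x ** 2), -1, 1))                  # <= 0 everywhere: concave
print(chord_gaps(lambda x: np.log(np.exp(x) + np.exp(2 * x)), -2, 2))    # >= 0 everywhere: convex
```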

We should say more about the log-sum-exp function. This function is very useful for approximately computing the maximum of $n$ numbers $x_1, \dots, x_n$ (i.e., $\max_i x_i$).
We usually hope that the objective function of an optimization problem is differentiable. However, $\max_i x_i$ is not. So log-sum-exp gives a good approximation of $\max_i x_i$ (log-sum-exp is a smooth function); indeed, $\max_i x_i \le \log \sum_{i=1}^{n} e^{x_i} \le \max_i x_i + \log n$.
Given numbers $x_1, \dots, x_n$, the softmax function $\operatorname{softmax}(x)_i = e^{x_i} / \sum_{j=1}^{n} e^{x_j}$ returns a probability distribution over $\{1, \dots, n\}$. Moreover, this distribution is exactly the gradient of the log-sum-exp function.
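A short numerical illustration of these two facts (the max/log-sum-exp sandwich bound and softmax as the gradient); the helper functions below are standard implementations written for this note, not part of the lecture.

```python
import numpy as np

def logsumexp(x):
    m = x.max()                          # shift by the max for numerical stability
    return m + np.log(np.exp(x - m).sum())

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(3)
x = 3 * rng.normal(size=6)
n = len(x)

# Sandwich bound: max(x) <= logsumexp(x) <= max(x) + log(n)
assert x.max() <= logsumexp(x) <= x.max() + np.log(n)

# softmax(x) equals the gradient of log-sum-exp at x (central finite differences)
eps = 1e-6
grad = np.array([(logsumexp(x + eps * np.eye(n)[i]) - logsumexp(x - eps * np.eye(n)[i])) / (2 * eps)
                 for i in range(n)])
assert np.allclose(grad, softmax(x), atol=1e-5)
```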

Vector composition
...

Suppose $g_i : \mathbb{R}^n \to \mathbb{R}$ for $i = 1, \dots, k$, and $h : \mathbb{R}^k \to \mathbb{R}$. Let $f(x) = h(g_1(x), \dots, g_k(x))$. Then we have the following theorem, by defining "$h$ is increasing" as: $h(u) \le h(v)$ if $u_i \le v_i$ for all $i$.

Theorem
  • $f$ is convex if $h$ is convex and one of the following propositions is true:
    • $h$ is increasing and $g_i$ is convex for all $i$;
    • $h$ is decreasing and $g_i$ is concave for all $i$.
  • $f$ is concave if $h$ is concave and one of the following propositions is true:
    • $h$ is increasing and $g_i$ is concave for all $i$;
    • $h$ is decreasing and $g_i$ is convex for all $i$.

Minimization over convex sets
...

Theorem

Suppose $f(x, y)$ is convex (jointly in $(x, y)$), and the set $C$ is convex, then $g(x) = \inf_{y \in C} f(x, y)$ is convex. (e.g., the distance to a convex set, $\operatorname{dist}(x, C) = \inf_{y \in C} \|x - y\|$.)

Proof

We prove this theorem by verifying Jensen's inequality. In other words, we want to show that $g(\lambda x_1 + (1 - \lambda) x_2) \le \lambda g(x_1) + (1 - \lambda) g(x_2)$ for all $x_1, x_2$ and $\lambda \in [0, 1]$. By the definition of $g$, for any $\varepsilon > 0$, there exist $y_1, y_2 \in C$ such that $f(x_1, y_1) \le g(x_1) + \varepsilon$ and $f(x_2, y_2) \le g(x_2) + \varepsilon$. Therefore, since $\lambda y_1 + (1 - \lambda) y_2 \in C$, $g(\lambda x_1 + (1 - \lambda) x_2) \le f\big(\lambda x_1 + (1 - \lambda) x_2,\ \lambda y_1 + (1 - \lambda) y_2\big)$. Since $f$ is convex and $C$ is convex, we have $f\big(\lambda x_1 + (1 - \lambda) x_2,\ \lambda y_1 + (1 - \lambda) y_2\big) \le \lambda f(x_1, y_1) + (1 - \lambda) f(x_2, y_2)$. Therefore, for any $\varepsilon > 0$, $g(\lambda x_1 + (1 - \lambda) x_2) \le \lambda g(x_1) + (1 - \lambda) g(x_2) + \varepsilon$. Taking the limit $\varepsilon \to 0$, it certifies Jensen's inequality.
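As an illustration of partial minimization (a sketch with arbitrary choices: $f(x, y) = (x - y)^2 + y^2$, which is jointly convex, and the convex set $C = [1, 2]$; SciPy's bounded scalar minimizer plays the role of the inner infimum):

```python
import numpy as np
from scipy.optimize import minimize_scalar

f = lambda x, y: (x - y) ** 2 + y ** 2        # jointly convex in (x, y)
C = (1.0, 2.0)                                # a convex set: the interval [1, 2]

def g(x):
    # g(x) = min_{y in C} f(x, y)  (partial minimization over the convex set C)
    return minimize_scalar(lambda y: f(x, y), bounds=C, method="bounded").fun

rng = np.random.default_rng(4)
for _ in range(2_000):
    x1, x2, lam = rng.uniform(-4, 4), rng.uniform(-4, 4), rng.uniform()
    # Chord condition for the partially minimized function g
    assert g(lam * x1 + (1 - lam) * x2) <= lam * g(x1) + (1 - lam) * g(x2) + 1e-6
```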

6.2 Applications of convexity
...

Now we consider the problem of the triangle inequality for general $p$-norms, which we omitted before. Let us first prove the case $p = 2$ as a warm-up: $\|x + y\|_2^2 = \|x\|_2^2 + 2\langle x, y\rangle + \|y\|_2^2 \le \|x\|_2^2 + 2\|x\|_2\|y\|_2 + \|y\|_2^2 = (\|x\|_2 + \|y\|_2)^2$, where the inequality is exactly Cauchy–Schwarz.

To verify the triangle inequality for all $p \ge 1$, we need a generalized version of Cauchy–Schwarz. We first introduce the monotonicity of $p$-norms.

Proposition

Let $x \in \mathbb{R}^n$. Then $\|x\|_q \le \|x\|_p$ if $1 \le p \le q$.

Proof
  • If $x = 0$, then $\|x\|_q = 0 = \|x\|_p$.
  • If $x \ne 0$: first we normalize $x$. Let $y = x / \|x\|_p$ and note that $\|y\|_p = 1$ (namely, $\sum_{i} |y_i|^p = 1$), then we have $|y_i| \le 1$ for every $i$, so $|y_i|^q \le |y_i|^p$. Thus, $\|y\|_q^q = \sum_{i} |y_i|^q \le \sum_{i} |y_i|^p = 1$, i.e., $\|y\|_q \le 1 = \|y\|_p$, and multiplying by $\|x\|_p$ gives $\|x\|_q \le \|x\|_p$.
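As an aside, a quick numerical spot-check of this monotonicity (an arbitrary vector and a few exponents):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=10)
ps = [1, 1.5, 2, 3, 10, np.inf]
norms = [np.linalg.norm(x, ord=p) for p in ps]
# ||x||_q <= ||x||_p whenever p <= q, so this sequence should be non-increasing.
assert all(a >= b - 1e-12 for a, b in zip(norms, norms[1:]))
print(dict(zip(ps, norms)))
```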

Recalling the details of the proof for the $2$-norm, we can notice that the key point is to apply the Cauchy–Schwarz inequality: for any two vectors $x, y \in \mathbb{R}^n$, $\langle x, y\rangle \le \|x\|_2 \|y\|_2$.
When considering the general $p$-norm, we wonder if there also exists an inequality of the form $\sum_{i} |x_i y_i| \le \|x\|_p \|y\|_q$. In fact, there exists such an inequality, called Hölder's inequality.

Theorem (Hölder's inequality)

Let $p$ and $q$ be two conjugate exponents, i.e., $\frac{1}{p} + \frac{1}{q} = 1$ with $p, q > 1$. Then for any two vectors $x, y \in \mathbb{R}^n$, the following inequality holds: $\sum_{i=1}^{n} |x_i y_i| \le \|x\|_p \|y\|_q$.

Proof

Without loss of generality, we assume that $x_i, y_i \ge 0$ for any $i$ (otherwise replace them with $|x_i|, |y_i|$). If $x = 0$ or $y = 0$, the inequality obviously holds. Otherwise, we first normalize these two vectors. Let $u = x / \|x\|_p$ and $v = y / \|y\|_q$. Then $\|u\|_p = \|v\|_q = 1$. So our goal is to prove that $\sum_{i=1}^{n} u_i v_i \le 1$.
We first claim that for any $a, b > 0$, $ab \le \frac{a^p}{p} + \frac{b^q}{q}$ (Young's inequality). Taking the logarithm on both sides, it is equivalent to the following inequality: $\frac{1}{p} \log(a^p) + \frac{1}{q} \log(b^q) \le \log\left(\frac{a^p}{p} + \frac{b^q}{q}\right)$. By Jensen's inequality, the above inequality holds since $\log$ is concave and $\frac{1}{p} + \frac{1}{q} = 1$.
Next, applying this claim to $a = u_i$ and $b = v_i$ (terms with $u_i v_i = 0$ are trivial), we have $u_i v_i \le \frac{u_i^p}{p} + \frac{v_i^q}{q}$. Summing them up, we have $\sum_{i=1}^{n} u_i v_i \le \frac{1}{p} \sum_{i=1}^{n} u_i^p + \frac{1}{q} \sum_{i=1}^{n} v_i^q = \frac{1}{p} + \frac{1}{q} = 1$.

Now we are going to show the triangle inequality for $p$-norms, which is also called the Minkowski inequality.

Theorem (Minkowski inequality)

For any two vectors $x, y \in \mathbb{R}^n$, and any $p$ such that $p \ge 1$, the following inequality holds: $\|x + y\|_p \le \|x\|_p + \|y\|_p$.

Proof

Assuming $\|x + y\|_p \ne 0$ (otherwise the inequality is trivial) and $p > 1$ (the case $p = 1$ is just the scalar triangle inequality), we have $\|x + y\|_p^p = \sum_{i=1}^{n} |x_i + y_i|^p \le \sum_{i=1}^{n} |x_i|\,|x_i + y_i|^{p-1} + \sum_{i=1}^{n} |y_i|\,|x_i + y_i|^{p-1}$. By Hölder's inequality with the conjugate exponent $q = \frac{p}{p-1}$, $\sum_{i=1}^{n} |x_i|\,|x_i + y_i|^{p-1} \le \|x\|_p \left(\sum_{i=1}^{n} |x_i + y_i|^{p}\right)^{(p-1)/p} = \|x\|_p \|x + y\|_p^{p-1}$, and similarly for the second sum. Therefore, we conclude that $\|x + y\|_p^p \le (\|x\|_p + \|y\|_p)\, \|x + y\|_p^{p-1}$, and dividing both sides by $\|x + y\|_p^{p-1}$ completes the proof.
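Both inequalities are easy to spot-check numerically on random data (a sketch; vectors and exponents drawn at random):

```python
import numpy as np

rng = np.random.default_rng(6)
for _ in range(1_000):
    n = int(rng.integers(1, 8))
    x, y = rng.normal(size=n), rng.normal(size=n)
    p = rng.uniform(1.01, 5.0)
    q = p / (p - 1)                      # conjugate exponent: 1/p + 1/q = 1
    # Hölder:  sum_i |x_i y_i| <= ||x||_p * ||y||_q
    assert np.sum(np.abs(x * y)) <= np.linalg.norm(x, p) * np.linalg.norm(y, q) + 1e-9
    # Minkowski:  ||x + y||_p <= ||x||_p + ||y||_p
    assert np.linalg.norm(x + y, p) <= np.linalg.norm(x, p) + np.linalg.norm(y, p) + 1e-9
```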

6.3 Convex optimization problems
...

After defining and discussing properties of convex sets and convex functions, we now introduce what type of optimization problems we should consider in this course.

Recall that, in general, an optimization problem is to find the minimum value of $f_0(x)$ where $x$ satisfies $f_i(x) \le 0$ for $i = 1, \dots, m$ and $h_j(x) = 0$ for $j = 1, \dots, p$. Namely, it can be written in the following form.

Definition (Optimization Problem)

The following problem is the standard form of an optimization problem $\mathcal{P}$: minimize $f_0(x)$ subject to $f_i(x) \le 0$ for $i = 1, \dots, m$, and $h_j(x) = 0$ for $j = 1, \dots, p$.

  • The domain of $\mathcal{P}$ is given by $\mathcal{D} = \bigcap_{i=0}^{m} \operatorname{dom} f_i \cap \bigcap_{j=1}^{p} \operatorname{dom} h_j$.
  • The feasible set of $\mathcal{P}$ is given by $\mathcal{X} = \{x \in \mathcal{D} : f_i(x) \le 0 \text{ for all } i,\ h_j(x) = 0 \text{ for all } j\}$.
  • The optimal value of $\mathcal{P}$ is $p^\star = \inf_{x \in \mathcal{X}} f_0(x)$.
  • The optimal solution of $\mathcal{P}$ (if it exists) is a point $x^\star \in \mathcal{X}$ with $f_0(x^\star) = p^\star$.

For convenience, we usually allow $p^\star$ to take the extended values $\pm\infty$. Conventionally,

  • $p^\star = +\infty$ if $\mathcal{P}$ is infeasible (i.e., $\mathcal{X} = \emptyset$);
  • $p^\star = -\infty$ if $f_0$ is unbounded below over $\mathcal{X}$;
  • $x^\star$ is an optimal solution iff $x^\star \in \mathcal{X}$ and $f_0(x^\star) = p^\star$;
  • $x$ is a locally optimal point if there exists $r > 0$ such that $f_0(x) \le f_0(y)$ holds for all $y \in \mathcal{X}$ with $\|y - x\|_2 \le r$.

In particular, in this course we mainly consider convex optimization problems.

Definition (Convex optimization problem)

Given an optimization problem $\mathcal{P}$ in the standard form, it is called a convex optimization problem if the objective function $f_0$ is convex, every equality constraint $h_j$ is affine, and every inequality constraint $f_i$ is convex.

Clearly, the domain of $\mathcal{P}$ is convex, since the domains of $f_0$ and of all the $f_i$ and $h_j$ are convex, and the domain of $\mathcal{P}$ is their intersection.
We also note that the feasible set $\mathcal{X}$ is a convex set, since each solution set $\{x : h_j(x) = 0\}$ of an affine equation is an affine set, the $0$-sublevel sets $\{x : f_i(x) \le 0\}$ are all convex, and $\mathcal{X}$ is their intersection.

Proposition
  1. For a convex optimization problem, any local minimum is also a global minimum.
  2. The set of optimal solutions is also convex.
  3. In particular, if $f_0$ is strictly convex, there is at most one optimal solution.
Proof of item 2

For any two optimal solutions $x_1^\star, x_2^\star$, the point $x_\lambda = \lambda x_1^\star + (1 - \lambda) x_2^\star \in \mathcal{X}$ for all $\lambda \in [0, 1]$, since $\mathcal{X}$ is convex. By Jensen's inequality, we have $f_0(x_\lambda) \le \lambda f_0(x_1^\star) + (1 - \lambda) f_0(x_2^\star) = p^\star$. Obviously, $f_0(x_\lambda) \ge p^\star$. Therefore, $f_0(x_\lambda) = p^\star$, i.e., $x_\lambda$ is also an optimal solution.

In fact, we can show that the $\alpha$-sublevel set of a function $f$, defined by $S_\alpha = \{x \in \operatorname{dom} f : f(x) \le \alpha\}$, is convex if $f$ is a convex function. (Exercise!)
Similarly, we can also define the $\alpha$-level set of $f$ as $\{x : f(x) = \alpha\}$ and define the $\alpha$-superlevel set of $f$ as $\{x : f(x) \ge \alpha\}$. The $\alpha$-superlevel set of a concave function is convex.
Thus, another proof of item 2 is to note that the set of optimal solutions is the intersection of two convex sets: the feasible set $\mathcal{X}$ and the $p^\star$-sublevel set of $f_0$.

Now we can say that a convex optimization problem is to compute the minimum value of a convex function over a convex set. However, the converse is not true: calculating the minimum of a convex function on a convex set is not always a convex optimization problem in the standard form above. Consider the following example:

Example

The following optimization problem has a convex objective function and a convex feasible set, but it is not a convex optimization problem: minimize $x_1^2 + x_2^2$ subject to $f_1(x) = \frac{x_1}{1 + x_2^2} \le 0$ and $h_1(x) = (x_1 + x_2)^2 = 0$. The feasible set is just $\{(x_1, x_2) : x_1 \le 0,\ x_1 + x_2 = 0\}$, which is a convex set. However, $h_1$ is not affine and $f_1$ is not convex. Hence this problem is not convex.

Here are some canonical types of convex optimization problems.

Linear programming
...

A linear program (LP) is a convex optimization problem where the objective function and constraint functions are all affine (linear).

Example (Linear programming)

minimize $c^\top x$ subject to $G x \le h$ and $A x = b$.
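As a small illustration (not from the notes), a toy LP solved with SciPy's linprog, assuming SciPy is available; the data are arbitrary:

```python
import numpy as np
from scipy.optimize import linprog

# maximize x1 + 2*x2  s.t.  x1 + x2 <= 4,  x1 + 3*x2 <= 6,  x >= 0
# (written as a minimization of c^T x, as in the standard form)
c = np.array([-1.0, -2.0])
G = np.array([[1.0, 1.0],
              [1.0, 3.0]])
h = np.array([4.0, 6.0])

res = linprog(c, A_ub=G, b_ub=h, bounds=[(0, None), (0, None)], method="highs")
print(res.x, res.fun)    # optimal solution and optimal value of the minimization
```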

Quadratic programming
...

A quadratic program (QP) is a convex optimization problem where the objective function is quadratic (and convex) and the constraint functions are all affine.

Example (Quadratic programming)

minimize $\frac{1}{2} x^\top P x + q^\top x + r$ subject to $G x \le h$ and $A x = b$, where $P \succeq 0$.

Quadratically constrained quadratic programming
...

A quadratically constrained quadratic program (QCQP) is a convex program where the objective function and the inequality-constraint functions are all quadratic, and the equality constraints are affine.

Example (Quadratically constrained quadratic programming)

minimize $\frac{1}{2} x^\top P_0 x + q_0^\top x + r_0$ subject to $\frac{1}{2} x^\top P_i x + q_i^\top x + r_i \le 0$ for $i = 1, \dots, m$, and $A x = b$.

Note that it is convex iff $P_0 \succeq 0$ and $P_i \succeq 0$ for all $i = 1, \dots, m$.

Linear least-squares regression is a typical QP (and, with quadratic constraints added, a QCQP). Given $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$, our goal is to find $x$ to minimize $\|Ax - b\|_2^2$. By direct calculation, we have $\|Ax - b\|_2^2 = x^\top A^\top A x - 2 b^\top A x + b^\top b$, which is a convex quadratic since $A^\top A \succeq 0$, and the optimal solution satisfies the normal equations $A^\top A x^\star = A^\top b$.
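A minimal sketch of least squares via the normal equations, cross-checked against NumPy's least-squares routine (random data for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.normal(size=(50, 3))
b = rng.normal(size=50)

# Normal equations:  A^T A x = A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Library least-squares solution for comparison
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

assert np.allclose(x_normal, x_lstsq)
print(x_normal)
```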