Lecture 5. Convex Functions

5.1 Definition

We now introduce convex functions. For convenience, use to denote for any .

Definition (Convex functions)

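Let $f$ be a real-valued function with domain $\operatorname{dom} f \subseteq \mathbb{R}^n$. Then it is convex if

the domain $\operatorname{dom} f$ is convex;
$f$ satisfies Jensen's inequality, i.e., for all $x, y \in \operatorname{dom} f$ and $\theta \in [0,1]$, it holds that
$$f(\theta x + (1-\theta) y) \le \theta f(x) + (1-\theta) f(y).$$

The function $f$ is concave if $-f$ is convex.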

Geometrically, the line segment between $(x, f(x))$ and $(y, f(y))$ lies above the graph of $f$.
[Figure: Jensen's inequality]
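
The definition can be probed numerically. The following is a minimal sketch, not part of the original notes: the helper names `jensen_gap` and `looks_convex` are made up for illustration, and the check only samples random points on an interval, so a positive answer is evidence of convexity rather than a proof.

```python
import math
import random


def jensen_gap(f, x, y, theta):
    """Return theta*f(x) + (1-theta)*f(y) - f(theta*x + (1-theta)*y)."""
    z = theta * x + (1 - theta) * y
    return theta * f(x) + (1 - theta) * f(y) - f(z)


def looks_convex(f, lo, hi, trials=10_000, tol=1e-9, seed=0):
    """Randomized test on [lo, hi]: False once a sampled gap is negative.

    A True result is only evidence of convexity, not a proof.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        theta = rng.random()
        if jensen_gap(f, x, y, theta) < -tol:
            return False
    return True


if __name__ == "__main__":
    print(looks_convex(math.exp, -5.0, 5.0))     # exp is convex on R
    print(looks_convex(math.sin, 0.0, math.pi))  # sin is concave on [0, pi]
```

The tolerance `tol` is an assumption that absorbs floating-point rounding; a nonnegative gap for every sampled triple $(x, y, \theta)$ is exactly what Jensen's inequality requires.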

Definition (Strictly convex functions)

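Let $f$ be a real-valued function with domain $\operatorname{dom} f \subseteq \mathbb{R}^n$. Then it is strictly convex if

the domain $\operatorname{dom} f$ is convex;
$f$ satisfies the strict Jensen's inequality, i.e., for all $x \ne y \in \operatorname{dom} f$ and $\theta \in (0,1)$, it holds that
$$f(\theta x + (1-\theta) y) < \theta f(x) + (1-\theta) f(y).$$

The function $f$ is strictly concave if $-f$ is strictly convex.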

Note that an affine function is both convex and concave, but not strictly convex or strictly concave. The following proposition shows that if a function is both convex and concave, then it must be an affine function.

Proposition

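Let $f$ be convex and $x, y \in \operatorname{dom} f$. If $f(\theta x + (1-\theta) y) = \theta f(x) + (1-\theta) f(y)$ for some $\theta \in (0,1)$, then the equality holds for any $\theta \in [0,1]$, i.e., $f$ is an affine function on the line segment between $x$ and $y$.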

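Why? Write $z_\theta = \theta x + (1-\theta) y$, and suppose there exists $\theta' \in (0,1)$ such that $f(z_{\theta'}) < \theta' f(x) + (1-\theta') f(y)$. If $\theta' > \theta$, then $z_\theta$ is a convex combination of $z_{\theta'}$ and $y$, namely $z_\theta = \frac{\theta}{\theta'} z_{\theta'} + \left(1 - \frac{\theta}{\theta'}\right) y$; if $\theta' < \theta$, it is a convex combination of $z_{\theta'}$ and $x$. Applying Jensen's inequality to this combination gives $f(z_\theta) < \theta f(x) + (1-\theta) f(y)$, which contradicts the assumed equality at $\theta$.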

Why are these functions called convex? One might think that their graphs look concave. In fact, what matters is the region above the graph of a convex function.

Definition

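Given a real-valued function $f$ with domain $\operatorname{dom} f \subseteq \mathbb{R}^n$,

the graph of $f$ is defined as $\{(x, f(x)) : x \in \operatorname{dom} f\}$;
the epigraph of $f$ is defined by $\operatorname{epi} f = \{(x, t) : x \in \operatorname{dom} f,\ t \ge f(x)\}$;
the hypograph of $f$ is defined by $\operatorname{hypo} f = \{(x, t) : x \in \operatorname{dom} f,\ t \le f(x)\}$.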

Theorem

Let $f$ be a real-valued function. Then $f$ is convex if and only if its epigraph $\operatorname{epi} f$ is convex.

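Proof

"$\Longleftarrow$". Given any $x, y \in \operatorname{dom} f$ and $\theta \in [0,1]$, since $\operatorname{epi} f$ is convex, and both $(x, f(x))$ and $(y, f(y))$ are in $\operatorname{epi} f$, we have $(\theta x + (1-\theta) y,\ \theta f(x) + (1-\theta) f(y)) \in \operatorname{epi} f$, which implies that $\theta x + (1-\theta) y \in \operatorname{dom} f$ and $f(\theta x + (1-\theta) y) \le \theta f(x) + (1-\theta) f(y)$. Thus $\operatorname{dom} f$ is convex and Jensen's inequality holds.
"$\Longrightarrow$". Given $(x, s), (y, t) \in \operatorname{epi} f$, where $s \ge f(x)$ and $t \ge f(y)$, and $\theta \in [0,1]$, let $z = \theta x + (1-\theta) y$ and $r = \theta s + (1-\theta) t$. By the convexity of $f$, $z \in \operatorname{dom} f$ and $f(z) \le \theta f(x) + (1-\theta) f(y)$. By the definition of $\operatorname{epi} f$, $\theta f(x) + (1-\theta) f(y) \le \theta s + (1-\theta) t = r$. Then $(z, r) \in \operatorname{epi} f$, which means $\operatorname{epi} f$ is convex.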

Example

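(Univariate functions) $e^x$, $x^2$ where $x \in \mathbb{R}$, and $-\log x$ where $x > 0$ are all strictly convex. (Or we say $e^x$ and $x^2$ are convex over $\mathbb{R}$, and $-\log x$ is convex over $\mathbb{R}_{>0}$.)
(Norm functions) Any norm $\|\cdot\|$ is a convex function, but it cannot be strictly convex (e.g., the $\ell_2$-norm is not strictly convex: by absolute homogeneity, $\|\theta x + (1-\theta) \cdot 0\| = \theta \|x\| + (1-\theta) \|0\|$, so Jensen's inequality holds with equality along any ray from the origin).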

Why do we consider convex functions? One of the most important properties of convex functions is that every local minimum point is a global minimum point.

Theorem

If $f$ is a convex function and has a local minimum point $x^*$, then $x^*$ is a global minimum point.

Proof

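Assume not. Then there exists $y \in \operatorname{dom} f$ such that $f(y) < f(x^*)$. Since $x^*$ is a local minimum point, there exists $\varepsilon > 0$ such that for all $x \in \operatorname{dom} f$ with $\|x - x^*\| < \varepsilon$, $f(x) \ge f(x^*)$. Choose $\theta \in (0,1)$ sufficiently small such that $\theta \|y - x^*\| < \varepsilon$, and let $z = (1-\theta) x^* + \theta y$. Then $\|z - x^*\| < \varepsilon$. By the convexity of $f$, $f(z) \le (1-\theta) f(x^*) + \theta f(y)$ and thus $f(z) < f(x^*)$, which contradicts that $x^*$ is a local minimum.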

Extended-value functions

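Suppose the domain of a function $f$ is not $\mathbb{R}^n$. Then we can extend the value of $f$ to $\mathbb{R} \cup \{\infty\}$ so that the domain of $f$ can be extended to $\mathbb{R}^n$; namely, we can define $\tilde{f} : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ for a function $f$ as follows:
$$\tilde{f}(x) = \begin{cases} f(x), & x \in \operatorname{dom} f, \\ \infty, & x \notin \operatorname{dom} f, \end{cases}$$
where we assume $a < \infty$ and $a + \infty = \infty$ for any $a \in \mathbb{R}$, and $\theta \cdot \infty = \infty$ for any $\theta > 0$.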
Note that the extended-value function of a convex function is still convex, since the epigraph remains the same.

Generalization of the Jensen’s inequality

It is easy to show the following generalization of Jensen’s inequality by induction.

Proposition

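Suppose $f$ is a convex function. Then for any $\theta_1, \dots, \theta_k \ge 0$ where $\sum_{i=1}^k \theta_i = 1$ and any $x_1, \dots, x_k \in \operatorname{dom} f$, it holds that
$$f\left(\sum_{i=1}^k \theta_i x_i\right) \le \sum_{i=1}^k \theta_i f(x_i).$$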

Intuitively, we can generalize the inequality to convex combinations of infinitely many points. We indeed have the following generalized form of Jensen's inequality, but note that its proof is nontrivial since we cannot use induction!

Theorem

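Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, $X$ be an integrable real-valued random variable and $f : \mathbb{R} \to \mathbb{R}$ be a convex function. Then it holds that
$$f(\mathbb{E}[X]) \le \mathbb{E}[f(X)].$$
Equivalently we have the following measure-theoretic form
$$f\left(\int_\Omega X \, \mathrm{d}\mathbb{P}\right) \le \int_\Omega f \circ X \, \mathrm{d}\mathbb{P}.$$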
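
For a random variable with finite support, the probabilistic form reduces to the finite form with the probabilities as weights. The sketch below illustrates this numerically; the helper names `random_weights` and `finite_jensen_gap` are hypothetical, and the test function $t \mapsto t^2$ is just one convenient convex example.

```python
import random


def random_weights(k, rng):
    """Return k nonnegative weights summing to 1 (a discrete distribution)."""
    raw = [rng.random() + 1e-12 for _ in range(k)]
    total = sum(raw)
    return [r / total for r in raw]


def finite_jensen_gap(f, points, weights):
    """Return sum_i w_i f(x_i) - f(sum_i w_i x_i), i.e. E[f(X)] - f(E[X])."""
    mean_point = sum(w * x for w, x in zip(weights, points))
    mean_value = sum(w * f(x) for w, x in zip(weights, points))
    return mean_value - f(mean_point)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.uniform(-10.0, 10.0) for _ in range(6)]
    ws = random_weights(6, rng)
    # For a convex f the gap is nonnegative (up to rounding).
    print(finite_jensen_gap(lambda t: t * t, xs, ws) >= -1e-9)
```

For $f(t) = t^2$ the gap equals the variance of the discrete random variable, which gives a quick sanity check of the sign.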

5.2 Properties and conditions of convexity

Midpoint convexity

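Now we would like to prove that $-\log x$ is a convex function over $\mathbb{R}_{>0}$.
We verify Jensen's inequality: for $x, y > 0$ and $\theta \in [0,1]$,
$$-\log(\theta x + (1-\theta) y) \le -\theta \log x - (1-\theta) \log y.$$
Negating and taking the exponential on both sides, it is equivalent to show that
$$\theta x + (1-\theta) y \ge x^{\theta} y^{1-\theta}.$$
If $\theta = 1/2$, it is trivial by the AM-GM inequality. However, how can we verify the inequality for $\theta \ne 1/2$, or is it sufficient for us to verify Jensen's inequality only for $\theta = 1/2$?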

Remark

Technically we cannot use the weighted AM-GM inequality here, since the weighted version is usually proved by the Jensen's inequality and concavity of the logarithm, which is what we want to show!
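
As a sanity check of the target inequality $\theta x + (1-\theta) y \ge x^{\theta} y^{1-\theta}$, one can sample it numerically. This is only an illustrative sketch (the function name `weighted_am_gm_gap` is made up here) and of course does not replace the proof, which proceeds via midpoint convexity.

```python
import random


def weighted_am_gm_gap(x, y, theta):
    """Return theta*x + (1-theta)*y - x**theta * y**(1-theta) for x, y > 0."""
    return theta * x + (1 - theta) * y - (x ** theta) * (y ** (1 - theta))


if __name__ == "__main__":
    rng = random.Random(0)
    worst = min(
        weighted_am_gm_gap(rng.uniform(0.01, 100.0),
                           rng.uniform(0.01, 100.0),
                           rng.random())
        for _ in range(10_000)
    )
    # The smallest sampled gap should be nonnegative up to rounding.
    print(worst >= -1e-9)
```

With $\theta = 1/2$ the gap is $\frac{x+y}{2} - \sqrt{xy}$, the plain AM-GM case mentioned above.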

We say a function $f$ is midpoint convex if Jensen's inequality holds for $\theta = 1/2$ and every $x, y \in \operatorname{dom} f$, i.e., $f\left(\frac{x+y}{2}\right) \le \frac{f(x) + f(y)}{2}$. Clearly, convex functions are midpoint convex. The converse is not necessarily true. But luckily, if the function is also continuous, then it is convex.

Theorem (Jensen, 1905)

If $f$ is a continuous midpoint convex function defined on a convex set $D$, then $f$ is convex.

Proof

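We prove it by contradiction. Assume there exist $x, y \in D$ and $\theta \in (0,1)$ such that $f(\theta x + (1-\theta) y) > \theta f(x) + (1-\theta) f(y)$. Let
$$g(t) = f(t x + (1-t) y) - t f(x) - (1-t) f(y), \quad t \in [0,1].$$
Then we have $g(0) = g(1) = 0$ and $g(\theta) > 0$. Moreover, $g$ is continuous and midpoint convex, since $f$ is and the subtracted term is affine in $t$. By the compactness of $[0,1]$, there exists $M = \max_{t \in [0,1]} g(t) > 0$, and there exists $t$ such that $g(t) = M$.
Now let $t^* = \inf\{t \in [0,1] : g(t) = M\}$. By the continuity of $g$, $g(t^*) = M$. Thus $t^* \in (0,1)$. Select $\delta > 0$ sufficiently small such that $[t^* - \delta, t^* + \delta] \subseteq [0,1]$. Since $g$ is midpoint convex, we have
$$g(t^*) \le \frac{g(t^* - \delta) + g(t^* + \delta)}{2}.$$
However, by the definition of $t^*$, we have $g(t^* - \delta) < M$, $g(t^* + \delta) \le M$, and $g(t^*) = M$, which leads to a contradiction.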

Warning

If we admit the axiom of choice, there exist midpoint convex functions that are not convex. Such a function would have to be non-measurable.

Now we use this theorem to verify a more complicated example: $f(X) = -\log\det X$ is convex over positive definite matrices $X \succ 0$.

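We admit the fact that $f(X) = -\log\det X$ is continuous. Then we verify the midpoint convexity: for $X, Y \succ 0$,
$$-\log\det\frac{X+Y}{2} \le -\frac{\log\det X + \log\det Y}{2}, \quad\text{i.e.,}\quad \det\frac{X+Y}{2} \ge \sqrt{\det X \det Y}.$$
Since $X \succ 0$, $X^{-1}$ exists and $\det X^{-1} = 1/\det X > 0$. So our goal is equivalent to showing that
$$\det\frac{I + X^{-1}Y}{2} \ge \sqrt{\det(X^{-1}Y)}.$$
Note that the determinant is the product of the eigenvalues, so $\det(X^{-1}Y) = \prod_{i=1}^n \lambda_i$, where $\lambda_i$ is the $i$-th eigenvalue of the matrix $X^{-1}Y$. Thus we have
$$\det\frac{I + X^{-1}Y}{2} = \prod_{i=1}^n \frac{1 + \lambda_i}{2} \quad\text{and}\quad \sqrt{\det(X^{-1}Y)} = \prod_{i=1}^n \sqrt{\lambda_i}.$$
Now it suffices to show that $\lambda_i \ge 0$ for all $i$, because then $\frac{1+\lambda_i}{2} \ge \sqrt{\lambda_i}$ by the AM-GM inequality. Since $X \succ 0$, consider the eigen-decomposition $X = Q \Lambda Q^\top$. It implies that there exists $X^{1/2} = Q \Lambda^{1/2} Q^\top$, which is symmetric and invertible (note that all diagonal entries of $\Lambda$ are positive).
If $X^{-1}Y v = \lambda v$, then $X^{-1/2} Y X^{-1/2} (X^{1/2} v) = \lambda (X^{1/2} v)$ and vice versa. So $X^{-1}Y$ and $X^{-1/2} Y X^{-1/2}$ have the same eigenvalues. Note that $X^{-1/2} Y X^{-1/2}$ is symmetric, and for all $u$, $u^\top X^{-1/2} Y X^{-1/2} u = w^\top Y w \ge 0$, where $w = X^{-1/2} u$. Hence, $X^{-1/2} Y X^{-1/2} \succeq 0$, which yields that all eigenvalues $\lambda_i$ are nonnegative.
Combining all of the above, we conclude that $-\log\det X$ is convex.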
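
The midpoint inequality $\det\frac{X+Y}{2} \ge \sqrt{\det X \det Y}$ can be spot-checked on small matrices. The sketch below restricts to $2 \times 2$ matrices so that it needs only the standard library; the helper names are hypothetical, and random positive definite matrices are generated as $BB^\top + I$, which is an assumption of this example rather than anything from the notes.

```python
import math
import random


def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def random_spd2(rng):
    """Return B B^T + I for a random 2x2 matrix B; this is positive definite."""
    a, b, c, d = (rng.uniform(-2.0, 2.0) for _ in range(4))
    off = a * c + b * d
    return [[a * a + b * b + 1.0, off], [off, c * c + d * d + 1.0]]


def midpoint(x, y):
    """Entrywise average (X + Y) / 2."""
    return [[(x[i][j] + y[i][j]) / 2 for j in range(2)] for i in range(2)]


def neg_logdet(m):
    """f(X) = -log det X, defined for positive definite X."""
    return -math.log(det2(m))


def midpoint_gap(x, y):
    """Return (f(X) + f(Y)) / 2 - f((X + Y) / 2); nonnegative if f is midpoint convex."""
    return (neg_logdet(x) + neg_logdet(y)) / 2 - neg_logdet(midpoint(x, y))


if __name__ == "__main__":
    rng = random.Random(0)
    gaps = [midpoint_gap(random_spd2(rng), random_spd2(rng)) for _ in range(1000)]
    print(min(gaps) >= -1e-9)
```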

Zeroth order condition
...

We now consider some properties of convexity. Conversely, these properties also provide some criteria to verify convexity.

Let $g$ be a single-variable function. Usually it is easy to verify Jensen's inequality for such a $g$. So our first condition is that a function $f$ is convex if and only if its restriction to any line is convex.

Theorem

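Suppose $f$ is a function defined on a convex set $D \subseteq \mathbb{R}^n$. Then $f$ is convex iff $\forall x \in D, v \in \mathbb{R}^n$, the function $g(t) = f(x + tv)$ with $\operatorname{dom} g = \{t \in \mathbb{R} : x + tv \in D\}$ is convex.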

Example

is convex, since , is convex.

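Proof

"$\Longrightarrow$". Assume $f$ is convex. Fix $x \in D$ and $v \in \mathbb{R}^n$, and let $g(t) = f(x + tv)$. For any $t_1, t_2 \in \operatorname{dom} g$ and $\theta \in [0,1]$, let $t = \theta t_1 + (1-\theta) t_2$. It suffices to show that (i) $t \in \operatorname{dom} g$; (ii) $g(t) \le \theta g(t_1) + (1-\theta) g(t_2)$.
Let $y_i = x + t_i v$ for $i = 1, 2$. Since $t_i \in \operatorname{dom} g$, $y_i \in D$. Thus, $\theta y_1 + (1-\theta) y_2 = x + tv \in D$, which indicates that $t \in \operatorname{dom} g$. Furthermore,
$$g(t) = f(\theta y_1 + (1-\theta) y_2) \le \theta f(y_1) + (1-\theta) f(y_2) = \theta g(t_1) + (1-\theta) g(t_2).$$
"$\Longleftarrow$". Given $x, y \in D$, let $v = y - x$ and $g(t) = f(x + tv)$. Since $g$ is convex and $0, 1 \in \operatorname{dom} g$, we have that $\forall \theta \in [0,1]$, $1 - \theta = \theta \cdot 0 + (1-\theta) \cdot 1 \in \operatorname{dom} g$. Thus $\theta x + (1-\theta) y = x + (1-\theta) v \in D$. Moreover, $f(\theta x + (1-\theta) y) = g(1-\theta) \le \theta g(0) + (1-\theta) g(1) = \theta f(x) + (1-\theta) f(y)$, which implies that $f$ is convex.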

First order condition

If $f$ is further differentiable, we have the following important criterion (and an important property) for convex functions.

Theorem

Suppose $f$ is differentiable in an open convex set $D$. Then $f$ is convex in $D$ iff
$$f(y) \ge f(x) + \nabla f(x)^\top (y - x), \quad \forall x, y \in D.$$

The first order condition shows that convex functions have linear lower bounds.

We usually use $\langle \cdot, \cdot \rangle$ to denote the inner product. So the first order condition is also written as $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle$.

Example (Bernoulli's inequality)

$(1 + x)^r \ge 1 + rx$ if $r \ge 1$ and $x \ge -1$;
$e^x \ge 1 + x$.
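
Both inequalities are instances of the tangent-line lower bound $f(y) \ge f(x) + f'(x)(y - x)$: take $f(t) = t^r$ at the point $1$, and $f(t) = e^t$ at the point $0$. The following sketch checks them on random samples; the function names `first_order_gap` and `bernoulli_gap` are illustrative assumptions.

```python
import math
import random


def first_order_gap(f, df, x, y):
    """Return f(y) - [f(x) + f'(x) (y - x)]; nonnegative for convex f."""
    return f(y) - (f(x) + df(x) * (y - x))


def bernoulli_gap(x, r):
    """Return (1 + x)^r - (1 + r x), intended for r >= 1 and x >= -1."""
    return (1.0 + x) ** r - (1.0 + r * x)


if __name__ == "__main__":
    rng = random.Random(0)
    # Tangent lines of exp stay below its graph.
    exp_ok = all(
        first_order_gap(math.exp, math.exp,
                        rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)) >= -1e-9
        for _ in range(10_000)
    )
    # Bernoulli's inequality on sampled (x, r).
    bern_ok = all(
        bernoulli_gap(rng.uniform(-1.0, 3.0), rng.uniform(1.0, 5.0)) >= -1e-9
        for _ in range(10_000)
    )
    print(exp_ok, bern_ok)
```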

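Proof

"$\Longrightarrow$". Fix any $x, y \in D$. Let $\theta \in (0,1)$. By Jensen's inequality,
$$f(x + \theta (y - x)) \le (1-\theta) f(x) + \theta f(y).$$
Rearranging it, we have
$$f(y) - f(x) \ge \frac{f(x + \theta (y - x)) - f(x)}{\theta}.$$
Recall that $\lim_{\theta \to 0^+} \frac{f(x + \theta (y - x)) - f(x)}{\theta} = \nabla f(x)^\top (y - x)$. Taking the limits on both sides, we have $f(y) - f(x) \ge \nabla f(x)^\top (y - x)$.
"$\Longleftarrow$". For all $x, y \in D$ and $\theta \in [0,1]$, let $z = \theta x + (1-\theta) y$. The first-order condition gives that
$$f(x) \ge f(z) + \nabla f(z)^\top (x - z), \qquad f(y) \ge f(z) + \nabla f(z)^\top (y - z).$$
Then $\theta (x - z) + (1-\theta)(y - z) = 0$ immediately implies that $\theta f(x) + (1-\theta) f(y) \ge f(z)$.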

Corollary

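In particular, if $\nabla f(x^*) = 0$, then $f(y) \ge f(x^*)$ for all $y \in D$. If $f$ is further strictly convex, $x^*$ is the unique global minimum point.
Given $x \in D$, the hyperplane $\{(y, t) : t = f(x) + \nabla f(x)^\top (y - x)\}$ is a supporting hyperplane of $\operatorname{epi} f$ at $(x, f(x))$.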

The first order condition also holds for strict convexity if we use the strict inequality. In the proof for strict convexity, the "$\Longleftarrow$" direction remains the same. However, how can we prove the "$\Longrightarrow$" direction? Note that taking the limit does not preserve the strict inequality.

Theorem (First order condition for strict convexity)

Suppose $f$ is differentiable in an open convex set $D$. Then $f$ is strictly convex in $D$ iff
$$f(y) > f(x) + \nabla f(x)^\top (y - x), \quad \forall x \ne y \in D.$$

Proof

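Let $x \ne y \in D$ and $\theta \in (0,1)$. Similar to the proof of the non-strict version, we have
$$f(y) - f(x) > \frac{f(x + \theta (y - x)) - f(x)}{\theta}.$$
Consider another coefficient $\eta$ such that $0 < \eta < \theta$; then we also have
$$f(y) - f(x) > \frac{f(x + \eta (y - x)) - f(x)}{\eta}.$$
Applying Jensen's inequality (writing $x + \eta (y - x)$ as a convex combination of $x$ and $x + \theta (y - x)$), it is easy to verify that
$$\frac{f(x + \eta (y - x)) - f(x)}{\eta} \le \frac{f(x + \theta (y - x)) - f(x)}{\theta}.$$
Taking $\eta \to 0^+$, we have
$$\nabla f(x)^\top (y - x) \le \frac{f(x + \theta (y - x)) - f(x)}{\theta} < f(y) - f(x),$$
which proves the first order condition for strictly convex functions.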

An important corollary of the first order condition is the property of monotone gradient.

Corollary (Monotone gradient)

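Let $f : \mathbb{R}^n \to \mathbb{R}$ be a continuously differentiable function. Then, $f$ is a convex function if and only if $\nabla f$ is monotone, i.e., $\langle \nabla f(x) - \nabla f(y), x - y \rangle \ge 0$ for all $x, y \in \mathbb{R}^n$.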
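
To see what monotonicity means in practice, the sketch below evaluates $\langle \nabla f(x) - \nabla f(y), x - y \rangle$ on random pairs. The example gradient is the softmax map, which is the gradient of the convex log-sum-exp function $\log \sum_i e^{x_i}$; this choice, like the helper names, is an assumption made for illustration.

```python
import math
import random


def softmax(x):
    """Gradient of log-sum-exp at x (shifted by max(x) for numerical stability)."""
    m = max(x)
    z = [math.exp(xi - m) for xi in x]
    s = sum(z)
    return [zi / s for zi in z]


def monotonicity_gap(grad, x, y):
    """Return <grad(x) - grad(y), x - y>; nonnegative when grad is monotone."""
    gx, gy = grad(x), grad(y)
    return sum((a - b) * (u - v) for a, b, u, v in zip(gx, gy, x, y))


if __name__ == "__main__":
    rng = random.Random(0)
    ok = all(
        monotonicity_gap(softmax,
                         [rng.uniform(-5.0, 5.0) for _ in range(4)],
                         [rng.uniform(-5.0, 5.0) for _ in range(4)]) >= -1e-9
        for _ in range(10_000)
    )
    print(ok)
    # Gradient of the concave function -t^2 is not monotone.
    print(monotonicity_gap(lambda p: [-2.0 * p[0]], [1.0], [0.0]))
```

The second call returns a negative number, matching the fact that $-t^2$ is not convex.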

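Proof

"$\Longrightarrow$". When $f$ is convex, for all $x, y$, by the first order condition,
$$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle, \qquad f(x) \ge f(y) + \langle \nabla f(y), x - y \rangle.$$
Adding the two inequalities, we get $\langle \nabla f(x) - \nabla f(y), x - y \rangle \ge 0$. Thus we obtain the monotone gradient.
"$\Longleftarrow$". When $\nabla f$ is monotone, for all $x, y \in \mathbb{R}^n$, define the function $g(t) = f(x + t(y - x))$. Then $g'(t) = \langle \nabla f(x + t(y - x)), y - x \rangle$. For $s < t$, by elementary calculation
$$g'(t) - g'(s) = \langle \nabla f(x_t) - \nabla f(x_s), y - x \rangle, \quad\text{where } x_t = x + t(y - x).$$
Note that $x_t - x_s = (t - s)(y - x)$. Then
$$g'(t) - g'(s) = \frac{1}{t - s} \langle \nabla f(x_t) - \nabla f(x_s), x_t - x_s \rangle \ge 0,$$
so $g'$ is monotone. Since $f$ is convex iff its restriction to every line is convex, this means we can assume the dimension $n = 1$. For all $x, y \in \mathbb{R}$ (without loss of generality, assuming $x < y$; the case $x > y$ is symmetric), by the mean value theorem, there exists $\xi \in (x, y)$ such that $f(y) - f(x) = f'(\xi)(y - x)$. Since $\xi > x$, we have $f'(\xi) \ge f'(x)$, which leads to the first order condition $f(y) \ge f(x) + f'(x)(y - x)$.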

Second order condition

The property of monotone gradients indicates that the second order derivative is somehow nonnegative. Assume $f$ is a univariate function. Then $(f'(x) - f'(y))(x - y) \ge 0$ implies that $f'$ is increasing and thus $f''$ is nonnegative. If $f$ is a multivariate function, the second order derivative is the Hessian matrix $\nabla^2 f$. A generalized notion of $f'' \ge 0$ is $\nabla^2 f \succeq 0$ in this case.

Theorem

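Suppose $f$ is twice differentiable in an open convex set $D$. Then $f$ is convex over $D$ iff $\forall x \in D$, $\nabla^2 f(x) \succeq 0$.
Furthermore, if $\forall x \in D$, $\nabla^2 f(x) \succ 0$, then $f$ is strictly convex over $D$ (but not vice versa).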

Example

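$\log x$ is strictly concave over $\mathbb{R}_{>0}$, since $(\log x)' = 1/x$ and $(\log x)'' = -1/x^2 < 0$.
$e^{ax}$ is strictly convex for all $a \ne 0$, since $(e^{ax})'' = a^2 e^{ax} > 0$.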
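$x^a$ is convex over $\mathbb{R}_{>0}$ for $a \ge 1$ or $a \le 0$, and concave otherwise.
The log-sum-exp function $f(x) = \log \sum_{i=1}^n e^{x_i}$ is convex over $\mathbb{R}^n$. (Exercise. Hint: let $z_i = e^{x_i}$; then $\nabla f(x) = \frac{z}{\mathbf{1}^\top z}$ and $\nabla^2 f(x) = \frac{1}{(\mathbf{1}^\top z)^2} \left( (\mathbf{1}^\top z) \operatorname{diag}(z) - z z^\top \right)$, so $v^\top \nabla^2 f(x) v = \frac{(a^\top a)(b^\top b) - (a^\top b)^2}{(\mathbf{1}^\top z)^2}$, where $a_i = \sqrt{z_i}$ and $b_i = v_i \sqrt{z_i}$, thus $v^\top \nabla^2 f(x) v \ge 0$ by the Cauchy-Schwarz inequality.)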
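
For the log-sum-exp exercise, the quadratic form $v^\top \nabla^2 f(x) v$ can be written with the softmax weights $p_i = z_i / \mathbf{1}^\top z$ as $\sum_i p_i v_i^2 - \left(\sum_i p_i v_i\right)^2$, i.e., a variance. The sketch below evaluates this expression directly; the function name is hypothetical and the code is a numerical illustration, not a solution of the exercise.

```python
import math
import random


def lse_hessian_quadratic_form(x, v):
    """Return v^T H v for H the Hessian of log-sum-exp at x.

    Uses H = diag(p) - p p^T with p = softmax(x); subtracting max(x)
    before exponentiating does not change p.
    """
    m = max(x)
    z = [math.exp(xi - m) for xi in x]
    s = sum(z)
    p = [zi / s for zi in z]
    first_moment = sum(pi * vi for pi, vi in zip(p, v))
    second_moment = sum(pi * vi * vi for pi, vi in zip(p, v))
    return second_moment - first_moment * first_moment


if __name__ == "__main__":
    rng = random.Random(0)
    values = [
        lse_hessian_quadratic_form(
            [rng.uniform(-5.0, 5.0) for _ in range(5)],
            [rng.uniform(-3.0, 3.0) for _ in range(5)],
        )
        for _ in range(10_000)
    ]
    print(min(values) >= -1e-9)
```

Taking $v = \mathbf{1}$ gives the value $0$, so the Hessian is positive semidefinite but not positive definite.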

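Proof

"$\Longrightarrow$". For any $x \in D$, we define $g(y) = f(y) - f(x) - \nabla f(x)^\top (y - x)$, where $y \in D$. Hence $\nabla^2 g(y) = \nabla^2 f(y)$. By the first order condition, $g(y) \ge 0 = g(x)$ for all $y \in D$, so $x$ is a minimum point of $g$, which implies $\nabla^2 g(x) \succeq 0$ by the second-order necessary condition for optimality. So we have $\nabla^2 f(x) \succeq 0$.
"$\Longleftarrow$". Given two arbitrary points $x, y \in D$, let $v = y - x$ and $g(t) = f(x + tv)$ for $t \in [0,1]$. Applying Taylor's theorem with Lagrange remainder to $g$, there exists $\xi \in (0,1)$ such that
$$f(y) = f(x) + \nabla f(x)^\top (y - x) + \frac{1}{2} (y - x)^\top \nabla^2 f(x + \xi (y - x)) (y - x).$$
Since $\nabla^2 f(z) \succeq 0$ for all $z \in D$, it follows that $f(y) \ge f(x) + \nabla f(x)^\top (y - x)$. Therefore, $f$ is convex by the first order condition.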

For strict convexity, we can replace $\succeq$ by $\succ$ and $\ge$ by $>$ in the "$\Longleftarrow$" direction, and apply the first order condition for strict convexity. However, for the "$\Longrightarrow$" direction, a similar argument does not work, since a strict minimum point does not imply a positive definite Hessian.

Example

Consider the function $f(x) = x^4$. It is strictly convex, but $f''(0) = 0$ is not strictly greater than zero.
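Similarly, consider the function $f(x) = x_1^4 + x_2^4$. It is strictly convex, but $\nabla^2 f(x) = \operatorname{diag}(12 x_1^2, 12 x_2^2)$, which is not positive definite for $x_1 = 0$ or $x_2 = 0$.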

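However, for some special classes of functions, the equivalence for strict convexity does hold. Consider quadratic functions,
$$f(x) = \frac{1}{2} x^\top A x + b^\top x + c.$$
Without loss of generality, we can assume $A$ is symmetric. This is because $x^\top A x = x^\top A^\top x$, so $x^\top A x = x^\top \frac{A + A^\top}{2} x$. It is easy to compute that $\nabla^2 f(x) = A$. Then the following propositions are true: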

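$f$ is convex iff $A \succeq 0$. This is implied by the above theorem.
$f$ is strictly convex iff $A \succ 0$. The "$\Longleftarrow$" direction is easy to verify. So we only need to prove the "$\Longrightarrow$" direction. Note that $f(y) = f(x) + \nabla f(x)^\top (y - x) + \frac{1}{2} (y - x)^\top A (y - x)$. Since $f$ is strictly convex, we have $f(y) > f(x) + \nabla f(x)^\top (y - x)$ for all $y \ne x$ (applying the first order condition). That is, $(y - x)^\top A (y - x) > 0$ for all $y \ne x$, which implies that $A \succ 0$.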
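
For a quadratic function of two variables with symmetric matrix $A = \begin{pmatrix} a & b \\ b & d \end{pmatrix}$, the two propositions reduce to checking the smaller eigenvalue of $A$. The sketch below uses the closed-form eigenvalues of a symmetric $2 \times 2$ matrix; the function names and the tolerance `tol` are assumptions of this example.

```python
import math


def sym_eigenvalues_2x2(a, b, d):
    """Eigenvalues (smaller, larger) of the symmetric matrix [[a, b], [b, d]]."""
    center = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return center - radius, center + radius


def classify_quadratic(a, b, d, tol=1e-12):
    """Classify f(x) = 0.5 * x^T A x by the sign of the smallest eigenvalue of A."""
    smallest, _ = sym_eigenvalues_2x2(a, b, d)
    if smallest > tol:
        return "strictly convex"   # A is positive definite
    if smallest >= -tol:
        return "convex"            # A is positive semidefinite, singular
    return "not convex"            # A has a negative eigenvalue


if __name__ == "__main__":
    print(classify_quadratic(2.0, 0.0, 1.0))
    print(classify_quadratic(1.0, 1.0, 1.0))
    print(classify_quadratic(1.0, 2.0, 1.0))
```

The three calls cover a positive definite matrix, a singular positive semidefinite matrix, and an indefinite matrix, respectively.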

Example

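The following figures show the different convexity of the quadratic function $f(x) = \frac{1}{2} x^\top A x$ when $A$ takes different values.

The first one is strictly convex since $A \succ 0$.
The second one is convex since $A \succeq 0$.
The third one is not convex since $A \not\succeq 0$.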
