Last Updated on August 28, 2021

In a previous post, we introduced the method of Lagrange multipliers to find local minima or local maxima of a function with equality constraints. The same method can be applied to those with inequality constraints as well.

In this tutorial, you will discover the method of Lagrange multipliers applied to find the local minimum or maximum of a function when inequality constraints are present, optionally together with equality constraints.

After completing this tutorial, you will know:

  • How to find points of local maximum or minimum of a function with inequality constraints
  • Method of Lagrange multipliers with inequality constraints

Let’s get started.

Lagrange Multiplier Approach with Inequality Constraints

Lagrange Multiplier Approach with Inequality Constraints. Photo by Christine Roy, some rights reserved.

Prerequisites

For this tutorial, we assume that you have already reviewed:

as well as

You can review these concepts by clicking on the links above.

Constrained Optimization and Lagrangians

Extending from our previous post, a constrained optimization problem can be generally considered as

$$
\begin{aligned}
\min && f(X) \\
\textrm{subject to} && g(X) &= 0 \\
&& h(X) &\ge 0 \\
&& k(X) &\le 0
\end{aligned}
$$

where $X$ is a scalar or vector value. Here, $g(X)=0$ is the equality constraint, and $h(X) \ge 0$, $k(X) \le 0$ are inequality constraints. Note that we always use $\ge$ and $\le$ rather than $\gt$ and $\lt$ in optimization problems because the former define a closed set in mathematics, within which we should look for the value of $X$. There can be many constraints of each type in an optimization problem.

The equality constraints are easy to handle but the inequality constraints are not. Therefore, one way to make them easier to tackle is to convert the inequalities into equalities by introducing slack variables:

$$
\begin{aligned}
\min && f(X) \\
\textrm{subject to} && g(X) &= 0 \\
&& h(X) - s^2 &= 0 \\
&& k(X) + t^2 &= 0
\end{aligned}
$$

When something is negative, adding a certain positive quantity to it will make it equal to zero, and vice versa. That quantity is the slack variable; the $s^2$ and $t^2$ above are examples. We deliberately put the $s^2$ and $t^2$ terms there to denote that they must not be negative.

With the slack variables introduced, we can use the Lagrange multipliers approach to solve it, in which the Lagrangian is defined as:

$$
L(X, \lambda, \theta, \phi) = f(X) - \lambda g(X) - \theta (h(X) - s^2) + \phi (k(X) + t^2)
$$

It is useful to know that, for the optimal solution $X^*$ to the problem, each inequality constraint either holds with equality (in which case the slack variable is zero) or it does not. The inequality constraints for which equality holds are called the active constraints; otherwise, they are inactive constraints. In this sense, you can consider the equality constraints as always active.

The Complementary Slackness Condition

The reason we need to know whether a constraint is active or not is because of the Karush-Kuhn-Tucker (KKT) conditions. Precisely, the KKT conditions describe what happens when $X^*$ is the optimal solution to a constrained optimization problem:

  1. The gradient of the Lagrangian function is zero
  2. All constraints are satisfied
  3. The inequality constraints satisfy the complementary slackness condition

The most important of them is the complementary slackness condition. While we learned that an optimization problem with an equality constraint can be solved using a Lagrange multiplier, with the gradient of the Lagrangian being zero at the optimal solution, the complementary slackness condition extends this to the case of inequality constraints by saying that, at the optimal solution $X^*$, either the Lagrange multiplier is zero or the corresponding inequality constraint is active.

The use of the complementary slackness condition is to help us explore the different cases in solving the optimization problem. It is best explained with an example.

Example 1: Mean-variance portfolio optimization

This is an example from finance. Suppose we have 1 dollar and are to engage in two different investments, whose returns are modeled as a bivariate Gaussian distribution. How much should we invest in each to minimize the total variance in return?

This optimization problem, also called Markowitz mean-variance portfolio optimization, is formulated as:

$$
\begin{aligned}
\min && f(w_1, w_2) &= w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2w_1w_2\sigma_{12} \\
\textrm{subject to} && w_1 + w_2 &= 1 \\
&& w_1 &\ge 0 \\
&& w_1 &\le 1
\end{aligned}
$$

in which the last two constraints are to bound the weight of each investment to between 0 and 1 dollar. Let's assume $\sigma_1^2=0.25$, $\sigma_2^2=0.10$, and $\sigma_{12}=0.15$. Then the Lagrangian function is defined as:

$$
\begin{aligned}
L(w_1, w_2, \lambda, \theta, \phi) =\ & 0.25w_1^2 + 0.1w_2^2 + 0.3w_1w_2 \\
& - \lambda(w_1 + w_2 - 1) \\
& - \theta(w_1 - s^2) - \phi(w_1 - 1 + t^2)
\end{aligned}
$$

and we have the gradients:

$$
\begin{aligned}
\frac{\partial L}{\partial w_1} &= 0.5w_1 + 0.3w_2 - \lambda - \theta - \phi \\
\frac{\partial L}{\partial w_2} &= 0.2w_2 + 0.3w_1 - \lambda \\
\frac{\partial L}{\partial\lambda} &= 1 - w_1 - w_2 \\
\frac{\partial L}{\partial\theta} &= s^2 - w_1 \\
\frac{\partial L}{\partial\phi} &= 1 - w_1 - t^2
\end{aligned}
$$
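If you want to double-check this differentiation, the short SymPy sketch below (not part of the original derivation; the symbol names are chosen here for illustration) reproduces each partial derivative automatically:

```python
# Verify the partial derivatives of the Example 1 Lagrangian with SymPy.
from sympy import symbols, diff, simplify

w1, w2, lam, theta, phi, s, t = symbols("w1 w2 lambda theta phi s t")

# Lagrangian with slack variables, as defined above
L = (0.25*w1**2 + 0.1*w2**2 + 0.3*w1*w2
     - lam*(w1 + w2 - 1)
     - theta*(w1 - s**2)
     - phi*(w1 - 1 + t**2))

# Print each partial derivative; they should match the expressions above
for var in (w1, w2, lam, theta, phi):
    print(f"dL/d{var} =", simplify(diff(L, var)))
```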

From this point onward, the complementary slackness condition has to be considered. We have two slack variables $s$ and $t$, and the corresponding Lagrange multipliers are $\theta$ and $\phi$. We now have to consider whether a slack variable is zero (in which case the corresponding inequality constraint is active) or the Lagrange multiplier is zero (in which case the constraint is inactive). There are four possible cases:

  1. $\theta = \phi = 0$ and $s^2 > 0$, $t^2 > 0$
  2. $\theta \ne 0$ but $\phi = 0$, and $s^2 = 0$, $t^2 > 0$
  3. $\theta = 0$ but $\phi \ne 0$, and $s^2 > 0$, $t^2 = 0$
  4. $\theta \ne 0$ and $\phi \ne 0$, and $s^2 = t^2 = 0$

For case 1, using $\partial L/\partial\lambda=0$, $\partial L/\partial w_1=0$ and $\partial L/\partial w_2=0$ we get

$$
\begin{aligned}
w_2 &= 1 - w_1 \\
0.5w_1 + 0.3w_2 &= \lambda \\
0.3w_1 + 0.2w_2 &= \lambda
\end{aligned}
$$

from which we get $w_1=-1$, $w_2=2$, $\lambda=0.1$. But with $\partial L/\partial\theta=0$, we get $s^2=-1$, for which we cannot find a solution ($s^2$ cannot be negative). Hence this case is infeasible.

For case 2, with $\partial L/\partial\theta=0$ we get $w_1=0$. Hence from $\partial L/\partial\lambda=0$, we know $w_2=1$. And with $\partial L/\partial w_2=0$, we find $\lambda=0.2$, and from $\partial L/\partial w_1=0$ we get $\theta=0.1$. In this case, the objective function is 0.1.

For case 3, with $\partial L/\partial\phi=0$ we get $w_1=1$. Hence from $\partial L/\partial\lambda=0$, we know $w_2=0$. And with $\partial L/\partial w_2=0$, we get $\lambda=0.3$, and from $\partial L/\partial w_1=0$ we get $\phi=0.2$. In this case, the objective function is 0.25.

For case 4, we get $w_1=0$ from $\partial L/\partial\theta=0$ but $w_1=1$ from $\partial L/\partial\phi=0$. Hence this case is infeasible.

Comparing the objective function from case 2 and case 3, we see that the value from case 2 is lower. Hence that is taken as our solution to the optimization problem, with the optimal solution attained at $w_1=0$, $w_2=1$.

As an exercise, you can retry the above with $\sigma_{12}=-0.15$. The solution would be 0.0038, attained when $w_1=\frac{5}{13}$, with the two inequality constraints inactive.
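If you prefer a numerical cross-check, a solver such as SciPy's SLSQP can handle these constraints directly. The following is a minimal sketch (the setup is assumed here for illustration and is not part of the derivation above); it should report a minimum of about 0.1 at $w_1=0$, $w_2=1$:

```python
# Numerically minimize the portfolio variance under the same constraints.
from scipy.optimize import minimize

def variance(w):
    w1, w2 = w
    return 0.25*w1**2 + 0.10*w2**2 + 2*0.15*w1*w2

result = minimize(
    variance,
    x0=[0.5, 0.5],                                   # initial guess
    method="SLSQP",
    bounds=[(0, 1), (None, None)],                   # 0 <= w1 <= 1; w2 is implied by the equality
    constraints=[{"type": "eq", "fun": lambda w: w[0] + w[1] - 1}],
)
print(result.x, result.fun)    # expect approximately [0, 1] and 0.1
```

Changing the cross term in `variance()` to `2*(-0.15)*w1*w2` should reproduce the exercise result of about 0.0038.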

Example 2: Water-filling algorithm

This is an example from communication engineering. If we have a channel (say, a wireless bandwidth) in which the noise power is $N$ and the signal power is $S$, the channel capacity (in terms of bits per second) is proportional to $\log_2(1+S/N)$. If we have $k$ similar channels, each with its own noise and signal level, the total capacity of all channels is the sum $\sum_i \log_2(1+S_i/N_i)$.

Assume we are using a battery that can supply only 1 watt of power, and this power has to be distributed to the $k$ channels (denoted as $p_1, \cdots, p_k$). Each channel may have a different attenuation, so at the end, the signal power is discounted by a gain $g_i$ for each channel. Then the maximum total capacity we can achieve by using these $k$ channels is formulated as the optimization problem

$$
\begin{aligned}
\max && f(p_1, \cdots, p_k) &= \sum_{i=1}^k \log_2\left(1 + \frac{g_i p_i}{n_i}\right) \\
\textrm{subject to} && \sum_{i=1}^k p_i &= 1 \\
&& p_1, \cdots, p_k &\ge 0
\end{aligned}
$$

For convenience of differentiation, we note that $\log_2 x=\log x/\log 2$ and $\log(1+g_i p_i/n_i)=\log(n_i+g_i p_i)-\log(n_i)$; hence the objective function can be replaced with

$$
f(p_1, \cdots, p_k) = \sum_{i=1}^k \log(n_i + g_i p_i)
$$

Assume we have $k=3$ channels, each with a noise level of 1.0, 0.9, 1.0 respectively, and the channel gains are 0.9, 0.8, 0.7. Then the optimization problem is

$$
\begin{aligned}
\max && f(p_1, p_2, p_3) &= \log(1 + 0.9p_1) + \log(0.9 + 0.8p_2) + \log(1 + 0.7p_3) \\
\textrm{subject to} && p_1 + p_2 + p_3 &= 1 \\
&& p_1, p_2, p_3 &\ge 0
\end{aligned}
$$

We have three inequality constraints here. The Lagrangian function is defined as

$$
\begin{aligned}
& L(p_1, p_2, p_3, \lambda, \theta_1, \theta_2, \theta_3) \\
=\ & \log(1 + 0.9p_1) + \log(0.9 + 0.8p_2) + \log(1 + 0.7p_3) \\
& - \lambda(p_1 + p_2 + p_3 - 1) \\
& - \theta_1(p_1 - s_1^2) - \theta_2(p_2 - s_2^2) - \theta_3(p_3 - s_3^2)
\end{aligned}
$$

The gradient is therefore

$$
\begin{aligned}
\frac{\partial L}{\partial p_1} &= \frac{0.9}{1 + 0.9p_1} - \lambda - \theta_1 \\
\frac{\partial L}{\partial p_2} &= \frac{0.8}{0.9 + 0.8p_2} - \lambda - \theta_2 \\
\frac{\partial L}{\partial p_3} &= \frac{0.7}{1 + 0.7p_3} - \lambda - \theta_3 \\
\frac{\partial L}{\partial\lambda} &= 1 - p_1 - p_2 - p_3 \\
\frac{\partial L}{\partial\theta_1} &= s_1^2 - p_1 \\
\frac{\partial L}{\partial\theta_2} &= s_2^2 - p_2 \\
\frac{\partial L}{\partial\theta_3} &= s_3^2 - p_3
\end{aligned}
$$

Now we have three slack variables, and we have to consider 8 cases:

  1. $\theta_1 = \theta_2 = \theta_3 = 0$, hence none of $s_1^2, s_2^2, s_3^2$ are zero
  2. $\theta_1 = \theta_2 = 0$ but $\theta_3 \ne 0$, hence only $s_3^2 = 0$
  3. $\theta_1 = \theta_3 = 0$ but $\theta_2 \ne 0$, hence only $s_2^2 = 0$
  4. $\theta_2 = \theta_3 = 0$ but $\theta_1 \ne 0$, hence only $s_1^2 = 0$
  5. $\theta_1 = 0$ but $\theta_2, \theta_3$ are non-zero, hence $s_2^2 = s_3^2 = 0$
  6. $\theta_2 = 0$ but $\theta_1, \theta_3$ are non-zero, hence $s_1^2 = s_3^2 = 0$
  7. $\theta_3 = 0$ but $\theta_1, \theta_2$ are non-zero, hence $s_1^2 = s_2^2 = 0$
  8. all of $\theta_1, \theta_2, \theta_3$ are non-zero, hence $s_1^2 = s_2^2 = s_3^2 = 0$

Immediately we can tell that case 8 is infeasible, since from $\partial L/\partial\theta_i=0$ we would need $p_1=p_2=p_3=0$, but this cannot make $\partial L/\partial\lambda=0$.

For case 1, we have
$$
\frac{0.9}{1 + 0.9p_1} = \frac{0.8}{0.9 + 0.8p_2} = \frac{0.7}{1 + 0.7p_3} = \lambda
$$
from $\partial L/\partial p_1=\partial L/\partial p_2=\partial L/\partial p_3=0$. Together with $p_3=1-p_1-p_2$ from $\partial L/\partial\lambda=0$, we find the solution to be $p_1=0.444$, $p_2=0.430$, $p_3=0.126$, and the objective function $f(p_1, p_2, p_3)=0.639$.
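As a side check (this snippet is illustrative and not part of the original derivation), the case 1 conditions form a system of four equations in $p_1, p_2, p_3, \lambda$ that can be solved numerically, for example with SciPy's fsolve:

```python
# Solve the case 1 stationarity conditions of the water-filling example.
from scipy.optimize import fsolve

def kkt_case1(v):
    p1, p2, p3, lam = v
    return [
        0.9/(1 + 0.9*p1) - lam,    # dL/dp1 = 0 with theta_1 = 0
        0.8/(0.9 + 0.8*p2) - lam,  # dL/dp2 = 0 with theta_2 = 0
        0.7/(1 + 0.7*p3) - lam,    # dL/dp3 = 0 with theta_3 = 0
        p1 + p2 + p3 - 1,          # dL/dlambda = 0
    ]

p1, p2, p3, lam = fsolve(kkt_case1, [0.3, 0.3, 0.3, 0.5])
print(p1, p2, p3)   # roughly 0.444, 0.430, 0.126
```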

For case 2, we have $p_3=0$ from $\partial L/\partial\theta_3=0$. Further, using $p_2=1-p_1$ from $\partial L/\partial\lambda=0$, and
$$
\frac{0.9}{1 + 0.9p_1} = \frac{0.8}{0.9 + 0.8p_2} = \lambda
$$
from $\partial L/\partial p_1=\partial L/\partial p_2=0$, we can solve for $p_1=0.507$ and $p_2=0.493$. The objective function $f(p_1, p_2, p_3)=0.634$.

Similarly in case 3, $p_2=0$ and we solved for $p_1=0.659$ and $p_3=0.341$, with the objective function $f(p_1, p_2, p_3)=0.574$.

In case 4, we have $p_1=0$, $p_2=0.652$, $p_3=0.348$, and the objective function $f(p_1, p_2, p_3)=0.570$.

In case 5, we have $p_2=p_3=0$ and thus $p_1=1$. Hence the objective function is $f(p_1, p_2, p_3)=0.536$.

Similarly in case 6 and case 7, we have $p_2=1$ and $p_3=1$ respectively. The objective function attains 0.531 and 0.425 respectively.

Comparing all these cases, we find that the maximum value the objective function attains is in case 1. Hence the solution to this optimization problem is
$p_1=0.444$, $p_2=0.430$, $p_3=0.126$, with $f(p_1, p_2, p_3)=0.639$.
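The same answer can be cross-checked by handing the whole constrained problem to a numerical solver. Here is a rough sketch (the variable names and solver choice are this example's own, not part of the original formulation); it maximizes the capacity by minimizing its negative:

```python
# Numerically maximize the total channel capacity under the power budget.
import numpy as np
from scipy.optimize import minimize

gains = np.array([0.9, 0.8, 0.7])
noise = np.array([1.0, 0.9, 1.0])

def neg_capacity(p):
    # Negative of the (natural-log) objective used above
    return -np.sum(np.log(noise + gains*p))

result = minimize(
    neg_capacity,
    x0=np.ones(3)/3,                                  # start from an even split
    method="SLSQP",
    bounds=[(0, None)]*3,                             # p_i >= 0
    constraints=[{"type": "eq", "fun": lambda p: np.sum(p) - 1}],
)
print(result.x, -result.fun)   # expect roughly [0.444, 0.430, 0.126] and 0.639
```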

Extensions and Further Reading

While in the above examples we introduced the slack variables into the Lagrangian function, some books may prefer not to include the slack variables but instead to restrict the Lagrange multipliers for inequality constraints to be positive. In that case, you may see the Lagrangian function written as

$$
L(X, \lambda, \theta, \phi) = f(X) - \lambda g(X) - \theta h(X) + \phi k(X)
$$

but with the requirement that $\theta \ge 0$ and $\phi \ge 0$.

The Lagrangian function is also useful for applying the primal-dual approach to finding the maximum or minimum. This is particularly helpful if the objective or constraints are non-linear, in which case the solution may not be easily found.

Some books that cover this topic are:

Summary

In this tutorial, you discovered how the method of Lagrange multipliers can be applied to inequality constraints. Specifically, you learned:

  • Lagrange multipliers and the Lagrange function in the presence of inequality constraints
  • How to use KKT conditions to solve an optimization problem when inequality constraints are given