Convergence of Bregman Alternating Direction Method of Multipliers for Nonseparable Nonconvex Objective with Linear Constraints
1. Introduction
We consider the following two-block linearly constrained nonseparable nonconvex optimization problem:

(1) $\min_{x,y}\; f(x) + g(y) + H(x,y) \quad \text{s.t.}\quad Ax + By = b,$

where $f:\mathbb{R}^n\to(-\infty,+\infty]$ is a proper lower semicontinuous function, $g:\mathbb{R}^m\to\mathbb{R}$ is a continuously differentiable function, $H:\mathbb{R}^n\times\mathbb{R}^m\to\mathbb{R}$ is a smooth coupling function, $A$ and $B$ are given matrices, and $b$ is a given vector. Problem (1) arises in numerous applications, such as the control of smart grid systems [1] [2] [3], appliance load modeling [4], cognitive radio networks, and other related domains [5] [6].
A commonly employed method for solving this problem is the Alternating Direction Method of Multipliers (ADMM) [7], a popular and well-established approach.
In 2017, Guo et al. [7] studied the convergence of ADMM for solving problem (1); its iteration reads as follows:

(2) $\begin{cases} x^{k+1} \in \arg\min_{x}\, \mathcal{L}_{\beta}(x, y^{k}, \lambda^{k}), \\ y^{k+1} \in \arg\min_{y}\, \mathcal{L}_{\beta}(x^{k+1}, y, \lambda^{k}), \\ \lambda^{k+1} = \lambda^{k} + \beta\,(A x^{k+1} + B y^{k+1} - b). \end{cases}$

Here, $\mathcal{L}_{\beta}$ denotes the augmented Lagrangian function defined as

$\mathcal{L}_{\beta}(x,y,\lambda) = f(x) + g(y) + H(x,y) + \langle \lambda,\, Ax + By - b \rangle + \frac{\beta}{2}\|Ax + By - b\|^{2},$

where $\lambda$ is the Lagrange multiplier associated with the linear constraints and $\beta > 0$ is the penalty parameter.
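To make the scheme (2) concrete, the following small Python sketch organizes one run of the iteration. The specific choices of f, g and H, the data A, B, b, the penalty parameter beta, and the use of a generic derivative-free solver for the two subproblems are illustrative assumptions made only for this sketch; they are not prescribed by the analysis of this paper.

import numpy as np
from scipy.optimize import minimize

# Illustrative problem data (assumed for this sketch only).
rng = np.random.default_rng(0)
n, m, p = 5, 4, 3
A = rng.standard_normal((p, n))
B = rng.standard_normal((p, m))
b = rng.standard_normal(p)

f = lambda x: np.sum(np.abs(x))          # proper, lower semicontinuous part (here: l1 norm)
g = lambda y: 0.25 * np.sum(y ** 4)      # continuously differentiable part
H = lambda x, y: 0.5 * (x[:m] @ y)       # smooth coupling term (n >= m in this toy setup)
beta = 10.0                              # penalty parameter

def aug_lagrangian(x, y, lam):
    # f(x) + g(y) + H(x, y) + <lam, Ax + By - b> + (beta/2)||Ax + By - b||^2
    r = A @ x + B @ y - b
    return f(x) + g(y) + H(x, y) + lam @ r + 0.5 * beta * (r @ r)

x, y, lam = np.zeros(n), np.zeros(m), np.zeros(p)
for k in range(200):
    # x-subproblem: minimize the augmented Lagrangian in x
    x = minimize(lambda v: aug_lagrangian(v, y, lam), x, method="Nelder-Mead").x
    # y-subproblem: minimize the augmented Lagrangian in y
    y = minimize(lambda v: aug_lagrangian(x, v, lam), y, method="Nelder-Mead").x
    # dual update with step size beta
    lam = lam + beta * (A @ x + B @ y - b)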
Note that a critical condition in their convergence analysis is that the function $g$ is L-smooth. Indeed, for most algorithms the absence of convexity makes a smoothness condition a necessary ingredient of the convergence analysis. Unfortunately, the same holds true for ADMM [8] [9] [10] [11]. However, a multitude of applications are nonsmooth, and the prevalence of nonsmooth problems makes it necessary to consider how to relax the smoothness requirement in the nonconvex setting.
To alleviate the L-smoothness condition, Tan and Guo [12] established in 2023 the convergence of the following Bregman ADMM for the special case of problem (1) in which the coupling term $H \equiv 0$:

(3) $\min_{x,y}\; f(x) + g(y) \quad \text{s.t.}\quad Ax + By = b,$

where $f$ is a proper lower semicontinuous function, $g$ is a continuously differentiable function, $A$ and $B$ are matrices and $b$ is a vector. Problem (3) also covers a range of significant applications in diverse areas, such as image processing [13] [14] [15], statistical learning [16] [17] [18], engineering [19] [20] [21], and so on. Problem (1) reduces to (3) when the coupled function H is absent.
The iterative format of the Bregman ADMM is as follows:

(4)

Here, the augmented Lagrangian function $\mathcal{L}_{\beta}$ of problem (3) is defined as

$\mathcal{L}_{\beta}(x,y,\lambda) = f(x) + g(y) + \langle \lambda,\, Ax + By - b \rangle + \frac{\beta}{2}\|Ax + By - b\|^{2},$

where $\lambda$ is the Lagrange multiplier associated with the linear constraints and $\beta > 0$ is the penalty parameter. The above Bregman ADMM reduces to the classic ADMM for an appropriate choice of the Bregman kernel. However, their proof only established convergence for the separable problem (3), which contains no coupling term. The motivation of this paper is therefore to extend the algorithm to the nonseparable problem (1), which does contain a coupling term.
The iterative format of the Bregman ADMM for problem (1) is as follows:

(5)

Here, $\mathcal{L}_{\beta}$ is the augmented Lagrangian function of problem (1) defined above, $\lambda$ is the Lagrange multiplier associated with the linear constraints, and $\beta > 0$ is the penalty parameter.
Given our aim to relax the strict L-smoothness condition on $g$ to relative smoothness, we modify only the update of the $y$ variable, replacing the Euclidean distance in its iterative formulation by a Bregman distance; the iterative format of $x$ is the same as in the classical ADMM. The resulting scheme, which we call the new Bregman ADMM for problem (1), is as follows:

(6)

Here, $\mathcal{L}_{\beta}$ again denotes the augmented Lagrangian function of problem (1), $\lambda$ is the Lagrange multiplier associated with the linear constraints, and $\beta > 0$ is the penalty parameter.
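As a companion to the sketch of (2) given above, the following Python fragment shows how the modified y-subproblem of (6) could be organized. It reflects our reading of (6): only the y-update changes, with a Bregman proximal term D_phi(y, y^k) added to the augmented Lagrangian, while the x-subproblem and the dual update remain those of the classical ADMM. The kernel phi, the solver, and the helper names are illustrative assumptions, not the paper's prescription.

import numpy as np
from scipy.optimize import minimize

phi = lambda y: 0.25 * np.sum(y ** 4) + 0.5 * np.sum(y ** 2)   # assumed Bregman kernel
grad_phi = lambda y: y ** 3 + y

def bregman(u, v):
    # D_phi(u, v) = phi(u) - phi(v) - <grad phi(v), u - v>, cf. (7) below
    return phi(u) - phi(v) - grad_phi(v) @ (u - v)

def bregman_y_update(aug_lagrangian, x_new, y_old, lam):
    # y-subproblem of (6): augmented Lagrangian plus a Bregman proximal term
    obj = lambda v: aug_lagrangian(x_new, v, lam) + bregman(v, y_old)
    return minimize(obj, y_old, method="Nelder-Mead").x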
Building upon the aforementioned motivation, we focus on the Bregman ADMM applied to the nonseparable problem (1). For two-block nonseparable nonconvex optimization problems with linear constraints, we construct an appropriate Lyapunov function and leverage the Kurdyka-Lojasiewicz (KL) inequality to establish the global convergence of the Bregman ADMM. The paper is organized as follows. Section 2 summarizes the preliminaries needed for the analysis. Section 3 analyzes the convergence of the variant Bregman ADMM and its convergence rate. Finally, some conclusions are drawn in Section 4.
2. Preliminaries
In this section, we summarize some notation and preliminary results to be used in the subsequent analysis.
Definition 2.1. [22] For an extended real-valued function $f:\mathbb{R}^n\to(-\infty,+\infty]$, the effective domain, or just the domain, is the set $\operatorname{dom} f := \{x\in\mathbb{R}^n : f(x) < +\infty\}$.
Definition 2.2. [22] A function $f:\mathbb{R}^n\to(-\infty,+\infty]$ is called proper if there is at least one $x\in\mathbb{R}^n$ such that $f(x) < +\infty$.
Definition 2.3. [22] A function $f:\mathbb{R}^n\to(-\infty,+\infty]$ is called lower semicontinuous at $\bar{x}$ if $\liminf_{k\to\infty} f(x^{k}) \ge f(\bar{x})$ for any sequence $\{x^{k}\}$ for which $x^{k}\to\bar{x}$ as $k\to\infty$. Moreover, f is called lower semicontinuous if it is lower semicontinuous at each point in $\mathbb{R}^n$.
Definition 2.4. ( [23] kernel generating distance) Let C be a nonempty, convex, and open subset of $\mathbb{R}^n$. Associated with C, a function $h:\mathbb{R}^n\to(-\infty,+\infty]$ is called a kernel generating distance if it satisfies the following:
1) $h$ is proper, lower semicontinuous, and convex, with $\operatorname{dom} h \subset \overline{C}$ and $\operatorname{dom}\partial h = C$.
2) $h$ is continuously differentiable on $\operatorname{int}\operatorname{dom} h \equiv C$.
We denote the class of kernel generating distances by $\mathcal{G}(C)$.
Given $h\in\mathcal{G}(C)$, define the Bregman distance $D_h:\operatorname{dom} h \times \operatorname{int}\operatorname{dom} h \to [0,+\infty)$ by

(7) $D_h(x,y) := h(x) - h(y) - \langle \nabla h(y),\, x - y\rangle.$

Since h is convex, $D_h(x,y) \ge 0$, and $D_h(x,y) = 0$ only when $x = y$.
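Two standard kernels illustrate (7): the energy kernel gives the squared Euclidean distance, and the entropy kernel, taken on the positive orthant, gives the Kullback-Leibler divergence:

\[ h(x)=\tfrac{1}{2}\|x\|^{2} \;\Longrightarrow\; D_h(x,y)=\tfrac{1}{2}\|x-y\|^{2}, \qquad h(x)=\sum_{i} x_i\log x_i \;\Longrightarrow\; D_h(x,y)=\sum_{i}\Big(x_i\log\frac{x_i}{y_i}-x_i+y_i\Big). \]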
Lemma 2.1. [24] Let $h\in\mathcal{G}(C)$ be a kernel generating distance. The following properties follow directly from (7). Let $x\in\operatorname{dom} h$ and $y, z\in\operatorname{int}\operatorname{dom} h$; then
1) $D_h(x,y)\ge 0$.
2) The three-point identity holds:

(8) $D_h(x,z) = D_h(x,y) + D_h(y,z) + \langle \nabla h(y) - \nabla h(z),\, x - y\rangle.$
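The identity (8) can be verified directly from (7): expanding the two Bregman distances on the right-hand side gives

\[ D_h(x,y)+D_h(y,z)=h(x)-h(z)-\langle\nabla h(y),x-y\rangle-\langle\nabla h(z),y-z\rangle, \]

and subtracting this from $D_h(x,z)=h(x)-h(z)-\langle\nabla h(z),x-z\rangle$ leaves exactly $\langle\nabla h(y)-\nabla h(z),x-y\rangle$.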
In the subsequent analysis, we will use a pair of functions $(g,h)$ satisfying
1) $h\in\mathcal{G}(C)$.
2) $g$ is a proper and lower semicontinuous function with $\operatorname{dom} h\subset\operatorname{dom} g$, which is continuously differentiable on $C=\operatorname{int}\operatorname{dom} h$.
Definition 2.5. ( [23] L-smooth adaptable) A pair $(g,h)$ is called L-smooth adaptable on C if there exists $L>0$ such that $Lh - g$ and $Lh + g$ are convex on C.
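A simple one-dimensional example (ours, included only for illustration): the pair $g(x)=x^{4}/4$ with kernel $h(x)=x^{4}/4+x^{2}/2$ is 1-smooth adaptable on $\mathbb{R}$ even though $g$ is not globally L-smooth, since

\[ Lh-g=\tfrac{L-1}{4}x^{4}+\tfrac{L}{2}x^{2} \quad\text{and}\quad Lh+g=\tfrac{L+1}{4}x^{4}+\tfrac{L}{2}x^{2} \]

are both convex for $L=1$, while $g''(x)=3x^{2}$ is unbounded, so $\nabla g$ admits no global Lipschitz constant.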
Remark 2.1. Since $Lh - g$ and $Lh + g$ are convex, their gradients are monotone, and together with the Cauchy-Schwarz inequality we obtain

$\langle \nabla g(x)-\nabla g(y),\, x-y\rangle \le L\,\|\nabla h(x)-\nabla h(y)\|\,\|x-y\|$ and $-\langle \nabla g(x)-\nabla g(y),\, x-y\rangle \le L\,\|\nabla h(x)-\nabla h(y)\|\,\|x-y\|.$

If we assume

(9) $\|\nabla g(x)-\nabla g(y)\| \le L\,\|\nabla h(x)-\nabla h(y)\|$ for all $x, y\in C$,

then by the Cauchy-Schwarz inequality we immediately obtain the above two inequalities. Thus, (9) provides a Lipschitz-like gradient property of g with respect to h.
Remark 2.2. Definition 2.5 naturally complements and extends the definition of “A Lipschitz-like/Convexity Condition” in [22], which allows us to obtain the following two-sided descent lemma.
Lemma 2.2. ( [23] extended descent lemma) The pair of functions $(g,h)$ is L-smooth adaptable on C if and only if

(10) $|g(x) - g(y) - \langle \nabla g(y),\, x - y\rangle| \le L\, D_h(x,y)$ for all $x, y\in\operatorname{int}\operatorname{dom} h$.

In particular, when the set $C=\mathbb{R}^n$ and $h=\frac{1}{2}\|\cdot\|^{2}$, (10) reduces to the classical descent lemma for a function g with an L-Lipschitz gradient on $\mathbb{R}^n$, i.e.,

$|g(x) - g(y) - \langle \nabla g(y),\, x - y\rangle| \le \frac{L}{2}\|x - y\|^{2}.$
Definition 2.6. [23] Let $f:\mathbb{R}^n\to(-\infty,+\infty]$ be a proper lower semicontinuous function.
1) The Fréchet subdifferential, or regular subdifferential, of f at $x\in\operatorname{dom} f$, written $\hat{\partial} f(x)$, is the set of vectors $x^{*}\in\mathbb{R}^n$ that satisfy

$\liminf_{y\to x,\, y\neq x} \frac{f(y) - f(x) - \langle x^{*},\, y - x\rangle}{\|y - x\|} \ge 0.$

When $x\notin\operatorname{dom} f$, we set $\hat{\partial} f(x) = \emptyset$.
2) The limiting-subdifferential, or simply the subdifferential, of f at $x\in\operatorname{dom} f$, written $\partial f(x)$, is defined as follows:

$\partial f(x) := \{x^{*}\in\mathbb{R}^n : \exists\, x^{k}\to x,\ f(x^{k})\to f(x),\ x^{*,k}\in\hat{\partial} f(x^{k}),\ x^{*,k}\to x^{*}\}.$
Remark 2.3. From the above definition, we note that
1) The above definition implies $\hat{\partial} f(x) \subset \partial f(x)$ for each $x\in\mathbb{R}^n$, where the first set is closed and convex while the second one is only closed.
2) Let $(x^{k}, x^{*,k})$ with $x^{*,k}\in\partial f(x^{k})$ be a sequence that converges to $(x, x^{*})$. By the definition of $\partial f(x)$, if $f(x^{k})$ converges to $f(x)$ as $k\to\infty$, then $x^{*}\in\partial f(x)$.
3) A necessary condition for $x\in\mathbb{R}^n$ to be a minimizer of $f$ is

(11) $0\in\partial f(x).$

4) If $f$ is a proper lower semicontinuous function and $g$ is continuously differentiable, then $\partial(f+g)(x) = \partial f(x) + \nabla g(x)$ for any $x\in\operatorname{dom} f$.
A point satisfying Equation (11) is called a critical point or a stationary point. The set of critical points of f is denoted by crit $f$.
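As a standard one-dimensional illustration (not taken from the references), consider $f(x)=|x|$ on $\mathbb{R}$:

\[ \partial f(x)=\hat{\partial} f(x)= \begin{cases} \{1\}, & x>0,\\ [-1,1], & x=0,\\ \{-1\}, & x<0, \end{cases} \qquad\text{so that}\quad 0\in\partial f(0)\ \text{and}\ \operatorname{crit} f=\{0\}. \]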
Now, we recall an important property of subdifferential calculus.
Lemma 2.3. [25] Suppose that $F(x,y) = F_{1}(x) + F_{2}(y)$, where $F_{1}$ and $F_{2}$ are proper lower semicontinuous functions. Then for all $(x,y)\in\operatorname{dom} F = \operatorname{dom} F_{1}\times\operatorname{dom} F_{2}$, we have

$\partial F(x,y) = \partial F_{1}(x) \times \partial F_{2}(y).$
Definition 2.7. ( [25] Kurdyka-Lojasiewicz inequality) Let $f:\mathbb{R}^n\to(-\infty,+\infty]$ be a proper lower semicontinuous function. For $-\infty < \eta_{1} < \eta_{2} \le +\infty$, set

$[\eta_{1} < f < \eta_{2}] := \{x\in\mathbb{R}^n : \eta_{1} < f(x) < \eta_{2}\}.$

We say that the function f has the KL property at $\bar{x}\in\operatorname{dom}\partial f$ if there exist $\eta\in(0,+\infty]$, a neighbourhood U of $\bar{x}$, and a continuous concave function $\varphi:[0,\eta)\to[0,+\infty)$, such that
1) $\varphi(0)=0$;
2) $\varphi$ is continuously differentiable on $(0,\eta)$ and continuous at 0;
3) $\varphi'(s)>0$ for all $s\in(0,\eta)$;
4) for all $x$ in $U\cap[f(\bar{x}) < f < f(\bar{x})+\eta]$, the Kurdyka-Lojasiewicz inequality holds:

$\varphi'\big(f(x)-f(\bar{x})\big)\,\operatorname{dist}\big(0,\partial f(x)\big) \ge 1.$
Definition 2.8. ( [26] Kurdyka-Lojasiewicz function) Denote by $\Phi_{\eta}$ the set of functions that satisfy Definition 2.7 1)-3). If f satisfies the KL property at each point of $\operatorname{dom}\partial f$, then f is called a KL function.
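A typical desingularizing function in Definition 2.7 is $\varphi(s)=c\,s^{1-\theta}$ with $c>0$ and $\theta\in[0,1)$ (this particular choice is ours, for illustration); with it, the KL inequality at $\bar{x}$ takes the form

\[ \operatorname{dist}\big(0,\partial f(x)\big)\ \ge\ \frac{1}{c(1-\theta)}\,\big(f(x)-f(\bar{x})\big)^{\theta}. \]

Semialgebraic functions, such as polynomials, $\|\cdot\|_{1}$ and indicator functions of polyhedral sets, are known to satisfy the KL property with $\varphi$ of this form.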
Lemma 2.4. ( [27] Uniformized KL property) Let $\Omega$ be a compact set and let $f:\mathbb{R}^n\to(-\infty,+\infty]$ be a proper and lower semicontinuous function. Assume that f is constant on $\Omega$ and satisfies the KL property at each point of $\Omega$. Then, there exist $\varepsilon>0$, $\eta>0$, and $\varphi\in\Phi_{\eta}$ such that for all $\bar{x}\in\Omega$ and for all x in the following intersection:

$\{x\in\mathbb{R}^n : \operatorname{dist}(x,\Omega) < \varepsilon\}\cap[f(\bar{x}) < f < f(\bar{x})+\eta],$

one has

$\varphi'\big(f(x)-f(\bar{x})\big)\,\operatorname{dist}\big(0,\partial f(x)\big) \ge 1.$
Definition 2.9. We say that $(x^{*}, y^{*}, \lambda^{*})$ is a critical point of the augmented Lagrangian function with Bregman distance if it satisfies

$0\in\partial_{x}\mathcal{L}_{\beta}(x^{*},y^{*},\lambda^{*}),\qquad 0=\nabla_{y}\mathcal{L}_{\beta}(x^{*},y^{*},\lambda^{*}),\qquad 0=\nabla_{\lambda}\mathcal{L}_{\beta}(x^{*},y^{*},\lambda^{*}).$
Lemma 2.5. [28] Let $g:\mathbb{R}^n\to\mathbb{R}$ be a continuously differentiable function whose gradient $\nabla g$ is Lipschitz continuous with constant $L>0$. Then for any $x, y\in\mathbb{R}^n$, we have

$g(x) \le g(y) + \langle \nabla g(y),\, x - y\rangle + \frac{L}{2}\|x - y\|^{2}.$
Definition 2.10. A proper lower semicontinuous function $g:\mathbb{R}^n\to(-\infty,+\infty]$ is called semi-convex with constant $\omega\ge 0$ if the function $g(\cdot)+\frac{\omega}{2}\|\cdot\|^{2}$ is convex. In particular, if $\omega=0$, then g is convex.
Remark 2.4. Definition 2.10 is equivalent to

$g(y) \ge g(x) + \langle \xi,\, y - x\rangle - \frac{\omega}{2}\|y - x\|^{2}$

for all $\xi\in\partial g(x)$ and all $x, y\in\mathbb{R}^n$.
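For instance (an illustrative example, not from the references), the nonconvex function $g(x)=|x|-x^{2}$ on $\mathbb{R}$ is semi-convex with constant $\omega=2$, since

\[ g(x)+\tfrac{\omega}{2}x^{2}=|x| \]

is convex for $\omega=2$, while $g$ itself is not convex.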
3. Convergence Analysis
We commence our analysis by examining the optimality conditions. Invoking the optimality conditions for the scheme (6), we obtain
(12)
By utilizing the previous equation and reorganizing terms, we derive
(13)
In what follows, we assume:
Assumption 3.1. Let $f$ be a semiconvex function with constant $\omega$, let the pair $(g,h)$ be L-smooth adaptable on $\operatorname{dom} h$, where $h$ is smooth on the interior of $\operatorname{dom} h$, strongly convex, and $\nabla h$ is Lipschitz continuous on any bounded subset of its domain, and let $H$ be a smooth function. Assume the following holds:
1)
,
,
;
2) For any fixed x, the partial gradient $\nabla_{y} H(x,\cdot)$ is globally Lipschitz with constant $L_{1}>0$, that is,

$\|\nabla_{y} H(x,y_{1})-\nabla_{y} H(x,y_{2})\|\le L_{1}\|y_{1}-y_{2}\|$ for all $y_{1}, y_{2}$.

For any fixed y, the partial gradient $\nabla_{x} H(\cdot,y)$ is globally Lipschitz with constant $L_{2}>0$, that is,

$\|\nabla_{x} H(x_{1},y)-\nabla_{x} H(x_{2},y)\|\le L_{2}\|x_{1}-x_{2}\|$ for all $x_{1}, x_{2}$.
3) There exists
such that
4) $\nabla H$ is Lipschitz continuous on bounded subsets of $\mathbb{R}^n\times\mathbb{R}^m$. In other words, for each bounded subset $B_{1}\times B_{2}\subset\mathbb{R}^n\times\mathbb{R}^m$, there exists $M>0$ such that for all $(x_{i},y_{i})\in B_{1}\times B_{2}$, $i=1,2$:

$\|\nabla H(x_{1},y_{1})-\nabla H(x_{2},y_{2})\|\le M\,\|(x_{1}-x_{2},\,y_{1}-y_{2})\|.$
5)
for some
.
6)
For the sake of convenience, in the following analysis, we frequently employ the notation
and
. We commence our analysis with the subsequent lemma.
Lemma 3.1. Let $\{(x^{k}, y^{k}, \lambda^{k})\}$ be the sequence generated by the Bregman ADMM (6), which is assumed to be bounded. Then we have
(14)
where
Proof.
By considering the definition
, we obtain
(15)
By applying the fact that h is strongly convex, we get
(16)
Using the Cauchy-Schwarz inequality, we have
(17)
Combining (16) and (17), we can conclude that
(18)
Based on the iterative scheme of the algorithm, we can infer the following
Therefore,
(19)
By combining Equations (18) and (19), we can infer that
(20)
Substituting the previous inequality into Equation (15), which yields
(21)
As defined by
, we have
(22)
Assumption 3.1 states that the pair $(g,h)$ is L-smooth adaptable, which implies
(23)
For any fixed $x$, the partial gradient $\nabla_{y} H(x,\cdot)$ is globally Lipschitz with constant $L_{1}$, which can be inferred from Assumption 3.1 2). Further applying Lemma 2.5 yields
(24)
By three points identity (8), we obtain,
(25)
The above two equations use that
Inserting the above three Equations (24), (25), (23) into (22) yields
(26)
From the optimality condition (13), we get
(27)
From the condition that h is strongly convex and smooth, we get
(28)
Substituting the aforementioned inequality into (27) reveals
(29)
By the definition of
, we obtain
(30)
From the Assumption 3.1 that the partial gradient
is globally Lipschitz with constant
, which implies
By Lemma 2.5, we know
(31)
Besides, $f$ is semiconvex with parameter $\omega$, which by Remark 2.4 implies the following:
for all
and all
.
From the optimality condition (13), we know
Therefore,
(32)
Inserting Equation (32), (31) into (30) yields
(33)
From the three-point identity (8), we get
(34)
Using (28), we know
(35)
Substituting the above two Equations (34) and (35) into (33), we obtain
(36)
By the fact that
is Lipschitz continuous with
on any bounded subset of
, we get
(37)
Using Equations (21), (29) and (36), we obtain
(38)
It is not difficult to see
(39)
Using the condition that
, which yields
(40)
From the assumption that
for some
, we obtain
(41)
Substituting (40), (41) and (20) into (38) yields
(42)
From the optimality condition (13), we have
(43)
Then,
(44)
Using Assumption 3.1 4),
We can obtain
which implies
(45)
Next, we consider two different scenarios:
1)
,
From the given assumption that $(g,h)$ is L-smooth adaptable and Equation (9), it follows that
, for all
,
then,
(46)
2)
, it is easy to verify that (46) holds.
By substituting Equations (45) and (46) into (44), we obtain
(47)
Combining (47) and (42), we have
(48)
By simple calculation, when
where
(49)
(50)
then
where
Thus, the lemma is established.
Remark 3.1. If
, then we have
. Moreover, by
, we have
. This value is larger than the bound 2L established in Guo et al.'s classical result [8]. However, if we directly solve the separable problem, our analysis covers their conclusion. The reason for our weaker conclusion may be that the inequality bounds in our proof are not tight enough.
Lemma 3.2. Let $\{(x^{k}, y^{k}, \lambda^{k})\}$ be the sequence generated by the Bregman ADMM (6), which is assumed to be bounded. Then the following holds
(51)
Proof.
Since
is bounded, then there exists a subsequence
such that
. By the condition that f is lower semi-continuous and g, H are continuous, we obtain
is semicontinuous, which implies
Consequently,
is bounded from below.
Note that, Lemma 3.1 implies that
is nonincreasing and thus
is convergent. Moreover, we have
is convergent and
. Rearranging terms of (14) leads to
Now, summing the above inequality over
yields
Since
, we have
, thus
(52)
Moreover, it follows from (52) and (47) that
, therefore, we obtain
.
Lemma 3.3. Let $\{(x^{k}, y^{k}, \lambda^{k})\}$ be the sequence generated by the Bregman ADMM (6), which is assumed to be bounded. Then there exists
such that
(53)
Proof.
By the definition of function
, it follows
(54)
This, together with Equation (13), yields
(55)
Let
(56)
Then it follows from Lemma 2.3 that
. Furthermore, there exist
such that
(57)
Again, since
is Lipschitz continuous on bounded subsets and
is bounded, it follows
(58)
From (47), we know
(59)
By setting
, then it follows from the above three Equations (57), (58) and (59) that
(60)
where the third inequality follows from the Cauchy inequality. Thus, Lemma 3.3 is proved.
Lemma 3.4. Let $\{(x^{k}, y^{k}, \lambda^{k})\}$ be the sequence generated by the Bregman ADMM (6), which is assumed to be bounded, and let $\Omega$ denote the set of its limit points. Then,
1)
is a nonempty compact set, and
, as
;
2)
;
3)
is finite and constant on
, equal to
.
Proof.
1) This item follows as an elementary consequence of the definition of limit points.
2) For any fixed
, there exists a subsequence
of
converges to
.
Since
is the global minimizer of
for the variable
, it holds
Using the above inequality and the continuity of
with respect to
and
ensures
On the other hand, Lemma 3.3 implies
, which means that the subsequence
also converges to
.
From the lower semicontinuity of
, we have
(61)
Then, by combining the above two equations, we get
(62)
which implies
(63)
Passing to the limit in (13) and (54) along the subsequence
and invoking (63), the continuity of
,
,
, it follows that
The last equation implies that
due to the strong convexity of
.
By the definition of
, we have
(64)
By the definition of a critical point of the augmented Lagrangian function (Definition 2.9), if it satisfies:
Then,
is a critical point of
, i.e.,
, and
.
3) For any point
, there exists a subsequence
converging to
.
Since
is nonincreasing, combining (61) and (62) leads to
(65)
Therefore,
is constant on
. Moreover,
.
Next, we will prove the convergence of the iterative sequence.
Theorem 3.1. Let $\{(x^{k}, y^{k}, \lambda^{k})\}$ be the sequence generated by the Bregman ADMM (6), which is assumed to be bounded. Suppose that
is a KL function, then
has finite length, that is
and, as a consequence, the whole sequence converges to a critical point of
.
Proof.
From the proof of Lemma 3.4, it follows that
for all
. We consider two cases:
1) Suppose there exists an integer
for which
. Rearranging terms of (14) we have that for any
,
implying that
for any
.
Associated with (47), for any
, it follows that
and the assertion holds.
2) Now, assume
for all
. We claim there exists
such that for all
,
(66)
where
. To see this, note that
and
, then for all
, there exists
such that when
, we have
Since
is a nonempty compact set and
is constant on
, applying Lemma 2.4 with
, we deduce that for any
Since
, making use of the concavity of
we get that
Thus, using the inequalities
and
, we obtain
Combining Lemma 3.1 with the above relation gives (66), as desired. Moreover, (66) implies
Notice that
, for all
. Then we obtain
Summing the inequality above over
yields
Since
, rearranging terms and letting
lead to
which implies
Thus, it follows that
These, together with (47), yield
Moreover, note that
Therefore,
and
is a Cauchy sequence (see P.482 for a simple proof), which shows that the sequence generated by our algorithm converges. The assertion then follows immediately from Lemma 3.4.
We now establish the convergence rate for the Bregman ADMM (6). The proof of the following result is similar to that in [8] and is hence omitted here.
Theorem 3.2. Let $\{(x^{k}, y^{k}, \lambda^{k})\}$ be the sequence generated by the Bregman ADMM (6), which converges to
. Assume that
has the KL property at
with
,
,
. Then the following estimations hold:
1) If
, then the sequence
converges in a finite number of steps.
2) If
, then there exist
and
, such that
3) If
, then there exists
, such that
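For orientation, in analogous KL-based analyses (see, e.g., [8]), the estimates in cases 2) and 3) typically take the form

\[ \|w^{k}-w^{*}\|\le c\,\tau^{k}\ \ \text{for}\ \theta\in(0,\tfrac{1}{2}], \qquad \|w^{k}-w^{*}\|\le c\,k^{-\frac{1-\theta}{2\theta-1}}\ \ \text{for}\ \theta\in(\tfrac{1}{2},1), \]

where $w^{k}=(x^{k},y^{k},\lambda^{k})$ denotes the iterate, $w^{*}$ its limit, and $c>0$, $\tau\in[0,1)$ are constants; the notation $w^{k}$ is introduced here only for this remark.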
4. Conclusions
In this paper, we conducted an analysis of the convergence properties of the Bregman Alternating Direction Method of Multipliers (ADMM) for addressing linearly constrained nonconvex minimization problems with coupled objective functions. Our study operated under the assumption that the associated functions satisfy the Kurdyka-Lojasiewicz inequality. We successfully demonstrated that the iterative sequence generated by the Bregman ADMM converges to a critical point of the augmented Lagrangian function, given that the penalty parameter in the augmented Lagrangian surpasses a certain threshold. Additionally, with further conditions on the problem’s data, we established the convergence rate of the algorithm. Looking forward, extending Algorithm (6) to handle multi-block non-separable linearly constrained nonconvex optimization problems and incorporating generalized ADMM represent meaningful avenues for future exploration.
Funding
The authors were supported by the Natural Science Foundation of China (Grant No. 11801455), the Sichuan Science and Technology Program (Grant No. 2023NSFSC1922), the Innovation Team Funds of China West Normal University (Grant No. KCXTD2023-3), and the Fundamental Research Funds of China West Normal University (Grant No. 23kc010).