1. Introduction
In this paper, we consider the following unconstrained optimization problem:

min_{x∈ℝ^n} f(x),  (1.1)

where f : ℝ^n → ℝ is a real-valued twice continuously differentiable function.
The trust region method is an effective iterative method for solving problem (1.1). In 1944, Levenberg [1] first proposed a trust region method, in which a modified Gauss-Newton method is given for solving nonlinear least squares problems. Pioneering research on trust region methods was carried out by Powell [2] [3] [4] [5], Fletcher [6], Hebden [7], Madsen [8], Osborne [9], Moré [10], Toint [11] [12] [13] [14] [15], Dennis and Mei [16], Sorensen [17] [18], and Steihaug [19]. Moré [20] gave a thorough survey of early work on trust region methods, which promoted trust region methods and standardized the term "trust region". The modern version of trust region methods can be traced back to Powell [4]. Later, Conn [21] applied modern trust region methods to ill-conditioned problems and proved strong global convergence. In order to minimize f, the trust region method constructs, at the current iterate x_k, a trust region subproblem to compute a trial step d_k through the following quadratic approximation:

min_{d∈ℝ^n} m_k(d) = f(x_k) + g_k^T d + (1/2) d^T B_k d,  s.t. ‖d‖ ≤ Δ_k,  (1.2)

where g_k = ∇f(x_k) is the gradient of the objective function at the current iteration point, the approximation B_k of the Hessian matrix ∇²f(x_k) is an n × n symmetric matrix, ‖·‖ denotes the Euclidean norm and Δ_k is a positive parameter which is called the trust region radius.
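The subproblem (1.2) need not be solved exactly in practice; a common inexpensive surrogate is the Cauchy point, the minimizer of the quadratic model along the steepest descent direction inside the trust region ball. The following is a minimal pure-Python sketch; the two-dimensional quadratic data at the bottom are illustrative, not taken from this paper.

```python
# Cauchy point for the trust region subproblem (1.2):
# minimize m(d) = f + g^T d + 0.5 d^T B d  subject to ||d|| <= Delta.

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def matvec(B, v):
    return [dot(row, v) for row in B]

def cauchy_point(g, B, Delta):
    """Minimizer of the quadratic model along -g within the ball of radius Delta."""
    gnorm = dot(g, g) ** 0.5
    gBg = dot(g, matvec(B, g))
    if gBg <= 0:                      # nonpositive curvature: go to the boundary
        tau = 1.0
    else:                             # unconstrained minimizer along -g, capped at 1
        tau = min(1.0, gnorm ** 3 / (Delta * gBg))
    return [-tau * Delta / gnorm * gi for gi in g]

g = [4.0, -2.0]                       # illustrative gradient
B = [[2.0, 0.0], [0.0, 1.0]]          # illustrative symmetric positive definite B_k
d = cauchy_point(g, B, Delta=1.0)
assert dot(d, d) ** 0.5 <= 1.0 + 1e-12   # trial step is feasible
assert dot(g, d) < 0.0                   # descent direction for the model
```

The Cauchy point already yields the model decrease needed in standard convergence proofs; more accurate subproblem solvers (dogleg, truncated CG as in Steihaug [19]) only improve on it.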
At each iteration, the strategy for choosing the trust region radius Δ_k is crucial. In the standard trust region method, the following ratio is defined to compare the objective function with the model:

ρ_k = (f(x_k) − f(x_k + d_k)) / (m_k(0) − m_k(d_k)).  (1.3)

If the ratio ρ_k is close to 1, it is concluded that there is good agreement between the objective function and the model over this step, so it is safe to increase the trust region radius for the next iteration. Otherwise, if ρ_k is close to 0 or even negative, we must shrink the trust region radius.
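In code, this ratio-driven radius update takes roughly the following form; the thresholds 0.25/0.75 and the scaling factors are conventional textbook choices, not values prescribed by this paper.

```python
def update_radius(rho, Delta, step_norm, Delta_max=100.0):
    """Conventional trust region radius update driven by the ratio rho of (1.3).

    Thresholds and factors are common illustrative choices, not from this paper."""
    if rho < 0.25:                    # poor agreement: shrink the region
        return 0.25 * Delta
    if rho > 0.75 and abs(step_norm - Delta) < 1e-12:
        return min(2.0 * Delta, Delta_max)   # good agreement on the boundary: expand
    return Delta                      # otherwise keep the radius unchanged

assert update_radius(0.1, 1.0, 0.5) == 0.25   # shrink on poor agreement
assert update_radius(0.9, 1.0, 1.0) == 2.0    # expand on strong agreement
assert update_radius(0.5, 1.0, 0.5) == 1.0    # keep otherwise
```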
The strategies for determining and updating the trust region radius affect both the computational cost and the convergence of the algorithm, and many researchers have paid close attention to them [22] - [27]. In 1997, Sartenaer [22] proposed a new approach that determines an initial radius by monitoring the agreement between the model and the objective function along the direction computed at the starting point; however, the parameters of this procedure may depend on the problem to be solved. In 2005, Gould et al. [28] examined the sensitivity of trust-region algorithms to the parameters related to step acceptance and the update of the trust region, although they did not discuss the initial trust-region radius. Motivated by a problem in neural networks, in 2002, Zhang et al. [26] proposed a strategy to determine the trust region radius. Specifically, they solved the subproblem (1.2) with

Δ_k = c^p ‖g_k‖ ‖B̂_k^{−1}‖,

where c ∈ (0, 1), p is a nonnegative integer and B̂_k = B_k + E_k is a positive definite matrix based on a modified Cholesky factorization from Schnabel and Eskow [29]. This radius requires estimating ‖B̂_k^{−1}‖ at each iteration, which makes it unsuitable for large-scale optimization problems. Inspired by Zhang's method, Cui and Wu [30] proposed a new adaptive trust region method which updates the trust region radius automatically during the computation; its adaptive radius avoids the repeated estimation of ‖B̂_k^{−1}‖, so the trust region no longer depends on the inverse information B̂_k^{−1}. In 2004, Shi and Wang [27] proposed an adaptive radius given by
Δ_k = c^p ‖g_k‖³ / (g_k^T B̂_k g_k),

where c ∈ (0, 1), p is a nonnegative integer and B̂_k is a positive definite matrix. More recently, Shi and Guo [31] proposed an adaptive trust region radius

Δ_k = −c^p (g_k^T q_k / q_k^T B̂_k q_k) ‖q_k‖,  (1.4)

where the vector q_k satisfies the angle condition:

−g_k^T q_k ≥ τ ‖g_k‖ ‖q_k‖,  (1.5)

where τ ∈ (0, 1]. Theoretical analysis shows that the proposed trust region method has global convergence to first-order critical points, and preliminary numerical results show that the proposed method is effective for solving medium-scale unconstrained optimization problems. Kamandi et al. [32] gave an improved version of the trust-region radius (1.4). They proposed a modification of q_k:

q_k = d_k, if −g_k^T d_k ≥ τ ‖g_k‖ ‖d_k‖;  q_k = −g_k, otherwise,  (1.6)

where d_k is a solution of the subproblem (1.2) which is already available. It is straightforward that q_k satisfies the condition (1.5). To avoid a very small trust region radius, the following formula is defined:

Δ̂_k = max{ −(g_k^T q_k / q_k^T B̂_k q_k) ‖q_k‖, σ ‖g_k‖ },  (1.7)

where σ > 0, and q_k is determined so that it satisfies (1.5). Then, the trust region radius is updated by

Δ_k = c^p Δ̂_k,  (1.8)

where c ∈ (0, 1) is a real-valued constant and p is a nonnegative integer.
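As a concrete illustration of this family of adaptive radii, the sketch below computes a radius proportional to −(g_k^T q_k / q_k^T B̂_k q_k) ‖q_k‖, falling back to q_k = −g_k when the angle condition fails. The fallback rule and all constants are our illustrative choices, not the exact schemes of [31] [32].

```python
def adaptive_radius(g, B, q, c=0.5, p=0, tau=0.1):
    """Adaptive radius of Shi-Guo type: Delta = -c**p * (g^T q / q^T B q) * ||q||.

    Falls back to q = -g when q violates the angle condition
    -g^T q >= tau * ||g|| * ||q||.  Constants are illustrative, not from the paper."""
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))
    gnorm = dot(g, g) ** 0.5
    qnorm = dot(q, q) ** 0.5
    if -dot(g, q) < tau * gnorm * qnorm:      # angle condition fails
        q = [-gi for gi in g]                 # safeguarded choice q = -g
        qnorm = gnorm
    qBq = dot(q, [dot(row, q) for row in B])  # q^T B q, positive for SPD B
    return -c ** p * dot(g, q) / qBq * qnorm

g = [1.0, 0.0]
B = [[2.0, 0.0], [0.0, 1.0]]
Delta = adaptive_radius(g, B, q=[1.0, 0.0])   # q violates the angle condition here
assert Delta > 0.0                            # falls back to -g, radius stays positive
```

Note that with q_k = −g_k the radius reduces to c^p ‖g_k‖³ / (g_k^T B̂_k g_k), the earlier adaptive choice, so the fallback is consistent with the family.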
Due to the high efficiency of nonmonotone techniques, many researchers use them within the trust region algorithm framework. In 1986, Grippo et al. [33] put forward the nonmonotone line search technique for the first time. The stepsize α_k satisfies the condition

f(x_k + α_k d_k) ≤ f_{l(k)} + δ α_k g_k^T d_k,  (1.9)

where δ ∈ (0, 1) and the nonmonotone term f_{l(k)} is defined by

f_{l(k)} = max_{0 ≤ j ≤ m(k)} { f(x_{k−j}) },  (1.10)

where m(0) = 0, 0 ≤ m(k) ≤ min{ m(k−1) + 1, N } for k ≥ 1, and N is a given nonnegative integer. This technique led to a breakthrough in nonmonotone algorithms for nonlinear optimization. Based on the nonmonotone technique proposed by Grippo et al., Ke and Han [34], Sun [35], and Fu and Sun [36] presented various nonmonotone trust region methods. In 2004, Zhang and Hager [37] found that the nonmonotone technique (1.10) has some drawbacks: the numerical performance depends heavily on the choice of the parameter N; a good function value generated at some iteration may be discarded; and for any given bound N on the memory, even when an iterative method generates R-linearly convergent iterates for a strongly convex function, the iterates may not satisfy the condition (1.9) for k sufficiently large [38]. In order to cope with these disadvantages, Zhang and Hager [37] proposed another nonmonotone technique in which the stepsize α_k satisfies the following condition:

f(x_k + α_k d_k) ≤ C_k + δ α_k g_k^T d_k,

where

C_{k+1} = (η_k Q_k C_k + f(x_{k+1})) / Q_{k+1},  (1.11)

Q_{k+1} = η_k Q_k + 1,  (1.12)

with C_0 = f(x_0), Q_0 = 1 and η_k ∈ [η_min, η_max]; η_min ∈ [0, 1) and η_max ∈ [η_min, 1] are two prefixed constants. Inspired by this nonmonotone technique, in 2006, Mo et al. [39] introduced it into the trust region method and developed a nonmonotone algorithm; the numerical results indicate that the algorithm is robust and encouraging. In 2019, Xue et al. [40] proposed a new improved nonmonotone adaptive trust region method for solving unconstrained optimization problems, and showed theoretically that the algorithm possesses global convergence and superlinear convergence under classical assumptions.
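The two nonmonotone reference values can be contrasted in a few lines of code: Grippo's term is a sliding-window maximum of past function values, while the Zhang-Hager term is a weighted average maintained by the recursion Q_{k+1} = η_k Q_k + 1, C_{k+1} = (η_k Q_k C_k + f(x_{k+1})) / Q_{k+1}. The function-value sequence and the fixed η_k = 0.5 below are our illustrative assumptions.

```python
def grippo_reference(f_history, N):
    """f_{l(k)}: the maximum of the last min(k, N) + 1 function values."""
    return max(f_history[-(N + 1):])

def zhang_hager_update(C, Q, f_new, eta=0.5):
    """One step of the Zhang-Hager recursion: Q' = eta*Q + 1,
    C' = (eta*Q*C + f_new) / Q'.  eta is fixed here for illustration."""
    Q_new = eta * Q + 1.0
    C_new = (eta * Q * C + f_new) / Q_new
    return C_new, Q_new

fs = [10.0, 6.0, 8.0, 5.0]            # illustrative, nonmonotone function values
C, Q = fs[0], 1.0                     # C_0 = f(x_0), Q_0 = 1
for f in fs[1:]:
    C, Q = zhang_hager_update(C, Q, f)

assert grippo_reference(fs, N=2) == 8.0      # max over the last 3 values
assert min(fs) <= C <= max(fs)               # C_k is a convex combination of past f's
```

The comparison makes the drawback above tangible: the sliding maximum forgets a good value (here 5.0 is dominated by 8.0), while the weighted average retains the influence of every past value.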
Among the existing nonmonotone strategies, Gu and Mo [41] proposed a simpler nonmonotone technique whose computational cost is greatly reduced. Therefore, based on the methods in [40] and [41], we propose a new improved nonmonotone adaptive trust region method for solving unconstrained optimization problems. Under appropriate conditions, we analyze the global convergence and superlinear convergence of the algorithm.
2. The Structure of the New Algorithm
In this section, we describe our algorithm for solving unconstrained optimization problems in detail. As we have seen, the nonmonotone technique proposed by Zhang and Hager [37] implies that each C_k is a convex combination of the previous function values f(x_0), ..., f(x_k), and it involves the auxiliary quantities C_k and Q_k. In practice, it becomes an encumbrance to update C_k and Q_k at each iteration. Therefore, Gu and Mo [41] proposed another nonmonotone technique in which the nonmonotone term is revised as:

D_k = f(x_k), if k = 0;  D_k = η_{k−1} D_{k−1} + (1 − η_{k−1}) f(x_k), if k ≥ 1,  (2.1)

where η_{k−1} ∈ [η_min, η_max]; η_min ∈ [0, 1) and η_max ∈ [η_min, 1) are two prefixed constants.
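This recursion needs only a single scalar per iteration. A small sketch, with a fixed η = 0.5 as an illustrative choice:

```python
def update_D(D_prev, f_new, eta=0.5):
    """Gu-Mo nonmonotone term: D_k = eta * D_{k-1} + (1 - eta) * f(x_k).

    eta is fixed here for illustration; the method allows eta_k in [eta_min, eta_max)."""
    return eta * D_prev + (1.0 - eta) * f_new

fs = [10.0, 6.0, 8.0, 5.0]            # illustrative function values
D = fs[0]                             # D_0 = f(x_0)
for f in fs[1:]:
    D = update_D(D, f)
    assert min(fs) <= D <= max(fs)    # D_k is a convex combination of past values
```

Compared with the Zhang-Hager pair (C_k, Q_k), only D_k must be stored and updated, which is the reduction in computational complexity referred to above.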
Then, the actual reduction of the objective function value is given by:

Ared_k = D_k − f(x_k + d_k),  (2.2)

and the predicted reduction of the objective function value is given by:

Pred_k = m_k(0) − m_k(d_k) = −(g_k^T d_k + (1/2) d_k^T B_k d_k).  (2.3)

In order to determine whether the trial step is acceptable and how to update the trust region radius, we compute the modified ratio

ρ̂_k = Ared_k / Pred_k = (D_k − f(x_k + d_k)) / (m_k(0) − m_k(d_k)).  (2.4)
We describe the new trust region algorithm with adjustable radius as Algorithm 2.1. In Algorithm 2.1, an iteration with ρ̂_k ≥ μ, where μ ∈ (0, 1) is the acceptance threshold of Step 4, is called a successful iteration. The loop from Step 3 to Step 4 is called the inner cycle.
The flowchart of our algorithm is provided here.
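To make the structure of such an algorithm concrete, the following sketch implements one plausible reading of the scheme: a Cauchy-point subproblem solver, a radius of the form Δ_k = c^p Δ̂_k shrunk inside an inner cycle until the nonmonotone ratio reaches an acceptance threshold μ, and the Gu-Mo term D_k updated after each successful iteration. The base radius Δ̂_k = ‖g_k‖ and all constants are our simplified stand-ins, not the paper's exact choices.

```python
# Hedged sketch of a nonmonotone adaptive trust region iteration in the spirit
# of Algorithm 2.1.  Subproblem solver, base radius and constants are our
# illustrative stand-ins, not a verbatim transcription of the algorithm.

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def solve_subproblem(g, B, Delta):
    """Cauchy point: minimize the quadratic model along -g inside the ball."""
    gnorm = dot(g, g) ** 0.5
    gBg = dot(g, [dot(row, g) for row in B])
    tau = 1.0 if gBg <= 0 else min(1.0, gnorm ** 3 / (Delta * gBg))
    return [-tau * Delta / gnorm * gi for gi in g]

def nonmonotone_tr(f, grad, hess, x, eta=0.5, c=0.5, mu=1e-4, eps=1e-8, max_iter=200):
    D = f(x)                                  # Gu-Mo term, D_0 = f(x_0)
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        gnorm = dot(g, g) ** 0.5
        if gnorm <= eps:                      # first-order stopping test
            break
        Delta_hat, p = gnorm, 0               # illustrative base radius
        while True:                           # inner cycle: shrink until accepted
            d = solve_subproblem(g, B, c ** p * Delta_hat)
            pred = -(dot(g, d) + 0.5 * dot(d, [dot(row, d) for row in B]))
            x_trial = [xi + di for xi, di in zip(x, d)]
            rho = (D - f(x_trial)) / pred     # modified nonmonotone ratio
            if rho >= mu:
                break                         # successful iteration
            p += 1                            # otherwise reduce the radius
        x = x_trial
        D = eta * D + (1.0 - eta) * f(x)      # nonmonotone term update
    return x

# Illustrative convex quadratic: f(x) = x1^2 + 2*x2^2, minimizer at the origin.
f = lambda x: x[0] ** 2 + 2.0 * x[1] ** 2
grad = lambda x: [2.0 * x[0], 4.0 * x[1]]
hess = lambda x: [[2.0, 0.0], [0.0, 4.0]]
x_star = nonmonotone_tr(f, grad, hess, [3.0, -2.0])
assert f(x_star) < 1e-12                      # converges to the minimizer
```

On this convex quadratic the model is exact, so every trial step is accepted and the method reduces to a globally convergent descent iteration; the inner cycle only becomes active on problems where the quadratic model is a poor local approximation.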
3. Convergence Analysis
In this section, we consider the following assumptions, which will be used to analyze the convergence properties and the superlinear convergence rate of Algorithm 2.1:

(H1) The level set L(x_0) = { x ∈ ℝ^n : f(x) ≤ f(x_0) } is a closed and bounded set, and the objective function f is twice continuously differentiable over L(x_0);

(H2) The approximation matrix B_k is uniformly bounded, i.e. there exists a positive constant M such that ‖B_k‖ ≤ M for all k;

(H3) The matrix B_k is invertible and the step computed by Algorithm 2.1 satisfies d_k = −B_k^{−1} g_k for all sufficiently large k.
The following results are essential for establishing the global convergence of Algorithm 2.1.
Lemma 3.1. Suppose that the sequence { x_k } is generated by Algorithm 2.1. Then we have

f(x_k) ≤ D_k, ∀ k ≥ 0.  (3.1)

Proof. See [43] for reference. □
Lemma 3.2. If (H2) holds, the sequence { x_k } is generated by Algorithm 2.1, and d_k is a solution of (1.2) with Δ_k given by (1.7) and (1.8), then we get

Pred_k ≥ (1/2) ‖g_k‖ min{ Δ_k, ‖g_k‖ / M },  (3.2)

for all k ≥ 0.

Proof. See [32] for reference. □
Lemma 3.3. Let { x_k } be the sequence generated by Algorithm 2.1. Then we have

f(x_{k+1}) ≤ D_{k+1} ≤ D_k, ∀ k ≥ 0.  (3.3)
Proof. From the definition of D_k, we have

D_{k+1} − f(x_{k+1}) = η_k (D_k − f(x_{k+1}))  (3.4)

and

D_k − D_{k+1} = (1 − η_k)(D_k − f(x_{k+1})).  (3.5)

We consider two cases.

Case 1: ρ̂_k ≥ μ at the first inner cycle. From (2.4) and (3.2), we have

f(x_{k+1}) = f(x_k + d_k) ≤ D_k − μ Pred_k ≤ D_k.  (3.6)

Since η_k ∈ [0, 1), the right-hand sides of (3.4) and (3.5) are nonnegative by (3.6), so we have

f(x_{k+1}) ≤ D_{k+1} ≤ D_k.  (3.7)

Case 2: ρ̂_k < μ at the first inner cycle. Then the trial step is rejected in Step 4 and the inner cycle reduces the radius via (1.8). Since Step 4 only accepts a step whose modified ratio satisfies ρ̂_k ≥ μ, the step d_k that finally defines x_{k+1} = x_k + d_k again satisfies

f(x_{k+1}) ≤ D_k − μ Pred_k ≤ D_k.  (3.8)

From (3.4), (3.5) and (3.8), we again obtain

f(x_{k+1}) ≤ D_{k+1} ≤ D_k.  (3.9)

Both Case 1 and Case 2 imply that f(x_{k+1}) ≤ D_{k+1} ≤ D_k for all k ≥ 0, so the proof is completed. □
Lemma 3.4. Step 3 and Step 4 of Algorithm 2.1 are well-defined in the sense that at each iteration they terminate finitely.
Proof. The proof of this lemma is essentially the same as in [40]; for the completeness of this work, we give it again here. By contradiction, assume that the inner loop from Step 3 to Step 4 of Algorithm 2.1 is infinite at some iteration k. Let d_k^{(i)} be the solution of subproblem (1.2) corresponding to the radius Δ_k^{(i)} = c^{p_i} Δ̂_k at the i-th inner cycle. Then we have

ρ̂_k^{(i)} < μ, ∀ i ≥ 0.  (3.10)

Since x_k is not optimal, we have

g_k ≠ 0, i.e. ‖g_k‖ > 0;  (3.11)

using (3.11) and (1.5), we get

Δ̂_k ≥ (τ/M) ‖g_k‖ > 0.  (3.12)

It follows from Lemmas 3.1-3.2 and (3.12) that

|ρ_k^{(i)} − 1| ≤ |f(x_k + d_k^{(i)}) − m_k(d_k^{(i)})| / Pred_k^{(i)} = O(‖d_k^{(i)}‖²) / ((1/2) ‖g_k‖ min{ Δ_k^{(i)}, ‖g_k‖ / M }).  (3.13)

By the assumption that the inner cycle repeats infinitely, (1.8) yields Δ_k^{(i)} = c^{p_i} Δ̂_k → 0 as i → ∞, which implies that the right-hand side of (3.13) tends to zero. This means that for sufficiently large i, we get

|ρ_k^{(i)} − 1| ≤ 1 − μ;  (3.14)

combining (2.4) and Lemma 3.1, we get

ρ̂_k^{(i)} ≥ ρ_k^{(i)} ≥ μ,  (3.15)

which is contradictory to (3.10), so the proof is completed. □
Theorem 3.1. If (H1) holds and the sequence { x_k } is generated by Algorithm 2.1, then we have

lim inf_{k→∞} ‖g_k‖ = 0.  (3.16)

Proof. By contradiction, suppose that there exists a constant ε > 0 such that

‖g_k‖ ≥ ε, ∀ k ≥ 0.  (3.17)

Using (2.4), Lemma 3.2 and ρ̂_k ≥ μ, we conclude that

D_k − f(x_{k+1}) ≥ μ Pred_k ≥ (μ/2) ‖g_k‖ min{ Δ_k, ‖g_k‖ / M }.  (3.18)

From the definition of D_{k+1} and (3.18), we can get

D_k − D_{k+1} = (1 − η_k)(D_k − f(x_{k+1})) ≥ (1 − η_max)(μ/2) ‖g_k‖ min{ Δ_k, ‖g_k‖ / M }.  (3.19)

Using (3.17) and (3.19), we then get

D_k − D_{k+1} ≥ (1 − η_max)(μ/2) ε min{ Δ_k, ε / M }.  (3.20)

We can conclude from Lemma 3.3 that the sequence { D_k } is monotonically nonincreasing and f(x_k) ≤ D_k. According to assumption (H1), f has a lower bound, so we deduce that { D_k } is convergent. Summing (3.20) over k, we have

∑_{k=0}^{∞} min{ Δ_k, ε / M } < ∞,  (3.21)

and hence

lim_{k→∞} Δ_k = 0.  (3.22)

From (3.17) and (3.22), there exists an index set H such that

Δ_k < ε / M, ∀ k ∈ H, with H containing all sufficiently large k.  (3.23)

Therefore,

lim_{k→∞, k∈H} Δ_k = 0.  (3.24)

From (1.8), Δ_k = c^{p_k} Δ̂_k with Δ̂_k ≥ (τ/M) ε by (3.12), so (3.24) forces p_k → ∞ as k → ∞, k ∈ H; in particular, there is more than one inner cycle in the loop from Steps 3 to 4 at the k-th iterate for all sufficiently large k ∈ H. Therefore, the solution d̄_k of the subproblem

min_{d∈ℝ^n} m_k(d), s.t. ‖d‖ ≤ Δ_k / c,  (3.25)

is not accepted at the k-th iteration, so that

ρ̂_k(d̄_k) < μ, for all sufficiently large k ∈ H,  (3.26)

but, since Δ_k / c → 0, the same estimate as in the proof of Lemma 3.4 yields ρ̂_k(d̄_k) ≥ μ for all sufficiently large k ∈ H, which is a contradiction with (3.26). This implies the result is valid. □
Theorem 3.2. Suppose that (H1)-(H3) hold, the sequence { x_k } generated by Algorithm 2.1 converges to x*, the Hessian ∇²f(x) is Lipschitz continuous in a neighborhood N(x*) of x*, and B_k and ∇²f(x*) are uniformly positive definite matrices such that

lim_{k→∞} ‖(B_k − ∇²f(x*)) d_k‖ / ‖d_k‖ = 0.  (3.27)

Then { x_k } converges to x* superlinearly, i.e. ‖x_{k+1} − x*‖ = o(‖x_k − x*‖).

Proof. See [44] for reference. □
4. Conclusion
In this paper, we introduce a new nonmonotone adaptive trust region method for solving unconstrained optimization problems based on (1.8) and (2.1). The nonmonotone strategy is introduced into a new adaptive trust region framework, so that the Maratos effect is avoided and the amount of calculation is reduced. Furthermore, the current objective function value f(x_k) is fully employed. With the help of the nonmonotone technique and the adaptive trust region radius, our algorithm reduces the number of ineffective iterations and thereby enhances its effectiveness. Under standard and suitable assumptions, the global convergence and superlinear convergence of the new algorithm are analyzed theoretically. Nevertheless, the algorithm admits further continuation and extension: although this paper gives the theoretical analysis of the proposed method in as much detail as possible, it does not yet demonstrate the superiority of the new algorithm through numerical experiments, which will be the focus of further work.