A New Nonmonotone Adaptive Trust Region Method ()

Yang Zhang^{1}, Quanming Ji^{1}, Qinghua Zhou^{1,2*}

^{1}College of Mathematics & Information Science, Hebei University, Baoding, China.

^{2}School of Applied Mathematics, Beijing Normal University, Zhuhai, China.

**DOI: **10.4236/jamp.2021.912202
PDF HTML XML
109
Downloads
479
Views
Citations

The trust region method plays an important role in solving optimization problems. In this paper, we propose a new nonmonotone adaptive trust region method for solving unconstrained optimization problems. Actually, we combine a popular nonmonotone technique with an adaptive trust region algorithm. The new ratio to adjusting the next trust region radius is different from the ratio in the traditional trust region methods. Under some appropriate conditions, we show that the new algorithm has good global convergence and superlinear convergence.

Keywords

Unconstrained Optimization, Trust Region Method, Nonmonotone Technique, Global Convergence, Superlinear Convergence

Share and Cite:

Zhang, Y. , Ji, Q. and Zhou, Q. (2021) A New Nonmonotone Adaptive Trust Region Method. *Journal of Applied Mathematics and Physics*, **9**, 3102-3114. doi: 10.4236/jamp.2021.912202.

1. Introduction

In this paper, we consider the following unconstrained optimization problem:

$\underset{x\in {R}^{n}}{\mathrm{min}}\text{}f\left(x\right)$, (1.1)

where $f\left(x\right):{R}^{n}\to R$ is a real-valued twice continuously differentiable function.

The trust region method is an effective iterative method for solving problem (1.1). In 1944, Levenberg [1] firstly proposed a trust region method, where a modified Gauss-Newton method is given for solving nonlinear least square problems. Pioneer researches on trust region methods were given by Powell [2] [3] [4] [5], Fletcher [6], Hebden [7], Madsen [8], Osborne [9], Moré [10], Toint [11] [12] [13] [14] [15], Dennis and Mei [16], Sorensen [17] [18], and Steihaug [19]. Moré [20] gave good research of early works on trust region methods, which promoted trust region methods and standardized the term “trust region”. The modern version of trust region methods can be traced back to Powell [4]. Later, Conn [21] applied the modern trust region methods to solve ill-conditioned problems and proved that it has strong global convergence. In order to minimize $f\left(x\right)$, the trust region method constructs trust region subproblem to compute a trial step ${d}_{k}$ through the following quadratic approximation:

$\begin{array}{l}\underset{d\in {R}^{n}}{\mathrm{min}}\text{}{q}_{k}\left(d\right):={f}_{k}+{g}_{k}^{\text{T}}d+\frac{1}{2}{d}^{\text{T}}{B}_{k}d\\ \text{s}\text{.t}\text{.}\text{\hspace{0.05em}}\text{}\Vert d\Vert \le {\Delta}_{k}\end{array}$ (1.2)

where ${f}_{k}=f\left({x}_{k}\right)$, ${g}_{k}=g\left({x}_{k}\right)=\nabla f\left({x}_{k}\right)$ is the gradient of the objective function at the current iteration point, the approximation of the Hessian matrix ${B}_{k}={\nabla}^{\text{2}}f\left({x}_{k}\right)\in {R}^{n\times n}$ is an $n\times n$ symmetric matrix, $\Vert \text{\hspace{0.05em}}\cdot \text{\hspace{0.05em}}\Vert $ denotes the Euclidean norm and ${\Delta}_{k}$ is a positive parameter which is called trust region radius.

At each iteration, the strategy of choosing a trust region radius ${\Delta}_{k}$ is very crucial. In the standard trust region method, the following ratio to make a comparison between the objective function and the model is defined:

${r}_{k}=\frac{Are{d}_{k}}{Pre{d}_{k}}=\frac{{f}_{k}-f\left({x}_{k}+{d}_{k}\right)}{{q}_{k}\left(0\right)-{q}_{k}\left({d}_{k}\right)}$. (1.3)

In the case, if the ratio ${r}_{k}$ is close to 1, it is concluded that there is a good agreement between the objective function and the model over this step, so it is safe to increase the trust region radius for the next iteration. Otherwise, if ${r}_{k}$ is close to 0 or even negative, we must shrink the trust region radius.

The strategies of determining and updating trust region radius affect the number of computational cost and convergence of the algorithm. There are many researchers who pay much attention to determining and updating the trust region radius [22] - [27]. In 1997, Sartenaer [22] proposed a new approach to determine a radius by monitoring an agreement between the model and the objective function along the direction
$-{g}_{k}$ computed at the starting point. But the parameters of this procedure may be dependent on the problem that should be solved. In 2005, Gould *et al.* [28] examine the sensitivity of trust-region algorithms on the parameters related to the step acceptance and update of the trust region, although, they did not discuss an initial trust-region radius. Motivated by a problem in neural network, in 2002, Zhang *et al.* [26] proposed a strategy to determine the trust region radius. Specifically, they solved the subproblem (1.2) with

${\Delta}_{k}={c}^{p}\Vert {g}_{k}\Vert \Vert {\stackrel{^}{B}}_{k}^{-1}\Vert $,

where
$c\in \left(0,1\right)$, *p* is a nonnegative integer and
${\stackrel{^}{B}}_{k}={B}_{k}+iE$ is a positive definite matrix based on a modified Cholesky factorization from Schnabel and Eskow [29]. It needs to estimate
$\Vert {\stackrel{^}{B}}_{k}^{-1}\Vert $ at each iteration which can not use this radius for large-scale optimization problems. Inspired by Zhang’s method, Cui and Wu [30] proposed a new adaptive trust region method, which can automatically update the trust region radius during calculation. The adaptive radius is given by

${\Delta}_{k}={\Vert {d}_{k}\Vert}^{2}/{d}_{k}^{\text{T}}{B}_{k+1}{d}_{k}$,

avoiding the repeated solution of $\Vert {\stackrel{^}{B}}_{k}^{-1}\Vert $, and the trust region is no longer dependent on the current iteration information ${g}_{k}$. In 2004, Shi and Wang [27] proposed an adaptive radius given by

${\Delta}_{k}={c}^{p}{\Vert {g}_{k}\Vert}^{3}/{g}_{k}^{\text{T}}{\stackrel{^}{B}}_{k}{g}_{k}$,

where
$c\in \left(0,1\right)$, *p* is a non-negative integer and
${\stackrel{^}{B}}_{k}={B}_{k}+iE$ is a positive definite matrix. More recently, Shi and Guo [31] proposed an adaptive trust region:

${\Delta}_{k}=-{c}^{p}\frac{{g}_{k}^{\text{T}}{q}_{k}}{{q}_{k}^{\text{T}}{\stackrel{^}{B}}_{k}{q}_{k}}\Vert {q}_{k}\Vert $, (1.4)

the vector ${q}_{k}$ satisfies the angle consdition:

$-\frac{{g}_{k}^{\text{T}}{q}_{k}}{\Vert {g}_{k}\Vert \cdot \Vert {q}_{k}\Vert}\ge \theta $. (1.5)

where
$\theta \in \left(0,1\right)$. Theoretical analysis shows that the proposed trust region method has global convergence for first-order critical points, and preliminary numerical results show that the proposed method is effective for solving medium-scale unconstrained optimization problems. Kamandi *et al.* [32] give an improved version of the trust-region radius (1.4). They proposed a modification of
${q}_{k}$ :

${q}_{k}=\{\begin{array}{l}-{g}_{k},\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{}\text{\hspace{0.05em}}\text{if}\text{\hspace{0.05em}}\text{}k=0\text{\hspace{0.05em}}\text{or}\text{\hspace{0.05em}}\text{}\frac{-\left({g}_{k}^{\text{T}}{d}_{k-1}\right)}{\Vert {g}_{k}\Vert \text{\hspace{0.05em}}\Vert {d}_{k-1}\Vert}\le \theta ,\\ {d}_{k-1},\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{otherwise,}\end{array}$ (1.6)

where ${d}_{k-1}$ is a solution of the subproblem (1.2) which can be accessed and $\theta \in \left(0,1\right)$. It is straightforward that ${q}_{k}$ satisfies the condition (1.5). To avoid a very small trust region radius, the formula is defined:

${s}_{k}=\{\begin{array}{l}-\frac{{g}_{k}^{\text{T}}{q}_{k}}{{q}_{k}^{\text{T}}{B}_{k}{q}_{k}}\Vert {q}_{k}\Vert \text{,}\text{\hspace{0.05em}}\text{if}\text{\hspace{0.05em}}k=0,\\ \mathrm{max}\left\{\text{\hspace{0.05em}}-\frac{{g}_{k}^{\text{T}}{q}_{k}}{{q}_{k}^{\text{T}}{B}_{k}{q}_{k}}\Vert {q}_{k}\Vert ,\lambda {\Delta}_{k-1}\right\},\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{if}\text{\hspace{0.05em}}\text{\hspace{0.05em}}k\ge 1.\end{array}$ (1.7)

where $\lambda >1$, and ${q}_{k}$ is determined by (1.5). Then, the trust region radius is updated by

${\Delta}_{k}={h}^{p}\mathrm{min}\left\{{s}_{k},\stackrel{\u02dc}{\Delta}\right\}$, (1.8)

where
$\stackrel{\u02dc}{\Delta}>0$ is a real-valued constant,
$h\in \left(0,1\right)$ and *p* is a nonnegative integer.

Due to the high efficiency of nonmonotone techniques, many researchers use the nonmonotone technique in the trust region algorithm framework. In 1986, Grippo *et al.* [33] put forward the nonmonotone line search technology for the first time. The stepsize
${\alpha}_{k}$ satisfies condition

$f\left({x}_{k}+{\alpha}_{k}{d}_{k}\right)\le {f}_{l\left(k\right)}+\beta {\alpha}_{k}{g}_{k}^{\text{T}}{d}_{k}$, (1.9)

where $\beta \in \left(0,\frac{1}{2}\right)$, the nonmonotone term ${f}_{l\left(k\right)}$ is defined by

${f}_{l\left(k\right)}\le \underset{0\le j\le m\left(k\right)}{\mathrm{max}}\left\{f\left(x-j\right)\right\}$, (1.10)

where
$m\left(0\right)=0$,
$0\le m\left(k\right)\le \mathrm{min}\left\{m\left(k-1\right)+1,\stackrel{\u02dc}{M}\right\}$, for
$k\ge 1$, and
$\stackrel{\u02dc}{M}$ is a given non-negative integer. This technique leads to a breakthrough in nonmonotonic algorithms for nonlinear optimization. Based on the proposed nonmonotone technique by Grippo et al, Ke and Han [34], Sun [35], Fu and Sun [36] presented various nonmonotone trust region methods. In 2004, Zhang and Hager [37] found that nonmonotone techniques (1.10) have some drawbacks. For example, the numerical performances are seriously dependent on the choice of parameter
$\stackrel{\u02dc}{M}$ ; A good function value generated at any iteration may not be useful; For any given bound
$\stackrel{\u02dc}{M}$ on the memory, even an iterative method is generating R-linearly convergence for a strongly convex function, the iterates may not satisfy the condition (1.3) for *k* sufficiently large [38]. In order to cope with these disadvantages, Zhang and Hager [37] propose another nonmonotonic technique where the stepsize
${\alpha}_{k}$ satisfies the following condition:

$f\left({x}_{k}+{\alpha}_{k}{d}_{k}\right)\le {C}_{k}+\beta {\alpha}_{k}{g}_{k}^{\text{T}}{d}_{k}$,

where

${C}_{k}=\{\begin{array}{l}f\left({x}_{k}\right),\text{}\text{\hspace{0.05em}}\text{}k\ge 0\\ \frac{{\eta}_{k-1}{Q}_{k-1}{C}_{k-1}+f\left({x}_{k}\right)}{{Q}_{k}},\text{}k\ge 1\end{array}$ (2.1)

${Q}_{k}=\{\begin{array}{l}1,\text{}\text{\hspace{0.05em}}\text{}k\ge 0\\ {\eta}_{k-1}{Q}_{k-1}+1,\text{}k\ge 1\end{array}$ (2.2)

where
${\eta}_{k-1}\in \left[{\eta}_{\mathrm{min}},{\eta}_{\mathrm{max}}\right]$ ;
${\eta}_{\mathrm{min}}\in \left[0,1\right)$ and
${\eta}_{\mathrm{max}}\in \left[{\eta}_{\mathrm{min}},1\right)$ are two prefixed constants. Inspired by this nonmonotone technique, in 2006, Mo *et al.* [39] introduced it into trust region method and developed a nonmonotone algorithm. The numerical results indicate that the algorithm is robust and encouraging. In 2019, Xue *et al.* [40] propose a new improved nonmonotone adaptive trust region method for solving unconstrained optimization problems. From the perspective of theoretical analysis, it is shown that algorithm possesses global convergence and superlinear convergence under classical assumptions.

Among the existing nonmonotone strategies, Gu and Mo [41] propose a simpler nonmonotone technique, and its computational complexity is greatly reduced. Therefore, based on the method in [40] and [41], we propose a new improved nonmonotone adaptive trust region method for solving unconstrained optimization problems. Under appropriate conditions, we analyze the global convergence and superlinear convergence of the algorithm.

2. The Structure of the New Algorithm

In this section, we talk about our algorithm for solving unconstrained optimization problems in detail. As we see, The nonmonotone technique proposed by Zhang and Hager [37] implies that each ${C}_{k}$ is a convex combination of the previous ${C}_{k-1}$ and ${f}_{k}$, including the complex ${\eta}_{k}$ and ${Q}_{k}$. In practice, it becomes an encumbrance to update ${\eta}_{k}$ and ${Q}_{k}$ at each iteration. Therefore, Gu and Mo [41] proposed another nonmonotone technique where the nonmonotone term is revised by:

${D}_{k}=\{\begin{array}{l}\text{}{f}_{k},\text{}k=0,\\ {\eta}_{k}{D}_{k-1}+\left(1-{\eta}_{k}\right){f}_{k},\text{}k\ge 1,\end{array}$ (2.1)

where ${\eta}_{k}\in \left[{\eta}_{\mathrm{min}},{\eta}_{\mathrm{max}}\right]$ ; ${\eta}_{\mathrm{min}}\in \left(0,1\right)$ and ${\eta}_{\mathrm{max}}\in \left[{\eta}_{\mathrm{min}},1\right)$ are two prefixed constants.

Then, the actual reduction of the objective function value is given by:

$Are{d}_{k}={D}_{k}-f\left({x}_{k}+{d}_{k}\right)$, (2.2)

the predicted reduction of the objective function value is given by:

$Pre{d}_{k}={q}_{k}\left(0\right)-{q}_{k}\left({d}_{k}\right)$. (2.3)

In order to determine whether the trial step is feasible and how to update the new trust region radius, we compute the modified ratio that is given by:

${r}_{k}=\frac{Are{d}_{k}}{Pre{d}_{k}}=\frac{{D}_{k}-f\left({x}_{k}+{d}_{k}\right)}{{q}_{k}\left(0\right)-{q}_{k}\left({d}_{k}\right)}$. (2.4)

We describe the new trust region region algorithm with adjustable radius as below:

In Algorithm 2.1, if ${r}_{k}\ge v$, it is called a successful iteration. The loop started from Step 3 to Step 4 is called the inner cycle.

The flowchart of our algorithm is provided here:

3. Convergence Analysis

In this paper, we consider the following assumptions that will be used to analyze the convergence properties and the superlinear convergence rate of the below new algorithm:

(H1) The level set
${W}_{0}=\left\{\text{}x\in {R}^{n}:f\left(x\right)\le f\left({x}_{0}\right)\text{}\right\}\text{}\subset \Omega $, where
$\Omega \in {R}^{n}$ is a closed and bounded set and the objective function *f* is a twice continuously differentiable over
${W}_{0}$ ;

(H2) The approximation matrix
${B}_{k}$ is uniformly bounded, *i.e.* there exists a positive constant *M* such that
$\Vert {B}_{k}\Vert \le M$ for all
$k\in N$ ;

(H3) Matrix ${B}_{k}$ is invertible and the step ${d}_{k}=-{B}_{k}^{-1}{g}_{k}$ computed from Algorithm 2.1 for all $k\in N$.

The subsequent results are essential results to establish the global convergence of Algorithm 2.1.

Lemma 3.1. Suppose that the sequence $\left\{{x}_{k}\right\}$ is generated by Algorithm 2.1. Then we have

$\left|{f}_{k}-f\left({x}_{k}+{d}_{k}\right)-Pre{d}_{k}\right|\le O\left({\Vert {d}_{k}\Vert}^{2}\right)$. (3.1)

Proof. See [43] for reference. £

Lemma 3.2. If (H2) holds, the sequence $\left\{{x}_{k}\right\}$ is generated by Algorithm 2.1, and ${d}_{k}$ is a solution of (1.2) with ${\Delta}_{k}$ given by (1.7), then we get

${q}_{k}\left(0\right)-{q}_{k}\left({d}_{k}\right)\ge \frac{1}{2}{h}^{{p}_{k}}\mathrm{min}\left\{\frac{1}{M}{\left(\frac{-{g}_{k}^{\text{T}}{q}_{k}}{\Vert {q}_{k}\Vert}\right)}^{2},\stackrel{\u02dc}{\Delta}\left(\frac{-{g}_{k}^{\text{T}}{q}_{k}}{\Vert {q}_{k}\Vert}\right)\right\}$, (3.2)

for all $k\in N$.

Proof. See [32] for reference. £

Lemma 3.3. Let $\left\{{x}_{k}\right\}$ be the sequence generated by Algorithm 2.1. Then we have

${f}_{k+1}\le {D}_{k+1}\le {D}_{k}$, $\forall k\in N$. (3.3)

Proof. From the definition of ${D}_{k}$, we have

${D}_{k+1}-{D}_{k}=\left(1-{\eta}_{k+1}\right)\left({f}_{k+1}-{D}_{k}\right)$ (3.4)

and

${D}_{k+1}-{f}_{k+1}={\eta}_{k+1}\left({D}_{k}-{f}_{k+1}\right)$, (3.5)

Now we Let $I=\left\{k|{r}_{k}\ge \nu \right\}$ and $J=\left\{k|{r}_{k}<\nu \right\}$. We consider two cases:

Case 1. $k\in I$. From (2.4) and (3.2), we have

${D}_{k}-{f}_{k+1}\ge \nu \left[{q}_{k}\left(0\right)-{q}_{k}\left({d}_{k}\right)\right]\ge \frac{1}{2}\nu {h}^{{p}_{k}}\mathrm{min}\left\{\frac{1}{M}{\left(\frac{-{g}_{k}^{\text{T}}{q}_{k}}{\Vert {q}_{k}\Vert}\right)}^{2},\stackrel{\u02dc}{\Delta}\left(\frac{-{g}_{k}^{\text{T}}{q}_{k}}{\Vert {q}_{k}\Vert}\right)\right\}\ge 0$.(3.6)

From (3.4), (3.5) and (3.6), we have

${f}_{k+1}\le {D}_{k+1}\le {D}_{k}$, $\forall k\in I$. (3.7)

Case 2. $k\in J$. If $k-1\in I$, from Case 1, we have ${f}_{k}\le {D}_{k}$, and from Step 4 of Algorithm 2.1 we get ${x}_{k+1}={x}_{k}$, ${f}_{k+1}={f}_{k}$, $\forall k\in J$. Then we get

${D}_{k+1}={\eta}_{k+1}{D}_{k}+\left(1-{\eta}_{k+1}\right){f}_{k+1}\ge {\eta}_{k+1}{f}_{k+1}+\left(1-{\eta}_{k+1}\right){f}_{k+1}={f}_{k+1}$. (3.8)

From (3.4), (3.5) and (3.8), we have

${f}_{k+1}\le {D}_{k+1}\le {D}_{k}$, $\forall k-1\in I$. (3.9)

If $k-1\in J$, let $K=\left\{i|1<i<k,k-i\in I\right\}$. If $K=\varnothing $, then from Step 4 of Algorithm 2.1, we have ${f}_{0}={f}_{k-j}={f}_{k+1}$, $j=0,1,\cdots ,k-1$. Therefore, from the definition of ${D}_{k}$, ${D}_{k+1}={D}_{k}={f}_{k+1}$. On the other hand, if $K\ne \varnothing $, let $m=\mathrm{min}\left\{i|i\in K\right\}$, then we have ${f}_{k-j}={f}_{k}={f}_{k+1}$, $j=0,1,\cdots ,m-1$. Obviously, $k-m\in I$, then we get ${f}_{k-m+1}\le {D}_{k-m+1}\le {D}_{k-m}$ from Case 1. Then we have

$\begin{array}{c}{D}_{k-m+2}={\eta}_{k-m+2}{D}_{k-m+1}+\left(1-{\eta}_{k-m+2}\right){f}_{k-m+2}\\ \ge {\eta}_{k-m+2}{f}_{k-m+2}+\left(1-{\eta}_{k-m+2}\right){f}_{k-m+2}={f}_{k-m+2}\end{array}$

Then from (3.4) and (3.5), we get ${D}_{k-m+2}\le {D}_{k-m+1}$. Therefore, from the definition of ${D}_{k}$, we have ${D}_{k+1}\le {D}_{k}$. From (3.4) and (3.5), we get ${f}_{k+1}\le {D}_{k+1}\le {D}_{k}$.

Both Case 1 and Case 2 imply that ${f}_{k+1}\le {D}_{k+1}\le {D}_{k}$, $\forall k\in N$. So the proof is completed. £

Lemma 3.4. Step 3 and Step 4 of Algorithm 2.1 are well-defined in the sense that at each iteration they terminate finitely.

Proof. The proof this lemma is quite as the same as [40], for the completeness of this work, we prove it again here. By contradiction, assume that the inner loop from Step 3 to Step 4 of algorithm 2.1 is infinite. Now, let ${d}_{k}^{i}$ be the solution of subproblem (1.2) corresponding to $i\in N$ at ${x}_{k}$. Then we have

${r}_{k}^{i}<\nu $, $i=1,2,\cdots $ (3.10)

Since ${x}_{k}$ is not optimum, then we have

$\Vert {g}_{k}\Vert \ge \delta $, $k\in N$, (3.11)

using (3.11) and (1.5), we get

$-\frac{{g}_{k}{q}_{k}}{\Vert {q}_{k}\Vert}\ge \theta \delta $. (3.12)

It follows from Lemma 3.1-3.2 and (3.12) that

$\begin{array}{c}\left|\frac{f\left({x}_{k}\right)-f\left({x}_{k}+{d}_{k}^{i}\right)}{{q}_{k}\left(0\right)-{q}_{k}\left({d}_{k}^{i}\right)}-1\right|=\left|\frac{f\left({x}_{k}\right)-f\left({x}_{k}+{d}_{k}^{i}\right)-Pre{d}_{k}^{i}}{{q}_{k}\left(0\right)-{q}_{k}\left({d}_{k}^{i}\right)}\right|\\ \le \frac{O\left({\Vert {d}_{k}^{i}\Vert}^{2}\right)}{{q}_{k}\left(0\right)-{q}_{k}\left({d}_{k}^{i}\right)}\\ \le \frac{O\left({\Vert {d}_{k}^{i}\Vert}^{2}\right)}{\frac{1}{2}{h}^{{p}_{{k}_{i}}}\mathrm{min}\left\{\frac{1}{M}{\left(\frac{-{g}_{k}^{\text{T}}{q}_{k}}{\Vert {q}_{k}\Vert}\right)}^{2},\stackrel{\u02dc}{\Delta}\left(\frac{-{g}_{k}^{\text{T}}{q}_{k}}{\Vert {q}_{k}\Vert}\right)\right\}}\\ \le \frac{O\left({\Vert {d}_{k}^{i}\Vert}^{2}\right)}{\frac{1}{2}{h}^{{p}_{{k}_{i}}}\mathrm{min}\left\{{\frac{\left(\theta \delta \right)}{M}}^{2},\stackrel{\u02dc}{\Delta}\left(\theta \delta \right)\right\}}\end{array}$ (3.13)

By the assumption that the inner cycle cycles infinity and (1.8), we obtain that
${\Delta}_{k}^{i}\to 0$ with
$i\to \infty $.
$\Vert {d}_{k}^{i}\Vert \le {\Delta}_{k}^{i}\le {h}_{k}^{i}{}^{p}{s}_{k}$ implies that the right-hand side of the above equation (3.13) tends to zero. This means that for sufficiently large *i*, we get

$\underset{i\to \infty}{\mathrm{lim}}\frac{f\left({x}_{k}\right)-f\left({x}_{k}+{d}_{k}^{i}\right)}{{q}_{k}\left(0\right)-{q}_{k}\left({d}_{k}^{i}\right)}=1$, (3.14)

combining (2.4) and Lemma 3.3, we get

${r}_{k}^{i}=\frac{{D}_{k}-f\left({x}_{k}+{d}_{k}^{i}\right)}{{q}_{k}\left(0\right)-{q}_{k}\left({d}_{k}^{i}\right)}\ge \frac{f\left({x}_{k}\right)-f\left({x}_{k}+{d}_{k}^{i}\right)}{{q}_{k}\left(0\right)-{q}_{k}\left({d}_{k}^{i}\right)}$, (3.15)

this means for $i\to \infty $, ${r}_{k}^{i}\ge \nu \in \left(0,1\right)$ which is contradictory to (3.10), so the proof is completed. £

Theorem 3.1. If (H1) holds and the sequence $\left\{{x}_{k}\right\}$ is generated by Algorithm 2.1, then we have

$\underset{k\to \infty}{\mathrm{lim}}\mathrm{inf}\Vert {g}_{k}\Vert =0$. (3.16)

Proof. By contradiction, suppose that there exists a constant $\delta >0$ such that

$\Vert {g}_{k}\Vert \ge \delta $, $k\in N$. (3.17)

Using (2.4), Lemma 3.2 and ${r}_{k}\ge \nu $, we conclude that

$f\left({x}_{k}+{d}_{k}\right)\le {D}_{k}-\nu Pre{d}_{k}\le {D}_{k}-\frac{1}{2}\nu {h}^{{p}_{k}}\mathrm{min}\left\{\frac{1}{M}{\left(\frac{-{g}_{k}^{\text{T}}{q}_{k}}{\Vert {q}_{k}\Vert}\right)}^{2},\stackrel{\u02dc}{\Delta}\left(\frac{-{g}_{k}^{\text{T}}{q}_{k}}{\Vert {q}_{k}\Vert}\right)\right\}$. (3.18)

From the definitions of ${D}_{k}$ and (3.18), we can get

$\begin{array}{c}{D}_{k+1}={\eta}_{k+1}{D}_{k}+\left(1-{\eta}_{k+1}\right){f}_{k+1}\\ \le {\eta}_{k+1}{D}_{k}+\left(1-{\eta}_{k+1}\right)\left[{D}_{k}-\frac{1}{2}\nu {h}^{{p}_{k}}\mathrm{min}\left\{\frac{1}{M}{\left(\frac{-{g}_{k}^{\text{T}}{q}_{k}}{\Vert {q}_{k}\Vert}\right)}^{2},\stackrel{\u02dc}{\Delta}\left(\frac{-{g}_{k}^{\text{T}}{q}_{k}}{\Vert {q}_{k}\Vert}\right)\right\}\right]\\ ={D}_{k}-\frac{1-{\eta}_{k+1}}{2}\nu {h}^{{p}_{k}}\mathrm{min}\left\{\frac{1}{M}{\left(\frac{-{g}_{k}^{\text{T}}{q}_{k}}{\Vert {q}_{k}\Vert}\right)}^{2},\stackrel{\u02dc}{\Delta}\left(\frac{-{g}_{k}^{\text{T}}{q}_{k}}{\Vert {q}_{k}\Vert}\right)\right\}\end{array}$ (3.19)

Using (3.12) and (3.19), then we get

${D}_{k}-{D}_{k+1}\ge \frac{1-{\eta}_{\mathrm{max}}}{2}\nu {h}^{{p}_{k}}\mathrm{min}\left\{\frac{{\left(\theta \delta \right)}^{2}}{M},\stackrel{\u02dc}{\Delta}\left(\theta \delta \right)\right\}$. (3.20)

We can conclude from Lemma 3.3 that the sequence
${D}_{k}$ is monotonically nonincreasing and
${f}_{k+1}\le {D}_{k+1}$. According to assumption (H1) that *f* has a lower bound, then we deduce that
${D}_{k}$ is convergent. From (3.20) we have

$\underset{k=0}{\overset{\infty}{\sum}}\frac{1-{\eta}_{\mathrm{max}}}{2}\nu {h}^{{p}_{k}}\mathrm{min}\left\{{\frac{\left(\theta \delta \right)}{M}}^{2},\stackrel{\u02dc}{\Delta}\left(\theta \delta \right)\right\}}<\infty $, (3.21)

we define $\frac{1-{\eta}_{\mathrm{max}}}{2}\nu \mathrm{min}\left\{{\frac{\left(\theta \delta \right)}{M}}^{2},\stackrel{\u02dc}{\Delta}\left(\theta \delta \right)\right\}=\gamma $, then

$\underset{k=0}{\overset{\infty}{\sum}}\gamma {h}^{{p}_{k}}}<\infty $. (3.22)

From (3.17) and (3.22), we get there exists an index set H such that

$\underset{k\to \infty ,k\in {\rm H}}{\mathrm{lim}}\frac{-{g}_{k}^{\text{T}}{q}_{k}}{\Vert {q}_{k}\Vert}\ne 0$, (3.23)

Therefore,

$\underset{k\to \infty ,k\in {\rm H}}{\mathrm{lim}}{h}^{{p}_{k}}=0$. (3.24)

From (1.8),
${\Delta}_{k}\to 0$ as
$k\to \infty $ and
$k\in {\rm H}$. Now, assume that there are more than one inner cycles in the loop from Steps 3 to 4 at the *k*th iterate for all
$k\in N$. Therefore, the solution
${\stackrel{\u02dc}{d}}_{k}$ of the subproblem

$\begin{array}{l}\mathrm{min}\text{}\text{}\text{}\text{}\text{}\text{}\text{}\text{}\text{}\text{}\text{}\text{}\text{}\text{}\text{}{m}_{k}\left(d\right)={f}_{k}+{g}_{k}^{\text{T}}d+\frac{1}{2}{d}^{\text{T}}{B}_{k}d\\ \text{s}\text{.t}\text{.}\Vert d\Vert \le {\Delta}_{k}/h,\text{}k\in \text{H}\end{array}$ (3.25)

is not accepted at the *k*th iteration, then we denote

${r}_{k}=\frac{{D}_{k}-f\left({x}_{k}+{\stackrel{\u02dc}{d}}_{k}\right)}{{m}_{k}\left(0\right)-{m}_{k}\left({\stackrel{\u02dc}{d}}_{k}\right)}<\nu $, $k\in N$, (3.26)

but we have ${r}_{k}>\nu $ for $k\to \infty $ from Lemma 3.4, which is a contradiction with (3.26). This implies the result is valid. £

Theorem 3.2. If (H1)-(H3) holds, the sequence $\left\{{x}_{k}\right\}$ is generated by Algorithm2.1 converges to ${x}^{\ast}$, suppose that ${\nabla}^{2}f\left(x\right)$ is a Lipschitz continuous matrix in a neighborhood $N\left({x}^{*},\epsilon \right)$ of ${x}^{*}$, also suppose that $H\left(x\right)$ and ${B}_{k}$ are uniformly positive definite matrices such that

$\underset{k\to \infty}{\mathrm{lim}}\frac{\Vert \left[{B}_{k}-H\left({x}^{*}\right)\right]{d}_{k}\Vert}{\Vert {d}_{k}\Vert}=0$. (3.27)

Proof. See [44] for reference. £

4. Conclusion

In this paper, we introduce the algorithm of a new nonmonotone adaptive trust region method for solving unconstrained optimization problems based on (1.8) and (2.1). The nonmonotone strategy is introduced into a new adaptive trust region. Maratos effects are avoided and the amount of calculation is reduced. Furthermore, it is obvious that the current objective function value ${f}_{k}$ is fully employed. With the help of nonmonotone technique and adaptive trust region radius, our algorithm can reduce the number of ineffective iterations so that we enhance the effectiveness of the algorithm. Under some standard and suitable assumptions, the global convergence and superlinear convergence of the new algorithm are analyzed theoretically. However, our algorithm still has some continuation and expansion, we can consider the following aspect: although this paper gives the theoretical proof of the proposed method as detailed as possible, it does not fully demonstrate the superiority of the new algorithm through numerical experiments, which will be the focus of further work.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

[1] |
Levenberg, K. (1944) A Method for Solution of Certain Problems in Least Squares. Quarterly of Applied Mathematics, 2, 164-168. https://doi.org/10.1090/qam/10666 |

[2] | Powell, M.J.D. (1970) A Hybrid Method for Nonlinear Equations. In: Rabinowitz, P., Ed., Numerical Methods for Nonlinear Algebraic Equations, Gordon and Breach, London, 87-114. |

[3] | Powell, M.J.D. (1970) A Fortran Subroutine for Solving Systems of Nonlinear Algebraic Equations. In: Rabinowitz, P., Ed., Numerical Methods for Nonlinear Algebraic Equations, Gordon and Breach, London, 115-161. |

[4] |
Powell, M.J.D. (1970) A New Algorithm for Unconstrained Optimization. Nonlinear Programming: Proceedings of a Symposium Conducted by the Mathematics Research Center, New York, 4-6 May 1970, 31-65. https://doi.org/10.1016/B978-0-12-597050-1.50006-3 |

[5] |
Powell, M.J.D. (1975) Convergence Properties of a Class of Minimization Algorithms. Nonlinear Programming 2: Proceedings of the Special Interest Group on Mathematical Programming Symposium Conducted by the Computer Sciences Department, New York, 15-17 April 1974, 1-27. https://doi.org/10.1016/B978-0-12-468650-2.50005-5 |

[6] | Fletcher, R. (1970) An Efficient, Global Convergent Algorithm for Unconstrained and Linearly Constrained Optimization Problems. Technical Report TP 431, AERE Harwell Laboratory, Oxfordshire. |

[7] | Hebden, M.D. (1973) An Algorithm for Minimization Using Exact Second Order Derivatives. Technical Report TP 515, AERE Harwell Laboratory, Oxfordshire. |

[8] |
Madsen, K. (1975) An Algorithm for the Minimax Solution of Overdetermined Systems of Non-Linear Equations. IMA Journal of Applied Mathematics, 16, 321-328. https://doi.org/10.1093/imamat/16.3.321 |

[9] |
Osborne, M.R. (1976) Nonlinear Least Squares—The Levenberg-Marquardt Algorithm Revisited. The ANZIAM Journal, 19, 343-357. https://doi.org/10.1017/S033427000000120X |

[10] |
Moré, J. (1978) The Levenberg-Marquardt Algorithm: Implementation and Theory. In: Watson, G.A., Ed., Numerical Analysis, Springer, Berlin, 105-116. https://doi.org/10.1007/BFb0067700 |

[11] |
Toint, P.L. (1978) Some Numerical Result Using a Sparse Matrix Updating Formula in Unconstrainded Optimization. Mathematics of Computation, 32, 839-851. https://doi.org/10.1090/S0025-5718-1978-0483452-7 |

[12] |
Toint, P.L. (1979) On the Superlinear Convergence of an Algorithm for Solving a Sparse Minimization Problem. SIAM Journal on Numerical Analysis, 16, 103-1045. https://doi.org/10.1137/0716076 |

[13] | Toint, P.L. (1980) Sparsity Exploiting Quasi-Newton Methods for Unconstrained Optimization. In: Dixon, L.C.W., Spedicato, E. and Szego, G.P., Eds., Nonlinear Optimization: Theory and Algorithms, Birkhauser, Belgium, 65-90. |

[14] | Toint, P.L. (1981) Convergence Properties of a Class of Minimization Algorithms that Use a Possibly Unbounded Sequences of Quadratic Approximation. Technical Report 81/1, Department of Mathematics, University of Namur, Belgium. |

[15] | Toint, P.L. (1978) Towards an Efficient Sparsity Exploiting Newton Method for Minimization. In: Duff, I.S., Ed., Sparse Matrices and Their Uses, Academic Press, London, 57-88. |

[16] |
Dennis, J.E. and Mei, H.H. (1979) Two New Unconstrained Optimization Algorithms Which Use Function and Gradient Values. Journal of Optimization Theory and Applications, 28, 453-482. https://doi.org/10.1007/BF00932218 |

[17] |
Sorensen, D.C. (1982) Newton’s Method with a Model Trust Region Modification. SIAM Journal on Numerical Analysis, 19, 409-426. https://doi.org/10.1137/0719026 |

[18] | Sorensen, D.C. (1982) Trust Region Methods for Unconstrained Optimization. In: Powell, M.J.D., Ed., Non-Linear Optimization 1981, Academic Press, London, 29-39. |

[19] |
Steihaug, T. (1983) The Conjugate Gradient Method and Trust Regions in Large Scale Optimization. SIAM Journal on Numerical Analysis, 20, 626-637. https://doi.org/10.1137/0720042 |

[20] |
Moré, J.J. (1983) Recent Developments in Algorithms and Software for Trust Region Methods. In: Bachem, A., Grötschel, M. and Korte, B., Eds., Mathematical Programming: The State of the Art, Springer, Berlin, 258-287. https://doi.org/10.1007/978-3-642-68874-4_11 |

[21] |
Conn, A.R., Gould, N.I.M. and Toint, P.L. (2000) Trust Region Methods Trust Region Methods (MOS-SIAM Series on Optimization). Society for Industrial and Applied Mathematics, Philadelphia. https://doi.org/10.1137/1.9780898719857 |

[22] |
Sartenaer, A. (1997) Automatic Determination of an Initial Trust Region in Nonlinear Programming. SIAM Journal on Scientific Computing, 18, 1788-1803. https://doi.org/10.1137/S1064827595286955 |

[23] |
Lin, C.J. and More, J.J. (1999) Newton’s Method for Large Bound-Constrained Optimization Problems. SIAM Journal on Optimization, 9, 1100-1127. https://doi.org/10.1137/S1052623498345075 |

[24] |
Zhang, X.S. (2000) NN Models for General Nonlinear Programming. Neural Networks in Optimization, 46, 273-288. https://doi.org/10.1007/978-1-4757-3167-5_11 |

[25] | Fan, J.Y. and Yuan, Y.X. (2001) A New Trust Region Algorithm with Trust Region Radius Converging to Zero. Proceedings of the 5th International Conference on Optimization: Techniques and Applications, Hong Kong, 15-17 December 2001, 786-794. |

[26] | Zhang, X.S., Zhang, J.L. and Liao, L.Z. (2002) An Adaptive Trust Region Method and Its Convergence. Science in China: Series A: Mathematics, Physics, Astronomy, 45, 620-631. |

[27] | Shi, Z.J. and Wang, H.Q. (2004) A New Self-Adaptive Trust Region Method for Unconstrained Optimization. Technical Report, College of Operations Research and Management, Qufu Normal University. |

[28] |
Gould, N.I.M., Orban, D., Sartenaer, A. and Toint, P.L. (2005) Sensitivity of Trust Region Algorithms to Their Parameters. 4OR-A Quarterly Journal of Operations Research, 3, 227-241. https://doi.org/10.1007/s10288-005-0065-y |

[29] |
Schnabel, R.B. and Eskow, E. (1990) A New Modified Cholesky Factorization. SIAM Journal on Scientific and Statistical Computing, 11, 1136-1158. https://doi.org/10.1137/0911064 |

[30] |
Cui, Z.C. and Wu, B.Y. (2011) A New Self-Adaptive Trust Region Method for Unconstrained Optimization. Journal of Vibration and Control, 18, 1303-1309. https://doi.org/10.1177/1077546311408473 |

[31] |
Shi, Z.J. and Guo, J.H. (2008) A New Trust Region Method for Unconstrained Optimization. Journal of Computational and Applied Mathematics, 213, 509-520. https://doi.org/10.1016/j.cam.2007.01.027 |

[32] |
Kamandi, A., Amini, K. and Ahookhosh, M. (2017) An Improved Adaptive Trust Region Algorithm. Optimization Letters, 11, 555-569. https://doi.org/10.1007/s11590-016-1018-4 |

[33] |
Grippo, L., Lampariello, F. and Lucidi, S. (1986) A Nonmonotone Line Search Technique for Newton’s Method. SIAM Journal on Numerical Analysis, 23, 707-716. https://doi.org/10.1137/0723046 |

[34] |
Ke, X.W. and Han, J.Y. (1998) A Class of Nonmonotone Trust Region Algorithms for Unconstrained Optimization. Science in China Series A: Mathematics, 41, 927-932. https://doi.org/10.1007/BF02880001 |

[35] |
Sun, W.Y. (2004) Nonmonotone Trust Region Method for Solving Optimization Problems. Applied Mathematics and Computation, 156, 159-174. https://doi.org/10.1016/j.amc.2003.07.008 |

[36] |
Fu, J. and Sun, W. (2005) Nonmonotone Adaptive Trust-Region Method for Unconstrained Optimization Problems. Applied Mathematics and Computation, 163, 489-504. https://doi.org/10.1016/j.amc.2004.02.011 |

[37] |
Zhang, H. and Hager, W.W. (2004) A Nonmonotone Line Search Technique and Its Application to Unconstrained Optimization. SIAM Journal on Optimization, 14, 1043-1056. https://doi.org/10.1137/S1052623403428208 |

[38] |
Dai, Y.H. (2002) On the Nonmonotone Line Search. Journal of Optimization Theory and Applications, 112, 315-330. https://doi.org/10.1023/A:1013653923062 |

[39] |
Mo, J., Liu, C. and Yan, S. (2006) A Nonmonotone Trust Region Method Based on Nonincreasing Technique of Weighted Average of the Successive Function Values. Journal of Computational and Applied Mathematics, 209, 97-108. https://doi.org/10.1016/j.cam.2006.10.070 |

[40] |
Xue, Y., Liu, H.W. and Liu, Z.H. (2019) An Improved Nonmonotone Adaptive Trust Region Method. Applications of Mathematics, 64, 335-350. https://doi.org/10.21136/AM.2019.0138-18 |

[41] |
Gu, N.Z. and Mo, J.T. (2008) Incorporating Nonmonotone Strategies into the Trust Region Method for Unconstrained Optimization. Computers & Mathematics with Application, 55, 2158-2172. https://doi.org/10.1016/j.camwa.2007.08.038 |

[42] | Yuan, Y.Y. and Sun, W.Y. (1997) Optimization Theory and Method. Science Press, Beijing. |

[43] | Nocedal, J. and Wright, S. (2006) Numerical Optimization. Springer Science & Business Media, Berlin. |

[44] |
Ahookhosh, M. and Amini, K. (2010) A Nonmonotone Trust Region Method with Adaptive Radius for Unconstrained Optimization Problems. Computers & Mathematics with Applications, 60, 411-422. https://doi.org/10.1016/j.camwa.2010.04.034 |

Journals Menu

Contact us

customer@scirp.org | |

+86 18163351462(WhatsApp) | |

1655362766 | |

Paper Publishing WeChat |

Copyright © 2023 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.