A New Nonmonotone Adaptive Trust Region Method

Abstract

The trust region method plays an important role in solving optimization problems. In this paper, we propose a new nonmonotone adaptive trust region method for solving unconstrained optimization problems. Specifically, we combine a popular nonmonotone technique with an adaptive trust region algorithm. The new ratio used to adjust the next trust region radius differs from the ratio in traditional trust region methods. Under some appropriate conditions, we show that the new algorithm has good global convergence and superlinear convergence.

Share and Cite:

Zhang, Y. , Ji, Q. and Zhou, Q. (2021) A New Nonmonotone Adaptive Trust Region Method. Journal of Applied Mathematics and Physics, 9, 3102-3114. doi: 10.4236/jamp.2021.912202.

1. Introduction

In this paper, we consider the following unconstrained optimization problem:

$\underset{x\in {R}^{n}}{\mathrm{min}}\text{}f\left(x\right)$, (1.1)

where $f\left(x\right):{R}^{n}\to R$ is a real-valued twice continuously differentiable function.

The trust region method is an effective iterative method for solving problem (1.1). In 1944, Levenberg [1] first proposed a trust region method, in which a modified Gauss-Newton method is given for solving nonlinear least squares problems. Pioneering research on trust region methods was carried out by Powell [2] [3] [4] [5], Fletcher [6], Hebden [7], Madsen [8], Osborne [9], Moré [10], Toint [11] [12] [13] [14] [15], Dennis and Mei [16], Sorensen [17] [18], and Steihaug [19]. Moré [20] gave a good survey of early work on trust region methods, which promoted trust region methods and standardized the term “trust region”. The modern version of trust region methods can be traced back to Powell [4]. Later, Conn [21] applied modern trust region methods to ill-conditioned problems and proved that they have strong global convergence. In order to minimize $f\left(x\right)$, the trust region method computes a trial step ${d}_{k}$ from the following quadratic subproblem:

$\begin{array}{l}\underset{d\in {R}^{n}}{\mathrm{min}}\text{}{q}_{k}\left(d\right):={f}_{k}+{g}_{k}^{\text{T}}d+\frac{1}{2}{d}^{\text{T}}{B}_{k}d\\ \text{s}\text{.t}\text{.}\text{ }\text{}‖d‖\le {\Delta }_{k}\end{array}$ (1.2)

where ${f}_{k}=f\left({x}_{k}\right)$, ${g}_{k}=g\left({x}_{k}\right)=\nabla f\left({x}_{k}\right)$ is the gradient of the objective function at the current iterate, ${B}_{k}\in {R}^{n×n}$ is a symmetric approximation of the Hessian matrix ${\nabla }^{\text{2}}f\left({x}_{k}\right)$, $‖\text{ }\cdot \text{ }‖$ denotes the Euclidean norm, and ${\Delta }_{k}$ is a positive parameter called the trust region radius.

At each iteration, the strategy of choosing a trust region radius ${\Delta }_{k}$ is very crucial. In the standard trust region method, the following ratio to make a comparison between the objective function and the model is defined:

${r}_{k}=\frac{Are{d}_{k}}{Pre{d}_{k}}=\frac{{f}_{k}-f\left({x}_{k}+{d}_{k}\right)}{{q}_{k}\left(0\right)-{q}_{k}\left({d}_{k}\right)}$. (1.3)

In this case, if the ratio ${r}_{k}$ is close to 1, there is good agreement between the objective function and the model over this step, so it is safe to increase the trust region radius for the next iteration. Otherwise, if ${r}_{k}$ is close to 0 or even negative, we must shrink the trust region radius.
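This acceptance-and-update rule can be sketched in a few lines of Python. The thresholds and scaling factors below (`eta1`, `eta2`, `gamma1`, `gamma2`) are illustrative choices commonly used in classical trust region methods, not values prescribed in this paper.

```python
import numpy as np

def update_radius(r_k, d_k, delta_k, eta1=0.25, eta2=0.75, gamma1=0.5, gamma2=2.0):
    """Classical trust region radius update driven by the ratio r_k of (1.3).

    eta1, eta2, gamma1, gamma2 are illustrative threshold/scaling constants.
    """
    if r_k < eta1:            # poor agreement: shrink the region
        return gamma1 * np.linalg.norm(d_k)
    elif r_k > eta2:          # good agreement: allow a larger region
        return max(delta_k, gamma2 * np.linalg.norm(d_k))
    return delta_k            # acceptable agreement: keep the radius
```

The adaptive methods surveyed below replace this heuristic with radii computed from the current gradient and Hessian information.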

The strategies for determining and updating the trust region radius affect both the computational cost and the convergence of the algorithm, and many researchers have paid close attention to them [22] - [27]. In 1997, Sartenaer [22] proposed a new approach that determines an initial radius by monitoring the agreement between the model and the objective function along the direction $-{g}_{k}$ computed at the starting point; however, the parameters of this procedure may depend on the problem to be solved. In 2005, Gould et al. [28] examined the sensitivity of trust-region algorithms to the parameters related to step acceptance and the update of the trust region, although they did not discuss the initial trust-region radius. Motivated by a problem in neural networks, in 2002, Zhang et al. [26] proposed a strategy to determine the trust region radius. Specifically, they solved the subproblem (1.2) with

${\Delta }_{k}={c}^{p}‖{g}_{k}‖‖{\stackrel{^}{B}}_{k}^{-1}‖$,

where $c\in \left(0,1\right)$, p is a nonnegative integer and ${\stackrel{^}{B}}_{k}={B}_{k}+iE$ is a positive definite matrix based on a modified Cholesky factorization from Schnabel and Eskow [29]. This strategy requires estimating $‖{\stackrel{^}{B}}_{k}^{-1}‖$ at each iteration, which makes the radius impractical for large-scale optimization problems. Inspired by Zhang’s method, Cui and Wu [30] proposed a new adaptive trust region method, which automatically updates the trust region radius during the computation. The adaptive radius is given by

${\Delta }_{k}={‖{d}_{k}‖}^{2}/{d}_{k}^{\text{T}}{B}_{k+1}{d}_{k}$,

which avoids the repeated estimation of $‖{\stackrel{^}{B}}_{k}^{-1}‖$; moreover, the trust region radius no longer depends on the current gradient ${g}_{k}$. In 2004, Shi and Wang [27] proposed an adaptive radius given by

${\Delta }_{k}={c}^{p}{‖{g}_{k}‖}^{3}/{g}_{k}^{\text{T}}{\stackrel{^}{B}}_{k}{g}_{k}$,

where $c\in \left(0,1\right)$, p is a non-negative integer and ${\stackrel{^}{B}}_{k}={B}_{k}+iE$ is a positive definite matrix. More recently, Shi and Guo [31] proposed an adaptive trust region:

${\Delta }_{k}=-{c}^{p}\frac{{g}_{k}^{\text{T}}{q}_{k}}{{q}_{k}^{\text{T}}{\stackrel{^}{B}}_{k}{q}_{k}}‖{q}_{k}‖$, (1.4)

where the vector ${q}_{k}$ satisfies the angle condition:

$-\frac{{g}_{k}^{\text{T}}{q}_{k}}{‖{g}_{k}‖\cdot ‖{q}_{k}‖}\ge \theta$. (1.5)

where $\theta \in \left(0,1\right)$. Theoretical analysis shows that this trust region method has global convergence to first-order critical points, and preliminary numerical results show that it is effective for solving medium-scale unconstrained optimization problems. Kamandi et al. [32] gave an improved version of the trust-region radius (1.4). They proposed the following modification of ${q}_{k}$ :

${q}_{k}=\left\{\begin{array}{l}-{g}_{k},\text{ }\text{ }\text{ }\text{}\text{ }\text{if}\text{ }\text{}k=0\text{ }\text{or}\text{ }\text{}\frac{-\left({g}_{k}^{\text{T}}{d}_{k-1}\right)}{‖{g}_{k}‖\text{ }‖{d}_{k-1}‖}\le \theta ,\\ {d}_{k-1},\text{ }\text{ }\text{ }\text{ }\text{otherwise,}\end{array}$ (1.6)

where ${d}_{k-1}$ is the solution of the subproblem (1.2) at the previous iteration and $\theta \in \left(0,1\right)$. It is straightforward to verify that ${q}_{k}$ satisfies the condition (1.5). To avoid a very small trust region radius, the following quantity is defined:

${s}_{k}=\left\{\begin{array}{l}-\frac{{g}_{k}^{\text{T}}{q}_{k}}{{q}_{k}^{\text{T}}{B}_{k}{q}_{k}}‖{q}_{k}‖\text{,}\text{ }\text{if}\text{ }k=0,\\ \mathrm{max}\left\{\text{ }-\frac{{g}_{k}^{\text{T}}{q}_{k}}{{q}_{k}^{\text{T}}{B}_{k}{q}_{k}}‖{q}_{k}‖,\lambda {\Delta }_{k-1}\right\},\text{ }\text{ }\text{ }\text{ }\text{if}\text{ }\text{ }k\ge 1.\end{array}$ (1.7)

where $\lambda >1$, and ${q}_{k}$ is determined by (1.6). Then, the trust region radius is updated by

${\Delta }_{k}={h}^{p}\mathrm{min}\left\{{s}_{k},\stackrel{˜}{\Delta }\right\}$, (1.8)

where $\stackrel{˜}{\Delta }>0$ is a real-valued constant, $h\in \left(0,1\right)$ and p is a nonnegative integer.
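The radius rule (1.6)-(1.8) of Kamandi et al. [32] can be sketched in Python as follows. The concrete parameter values (`theta`, `lam`, `h`, `delta_bar`) are illustrative assumptions; the paper only constrains their ranges.

```python
import numpy as np

def adaptive_radius(g_k, d_prev, B_k, delta_prev, p, k,
                    theta=0.01, lam=1.5, h=0.5, delta_bar=100.0):
    """Sketch of the adaptive radius (1.6)-(1.8); parameter values are illustrative."""
    # (1.6): reuse the previous trial step when its angle with -g_k is
    # sufficiently acute, otherwise fall back to the steepest descent direction.
    if k == 0 or d_prev is None or \
       -(g_k @ d_prev) / (np.linalg.norm(g_k) * np.linalg.norm(d_prev)) <= theta:
        q_k = -np.asarray(g_k, dtype=float)
    else:
        q_k = np.asarray(d_prev, dtype=float)
    # (1.7): candidate radius, bounded below by lam * previous radius for k >= 1.
    cand = -(g_k @ q_k) / (q_k @ B_k @ q_k) * np.linalg.norm(q_k)
    s_k = cand if k == 0 else max(cand, lam * delta_prev)
    # (1.8): damp by h**p and cap by the constant delta_bar.
    return h**p * min(s_k, delta_bar)
```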

Due to the high efficiency of nonmonotone techniques, many researchers use them within the trust region framework. In 1986, Grippo et al. [33] put forward the nonmonotone line search technique for the first time, in which the stepsize ${\alpha }_{k}$ satisfies the condition

$f\left({x}_{k}+{\alpha }_{k}{d}_{k}\right)\le {f}_{l\left(k\right)}+\beta {\alpha }_{k}{g}_{k}^{\text{T}}{d}_{k}$, (1.9)

where $\beta \in \left(0,\frac{1}{2}\right)$, the nonmonotone term ${f}_{l\left(k\right)}$ is defined by

${f}_{l\left(k\right)}=\underset{0\le j\le m\left(k\right)}{\mathrm{max}}\left\{f\left({x}_{k-j}\right)\right\}$, (1.10)

where $m\left(0\right)=0$, $0\le m\left(k\right)\le \mathrm{min}\left\{m\left(k-1\right)+1,\stackrel{˜}{M}\right\}$ for $k\ge 1$, and $\stackrel{˜}{M}$ is a given non-negative integer. This technique led to a breakthrough in nonmonotone algorithms for nonlinear optimization. Based on the nonmonotone technique proposed by Grippo et al., Ke and Han [34], Sun [35], and Fu and Sun [36] presented various nonmonotone trust region methods. In 2004, Zhang and Hager [37] observed that the nonmonotone technique (1.10) has some drawbacks: the numerical performance depends heavily on the choice of the parameter $\stackrel{˜}{M}$ ; a good function value generated at an iteration may be discarded; and, for any given memory bound $\stackrel{˜}{M}$, even when an iterative method generates R-linearly convergent iterates for a strongly convex function, the iterates may not satisfy the condition (1.9) for k sufficiently large [38]. To cope with these disadvantages, Zhang and Hager [37] proposed another nonmonotone technique in which the stepsize ${\alpha }_{k}$ satisfies the following condition:

$f\left({x}_{k}+{\alpha }_{k}{d}_{k}\right)\le {C}_{k}+\beta {\alpha }_{k}{g}_{k}^{\text{T}}{d}_{k}$,

where

${C}_{k}=\left\{\begin{array}{l}f\left({x}_{0}\right),\text{}\text{ }\text{}k=0,\\ \frac{{\eta }_{k-1}{Q}_{k-1}{C}_{k-1}+f\left({x}_{k}\right)}{{Q}_{k}},\text{}k\ge 1,\end{array}$ (1.11)

${Q}_{k}=\left\{\begin{array}{l}1,\text{}\text{ }\text{}k=0,\\ {\eta }_{k-1}{Q}_{k-1}+1,\text{}k\ge 1,\end{array}$ (1.12)

where ${\eta }_{k-1}\in \left[{\eta }_{\mathrm{min}},{\eta }_{\mathrm{max}}\right]$, and ${\eta }_{\mathrm{min}}\in \left[0,1\right)$ and ${\eta }_{\mathrm{max}}\in \left[{\eta }_{\mathrm{min}},1\right)$ are two prefixed constants. Inspired by this nonmonotone technique, in 2006, Mo et al. [39] introduced it into the trust region method and developed a nonmonotone algorithm; the numerical results indicate that the algorithm is robust and encouraging. In 2019, Xue et al. [40] proposed a new improved nonmonotone adaptive trust region method for solving unconstrained optimization problems, and theoretical analysis shows that the algorithm possesses global convergence and superlinear convergence under classical assumptions.
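The two nonmonotone reference values discussed above can be sketched in Python. The memory bound `M_tilde` and the weight `eta_prev` are illustrative assumptions within the ranges the papers allow.

```python
def grippo_term(f_history, M_tilde=10):
    """Grippo-style reference value (1.10): the maximum of the most recent
    function values, with memory bounded by M_tilde (illustrative bound)."""
    m_k = min(len(f_history) - 1, M_tilde)
    return max(f_history[-(m_k + 1):])

def zhang_hager_update(C_prev, Q_prev, f_k, eta_prev):
    """One step of the Zhang-Hager recurrences for C_k and Q_k; eta_prev is
    chosen in [eta_min, eta_max]."""
    Q_k = eta_prev * Q_prev + 1.0
    C_k = (eta_prev * Q_prev * C_prev + f_k) / Q_k
    return C_k, Q_k
```

Note that each `C_k` is a weighted average of all past function values, which is why the recurrence needs the auxiliary sequence `Q_k`, whereas the Grippo term needs the raw history.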

Among the existing nonmonotone strategies, Gu and Mo [41] proposed a simpler nonmonotone technique whose computational cost is greatly reduced. Therefore, based on the methods in [40] and [41], we propose a new improved nonmonotone adaptive trust region method for solving unconstrained optimization problems. Under appropriate conditions, we analyze the global convergence and superlinear convergence of the algorithm.

2. The Structure of the New Algorithm

In this section, we discuss our algorithm for solving unconstrained optimization problems in detail. The nonmonotone technique proposed by Zhang and Hager [37] implies that each ${C}_{k}$ is a convex combination of the previous ${C}_{k-1}$ and ${f}_{k}$, but it involves the auxiliary quantities ${\eta }_{k}$ and ${Q}_{k}$. In practice, updating ${\eta }_{k}$ and ${Q}_{k}$ at each iteration becomes an encumbrance. Therefore, Gu and Mo [41] proposed another nonmonotone technique in which the nonmonotone term is revised as:

${D}_{k}=\left\{\begin{array}{l}\text{}{f}_{k},\text{}k=0,\\ {\eta }_{k}{D}_{k-1}+\left(1-{\eta }_{k}\right){f}_{k},\text{}k\ge 1,\end{array}$ (2.1)

where ${\eta }_{k}\in \left[{\eta }_{\mathrm{min}},{\eta }_{\mathrm{max}}\right]$ ; ${\eta }_{\mathrm{min}}\in \left(0,1\right)$ and ${\eta }_{\mathrm{max}}\in \left[{\eta }_{\mathrm{min}},1\right)$ are two prefixed constants.
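The update (2.1) is a one-line recurrence; a minimal Python sketch follows, with `eta_k` an illustrative value in the allowed interval.

```python
def gu_mo_update(D_prev, f_k, eta_k):
    """Gu-Mo nonmonotone term (2.1): a convex combination of the previous
    D_{k-1} and the current f_k, with eta_k in [eta_min, eta_max]."""
    return eta_k * D_prev + (1.0 - eta_k) * f_k
```

Because it is a convex combination, the new term always lies between `f_k` and `D_prev`; this is the property exploited in Lemma 3.3 below.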

Then, the actual reduction of the objective function value is given by:

$Are{d}_{k}={D}_{k}-f\left({x}_{k}+{d}_{k}\right)$, (2.2)

the predicted reduction of the objective function value is given by:

$Pre{d}_{k}={q}_{k}\left(0\right)-{q}_{k}\left({d}_{k}\right)$. (2.3)

In order to determine whether the trial step is acceptable and how to update the trust region radius, we compute the modified ratio:

${r}_{k}=\frac{Are{d}_{k}}{Pre{d}_{k}}=\frac{{D}_{k}-f\left({x}_{k}+{d}_{k}\right)}{{q}_{k}\left(0\right)-{q}_{k}\left({d}_{k}\right)}$. (2.4)

We describe the new trust region algorithm with adjustable radius as below:

In Algorithm 2.1, if ${r}_{k}\ge \nu$, the iteration is called a successful iteration. The loop from Step 3 to Step 4 is called the inner cycle.
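A minimal Python sketch of the overall method follows, combining the Gu-Mo term (2.1), the adaptive radius (1.6)-(1.8), and the modified ratio (2.4). The subproblem solver (a Cauchy-point step), the parameter values, and the stopping tolerance are all illustrative assumptions; the paper does not prescribe a specific subproblem solver.

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Approximate solution of subproblem (1.2) by the Cauchy point
    (a common inexact solver, assumed here for illustration)."""
    gBg = g @ B @ g
    gnorm = np.linalg.norm(g)
    tau = 1.0 if gBg <= 0 else min(gnorm**3 / (delta * gBg), 1.0)
    return -tau * delta / gnorm * g

def nonmonotone_adaptive_tr(f, grad, hess, x0, nu=0.1, eta=0.5, theta=0.01,
                            lam=1.5, h=0.5, delta_bar=100.0, tol=1e-6,
                            max_iter=200):
    """Sketch of the new nonmonotone adaptive trust region method under
    assumed parameter values (nu, eta, theta, lam, h, delta_bar)."""
    x = np.asarray(x0, dtype=float)
    D = f(x)                                    # D_0 = f_0
    d_prev, delta_prev = None, None
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) <= tol:
            break
        # (1.6): choose q_k
        if d_prev is None or \
           -(g @ d_prev) / (np.linalg.norm(g) * np.linalg.norm(d_prev)) <= theta:
            q = -g
        else:
            q = d_prev
        # (1.7): candidate radius s_k
        cand = -(g @ q) / (q @ B @ q) * np.linalg.norm(q)
        s = cand if delta_prev is None else max(cand, lam * delta_prev)
        p = 0
        while True:                             # inner cycle (Steps 3-4);
            delta = h**p * min(s, delta_bar)    # finite by Lemma 3.4
            d = cauchy_point(g, B, delta)
            pred = -(g @ d + 0.5 * d @ B @ d)   # Pred_k of (2.3)
            r = (D - f(x + d)) / pred           # modified ratio (2.4)
            if r >= nu:
                break
            p += 1
        x = x + d
        D = eta * D + (1 - eta) * f(x)          # Gu-Mo update (2.1)
        d_prev, delta_prev = d, delta
    return x
```

On a strictly convex quadratic this sketch reduces to exact-line-search steepest descent and converges, which makes it a convenient sanity check rather than a demonstration of the method's full behavior.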

The flowchart of our algorithm is provided here:

3. Convergence Analysis

We make the following assumptions, which will be used to analyze the convergence properties and the superlinear convergence rate of the new algorithm:

(H1) The level set ${W}_{0}=\left\{\text{}x\in {R}^{n}:f\left(x\right)\le f\left({x}_{0}\right)\text{}\right\}\text{}\subset \Omega$, where $\Omega \subset {R}^{n}$ is a closed and bounded set, and the objective function f is twice continuously differentiable over ${W}_{0}$ ;

(H2) The approximation matrix ${B}_{k}$ is uniformly bounded, i.e. there exists a positive constant M such that $‖{B}_{k}‖\le M$ for all $k\in N$ ;

(H3) The matrix ${B}_{k}$ is invertible and the step ${d}_{k}=-{B}_{k}^{-1}{g}_{k}$ is computed by Algorithm 2.1 for all $k\in N$.

The following results are essential to establish the global convergence of Algorithm 2.1.

Lemma 3.1. Suppose that the sequence $\left\{{x}_{k}\right\}$ is generated by Algorithm 2.1. Then we have

$|{f}_{k}-f\left({x}_{k}+{d}_{k}\right)-Pre{d}_{k}|\le O\left({‖{d}_{k}‖}^{2}\right)$. (3.1)

Proof. See [43] for reference. □

Lemma 3.2. If (H2) holds, the sequence $\left\{{x}_{k}\right\}$ is generated by Algorithm 2.1, and ${d}_{k}$ is a solution of (1.2) with ${\Delta }_{k}$ given by (1.7), then we get

${q}_{k}\left(0\right)-{q}_{k}\left({d}_{k}\right)\ge \frac{1}{2}{h}^{{p}_{k}}\mathrm{min}\left\{\frac{1}{M}{\left(\frac{-{g}_{k}^{\text{T}}{q}_{k}}{‖{q}_{k}‖}\right)}^{2},\stackrel{˜}{\Delta }\left(\frac{-{g}_{k}^{\text{T}}{q}_{k}}{‖{q}_{k}‖}\right)\right\}$, (3.2)

for all $k\in N$.

Proof. See [32] for reference. □

Lemma 3.3. Let $\left\{{x}_{k}\right\}$ be the sequence generated by Algorithm 2.1. Then we have

${f}_{k+1}\le {D}_{k+1}\le {D}_{k}$, $\forall k\in N$. (3.3)

Proof. From the definition of ${D}_{k}$, we have

${D}_{k+1}-{D}_{k}=\left(1-{\eta }_{k+1}\right)\left({f}_{k+1}-{D}_{k}\right)$ (3.4)

and

${D}_{k+1}-{f}_{k+1}={\eta }_{k+1}\left({D}_{k}-{f}_{k+1}\right)$, (3.5)

Let $I=\left\{k|{r}_{k}\ge \nu \right\}$ and $J=\left\{k|{r}_{k}<\nu \right\}$. We consider two cases:

Case 1. $k\in I$. From (2.4) and (3.2), we have

${D}_{k}-{f}_{k+1}\ge \nu \left[{q}_{k}\left(0\right)-{q}_{k}\left({d}_{k}\right)\right]\ge \frac{1}{2}\nu {h}^{{p}_{k}}\mathrm{min}\left\{\frac{1}{M}{\left(\frac{-{g}_{k}^{\text{T}}{q}_{k}}{‖{q}_{k}‖}\right)}^{2},\stackrel{˜}{\Delta }\left(\frac{-{g}_{k}^{\text{T}}{q}_{k}}{‖{q}_{k}‖}\right)\right\}\ge 0$. (3.6)

From (3.4), (3.5) and (3.6), we have

${f}_{k+1}\le {D}_{k+1}\le {D}_{k}$, $\forall k\in I$. (3.7)

Case 2. $k\in J$. If $k-1\in I$, from Case 1, we have ${f}_{k}\le {D}_{k}$, and from Step 4 of Algorithm 2.1 we get ${x}_{k+1}={x}_{k}$, ${f}_{k+1}={f}_{k}$, $\forall k\in J$. Then we get

${D}_{k+1}={\eta }_{k+1}{D}_{k}+\left(1-{\eta }_{k+1}\right){f}_{k+1}\ge {\eta }_{k+1}{f}_{k+1}+\left(1-{\eta }_{k+1}\right){f}_{k+1}={f}_{k+1}$. (3.8)

From (3.4), (3.5) and (3.8), we have

${f}_{k+1}\le {D}_{k+1}\le {D}_{k}$, $\forall k-1\in I$. (3.9)

If $k-1\in J$, let $K=\left\{i|1\le i\le k-1,k-i\in I\right\}$. If $K=\varnothing$, then from Step 4 of Algorithm 2.1, we have ${f}_{0}={f}_{k-j}={f}_{k+1}$, $j=0,1,\cdots ,k-1$. Therefore, from the definition of ${D}_{k}$, ${D}_{k+1}={D}_{k}={f}_{k+1}$. On the other hand, if $K\ne \varnothing$, let $m=\mathrm{min}\left\{i|i\in K\right\}$; then we have ${f}_{k-j}={f}_{k}={f}_{k+1}$, $j=0,1,\cdots ,m-1$. Obviously, $k-m\in I$, so from Case 1 we get ${f}_{k-m+1}\le {D}_{k-m+1}\le {D}_{k-m}$. Then we have

$\begin{array}{c}{D}_{k-m+2}={\eta }_{k-m+2}{D}_{k-m+1}+\left(1-{\eta }_{k-m+2}\right){f}_{k-m+2}\\ \ge {\eta }_{k-m+2}{f}_{k-m+2}+\left(1-{\eta }_{k-m+2}\right){f}_{k-m+2}={f}_{k-m+2}\end{array}$

Then from (3.4) and (3.5), we get ${D}_{k-m+2}\le {D}_{k-m+1}$. Therefore, from the definition of ${D}_{k}$, we have ${D}_{k+1}\le {D}_{k}$. From (3.4) and (3.5), we get ${f}_{k+1}\le {D}_{k+1}\le {D}_{k}$.

Both Case 1 and Case 2 imply that ${f}_{k+1}\le {D}_{k+1}\le {D}_{k}$, $\forall k\in N$, which completes the proof. □

Lemma 3.4. Step 3 and Step 4 of Algorithm 2.1 are well-defined in the sense that at each iteration they terminate finitely.

Proof. The proof of this lemma is essentially the same as that in [40]; for the completeness of this work, we give it here. By contradiction, assume that the inner loop from Step 3 to Step 4 of Algorithm 2.1 is infinite. Let ${d}_{k}^{i}$ be the solution of subproblem (1.2) corresponding to $i\in N$ at ${x}_{k}$. Then we have

${r}_{k}^{i}<\nu$, $i=1,2,\cdots$ (3.10)

Since ${x}_{k}$ is not a stationary point, there exists $\delta >0$ such that

$‖{g}_{k}‖\ge \delta$, $k\in N$, (3.11)

using (3.11) and (1.5), we get

$-\frac{{g}_{k}^{\text{T}}{q}_{k}}{‖{q}_{k}‖}\ge \theta \delta$. (3.12)

It follows from Lemma 3.1-3.2 and (3.12) that

$\begin{array}{c}|\frac{f\left({x}_{k}\right)-f\left({x}_{k}+{d}_{k}^{i}\right)}{{q}_{k}\left(0\right)-{q}_{k}\left({d}_{k}^{i}\right)}-1|=|\frac{f\left({x}_{k}\right)-f\left({x}_{k}+{d}_{k}^{i}\right)-Pre{d}_{k}^{i}}{{q}_{k}\left(0\right)-{q}_{k}\left({d}_{k}^{i}\right)}|\\ \le \frac{O\left({‖{d}_{k}^{i}‖}^{2}\right)}{{q}_{k}\left(0\right)-{q}_{k}\left({d}_{k}^{i}\right)}\\ \le \frac{O\left({‖{d}_{k}^{i}‖}^{2}\right)}{\frac{1}{2}{h}^{{p}_{{k}_{i}}}\mathrm{min}\left\{\frac{1}{M}{\left(\frac{-{g}_{k}^{\text{T}}{q}_{k}}{‖{q}_{k}‖}\right)}^{2},\stackrel{˜}{\Delta }\left(\frac{-{g}_{k}^{\text{T}}{q}_{k}}{‖{q}_{k}‖}\right)\right\}}\\ \le \frac{O\left({‖{d}_{k}^{i}‖}^{2}\right)}{\frac{1}{2}{h}^{{p}_{{k}_{i}}}\mathrm{min}\left\{\frac{{\left(\theta \delta \right)}^{2}}{M},\stackrel{˜}{\Delta }\left(\theta \delta \right)\right\}}\end{array}$ (3.13)

Since the inner cycle is assumed to be infinite, (1.8) implies that ${\Delta }_{k}^{i}\to 0$ as $i\to \infty$. Together with $‖{d}_{k}^{i}‖\le {\Delta }_{k}^{i}$, this implies that the right-hand side of (3.13) tends to zero. Hence

$\underset{i\to \infty }{\mathrm{lim}}\frac{f\left({x}_{k}\right)-f\left({x}_{k}+{d}_{k}^{i}\right)}{{q}_{k}\left(0\right)-{q}_{k}\left({d}_{k}^{i}\right)}=1$, (3.14)

combining (2.4) and Lemma 3.3, we get

${r}_{k}^{i}=\frac{{D}_{k}-f\left({x}_{k}+{d}_{k}^{i}\right)}{{q}_{k}\left(0\right)-{q}_{k}\left({d}_{k}^{i}\right)}\ge \frac{f\left({x}_{k}\right)-f\left({x}_{k}+{d}_{k}^{i}\right)}{{q}_{k}\left(0\right)-{q}_{k}\left({d}_{k}^{i}\right)}$, (3.15)

which means that for $i$ sufficiently large, ${r}_{k}^{i}\ge \nu$ with $\nu \in \left(0,1\right)$, contradicting (3.10). The proof is completed. □

Theorem 3.1. If (H1) holds and the sequence $\left\{{x}_{k}\right\}$ is generated by Algorithm 2.1, then we have

$\underset{k\to \infty }{\mathrm{lim}}\mathrm{inf}‖{g}_{k}‖=0$. (3.16)

Proof. By contradiction, suppose that there exists a constant $\delta >0$ such that

$‖{g}_{k}‖\ge \delta$, $k\in N$. (3.17)

Using (2.4), Lemma 3.2 and ${r}_{k}\ge \nu$, we conclude that

$f\left({x}_{k}+{d}_{k}\right)\le {D}_{k}-\nu Pre{d}_{k}\le {D}_{k}-\frac{1}{2}\nu {h}^{{p}_{k}}\mathrm{min}\left\{\frac{1}{M}{\left(\frac{-{g}_{k}^{\text{T}}{q}_{k}}{‖{q}_{k}‖}\right)}^{2},\stackrel{˜}{\Delta }\left(\frac{-{g}_{k}^{\text{T}}{q}_{k}}{‖{q}_{k}‖}\right)\right\}$. (3.18)

From the definitions of ${D}_{k}$ and (3.18), we can get

$\begin{array}{c}{D}_{k+1}={\eta }_{k+1}{D}_{k}+\left(1-{\eta }_{k+1}\right){f}_{k+1}\\ \le {\eta }_{k+1}{D}_{k}+\left(1-{\eta }_{k+1}\right)\left[{D}_{k}-\frac{1}{2}\nu {h}^{{p}_{k}}\mathrm{min}\left\{\frac{1}{M}{\left(\frac{-{g}_{k}^{\text{T}}{q}_{k}}{‖{q}_{k}‖}\right)}^{2},\stackrel{˜}{\Delta }\left(\frac{-{g}_{k}^{\text{T}}{q}_{k}}{‖{q}_{k}‖}\right)\right\}\right]\\ ={D}_{k}-\frac{1-{\eta }_{k+1}}{2}\nu {h}^{{p}_{k}}\mathrm{min}\left\{\frac{1}{M}{\left(\frac{-{g}_{k}^{\text{T}}{q}_{k}}{‖{q}_{k}‖}\right)}^{2},\stackrel{˜}{\Delta }\left(\frac{-{g}_{k}^{\text{T}}{q}_{k}}{‖{q}_{k}‖}\right)\right\}\end{array}$ (3.19)

Using (3.12) and (3.19), then we get

${D}_{k}-{D}_{k+1}\ge \frac{1-{\eta }_{\mathrm{max}}}{2}\nu {h}^{{p}_{k}}\mathrm{min}\left\{\frac{{\left(\theta \delta \right)}^{2}}{M},\stackrel{˜}{\Delta }\left(\theta \delta \right)\right\}$. (3.20)

We can conclude from Lemma 3.3 that the sequence ${D}_{k}$ is monotonically nonincreasing and ${f}_{k+1}\le {D}_{k+1}$. According to assumption (H1) that f has a lower bound, then we deduce that ${D}_{k}$ is convergent. From (3.20) we have

$\underset{k=0}{\overset{\infty }{\sum }}\frac{1-{\eta }_{\mathrm{max}}}{2}\nu {h}^{{p}_{k}}\mathrm{min}\left\{\frac{{\left(\theta \delta \right)}^{2}}{M},\stackrel{˜}{\Delta }\left(\theta \delta \right)\right\}<\infty$, (3.21)

we define $\gamma =\frac{1-{\eta }_{\mathrm{max}}}{2}\nu \mathrm{min}\left\{\frac{{\left(\theta \delta \right)}^{2}}{M},\stackrel{˜}{\Delta }\left(\theta \delta \right)\right\}$; then

$\underset{k=0}{\overset{\infty }{\sum }}\gamma {h}^{{p}_{k}}<\infty$. (3.22)

From (3.17) and (3.22), there exists an index set H such that

$\underset{k\to \infty ,k\in Η}{\mathrm{lim}}\frac{-{g}_{k}^{\text{T}}{q}_{k}}{‖{q}_{k}‖}\ne 0$, (3.23)

Therefore,

$\underset{k\to \infty ,k\in Η}{\mathrm{lim}}{h}^{{p}_{k}}=0$. (3.24)

From (1.8), ${\Delta }_{k}\to 0$ as $k\to \infty$, $k\in Η$. Now, assume that there is more than one inner cycle in the loop from Step 3 to Step 4 at the kth iteration for all $k\in N$. Then the solution ${\stackrel{˜}{d}}_{k}$ of the subproblem

$\begin{array}{l}\underset{d\in {R}^{n}}{\mathrm{min}}\text{ }{m}_{k}\left(d\right)={f}_{k}+{g}_{k}^{\text{T}}d+\frac{1}{2}{d}^{\text{T}}{B}_{k}d\\ \text{s}\text{.t}\text{.}\text{ }‖d‖\le {\Delta }_{k}/h,\text{ }k\in \text{H}\end{array}$ (3.25)

is not accepted at the kth iteration, then we denote

${r}_{k}=\frac{{D}_{k}-f\left({x}_{k}+{\stackrel{˜}{d}}_{k}\right)}{{m}_{k}\left(0\right)-{m}_{k}\left({\stackrel{˜}{d}}_{k}\right)}<\nu$, $k\in N$, (3.26)

but we have ${r}_{k}\ge \nu$ for k sufficiently large by an argument similar to that of Lemma 3.4, which contradicts (3.26). This implies that the result is valid. □

Theorem 3.2. Suppose that (H1)-(H3) hold and that the sequence $\left\{{x}_{k}\right\}$ generated by Algorithm 2.1 converges to ${x}^{\ast }$. Suppose further that ${\nabla }^{2}f\left(x\right)$ is Lipschitz continuous in a neighborhood $N\left({x}^{*},\epsilon \right)$ of ${x}^{*}$, and that $H\left(x\right)$ and ${B}_{k}$ are uniformly positive definite matrices such that

$\underset{k\to \infty }{\mathrm{lim}}\frac{‖\left[{B}_{k}-H\left({x}^{*}\right)\right]{d}_{k}‖}{‖{d}_{k}‖}=0$. (3.27)

Then the sequence $\left\{{x}_{k}\right\}$ converges to ${x}^{*}$ superlinearly.

Proof. See [44] for reference. □

4. Conclusion

In this paper, based on (1.8) and (2.1), we introduce a new nonmonotone adaptive trust region method for solving unconstrained optimization problems. The nonmonotone strategy is introduced into a new adaptive trust region framework, so the Maratos effect is avoided and the amount of computation is reduced. Furthermore, the current objective function value ${f}_{k}$ is fully employed. With the help of the nonmonotone technique and the adaptive trust region radius, our algorithm reduces the number of ineffective iterations and thus enhances its effectiveness. Under some standard and suitable assumptions, the global convergence and superlinear convergence of the new algorithm are analyzed theoretically. However, there is room for further work: although this paper gives a theoretical proof of the proposed method in as much detail as possible, it does not demonstrate the superiority of the new algorithm through numerical experiments, which will be the focus of future work.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.