A Scaled Conjugate Gradient Method Based on New BFGS Secant Equation with Modified Nonmonotone Line Search

Abstract

In this paper, we propose and analyze a new scaled conjugate gradient method based on a modified secant equation of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method and on a new modified nonmonotone line search technique. The method incorporates the modified BFGS secant equation in an effort to include second-order information of the objective function. The new secant equation employs both gradient and function value information, and its update formula inherits the positive definiteness of the Hessian approximation for general convex functions. In order to improve the likelihood of finding a global optimal solution, we introduce a new modified nonmonotone line search technique. It is shown that the proposed algorithm is globally convergent for nonsmooth convex problems. Numerical results show that the new scaled conjugate gradient algorithm is promising and efficient for solving not only convex but also some large-scale nonsmooth nonconvex problems in the sense of the Dolan-Moré performance profiles.

Woldu, T. , Zhang, H. and Fissuh, Y. (2020) A Scaled Conjugate Gradient Method Based on New BFGS Secant Equation with Modified Nonmonotone Line Search. American Journal of Computational Mathematics, 10, 1-22. doi: 10.4236/ajcm.2020.101001.

1. Introduction

The conjugate gradient (CG) method and the quasi-Newton method are two popular iterative methods for solving smooth unconstrained optimization problems, and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method is one of the most efficient quasi-Newton methods for small- and medium-sized unconstrained optimization problems. Both methods generate iterates by

${x}_{k+1}={x}_{k}+{\alpha }_{k}{d}_{k},$ (1)

where ${\alpha }_{k}$ is a step size and ${d}_{k}$ is a search direction. For a continuously differentiable function $h:{R}^{n}\to R$, the minimization problem

$\underset{x\in {R}^{n}}{\mathrm{min}}h\left(x\right)$ (2)

has been well studied for several decades. The CG method is among the preferable methods for solving problem (2), with search direction ${d}_{k}$ given by

${d}_{k}=\begin{cases}-\nabla {h}_{k}+{\beta }_{k}{d}_{k-1} & \text{if } k\ge 1,\\ -\nabla {h}_{k} & \text{if } k=0,\end{cases}$ (3)

where $\nabla {h}_{k}$ is the gradient of the objective function $h\left(x\right)$ at the k-th iterate and ${\beta }_{k}$ is a scalar that characterizes the particular CG method.

Some well-known formulas for the scalar ${\beta }_{k}$ are those of Hestenes-Stiefel (HS), Fletcher-Reeves (FR), Polak-Ribière-Polyak (PRP), and Dai-Yuan (DY), given by

${\beta }_{k}^{HS}=\frac{\nabla {h}_{k}^{\text{T}}{y}_{k}}{{d}_{k-1}^{\text{T}}{y}_{k}},{\beta }_{k}^{PRP}=\frac{\nabla {h}_{k}^{\text{T}}{y}_{k}}{{‖\nabla {h}_{k-1}‖}^{2}},$

${\beta }_{k}^{FR}=\frac{{‖\nabla {h}_{k}‖}^{2}}{{‖\nabla {h}_{k-1}‖}^{2}},{\beta }_{k}^{DY}=\frac{{‖\nabla {h}_{k}‖}^{2}}{{d}_{k-1}^{\text{T}}{y}_{k}},$

where ${y}_{k}=\nabla h\left({x}_{k}\right)-\nabla h\left({x}_{k-1}\right)$ and $‖\text{ }\cdot \text{ }‖$ denotes the Euclidean norm. Due to their simplicity and low memory requirements, CG methods are effective and desirable for large-scale unconstrained smooth problems. The global convergence properties of nonlinear CG methods have been analyzed under the weak Wolfe line search

$\begin{cases}h\left({x}_{k}+{\alpha }_{k}{d}_{k}\right)\le h\left({x}_{k}\right)+\varsigma {\alpha }_{k}\nabla {h}_{k}^{\text{T}}{d}_{k},\\ \nabla h{\left({x}_{k}+{\alpha }_{k}{d}_{k}\right)}^{\text{T}}{d}_{k}\ge \rho \nabla {h}_{k}^{\text{T}}{d}_{k},\end{cases}$ (4)

and the strong Wolfe line search:

$\begin{cases}h\left({x}_{k}+{\alpha }_{k}{d}_{k}\right)\le h\left({x}_{k}\right)+\varsigma {\alpha }_{k}\nabla {h}_{k}^{\text{T}}{d}_{k},\\ |\nabla h{\left({x}_{k}+{\alpha }_{k}{d}_{k}\right)}^{\text{T}}{d}_{k}|\le \rho |\nabla {h}_{k}^{\text{T}}{d}_{k}|,\end{cases}$ (5)

where $0<\varsigma <\rho <1$. CG methods use relatively little memory for large-scale problems and require no numerical linear algebra, so each step is quite fast. However, they do not exploit second-order information of the objective function, and they typically converge much more slowly than Newton or quasi-Newton methods.
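As a minimal illustration (ours, not part of the paper; the sample vectors are arbitrary), the four conjugate-gradient parameters above can be computed directly from two successive gradients and the previous direction:

```python
import numpy as np

def cg_betas(g_prev, g, d_prev):
    """Compute the HS, PRP, FR, and DY conjugate-gradient parameters.

    g_prev, g : gradients at x_{k-1} and x_k; d_prev : previous direction.
    """
    y = g - g_prev  # y_k = grad h(x_k) - grad h(x_{k-1})
    return {
        "HS": g @ y / (d_prev @ y),
        "PRP": g @ y / (g_prev @ g_prev),
        "FR": g @ g / (g_prev @ g_prev),
        "DY": g @ g / (d_prev @ y),
    }

g_prev = np.array([1.0, -2.0, 0.5])
g = np.array([0.5, -1.0, 0.25])   # here g = 0.5 * g_prev, so FR = 0.25
d_prev = -g_prev                  # steepest-descent start
betas = cg_betas(g_prev, g, d_prev)
```

All four formulas coincide when h is a strongly convex quadratic and the line search is exact; they differ, sometimes sharply, away from that setting.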

The quasi-Newton method is an iterative method that exploits second-order information of the objective function, and the BFGS method is an effective member of this class, with search direction

${d}_{k}=-{B}_{k}^{-1}\nabla {h}_{k},$ (6)

where ${B}_{k}$ is an approximation of the Hessian matrix of h at ${x}_{k}$. The update formula for ${B}_{k}$ is defined by

${B}_{k+1}={B}_{k}-\frac{{B}_{k}{s}_{k}{s}_{k}^{\text{T}}{B}_{k}}{{s}_{k}^{\text{T}}{B}_{k}{s}_{k}}+\frac{{y}_{k}{y}_{k}^{\text{T}}}{{y}_{k}^{\text{T}}{s}_{k}},$ (7)

where ${s}_{k}$ is defined as ${s}_{k}={x}_{k+1}-{x}_{k}$, and the Hessian approximation ${B}_{k+1}$ of (7) satisfies the standard secant equation

${B}_{k+1}{s}_{k}={y}_{k},$ (8)

if ${y}_{k}^{\text{T}}{s}_{k}>0$, which is known as the curvature condition. The BFGS method has very attractive properties and remains one of the most respected quasi-Newton methods for unconstrained optimization. Its theory and global convergence have been established by many researchers. For convex objective functions, it has been proved that the BFGS method is globally convergent under some special inexact line searches. However, when the objective function is nonconvex, the BFGS method with exact line search may fail to converge. Moreover, Dai proved that the BFGS method may fail for nonconvex functions under the Wolfe line search techniques given in (4) and (5). The Wolfe line search is the most common monotone line search technique, and it may lead to small steps without making significant progress towards the minimum when the contours of the objective function form a family of curves with large curvature. In order to overcome this drawback, the first nonmonotone line search technique was proposed by Grippo et al. for Newton's method. Following this initiative, many nonmonotone line search techniques have been proposed in recent years. Yuan et al. developed a modified limited-memory BFGS method whose update formula gives a higher-order approximation to the exact Hessian, and analyzed its convergence under a nonmonotone line search; however, the method converges only for uniformly convex functions. Li et al. proposed a new BFGS algorithm with a modified secant equation which achieves both global and superlinear convergence for generally convex functions under a nonmonotone line search. Su and Rong introduced a new spectral CG method and established its convergence under a modified nonmonotone line search technique. They introduced a new spectral conjugate gradient direction

${d}_{k}=\begin{cases}-{\theta }_{k}\nabla {h}_{k}+{\beta }_{k}{d}_{k-1} & \text{if } k\ge 1,\\ -\nabla {h}_{0} & \text{if } k=0,\end{cases}$ (9)

where

${\theta }_{k}=1+\frac{{\beta }_{k}{d}_{k-1}^{\text{T}}\nabla {h}_{k}}{{‖\nabla {h}_{k}‖}^{2}},$ (10)

${\beta }_{k}=\frac{\nabla {h}_{k}^{\text{T}}{y}_{k-1}}{\left(1-\tau \right){‖\nabla {h}_{k-1}‖}^{2}+\tau {d}_{k-1}^{\text{T}}{y}_{k-1}},$ (11)

and $\tau \in \left[0,1\right]$. It is not difficult to notice that the denominator of (11) is a convex combination of the denominators of the conjugate parameters of the HS and PRP methods. The choice of spectral parameter given in (10) ensures the sufficient descent property of the search direction independently of the line search. The convergence of their method was analyzed under a new modified nonmonotone line search with some mild conditions. However, this spectral CG method uses only first-order information and excludes second-order information. When the dimension is large, CG methods are more effective than BFGS methods in terms of CPU time, but in terms of the number of iterations and the number of function evaluations, BFGS methods are better. In order to combine the remarkable properties of the CG and BFGS methods and to overcome their drawbacks, many hybrids of CG and BFGS methods have been introduced for unconstrained smooth optimization. However, the usage of these methods is mainly restricted to smooth optimization problems. Recently, Yuan et al. introduced some CG approaches to solve nonsmooth convex large-scale problems using smoothing regularization, and under some assumptions, the global convergence of these approaches was analyzed. Yuan and Wei proposed a Barzilai-Borwein (BB) gradient method with nonmonotone line search to solve nonsmooth convex optimization problems. Some implementable quasi-Newton methods have also been introduced for solving the same problem. More recently, Ou and Zhou introduced a modified scaled BFGS preconditioned CG algorithm and, under appropriate assumptions, proved its global convergence for nonsmooth convex functions.
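As a quick numerical check (our illustration, not from the paper), and assuming it is the previous direction ${d}_{k-1}$ that enters (10), the spectral choice of ${\theta }_{k}$ makes the direction (9) satisfy $\nabla {h}_{k}^{\text{T}}{d}_{k}=-{‖\nabla {h}_{k}‖}^{2}$ for any $\tau \in \left[0,1\right]$:

```python
import numpy as np

rng = np.random.default_rng(0)

def spectral_direction(g, g_prev, d_prev, tau):
    """Spectral CG direction of Su and Rong: d = -theta*g + beta*d_prev,
    with beta from (11) and theta as in (10) so that g @ d = -||g||^2."""
    y_prev = g - g_prev
    beta = g @ y_prev / ((1 - tau) * (g_prev @ g_prev) + tau * (d_prev @ y_prev))
    theta = 1.0 + beta * (d_prev @ g) / (g @ g)
    return -theta * g + beta * d_prev

g_prev = rng.standard_normal(6)
g = rng.standard_normal(6)
d_prev = -g_prev                       # a previous steepest-descent step
d = spectral_direction(g, g_prev, d_prev, tau=0.5)
```

The identity holds algebraically: the beta-term introduced by theta cancels the beta-term of the direction, regardless of the value of beta.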

Motivated by the work of Ou and Zhou, in this paper we propose a hybrid of a scaled CG method and a modified BFGS method, combining the simplicity of the CG method with the Hessian approximation of the BFGS method. Our work focuses on developing a scaled conjugate search direction that includes second-order information of the objective function by incorporating the modified secant equation of the BFGS method. In contrast to the work of Ou and Zhou, our method uses both function and gradient value information of the objective function. Moreover, our method leads to a better descent direction than the CG methods proposed so far. To the best of our knowledge, this is the first work to combine a scaled CG algorithm with a BFGS secant equation that contains both function and gradient value information for solving large-scale nonsmooth optimization problems. Under the new modified nonmonotone line search technique, the global convergence of the algorithm is analyzed for nonsmooth convex problems.

The paper is organized as follows. In the next section, we consider a nonsmooth convex problem and review its basic results. In Section 3, we propose a new scaled CG algorithm that incorporates, via smoothing regularization, a BFGS secant equation containing both function value and gradient information of the objective function. Using the new modified nonmonotone line search technique, we prove the global convergence of the new algorithm for nonsmooth convex problems. Numerical results and related comparisons are reported in Section 4. Finally, Section 5 concludes our work.

2. Nonsmooth Convex Problems and Their Basic Results

In this section, we consider the unconstrained optimization problem

$\underset{x\in {R}^{n}}{\mathrm{min}}f\left(x\right),$ (12)

where $f:{R}^{n}\to R$ is a possibly nonsmooth convex function. This problem is equivalent to the following problem

$\underset{x\in {R}^{n}}{\mathrm{min}}F\left(x\right),$ (13)

where $F:{R}^{n}\to R$ is the Moreau-Yosida regularization of f , which is defined by

$F\left(x\right)=\underset{z\in {R}^{n}}{\mathrm{min}}\left\{f\left(z\right)+\frac{1}{2\lambda }{‖z-x‖}^{2}\right\},$ (14)

where $\lambda$ is a positive parameter. The function F is a finite-valued, continuously differentiable convex function even when f is nondifferentiable (see ). Let $p\left(x\right)$ denote the unique solution of (14). Then $F\left(x\right)$ can be expressed as

$F\left(x\right)=f\left(p\left(x\right)\right)+\frac{1}{2\lambda }{‖p\left(x\right)-x‖}^{2}.$ (15)

Moreover, the gradient of F is globally Lipschitz continuous, i.e.,

$‖g\left(x\right)-g\left(y\right)‖\le \frac{1}{\lambda }‖x-y‖,\forall x,y\in {R}^{n},$ (16)

where

$g\left(x\right)=\nabla F\left(x\right)=\frac{x-p\left(x\right)}{\lambda }.$ (17)

The point $x\in {R}^{n}$ is an optimal solution to (12) if and only if $g\left(x\right)=0$ (see ). Furthermore, under reasonable conditions the gradient of F is semismooth and some of its remarkable properties are given in  .
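For concreteness (an illustration of ours, not from the paper), take $f\left(x\right)={‖x‖}_{1}$: the minimizer $p\left(x\right)$ in (14) is then the componentwise soft-thresholding operator, and (15) and (17) give F and g in closed form.

```python
import numpy as np

def moreau_yosida_l1(x, lam):
    """Moreau-Yosida regularization of f(x) = ||x||_1 with parameter lam.

    p(x) is the soft-thresholding operator, the unique minimizer in (14);
    F and g follow (15) and (17).
    """
    p = np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)  # prox_{lam*f}(x)
    F = np.abs(p).sum() + np.dot(p - x, p - x) / (2 * lam)
    g = (x - p) / lam
    return F, g, p

lam = 0.5
x = np.array([2.0, -0.2, 0.0])
F, g, p = moreau_yosida_l1(x, lam)   # p = [1.5, 0, 0]
```

Here g is exactly the gradient of the Moreau-Yosida envelope, and by (16) it is globally Lipschitz with constant $1/\lambda$ even though f itself is nondifferentiable at the origin.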

Several methods have been proposed to solve (13) by combining ideas from bundle methods and quasi-Newton methods, but it is burdensome to evaluate the exact value of $p\left(x\right)$ at a given point x. Fortunately, for each $x\in {R}^{n}$ and any $\epsilon >0$, we can find ${p}^{\alpha }\left(x,\epsilon \right)\in {R}^{n}$ such that

$f\left({p}^{\alpha }\left(x,\epsilon \right)\right)+\frac{1}{2\lambda }{‖{p}^{\alpha }\left(x,\epsilon \right)-x‖}^{2}\le F\left(x\right)+\epsilon .$ (18)

Therefore, we can approximate $F\left(x\right)$ and $g\left(x\right)$ by

${F}^{\alpha }\left(x,\epsilon \right)=f\left({p}^{\alpha }\left(x,\epsilon \right)\right)+\frac{1}{2\lambda }{‖{p}^{\alpha }\left(x,\epsilon \right)-x‖}^{2},$ (19)

and

${g}^{\alpha }\left(x,\epsilon \right)=\frac{x-{p}^{\alpha }\left(x,\epsilon \right)}{\lambda },$ (20)

respectively. Implementable algorithms for computing such a ${p}^{\alpha }\left(x,\epsilon \right)$ for nonsmooth convex models can be found in . The noticeable properties of ${F}^{\alpha }\left(x,\epsilon \right)$ and ${g}^{\alpha }\left(x,\epsilon \right)$ are given by the following proposition .

Proposition 1. Let ${p}^{\alpha }\left(x,\epsilon \right)$ be a vector that satisfies (18), and let ${F}^{\alpha }\left(x,\epsilon \right)$ and ${g}^{\alpha }\left(x,\epsilon \right)$ be defined by (19) and (20), respectively. Then we obtain

$F\left(x\right)\le {F}^{\alpha }\left(x,\epsilon \right)\le F\left(x\right)+\epsilon ,$ (21)

$‖{p}^{\alpha }\left(x,\epsilon \right)-p\left(x\right)‖\le \sqrt{2\lambda \epsilon },$ (22)

and

$‖{g}^{\alpha }\left(x,\epsilon \right)-g\left(x\right)‖\le \sqrt{2\epsilon /\lambda }.$ (23)

Proposition 1 shows that the approximations ${F}^{\alpha }\left(x,\epsilon \right)$ and ${g}^{\alpha }\left(x,\epsilon \right)$ can be made arbitrarily close to the exact values $F\left(x\right)$ and $g\left(x\right)$, respectively.
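As a sanity check (again our illustration for $f={‖\cdot‖}_{1}$, where the exact proximal point is available), any perturbation of $p\left(x\right)$ is an admissible ${p}^{\alpha }\left(x,\epsilon \right)$ for the $\epsilon$ it induces in (18), and the bounds (21)-(23) can be verified numerically:

```python
import numpy as np

lam = 0.5
x = np.array([2.0, -0.2, 0.0])

def obj(z):
    # the objective of (14) for f = ||.||_1
    return np.abs(z).sum() + np.dot(z - x, z - x) / (2 * lam)

p_exact = np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
F_exact = obj(p_exact)
g_exact = (x - p_exact) / lam

p_approx = p_exact + 0.01          # any perturbation of the exact prox point
eps = obj(p_approx) - F_exact      # smallest eps for which (18) holds
F_approx = obj(p_approx)           # (19)
g_approx = (x - p_approx) / lam    # (20)
```

The bounds hold with room to spare here because the objective of (14) is $\left(1/\lambda \right)$-strongly convex in z, which is exactly the mechanism behind (22).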

3. A Scaled CG Method Based on New BFGS Secant Equation

In this section, we introduce the new scaled CG search direction that incorporates the modified BFGS secant equation, and then describe the new algorithm for solving nonsmooth problems. We make use of a modified nonmonotone line search technique introduced by  to compute a step size. Based on the above approximations, we redefine the search direction of CG method (3) to solve problem (13) as follows:

${d}_{k+1}=\begin{cases}-{g}^{\alpha }\left({x}_{k+1},{\epsilon }_{k+1}\right)+{\beta }_{k+1}{d}_{k} & \text{if } k\ge 1,\\ -{g}^{\alpha }\left({x}_{k+1},{\epsilon }_{k+1}\right) & \text{if } k=0,\end{cases}$ (24)

where $\epsilon$ is an appropriately chosen positive number. Ou and Zhou  provided a search direction defined by

${d}_{k+1}=\begin{cases}-{\stackrel{˜}{Q}}_{k+1}{g}^{\alpha }\left({x}_{k+1},{\epsilon }_{k+1}\right) & \text{if } k\ge 1,\\ -{g}^{\alpha }\left({x}_{k+1},{\epsilon }_{k+1}\right) & \text{if } k=0,\end{cases}$ (25)

where ${\stackrel{˜}{Q}}_{k+1}\in {R}^{n×n}$ is defined by

${\stackrel{˜}{Q}}_{k+1}={\stackrel{˜}{\theta }}_{k+1}I-{\stackrel{˜}{\theta }}_{k+1}\frac{{w}_{k}{s}_{k}^{\text{T}}+{s}_{k}{w}_{k}^{\text{T}}}{{w}_{k}^{\text{T}}{s}_{k}}+\left[1+{\stackrel{˜}{\theta }}_{k+1}\frac{{w}_{k}^{\text{T}}{w}_{k}}{{w}_{k}^{\text{T}}{s}_{k}}\right]\frac{{s}_{k}{s}_{k}^{\text{T}}}{{w}_{k}^{\text{T}}{s}_{k}},$ (26)

with

${\stackrel{˜}{\theta }}_{k+1}=\frac{{s}_{k}^{\text{T}}{s}_{k}}{{w}_{k}^{\text{T}}{s}_{k}},$

where

${w}_{k}={y}_{k}^{\ast }+{t}_{k}{s}_{k}.$ (27)

The vector ${y}_{k}^{\ast }$ and the scalar ${t}_{k}$ in (27) are defined as

${y}_{k}^{\ast }={g}^{\alpha }\left({x}_{k+1},{\epsilon }_{k+1}\right)-{g}^{\alpha }\left({x}_{k},{\epsilon }_{k}\right)$ (28)

and

${t}_{k}=t+\mathrm{max}\left\{\frac{{s}_{k}^{\text{T}}{y}_{k}^{\ast }}{{‖{s}_{k}‖}^{2}},0\right\}\left(t>0\right).$ (29)

It is easy to observe that (27) contains only gradient information. In order to use both gradient and function value information, we replace (27) and (29) by

${w}_{k}^{\ast }={y}_{k}^{\ast }+\mathrm{max}\left\{{t}_{k}^{\ast },0\right\}{s}_{k},$ (30)

and

${t}_{k}^{\ast }=\frac{6\left[F\left({x}_{k}\right)-F\left({x}_{k}+{\alpha }_{k}{d}_{k}\right)\right]+3{\left({g}^{\alpha }\left({x}_{k}+{\alpha }_{k}{d}_{k},{\epsilon }_{k+1}\right)+{g}^{\alpha }\left({x}_{k},{\epsilon }_{k}\right)\right)}^{\text{T}}{s}_{k}}{{‖{s}_{k}‖}^{2}},$ (31)

respectively. Thus, the BFGS method with the secant equation

${B}_{k+1}{s}_{k}={w}_{k}^{\ast },$ (32)

and the update formula

${B}_{k+1}={B}_{k}-\frac{{B}_{k}{s}_{k}{s}_{k}^{\text{T}}{B}_{k}}{{s}_{k}^{\text{T}}{B}_{k}{s}_{k}}+\frac{{w}_{k}^{\ast }{w}_{k}^{\ast \text{T}}}{{w}_{k}^{\ast \text{T}}{s}_{k}},$ (33)

has both gradient and function value information, and the matrix ${B}_{k+1}$ inherits the positive definiteness of ${B}_{k}$ for generally convex functions. Using the secant equation (32), we propose the new search direction defined by

${d}_{k+1}=\begin{cases}-{\stackrel{¨}{\theta }}_{k+1}{g}^{\alpha }\left({x}_{k+1},{\epsilon }_{k+1}\right)+{\beta }_{k+1}{d}_{k}-{\vartheta }_{k+1}{w}_{k}^{\ast } & \text{if } k\ge 1,\\ -{g}^{\alpha }\left({x}_{k+1},{\epsilon }_{k+1}\right) & \text{if } k=0,\end{cases}$ (34)

where

${\stackrel{¨}{\theta }}_{k+1}=2-\frac{{d}_{k}^{\text{T}}{g}^{\alpha }\left({x}_{k+1},{\epsilon }_{k+1}\right)}{{‖{g}^{\alpha }\left({x}_{k+1},{\epsilon }_{k+1}\right)‖}^{2}}\left(\frac{{g}^{\alpha }{\left({x}_{k+1},{\epsilon }_{k+1}\right)}^{\text{T}}{w}_{k}^{\ast }}{‖{d}_{k}‖‖{w}_{k}^{\ast }‖}\right),$ (35)

${\beta }_{k+1}=\frac{{g}^{\alpha }{\left({x}_{k+1},{\epsilon }_{k+1}\right)}^{\text{T}}{w}_{k}^{\ast }}{‖{d}_{k}‖‖{w}_{k}^{\ast }‖+|{d}_{k}^{\text{T}}{y}_{k}^{\ast }|},$ (36)

and

${\vartheta }_{k+1}=\frac{{d}_{k}^{\text{T}}{g}^{\alpha }\left({x}_{k+1},{\epsilon }_{k+1}\right)}{‖{d}_{k}‖‖{w}_{k}^{\ast }‖}.$ (37)
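A compact sketch (our illustration with arbitrary sample data, not code from the paper) of the new direction: ${w}_{k}^{\ast }$ is built from (30)-(31) and the direction from (34)-(37). For any nonzero inputs the result should satisfy the sufficient-descent bound ${g}^{\alpha \text{T}}{d}_{k+1}\le -{‖{g}^{\alpha }‖}^{2}$ and the trust-region bound $‖{d}_{k+1}‖\le 5‖{g}^{\alpha }‖$ established in Lemma 1.

```python
import numpy as np

rng = np.random.default_rng(1)

def new_direction(g_new, g_old, d_old, s, F_old, F_new):
    """Scaled CG direction (34) with w* from (30)-(31).

    g_new, g_old : smoothed gradients g^a at x_{k+1} and x_k;
    d_old, s     : previous direction and step s_k = x_{k+1} - x_k;
    F_old, F_new : (approximate) Moreau-Yosida values at x_k and x_{k+1}.
    """
    y = g_new - g_old                                              # (28)
    t = (6 * (F_old - F_new) + 3 * (g_new + g_old) @ s) / (s @ s)  # (31)
    w = y + max(t, 0.0) * s                                        # (30)
    nd, nw = np.linalg.norm(d_old), np.linalg.norm(w)
    theta = 2 - (d_old @ g_new) / (g_new @ g_new) * (g_new @ w) / (nd * nw)  # (35)
    beta = (g_new @ w) / (nd * nw + abs(d_old @ y))                # (36)
    vartheta = (d_old @ g_new) / (nd * nw)                         # (37)
    return -theta * g_new + beta * d_old - vartheta * w

g_old = rng.standard_normal(8)
g_new = rng.standard_normal(8)
d_old = -g_old            # a previous steepest-descent direction
s = 0.1 * d_old           # the step that produced x_{k+1}
F_old, F_new = 1.0, 0.8   # hypothetical function values
d_new = new_direction(g_new, g_old, d_old, s, F_old, F_new)
```

The two bounds follow purely from the Cauchy-Schwarz inequality, so they hold for arbitrary data, which is what makes the direction line-search independent.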

Now, based on the above search direction, we describe our new scaled CG algorithm with a modified nonmonotone line search for solving problem (13) as follows.

Algorithm 1

Step 0. Given $ϵ>0,\beta \in \left(0,1\right),\epsilon \in \left(0,1\right),\sigma \in \left(0,1\right)$, and a point ${x}_{0}\in {R}^{n}$. Set ${d}_{0}=-{g}^{\alpha }\left({x}_{0},{\epsilon }_{0}\right)$ and $k:=0$.

Step 1. If $‖{g}^{\alpha }\left({x}_{k},{\epsilon }_{k}\right)‖<ϵ$, then stop, else go to the next step.

Step 2. Compute the search direction ${d}_{k}$ by using (34)-(37).

Step 3. Set trial step size ${\alpha }_{k}=1$.

Step 4. Set ${x}_{k+1}={x}_{k}+{\alpha }_{k}{d}_{k}$ and choose a scalar ${\epsilon }_{k+1}$ such that $0<{\epsilon }_{k+1}<{\epsilon }_{k}$.

Step 5. Let $\mu \in \left(0,1\right]$ and let $M\ge 1$ be a positive integer. Define $m\left(k\right)=\mathrm{min}\left\{k+1,M\right\}$ and choose

${\mu }_{ki}\ge \mu ,i=0,1,2,\cdots ,m\left(k\right)-1,\underset{i=0}{\overset{m\left(k\right)-1}{\sum }}{\mu }_{ki}=1.$

Let ${\alpha }_{k}\ge 0$ be bounded above and satisfy:

$\begin{array}{l}{F}^{\alpha }\left({x}_{k}+{\alpha }_{k}{d}_{k},{\epsilon }_{k+1}\right)\\ \le \mathrm{max}\left[{F}^{\alpha }\left({x}_{k},{\epsilon }_{k}\right),\underset{i=0}{\overset{m\left(k\right)-1}{\sum }}{\mu }_{ki}{F}^{\alpha }\left({x}_{k-i},{\epsilon }_{k-i}\right)\right]+\sigma {\alpha }_{k}{g}^{\alpha }{\left({x}_{k},{\epsilon }_{k}\right)}^{\text{T}}{d}_{k}.\end{array}$ (38)

If (38) does not hold, set ${\alpha }_{k}:=\beta {\alpha }_{k}$ and go to Step 4.

Step 6. Set $k:=k+1$ and go to Step 1.

It can be observed that the line search technique in step 5 of Algorithm 1 is a nonmonotone line search technique with some modifications.
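To make the procedure concrete, the following sketch (ours, not from the paper) runs Algorithm 1 on $f\left(x\right)={‖x‖}_{1}$ through its Moreau-Yosida envelope. Since the proximal point is available in closed form here, the tolerances ${\epsilon }_{k}$ are effectively zero, and we take the uniform weights ${\mu }_{ki}=1/m\left(k\right)$, so $\mu =1/M$, in (38).

```python
import numpy as np

def prox_F_grad(x, lam):
    """Exact Moreau-Yosida value F and gradient g for f = ||.||_1."""
    p = np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)  # proximal point p(x)
    F = np.abs(p).sum() + np.dot(p - x, p - x) / (2 * lam)
    return F, (x - p) / lam

def scaled_cg(x0, lam=0.5, tol=1e-6, sigma=1e-4, back=0.5, M=5, max_iter=2000):
    """Sketch of Algorithm 1 with mu_{ki} = 1/m(k) in the nonmonotone rule (38)."""
    x = x0.astype(float)
    F, g = prox_F_grad(x, lam)
    hist = [F]                                 # recent F-values for (38)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:            # Step 1
            break
        alpha = 1.0                            # Step 3
        ref = max(F, np.mean(hist[-M:]))       # max[ F_k, sum mu_ki F_{k-i} ]
        gd = g @ d
        while True:                            # Step 5: backtrack on (38)
            F_new, g_new = prox_F_grad(x + alpha * d, lam)
            if F_new <= ref + sigma * alpha * gd or alpha < 1e-12:
                break
            alpha *= back
        s = alpha * d
        y = g_new - g                                            # (28)
        t = (6 * (F - F_new) + 3 * (g_new + g) @ s) / (s @ s)    # (31)
        w = y + max(t, 0.0) * s                                  # (30)
        nd, nw = np.linalg.norm(d), np.linalg.norm(w)
        gn2 = g_new @ g_new
        if nw < 1e-14 or gn2 < tol ** 2:
            d_next = -g_new                    # degenerate case: restart
        else:                                  # direction (34)-(37)
            theta = 2 - (d @ g_new) / gn2 * (g_new @ w) / (nd * nw)
            beta = (g_new @ w) / (nd * nw + abs(d @ y))
            vth = (d @ g_new) / (nd * nw)
            d_next = -theta * g_new + beta * d - vth * w
        x, F, g, d = x + s, F_new, g_new, d_next
        hist.append(F)
    return x, F, g

x_star, F_star, g_star = scaled_cg(np.array([3.0, -2.0, 1.0, 0.5, -0.1]))
```

The restart guard is ours, not part of the stated algorithm; it simply falls back to the steepest-descent direction when ${w}_{k}^{\ast }$ or the gradient degenerates, which can only happen near a stationary point.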

Convergence Analysis

In this subsection, we establish the global convergence of our method for the nonsmooth convex problem (12). To prove the global convergence of Algorithm 1, the following lemmas are needed.

Lemma 1. Assume that the search direction ${d}_{k}$ is generated by Algorithm 1. Then for all $k\ge 0$, we have

${g}^{\alpha }{\left({x}_{k+1},{\epsilon }_{k+1}\right)}^{\text{T}}{d}_{k+1}\le -{‖{g}^{\alpha }\left({x}_{k+1},{\epsilon }_{k+1}\right)‖}^{2},$ (39)

and

$‖{d}_{k+1}‖\le 5‖{g}^{\alpha }\left({x}_{k+1},{\epsilon }_{k+1}\right)‖.$ (40)

Proof. If $k=0$, then

${g}^{\alpha }{\left({x}_{0},{\epsilon }_{0}\right)}^{\text{T}}{d}_{0}=-{g}^{\alpha }{\left({x}_{0},{\epsilon }_{0}\right)}^{\text{T}}{g}^{\alpha }\left({x}_{0},{\epsilon }_{0}\right)=-{‖{g}^{\alpha }\left({x}_{0},{\epsilon }_{0}\right)‖}^{2},$

and

$‖{d}_{0}‖=‖-{g}^{\alpha }\left({x}_{0},{\epsilon }_{0}\right)‖=‖{g}^{\alpha }\left({x}_{0},{\epsilon }_{0}\right)‖\le 5‖{g}^{\alpha }\left({x}_{0},{\epsilon }_{0}\right)‖.$

Let $k\ge 1$ and write ${g}_{k+1}={g}^{\alpha }\left({x}_{k+1},{\epsilon }_{k+1}\right)$ for brevity. From (34)-(37) we have

$\begin{aligned}{g}_{k+1}^{\text{T}}{d}_{k+1}&=-{\stackrel{¨}{\theta }}_{k+1}{‖{g}_{k+1}‖}^{2}+{\beta }_{k+1}{g}_{k+1}^{\text{T}}{d}_{k}-{\vartheta }_{k+1}{g}_{k+1}^{\text{T}}{w}_{k}^{\ast }\\ &=-2{‖{g}_{k+1}‖}^{2}+\frac{{d}_{k}^{\text{T}}{g}_{k+1}}{‖{d}_{k}‖‖{w}_{k}^{\ast }‖}{g}_{k+1}^{\text{T}}{w}_{k}^{\ast }+{\beta }_{k+1}{g}_{k+1}^{\text{T}}{d}_{k}-\frac{{d}_{k}^{\text{T}}{g}_{k+1}}{‖{d}_{k}‖‖{w}_{k}^{\ast }‖}{g}_{k+1}^{\text{T}}{w}_{k}^{\ast }\\ &=-2{‖{g}_{k+1}‖}^{2}+\frac{{g}_{k+1}^{\text{T}}{w}_{k}^{\ast }}{‖{d}_{k}‖‖{w}_{k}^{\ast }‖+|{d}_{k}^{\text{T}}{y}_{k}^{\ast }|}{g}_{k+1}^{\text{T}}{d}_{k}\\ &\le -2{‖{g}_{k+1}‖}^{2}+\frac{‖{g}_{k+1}‖‖{w}_{k}^{\ast }‖\cdot ‖{g}_{k+1}‖‖{d}_{k}‖}{‖{d}_{k}‖‖{w}_{k}^{\ast }‖}=-{‖{g}_{k+1}‖}^{2}.\end{aligned}$

Once more, (34) together with the bounds $|{\stackrel{¨}{\theta }}_{k+1}|\le 3$, $|{\beta }_{k+1}|‖{d}_{k}‖\le ‖{g}_{k+1}‖$, and $|{\vartheta }_{k+1}|‖{w}_{k}^{\ast }‖\le ‖{g}_{k+1}‖$, all of which follow from the Cauchy-Schwarz inequality, yields

$‖{d}_{k+1}‖\le |{\stackrel{¨}{\theta }}_{k+1}|‖{g}_{k+1}‖+|{\beta }_{k+1}|‖{d}_{k}‖+|{\vartheta }_{k+1}|‖{w}_{k}^{\ast }‖\le 3‖{g}_{k+1}‖+‖{g}_{k+1}‖+‖{g}_{k+1}‖=5‖{g}_{k+1}‖.$

Thus, the proof is completed.

Lemma 1 shows that the search direction ${d}_{k}$ developed in (34)-(37) satisfies the sufficient descent condition and belongs to a trust region.

Lemma 2. Let the step size ${\alpha }_{k}$ be generated by the line search (38) with backtracking factor $\beta \in \left(0,1\right)$, and let L denote the Lipschitz constant of the gradient of F. Then

${\alpha }_{k}\ge \mathrm{min}\left\{1,\frac{\left(1-\sigma \right)\beta }{L}\frac{|{g}^{\alpha }{\left({x}_{k},{\epsilon }_{k}\right)}^{\text{T}}{d}_{k}|}{{‖{d}_{k}‖}^{2}}\right\}.$ (41)

Proof. If ${\alpha }_{k}=1$ satisfies (38), then the proof is completed. Otherwise, the previous trial step ${\alpha }_{k}/\beta$ violates (38), i.e.,

$\begin{array}{l}{F}^{\alpha }\left({x}_{k}+\frac{{\alpha }_{k}}{\beta }{d}_{k},{\epsilon }_{k+1}\right)\\ >\mathrm{max}\left\{{F}^{\alpha }\left({x}_{k},{\epsilon }_{k}\right),\underset{i=0}{\overset{m\left(k\right)-1}{\sum }}{\mu }_{ki}{F}^{\alpha }\left({x}_{k-i},{\epsilon }_{k-i}\right)\right\}+\sigma \frac{{\alpha }_{k}}{\beta }{g}^{\alpha }{\left({x}_{k},{\epsilon }_{k}\right)}^{\text{T}}{d}_{k}\\ >{F}^{\alpha }\left({x}_{k},{\epsilon }_{k}\right)+\sigma \frac{{\alpha }_{k}}{\beta }{g}^{\alpha }{\left({x}_{k},{\epsilon }_{k}\right)}^{\text{T}}{d}_{k}.\end{array}$

Thus,

${F}^{\alpha }\left({x}_{k}+\frac{{\alpha }_{k}}{\beta }{d}_{k},{\epsilon }_{k+1}\right)-{F}^{\alpha }\left({x}_{k},{\epsilon }_{k}\right)>\sigma \frac{{\alpha }_{k}}{\beta }{g}^{\alpha }{\left({x}_{k},{\epsilon }_{k}\right)}^{\text{T}}{d}_{k}.$ (42)

Using the mean value theorem, for any $\alpha >0$ we have

$\begin{array}{l}{F}^{\alpha }\left({x}_{k}+\alpha {d}_{k},{\epsilon }_{k+1}\right)-{F}^{\alpha }\left({x}_{k},{\epsilon }_{k}\right)\\ ={\int }_{0}^{\alpha }{\left({g}^{\alpha }\left({x}_{k}+t{d}_{k},{\epsilon }_{k+1}\right)-{g}^{\alpha }\left({x}_{k},{\epsilon }_{k}\right)\right)}^{\text{T}}{d}_{k}\text{d}t+\alpha {g}^{\alpha }{\left({x}_{k},{\epsilon }_{k}\right)}^{\text{T}}{d}_{k}\\ \le \frac{1}{2}L{\alpha }^{2}{‖{d}_{k}‖}^{2}+\alpha {g}^{\alpha }{\left({x}_{k},{\epsilon }_{k}\right)}^{\text{T}}{d}_{k}.\end{array}$

Combining the above inequality with (42), we have

${\alpha }_{k}\ge \mathrm{min}\left\{1,\frac{\left(1-\sigma \right)\beta }{L}\frac{|{g}^{\alpha }{\left({x}_{k},{\epsilon }_{k}\right)}^{\text{T}}{d}_{k}|}{{‖{d}_{k}‖}^{2}}\right\}.$

Thus, the proof is completed.

Lemma 3. Assume that the sequence $\left\{{x}_{k}\right\}$ is generated by Algorithm 1. Then we have

$\begin{array}{l}{F}^{\alpha }\left({x}_{k},{\epsilon }_{k}\right)\\ \le {F}^{\alpha }\left({x}_{0},{\epsilon }_{0}\right)+\mu \sigma \underset{i=0}{\overset{k-2}{\sum }}\text{ }{\alpha }_{i}{g}^{\alpha }{\left({x}_{i},{\epsilon }_{i}\right)}^{\text{T}}{d}_{i}+\sigma {\alpha }_{k-1}{g}^{\alpha }{\left({x}_{k-1},{\epsilon }_{k-1}\right)}^{\text{T}}{d}_{k-1}\\ \le {F}^{\alpha }\left({x}_{0},{\epsilon }_{0}\right)+\mu \sigma \underset{i=0}{\overset{k-1}{\sum }}\text{ }{\alpha }_{i}{g}^{\alpha }{\left({x}_{i},{\epsilon }_{i}\right)}^{\text{T}}{d}_{i}.\end{array}$

Proof. We prove this lemma by induction. For $k=1$, by (38) and $\mu \le 1$, we have

$\begin{array}{c}{F}^{\alpha }\left({x}_{1},{\epsilon }_{1}\right)\le {F}^{\alpha }\left({x}_{0},{\epsilon }_{0}\right)+\sigma {\alpha }_{0}{g}^{\alpha }{\left({x}_{0},{\epsilon }_{0}\right)}^{\text{T}}{d}_{0}\\ \le {F}^{\alpha }\left({x}_{0},{\epsilon }_{0}\right)+\mu \sigma {\alpha }_{0}{g}^{\alpha }{\left({x}_{0},{\epsilon }_{0}\right)}^{\text{T}}{d}_{0}.\end{array}$

Assume the inequality holds for $1,2,\cdots ,k$; we show it for $k+1$ by considering two cases.

Case 1:

$\mathrm{max}\left[{F}^{\alpha }\left({x}_{k},{\epsilon }_{k}\right),\underset{i=0}{\overset{m\left(k\right)-1}{\sum }}{\mu }_{ki}{F}^{\alpha }\left({x}_{k-i},{\epsilon }_{k-i}\right)\right]={F}^{\alpha }\left({x}_{k},{\epsilon }_{k}\right).$

Then, from (38), we have

$\begin{array}{c}{F}^{\alpha }\left({x}_{k+1},{\epsilon }_{k+1}\right)={F}^{\alpha }\left({x}_{k}+{\alpha }_{k}{d}_{k},{\epsilon }_{k+1}\right)\\ \le {F}^{\alpha }\left({x}_{k},{\epsilon }_{k}\right)+\sigma {\alpha }_{k}{g}^{\alpha }{\left({x}_{k},{\epsilon }_{k}\right)}^{\text{T}}{d}_{k}\\ \le {F}^{\alpha }\left({x}_{0},{\epsilon }_{0}\right)+\mu \sigma \underset{i=0}{\overset{k-1}{\sum }}\text{ }\text{ }{\alpha }_{i}{g}^{\alpha }{\left({x}_{i},{\epsilon }_{i}\right)}^{\text{T}}{d}_{i}+\sigma {\alpha }_{k}{g}^{\alpha }{\left({x}_{k},{\epsilon }_{k}\right)}^{\text{T}}{d}_{k}\\ \le {F}^{\alpha }\left({x}_{0},{\epsilon }_{0}\right)+\mu \sigma \underset{i=0}{\overset{k}{\sum }}\text{ }\text{ }{\alpha }_{i}{g}^{\alpha }{\left({x}_{i},{\epsilon }_{i}\right)}^{\text{T}}{d}_{i}.\end{array}$

Case 2:

$\mathrm{max}\left[{F}^{\alpha }\left({x}_{k},{\epsilon }_{k}\right),\underset{i=0}{\overset{m\left(k\right)-1}{\sum }}{\mu }_{ki}{F}^{\alpha }\left({x}_{k-i},{\epsilon }_{k-i}\right)\right]=\underset{i=0}{\overset{m\left(k\right)-1}{\sum }}{\mu }_{ki}{F}^{\alpha }\left({x}_{k-i},{\epsilon }_{k-i}\right),$

let $n=\mathrm{min}\left\{k,M-1\right\}$. Then, again from (38),

$\begin{array}{c}{F}^{\alpha }\left({x}_{k+1},{\epsilon }_{k+1}\right)={F}^{\alpha }\left({x}_{k}+{\alpha }_{k}{d}_{k},{\epsilon }_{k+1}\right)\\ \le \underset{j=0}{\overset{n}{\sum }}\text{ }\text{ }{\mu }_{kj}{F}^{\alpha }\left({x}_{k-j},{\epsilon }_{k-j}\right)+\sigma {\alpha }_{k}{g}^{\alpha }{\left({x}_{k},{\epsilon }_{k}\right)}^{\text{T}}{d}_{k}\\ \le \underset{j=0}{\overset{n}{\sum }}\text{ }\text{ }{\mu }_{kj}\left[{F}^{\alpha }\left({x}_{0},{\epsilon }_{0}\right)+\mu \sigma \underset{i=0}{\overset{k-j-2}{\sum }}{\alpha }_{i}{g}^{\alpha }{\left({x}_{i},{\epsilon }_{i}\right)}^{\text{T}}{d}_{i}\\ \begin{array}{c}\text{ }\\ \text{ }\end{array}+\sigma {\alpha }_{k-j-1}{g}^{\alpha }{\left({x}_{k-j-1},{\epsilon }_{k-j-1}\right)}^{\text{T}}{d}_{k-j-1}\right]+\sigma {\alpha }_{k}{g}^{\alpha }{\left({x}_{k},{\epsilon }_{k}\right)}^{\text{T}}{d}_{k}.\end{array}$

Thus, since $k-j-2\ge k-n-2$ for all $0\le j\le n$, each summand is nonpositive, and

$\underset{j=0}{\overset{n}{\sum }}\text{ }{\mu }_{kj}=1,\text{\hspace{0.17em}}{\mu }_{kj}\ge \mu ,$

we have

$\begin{array}{l}{F}^{\alpha }\left({x}_{k+1},{\epsilon }_{k+1}\right)\\ \le {F}^{\alpha }\left({x}_{0},{\epsilon }_{0}\right)+\mu \sigma \underset{i=0}{\overset{k-n-2}{\sum }}\left(\underset{j=0}{\overset{n}{\sum }}\text{ }{\mu }_{kj}\right){\alpha }_{i}{g}^{\alpha }{\left({x}_{i},{\epsilon }_{i}\right)}^{\text{T}}{d}_{i}\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }+\sigma \underset{j=0}{\overset{n}{\sum }}\text{ }{\mu }_{kj}{\alpha }_{k-j-1}{g}^{\alpha }{\left({x}_{k-j-1},{\epsilon }_{k-j-1}\right)}^{\text{T}}{d}_{k-j-1}+\sigma {\alpha }_{k}{g}^{\alpha }{\left({x}_{k},{\epsilon }_{k}\right)}^{\text{T}}{d}_{k}\end{array}$

$\begin{array}{l}\le {F}^{\alpha }\left({x}_{0},{\epsilon }_{0}\right)+\mu \sigma \underset{i=0}{\overset{k-n-2}{\sum }}{\alpha }_{i}{g}^{\alpha }{\left({x}_{i},{\epsilon }_{i}\right)}^{\text{T}}{d}_{i}\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{ }+\mu \sigma \underset{i=k-n-1}{\overset{k-1}{\sum }}{\alpha }_{i}{g}^{\alpha }{\left({x}_{i},{\epsilon }_{i}\right)}^{\text{T}}{d}_{i}+\sigma {\alpha }_{k}{g}^{\alpha }{\left({x}_{k},{\epsilon }_{k}\right)}^{\text{T}}{d}_{k}\\ ={F}^{\alpha }\left({x}_{0},{\epsilon }_{0}\right)+\mu \sigma \underset{i=0}{\overset{k-1}{\sum }}\text{ }{\alpha }_{i}{g}^{\alpha }{\left({x}_{i},{\epsilon }_{i}\right)}^{\text{T}}{d}_{i}+\sigma {\alpha }_{k}{g}^{\alpha }{\left({x}_{k},{\epsilon }_{k}\right)}^{\text{T}}{d}_{k}\\ \le {F}^{\alpha }\left({x}_{0},{\epsilon }_{0}\right)+\mu \sigma \underset{i=0}{\overset{k}{\sum }}\text{ }{\alpha }_{i}{g}^{\alpha }{\left({x}_{i},{\epsilon }_{i}\right)}^{\text{T}}{d}_{i}.\end{array}$

Thus, the proof is completed.

Theorem 1. Assume that the sequences $\left\{{x}_{k}\right\}$ and $\left\{{d}_{k}\right\}$ are generated by Algorithm 1, that F is bounded below on the level set ${\mathcal{L}}_{0}=\left\{x\in {R}^{n}|F\left(x\right)\le F\left({x}_{0}\right)\right\}$, and that

$\underset{k\to \infty }{\mathrm{lim}}{\epsilon }_{k}=0.$

Then

$\underset{k\to \infty }{\mathrm{lim}}{g}^{\alpha }{\left({x}_{k},{\epsilon }_{k}\right)}^{\text{T}}{d}_{k}=0.$ (43)

Proof. Suppose that (43) is not true. Then there exist constants $\gamma >0$ and ${k}_{0}$ such that

${g}^{\alpha }{\left({x}_{k},{\epsilon }_{k}\right)}^{\text{T}}{d}_{k}\le -\gamma ,\forall k>{k}_{0}.$ (44)

From Lemma 3, we have

${F}^{\alpha }\left({x}_{0},{\epsilon }_{0}\right)-{F}^{\alpha }\left({x}_{k},{\epsilon }_{k}\right)\ge -\mu \sigma \underset{i=0}{\overset{k-1}{\sum }}\text{ }\text{ }{\alpha }_{i}{g}^{\alpha }{\left({x}_{i},{\epsilon }_{i}\right)}^{\text{T}}{d}_{i}.$ (45)

By (40), (41) and (44), we have

$\begin{array}{l}{F}^{\alpha }\left({x}_{0},{\epsilon }_{0}\right)-{F}^{\alpha }\left({x}_{k},{\epsilon }_{k}\right)\\ \ge -\mu \sigma \underset{i=0}{\overset{k-1}{\sum }}\text{ }\text{ }{\alpha }_{i}{g}^{\alpha }{\left({x}_{i},{\epsilon }_{i}\right)}^{\text{T}}{d}_{i}\ge \mu \sigma \gamma \underset{i=0}{\overset{k-1}{\sum }}\text{ }\text{ }{\alpha }_{i}\\ \ge \mu \sigma \gamma \underset{i=0}{\overset{k-1}{\sum }}\mathrm{min}\left\{1,\frac{\left(1-\sigma \right)\beta }{L}\frac{|{g}^{\alpha }{\left({x}_{i},{\epsilon }_{i}\right)}^{\text{T}}{d}_{i}|}{{‖{d}_{i}‖}^{2}}\right\}\\ \ge \mu \sigma \gamma \underset{i=0}{\overset{k-1}{\sum }}\mathrm{min}\left\{1,\frac{\left(1-\sigma \right)\beta }{25L}\right\}.\end{array}$

Letting $k\to \infty$, the right-hand side diverges, since

$\mu \sigma \gamma \underset{i=0}{\overset{\infty }{\sum }}\mathrm{min}\left\{1,\frac{\left(1-\sigma \right)\beta }{25L}\right\}=\infty ,$

whereas the left-hand side ${F}^{\alpha }\left({x}_{0},{\epsilon }_{0}\right)-{F}^{\alpha }\left({x}_{k},{\epsilon }_{k}\right)$ remains bounded above, because F is bounded below on ${\mathcal{L}}_{0}$. This contradiction establishes (43), and the theorem is proved.

Theorem 2. Let the conditions of Lemma 1 and Theorem 1 hold. Then Algorithm 1 converges for the nonsmooth problem (12).

Proof. From Lemma 1 and Theorem 1, we have

$0\ge \underset{k\to \infty }{\mathrm{lim}}\left(-{‖{g}^{\alpha }\left({x}_{k},{\epsilon }_{k}\right)‖}^{2}\right)\ge \underset{k\to \infty }{\mathrm{lim}}{g}^{\alpha }{\left({x}_{k},{\epsilon }_{k}\right)}^{\text{T}}{d}_{k}=0.$

Then,

$\underset{k\to \infty }{\mathrm{lim}}‖{g}^{\alpha }\left({x}_{k},{\epsilon }_{k}\right)‖=0.$ (46)

Thus, (23) and the convergence of the sequence $\left\{{\epsilon }_{k}\right\}$ to zero yield

$0\le \underset{k\to \infty }{\mathrm{lim}}‖{g}^{\alpha }\left({x}_{k},{\epsilon }_{k}\right)-g\left({x}_{k}\right)‖\le \underset{k\to \infty }{\mathrm{lim}}\sqrt{2{\epsilon }_{k}/\lambda }=0.$

Hence,

$\underset{k\to \infty }{\mathrm{lim}}‖g\left({x}_{k}\right)‖=0.$ (47)

Let ${x}^{\ast }$ be an accumulation point of $\left\{{x}_{k}\right\}$. Then there exists a subsequence ${\left\{{x}_{k}\right\}}_{K}$ satisfying

$\underset{k\in K,k\to \infty }{\mathrm{lim}}{x}_{k}={x}^{\ast }.$ (48)

Thus, (17), (43) and (47) yield ${x}^{\ast }=p\left({x}^{\ast }\right)$. Therefore ${x}^{\ast }$ is an optimal solution of the nonsmooth problem (12).

4. Numerical Experiments for Large Scale Nonsmooth Problems

In this section, we present numerical experiments to examine the efficiency of Algorithm 1 on some large scale nonsmooth academic test problems introduced in . The details of these test problems, together with their initial points ${x}_{i}^{\left(1\right)}$ and minimum values $f\left({x}^{\ast }\right)$, are listed as follows:

Problem 1

$f\left(x\right)=\underset{1\le i\le n}{\mathrm{max}}{x}_{i}^{2}$

${x}_{i}^{\left(1\right)}=i$ for $i=1,\cdots ,n/2$ and

${x}_{i}^{\left(1\right)}=-i$ for $i=n/2+1,\cdots ,n$

$f\left({x}^{\ast }\right)=0.$

Problem 2

$f\left(x\right)=\underset{1\le i\le n}{\mathrm{max}}|\underset{j=1}{\overset{n}{\sum }}\frac{{x}_{j}}{i+j-1}|$

${x}_{i}^{\left(1\right)}=i$ for $i=1,\cdots ,n$.

$f\left({x}^{\ast }\right)=0.$

Problem 3

$f\left(x\right)=\underset{i=1}{\overset{n-1}{\sum }}\mathrm{max}\left\{-{x}_{i}-{x}_{i+1},-{x}_{i}-{x}_{i+1}+\left({x}_{i}^{2}+{x}_{i+1}^{2}-1\right)\right\}$

${x}_{i}^{\left(1\right)}=-0.5$ for $i=1,\cdots ,n$ ;

$f\left({x}^{\ast }\right)=-\sqrt{2}\left(n-1\right).$

Problem 4

$f\left(x\right)=\underset{i=1}{\overset{n-1}{\sum }}\mathrm{max}\left\{{x}_{i}^{4}+{x}_{i+1}^{2},{\left(2-{x}_{i}\right)}^{2}+{\left(2-{x}_{i+1}\right)}^{2},2{\text{e}}^{-{x}_{i}+{x}_{i+1}}\right\}$

${x}_{i}^{\left(1\right)}=2$ for $i=1,\cdots ,n$ ;

$f\left({x}^{\ast }\right)=2\left(n-1\right).$

Problem 5

$f\left(x\right)=\mathrm{max}\left\{\underset{i=1}{\overset{n-1}{\sum }}\left({x}_{i}^{4}+{x}_{i+1}^{2}\right),\underset{i=1}{\overset{n-1}{\sum }}\left({\left(2-{x}_{i}\right)}^{2}+{\left(2-{x}_{i+1}\right)}^{2}\right),\underset{i=1}{\overset{n-1}{\sum }}\left(2{\text{e}}^{-{x}_{i}+{x}_{i+1}}\right)\right\}$

${x}_{i}^{\left(1\right)}=2$ for $i=1,\cdots ,n$ ;

$f\left({x}^{\ast }\right)=2\left(n-1\right).$

Problem 6

$f\left(x\right)=\underset{1\le i\le n}{\mathrm{max}}\left\{g\left(-\underset{j=1}{\overset{n}{\sum }}\text{ }\text{ }{x}_{j}\right),g\left({x}_{i}\right)\right\}$,

where $g\left(y\right)=\mathrm{ln}\left(|y|+1\right)$ ;

${x}_{i}^{\left(1\right)}=1$ for $i=1,\cdots ,n$ ;

$f\left({x}^{\ast }\right)=0.$

Problem 7

$f\left(x\right)=\underset{i=1}{\overset{n-1}{\sum }}\left({|{x}_{i}|}^{{x}_{i+1}^{2}+1}+{|{x}_{i+1}|}^{{x}_{i}^{2}+1}\right)$

${x}_{i}^{\left(1\right)}=-1$ when $\mathrm{mod}\left(i,2\right)=1,\left(i=1,\cdots ,n\right)$ and

${x}_{i}^{\left(1\right)}=1$ when $\mathrm{mod}\left(i,2\right)=0,\left(i=1,\cdots ,n\right)$ ;

$f\left({x}^{\ast }\right)=0.$

Problem 8

$f\left(x\right)=\underset{i=1}{\overset{n-1}{\sum }}\left(-{x}_{i}+2\left({x}_{i}^{2}+{x}_{i+1}^{2}-1\right)+1.75|{x}_{i}^{2}+{x}_{i+1}^{2}-1|\right)$

${x}_{i}^{\left(1\right)}=-1$ for $i=1,\cdots ,n$ ;

$f\left({x}^{\ast }\right)$ varies with $n$.

Problem 9

$f\left(x\right)=\mathrm{max}\left\{\underset{i=1}{\overset{n-1}{\sum }}\left({x}_{i}^{2}+{\left({x}_{i+1}-1\right)}^{2}+{x}_{i+1}-1\right),\underset{i=1}{\overset{n-1}{\sum }}\left(-{x}_{i}^{2}-{\left({x}_{i+1}-1\right)}^{2}+{x}_{i+1}+1\right)\right\}$

${x}_{i}^{\left(1\right)}=-1.5$ when $\mathrm{mod}\left(i,2\right)=1,\left(i=1,\cdots ,n\right)$ and

${x}_{i}^{\left(1\right)}=2.0$ when $\mathrm{mod}\left(i,2\right)=0,\left(i=1,\cdots ,n\right)$ ;

$f\left({x}^{\ast }\right)=0.$

Problem 10

$f\left(x\right)=\underset{i=1}{\overset{n-1}{\sum }}\mathrm{max}\left\{{x}_{i}^{2}+{\left({x}_{i+1}-1\right)}^{2}+{x}_{i+1}-1,-{x}_{i}^{2}-{\left({x}_{i+1}-1\right)}^{2}+{x}_{i+1}+1\right\}$

${x}_{i}^{\left(1\right)}=-1.5$ when $\mathrm{mod}\left(i,2\right)=1,\left(i=1,\cdots ,n\right)$ and

${x}_{i}^{\left(1\right)}=2.0$ when $\mathrm{mod}\left(i,2\right)=0,\left(i=1,\cdots ,n\right)$ ;

$f\left({x}^{\ast }\right)=0.$
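As an illustration, Problems 1 and 3 above can be coded directly. The following is a minimal NumPy sketch (the function names p1, p1_x0 and p3 are ours for illustration; the experiments reported in this section were run in Fortran 90, which is not reproduced here):

```python
import numpy as np

def p1(x):
    """Problem 1: f(x) = max_i x_i^2, with f(x*) = 0 at x* = 0."""
    return np.max(x ** 2)

def p1_x0(n):
    """Initial point of Problem 1: x_i = i for i <= n/2, x_i = -i otherwise."""
    x = np.arange(1, n + 1, dtype=float)
    x[n // 2:] *= -1.0
    return x

def p3(x):
    """Problem 3 (chained LQ):
    sum_i max{-x_i - x_{i+1}, -x_i - x_{i+1} + (x_i^2 + x_{i+1}^2 - 1)}."""
    a = -x[:-1] - x[1:]                       # first branch of the max
    b = a + (x[:-1] ** 2 + x[1:] ** 2 - 1.0)  # second branch
    return np.sum(np.maximum(a, b))
```

Evaluating p3 at the point with all components equal to $1/\sqrt{2}$ makes both branches of the max coincide and returns $-\sqrt{2}\left(n-1\right)$, the reported minimum value.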

Problems 1 - 5 are convex, and the others are nonconvex. We test the above problems with dimensions $n=1000$, $n=3000$, $n=5000$, $n=6000$, $n=10000$, $n=12000$, $n=20000$, $n=50000$, $n=60000$ and $n=100000$. For convenience, we denote Algorithm 1 by SCG-MBFGS (scaled conjugate gradient method based on the modified secant equation of the BFGS method), and, to demonstrate the validity of our algorithm, we also list the results of three other algorithms: MPRP in , MHS in  and MSBFGS-CG in . All algorithms were implemented in Fortran 90 and run on a PC with an Intel(R) Core(TM) i3-3110M CPU at 2.40 GHz, 4.00 GB of RAM, and the Windows 7 operating system. We stopped the iteration when the condition $‖{g}^{\alpha }\left({x}_{k},{\epsilon }_{k}\right)‖\le {10}^{-10}$ was satisfied. The parameters for SCG-MBFGS were chosen as $M=10$, $\beta =0.6$, $\sigma =0.85$, $\lambda =\mu =1$. All parameters for the other three methods were chosen as in   and , respectively. Table 1 shows the numerical results of SCG-MBFGS, MPRP, MHS and MSBFGS-CG on the given test problems. The columns in Table 1 have the following meanings:

Dim: the dimension of the problem.

NI: the total number of iterations.

NF: the number of function evaluations.

TIME: the CPU time in seconds.

$f\left(x\right)$ : the value of $f\left(x\right)$ at the final iteration.

Table 1. Numerical results for 10 problems with given initial points and dimensions.

From the numerical results in Table 1, it is not difficult to see that SCG-MBFGS is superior or competitive with the other three methods on the given problems in terms of the number of iterations, the number of function evaluations and the CPU time. Furthermore, to illustrate the performance of our method more directly, we employed the tool provided by Dolan and Moré  to analyze and compare the efficiency of the methods in terms of the number of iterations, the number of function evaluations and the CPU time. Figures 1-3 present the corresponding computational performance profiles of the above algorithms, respectively.

From Figure 1 and Figure 2, we also notice that SCG-MBFGS performs better than the other methods in terms of the numbers of iterations and function evaluations. Figure 3 indicates that MHS is comparable to SCG-MBFGS in terms of CPU time; since the search direction of MHS is built from only first order information, while SCG-MBFGS, MPRP and MSBFGS-CG incorporate second order information, it is reasonable that MHS requires less CPU time per iteration.
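For a given performance measure (NI, NF or CPU time), the Dolan-Moré profile assigns to each solver, for every threshold $\tau \ge 1$, the fraction of problems on which its cost is within a factor $\tau$ of the best solver. A minimal sketch of this computation (the helper name performance_profile is ours for illustration):

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More performance profile (sketch).

    T    : (n_problems, n_solvers) cost matrix, e.g. CPU times;
           np.inf marks a failure on that problem.
    taus : iterable of thresholds tau >= 1.
    Returns rho with rho[t, s] = fraction of problems on which
    solver s has performance ratio r_ps <= taus[t]."""
    ratios = T / np.min(T, axis=1, keepdims=True)   # r_ps = t_ps / min_s t_ps
    return np.array([[np.mean(ratios[:, s] <= tau)
                      for s in range(T.shape[1])] for tau in taus])
```

The solver whose curve is highest at $\tau =1$ wins most often, while curves that approach 1 as $\tau$ grows indicate robustness; this is the reading used for Figures 1-3.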

Figure 1. Performance profiles of the four methods based on NI.

Figure 2. Performance profiles of the four methods based on NF.

Figure 3. Performance profiles of the four methods based on CPU time.

5. Conclusion

In this paper, we propose a new scaled conjugate gradient method which incorporates a modified secant equation of the BFGS method. This modified secant equation contains both function value and gradient information of the objective function, and its Hessian approximation update generates positive definite matrices. Under a modified nonmonotone line search and some mild conditions, the strong global convergence of the proposed method is established for nonsmooth convex problems. The search direction of our new method satisfies the sufficient descent condition and belongs to a trust region. Compared with existing nonsmooth CG methods, the search direction of our approach possesses stronger descent properties. Numerical results and related comparisons show that the proposed method is effective for solving large scale nonsmooth optimization problems.

Acknowledgements

The authors would like to thank the reviewers and the editor for their valuable comments, which greatly improved our paper. This work is supported by the National Natural Science Foundation of China [Grant No. 11771003].

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

Broyden, C.G. (1970) The Convergence of a Class of Double-Rank Minimization Algorithms: I. General Considerations. Journal of the Institute of Mathematics and Its Applications, 6, 76-90. https://doi.org/10.1093/imamat/6.1.76
Fletcher, R. (1970) A New Approach to Variable Metric Algorithms. The Computer Journal, 13, 317-322. https://doi.org/10.1093/comjnl/13.3.317
Goldfarb, D. (1970) A Family of Variable Metric Methods Derived by Variational Means. Mathematics of Computation, 24, 23-26. https://doi.org/10.1090/S0025-5718-1970-0258249-6
Shanno, D.F. (1970) Conditioning of Quasi-Newton Methods for Function Minimization. Mathematics of Computation, 24, 647-656. https://doi.org/10.1090/S0025-5718-1970-0274029-X
Hestenes, M.R. and Stiefel, E. (1952) Methods of Conjugate Gradients for Solving Linear Systems. Journal of Research of the National Bureau of Standards, 49, 409-436. https://doi.org/10.6028/jres.049.044
Fletcher, R. and Reeves, C.M. (1964) Function Minimization by Conjugate Gradients. The Computer Journal, 7, 149-154. https://doi.org/10.1093/comjnl/7.2.149
Polyak, B.T. (1969) The Conjugate Gradient Method in Extreme Problems. USSR Computational Mathematics and Mathematical Physics, 9, 94-112. https://doi.org/10.1016/0041-5553(69)90035-4
Dai, Y.H. and Yuan, Y. (1999) A Nonlinear Conjugate Gradient Method with a Strong Global Convergence Property. SIAM Journal on Optimization, 10, 177-182. https://doi.org/10.1137/S1052623497318992
Hager, W.W. and Zhang, H. (2006) A Survey of Nonlinear Conjugate Gradient Methods. Pacific Journal of Optimization, 2, 35-58.
Sun, W.Y. and Yuan, Y.X. (2006) Optimization Theory and Methods: Nonlinear Programming. Springer, New York.
Nocedal, J. (1992) Theory of Algorithms for Unconstrained Optimization. Acta Numerica, 1, 199-242. https://doi.org/10.1017/S0962492900002270
Dennis, J.E. and Moré, J.J. (1974) A Characterization of Superlinear Convergence and Its Application to Quasi-Newton Methods. Mathematics of Computation, 28, 549-560. https://doi.org/10.1090/S0025-5718-1974-0343581-1
Byrd, R., Nocedal, J. and Yuan, Y. (1987) Global Convergence of a Class of Quasi-Newton Methods on Convex Problems. SIAM Journal on Numerical Analysis, 24, 1171-1189. https://doi.org/10.1137/0724077
Byrd, R. and Nocedal, J. (1989) A Tool for the Analysis of Quasi-Newton Methods with Application to Unconstrained Minimization. SIAM Journal on Numerical Analysis, 26, 727-739. https://doi.org/10.1137/0726042
Griewank, A. (1991) The Global Convergence of Partitioned BFGS on Problems with Convex Decompositions and Lipschitzian Gradients. Mathematical Programming, 50, 141-175. https://doi.org/10.1007/BF01594933
Mascarenhas, W.F. (2004) The BFGS Method with Exact Line Searches Fails for Non-Convex Objective Functions. Mathematical Programming, 99, 49-61. https://doi.org/10.1007/s10107-003-0421-7
Dai, Y.H. (2002) Convergence Properties of the BFGS Algorithm. SIAM Journal on Optimization, 13, 693-701. https://doi.org/10.1137/S1052623401383455
Wolfe, P. (1969) Convergence Conditions for Ascent Methods. SIAM Review, 11, 226-235. https://doi.org/10.1137/1011036
Grippo, L., Lampariello, F. and Lucidi, S. (1986) A Nonmonotone Line Search Technique for Newton's Method. SIAM Journal on Numerical Analysis, 23, 707-716. https://doi.org/10.1137/0723046
Toint, P.L. (1996) An Assessment of Nonmonotone Line Search Techniques for Unconstrained Optimization. SIAM Journal on Scientific Computing, 17, 725-739. https://doi.org/10.1137/S106482759427021X
Toint, P.L. (1997) A Nonmonotone Trust Region Algorithm for Nonlinear Programming Subject to Convex Constraints. Mathematical Programming, 77, 69-94. https://doi.org/10.1007/BF02614518
Panier, E.R. and Tits, A.L. (1991) Avoiding the Maratos Effect by Means of a Nonmonotone Line Search I: General Constrained Problems. SIAM Journal on Numerical Analysis, 28, 1183-1190. https://doi.org/10.1137/0728063
Yu, Z. and Pu, D. (2008) A New Nonmonotone Line Search Technique for Unconstrained Optimization. Journal of Computational and Applied Mathematics, 219, 134-144. https://doi.org/10.1016/j.cam.2007.07.008
Yuan, G., Wei, Z. and Wu, Y. (2010) Modified Limited Memory BFGS Method with Nonmonotone Line Search for Unconstrained Optimization Problems. Journal of the Korean Mathematical Society, 47, 767-788. https://doi.org/10.4134/JKMS.2010.47.4.767
Li, X., Wang, B. and Hu, W. (2017) A Modified Nonmonotone BFGS Algorithm for Unconstrained Optimization. Journal of Inequalities and Applications, 183, 1-18. https://doi.org/10.1186/s13660-017-1453-5
Su, K. and Rong, Z. (2015) A Spectral Conjugate Gradient Method under Modified Nonmonotone Line Search Technique. Mathematica Aeterna, 5, 537-549.
Andrei, N. (2007) Scaled Conjugate Gradient Algorithms for Unconstrained Optimization. Computational Optimization and Applications, 38, 401-416. https://doi.org/10.1007/s10589-007-9055-7
Babaie-Kafaki, S. (2013) A Modified Scaled Memoryless BFGS Preconditioned Conjugate Gradient Method for Unconstrained Optimization. 4OR, 11, 361-374. https://doi.org/10.1007/s10288-013-0233-4
Babaie-Kafaki, S. and Ghanbari, R. (2017) A Class of Descent Four-Term Extension of the Dai-Liao Conjugate Gradient Method Based on the Scaled Memoryless BFGS Update. Journal of Industrial & Management Optimization, 13, 649-658. https://doi.org/10.3934/jimo.2016038
Yuan, G., Wei, Z. and Li, G. (2014) A Modified Polak-Ribière-Polyak Conjugate Gradient Algorithm for Nonsmooth Convex Programs. Journal of Computational and Applied Mathematics, 255, 86-96. https://doi.org/10.1016/j.cam.2013.04.032
Yuan, G., Wei, Z. and Li, Y. (2015) A Modified Hestenes and Stiefel Conjugate Gradient Algorithm for Large-Scale Nonsmooth Minimizations and Nonlinear Equations. Journal of Optimization Theory and Applications, 168, 129-152. https://doi.org/10.1007/s10957-015-0781-1
Yuan, G. and Wei, Z. (2015) A Modified PRP Conjugate Gradient Algorithm with Nonmonotone Line Search for Nonsmooth Convex Optimization Problems. Journal of Applied Mathematics and Computing, 51, 397-412. https://doi.org/10.1007/s12190-015-0912-8
Yuan, G., Sheng, Z. and Liu, W. (2016) The Modified HZ Conjugate Gradient Algorithm for Large-Scale Nonsmooth Optimization. PLoS ONE, 11, e0164289. https://doi.org/10.1371/journal.pone.0164289
Yuan, G. and Wei, Z. (2012) The Barzilai and Borwein Gradient Method with Nonmonotone Line Search for Nonsmooth Convex Optimization Problems. Mathematical Modelling and Analysis, 17, 203-216. https://doi.org/10.3846/13926292.2012.661375
Cui, Z., Yuan, G., Sheng, Z., Liu, W., Wang, X. and Duan, X. (2015) A Modified BFGS Formula Using a Trust Region Model for Nonsmooth Convex Minimizations. PLoS ONE, 10, e0140606. https://doi.org/10.1371/journal.pone.0140606
Burke, J.V. and Qian, M. (2000) On the Superlinear Convergence of the Variable Metric Proximal Point Algorithm Using Broyden and BFGS Matrix Secant Updating. Mathematical Programming, 88, 157-181. https://doi.org/10.1007/PL00011373
Chen, X. and Fukushima, M. (1999) Proximal Quasi-Newton Methods for Nondifferentiable Convex Optimization. Mathematical Programming, 85, 313-334. https://doi.org/10.1007/s101070050059
Sagara, N. and Fukushima, M. (2005) A Trust Region Method for Nonsmooth Convex Optimization. Journal of Industrial & Management Optimization, 1, 171-180. https://doi.org/10.3934/jimo.2005.1.171
Ou, Y. and Zhou, X. (2018) A Modified Scaled Memoryless BFGS Preconditioned Conjugate Gradient Algorithm for Nonsmooth Convex Optimization. Journal of Industrial & Management Optimization, 14, 785-801. https://doi.org/10.3934/jimo.2017075
Hiriart-Urruty, J.B. and Lemaréchal, C. (1993) Convex Analysis and Minimization Algorithms. Springer, Berlin. https://doi.org/10.1007/978-3-662-02796-7
Fukushima, M. and Qi, L. (1996) A Global and Superlinearly Convergent Algorithm for Nonsmooth Convex Minimization. SIAM Journal on Optimization, 6, 1106-1120. https://doi.org/10.1137/S1052623494278839
Qi, L. and Sun, J. (1993) A Nonsmooth Version of Newton's Method. Mathematical Programming, 58, 353-367. https://doi.org/10.1007/BF01581275
Mifflin, R. (1996) A Quasi-Second-Order Proximal Bundle Algorithm. Mathematical Programming, 73, 51-72. https://doi.org/10.1007/BF02592098
Bonnans, J.F., Gilbert, J.C., Lemaréchal, C. and Sagastizábal, C. (1995) A Family of Variable-Metric Proximal Methods. Mathematical Programming, 68, 15-47. https://doi.org/10.1007/BF01585756
Lemaréchal, C. and Sagastizábal, C. (1997) Practical Aspects of the Moreau-Yosida Regularization: Theoretical Preliminaries. SIAM Journal on Optimization, 7, 367-385. https://doi.org/10.1137/S1052623494267127
Rauf, A.I. and Fukushima, M. (2000) A Globally Convergent BFGS Method for Nonsmooth Convex Optimization. Journal of Optimization Theory and Applications, 104, 539-558. https://doi.org/10.1023/A:1004633524446
Conn, A.R., Gould, N.I.M. and Toint, P.L. (2000) Trust Region Methods. SIAM, Philadelphia. https://doi.org/10.1137/1.9780898719857
Li, D.H. and Fukushima, M. (1999) On the Global Convergence of the BFGS Method for Nonconvex Unconstrained Optimization Problems. SIAM Journal on Optimization, 11, 1054-1064. https://doi.org/10.1137/S1052623499354242
Haarala, M. and Mäkelä, M.M. (2004) New Limited Memory Bundle Method for Large-Scale Nonsmooth Optimization. Optimization Methods and Software, 19, 673-692. https://doi.org/10.1080/10556780410001689225
Dolan, E.D. and Moré, J.J. (2002) Benchmarking Optimization Software with Performance Profiles. Mathematical Programming, 91, 201-213. https://doi.org/10.1007/s101070100263