New Parameter of CG-Method with Exact Line Search for Unconstrained Optimization

Abstract

In this paper, a new CG method is introduced for solving unconstrained nonlinear optimization problems. The method satisfies the descent condition and achieves global convergence under the exact line search. The numerical results compare favorably with other methods in terms of the number of iterations and the number of function evaluations.

Share and Cite:

Hady, M. and Younis, M. (2020) New Parameter of CG-Method with Exact Line Search for Unconstraint Optimization. Open Access Library Journal, 7, 1-8. doi: 10.4236/oalib.1106236

1. Introduction

The conjugate gradient (CG) method is one of the most important methods for finding the minimum value of a function in unconstrained optimization. It is widely used because its memory requirements are small. The unconstrained optimization problem can be expressed as follows:

${\mathrm{min}}_{x\in {R}^{n}}f\left(x\right)$ (1)

where $f:{R}^{n}\to R$ is a continuously differentiable function. The CG method generates iterates of the form:

${x}_{k+1}={x}_{k}+{\alpha }_{k}{d}_{k},\quad k=0,1,2,\dots$ (2)

where ${x}_{k}$ is the current iterate and ${\alpha }_{k}>0$ is the step size obtained by the exact line search:

${\alpha }_{k}=\mathrm{arg}{\mathrm{min}}_{\alpha >0}f\left({x}_{k}+\alpha {d}_{k}\right)$ (3)

and dk is the search direction, which we get as follows:

${d}_{k}=\begin{cases}-{g}_{k} & \text{for } k=0\\ -{g}_{k}+{\beta }_{k}{d}_{k-1} & \text{for } k\ge 1\end{cases}$ (4)

where $k$ is a nonnegative integer, ${g}_{k}$ is the gradient of the function $f\left(x\right)$ at the point ${x}_{k}$, and ${\beta }_{k}$ is the conjugate gradient coefficient associated with $f\left(x\right)$ at ${x}_{k}$.
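As a standard illustration not spelled out in the paper: for a strictly convex quadratic $f\left(x\right)=\frac{1}{2}{x}^{\text{T}}Gx-{b}^{\text{T}}x$ with $G$ symmetric positive definite, the exact line search (3) has a closed form, obtained by setting the derivative of $\varphi \left(\alpha \right)=f\left({x}_{k}+\alpha {d}_{k}\right)$ to zero:

```latex
\varphi'(\alpha) = d_k^{\mathrm{T}} \nabla f(x_k + \alpha d_k)
                 = d_k^{\mathrm{T}} \left( G x_k + \alpha\, G d_k - b \right)
                 = g_k^{\mathrm{T}} d_k + \alpha\, d_k^{\mathrm{T}} G d_k = 0
\quad \Longrightarrow \quad
\alpha_k = -\frac{g_k^{\mathrm{T}} d_k}{d_k^{\mathrm{T}} G d_k}.
```

For general $f$, the step ${\alpha }_{k}$ must instead be computed by a one-dimensional minimization.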

Some of the known conjugation methods are:

${\beta }_{k}^{FR}=\frac{{g}_{k}^{\text{T}}{g}_{k}}{{‖{g}_{k-1}‖}^{2}},\qquad {\beta }_{k}^{PR}=\frac{{g}_{k}^{\text{T}}\left({g}_{k}-{g}_{k-1}\right)}{{‖{g}_{k-1}‖}^{2}}$

${\beta }_{k}^{HS}=\frac{{g}_{k}^{\text{T}}\left({g}_{k}-{g}_{k-1}\right)}{{\left({g}_{k}-{g}_{k-1}\right)}^{\text{T}}{d}_{k-1}},\qquad {\beta }_{k}^{DY}=\frac{{g}_{k}^{\text{T}}{g}_{k}}{{\left({g}_{k}-{g}_{k-1}\right)}^{\text{T}}{d}_{k-1}}$

${\beta }_{k}^{LS}=\frac{{g}_{k}^{\text{T}}\left({g}_{k}-{g}_{k-1}\right)}{-{g}_{k-1}^{\text{T}}{d}_{k-1}},\qquad {\beta }_{k}^{CD}=-\frac{{g}_{k}^{\text{T}}{g}_{k}}{{d}_{k-1}^{\text{T}}{g}_{k-1}}$

The conjugate gradient coefficient ${\beta }_{k}\in R$ is a scalar whose choice distinguishes the different CG methods, where ${g}_{k-1},{g}_{k}$ denote the gradients of $f\left(x\right)$ at the points ${x}_{k-1},{x}_{k}$, respectively.

The above methods are known as:

Fletcher and Reeves (FR), Polak and Ribière (PR), Hestenes and Stiefel (HS), Dai and Yuan (DY), Liu and Storey (LS), and Conjugate Descent (CD) by Fletcher.
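As an illustration (not part of the paper), the classical coefficients above can be evaluated in a few lines of pure Python; the helper names `dot` and `classical_betas` are ours.

```python
# Hypothetical helper for evaluating the classical CG coefficients listed
# above, given the current gradient g, the previous gradient g_prev, and the
# previous search direction d_prev (all names are ours, not the paper's).

def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def classical_betas(g, g_prev, d_prev):
    y = [a - b for a, b in zip(g, g_prev)]  # y_{k-1} = g_k - g_{k-1}
    return {
        "FR": dot(g, g) / dot(g_prev, g_prev),
        "PR": dot(g, y) / dot(g_prev, g_prev),
        "HS": dot(g, y) / dot(y, d_prev),
        "DY": dot(g, g) / dot(y, d_prev),
        "LS": dot(g, y) / -dot(g_prev, d_prev),
        "CD": -dot(g, g) / dot(d_prev, g_prev),
    }

# With an exact line search and a steepest-descent previous step
# (d_prev = -g_prev, so g^T d_prev = 0), all six formulas coincide:
# here every coefficient equals 0.25.
betas = classical_betas([1.0, 0.0], [0.0, 2.0], [0.0, -2.0])
```

The six formulas produce the same directions on strictly convex quadratics under exact line search; they differ on general functions, which is what motivates new choices of ${\beta }_{k}$.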

These methods behave quite differently on strictly convex quadratic functions than on general non-quadratic functions. Most studies of these methods examine their global convergence properties. In recent years, many efforts have been directed towards building new CG formulas that combine good numerical performance with global convergence.

2. The New Conjugate Gradient and Its Algorithm

It is well known that numerical optimization methods are iterative and that no single method suits all types of problems: each method has its own strengths that make it efficient on some classes of problems and weaknesses that make it inefficient on others.

The new coefficient of gradient is

${\beta }_{k}^{ME}=\frac{{g}_{k}^{\text{T}}{g}_{k}}{{\left({g}_{k}+{d}_{k-1}\right)}^{\text{T}}{d}_{k-1}}$ (5)

New method algorithm

Step (1): Set $ϵ>0$, ${d}_{0}=-{g}_{0}$, $k=0$, and choose an initial point ${x}_{0}$.

Step (2): Compute ${\beta }_{k}^{ME}$ from (5).

Step (3): Compute ${d}_{k}=-{g}_{k}+{\beta }_{k}^{ME}{d}_{k-1}$. If $‖{g}_{k}‖=0$, stop.

Step (4): Compute ${\alpha }_{k}=\mathrm{arg}{\mathrm{min}}_{\alpha >0}f\left({x}_{k}+\alpha {d}_{k}\right)$.

Step (5): Compute the new point with the following iterative formula:

${x}_{k+1}={x}_{k}+{\alpha }_{k}{d}_{k}$ (6)

Step (6): If $f\left({x}_{k+1}\right)<f\left({x}_{k}\right)$ and $‖{g}_{k+1}‖\le ϵ$, stop. Otherwise, set $k=k+1$ and go to Step (2).
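The algorithm above can be sketched in pure Python for the special case of a strictly convex quadratic $f\left(x\right)=\frac{1}{2}{x}^{\text{T}}Gx-{b}^{\text{T}}x$, where the exact line search in Step (4) has the closed form ${\alpha }_{k}=-{g}_{k}^{\text{T}}{d}_{k}/{d}_{k}^{\text{T}}G{d}_{k}$. This is a minimal sketch under that assumption, not the authors' implementation; all names are ours.

```python
# Minimal sketch of the new CG algorithm with beta^ME (Eq. (5)) on a strictly
# convex quadratic f(x) = 0.5 x^T G x - b^T x, whose gradient is G x - b and
# whose exact line search step is alpha = -g^T d / (d^T G d).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(G, v):
    return [dot(row, v) for row in G]

def cg_me(G, b, x0, eps=1e-8, max_iter=100):
    x = list(x0)
    g = [gi - bi for gi, bi in zip(matvec(G, x), b)]  # g_0 = G x_0 - b
    d = [-gi for gi in g]                             # d_0 = -g_0
    for _ in range(max_iter):
        if dot(g, g) ** 0.5 <= eps:                   # ||g_k|| <= eps: stop
            break
        Gd = matvec(G, d)
        alpha = -dot(g, d) / dot(d, Gd)               # exact line search
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [gi - bi for gi, bi in zip(matvec(G, x), b)]
        # beta^ME = g^T g / ((g + d_{k-1})^T d_{k-1})  -- Eq. (5)
        beta = dot(g_new, g_new) / dot([a + c for a, c in zip(g_new, d)], d)
        d = [-gi + beta * di for gi, di in zip(g_new, d)]
        g = g_new
    return x

# For G = [[4, 1], [1, 3]] and b = [1, 2] the minimizer solves G x = b,
# i.e. x* = (1/11, 7/11).
x_star = cg_me([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], [0.0, 0.0])
```

For general $f$, the closed-form `alpha` would be replaced by a one-dimensional minimization of $f\left({x}_{k}+\alpha {d}_{k}\right)$.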

The coefficient ${\beta }_{k}$ is chosen in such a way that ${d}_{k+1}$ is G-conjugate to ${d}_{0},{d}_{1},{d}_{2},\cdots ,{d}_{k}$ .

Lemma (1)

In the conjugate direction algorithm

${g}_{k+1}^{\text{T}}{d}_{i}=0\quad \text{for all } k,\ 0\le k\le n-1,\ \text{and } 0\le i\le k.$

Proposition: In the conjugate gradient algorithm the directions ${d}_{0},{d}_{1},\cdots ,{d}_{n-1}$ are G-conjugate.

Proof: By induction.

We first show that

${d}_{0}^{\text{T}}G{d}_{1}=0$

$\begin{array}{c}{d}_{0}^{\text{T}}G{d}_{1}={d}_{0}^{\text{T}}G\left(-{g}_{1}+{\beta }_{1}{d}_{0}\right)\\ =-{d}_{0}^{\text{T}}G{g}_{1}+{\beta }_{1}{d}_{0}^{\text{T}}G{d}_{0}\end{array}$

By the exact line search (ELS),

$=\frac{{\beta }_{1}{d}_{0}^{\text{T}}\left({g}_{1}-{g}_{0}\right)}{{\alpha }_{0}}$, where ${\alpha }_{0}$ is given by (3),

$=\frac{-{\beta }_{1}{d}_{0}^{\text{T}}{g}_{0}}{{\alpha }_{0}}$

$=-\frac{{g}_{1}^{\text{T}}{g}_{1}}{{\left({g}_{1}+{d}_{0}\right)}^{\text{T}}{d}_{0}}\cdot \frac{{d}_{0}^{\text{T}}{g}_{0}}{{\alpha }_{0}}=0$ by Lemma (1) and the ELS.

Now assume that ${d}_{k-1}^{\text{T}}G{d}_{k}=0$ holds; we prove that ${d}_{k}^{\text{T}}G{d}_{k+1}=0$:

$\begin{array}{c}{d}_{k}^{\text{T}}G{d}_{k+1}={d}_{k}^{\text{T}}G\left(-{g}_{k+1}+{\beta }_{k+1}{d}_{k}\right)\\ =-{d}_{k}^{\text{T}}G{g}_{k+1}+{\beta }_{k+1}{d}_{k}^{\text{T}}G{d}_{k}\\ =\frac{{\beta }_{k+1}{d}_{k}^{\text{T}}\left({g}_{k+1}-{g}_{k}\right)}{{\alpha }_{k}}\quad \text{with } {\alpha }_{k} \text{ as in (3)}\\ =-{\beta }_{k+1}\frac{{d}_{k}^{\text{T}}{g}_{k}}{{\alpha }_{k}}\\ =\frac{-{g}_{k+1}^{\text{T}}{g}_{k+1}}{{\left({g}_{k+1}+{d}_{k}\right)}^{\text{T}}{d}_{k}}\cdot \frac{{d}_{k}^{\text{T}}{g}_{k}}{{\alpha }_{k}}\end{array}$

By Lemma (1) and ELS we get ${d}_{k}^{\text{T}}G{d}_{k+1}=0$ .

We next verify that the new method satisfies the descent condition ${g}_{k}^{\text{T}}{d}_{k}<0$. From (4) with the new coefficient,

${g}_{k}^{\text{T}}{d}_{k}=-{g}_{k}^{\text{T}}{g}_{k}+{\beta }_{k}^{ME}{g}_{k}^{\text{T}}{d}_{k-1}$

By ELS, we get

${g}_{k}^{\text{T}}{d}_{k}=-{g}_{k}^{\text{T}}{g}_{k}=-{‖{g}_{k}‖}^{2}<0$

So ${g}_{k}^{\text{T}}{d}_{k}<0$ .

Thus the descent condition is held.
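This identity is easy to check numerically. The following sketch is our own illustration with made-up values satisfying ${g}_{k}^{\text{T}}{d}_{k-1}=0$ (which the ELS guarantees); it confirms that the ME direction satisfies ${g}_{k}^{\text{T}}{d}_{k}=-{‖{g}_{k}‖}^{2}<0$.

```python
# Numerical check (illustrative values, not from the paper) of the descent
# identity g_k^T d_k = -||g_k||^2 for the ME direction under an exact line
# search, which guarantees g_k^T d_{k-1} = 0.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

g = [0.5, -0.25]       # current gradient g_k
d_prev = [1.0, 2.0]    # previous direction d_{k-1}, orthogonal to g_k
assert abs(dot(g, d_prev)) < 1e-12          # ELS orthogonality holds

# beta^ME = g^T g / ((g + d_{k-1})^T d_{k-1})  -- Eq. (5)
beta = dot(g, g) / dot([a + b for a, b in zip(g, d_prev)], d_prev)
d = [-gi + beta * di for gi, di in zip(g, d_prev)]

lhs = dot(g, d)
assert abs(lhs + dot(g, g)) < 1e-12         # g^T d = -||g||^2
assert lhs < 0                              # descent condition holds
```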

3. Global Convergence

The global convergence analysis under the exact line search (ELS) proceeds from the following assumptions:

1) The function $f\left(x\right)$ is continuously differentiable and bounded below on the level set $L=\left\{x:f\left(x\right)\le f\left({x}_{0}\right)\right\}$, where ${x}_{0}$ is the initial point, and $N$ is a neighborhood of $L$.

2) The gradient satisfies a Lipschitz condition: there exists a constant $L>0$ such that

$‖g\left(x\right)-g\left(y\right)‖\le L‖x-y‖,\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{for}\text{\hspace{0.17em}}\text{all}\text{\hspace{0.17em}}x,y\in N$

Under these assumptions, we have the following lemma due to Zoutendijk.

Lemma 2: Suppose assumptions 1) and 2) hold. Consider a conjugate gradient method of the form (2) and (4), where ${d}_{k}$ is a descent direction and ${\alpha }_{k}$ satisfies the exact line search (3). Then the following, known as the Zoutendijk condition, holds:

$\underset{k=0}{\overset{\infty }{\sum }}\frac{{\left({g}_{k}^{\text{T}}{d}_{k}\right)}^{2}}{{‖{d}_{k}‖}^{2}}<\infty$ (7)

From Lemma (2), we can obtain a convergence theorem for the CG method with

${\beta }_{k}^{ME}=\frac{{g}_{k}^{\text{T}}{g}_{k}}{{\left({g}_{k}+{d}_{k-1}\right)}^{\text{T}}{d}_{k-1}}$ (8)

Theorem 1: Suppose that assumptions 1) and 2) are satisfied. Consider any CG method of the form (2) and (4), where ${\alpha }_{k}$ is obtained by the exact line search. Then

$\underset{k\to \infty }{\mathrm{lim}\,\mathrm{inf}}\,‖{g}_{k}‖=0$ (9)

Proof. By contradiction, if theorem 1 is not true, there exists a constant $c>0$ such that

$‖{g}_{k}‖\ge c$ (10)

${d}_{k}=-{g}_{k}+{\beta }_{k}{d}_{k-1}$

${d}_{k}+{g}_{k}={\beta }_{k}{d}_{k-1}$

Squaring both sides

${‖{d}_{k}‖}^{2}+2{g}_{k}^{\text{T}}{d}_{k}+{‖{g}_{k}‖}^{2}={|{\beta }_{k}|}^{2}{‖{d}_{k-1}‖}^{2}$

${‖{d}_{k}‖}^{2}={|{\beta }_{k}|}^{2}{‖{d}_{k-1}‖}^{2}-2{g}_{k}^{\text{T}}{d}_{k}-{‖{g}_{k}‖}^{2}$ (11)

By the exact line search, ${g}_{k}^{\text{T}}{d}_{k}=-{‖{g}_{k}‖}^{2}$.

Dividing both sides of (11) by ${\left({g}_{k}^{\text{T}}{d}_{k}\right)}^{2}$ gives

$\begin{array}{c}\frac{{‖{d}_{k}‖}^{2}}{{\left({g}_{k}^{\text{T}}{d}_{k}\right)}^{2}}=\frac{{|{\beta }_{k}|}^{2}{‖{d}_{k-1}‖}^{2}}{{\left({g}_{k}^{\text{T}}{d}_{k}\right)}^{2}}-\frac{2{g}_{k}^{\text{T}}{d}_{k}}{{\left({g}_{k}^{\text{T}}{d}_{k}\right)}^{2}}-\frac{{‖{g}_{k}‖}^{2}}{{\left({g}_{k}^{\text{T}}{d}_{k}\right)}^{2}}\le \frac{{|{\beta }_{k}|}^{2}{‖{d}_{k-1}‖}^{2}}{{\left({g}_{k}^{\text{T}}{d}_{k}\right)}^{2}}+\frac{1}{{‖{g}_{k}‖}^{2}}\\ \le {\left\{\frac{{g}_{k}^{\text{T}}{g}_{k}}{{\left({g}_{k}+{d}_{k-1}\right)}^{\text{T}}{d}_{k-1}}\right\}}^{2}\frac{{‖{d}_{k-1}‖}^{2}}{{\left({g}_{k}^{\text{T}}{d}_{k}\right)}^{2}}+\frac{1}{{‖{g}_{k}‖}^{2}}\\ \le {\left\{\frac{{‖{g}_{k}‖}^{2}}{{g}_{k}^{\text{T}}{d}_{k-1}+{‖{d}_{k-1}‖}^{2}}\right\}}^{2}\frac{{‖{d}_{k-1}‖}^{2}}{{\left({‖{g}_{k}‖}^{2}\right)}^{2}}+\frac{1}{{‖{g}_{k}‖}^{2}}\\ \le \frac{{‖{g}_{k}‖}^{4}}{{‖{d}_{k-1}‖}^{4}}\frac{{‖{d}_{k-1}‖}^{2}}{{‖{g}_{k}‖}^{4}}+\frac{1}{{‖{g}_{k}‖}^{2}}\le \frac{1}{{‖{d}_{k-1}‖}^{2}}+\frac{1}{{‖{g}_{k}‖}^{2}}\end{array}$ (12)

Since $\frac{1}{{‖{d}_{0}‖}^{2}}=\frac{1}{{‖{g}_{0}‖}^{2}}$, applying (12) recursively gives

$\frac{{‖{d}_{k}‖}^{2}}{{\left({g}_{k}^{\text{T}}{d}_{k}\right)}^{2}}\le \underset{i=0}{\overset{k}{\sum }}\frac{1}{{‖{g}_{i}‖}^{2}}$

and hence, by (10),

$\frac{{\left({g}_{k}^{\text{T}}{d}_{k}\right)}^{2}}{{‖{d}_{k}‖}^{2}}\ge \frac{{c}^{2}}{k+1}$ (13)

From (10) and (13) we get

$\underset{k=0}{\overset{\infty }{\sum }}\frac{{\left({g}_{k}^{\text{T}}{d}_{k}\right)}^{2}}{{‖{d}_{k}‖}^{2}}=\infty$

This contradicts the Zoutendijk condition in lemma (2) which completes the proof. □

4. Numerical Results

In this section we present the numerical results for this research. The ME conjugate gradient method was tested against the Dai-Yuan (DY) and Fletcher-Reeves (FR) methods on test problems from Andrei. The comparison is based on the number of iterations and the number of function evaluations (Table 1 and Table 2).

Table 1. Comparison of the algorithms for n = 100.

Table 2. Comparison of the algorithms for n = 1000.

5. Conclusion

A new kind of parameter in the conjugate gradient method for large-scale unconstrained optimization problems is proposed. Numerical results indicate that the new method is competitive in practice with the DY and FR methods.

A List of Test Functions

F1 Extended Trigonometric Function.

F2 Diagonal 2 function.

F3 Extended Tridiagonal −1 function.

F4 Extended Three Exponential Terms.

F5 Generalized PSC1 function.

F6 Extended PSC1 Function.

F7 Extended Block Diagonal BD1 function.

F8 Extended Quadratic Penalty QP1 function.

F9 Extended Tridiagonal −2 function.

F10 Nondquar (CUTE).

F11 DIXMAANC (CUTE).

F12 DIXMAANE (CUTE).

F13 EDENSCH function (CUTE).

F14 STAIRCASE S1/F52 VARDIM function (CUTE).

F15 ENGVAL1 (CUTE).

F16 DENSCHNA (CUTE).

F17 DENSCHNB (CUTE).

F18 BIGGSB1 (CUTE).

F19 Diagonal 7.

F20 SINCOS.

F21 HIMMELBG (CUTE).

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

Fletcher, R. and Reeves, C.M. (1964) Function Minimization by Conjugate Gradients. The Computer Journal, 7, 149-154. https://doi.org/10.1093/comjnl/7.2.149

Polak, E. and Ribiere, G. (1969) Note sur la convergence de méthodes de directions conjuguées. ESAIM: Mathematical Modelling and Numerical Analysis-Modélisation Mathématique et Analyse Numérique, 3, 35-43. https://doi.org/10.1051/m2an/196903R100351

Hestenes, M.R. and Stiefel, E. (1952) Methods of Conjugate Gradients for Solving Linear Systems. Journal of Research of the National Bureau of Standards, 49, 409-436. https://doi.org/10.6028/jres.049.044

Dai, Y.-H. and Yuan, Y. (1999) A Nonlinear Conjugate Gradient Method with a Strong Global Convergence Property. SIAM Journal on Optimization, 10, 177-182. https://doi.org/10.1137/S1052623497318992

Liu, Y. and Storey, C. (1991) Efficient Generalized Conjugate Gradient Algorithms, Part 1: Theory. Journal of Optimization Theory and Applications, 69, 129-137. https://doi.org/10.1007/BF00940464

Fletcher, R. (1987) Practical Methods of Optimization, Vol. 1, Unconstrained Optimization. Wiley, New York.

Sun, J. and Zhang, J. (2001) Global Convergence of Conjugate Gradient Methods without Line Search. Annals of Operations Research, 103, 161-173. https://doi.org/10.1023/A:1012903105391

Andrei, N. (2008) An Unconstrained Optimization Test Functions Collection. Advanced Modeling and Optimization, 10, 147-161.