New Parameter of CG-Method with Exact Line Search for Unconstrained Optimization

Abstract

In this paper, a new CG method is introduced for solving unconstrained optimization problems. The method satisfies the descent condition and achieves global convergence under the exact line search. The numerical results are good compared with other methods in terms of the number of iterations and the number of function evaluations.

1. Introduction

The conjugate gradient (CG) method is one of the most important methods for finding the minimum of a function in unconstrained optimization.

The conjugate gradient method is widely used because of its low memory requirements. The unconstrained optimization problem can be expressed as follows:

$$\min_{x \in \mathbb{R}^n} f(x) \qquad (1)$$

where $f : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function. The CG method generates its iterates by the update

$$x_{k+1} = x_k + \alpha_k d_k, \quad k = 0, 1, 2, \ldots \qquad (2)$$

where $x_k$ is the current iterate and $\alpha_k > 0$ is the step size obtained by the exact line search:

$$\alpha_k = \arg\min_{\alpha > 0} f(x_k + \alpha d_k) \qquad (3)$$

and $d_k$ is the search direction, which is given by

$$d_k = \begin{cases} -g_k & \text{for } k = 0 \\ -g_k + \beta_k d_{k-1} & \text{for } k \ge 1 \end{cases} \qquad (4)$$

where $k$ is an integer, $g_k$ is the gradient of the function $f(x)$ at the point $x_k$, and $\beta_k$ is the conjugate gradient coefficient associated with $f(x)$ at $x_k$.
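For intuition, when $f$ is the strictly convex quadratic $f(x) = \frac{1}{2}x^T G x - b^T x$ with $G$ symmetric positive definite (an illustrative special case, not an assumption of the paper), the minimizer in (3) has the closed form

$$\alpha_k = -\frac{g_k^T d_k}{d_k^T G d_k}$$

and, for any differentiable $f$, the exact step implies the orthogonality $g_{k+1}^T d_k = 0$, a property that is used repeatedly in the analysis below.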

Some of the well-known conjugate gradient coefficients are:

$$\beta_k^{FR} = \frac{g_k^T g_k}{\|g_{k-1}\|^2}, \qquad \beta_k^{PR} = \frac{g_k^T (g_k - g_{k-1})}{\|g_{k-1}\|^2}$$

$$\beta_k^{HS} = \frac{g_k^T (g_k - g_{k-1})}{(g_k - g_{k-1})^T d_{k-1}}, \qquad \beta_k^{DY} = \frac{g_k^T g_k}{(g_k - g_{k-1})^T d_{k-1}}$$

$$\beta_k^{LS} = -\frac{g_k^T (g_k - g_{k-1})}{g_{k-1}^T d_{k-1}}, \qquad \beta_k^{CD} = -\frac{g_k^T g_k}{d_{k-1}^T g_{k-1}}$$

The conjugate gradient coefficient $\beta_k \in \mathbb{R}$ is a scalar whose choice distinguishes the different CG methods, where $g_{k-1}$ and $g_k$ denote the gradients of $f(x)$ at the points $x_{k-1}$ and $x_k$, respectively.

The above methods are known as:

Fletcher and Reeves (FR) [1], Polak and Ribière (PR) [2], Hestenes and Stiefel (HS) [3], Dai and Yuan (DY) [4], Liu and Storey (LS) [5], and Conjugate Descent (CD) by Fletcher [6].
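For concreteness, the classical coefficients listed above can be written as short NumPy expressions. This is only an illustrative sketch; the function names are chosen here and are not part of the paper.

```python
import numpy as np

# Classical CG coefficients. g and g_prev are the gradients at x_k and
# x_{k-1}; d_prev is the previous search direction d_{k-1}.

def beta_fr(g, g_prev, d_prev):
    return (g @ g) / (g_prev @ g_prev)                 # Fletcher-Reeves

def beta_pr(g, g_prev, d_prev):
    return (g @ (g - g_prev)) / (g_prev @ g_prev)      # Polak-Ribiere

def beta_hs(g, g_prev, d_prev):
    y = g - g_prev
    return (g @ y) / (y @ d_prev)                      # Hestenes-Stiefel

def beta_dy(g, g_prev, d_prev):
    y = g - g_prev
    return (g @ g) / (y @ d_prev)                      # Dai-Yuan

def beta_ls(g, g_prev, d_prev):
    return -(g @ (g - g_prev)) / (g_prev @ d_prev)     # Liu-Storey

def beta_cd(g, g_prev, d_prev):
    return -(g @ g) / (d_prev @ g_prev)                # Conjugate Descent
```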

These methods behave on strictly convex quadratic functions in a way that is quite different from their behavior on general non-quadratic functions. In any case, most studies of these methods examine their global convergence properties in the conjugate gradient framework.

In recent years, many attempts have been directed towards constructing new formulas for CG methods that have good numerical performance and achieve global convergence.

2. The New Conjugate Gradient and Its Algorithm

It is well known that numerical optimization methods are iterative and that no single method is suitable for all types of problems. Each method has its own advantages and features, as well as some drawbacks: it may be efficient for some classes of problems and inefficient for others.

The new conjugate gradient coefficient is

$$\beta_k^{ME} = \frac{g_k^T g_k}{(g_k + d_{k-1})^T d_{k-1}} \qquad (5)$$

New method algorithm

Step (1): Choose an initial point $x_0$; set $\epsilon > 0$, $d_0 = -g_0$, and $k = 0$.

Step (2): Calculate $\beta_k^{ME}$ from (5).

Step (3): Calculate $d_k = -g_k + \beta_k^{ME} d_{k-1}$. If $g_k = 0$, stop.

Step (4): Calculate $\alpha_k = \arg\min_{\alpha > 0} f(x_k + \alpha d_k)$.

Step (5): Calculate the new point from the iterative formula

$$x_{k+1} = x_k + \alpha_k d_k \qquad (6)$$

Step (6): If $f(x_{k+1}) < f(x_k)$ and $\|g_{k+1}\| \le \epsilon$, stop. Otherwise, set $k = k + 1$ and go to Step (2).
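A minimal Python sketch of the algorithm is given below. The exact line search (3) is approximated by a bounded one-dimensional minimization from SciPy; the function name, tolerance, iteration limit, and search bounds are illustrative choices and are not prescribed by the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def cg_me(f, grad, x0, eps=1e-6, max_iter=1000):
    """Sketch of the new CG method: d_0 = -g_0, then
    d_k = -g_k + beta_ME * d_{k-1}, with the exact line search
    approximated by a numerical 1-D minimization."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                    # Step (1)
    for k in range(max_iter):
        if np.linalg.norm(g) <= eps:          # stopping test
            break
        # Step (4): alpha_k = argmin_{alpha > 0} f(x_k + alpha d_k)
        alpha = minimize_scalar(lambda a: f(x + a * d),
                                bounds=(0.0, 1e3), method='bounded').x
        # Step (5): new iterate
        x = x + alpha * d
        g_new = grad(x)
        # Step (2): the new coefficient (5)
        beta = (g_new @ g_new) / ((g_new + d) @ d)
        # Step (3): new search direction
        d = -g_new + beta * d
        g = g_new
    return x

# Hypothetical usage on the Rosenbrock function (not one of the paper's tests):
# rosen = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
# rosen_grad = lambda x: np.array([-2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2),
#                                  200*(x[1] - x[0]**2)])
# print(cg_me(rosen, rosen_grad, [-1.2, 1.0]))
```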

The coefficient $\beta_k$ is chosen in such a way that $d_{k+1}$ is $G$-conjugate to $d_0, d_1, d_2, \ldots, d_k$.

Lemma (1)

In the conjugate direction algorithm, $g_{k+1}^T d_i = 0$ for all $0 \le k \le n-1$ and $0 \le i \le k$.

Proposition: In the conjugate gradient algorithm, the directions $d_0, d_1, \ldots, d_{n-1}$ are $G$-conjugate.

Proof: By induction.

We first show that $d_0^T G d_1 = 0$:

$$d_0^T G d_1 = d_0^T G(-g_1 + \beta_1 d_0) = -d_0^T G g_1 + \beta_1 d_0^T G d_0$$

By the ELS, and using $G d_0 = (g_1 - g_0)/\alpha_0$ with $\alpha_0$ as in (3),

$$= \beta_1 \frac{d_0^T (g_1 - g_0)}{\alpha_0} = -\beta_1 \frac{d_0^T g_0}{\alpha_0} = -\frac{g_1^T g_1}{(g_1 + d_0)^T d_0} \cdot \frac{d_0^T g_0}{\alpha_0}$$

and by Lemma (1) and the ELS this is zero.

Now assume that $d_{k-1}^T G d_k = 0$ holds; we prove that $d_k^T G d_{k+1} = 0$:

$$\begin{aligned}
d_k^T G d_{k+1} &= d_k^T G(-g_{k+1} + \beta_{k+1} d_k) = -d_k^T G g_{k+1} + \beta_{k+1} d_k^T G d_k \\
&= \beta_{k+1} \frac{d_k^T (g_{k+1} - g_k)}{\alpha_k} \qquad \text{with } \alpha_k \text{ as in (3)} \\
&= -\beta_{k+1} \frac{d_k^T g_k}{\alpha_k} = -\frac{g_{k+1}^T g_{k+1}}{(g_{k+1} + d_k)^T d_k} \cdot \frac{d_k^T g_k}{\alpha_k}
\end{aligned}$$

By Lemma (1) and the ELS we get $d_k^T G d_{k+1} = 0$.

The fulfillment of the descent condition $g_k^T d_k < 0$ for the new method is shown as follows:

$$g_k^T d_k = -g_k^T g_k + \beta_k^{ME} g_k^T d_{k-1}$$

By the ELS, $g_k^T d_{k-1} = 0$, so

$$g_k^T d_k = -g_k^T g_k = -\|g_k\|^2 < 0$$

Thus the descent condition holds.
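The descent property can be checked numerically on a strictly convex quadratic, where the exact line search has the closed form noted in Section 1. The snippet below is an illustrative check (not part of the paper): at each iteration it prints $g_k^T d_k$ next to $-\|g_k\|^2$, which should agree up to rounding.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n))
G = A @ A.T + n * np.eye(n)        # symmetric positive definite Hessian
b = rng.standard_normal(n)

def grad(x):                        # gradient of f(x) = 0.5 x^T G x - b^T x
    return G @ x - b

x = rng.standard_normal(n)
g = grad(x)
d = -g
for k in range(10):
    if np.linalg.norm(g) < 1e-12:
        break
    # Under the exact line search g_k^T d_{k-1} = 0, so g_k^T d_k = -||g_k||^2.
    print(f"k={k}: g^T d = {g @ d: .3e}, -||g||^2 = {-(g @ g): .3e}")
    alpha = -(g @ d) / (d @ G @ d)              # exact minimizer of f along d
    x = x + alpha * d
    g_new = grad(x)
    beta = (g_new @ g_new) / ((g_new + d) @ d)  # the new coefficient (5)
    d = -g_new + beta * d
    g = g_new
```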

3. Global Convergence

The global convergence analysis under the exact line search (ELS) proceeds according to the following assumptions:

1) The function $f(x)$ is bounded below on the level set $L = \{x : f(x) \le f(x_0)\}$, where $x_0$ is the initial point, and $f$ is continuously differentiable in a neighborhood $N$ of $L$.

2) The gradient $g$ satisfies a Lipschitz condition on $N$; that is, there exists a constant $L > 0$ such that

$$\|g(x) - g(y)\| \le L \|x - y\| \quad \text{for all } x, y \in N$$

Under these assumptions we have the following result, due to Zoutendijk [7].

Lemma 2: Suppose that the assumptions hold. Consider a conjugate gradient method of the form (2) and (4), where $d_k$ is a descent direction and $\alpha_k$ satisfies the exact line search (3). Then the following Zoutendijk condition holds:

$$\sum_{k=0}^{\infty} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty \qquad (7)$$

From Lemma (2), we can obtain a convergence theorem for the conjugate gradient (CG) method with

$$\beta_k^{ME} = \frac{g_k^T g_k}{(g_k + d_{k-1})^T d_{k-1}} \qquad (8)$$

Theorem 1: Suppose that the assumptions are satisfied. Consider any CG method of the form (2) and (4), where $\alpha_k$ is obtained by the exact line search (3). Then either

$$\lim_{k \to \infty} \|g_k\| = 0 \quad \text{or} \quad \sum_{k=0}^{\infty} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty \qquad (9)$$

Proof. By contradiction: if Theorem 1 is not true, there exists a constant $c > 0$ such that

$$\|g_k\| \ge c \quad \text{for all } k \qquad (10)$$

From (4) we have

$$d_k = -g_k + \beta_k d_{k-1}, \qquad d_k + g_k = \beta_k d_{k-1}$$

Squaring both sides,

$$\|d_k\|^2 + 2 g_k^T d_k + \|g_k\|^2 = \beta_k^2 \|d_{k-1}\|^2$$

$$\|d_k\|^2 = \beta_k^2 \|d_{k-1}\|^2 - 2 g_k^T d_k - \|g_k\|^2 \qquad (11)$$

But, by the descent condition and the ELS, $g_k^T d_k = -\|g_k\|^2$.

Dividing both sides of (11) by $(g_k^T d_k)^2$ gives

$$\begin{aligned}
\frac{\|d_k\|^2}{(g_k^T d_k)^2}
&= \beta_k^2 \frac{\|d_{k-1}\|^2}{(g_k^T d_k)^2} - \frac{2 g_k^T d_k}{(g_k^T d_k)^2} - \frac{\|g_k\|^2}{(g_k^T d_k)^2} \\
&= \beta_k^2 \frac{\|d_{k-1}\|^2}{(g_k^T d_k)^2} + \frac{1}{\|g_k\|^2} \\
&= \left\{ \frac{g_k^T g_k}{(g_k + d_{k-1})^T d_{k-1}} \right\}^2 \frac{\|d_{k-1}\|^2}{(g_k^T d_k)^2} + \frac{1}{\|g_k\|^2} \\
&= \left\{ \frac{\|g_k\|^2}{g_k^T d_{k-1} + \|d_{k-1}\|^2} \right\}^2 \frac{\|d_{k-1}\|^2}{(\|g_k\|^2)^2} + \frac{1}{\|g_k\|^2} \\
&= \frac{\|g_k\|^4}{\|d_{k-1}\|^4} \cdot \frac{\|d_{k-1}\|^2}{\|g_k\|^4} + \frac{1}{\|g_k\|^2} \\
&= \frac{1}{\|d_{k-1}\|^2} + \frac{1}{\|g_k\|^2}
\end{aligned} \qquad (12)$$

where the second and fourth lines use $g_k^T d_k = -\|g_k\|^2$, and the fifth line uses $g_k^T d_{k-1} = 0$ (ELS).

But note that $\frac{1}{\|d_0\|^2} = \frac{1}{\|g_0\|^2}$ (since $d_0 = -g_0$); then from (12) we get

$$\frac{\|d_k\|^2}{(g_k^T d_k)^2} \le \frac{1}{\|g_{k-1}\|^2} + \frac{1}{\|g_k\|^2} \le \sum_{i=0}^{k} \frac{1}{\|g_i\|^2}$$

Since $\|g_i\| \ge c$ for all $i$ by (10), the sum is at most $(k+1)/c^2$, and therefore

$$\frac{(g_k^T d_k)^2}{\|d_k\|^2} \ge \frac{c^2}{k+1} \qquad (13)$$

From (10) and (13) we get

$$\sum_{k=0}^{\infty} \frac{(g_k^T d_k)^2}{\|d_k\|^2} \ge \sum_{k=0}^{\infty} \frac{c^2}{k+1} = \infty$$

This contradicts the Zoutendijk condition (7) in Lemma (2), which completes the proof. □

4. Numerical Results

In this section we report the numerical results for this research. The ME, Dai-Yuan (DY), and Fletcher-Reeves (FR) conjugate gradient methods were tested on test problems taken from Andrei [8]. The methods are compared on the basis of the number of iterations and the number of function evaluations (Table 1 and Table 2).

Table 1. Comparison of the algorithms for n = 100.

Table 2. Comparison of the algorithms for n = 1000.
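The paper does not describe how the iteration and function-evaluation counts were collected; one simple, hypothetical way to instrument such a comparison is to wrap the objective and gradient with counters:

```python
class CountingProblem:
    """Wrap an objective/gradient pair and count how often each is called."""
    def __init__(self, f, grad):
        self._f, self._grad = f, grad
        self.f_evals = 0
        self.g_evals = 0

    def f(self, x):
        self.f_evals += 1
        return self._f(x)

    def grad(self, x):
        self.g_evals += 1
        return self._grad(x)

# Hypothetical usage with the cg_me sketch from Section 2:
# prob = CountingProblem(rosen, rosen_grad)
# cg_me(prob.f, prob.grad, [-1.2, 1.0])
# print(prob.f_evals, "function evaluations,", prob.g_evals, "gradient evaluations")
```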

5. Conclusion

A new parameter for the conjugate gradient method for large-scale unconstrained optimization problems is proposed. The numerical results indicate that the new method compares favorably in practice with the DY and FR methods.

A List of Test Functions

F1 Extended Trigonometric Function.

F2 Diagonal 2 function.

F3 Extended Tridiagonal −1 function.

F4 Extended Three Exponential Terms.

F5 Generalized PSC1 function.

F6 Extended PSC1 Function.

F7 Extended Block Diagonal BD1 function.

F8 Extended Quadratic Penalty QP1 function.

F9 Extended Tridiagonal −2 function.

F10 Nondquar (CUTE).

F11 DIXMAANC (CUTE).

F12 DIXMAANE (CUTE).

F13 EDENSCH function (CUTE).

F14 STAIRCASE S1/F52 VARDIM function (CUTE).

F15 ENGVAL1 (CUTE).

F16 DENSCHNA (CUTE).

F17 DENSCHNB (CUTE).

F18 BIGGSB1 (CUTE).

F19 Diagonal 7.

F20 SINCOS.

F21 HIMMELBG (CUTE).

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Fletcher, R. and Reeves, C.M. (1964) Function Minimization by Conjugate Gradients. The Computer Journal, 7, 149-154. https://doi.org/10.1093/comjnl/7.2.149
[2] Polak, E. and Ribière, G. (1969) Note sur la convergence de méthodes de directions conjuguées. ESAIM: Mathematical Modelling and Numerical Analysis (Modélisation Mathématique et Analyse Numérique), 3, 35-43. https://doi.org/10.1051/m2an/196903R100351
[3] Hestenes, M.R. and Stiefel, E. (1952) Methods of Conjugate Gradients for Solving Linear Systems. Journal of Research of the National Bureau of Standards, 49, 409-436. https://doi.org/10.6028/jres.049.044
[4] Dai, Y.-H. and Yuan, Y. (1999) A Nonlinear Conjugate Gradient Method with a Strong Global Convergence Property. SIAM Journal on Optimization, 10, 177-182. https://doi.org/10.1137/S1052623497318992
[5] Liu, Y. and Storey, C. (1991) Efficient Generalized Conjugate Gradient Algorithms, Part 1: Theory. Journal of Optimization Theory and Applications, 69, 129-137. https://doi.org/10.1007/BF00940464
[6] Fletcher, R. (1987) Practical Methods of Optimization, Vol. 1, Unconstrained Optimization. Wiley, New York.
[7] Sun, J. and Zhang, J. (2001) Global Convergence of Conjugate Gradient Methods without Line Search. Annals of Operations Research, 103, 161-173. https://doi.org/10.1023/A:1012903105391
[8] Andrei, N. (2008) An Unconstrained Optimization Test Functions Collection. Advanced Modeling and Optimization, 10, 147-161.
