An Alternating Direction Nonmonotone Approximate Newton Algorithm for Inverse Problems
1. Introduction
We consider inverse problems that can be expressed in the form
$$\min_{x \in \mathbb{C}^n} \ \frac{1}{2}\|Ax - f\|^2 + \phi(Bx), \qquad (1)$$
where $\phi$ is convex, $A \in \mathbb{C}^{m \times n}$, and $B \in \mathbb{C}^{p \times n}$. The emphasis of our work is on problems where A and B have a specific structure. The formulation applies to many applications, especially in machine learning [1] [2], image reconstruction [3] [4], and model reduction [5]. We assume that the functions in (1) are strictly convex, so both problems considered in this paper have a unique solution.
In Hong-Chao Zhang’s paper [6], the Alternating Direction Approximate Newton method (ADAN), based on the Alternating Direction Method of Multipliers (ADMM) originally proposed in [7], is used to solve (1). A BB approximation is employed to accelerate the iterations. In many applications, the optimization subproblems in ADMM are easily solvable, so ADMM iterations can be performed at a low computational cost. Besides, combining different Newton-based methods with ADMM has become a trend, see [6] [8] [9], since such methods may achieve a high convergence speed.
In the alternating direction nonmonotone approximate Newton (ADNAN) algorithm developed in this paper, we adopt a nonmonotone line search to replace the traditional Armijo line search in ADAN, because nonmonotone schemes can improve the likelihood of finding a global optimum and improve the convergence speed in cases where a monotone scheme is forced to creep along the bottom of a narrow curved valley [10].
In the latter context, the first subproblem is an unconstrained minimization problem solved with the alternating direction nonmonotone approximate Newton algorithm; the purpose is to accelerate convergence, and the unconstrained minimizer is then projected or scaled into the box. The second subproblem is a bound-constrained optimization problem.
The rest of the paper is organized as follows. In Section 2, we give a review of the alternating direction approximate Newton method. In Section 3, we introduce the new algorithm ADNAN. In Section 4, we introduce the gradient-based algorithm for the second subproblem. A preliminary convergence analysis for ADNAN and the gradient-based algorithm (GRAD) is given in Section 5. Numerical results presented in Section 6 illustrate the effectiveness of ADNAN and GRAD.
2. Review of Alternating Direction Approximate Newton Algorithm
We introduce a new variable w to obtain the split formulation of (1):
$$\min_{x, w} \ \frac{1}{2}\|Ax - f\|^2 + \phi(w) \quad \text{subject to} \quad w = Bx. \qquad (2)$$
The augmented Lagrangian function associated with (2) is
$$L_\rho(x, w, b) = \frac{1}{2}\|Ax - f\|^2 + \phi(w) + \langle b, Bx - w\rangle + \frac{\rho}{2}\|Bx - w\|^2, \qquad (3)$$
where $\rho > 0$ is the penalty parameter and $b$ is a Lagrange multiplier associated with the constraint $w = Bx$. In ADMM, each iteration minimizes $L_\rho$ over x holding w fixed, minimizes $L_\rho$ over w holding x fixed, and updates an estimate for the multiplier b. More specifically, if $b_k$ is the current approximation to the multiplier, then ADMM [10] [11] applied to the split formulation (2) is given by the iteration:
$$x_{k+1} = \arg\min_{x} L_\rho(x, w_k, b_k), \qquad (4)$$
$$w_{k+1} = \arg\min_{w} L_\rho(x_{k+1}, w, b_k), \qquad (5)$$
$$b_{k+1} = b_k + \rho\,(Bx_{k+1} - w_{k+1}). \qquad (6)$$
Moreover, (4) can be written as follows:
$$x_{k+1} = \arg\min_{x} \ \frac{1}{2}\|Ax - f\|^2 + \frac{\rho}{2}\left\|Bx - w_k + \frac{b_k}{\rho}\right\|^2. \qquad (7)$$
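To make the alternating structure concrete, the following Python sketch performs one pass of (4)-(6), with the x-update written as in (7). It is a minimal illustration under several assumptions: dense matrices, a direct linear solve for the x-subproblem, and a user-supplied proximal operator `prox_phi` for $\phi$; none of these choices come from the original paper.

```python
import numpy as np

def admm_step(x, w, b, A, B, f, rho, prox_phi):
    """One pass of the ADMM iteration (4)-(6) for the split problem (2).

    prox_phi(v, t) should return argmin_w  phi(w) + (1/(2t)) * ||w - v||^2.
    Dense matrices and a direct solve are used purely for illustration.
    """
    # x-update, written as in (7)
    lhs = A.conj().T @ A + rho * (B.conj().T @ B)
    rhs = A.conj().T @ f + rho * (B.conj().T @ (w - b / rho))
    x = np.linalg.solve(lhs, rhs)

    # w-update (5): a proximal step on phi at Bx + b/rho
    w = prox_phi(B @ x + b / rho, 1.0 / rho)

    # multiplier update (6)
    b = b + rho * (B @ x - w)
    return x, w, b
```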
For any Hermitian matrix $Q$, we define $\|x\|_Q^2 = \langle x, Qx\rangle$; if $Q$ is a positive definite matrix, then $\|\cdot\|_Q$ is a norm. The proximal version of (4) is
$$x_{k+1} = \arg\min_{x} \ \frac{1}{2}\|Ax - f\|^2 + \frac{\rho}{2}\left\|Bx - w_k + \frac{b_k}{\rho}\right\|^2 + \frac{1}{2}\|x - x_k\|_{Q_k}^2. \qquad (8)$$
In ADAN, the choice $Q_k = \delta_k I - A^*A$ cancels the $A^*A$ term in this iteration while the $\rho B^*B$ term is retained. That is, $A^*A$ is replaced by $\delta_k I$, where $\delta_k I$ is a Barzilai-Borwein (BB) [8] [12] approximation to $A^*A$. We can observe the fast convergence of the BB approximation in the experiments of Raydan and Svaiter [13]. Moreover, $\delta_k$ is strictly smaller than the largest eigenvalue of $A^*A$, so $Q_k$ is indefinite and a new convergence analysis is needed. As a result, the updated version for x given in (4) can be expressed as follows:
$$x_{k+1} = x_k - \left(\delta_k I + \rho B^*B\right)^{\dagger} g_k, \qquad (9)$$
$$g_k = A^*(Ax_k - f) + \rho B^*\!\left(Bx_k - w_k + \frac{b_k}{\rho}\right). \qquad (10)$$
Here, $(\cdot)^{\dagger}$ is the generalized inverse, $A^*A + \rho B^*B$ is the Hessian of the objective in (7), and $g_k$ is the gradient of that objective at $x_k$. The formula for $x_{k+1}$ in (9) is exactly the formula we would have obtained by performing a single iteration of Newton's method on the equation $g(x) = 0$ with starting guess $x_k$, once the $A^*A$ part of the Hessian is replaced by its approximation. We employ the BB approximation $\delta_k I \approx A^*A$ [14], where
$$\delta_k = \max\left\{\frac{\|A(x_k - x_{k-1})\|^2}{\|x_k - x_{k-1}\|^2},\ \delta_{\min}\right\},$$
and $\delta_{\min}$ is a positive lower bound for $\delta_k$. Hence, the Hessian is approximated by $\delta_k I + \rho B^*B$. Since $B^*B$ is diagonalized by a Fourier transform in our setting, and a Fourier transform can be inverted in $O(n \log n)$ flops, the inversion of $\delta_k I + \rho B^*B$ can be accomplished relatively quickly. After replacing $A^*A$ by $\delta_k I$, the x-iteration takes the form $x_{k+1} = x_k + d_k$, where $d_k = -\left(\delta_k I + \rho B^*B\right)^{\dagger} g_k$, as in (9)-(10).
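As a rough illustration of how the BB scalar and the Fourier-diagonalized solve fit together, the sketch below computes $\delta_k$ from two successive iterates and then solves $(\delta_k I + \rho B^*B)d = -g$ in the Fourier domain. It assumes the setting of deconvolution-type problems in which $B^*B$ is diagonalized by the 2-D FFT with eigenvalue array `bb_eigs`; the function names and this structural assumption are illustrative, not taken from the paper.

```python
import numpy as np

def bb_parameter(A, x_new, x_old, delta_min=1e-6):
    """Safeguarded Barzilai-Borwein scalar approximating A^*A along the last step."""
    s = x_new - x_old
    denom = np.vdot(s, s).real
    if denom == 0.0:
        return delta_min
    delta = np.vdot(A @ s, A @ s).real / denom   # ||A s||^2 / ||s||^2
    return max(delta, delta_min)

def newton_like_direction(g, delta, rho, bb_eigs):
    """Solve (delta*I + rho*B^*B) d = -g assuming B^*B = F^{-1} diag(bb_eigs) F.

    g and bb_eigs are 2-D arrays with the image shape; F is the 2-D FFT.
    """
    g_hat = np.fft.fft2(g)
    d_hat = -g_hat / (delta + rho * bb_eigs)
    return np.fft.ifft2(d_hat)
```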
Note that by substituting $Q_k = \delta_k I - A^*A$ in (8) and solving for the minimizer, we would get exactly the same formula for the minimizer as that given in (9). Once the search direction $d_k$ is determined, a suitable step size $\alpha_k$ along this direction should be found to determine the next iterate, such as $x_{k+1} = x_k + \alpha_k d_k$.
The inner product between $d_k$ and the objective gradient $g_k$ at $x_k$ is
$$\langle g_k, d_k\rangle = -\left\langle g_k, \left(\delta_k I + \rho B^*B\right)^{\dagger} g_k\right\rangle \le 0.$$
It follows that $d_k$ is a descent direction. Since the objective $\Phi_k$ of (7) is quadratic, its Taylor expansion around $x_k$ is exact:
$$\Phi_k(x_k + \alpha d_k) = \Phi_k(x_k) + \alpha\langle g_k, d_k\rangle + \frac{\alpha^2}{2}\,\langle d_k, H\, d_k\rangle,$$
where $\Phi_k(x) = \frac{1}{2}\|Ax - f\|^2 + \frac{\rho}{2}\|Bx - w_k + b_k/\rho\|^2$ and $H = A^*A + \rho B^*B$.
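Because the model above is an exact quadratic, the step size that exactly minimizes $\Phi_k$ along $d_k$ has a closed form; the following display is a standard consequence of that expansion, added here for clarity rather than taken from the original:
$$\alpha_k^{\mathrm{exact}} \;=\; \arg\min_{\alpha \ge 0}\,\Phi_k(x_k + \alpha d_k) \;=\; \frac{-\langle g_k, d_k\rangle}{\langle d_k,\,(A^*A + \rho B^*B)\,d_k\rangle},$$
which is well defined since $\langle g_k, d_k\rangle \le 0$ and the denominator is positive whenever $d_k \ne 0$ and the Hessian is positive definite along $d_k$.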
In this section, we adopt a nonmonotone line search method [9]. When the step size is chosen by an ordinary Armijo line search, convergence can be slow for unconstrained problems [12]. In contrast, nonmonotone schemes can not only improve the likelihood of finding a global optimum but also improve the convergence speed.
Initialization: Choose a starting guess $x_0$ and parameters $0 \le \eta_{\min} \le \eta_{\max} \le 1$, $0 < \delta < \sigma < 1 < \tau$, and $\mu > 0$. Set $C_0 = f(x_0)$, $Q_0 = 1$, and $k = 0$.
Convergence test: If $\|\nabla f(x_k)\|$ is sufficiently small, then stop.
Line search update: Set $x_{k+1} = x_k + \alpha_k d_k$, where $\alpha_k$ satisfies either the (nonmonotone) Wolfe conditions:
$$f(x_k + \alpha_k d_k) \le C_k + \delta\,\alpha_k \nabla f(x_k)^{\mathsf T} d_k, \qquad (11)$$
$$\nabla f(x_k + \alpha_k d_k)^{\mathsf T} d_k \ge \sigma\, \nabla f(x_k)^{\mathsf T} d_k, \qquad (12)$$
or the (nonmonotone) Armijo conditions: $\alpha_k = \bar{\alpha}_k \tau^{h_k}$, where $\bar{\alpha}_k > 0$ is the trial step, and $h_k$ is the largest integer such that (11) holds and $\alpha_k \le \mu$.
Cost update: Choose $\eta_k \in [\eta_{\min}, \eta_{\max}]$, and update
$$Q_{k+1} = \eta_k Q_k + 1, \qquad C_{k+1} = \frac{\eta_k Q_k C_k + f(x_{k+1})}{Q_{k+1}}. \qquad (13)$$
Observe that $C_{k+1}$ is a convex combination of $C_k$ and $f(x_{k+1})$. Since $C_0 = f(x_0)$, it follows that $C_k$ is a convex combination of the function values $f(x_0), f(x_1), \ldots, f(x_k)$. The choice of $\eta_k$ controls the degree of nonmonotonicity. If $\eta_k = 0$ for each k, then the line search is the usual monotone Wolfe or Armijo line search. If $\eta_k = 1$ for each k, then $C_k = A_k$, where
$$A_k = \frac{1}{k+1}\sum_{i=0}^{k} f(x_i)$$
is the average function value. The scheme with $C_k = A_k$ was proposed by Yu-Hong Dai [15]. As $\eta_k$ approaches 0, the line search closely approximates the usual monotone line search, and as $\eta_k$ approaches 1, the scheme becomes more nonmonotone, treating all the previous function values with equal weight when the average cost value is computed.
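The reference-value update (13) is straightforward to realize in code. The sketch below performs a backtracking nonmonotone Armijo search against the reference $C_k$ and then updates $(Q_k, C_k)$; the parameter names, the backtracking rule, and the default values are illustrative assumptions in the spirit of the scheme above, not a transcription of the paper's algorithm.

```python
def nonmonotone_armijo(fun, x, d, g_dot_d, C, Q, eta=0.85,
                       delta=1e-4, tau=0.5, alpha0=1.0, max_back=30):
    """Backtracking nonmonotone Armijo step followed by the (Q, C) update of (13).

    fun      -- objective function
    g_dot_d  -- inner product <grad f(x), d>, assumed negative (descent direction)
    C, Q     -- current cost reference C_k and its weight Q_k
    eta      -- degree of nonmonotonicity (eta = 0 recovers the monotone search)
    """
    alpha, x_new, f_new = alpha0, x, fun(x)
    for _ in range(max_back):
        x_new = x + alpha * d
        f_new = fun(x_new)
        # nonmonotone Armijo condition: compare with C_k rather than f(x_k)
        if f_new <= C + delta * alpha * g_dot_d:
            break
        alpha *= tau
    # cost update (13): C_{k+1} is a convex combination of C_k and f(x_{k+1})
    Q_new = eta * Q + 1.0
    C_new = (eta * Q * C + f_new) / Q_new
    return x_new, alpha, C_new, Q_new
```

Setting `eta` between 0 and 1 interpolates between the monotone Armijo search and the averaged scheme described above.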
Lemma 2.1. If $\nabla f(x_k)^{\mathsf T} d_k \le 0$ for each k, then for the iterates generated by the nonmonotone line search algorithm we have $f(x_k) \le C_k$ for each k. Moreover, if $\nabla f(x_k)^{\mathsf T} d_k < 0$ and $f$ is bounded from below, then there exists $\alpha_k$ satisfying either the Wolfe or Armijo conditions of the line search update.
3. Alternating Direction Nonmonotone Approximate Newton Algorithm
In Algorithm 1, we obtain the iterate x at each iteration, which can be combined with Algorithm 2. We then use Algorithm 2 to solve the first subproblem of this paper, an unconstrained minimization problem, with ADNAN, and then project or scale the unconstrained minimizer into the box $\Omega = \{x : l \le x \le u\}$. The resulting box-constrained problem is
$$\min_{x \in \Omega} \ \frac{1}{2}\|Ax - f\|^2 + \phi(Bx). \qquad (14)$$
The iteration is as follows:
$$\hat{x}_{k+1} = \arg\min_{x} L_\rho(x, w_k, b_k), \qquad x_{k+1} = P(\hat{x}_{k+1}), \qquad (15)$$
$$w_{k+1} = \arg\min_{w} L_\rho(x_{k+1}, w, b_k), \qquad (16)$$
$$b_{k+1} = b_k + \rho\,(Bx_{k+1} - w_{k+1}). \qquad (17)$$
Later we give the existence and uniqueness result for (1).
The solution to (15) has the closed form $x_{k+1} = P(\hat{x}_{k+1})$,
with P being the projection map onto the box $\Omega$.
Algorithm 2. Alternating Direction Nonmonotone Approximate Newton algorithm.
Parameters: $\rho > 0$, $\delta_{\min} > 0$. Initialize $x_0$, $w_0$, $b_0$, $\delta_0 \ge \delta_{\min}$, and set $k = 0$.
Step 1: If $\|g_k\|$ is sufficiently small, then set $x_{k+1} = x_k$ and branch to Step 4. Otherwise compute
$$d_k = -\left(\delta_k I + \rho B^*B\right)^{\dagger} g_k.$$
Step 2: If the full step yields sufficient decrease, $\Phi_k(x_k + d_k) \le \Phi_k(x_k) + \delta\,\langle g_k, d_k\rangle$, then set $x_{k+1} = x_k + d_k$ and branch to Step 4.
Step 3: If the step size $\alpha_k$ satisfies the (nonmonotone) Wolfe conditions (11)-(12), then set $x_{k+1} = x_k + \alpha_k d_k$.
Step 4: Update $x_{k+1}$ using the iterate generated by Algorithm 1.
Step 5: $w_{k+1} = \arg\min_{w} L_\rho(x_{k+1}, w, b_k)$.
Step 6: $b_{k+1} = b_k + \rho\,(Bx_{k+1} - w_{k+1})$.
Step 7: If a stopping criterion is satisfied, terminate the algorithm; otherwise set $k = k + 1$ and go to Step 1.
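To show how the pieces above interact, here is a high-level Python sketch of one outer iteration in the spirit of Algorithm 2: compute the gradient of $L_\rho(\cdot, w_k, b_k)$, form the approximate Newton direction, take a nonmonotone step for x, then perform the w- and multiplier updates. The helper `nonmonotone_armijo` is the illustrative routine sketched earlier, `prox_phi` is an assumed proximal operator for $\phi$, and the dense solve stands in for the Fourier-domain solve of Section 2; this is one possible organization, not the authors' code.

```python
import numpy as np

def adnan_iteration(x, w, b, delta, A, B, f, rho, prox_phi, C, Q,
                    eta=0.85, grad_tol=1e-10):
    """One outer iteration in the spirit of Algorithm 2 (illustrative sketch)."""
    n = x.size
    # gradient of L(., w_k, b_k) at the current x
    g = A.conj().T @ (A @ x - f) + B.conj().T @ b + rho * (B.conj().T @ (B @ x - w))

    if np.linalg.norm(g) > grad_tol:
        # approximate Newton direction: (delta*I + rho*B^*B) d = -g.
        # A dense solve is used only for illustration; in practice the system
        # is solved in the Fourier domain as described in Section 2.
        d = np.linalg.solve(delta * np.eye(n) + rho * (B.conj().T @ B), -g)
        obj = lambda z: (0.5 * np.linalg.norm(A @ z - f) ** 2
                         + np.vdot(b, B @ z).real
                         + 0.5 * rho * np.linalg.norm(B @ z - w) ** 2)
        x, _, C, Q = nonmonotone_armijo(obj, x, d, np.vdot(g, d).real, C, Q, eta)

    # w-update (Step 5) via the proximal map of phi, then multiplier update (Step 6)
    w = prox_phi(B @ x + b / rho, 1.0 / rho)
    b = b + rho * (B @ x - w)
    return x, w, b, C, Q
```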
Lemma 3.1: Some of the update criteria in the algorithm are satisfied only a finite number of times, so the corresponding parameters converge to positive limits. An upper bound for $\delta_k$ is the following: uniformly in k, we have $\delta_k \le \max\{\delta_0, \|A\|^2\}$, where $\delta_k$ is the value of $\delta$ at the start of iteration k and $\delta_0$ is the starting $\delta$ in ADAN.
Lemma 3.2: If $g_k = 0$, then $x_k$ minimizes $L_\rho(\cdot, w_k, b_k)$.
4. Algorithm 3: Gradient-Based Algorithm (GRAD)
Next, we consider the second subproblem, which is a bound-constrained optimization problem:
$$\min_{x} \ \frac{1}{2}\|Ax - f\|^2 + \phi(Bx) \quad \text{subject to} \quad x \in \Omega = \{x : l \le x \le u\}. \qquad (18)$$
The iteration is similar to (4)-(6), as follows:
$$x_{k+1} = \arg\min_{x \in \Omega} L_\rho(x, w_k, b_k), \qquad (19)$$
$$w_{k+1} = \arg\min_{w} L_\rho(x_{k+1}, w, b_k), \qquad (20)$$
$$b_{k+1} = b_k + \rho\,(Bx_{k+1} - w_{k+1}). \qquad (21)$$
Compute the solution $x_{k+1}$ of (19) with a gradient-based (projected-gradient) step.
Compute the solution $w_{k+1}$ of (20).
Set $b_{k+1} = b_k + \rho\,(Bx_{k+1} - w_{k+1})$.
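Since (19) restricts x to the box, one simple gradient-based realization of that step is a projected-gradient inner loop. The sketch below carries out (19)-(21) under the assumptions that the data are real-valued (so the box projection is well defined), the step size is the constant 1/L with L a spectral-norm bound, and $\phi$ is handled by a proximal operator `prox_phi`; all of these are illustrative choices rather than the paper's exact rules.

```python
import numpy as np

def grad_iteration(x, w, b, A, B, f, rho, lo, hi, prox_phi, inner_iters=20):
    """One outer pass of the gradient-based scheme (19)-(21), as a sketch.

    Real-valued data are assumed so that the box projection np.clip applies.
    """
    # step size 1/L with L an upper bound on the x-subproblem Hessian norm
    L = np.linalg.norm(A, 2) ** 2 + rho * np.linalg.norm(B, 2) ** 2

    # (19): approximately minimize L_rho(., w, b) over the box by projected gradient
    for _ in range(inner_iters):
        g = A.T @ (A @ x - f) + B.T @ b + rho * (B.T @ (B @ x - w))
        x = np.clip(x - g / L, lo, hi)   # projection onto the box [lo, hi]

    # (20): w-update via the proximal map of phi
    w = prox_phi(B @ x + b / rho, 1.0 / rho)
    # (21): multiplier update
    b = b + rho * (B @ x - w)
    return x, w, b
```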
5. Convergence Analysis
In this section, we show the convergence of the proposed algorithms. Since the proofs for the two algorithms are almost the same, we only prove the convergence of Algorithm 2.
Lemma 5.1: Let $L_\rho$ be the function in (3). The pair $(x^*, w^*)$ solves (2) if and only if there exists $b^*$ such that $(x^*, w^*, b^*)$ is a saddle point of $L_\rho$.
Lemma 5.2: Let $L_\rho$ be the function in (3) and let $(x^*, w^*, b^*)$ be a saddle point of $L_\rho$. Then the sequence $(x_k, w_k, b_k)$ generated by Algorithm 2 satisfies $\lim_{k\to\infty}\|Bx_k - w_k\| = 0$ and $\lim_{k\to\infty} L_\rho(x_k, w_k, b_k) = L_\rho(x^*, w^*, b^*)$.
Theorem 5.1: Let $(x_k, w_k, b_k)$ be the sequence of iterates produced by Algorithm 2. Then $\lim_{k\to\infty} x_k = x^*$, and $x^*$ is the optimal point for problem (14).
Proof. From Lemmas 5.1 and 5.2, we obtain that every accumulation point $\bar{x}$ of $\{x_k\}$ satisfies
$$\frac{1}{2}\|A\bar{x} - f\|^2 + \phi(B\bar{x}) \le \frac{1}{2}\|Ax - f\|^2 + \phi(Bx) \quad \text{for all } x \in \Omega. \qquad (22)$$
Since (14) has a unique minimizer in $\Omega$, we have $\bar{x} = x^*$. Then (22) gives $\lim_{k\to\infty} x_k = x^*$, which completes the proof.
6. Numerical Experiments
6.1. Parameter Settings
In Algorithm 2, the parameters, including the penalty $\rho$ in the augmented Lagrangian (3), are common to the two algorithms ADAN and ADNAN. Moreover, $\rho$ has a vital impact on the convergence speed. We choose $\rho$ based on the results of W. Hager [6]. The choice is a compromise between speed and stability: $\rho$ is large enough to ensure invertibility.
The search directions were generated by the L-BFGS method developed by Nocedal in [16] and by Liu and Nocedal in [1]. We choose the step size to satisfy the Wolfe conditions with m = 0.09 and σ = 0.9. In addition, we employ a fixed parameter value that gives reasonable results. To obtain a good estimate for the optimal objective value in (1), we ran the algorithms for 100,000 iterations; the optimal objective values for the three data sets were obtained in this way.
In addition, we timed how long it took ADNAN to reduce the objective error to within 1% of the optimal objective value. The algorithms are coded in MATLAB, version 2011b, and run on a Dell version 4510U with a 2.8 GHz Intel i7 processor.
In Algorithm 3, a 256-by-256 gray-scale image was considered, and the results are compared with the experiment of J. Zhang [8]. The dimensions of the inverse problem are m = n = 65536, and the constraints are box constraints on x. The experiments on image deblurring problems show that the GRAD algorithm is also effective in terms of the quality of the image resolution.
6.2. Experimental Results
This section compares the performance of ADNAN to that of ADAN. The main difference between the ADNAN algorithm and the ADAN algorithm is the computation of $x_{k+1}$. In ADAN, $x_{k+1} = x_k + \alpha_k d_k$, where $\alpha_k$ is the step size from a monotone Armijo line search. In ADNAN, x is generated from Algorithm 1; if the convergence condition used in ADAN is satisfied, then this update is performed. The same parameter choice is used for both algorithms. Hence, there seems to be a significant benefit from using a value of $\delta_k$ smaller than the largest eigenvalue of $A^*A$.
The initial guess for x, w, and b was zero for both algorithms. Figures 1-3 show the objective values and the objective error as a function of CPU time. Moreover, we give the comparison of objective values and objective error versus CPU time (in seconds) under different conditions. It is observed that ADNAN is slightly more stable than ADAN, although the two algorithms are competitive. ADNAN not only attains a smaller objective error but also converges faster (see Figure 3). In addition, the ADNAN objective value becomes smaller than that of ADAN after only a few iterations. As a whole, ADNAN is superior to ADAN.
On the other hand, the experimental results for Algorithm 3 are shown in Figures 4-7:
Figures 4-7. Original image, blurry image, and deblurred image for each test case.
7. Conclusions
According to Figures 1-3, we can conclude that the nonmonotone line search accelerates convergence; furthermore, ADNAN keeps the objective values more stable and reduces them faster during the iterations than ADAN.
On the other hand, the validity of GRAD is verified. Experimental results on the image deblurring problems in Figures 4-7 show that different constraints on x can also yield effective deblurred images.
Acknowledgements
This work is supported by the Innovation Program of Shanghai Municipal Education Commission (No. 14YZ094).