A Geometric Gaussian Kaczmarz Method for Large Scaled Consistent Linear Equations

Li Wen; Feng Yin; Yimou Liao; Guangxin Huang

doi:10.4236/jamp.2021.911189

Journal of Applied Mathematics and Physics > Vol.9 No.11, November 2021

A Geometric Gaussian Kaczmarz Method for Large Scaled Consistent Linear Equations

Li Wen¹, Feng Yin^1,2*, Yimou Liao¹, Guangxin Huang^2,3
¹College of Mathematics and Statistics, Sichuan University of Science and Engineering, Zigong, China.
²Sichuan Province University Key Laboratory of Bridge Non-Destruction Detecting and Engineering Computing, Zigong, China.
³College of Mathematics and Physics, Chengdu University of Technology, Chengdu, China.
DOI: 10.4236/jamp.2021.911189 PDF HTML XML 356 Downloads 1,347 Views Citations

Abstract

This paper presents a geometric Gaussian Kaczmarz (GGK) method for solving the large-scaled consistent linear systems of equation. The GGK method improves the geometric probability randomized Kaczmarz method in [1] by introducing a new block set strategy and the iteration process. The GGK is proved to be of linear convergence. Several numerical examples show the efficiency and effectiveness of the GGK method.

Keywords

Gaussian Kaczmarz, Convergence Rate, Linear Systems, Consistent

Share and Cite:

Wen, L. , Yin, F. , Liao, Y. and Huang, G. (2021) A Geometric Gaussian Kaczmarz Method for Large Scaled Consistent Linear Equations. Journal of Applied Mathematics and Physics, 9, 2954-2965. doi: 10.4236/jamp.2021.911189.

1. Introduction

We are concerned with the approximation solution of large-scaled linear systems of equation of the form

$A x = b,$ (1)

where $A \in ℝ^{m \times n}$ is a real matrix, $b \in ℝ^{m}$ is a real vector and $x \in ℝ^{n}$ is an unknown vector to be determined.

The Kaczmarz [2] method is a good candidate for solving large problems (1) due to its simplicity and good performance, which is applied in numerous fields, such as image reconstruction [3] [4] and digital signal processing [5]. The Kaczmarz method can be formulated as

$x_{k + 1} = x_{k} + \frac{b^{(i)} - A^{(i)} x_{k}}{{‖ A^{(i)} ‖}_{2}^{2}} {(A^{(i)})}^{T}$ (2)

where $i = (k \mod m) + 1$ , i.e. all m equations in the linear systems (1) are swept through after m iterations.

There are many extended Kaczmarz methods are derived in recent years. Strohmer and Vershynin in [6] proposed a randomized Kaczmarz (RK) method with the expected exponential rate of convergence, which is also known as “linear convergence”. In 2018, Bai and Wu [7] proposed a greedy randomized Kaczmarz (GRK) method. GRK introduces an effective probability criterion to grasp larger entries of the residual vector at each iteration and was proved to be of the linear convergence rate. The index set is determined by

$u_{k} = {i | {| b^{(i)} - A^{(i)} x_{k} |}^{2} \geq ε_{k} {‖ b - A x_{k} ‖}_{2}^{2} {‖ A^{(i)} ‖}_{2}^{2}},$ (3)

where

$ε_{k} = \frac{1}{2} (\frac{1}{{‖ b - A x_{k} ‖}_{2}^{2}} \max_{1 \leq i \leq m} {\frac{{| b^{(i)} - A^{(i)} x_{k} |}^{2}}{{‖ A^{(i)} ‖}_{2}^{2}}} + \frac{1}{{‖ A ‖}_{F}^{2}}) .$ (4)

In order to improve the influence on the convergence rate by diagonal scalings of the coefficient matrix in (1). Yang in [1] presented a geometric probability randomized Kaczmarz algorithm (GPRK) to determine a subset $J_{k}$ of $[m] = {1,2, \dots, m}$ by a given rule such that the magnitude of the value of ${(d_{i}^{k})}^{2}$ should be larger than a prescribed threshold if the row index $i \in J_{k}$ , where

$d_{i}^{k} = ‖ \frac{b^{(i)} - A^{(i)} x_{k}}{{‖ A^{(i)} ‖}_{2}^{2}} {(A^{(i)})}^{T} ‖ .$ (5)

Generally, the block Kaczmarz methods [8] [9] select some rows from the coefficient matrix A to construct the block matrix and compute the Moore-Penrose pseudoinverse of the block at each iteration. However, the cost of computing the Moore-Penrose pseudoinverse of a matrix is so expensive, which impacts gravely the CPU time of the algorithm.

Gower and Richtrik proposed a Gaussian Kaczmarz (GK) method [10] whose iteration process is defined by

$x_{k + 1} = x_{k} + \frac{ζ_{k}^{T} (b - A x_{k})}{{‖ A^{T} ζ_{k} ‖}_{2}^{2}} A^{T} ζ_{k},$ (6)

where $ζ_{k}$ is a Gaussian vector with mean $0 \in ℝ^{m}$ and the covariance matrix $I \in ℝ^{m \times m}$ , i.e., $ζ_{k} ~ N (0, I)$ . Here I denotes the identity matrix. The expected linear convergence rate was analyzed in the case that A is of full column rank. The idea of the GK method is also used in [11] to avoid computing the pseudoinverses of submatrices. We will adapt this iteration process in our method.

In this paper, we improve the GPRK method in [1] and present a geometric Gaussian Kaczmarz (GGK) method. The GGK introduces a new block strategy and uses the iteration process of the GK method in (6). The block set strategy of GGK is more efficient than that in [1] because it can grasp as many as possible rows of the system matrix to participate in projection. This is illustrated in Section 3.

The rest of this paper is arranged as follows. A geometric Gaussian Kaczmarz algorithm is presented in Section 2. Its convergence is also proved. Section 3 shows several numerical examples for the proposed method and Section 4 draws some conclusions.

2. A Geometric Gaussian Kaczmarz Algorithm

This section describes a geometric Gaussian Kaczmarz (GGK) algorithm to compute the solution of (1). Algorithm 1 summarizes the GGK algorithm. Steps 2, 3 and 4 determine the block control sequence ${τ_{k}}_{k \geq 0}$ , which is simpler than and different from that in [1], which determines the block control sequence by probability

$P r (r o w = i k) = \frac{\max_{1 \leq i \leq m} {(d_{i}^{k})}^{2}}{\sum_{i = 1}^{m} {(d_{i}^{k})}^{2}} .$

Steps 5 and 6 give the iteration process of the GGK method.

The following results show the convergence of Algorithm 1.

Theorem 1. Assume the linear system (1) is consistent, and then the iterative sequence ${x_{k}}_{k \geq 0}$ generated by Algorithm 1 converges to the least-norm solution $x^{*} = A^{†} b$ of linear systems (1). Moreover, the solution error of linear systems (1) satisfies

${‖ x_{k + 1} - x^{*} ‖}_{2}^{2} \leq (1 - η \frac{{‖ A_{τ_{k}} ‖}_{F}^{2}}{λ_{\max} (A_{τ_{k}} A_{τ_{k}}^{T})} \frac{λ_{\min} (A^{T} A)}{{‖ A ‖}_{F}^{2}}) {‖ x_{k} - x^{*} ‖}_{2}^{2} .$ (7)

Proof. According to Algorithm 1, we have

$\begin{matrix} x_{k + 1} - x^{*} = x_{k} - x^{*} + \frac{ζ_{k}^{T} (b - A x_{k})}{{‖ A^{T} ζ_{k} ‖}_{2}^{2}} A^{T} ζ_{k} \\ = x_{k} - x^{*} - \frac{{(A^{T} ζ_{k})}^{T} (x_{k} - x^{*})}{{‖ A^{T} ζ_{k} ‖}_{2}^{2}} A^{T} ζ_{k} \\ = (I - \frac{{(A^{T} ζ_{k})}^{T} A^{T} ζ_{k}}{{‖ A^{T} ζ_{k} ‖}_{2}^{2}}) (x_{k} - x^{*}) . \end{matrix}$

Algorithm 1. A geometric Gaussian Kaczmarz algorithm (GGK).

Denote the projector $P_{k} = \frac{{(A^{T} ζ_{k})}^{T} A^{T} ζ_{k}}{{‖ A^{T} ζ_{k} ‖}_{2}^{2}}$ , then $P_{k}$ is orthogonal because $P_{k}^{T} = P_{k}$ and $P_{k}^{2} = P_{k}$ . Thus we have

$\begin{matrix} {‖ x_{k + 1} - x^{*} ‖}_{2}^{2} = {‖ (I - \frac{{(A^{T} ζ_{k})}^{T} A^{T} ζ_{k}}{{‖ A^{T} ζ_{k} ‖}_{2}^{2}}) (x_{k} - x^{*}) ‖}_{2}^{2} \\ = {‖ x_{k} - x^{*} ‖}_{2}^{2} - {‖ \frac{ζ_{k}^{T} (b - A x_{k})}{{‖ A^{T} ζ_{k} ‖}_{2}^{2}} A^{T} ζ_{k} ‖}_{2}^{2} \\ = {‖ x_{k} - x^{*} ‖}_{2}^{2} - \frac{{| ζ_{k}^{T} (b - A x_{k}) |}^{2}}{{‖ A^{T} ζ_{k} ‖}_{2}^{2}} \\ = {‖ x_{k} - x^{*} ‖}_{2}^{2} - \frac{{‖ ζ_{k} ‖}_{2}^{4}}{{‖ A^{T} ζ_{k} ‖}_{2}^{2}} . \end{matrix}$

The last equality holds because

$ζ_{k}^{T} (b - A x_{k}) = \sum_{i \in τ_{k}} (b^{(i)} - A^{(i)} x_{k}) e_{i}^{T} (b - A x_{k}) = \sum_{i \in τ_{k}} {| b^{(i)} - A^{(i)} x_{k} |}^{2} = {‖ ζ_{k} ‖}_{2}^{2} .$

Let $E_{k} \in ℝ^{m \times | τ_{k} |}$ denote the matrix whose columns orderly are constituted of all the vector $e_{i} \in ℝ^{m}$ with $i \in τ_{k}$ , then $A_{τ_{k}} = E_{k}^{T} A$ . Denoted by ${\hat{ζ}}_{k} = E_{k}^{T} ζ_{k}$ , we have

${‖ A^{T} ζ_{k} ‖}_{2}^{2} = ζ_{k}^{T} A A^{T} ζ_{k} = {\hat{ζ}}_{k}^{T} E_{k}^{T} A A^{T} E_{k} {\hat{ζ}}_{k} = {\hat{ζ}}_{k}^{T} A_{τ_{k}} A_{τ_{k}}^{T} {\hat{ζ}}_{k} = {‖ A_{τ_{k}}^{T} {\hat{ζ}}_{k} ‖}_{2}^{2},$

moreover,

${‖ A^{T} ζ_{k} ‖}_{2}^{2} \leq λ_{\max} (A_{τ_{k}} A_{τ_{k}}^{T}) {‖ {\hat{ζ}}_{k} ‖}_{2}^{2} = λ_{\max} (A_{τ_{k}} A_{τ_{k}}^{T}) {‖ ζ_{k} ‖}_{2}^{2} .$

It then holds that

$\begin{matrix} {‖ x_{k + 1} - x^{*} ‖}_{2}^{2} \leq {‖ x_{k} - x^{*} ‖}_{2}^{2} - \frac{{‖ ζ_{k} ‖}_{2}^{2}}{λ_{\max} (A_{τ_{k}} A_{τ_{k}}^{T})} \\ = {‖ x_{k} - x^{*} ‖}_{2}^{2} - \frac{\sum_{i \in τ_{k}} {| b^{(i)} - A^{(i)} x_{k} |}^{2}}{λ_{\max} (A_{τ_{k}} A_{τ_{k}}^{T})} \\ = {‖ x_{k} - x^{*} ‖}_{2}^{2} - \frac{\sum_{i \in τ_{k}} {(d_{i}^{k})}^{2} {‖ A^{(i)} ‖}_{2}^{2}}{λ_{\max} (A_{τ_{k}} A_{τ_{k}}^{T})} \\ \leq {‖ x_{k} - x^{*} ‖}_{2}^{2} - \frac{η \max_{i \in τ_{k}} {(d_{i}^{k})}^{2} \sum_{i \in τ_{k}} {‖ A^{(i)} ‖}_{2}^{2}}{λ_{\max} (A_{τ_{k}} A_{τ_{k}}^{T})} \\ = {‖ x_{k} - x^{*} ‖}_{2}^{2} - \frac{η \max_{i \in τ_{k}} {(d_{i}^{k})}^{2} {‖ A_{τ_{k}} ‖}_{F}^{2}}{λ_{\max} (A_{τ_{k}} A_{τ_{k}}^{T})} . \end{matrix}$

For each $k \geq 0$ , since

$\max_{i \in τ_{k}} {(d_{i}^{k})}^{2} = \max_{i \in τ_{k}} \frac{{| b^{(i)} - A^{(i)} x_{k} |}^{2}}{{‖ A^{(i)} ‖}_{2}^{2}} \geq \sum_{i = 1}^{m} \frac{{‖ A^{(i)} ‖}_{2}^{2}}{{‖ A ‖}_{F}^{2}} \frac{{| b^{(i)} - A^{(i)} x_{k} |}^{2}}{{‖ A^{(i)} ‖}_{2}^{2}} = \frac{{‖ b - A x_{k} ‖}_{2}^{2}}{{‖ A ‖}_{F}^{2}},$

we have

$\begin{matrix} {‖ x_{k + 1} - x^{*} ‖}_{2}^{2} \leq {‖ x_{k} - x^{*} ‖}_{2}^{2} - \frac{η \frac{{‖ b - A x_{k} ‖}_{2}^{2}}{{‖ A ‖}_{F}^{2}} {‖ A_{τ_{k}} ‖}_{F}^{2}}{λ_{\max} (A_{τ_{k}} A_{τ_{k}}^{T})} \\ \leq {‖ x_{k} - x^{*} ‖}_{2}^{2} - η \frac{λ_{\min} (A^{T} A)}{λ_{\max} (A_{τ_{k}} A_{τ_{k}}^{T})} \frac{{‖ A_{τ_{k}} ‖}_{F}^{2}}{{‖ A ‖}_{F}^{2}} {‖ x_{k} - x^{*} ‖}_{2}^{2} \\ = (1 - η \frac{λ_{\min} (A^{T} A)}{λ_{\max} (A_{τ_{k}} A_{τ_{k}}^{T})} \frac{{‖ A_{τ_{k}} ‖}_{F}^{2}}{{‖ A ‖}_{F}^{2}}) {‖ x_{k} - x^{*} ‖}_{2}^{2} . \end{matrix}$

This completes the proof. □

We remark that

$0 \leq 1 - η \frac{λ_{\min} (A^{T} A)}{λ_{\max} (A_{τ_{k}} A_{τ_{k}}^{T})} \frac{{‖ A_{τ_{k}} ‖}_{F}^{2}}{{‖ A ‖}_{F}^{2}} \leq 1,$

which means that Algorithm 1 is convergent. In fact,

$\frac{{‖ A_{τ_{k}} ‖}_{F}^{2}}{λ_{\max} (A_{τ_{k}} A_{τ_{k}}^{T})} \geq 1,$

and

$0 < \frac{λ_{\min} (A^{T} A)}{{‖ A ‖}_{F}^{2}} \leq 1,$

it follows that

$1 - η \frac{{‖ A_{τ_{k}} ‖}_{F}^{2}}{λ_{\max} (A_{τ_{k}} A_{τ_{k}}^{T})} \frac{λ_{\min} (A^{T} A)}{{‖ A ‖}_{F}^{2}} \leq 1 - η \frac{λ_{\min} (A^{T} A)}{{‖ A ‖}_{F}^{2}} < 1.$

3. Numerical Experiments

In this section, we use Algorithm 1 for solving different types of consistent linear systems (1) and compare it with GPRK in [1]. All experiments are carried out using MATLAB (version R2019b) on a personal computer with 2.0 GHz central processing unit (Inter(R) Core(TM) i7-8565U CPU) and 8 GB memory, and 64 bit Windows operation system.

The coefficient matrix $A \in ℝ^{m \times n}$ is either generated by the MATLAB function $r a n d n (m, n)$ or taken from the University of Florida sparse matrix collection [12]. To ensure each linear system (1) with a selected matrix A is consistent, we let $b \in ℝ^{m}$ in (1) be generated by $A x^{*}$ , where the exact solution $x^{*} \in ℝ^{n}$ is produced randomly by the MATLAB function randn and is to be determined. The efficiency of each method is evaluated by the number of iterations (IT) and the CPU time (CPU). It can be measured by the speed-up of GGK against GPRK (SU) is defined by

$S U = \frac{C P U o f G P R K}{C P U o f G G K} .$

The effectiveness of both methods is measured by the relative residual (RR) defined by

$R R = \frac{{‖ b - A x_{k} ‖}_{2}}{{‖ b ‖}_{2}} .$

We set the initial solution $x^{0}$ be 0 in all experiments, and the iteration does not terminate until $R R < 10^{- 6}$ . The numerical results of each method shown in this section are arithmetical average quantities with respect to 50 repeated trials. We set $η = 0.3$ in GGK for each example.

3.1. Experiments with Sparse Matrix

This subsection considers the linear systems (1) with sparse matrices. These matrices include some flat ones in Table 1 and the tall ones (their transposes) in Table 2 from the University of Florida sparse matrix collection [12].

Table 1. Comparison of IT, CPU and SU of the GGK method with that of the GPRK method for different flat spare systems.

Table 2. Comparison of IT, CPU and SU of the GGK method with that of the GPRK method for different thin spare systems.

Table 1 and Table 2 list the average number of iterations (IT), computing time (CPU), and the related speed-up (SU) of GGK against GPRK. Figure 1 and Figure 2 plot RR versus IT (left) and CPU (right) of different methods for solving (1) with the matrix bibd_16_8 and crew1^T, respectively. From Table 1 and Table 2 we see that the GGK method needs smaller IT and CPU than the GPRK method does in all cases. We can see whether system 1 is overdetermined or underdetermined. When the matrix is flat, the value of SU locates in the interval (2.70, 628.92), while for the case of the thin matrix, the value of SU ranges from 2.49 to 641.08. It is observed from Figure 1 and Figure 2 that the GGK method converges faster than the GPRK method.

3.2. Experiments with Dense Matrix

In this subsection, the test matrices are dense normally distributed random matrices including thin and flat matrices. The sizes of rows and columns of the selected matrices vary from 2000 to 30,000.

Table 3 lists the test results of IT, CPU and SU for flat dense matrix with different size. Table 4 displays IT, CPU and SU for a different thin dense matrix with different size. Figure 3 and Figure 4 plot RR versus IT (left) and CPU (right) of GGK and GPRK methods for solving (1) with the matrix $A = r a n d n (2200,30000)$ and $A = r a n d n (30000,2000)$ , respectively. From Table 1 and Table 2, we can

Figure 1. RR versus IT (left) and CPU (right) of the GGK method compared with the GPRK method for solving (1) with the matrix bibd_16_8.

Figure 2. RR versus IT (left) and CPU (right) of the GGK method compared with the GPRK method for solving (1) with the matrix crew1^T.

Table 3. Comparison of IT, CPU and SU of the GGK method with that of the GPRK method for different flat dense systems.

Table 4. Comparison of IT, CPU and SU of the GGK method with that of the GPRK method for different thin dense systems.

see in all cases, GGK needs less IT and CPU time than GPRK does, and the speed-up of GGK against GPRK ranges from tens of times to hundreds of times, i.e., the speed-up of GGK against GPRK varies from 73.30 to 227.15 in the case of flat and from 11.39 to 425.19 in the case of thin. Similar results to Figure 1

Figure 3. RR versus IT (left) and CPU (right) of the GGK method compared with the GPRK method for solving (1) with the matrix $A = r a n d n (2200,30000)$ .

Figure 4. RR versus IT (left) and CPU (right) of the GGK method compared with the GPRK method for solving (1) with the matrix $A = r a n d n (30000,2000)$ .

and Figure 2 are drawn from Figure 3 and Figure 4 that the GGK method converges much faster than the GPRK method.

4. Conclusion

We develop a geometric Gauss-Kaczmarz (GGK) algorithm for solving large-scale consistent linear systems and the convergence is proved for this algorithm. Numerical experiments show that the GGK algorithm has better efficiency and effectiveness than the GPRK algorithm. In our future work, we will focus on block Kaczmarz methods to solve ill-posed problems.

Acknowledgements

The authors thank the reviewers for providing some helpful comments. Research by G.H. was supported in part by Application Fundamentals Foundation of STD of Sichuan (grant 2020YJ0366) and the Opening Project of Sichuan Province University Key Laboratory of Bridge Non-destructionDetecting and Engineering Computing (grant 2020QZJ03), and research by F.Y. was partially supported by NNSF (grant 11501392) and SUSE (grant 2019RC09, 2020RC25), and research by Y.M. Liao was partially supported by the Innovation Fund of Postgraduate of SUSE (grant y2021101).

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1]	Yang, X. (2020) A Geometric Probability Randomized Kaczmarz Method for Large Scale Linear Systems. Applied Numerical Mathematics, 164, 139-160. https://doi.org/10.1016/j.apnum.2020.10.016
[2]	Kaczmarz, S. (1937) Angenäherte Auflösung von Systemen Linearer Gleichungen. Bulletin International de l’ Académie Polonaise des Sciences et des Lettres, 35, 355-357.
[3]	Ihrig, A. and Schmitz, G. (2018) Accelerating Nonlinear Speed of Sound Reconstructions Using a Randomized Block Kaczmarz Algorithm. 2018 IEEE International Ultrasonics Symposium (IUS), Kobe, Japan, 22-25 October 2018, 1-9. https://doi.org/10.1109/ULTSYM.2018.8580199
[4]	Popa, C. and Zdunek, R. (2004) Kaczmarz Extended Algorithm for Tomographic Image Reconstruction from Limited-Data. Mathematics and Computers in Simulation, 65, 579-598. https://doi.org/10.1016/j.matcom.2004.01.021
[5]	Lorenz, D.A., Wenger, S., Schöpfer, F. and Magnor, M. (2014) A Sparse Kaczmarz Solver and A Linearized Bergman Method for Onlinear Compressed Sensing. 2014 IEEE International Conference on Image Processing (ICIP), Parise, France, 27-30 October 2014, 1347-1351. https://doi.org/10.1109/ICIP.2014.7025269
[6]	Strohmer, T. and Vershynin, R. (2009) A Randomized Kaczmarz Algorithm with Exponential Convergence. Journal of Fourier Analysis and Applications, 15, Article No. 262. https://doi.org/10.1007/s00041-008-9030-4
[7]	Bai, Z.Z. and Wu, W.T. (2018) On Greedy Randomized Kaczmarz Method for Solving Large Sparse Linear Systems. SIAM Journal on Scientific Computing, 40, A592-A606. https://doi.org/10.1137/17M1137747
[8]	Needell, D., Tropp, J. A. (2014) Paved with Good Intentions: Analysis of A Randomized Block Kaczmarz Method. Linear Algebra and Its Applications, 441, 199-221. https://doi.org/10.1016/j.laa.2012.12.022
[9]	Niu, Y.Q. and Zheng, B. (2020) A Greedy Block Kaczmarz Algorithm for Solving Large-Scale Linear Systems. Applied Mathematics Letters, 104, Article No. 106294. https://doi.org/10.1016/j.aml.2020.106294
[10]	Gower, R.M. and Richtárik, R. (2015) A Randomized Kaczmarz Algorithm with Exponential Convergence. SIAM Journal on Matrix Analysis and Applications, 36, 1660-1690. https://doi.org/10.1137/15M1025487
[11]	Chen, J.Q. and Huang, Z.D. (2021) On a Fast Deterministic Block Kaczmarz Method for Solving Large-scale Linear Systems. Numerical Algorithms. https://doi.org/10.1007/s11075-021-01143-4
[12]	Davis, T.A. and Hu, Y. (2011) The University of Florida Sparse Matrix Collection. ACM Transactions on Mathematical Software, 38, 1-25. https://doi.org/10.1145/2049662.2049663

Journals Menu

Follow SCIRP

	customer@scirp.org
	+86 18163351462(WhatsApp)
	1655362766

	Paper Publishing WeChat

Journals Menu

Home

About SCIRP

Service

Policies