A Geometric View on Inner Transformation between the Variables of a Linear Regression Model
1. Introduction
A matrix can be factorized into the product of several matrices with special properties. In particular, by singular value decomposition (SVD), which is widely used in regression analysis [1] [2] [3], a matrix can be factorized into the product of three matrices with orthogonal or diagonal properties respectively. The SVD can be formularized as $X = U\Sigma V^{T}$ [4] [5] [6] [7].
The Moore-Penrose generalized inverse is a special case of the generalized matrix inverse [8] [9] [10] [11] [12] and can be applied in regression analysis [13] [14] and least-squares analysis [15]. For example, in the study of Tian and Zhang [13], the linear unbiased estimator of partial coefficients is derived through the Moore-Penrose generalized inverse algorithm. Herein, multiplied by the Moore-Penrose generalized inverse of the independent variable matrix, the dependent variable vector is transformed into new coordinate systems: first the left space and then the right space of the independent variable matrix [16]. In fact, this process can also be regarded as a vector-rotating algorithm. In addition to the Moore-Penrose generalized inverse algorithm, singular value decomposition (SVD) facilitates such transformation by dividing the transformation process into three steps, presenting a visible geometrical view.
2. y Can Be Transformed into $\hat{\beta}$
In order to avoid the irrelevant calculation minutiae that arise when adopting an independent matrix with a large rank, and to highlight only the inner transformation, a simple multiple linear regression model $y = X\beta + \varepsilon$ is adopted herein, with an independent matrix $X \in \mathbb{R}^{3\times 2}$ and a dependent vector $y \in \mathbb{R}^{3}$; $\hat{\beta}$ symbolizes the regression coefficient estimator vector.
In this multiple linear regression model, the matrix $X$'s singular value decomposition (SVD) can be demonstrated as below:

$X = U\Sigma V^{T}$ (E.1)

$U$ and $V$ are respectively the left and right singular vector matrices of $X$, and $\Sigma$ is the singular value matrix of $X$ [6] [7] [17].
Because $X$ is a matrix with full column rank, $\hat{\beta}$ can be demonstrated as $\hat{\beta} = X^{+}y = (X^{T}X)^{-1}X^{T}y$ [13] [16], where $X^{+}$ signifies the Moore-Penrose generalized inverse of matrix $X$. $\hat{\beta}$ can also be demonstrated as below when $X$ is substituted by its SVD form presented in Equation (E.1):

$\hat{\beta} = V\Sigma^{+}U^{T}y$ (E.2)

Equation (E.2) can be regarded as a process in which the vector $y$ is transformed three times, from right to left, by multiplication with $U^{T}$, $\Sigma^{+}$ and $V$.
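As a quick numerical check of Equation (E.2), the sketch below builds $\hat{\beta} = V\Sigma^{+}U^{T}y$ from NumPy's SVD and compares it with the ordinary least-squares solution. The 3×2 matrix and response vector are hypothetical stand-ins, not the paper's example.

```python
import numpy as np

# Hypothetical full-column-rank 3x2 design matrix and response vector,
# chosen only for illustration.
X = np.array([[1.0, 2.0],
              [3.0, 1.0],
              [1.0, 1.0]])
y = np.array([2.0, 1.0, 3.0])

# SVD of X (E.1): X = U @ Sigma @ V.T
U, s, Vt = np.linalg.svd(X, full_matrices=True)  # U: 3x3, s: (2,), Vt: 2x2
Sigma = np.zeros((3, 2))
Sigma[:2, :2] = np.diag(s)

# Moore-Penrose inverse of Sigma: transposed shape, reciprocal singular values
Sigma_pinv = np.zeros((2, 3))
Sigma_pinv[:, :2] = np.diag(1.0 / s)

# (E.2): beta_hat = V @ Sigma^+ @ U.T @ y
beta_hat = Vt.T @ Sigma_pinv @ U.T @ y

# Agrees with the ordinary least-squares solution for full-column-rank X
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_ols))  # True
```
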
Transformation 1

$y_{1} = U^{T}y$ (E.3)

$U$'s columns are demonstrated as $u_{1}$, $u_{2}$ and $u_{3}$.
Let the coordinate axes of the 3-dimensional original coordinate system be symbolized by $e_{1}$, $e_{2}$ and $e_{3}$. The quintessence of Equation (E.3) is that, multiplied by the matrix $U^{T}$, the vector $y$ in the coordinate system constructed by $e_{1}$, $e_{2}$ and $e_{3}$ is transformed into $y_{1}$, which is located in the coordinate system constructed by $u_{1}$, $u_{2}$ and $u_{3}$, or the left space of $X$. In fact, $y$ and $y_{1}$ are at the same spatial location. However, $y$ is in the coordinate system constructed by $e_{1}$, $e_{2}$ and $e_{3}$, while $y_{1}$ is presented in the coordinate system constructed by $u_{1}$, $u_{2}$ and $u_{3}$; in other words, $y_{1}$ is in the left space of $X$. These locations are demonstrated in Figure 1.
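A minimal sketch of Transformation 1, again with a hypothetical 3×2 design matrix: multiplying by $U^{T}$ only re-expresses $y$ in the $u_{1}$, $u_{2}$, $u_{3}$ basis, so the vector's length is preserved and rotating back with $U$ recovers $y$.

```python
import numpy as np

# Hypothetical design matrix and response (any full-column-rank 3x2 example works)
X = np.array([[1.0, 2.0],
              [3.0, 1.0],
              [1.0, 1.0]])
y = np.array([2.0, 1.0, 3.0])

U, s, Vt = np.linalg.svd(X, full_matrices=True)

# Transformation 1 (E.3): rotate y into the u1, u2, u3 coordinate system
y1 = U.T @ y

# y and y1 are the same point expressed in different coordinate systems:
# the rotation preserves length, and rotating back with U recovers y.
print(np.isclose(np.linalg.norm(y1), np.linalg.norm(y)))  # True
print(np.allclose(U @ y1, y))                             # True
```
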
Transformation 2

$y_{2} = \Sigma^{+}y_{1}$ (E.4)

During this transformation, multiplied by $\Sigma$'s Moore-Penrose inverse $\Sigma^{+}$, the vector $y_{1}$ in the $u_{1}$, $u_{2}$ and $u_{3}$ coordinate system is transformed into the vector $y_{2}$, which is in the same coordinate system as $y_{1}$. Such a transformation can be called "Vector Stretching". During this transformation, the coordinate value of $y_{1}$ on the $u_{1}$ axis diminishes, the coordinate value on the $u_{2}$ axis is enlarged, and the coordinate value on the $u_{3}$ axis vanishes. The "Vector Stretching" of $y_{1}$ into $y_{2}$ is demonstrated in Figure 2.
The vanishing coordinate value of $y_{2}$ on the $u_{3}$ axis can signify the degree of freedom of $X$ from a geometric view.
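The "Vector Stretching" of Transformation 2 can be sketched as below, with the same hypothetical data: each surviving coordinate of $y_{1}$ is divided by the matching singular value, while the $u_{3}$ coordinate has no counterpart in $\Sigma^{+}$ and therefore vanishes.

```python
import numpy as np

# Hypothetical design matrix and response, for illustration only
X = np.array([[1.0, 2.0],
              [3.0, 1.0],
              [1.0, 1.0]])
y = np.array([2.0, 1.0, 3.0])

U, s, Vt = np.linalg.svd(X, full_matrices=True)
y1 = U.T @ y  # Transformation 1 (E.3)

# Sigma is 3x2, so its Moore-Penrose inverse is 2x3
Sigma_pinv = np.zeros((2, 3))
Sigma_pinv[:, :2] = np.diag(1.0 / s)

# Transformation 2 (E.4): "Vector Stretching" by Sigma^+.
# Each surviving coordinate of y1 is divided by the matching singular
# value; the u3 coordinate vanishes because rank(X) = 2 < 3.
y2 = Sigma_pinv @ y1
print(np.allclose(y2, y1[:2] / s))  # True
```
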
Transformation 3

$\hat{\beta} = Vy_{2}$ (E.5)

$X$'s right singular vectors can be denoted as $v_{1}$ and $v_{2}$. The $v_{1}$ and $v_{2}$ vectors are located in a coordinate system constructed by the $u_{1}$ and $u_{2}$ vectors [16].
Figure 1. The dependent variable vector $y$ transformed into $y_{1}$ by landing on to the left space of matrix $X$.
Figure 2. Vector $y_{1}$ transformed into $y_{2}$ through "Vector Stretching".
The quintessence of Equation (E.5) is that the vector $y_{2}$ in the coordinate system constructed by $u_{1}$ and $u_{2}$ can be embodied by the vector $\hat{\beta}$ in the coordinate system constructed by $v_{1}$ and $v_{2}$. This transformation is demonstrated in Figure 3.
Though the vector $\hat{\beta}$ has the same spatial position as $y_{2}$ in Figure 3, these two vectors are situated in two different coordinate systems, constructed respectively by $v_{1}$ and $v_{2}$ (the right space of $X$) and by $u_{1}$ and $u_{2}$ (the left space of $X$). In other words, the distinction between the vectors $\hat{\beta}$ and $y_{2}$ is that $\hat{\beta}$ is demonstrated in the coordinate system of $v_{1}$ and $v_{2}$, while $y_{2}$ is demonstrated in that of $u_{1}$ and $u_{2}$. As demonstrated by the aforementioned three steps of transformation, the dependent variable vector $y$ is inner-transformed into the parameter vector $\hat{\beta}$.
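Chaining the three transformations reproduces the Moore-Penrose solution $\hat{\beta} = X^{+}y$; a sketch with the same hypothetical data:

```python
import numpy as np

# Hypothetical design matrix and response, for illustration only
X = np.array([[1.0, 2.0],
              [3.0, 1.0],
              [1.0, 1.0]])
y = np.array([2.0, 1.0, 3.0])

U, s, Vt = np.linalg.svd(X, full_matrices=True)
Sigma_pinv = np.zeros((2, 3))
Sigma_pinv[:, :2] = np.diag(1.0 / s)

y1 = U.T @ y           # Transformation 1 (E.3): into the left space of X
y2 = Sigma_pinv @ y1   # Transformation 2 (E.4): "Vector Stretching"
beta_hat = Vt.T @ y2   # Transformation 3 (E.5): into the right space of X

# The three-step chain is exactly the Moore-Penrose solution X^+ y
print(np.allclose(beta_hat, np.linalg.pinv(X) @ y))  # True
```
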
Figure 3. Vector $y_{2}$ transformed into $\hat{\beta}$ by landing on to the right space of matrix $X$.
3. y Can Be Transformed into $\hat{y}$
The independent variable $X$'s left singular vector matrix $U$ [6] [7] [17] can be blocked into two sub-matrices $U_{1} = [u_{1}\ u_{2}]$ and $U_{2} = [u_{3}]$ as below:

$U = [U_{1}\ U_{2}]$
Let $\hat{y}$ symbolize the estimator of the dependent variable $y$. Because $U_{1}$ constructs the left space of $X$, $\hat{y}$ can be demonstrated as $\hat{y} = U_{1}U_{1}^{+}y$ [16], based on which the following result can be derived. Because $U_{1}$ is an orthonormal matrix, $U_{1}$'s Moore-Penrose inverse $U_{1}^{+}$ is equal to $U_{1}^{T}$ [12]. So, $\hat{y}$ can be demonstrated as below:

$\hat{y} = U_{1}U_{1}^{T}y$ (E.6)
Based on Equation (E.6), $y$ can be transformed into $\hat{y}$ by two steps as below.
Transformation 4

$y_{3} = U_{1}^{T}y$ (E.7)

In Transformation 4, $y$ transforms into $y_{3}$ by being projected into the left space of $X$. In other words, $y_{3}$ is the projection of $y$ onto the column space of $X$.
Transformation 5

$\hat{y} = U_{1}y_{3}$ (E.8)

In Transformation 5, multiplied by $U_{1}$, $y_{3}$ transforms into $\hat{y}$, which is presented in the coordinate system constructed by $e_{1}$, $e_{2}$ and $e_{3}$. Their spatial locations are demonstrated in Figure 4.
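Transformations 4 and 5 can be sketched as below, with the same hypothetical data: projecting with $U_{1}^{T}$ and mapping back with $U_{1}$ gives the same fitted values as $\hat{y} = XX^{+}y$, consistent with Equation (E.6).

```python
import numpy as np

# Hypothetical design matrix and response, for illustration only
X = np.array([[1.0, 2.0],
              [3.0, 1.0],
              [1.0, 1.0]])
y = np.array([2.0, 1.0, 3.0])

U, _, _ = np.linalg.svd(X, full_matrices=True)
U1 = U[:, :2]    # orthonormal basis of the left (column) space of X

y3 = U1.T @ y    # Transformation 4 (E.7): project y into the left space of X
y_hat = U1 @ y3  # Transformation 5 (E.8): back into the e1, e2, e3 coordinates

# Same fitted values as the hat-matrix form X X^+ y
print(np.allclose(y_hat, X @ np.linalg.pinv(X) @ y))  # True
```
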
In Figure 4, it can be found that the error estimator $\hat{\varepsilon}$ is located in the same direction as $u_{3}$, which constructs the left null space of matrix $X$, and can be expressed as below:

$\hat{\varepsilon} = y - \hat{y} = U_{2}U_{2}^{T}y$

$\hat{y}$ is the projection of $y$ into the left space of $X$, but presented in the original coordinate system constructed by $e_{1}$, $e_{2}$ and $e_{3}$. $\hat{\varepsilon}$ is located in the left null space of matrix $X$ and is perpendicular to $\hat{y}$. This perpendicular result can be demonstrated geometrically in Figure 4 as well as in the following multiplication:

$\hat{y}^{T}\hat{\varepsilon} = y^{T}U_{1}U_{1}^{T}U_{2}U_{2}^{T}y = 0$, because $U_{1}^{T}U_{2} = 0$.
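The decomposition of $y$ into $\hat{y}$ and $\hat{\varepsilon}$, and their perpendicularity, can be verified numerically with the same hypothetical data:

```python
import numpy as np

# Hypothetical design matrix and response, for illustration only
X = np.array([[1.0, 2.0],
              [3.0, 1.0],
              [1.0, 1.0]])
y = np.array([2.0, 1.0, 3.0])

U, _, _ = np.linalg.svd(X, full_matrices=True)
U1, U2 = U[:, :2], U[:, 2:]  # left space and left null space of X

y_hat = U1 @ U1.T @ y    # projection onto the left space of X
eps_hat = U2 @ U2.T @ y  # residual, in the left null space of X

print(np.allclose(y_hat + eps_hat, y))   # True: y decomposes exactly
print(np.isclose(y_hat @ eps_hat, 0.0))  # True: perpendicularity
```
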
4. Findings and Conclusion
By applying the SVD and the Moore-Penrose generalized inverse of the independent variable $X$ in a multiple linear regression model, the dependent variable $y$ can be transformed into the regression coefficient estimator vector $\hat{\beta}$ and its own estimator $\hat{y}$. This process presents a new geometric perspective for studying the relationship between $X$, $y$, $\hat{\beta}$ and $\hat{y}$ through the inner-transformation algorithm.
As demonstrated in Figures 1-3, $y$ transforms into $\hat{\beta}$ by moving from the original coordinate system of $e_{1}$, $e_{2}$ and $e_{3}$ into the right space of $X$ constructed by $v_{1}$ and $v_{2}$. In this process, $y$ first transforms into $y_{1}$ by transferring into the coordinate system of $u_{1}$, $u_{2}$ and $u_{3}$, as presented by Transformation 1 (E.3) and demonstrated in Figure 1.
Multiplied by $\Sigma^{+}$, the Moore-Penrose inverse of $\Sigma$, $y_{1}$ stretches into $y_{2}$, as presented by Transformation 2 (E.4) and demonstrated in Figure 2. In this transformation, the coordinate value of $y_{1}$ on the $u_{3}$ axis vanishes, which also signifies the degree of freedom of $X$ from a geometric view.
Multiplied by the right singular vector matrix $V$, $y_{2}$ transforms into $\hat{\beta}$, as presented by Transformation 3 (E.5) and demonstrated in Figure 3.
Figure 4. The dependent variable estimator $\hat{y}$ landing on to the left space of $X$, meanwhile the error estimator $\hat{\varepsilon}$ landing on to the left null space of $X$.
As demonstrated in Figure 4, $y$ transforms into $y_{3}$ by projecting into the left space of $X$. Then, multiplied by $U_{1}$, $y_{3}$ transforms into $\hat{y}$ and returns to the original coordinate system constructed by $e_{1}$, $e_{2}$ and $e_{3}$. $\hat{y}$ is the projection of $y$ into the left space of $X$ and is perpendicular to the error estimator $\hat{\varepsilon}$, which is located in the left null space of $X$.
With the aid of algorithms like matrix decomposition and the Moore-Penrose generalized matrix inverse, the dependent variable $y$ of a multiple linear regression model can be inner-transformed into the regression coefficient estimator $\hat{\beta}$ and its own estimator $\hat{y}$. This process is a new approach to illustrating the inner transformation between variables from a geometric view, as well as presenting the spatial locations of the variables. To date, no study has explored the relationship between the variables of the multiple linear regression model from the view of geometric transformation. This study fills that gap and provides a new perspective for studying multiple linear regression.
A limitation of this work is that a simple example of the multiple linear regression model is adopted to present this intricate inner transformation. Future studies adopting more complex examples of the multiple linear regression model could demonstrate more of the art of inner transformation.