Inherent Numerical Instability in Computing Invariant Measures of Markov Chains
1. Introduction
This paper is dedicated to discrete-time Markov chains
with a countably infinite set of states (labelled as
) and a stationary one-step transition matrix
of the form
(1)
with
and
for at least one
. In addition we assume that X is irreducible and recurrent. Notice that the irreducibility of X implies
(2)
For a recurrent irreducible Markov chain, the system
(3)
has a unique solution which is strictly positive, see Karlin and Taylor ( [1] , chapter 11). Every constant positive multiple of
is called an invariant measure of X.
Substituting (1) into (3) and rearranging yields the (n + 1)th-order homogeneous linear recurrence relation
(4)
coupled to the side conditions
(5)
From the theory of linear difference equations (see Miller [2] ) it is known that there exist
linearly independent functions
defined on
which take on prescribed values at
and satisfy the homogeneous recurrence relation (4) for all
.
Since
for all k the desired solution
is uniquely determined by its initial values
and (5). An application of the recurrence relation (4) should then lead directly to the desired solution x. Unfortunately, however, forward computation of x by means of the homogeneous equation (4) is not a meaningful procedure. In the sequel it is shown that the solution space S of the recurrence relation (4) is the direct sum
of two subspaces
and
with the property that every solution
dominates over every solution
, i.e.
In addition, we have
and
.
As pointed out by Cash [3] , Gautschi [4] [5] , Lozier [6] and Mattheij [7] , forward computation of a dominated solution becomes numerically unstable. To avoid numerical instability, it is therefore necessary to construct a decoupled recursion in which the dominant solution
does not appear. In the sequel, it is shown that with each transition matrix P of the form (1) one can associate a set of so-called generalized continued fractions (GCFs) being convergent if the underlying Markov chain is irreducible and recurrent. It turns out that the limits of these fractions coincide with the coefficients of the desired decoupled recursion.
The phenomenon of inherent instability was first recognized in connection with the computation of higher transcendental functions such as Bessel and associated Legendre functions (see Gautschi [4] and Wimp [8] ). Our experience indicates that most of the recurrence relations arising from stochastic modelling are also subject to numerical instability and require special attention. Transition matrices and Q-matrices with lower bandwidth n and upper bandwidth 1 have been discussed in [9] .
Remark. Most of our considerations concerning the computation of invariant measures of Markov chains may be extended to more general infinite systems of equations. In Section 3, we will point out which properties of invariant measures we have explicitly used.
2. Generalized Continued Fractions and Linear Difference Equations
This chapter deals with the computation of dominated solutions of homogeneous linear difference equations by means of generalized continued fractions (GCFs). The theory of GCFs was initiated by Jacobi [10] and Perron [11] [12] and has played a leading role in several areas of mathematics, especially in number theory, ergodic theory, linear difference equations and Padé approximation.
Definition 1. A generalized continued fraction (GCF) of dimension
is an
-tuple
of real-valued sequences together with a convergence structure defined as follows. Let
be the principal solutions of the corresponding homogeneous linear difference equation
(6)
satisfying
(7)
The GCF is said to converge iff all the limits
(8)
do exist.
Convergence of a GCF is indicated by the notation
Terminating a GCF after the N-th column, we get the so-called N-th approximants
The term “GCF” becomes obvious if forward computation of the N-th approximant of a GCF is replaced by the equivalent backward algorithm. It is known that the recurrence relation (6) can be converted into a first-order vector recursion of the form
(9)
by putting
Comparing (6) and (9), it is seen that the numerators
and the denominators
of the GCF appear in the last row of the matrix
Consider now the backward recurrence scheme
(10)
with initial vector
. Then
Writing (10) in expanded form and inserting the structure of
, we get
or, equivalently
(11)
with initial values
for
. Hence
and
(12)
Alternative procedures for computing GCFs are described in [13] .
The relations between GCFs and linear difference equations have been first recognized by Perron [14] . Perron’s results were generalized by Van der Cruyssen [13] , Hanschke [15] [16] and Levrie and Bultheel [17] . The following theorem is due to Van der Cruyssen [13] .
Theorem 1. A GCF
converges iff there are
linearly independent solutions
of the recurrence relation (6) satisfying
(13)
and
(14)
As pointed out by Gautschi [4] [5] and Van der Cruyssen [13] , forward computation of a solution
is numerically unstable. Van der Cruyssen [13] establishes that if the limits
(15)
exist for all
, then
iff
(16)
Combining (11), (12) and (16), one obtains an efficient algorithm (which we will refer to as Van der Cruyssen’s algorithm) for approximating the first
components
of an element
with prescribed values
:
Step 1: Select
and define
for
by
.
Step 2: Set
and
for
.
Step 3: Increase N until the accuracy of the iterates is within prescribed limits.
For any
, the vector
is an approximant of a GCF. Hence, convergence of the algorithm is related to convergence of GCFs. For the latter, we cite two helpful results.
Theorem 2. (Levrie [18] , Perron [12] ). The GCF
converges if it satisfies the so-called dominance condition, i.e.
Theorem 3. (De Bruin [19] , Perron [12] ). Consider a GCF of the form (8) with nth approximant numerators
and denominators
and the GCF given by
(17)
with nth numerators
and denominators
where the real numbers
are all different from zero. Then
In other words, if one of these two GCFs converges, so does the other.
Notice that the GCF (17) may be interpreted as an equivalence transformation of the GCF (12).
3. Main Results
To make (4) congruent with (6) we put
(18)
(19)
Theorem 4. Let
be defined as in (18) and (19). The limits
(20)
exist for all
if the corresponding Markov chain is irreducible and recurrent. In case of convergence the invariant measure is a dominated solution of (6) and satisfies the decoupled recursion
(21)
Proof. Denote by
the
northwest corner truncation of
. Since
is assumed to be irreducible and recurrent we conclude from Seneta [20] [21] [22] that the finite system
(22)
where
is the unit matrix and
is the vector with unity in the first position and zeros elsewhere, has a unique solution
satisfying
The vector
satisfies the homogeneous linear difference Equation (6) for
and is subject to the boundary conditions
(23)
and
(24)
Notice that the numerators
and the denominators
of the GCF (20) satisfy
(25)
and build up a fundamental system of solutions to Equation (6) for
. Hence any solution of the recurrence relation (6) can be expressed in terms of these functions. In view of their initial values
(26)
we get
(27)
Choosing
and utilizing (24), Equation (27) reduces to
(28)
Dividing both sides of (28) by
and
we formally arrive at
(29)
Passing to the limit
we get the decoupled recursion (21) provided that the limits
do exist for
.
To prove our assertion we utilize that the invariant measure
of the irreducible and recurrent Markov chain (1) is strictly positive and satisfies the recurrence relation (4) which can be rewritten as
(30)
Define
for
Then (30) becomes equivalent to
implying that the GCFs
converge for all
by means of Theorem 2. From Theorem 3, we then conclude that the same is true for the GCFs (8).
Theorem 4 shows that the numerical calculation of the invariant measures of X can be reduced to Van der Cruyssen’s algorithm with (23) as initial conditions.
Remark. It is important that the statement of Theorem 4 consists of two parts:
・ The invariant measure is dominated by other solutions of the difference Equation (4).
・ By means of GCFs, we are able to construct a decoupled recursion which is not affected by numerical instability.
Our main goal was the derivation of the first statement. Although the existence of dominating solutions has been pointed out in specific examples (e.g. ( [8] , p. 167), where the exact solutions of the difference equations are known), to our knowledge, no statement in this generality has been published. Since the fact that the desired solution is dominated directly implies numerical instability of forward computation, the system
should not be solved by forward computation. Note that, up to some slight modifications concerning the boundary conditions, the truncated system (22) yields the same difference Equation (4) for
. Thus, forward computation could be applied to (22) as well, but at least for large N, the instability result, and hence the recommendation to avoid forward computation, applies to (22) too.
The observation of inherent numerical instability of (4) is essential since the transition structure under consideration arises in many practical applications, e.g. state-dependent queueing systems with bulk arrivals, and it also appears in a more general context, e.g. as an approximation to state-dependent variants of the traditional G/M/1 queueing model.
Basically, we have exploited the fact that the solutions of the truncated system (22) converge to invariant measures as
. For Markov chains with a general transition structure, there is no way of solving
directly (in particular, forward computation cannot be applied if
for some
). In this situation, Seneta [20] [21] [22] as well as Golub and Seneta [23] already recommended exploiting the convergence of the solutions of the truncated system (22) to the invariant measure for numerical purposes. Furthermore, Seneta [22] discussed numerical aspects (in terms of the condition number) of the finite system (22) and established numerical stability of Gaussian elimination or LU decomposition. The above comment concerning forward computation as a solution method for (22) emphasizes the fact that it is important to solve this finite system in an appropriate way. The GCF-based algorithm can be interpreted as a combination of building up the truncated system (22) and then applying a modified Gaussian elimination procedure. It therefore follows both steps recommended in the literature cited above and represents the complete procedure in a combined mathematical form, which is interesting from a structural point of view.
4. Continuous-Time Markov Chains
The results of the preceding chapter can easily be extended to continuous-time Markov chains
generated by a conservative, irreducible and regular Q-matrix of the form
(31)
where
and
for at least one
. Notice that the irreducibility of Y implies
If Y is positive recurrent, the limits
exist independently of the initial state and build up a probability distribution which is strictly positive. For the construction of a continuous-time Markov process from its infinitesimal properties the reader is referred to Reuter [24] and Kendall and Reuter [25] . Now let
be the
northwest corner truncation of Q. If Y is regular and positive recurrent, then the finite system
(32)
has a unique solution
satisfying
see Tweedie [26] . Now let
(33)
(34)
By substituting (31) into (32) it is seen that
satisfies the
order homogeneous linear difference equation
(35)
for
and the boundary conditions
(36)
(37)
Comparing (33) and (34) with (18) and (19), it is readily seen that the scheme (35)-(37) formally coincides with that of the discrete-time case. Hence (26)-(29) also hold for
. To prove convergence of the associated GCFs
(38)
we use the same arguments as in the proof of Theorem 4. Notice that
satisfies (35). Rearranging yields
(39)
Define
for
. Since
for
and
for
, (39) is equivalent to
By Theorem 3, the GCF
then converges for all
. Hence the same is true for the GCFs (38). Summarizing, we arrive at the following theorem.
Theorem 5. Let
be defined as in (33) and (34), respectively. The limits (38) exist for all
if the corresponding Markov process is irreducible, regular and positive recurrent. In case of convergence the limiting distribution
is a dominated solution of (35) and satisfies the decoupled recursion
(40)
For executing (40), we make use of Van der Cruyssen’s algorithm with (36) as initial conditions. The resulting sequence
can be normalized such that
.
Remark. As for discrete-time Markov chains, the key statement of Theorem 5 is that the invariant measure is a dominated solution of the difference equation (35). Again, the GCF-based algorithm combines the two-step method consisting of building up the truncated system (32) and applying Gaussian elimination or LU decomposition.
5. Numerical Examples
As an example we consider the continuous-time Markov chain
generated by the Q-matrix
, where
(41)
with parameters
and
. Such a Markov chain may be interpreted as a model for a single-server queueing system with state-dependent arrival and service rates, and with customers arriving in groups of size 1 or 2.
It is easy to check that Q is irreducible. To prove that Y is regular and positive recurrent we make use of Tweedie’s [27] criterion. The condition is to find some
and a non-negative sequence
satisfying
as
and
(42)
for almost all i. By substituting (41) into (42) we get
for almost all i. The substitution
for
implies the condition
for almost all i which is true because of
.
Under these conditions there exists a unique invariant measure of Y (up to multiplication by a constant), which satisfies the following set of equations:
(43)
(44)
and
or, equivalently,
(45)
for
. Equation (45) leads to the recursion
(46)
which is used for forward computation.
Since (45) is a homogeneous linear difference equation with constant coefficients, a fundamental set of solutions can easily be derived from the roots of its characteristic equation. A simple manipulation yields the following set of linearly independent solutions:
The desired invariant measure
of Y with initial value
is a linear combination of the first two solutions, that is:
Therefore, choosing
yields that there is a geometrically increasing solution of (46), which explains why forward computation cannot cope with this problem, whereas the GCF-based algorithm is a stable procedure. This effect is reflected in Table 1 and Table 2, for which the data type double in C++ was used.
Table 1. Numerical values for
Table 2. Numerical values for
Consider next the M/M/1 queueing system, in which the customers arrive according to a Poisson stream with rate
and in which the successive service times are stochastically independent and exponentially distributed with parameter
. For this system it is known that the queueing process
which counts the number of customers in line, forms a homogeneous continuous time Markov chain with state space
being regular and positive recurrent for
, see [28] . The model is convenient for illustrating the mechanism of our approach.
The invariant measure
of L meets the second order linear difference equation
(47)
subject to the side conditions
In principle,
could be computed by forward computation, that is
where
.
A simple induction argument shows that
for
. Another solution of (47) is given by
for
, implying that
becomes a dominated solution of (47), as claimed. Therefore
can also be characterized by the decoupled recursion (40), i.e.
(48)
Substituting (48) into (47) leads to
(49)
By truncating this infinite continued fraction scheme at a certain level
, we get approximants for the desired coefficients
. A numerical example is given in Table 3. In this example the effect of numerical instability is comparatively small.
6. Conclusion and Further Research
For Markov chains with a specific transition structure, we have proven that the
Table 3. Numerical values for the M/M/1 queueing system for
.
invariant measure is a dominated solution of the corresponding steady-state recurrence relations, and therefore, it cannot be calculated by forward computation. As a by-product of our considerations, a GCF-based method for computing the invariant measure in a numerically stable way has been established. The effects of numerical instability of the original recurrence relations as well as the stability of the decoupled version have been illustrated by numerical examples. In some way, the GCF-based approach is similar to previously recommended methods (truncate the infinite system and solve the truncated one by Gaussian elimination or LU decomposition), but represents the steps in a combined mathematical form. Note that in previous literature, the generality of the transition structure did not allow forward computation, and hence, the question of instability of this method did not arise.
At no point in our considerations did we explicitly use the stochasticity of P (or the corresponding conservativity property of Q). Therefore, it seems reasonable to extend our results to more general infinite systems
of linear equations, where
In the context of computing invariant measures of Markov chains, we have
or
, respectively. In the general case, the main steps of the proofs of Theorem 4 or Theorem 5 can be restated in an abstract context as follows: If
and
for
, and
for
and
, and if there is a strictly positive solution
of
, all GCFs involved in the algorithmic procedure converge. If additionally, we have
for all
for the solutions
of
,
then the GCF-based algorithm converges to
as
tends to infinity, see [29] . Usually, these assumptions are met in the context of computing Perron-Frobenius eigenvectors for infinite non-negative matrices with some communication structure (e.g. irreducibility, see [30] for more details). In the context of discrete-time Markov chains, invariant measures are right eigenvectors for the Perron-Frobenius eigenvalue 1, whereas left eigenvectors are related to hitting probabilities (or absorption probabilities). Therefore, an extension of our considerations could address the task of computing these probabilities under appropriate assumptions on the transition structure. Further work could deal with the following generalizations:
・ In the tridiagonal case (that is,
), a generalization to block-matrices (that is, the corresponding Markov chain is a quasi-birth-death process) by means of matrix-valued continued fractions has been studied in [31] . The resulting algorithm is similar to the matrix-analytic methods studied in [32] [33] . An intuitive combination is the study of transition probability matrices of the form (1) where all
are matrices. For practical issues, it would be desirable to find a memory-efficient algorithm for computing stationary expectations. For quasi-birth-death processes (that is, block-tridiagonal transition matrices), such a method has been suggested in [34] .
・ Instead of banded matrices, we could consider upper Hessenberg matrices P (or lower Hessenberg matrices S). Then the finite-order difference equation (4) is replaced by a difference equation of infinite order (sometimes referred to as a sum equation). Furthermore, arbitrary lower bandwidths for P could be considered (as in [9] , where the upper bandwidth was restricted to 1).
Acknowledgements
We acknowledge support by Open Access Publishing Fund of Clausthal University of Technology.