Stability Estimation for Markov Control Processes with Discounted Cost

DOI: 10.4236/am.2020.116036

Abstract

This article considers Markov control processes on Borel spaces: stationary, homogeneous, in discrete time with infinite horizon, with bounded cost functions and the expected total discounted cost criterion. The stability estimation problem for this type of process is posed. The central objective is to obtain a bound on the stability index expressed in terms of the Lévy-Prokhorov metric; likewise, sufficient conditions are provided for the validity of such inequalities.


Martínez-Sánchez, J. (2020) Stability Estimation for Markov Control Processes with Discounted Cost. Applied Mathematics, 11, 491-509. doi: 10.4236/am.2020.116036.

1. Introduction

Let the following two Markov control processes be given on a Borel space $(X, \mathcal{B}_X)$:

$x_t = F(x_{t-1}, a_t, \xi_t), \quad t = 1, 2, \ldots$ (1.1)

$\tilde{x}_t = F(\tilde{x}_{t-1}, \tilde{a}_t, \tilde{\xi}_t), \quad t = 1, 2, \ldots$ (1.2)

where $a_t \in A(x_{t-1}) \subset A$ and $\tilde{a}_t \in A(\tilde{x}_{t-1}) \subset A$ are the controls forming the control policies $\pi = (a_1, a_2, \ldots)$, $\tilde{\pi} = (\tilde{a}_1, \tilde{a}_2, \ldots)$ (see [1] [2] for definitions); $\{\xi_t\}$, $\{\tilde{\xi}_t\}$ are sequences of independent and identically distributed (i.i.d.) random vectors in a separable metric space $(S, r)$. In what follows, the distributions of $\xi_1$ and $\tilde{\xi}_1$ are denoted by $D_\xi$ and $D_{\tilde{\xi}}$, respectively. Let $c$ be a given bounded measurable one-step cost function; for any initial state $x \in X$ and control policy $\pi \in \Pi$ ($\Pi$ is the set of all control policies, see [1]), the expected total $\alpha$-discounted cost criteria are as follows:

$V_\alpha(x, \pi) := E_x^\pi \sum_{t=1}^{\infty} \alpha^{t-1} c(x_{t-1}, a_t),$ (1.3)

$\tilde{V}_\alpha(x, \pi) := E_x^\pi \sum_{t=1}^{\infty} \alpha^{t-1} c(\tilde{x}_{t-1}, \tilde{a}_t).$ (1.4)

Under Assumptions 3.1 and 3.2 given in Section 3, there exist stationary optimal policies $f^*$ and $\tilde{f}^*$ such that

$V_\alpha^* = V_\alpha(f^*) = \inf_{\pi \in \Pi} V_\alpha(x, \pi); \quad \tilde{V}_\alpha^* = \tilde{V}_\alpha(\tilde{f}^*) = \inf_{\pi \in \Pi} \tilde{V}_\alpha(x, \pi).$ (1.5)

To set up the stability estimation problem, first suppose that the process given in Equation (1.2) is interpreted as an "available approximation" to the process given in Equation (1.1), i.e., $D_{\tilde{\xi}}$ is an accessible approximation to $D_\xi$.

Second, the policy $\tilde{f}^*$ (optimal with respect to Equation (1.4)) is applied to control the "original" process given in Equation (1.1) (instead of the "unavailable" optimal policy $f^*$).

Following the definition given in [3] [4] [5] [6] [7], we introduce the stability index:

$\Delta V_\alpha := V_\alpha(\tilde{f}^*) - V_\alpha(f^*) \geq 0,$

where $V_\alpha(\cdot)$ is the value function defined in Equation (1.5). This definition means that $\Delta V_\alpha$ represents an extra cost paid for using $\tilde{f}^*$ instead of the optimal policy $f^*$.

Under certain Lipschitz conditions it was proved (for the processes with bounded costs c) that

$\Delta V_\alpha \leq \bar{K}\, l_\pi(D_\xi, D_{\tilde{\xi}}),$ (1.6)

where $\bar{K}$ is an explicitly calculated constant, and $l_\pi$ is the Lévy-Prokhorov metric (see Section 2 for the definition). Convergence in $l_\pi$ is equivalent to weak convergence plus convergence of the first absolute moments (see [8]).

Inequalities such as (1.6) have been developed with other types of metrics (Kantorovich, total variation, etc.) and optimization criteria (the average cost); see e.g. [5] [7] [9] [10]. Other types of criteria used to obtain the stability of the process can be consulted in [11] [12] [13].

The aim of the present paper is to take advantage of the boundedness of $c$ and to use the well-known contractive properties of the operators related to the expected total discounted cost optimality equations in order to prove the "stability inequality" (1.6) with the Lévy-Prokhorov distance on its right-hand side.

This paper is organized as follows: Section 2 defines the Markov control model and the problem of its stability. Section 3 presents the Lipschitz conditions and the assumptions that guarantee the existence of an optimal control for the Markov control process, as well as the main result of this work, Theorem 3.1, which establishes conditions under which stability is achieved. Section 4 presents a couple of application examples, for which the assumptions are validated and the result of Theorem 3.1 is then applied. Finally, Section 5 presents the proof of Theorem 3.1, together with a couple of lemmas required for this proof.

2. Setting of the Problem

In the standard way (see for instance [1] [14]), a Markov Control Process (MCP) in discrete time with infinite horizon, stationary and homogeneous, is denoted by the following five-tuple:

$M = (X, A, \{A(x) : x \in X\}, p, c),$ (2.1)

where it is assumed that the components of the controllable process $M$ have the following characteristics:

• The state space $X$ is a metric space with a metric $\rho$, and $\mathcal{B}_X$ denotes its Borel sigma-algebra;

• The action space $A$ is a metric space with a metric $l$;

• The set of admissible actions $A(x)$ is compact for every $x \in X$;

• The set of admissible state-action pairs $\mathbb{K} = \{(x, a) \in X \times A : a \in A(x), x \in X\}$ is a non-empty (and measurable) Borel subset of $X \times A$, and it is equipped with the metric $\nu = \max\{\rho, l\}$;

• $p$ is a stochastic kernel on $X$ given $\mathbb{K}$. This stochastic kernel specifies the transition probability:

$p(B \mid x, a),$ (2.2)

where $B \in \mathcal{B}_X$ and $(x, a) \in \mathbb{K}$.

• Finally, $c : \mathbb{K} \to \mathbb{R}$ is a bounded and measurable function called the one-step cost function.

On the other hand, in many applications the evolution of the MCP given in Equation (2.1) is specified by the following model:

$x_t = F(x_{t-1}, a_t, \xi_t), \quad t = 1, 2, \ldots,$ (2.3)

where $x_0$ represents the initial state and $\{\xi_t\}$ is a sequence of i.i.d. random vectors taking values in a Borel space $S$ with common distribution $D_\xi$. In fact, it is assumed that $S$ is a metric space equipped with a metric $r$, and $F : \mathbb{K} \times S \to X$ is a measurable function. The process given in Equation (2.3) will be called the original process.

Let $x \in X$ be the initial state and $\pi \in \Pi$ the applied control policy ($\Pi$ is the set of all control policies; see [1] [14] for definitions). Then the performance criterion called the expected total $\alpha$-discounted cost is defined, as usual, by the following functional:

$V_\alpha(x, \pi) := E_x^\pi \sum_{t=1}^{\infty} \alpha^{t-1} c(x_{t-1}, a_t),$ (2.4)

where $\alpha \in (0, 1)$ is a fixed discount factor; $E_x^\pi$ denotes the expectation corresponding to the distribution of the process $\{x_t\}$ with initial state $x \in X$ under the control policy $\pi \in \Pi$.
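As an illustration of the criterion in Equation (2.4), the following sketch estimates the discounted cost of a fixed stationary policy by Monte Carlo simulation. The dynamics F, the policy, the cost function, and the exponential noise law are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def F(x, a, xi):
    # illustrative bounded dynamics on X = [0, 10] (assumption)
    return min(max(x - a + xi, 0.0), 10.0)

def policy(x):
    return 0.5 * x                 # assumed stationary policy f(x)

def cost(x, a):
    return min(a, 5.0)             # bounded one-step cost, here b = 5

def discounted_cost(x0, alpha=0.9, horizon=200, n_paths=2000):
    # Monte Carlo estimate of E_x sum_t alpha^(t-1) c(x_{t-1}, a_t),
    # truncated at a finite horizon (tail <= alpha^horizon * b / (1 - alpha))
    total = 0.0
    for _ in range(n_paths):
        x, acc, disc = x0, 0.0, 1.0
        for _ in range(horizon):
            a = policy(x)
            acc += disc * cost(x, a)
            x = F(x, a, rng.exponential(1.0))
            disc *= alpha
        total += acc
    return total / n_paths

v = discounted_cost(2.0)
```

Since the cost is bounded by b = 5 and the discount sum by 1/(1 − α) = 10, any estimate must fall in [0, 50], which serves as a quick sanity check.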

Now, the function $V_\alpha^*(x) := \inf_{\pi \in \Pi} V_\alpha(x, \pi)$, $x \in X$, is called the value function, and a control policy $\pi^*$ (provided it exists) is called optimal (with respect to the criterion $V_\alpha$) if it satisfies the following:

$V_\alpha^*(x) = V_\alpha(x, \pi^*) = \inf_{\pi \in \Pi} V_\alpha(x, \pi), \quad x \in X.$ (2.5)

Later, conditions will be imposed that guarantee the existence of an optimal stationary policy $\pi^* = f^* = (f, f, \ldots, f, \ldots)$ for Equation (2.5) (see [14]).

The stability index and its estimation problem. The stability estimation problem arises when there is uncertainty about the transition probability $p$ defined in Equation (2.2). The original task of the controller consists of the search for (or approximation of) the optimal policy $\pi^*$ that satisfies Equation (2.5) for the original process. In many applications, this task cannot be accomplished directly for, among other reasons, any of the following:

1) Frequently $p$ or some of its parameters are unknown to the controller, and this transition probability is estimated using some statistical procedure (from observations). From these estimates another transition probability $\tilde{p}$ is generated, which is interpreted as an accessible approximation to the unknown $p$.

2) There are situations in which $p$ is known but too complicated to leave any hope of solving the control policy optimization problem. In such cases, $p$ is sometimes replaced by a "theoretical approximation" $\tilde{p}$, resulting in a controllable process with a simpler structure.

In both cases, in optimizing policies the controller has to work with the controllable Markov process $\tilde{M} = (X, A, \{A(x) : x \in X\}, \tilde{p}, c)$ defined by the accessible transition probability $\tilde{p}$. This means that instead of the original process $x_t$ given in Equation (2.3), the controller uses the approximate process given by the following equation:

$\tilde{x}_t = F(\tilde{x}_{t-1}, \tilde{a}_t, \tilde{\xi}_t), \quad t = 1, 2, \ldots, \quad \text{with } x_0 \in X \text{ given},$ (2.6)

where $\tilde{x}_t, \tilde{x}_{t-1} \in X$ are states of the process; $\tilde{a}_t \in A(\tilde{x}_{t-1})$ is an action in the corresponding state; and $\{\tilde{\xi}_t\}$ is a sequence of i.i.d. random vectors with values in $S$. The only possible difference between the processes given in Equations (2.3) and (2.6) lies in the distributions $D_\xi$ and $D_{\tilde{\xi}}$ of the random vectors $\{\xi_t\}$ and $\{\tilde{\xi}_t\}$, respectively.

Changing $x_t$ to $\tilde{x}_t$ in Equations (2.4) and (2.5) defines the corresponding optimization criterion $\tilde{V}_\alpha$ for the approximate process $\tilde{M}$:

$\tilde{V}_\alpha(x, \pi) := E_x^\pi \sum_{t=1}^{\infty} \alpha^{t-1} c(\tilde{x}_{t-1}, \tilde{a}_t), \quad \text{with } x_0 \in X.$

Suppose now that it is possible (at least theoretically) to find an optimal policy $\tilde{\pi}^*$ for the process $\tilde{M}$, i.e., the value function for the approximate process is defined as

$\tilde{V}_\alpha(x, \tilde{\pi}^*) = \tilde{V}_\alpha^*(x) := \inf_{\pi \in \Pi} \tilde{V}_\alpha(x, \pi), \quad x \in X.$ (2.7)

The control policy $\tilde{\pi}^*$ in Equation (2.7) is used as an approximation to the inaccessible optimal policy $\pi^*$ (assuming it exists). In other words, the policy $\tilde{\pi}^*$ is used to control the original process $M$ instead of the unknown policy $\pi^*$.

The increase in cost incurred by such an approximation is estimated by means of the following stability index (see [3] [4]):

$\Delta V_\alpha(x) := V_\alpha(x, \tilde{\pi}^*) - V_\alpha^*(x) \equiv V_\alpha(x, \tilde{\pi}^*) - V_\alpha(x, \pi^*) \geq 0, \quad x \in X.$ (2.8)

As proposed in [5] [6], the stability estimation problem consists of the search for inequalities of the following type (stability inequalities):

$\Delta V_\alpha(x) \leq \bar{K}(x)\, \psi[\mu(p, \tilde{p})], \quad x \in X,$ (2.9)

where:

$\mu(p, \tilde{p})$ is a "distance" between the transition probabilities $p$ and $\tilde{p}$ (expressed in terms of a probability metric);

$\psi$ is a continuous function such that $\psi(s) \to 0$ as $s \to 0$; and $\bar{K}(x)$, $x \in X$, is a function whose values can be calculated explicitly.

The results presented in [4] [5] provide inequalities such as (2.9) using $\psi(s) = s^\gamma$ for $0 < \gamma \leq 1$ and the so-called "strong metrics": the total variation metric and the weighted total variation metric.

The aim of this article is to obtain stability inequalities such as (2.9) with $\psi(s) = s$ and the use of a "weak" probability metric, specifically the Lévy-Prokhorov metric ($l_\pi$).

For instance, Theorem 3.1 presented in the next section (see inequalities (3.1) and (3.2)) ensures that under appropriate conditions the following holds:

$\Delta V_\alpha(x) \leq \bar{K}(x)\, l_\pi(D_\xi, D_{\tilde{\xi}}), \quad x \in X,$ (2.10)

where:

$l_\pi(D_\xi, D_{\tilde{\xi}}) := \inf\{\varepsilon > 0 : D_\xi(A) \leq D_{\tilde{\xi}}(A^\varepsilon) + \varepsilon,\ D_{\tilde{\xi}}(A) \leq D_\xi(A^\varepsilon) + \varepsilon \text{ for every } A \in \mathcal{B}_S\}$

is the Lévy-Prokhorov metric; $A^\varepsilon := \{s \in S : r(s, A) < \varepsilon\}$, and $\mathcal{B}_S$ denotes the Borel sigma-algebra of the metric space $(S, r)$.

It is well known (see [15]) that $l_\pi$ metrizes weak convergence in any separable metric space: a sequence of random vectors that converges under the metric $l_\pi$ converges weakly.
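On the real line, the closely related Lévy metric between two distribution functions (a lower bound for the Lévy-Prokhorov metric) can be approximated numerically by a grid search for the smallest ε satisfying the defining sandwich condition. This sketch is illustrative only: the grid ranges, tolerances, and the two normal distributions are assumptions.

```python
import numpy as np
from math import erf, sqrt

# Grid-based approximation of the Levy metric between two CDFs F and G:
# the smallest eps with F(x - eps) - eps <= G(x) <= F(x + eps) + eps
# for all x. Grid bounds and resolutions are illustrative choices.
def levy_distance(F, G, lo=-10.0, hi=10.0, n=4001, n_eps=1001):
    xs = np.linspace(lo, hi, n)
    for eps in np.linspace(0.0, 1.0, n_eps):
        lower_ok = np.all(F(xs - eps) - eps <= G(xs) + 1e-12)
        upper_ok = np.all(G(xs) <= F(xs + eps) + eps + 1e-12)
        if lower_ok and upper_ok:
            return eps
    return 1.0

def norm_cdf(x, mu=0.0):
    return 0.5 * (1.0 + erf((x - mu) / sqrt(2.0)))

F = np.vectorize(norm_cdf)
G = np.vectorize(lambda x: norm_cdf(x, mu=0.5))
d = levy_distance(F, G)
```

Because the Lévy metric only tests half-lines rather than all Borel sets, it gives a cheap lower estimate of how far apart two noise distributions are in the Lévy-Prokhorov sense.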

3. Assumptions and Results

The Hausdorff distance $h$ between compact subsets $B, C$ of the metric space $(A, l)$ is given by

$h(B, C) := \max\{\sup_{x \in B} l(x, C), \sup_{y \in C} l(B, y)\},$

where $l(x, C) = \inf_{z \in C} l(x, z)$.

Likewise, the so-called "strong metric", the total variation metric $V(\cdot, \cdot)$, is given by

$V(D_\xi, D_{\tilde{\xi}}) := \sup\{|E[\phi(\xi)] - E[\phi(\tilde{\xi})]| : \|\phi\| \leq 1\},$

where $D_\xi$, $D_{\tilde{\xi}}$ are in the space of probability distributions over $(S, \mathcal{B}_S)$ and $\|\cdot\|$ is the supremum norm. Of course, when $S \subseteq \mathbb{R}$ and the distributions have densities $g_\xi$, $g_{\tilde{\xi}}$, then

$V(D_\xi, D_{\tilde{\xi}}) = \int |g_\xi(t) - g_{\tilde{\xi}}(t)|\, dt.$

On the other hand, one of the metrics called "weak" is the Kantorovich metric ($\kappa$):

$\kappa(D_\xi, D_{\tilde{\xi}}) := \sup_{\phi \in L} |E[\phi(\xi)] - E[\phi(\tilde{\xi})]|,$

where the functions $\phi$ are Lipschitz; namely, the set $L$ is defined as

$L := \{\phi : |\phi(s) - \phi(s')| \leq r(s, s'),\ s, s' \in S\}.$

It is well known (see [9]) that in the case $S \subseteq \mathbb{R}^m$, $\kappa(D_{\xi_n}, D_\xi) \to 0$ if and only if $\xi_n \Rightarrow \xi$ (weak convergence) and $E|\xi_n| \to E|\xi|$.
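For distributions on the real line, the Kantorovich metric coincides with the integral of the absolute difference of the two distribution functions (the form used later in inequality (4.11)). The sketch below evaluates it numerically for two exponential laws, which are illustrative assumptions.

```python
import numpy as np

# Kantorovich (Wasserstein-1) distance on the line as the integral of
# |F - G| between two CDFs, via a trapezoid rule on a grid.
def kantorovich_1d(cdf_f, cdf_g, lo=0.0, hi=60.0, n=200_001):
    xs = np.linspace(lo, hi, n)
    y = np.abs(cdf_f(xs) - cdf_g(xs))
    return float(np.sum((y[1:] + y[:-1]) * 0.5 * np.diff(xs)))

F = lambda x: 1.0 - np.exp(-x)          # exponential law with mean 1
G = lambda x: 1.0 - np.exp(-x / 2.0)    # exponential law with mean 2

k = kantorovich_1d(F, G)
```

For these two laws one CDF dominates the other, so the integral reduces to the difference of the means, |2 − 1| = 1, which the numerical value should reproduce.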

In the remainder of the article, $\mathbb{B}$ denotes the Banach space of all measurable functions $u : X \to \mathbb{R}$ for which the norm $\|u\| = \sup_{x \in X} |u(x)|$ is finite.

The first set of technical assumptions is required to ensure the existence of minimizers in the value functions of the original and approximate models; see [16].

Assumption 3.1.

1) The set $A(x)$ is compact for each $x \in X$; moreover, the set-valued mapping $x \to A(x)$ is upper semicontinuous with respect to the Hausdorff metric.

2) The one-step cost function $c : \mathbb{K} \to \mathbb{R}$ is bounded, namely $|c(k)| \leq b$ for each $k \in \mathbb{K}$, with $b \in \mathbb{R}$; and for each $x \in X$, the function $a \mapsto c(x, a)$ is lower semicontinuous on $A$.

3) For each bounded continuous function $u : X \to \mathbb{R}$, the functions

$u_1(x, a) := E\, u[F(x, a, \xi)];$

$u_2(x, a) := E\, u[F(x, a, \tilde{\xi})],$

with $(x, a) \in \mathbb{K}$, are continuous on $\mathbb{K}$.

The second set of assumptions imposes the “Lipschitz conditions” on the one-step cost function as well as on the transition probabilities of the original and approximate processes.

Assumption 3.2.

There exist finite constants $b, L_0, L_1, L_2, L_3$ such that the following hold:

1) $|c(k)| \leq b < \infty$ for each $k \in \mathbb{K}$;

2) $|c(k) - c(k')| \leq L_0\, \nu(k, k')$ for all $k, k' \in \mathbb{K}$;

3) $h(A(x), A(x')) \leq L_1\, \rho(x, x')$ for all $x, x' \in X$, where $h(\cdot, \cdot)$ is the Hausdorff metric;

4) $V(F(k, \xi), F(k', \xi)) \leq L_2\, \nu(k, k')$ for all $k, k' \in \mathbb{K}$, where $V(\cdot, \cdot)$ is the total variation metric between the distributions of $F(k, \xi)$ and $F(k', \xi)$;

5) $\rho(F(k, s), F(k, s')) \leq L_3\, r(s, s')$ for all $k \in \mathbb{K}$, $s, s' \in S$;

6) For each $x \in X$, $s \in S$, and bounded function $u : X \to \mathbb{R}$, the function $a \mapsto E\, u[F(x, a, s)]$ is lower semicontinuous on $A(x)$.

For a proof of the following proposition, see [16].

Proposition 1 (Well-known result). Under Assumptions 3.1 and 3.2, for the control processes given in Equations (2.3) and (2.6) there exist optimal stationary control policies, denoted by $f^* = \{f^*, f^*, \ldots\}$ and $\tilde{f}^* = \{\tilde{f}^*, \tilde{f}^*, \ldots\}$ respectively, such that $V_\alpha(x, f^*)$ and $\tilde{V}_\alpha(x, \tilde{f}^*)$ do not depend on the initial state $x \in X$, and

$V_\alpha^* = V_\alpha(x, f^*) = \inf_{\pi \in \Pi} V_\alpha(x, \pi), \quad x \in X;$

$\tilde{V}_\alpha^* = \tilde{V}_\alpha(x, \tilde{f}^*) = \inf_{\pi \in \Pi} \tilde{V}_\alpha(x, \pi), \quad x \in X.$

In addition, the corresponding value functions satisfy $V_\alpha^*, \tilde{V}_\alpha^* \in \mathbb{B}$. In particular, for each fixed $(x, a) \in \mathbb{K}$, the expected values $E\, V_\alpha^*[F(x, a, \xi)]$ and $E\, \tilde{V}_\alpha^*[F(x, a, \tilde{\xi})]$ are well defined.

Now we are in a position to formulate the main result of the paper.

Theorem 3.1. Under Assumptions 3.1 and 3.2, the stability index given in Equation (2.8) satisfies the following inequality:

$\sup_{x \in X} \Delta V_\alpha(x) \leq \bar{K}_\alpha\, l_\pi(D_\xi, D_{\tilde{\xi}}),$ (3.1)

where the stability constant is

$\bar{K}_\alpha = \frac{4\alpha}{(1 - \alpha)^3}\left[ b + L_3(1 + L_1)\big(L_0(1 - \alpha) + \alpha b L_2\big) \right].$ (3.2)

Note that as $\alpha \to 1$, the constant $\bar{K}_\alpha$ in Equation (3.2) is of order $O\big((1 - \alpha)^{-3}\big)$.
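The constant in Equation (3.2) is explicit, so it can be evaluated directly once the Lipschitz constants are known. The following sketch does so; the numeric values of b and L_0 through L_3 are illustrative assumptions.

```python
# Evaluate the stability constant of Theorem 3.1:
#   K_alpha = 4*alpha/(1-alpha)**3 * (b + L3*(1+L1)*(L0*(1-alpha) + alpha*b*L2))
def stability_constant(alpha, b, L0, L1, L2, L3):
    if not 0.0 < alpha < 1.0:
        raise ValueError("the discount factor alpha must lie in (0, 1)")
    inner = L0 * (1.0 - alpha) + alpha * b * L2
    return 4.0 * alpha / (1.0 - alpha) ** 3 * (b + L3 * (1.0 + L1) * inner)

# Illustrative constants (assumed): b = L0 = L1 = L2 = L3 = 1, alpha = 0.9
K = stability_constant(alpha=0.9, b=1.0, L0=1.0, L1=1.0, L2=1.0, L3=1.0)
```

Consistent with the remark above, the value grows like $(1 - \alpha)^{-3}$ as the discount factor approaches 1.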

4. Some Examples

4.1. The Process of Regularization of the Water Level in a Dam

An important application of control problems (deterministic and stochastic) is found in those related to water reservoir operations. An excellent introduction to many of these problems, including their connection with inventory systems, is given in [17].

In the simplest case of regularization of the water level in a dam, the following modeling can be used for the original process:

$x_t = \min\{x_{t-1} + \xi_t - a_t;\ U\}, \quad t = 1, 2, \ldots,$ (4.1)

and the respective approximate model remains as

$\tilde{x}_t = \min\{\tilde{x}_{t-1} + \tilde{\xi}_t - \tilde{a}_t;\ U\}, \quad t = 1, 2, \ldots$ (4.2)

In this model, the state variable $x_t$ represents the level (volume) of water stored in the dam at the beginning of period $t$; the control $a_t$ is the amount of water released from the dam for household consumption, irrigation, electric power, etc. during period $t$; and the "disturbance" $\xi_t$ is the amount of water that the dam receives randomly, via rain for instance.

In this example, $X = [0, U]$, $S = [0, \infty)$, and $A(x) = [0, x]$ for $x \in X$, where $U$ is the maximum capacity of the dam.

Let $0 \leq c(x, a) \leq b < \infty$ be the cost paid for the released water service; for example, one can use a cost function of the form $c(x, a) = c_0\, a$, proportional to water consumption, where $c_0$ represents the cost of one unit of water.
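The dam dynamics (4.1) can be sketched in a short simulation. The capacity U, the demand-threshold release policy, and the exponential inflow law below are illustrative assumptions, not part of the paper's model.

```python
import numpy as np

rng = np.random.default_rng(1)
U, demand = 10.0, 1.5                      # assumed capacity and target release

def step(x, inflow):
    a = min(x, demand)                     # admissible control: a in A(x) = [0, x]
    return min(x - a + inflow, U), a       # dynamics x_t = min{x_{t-1} + xi_t - a_t; U}

def simulate(x0=5.0, horizon=1000):
    x, levels = x0, []
    for _ in range(horizon):
        x, _ = step(x, rng.exponential(1.0))   # random inflow (assumed Exp(1))
        levels.append(x)
    return np.array(levels)

levels = simulate()
```

Since releases never exceed the current stock and the level is capped at U, every simulated state stays in X = [0, U], matching the state space of the example.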

To ensure compliance with assumption 3.2 for this example, it is admitted that the following conditions are met:

➢ C1. The one-step cost $c(x, a)$ satisfies clauses (1) and (2) of Assumption 3.2.

➢ C2. The random variable $\xi$ has a density $g_\xi$ which:

1) is bounded by a constant $M_g < \infty$;

2) satisfies the Lipschitz condition with a constant $L_g$.

For $A(x) = [0, x]$, clause (3) of Assumption 3.2 is verified directly (using the definition of the Hausdorff metric) with the constant $L_1 = 1$. Now, writing $y := x - a$, it is easy to see that for each fixed $y$ the function $F(x, a, s) = \min\{y + s, U\}$ is Lipschitz in $s$ with constant 1; hence clause (5) of this assumption holds with $L_3 = 1$. Next, clause (4) of Assumption 3.2 will be verified.

Writing $y := x - a$ and $y' := x' - a'$ with $x, x' \in [0, U]$, $a \in [0, x]$, $a' \in [0, x']$, consider the following random variables:

$\zeta(y) := \min\{y + \xi, U\},$

$\zeta(y') := \min\{y' + \xi, U\}.$

Since

$|y - y'| \leq |x - x'| + |a - a'| \leq 2\nu(k, k'),$

it is enough to prove that for some constant $\tilde{L}$ the following inequality holds:

$V(\zeta(y), \zeta(y')) \leq \tilde{L}\, |y - y'|.$ (4.3)

According to the definition of the total variation metric, to prove inequality (4.3) it must be shown that for each measurable function $\phi : S \to \mathbb{R}$ with $\|\phi\| \leq 1$,

$|E\phi[\zeta(y)] - E\phi[\zeta(y')]| \leq \tilde{L}\, |y - y'|.$

Now,

$E\phi[\zeta(y)] = E\{\phi[\zeta(y)];\ y + \xi < U\} + E\{\phi[\zeta(y)];\ y + \xi \geq U\},$

where for a random variable $\eta$, we write $E\{\eta; A\} = E[\eta\, I_A]$.

Using the same representation for $E\phi[\zeta(y')]$, we get that

$|E\phi[\zeta(y)] - E\phi[\zeta(y')]| \leq |E\{\phi[\zeta(y)];\ y + \xi < U\} - E\{\phi[\zeta(y')];\ y' + \xi < U\}| + |E\{\phi[\zeta(y)];\ y + \xi \geq U\} - E\{\phi[\zeta(y')];\ y' + \xi \geq U\}|,$

that is,

$|E\phi[\zeta(y)] - E\phi[\zeta(y')]| \leq I_1(y, y') + I_2(y, y').$ (4.4)

For the second term on the right-hand side of the last inequality, note that $\zeta(y) = U$ on the event $\{y + \xi \geq U\}$, so

$I_2(y, y') = |\phi(U)|\, \big| P(\xi \geq U - y) - P(\xi \geq U - y') \big| \leq \big| P(\xi \geq U - y) - P(\xi \geq U - y') \big|.$ (4.5)

Suppose, for instance, that $y > y'$. Then from Equation (4.5) and condition C2, we get that

$I_2(y, y') = \int_{U - y}^{U - y'} g_\xi(s)\, ds \leq M_g\, |y - y'|.$ (4.6)

Let $z \in (0, U)$ be an arbitrary but fixed number and let $dz$ denote an infinitesimal interval centered at $z$. Since

$\{y + \xi \in dz\} \subseteq \{y + \xi < U\},$

then

$P(\xi \in dz - y,\ y + \xi < U) = P(\xi \in dz - y) = g_\xi(z - y)\, dz.$

Similarly,

$P(\xi \in dz - y',\ y' + \xi < U) = g_\xi(z - y')\, dz.$

Then, in inequality (4.4) (taking into account that $g_\xi(x) = 0$ for $x < 0$):

$I_1(y, y') = \left| \int_y^U \phi(z)\, g_\xi(z - y)\, dz - \int_{y'}^U \phi(z)\, g_\xi(z - y')\, dz \right|,$

or, assuming for example that $y > y'$, we get that

$I_1(y, y') = \left| \int_y^U \phi(z)\, g_\xi(z - y)\, dz - \int_{y'}^{y} \phi(z)\, g_\xi(z - y')\, dz - \int_y^U \phi(z)\, g_\xi(z - y')\, dz \right| \leq \left| \int_{y'}^{y} \phi(z)\, g_\xi(z - y')\, dz \right| + \int_y^U |\phi(z)|\, |g_\xi(z - y) - g_\xi(z - y')|\, dz \leq M_g\, |y - y'| + U L_g\, |y - y'|$ (4.7)

(applying conditions C1 and C2).

Combining inequalities (4.4), (4.6), and (4.7) yields inequality (4.3) with $\tilde{L} = 2M_g + U L_g$.

Finally, it has been established that for this example clause (4) of Assumption 3.2 holds with $L_2 = 2(2M_g + U L_g)$, the extra factor 2 coming from the bound $|y - y'| \leq 2\nu(k, k')$. Following similar arguments, it can be shown that clause (6) of Assumption 3.2 is also true. Therefore, in this example inequality (3.1) of Theorem 3.1 can be applied, obtaining the following:

$\sup_{x \in [0, U]} \Delta V_\alpha(x) \leq \bar{K}_\alpha\, l_\pi(D_\xi, D_{\tilde{\xi}}),$ (4.8)

where

$\bar{K}_\alpha = \frac{4\alpha}{(1 - \alpha)^3}\left[ b + 2L_0(1 - \alpha) + 4\alpha b\,(2M_g + U L_g) \right].$ (4.9)

On the other hand, the distance $l_\pi(D_\xi, D_{\tilde{\xi}})$ in inequality (4.8) is very difficult to calculate. Therefore, the result in (4.8) can be expressed in terms of other probability metrics, as shown in the following:

• Total variation metric. Using the well-known relationship $l_\pi \leq V$ (see [18]) between the Lévy-Prokhorov metric and the total variation metric, and since in this example $S \subseteq \mathbb{R}$, the right-hand side of inequality (4.8) can be bounded to obtain the following stability inequality:

$\sup_{x \in [0, U]} \Delta V_\alpha(x) \leq \bar{K}_\alpha\, V(D_\xi, D_{\tilde{\xi}}) = \bar{K}_\alpha \int_0^\infty |g_\xi(s) - g_{\tilde{\xi}}(s)|\, ds,$ (4.10)

where the constant $\bar{K}_\alpha$ is given in Equation (4.9).

• Kantorovich metric ($\kappa$). Let ($s \geq 0$)

$G_\xi(s) = \int_0^s g_\xi(z)\, dz; \quad G_{\tilde{\xi}}(s) = \int_0^s g_{\tilde{\xi}}(z)\, dz$

be the distribution functions of the random variables $\xi$ and $\tilde{\xi}$, respectively, in Equations (4.1) and (4.2). Then, using the relationship $(l_\pi)^2 \leq \kappa$ (see [18]) between the Lévy-Prokhorov metric and the Kantorovich metric (defined in Section 3), the right-hand side of inequality (4.8) is bounded as

$\sup_{x \in [0, U]} \Delta V_\alpha(x) \leq \bar{K}_\alpha\, [\kappa(D_\xi, D_{\tilde{\xi}})]^{1/2} = \bar{K}_\alpha \left[ \int_0^\infty |G_\xi(s) - G_{\tilde{\xi}}(s)|\, ds \right]^{1/2},$ (4.11)

where the constant $\bar{K}_\alpha$ is given in Equation (4.9).

The integral in the last inequality represents the Kantorovich metric between $\xi$ and $\tilde{\xi}$. Inequality (4.11) is more informative than inequality (4.10), since it supports approximating $G_\xi$ by the corresponding empirical distribution functions.
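The empirical approach just mentioned can be sketched as follows: with two equal-size samples of the inflows, the integral in (4.11) evaluated at the empirical distribution functions reduces to the mean absolute difference of the sorted samples. The sample sizes, the two exponential inflow laws, and the value of the stability constant are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def empirical_kantorovich(xs, ys):
    # For equal-size samples, the integral of |G_n - G~_n| between the
    # empirical CDFs equals the mean absolute difference of sorted samples.
    return float(np.mean(np.abs(np.sort(xs) - np.sort(ys))))

xi = rng.exponential(1.0, size=10_000)        # sample of the "true" inflow
xi_tilde = rng.exponential(1.1, size=10_000)  # sample of the approximation

kappa_hat = empirical_kantorovich(xi, xi_tilde)
K_alpha = 10.0                                # illustrative stability constant
bound = K_alpha * np.sqrt(kappa_hat)          # right-hand side of (4.11)
```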

4.2. Example 4.2

Let $X = S = \mathbb{R}$ and $A(x) = A$, $x \in X$, with $A$ a compact set in $\mathbb{R}$. Now, define the following processes:

$x_t = H(x_{t-1}, a_t) + G(x_{t-1})\, \xi_t, \quad t \geq 1;$

$\tilde{x}_t = H(\tilde{x}_{t-1}, \tilde{a}_t) + G(\tilde{x}_{t-1})\, \tilde{\xi}_t, \quad t \geq 1;$

where $\xi_t \sim N(\theta, 1)$, $\tilde{\xi}_t \sim N(\tilde{\theta}, 1)$, and $H : \mathbb{R} \times A \to \mathbb{R}$ and $G : \mathbb{R} \to \mathbb{R}$ are bounded Lipschitz functions with constants $L_H$ and $L_G$, respectively.

In [19], it is shown that assumption 3.1 is satisfied for this model.

By properly selecting a cost function $c(x, a)$ that is bounded and Lipschitz, clauses (1) and (2) of Assumption 3.2 are fulfilled. For instance, if the cost function $c(x, a) = x - a$ is selected, then for $k = (x, a), k' = (x', a') \in \mathbb{K}$ we get that

$|c(k) - c(k')| = |(x - a) - (x' - a')| \leq 2\nu(k, k'),$

so, by selecting the constant $L_0 = 2$, clause (2) is satisfied.

On the other hand, it is clear that clause (3) is satisfied for any positive constant $L_1$, since $A(x) \equiv A$. To validate clause (4) of Assumption 3.2, first define the following random variables:

$y(k) = F(k, \xi_1) = H(x_{t-1}, a_t) + G(x_{t-1})\, \xi_1, \quad k = (x_{t-1}, a_t) \in \mathbb{K},$

$y(k') = F(k', \xi_1) = H(x'_{t-1}, a'_t) + G(x'_{t-1})\, \xi_1, \quad k' = (x'_{t-1}, a'_t) \in \mathbb{K},$

so it is clear that the probability distribution of each of the previous random variables is, respectively,

$y(k) \sim N\big(H(x_{t-1}, a_t) + \theta G(x_{t-1}),\ G^2(x_{t-1})\big)$, with density $f_{y(k)}$;

$y(k') \sim N\big(H(x'_{t-1}, a'_t) + \theta G(x'_{t-1}),\ G^2(x'_{t-1})\big)$, with density $f_{y(k')}$;

then, since in this example $S = \mathbb{R}$, after some direct calculations we arrive at the following result:

$V(F(k, \xi_1), F(k', \xi_1)) = V(y(k), y(k')) = \int_{\mathbb{R}} |f_{y(k)}(t) - f_{y(k')}(t)|\, dt \leq \sqrt{\tfrac{2}{\pi}}\, \big| H(x_{t-1}, a_t) + \theta G(x_{t-1}) - H(x'_{t-1}, a'_t) - \theta G(x'_{t-1}) \big|,$

and since it was assumed that the functions $H$ and $G$ are Lipschitz with constants $L_H$, $L_G$ respectively, from the last inequality we get that

$V(F(k, \xi_1), F(k', \xi_1)) \leq \sqrt{\tfrac{2}{\pi}}\, (L_H + \theta L_G)\, \nu(k, k').$

So, by selecting the constant $L_2 = \sqrt{2/\pi}\, (L_H + \theta L_G)$, clause (4) of Assumption 3.2 is satisfied. To validate clause (5) of this assumption, let $k = (x_{t-1}, a_t) \in \mathbb{K}$, $s, s' \in S$, and note that

$\rho(F(k, s), F(k, s')) = |H(x_{t-1}, a_t) + G(x_{t-1})\, s - H(x_{t-1}, a_t) - G(x_{t-1})\, s'| = |G(x_{t-1})|\, |s - s'|,$

and since the functions $H$ and $G$ are bounded, let $M_G$ be a finite constant such that $|G(x)| \leq M_G$ for all $x \in X$. Therefore, from the last equality we get that

$\rho(F(k, s), F(k, s')) \leq M_G\, r(s, s').$

So, for the constant $L_3 = M_G$, clause (5) is satisfied. Finally, since the function $F(k, s)$ is continuous in all its arguments, clause (6) is also true.
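The total variation bound used in clause (4) above can be checked numerically: for two normal laws with unit variance, the L1 distance between the densities is at most √(2/π) times the difference of the means. The grid and the particular means below are illustrative choices.

```python
import numpy as np

# L1 distance between the densities of N(m1, 1) and N(m2, 1),
# approximated with a trapezoid rule on a wide grid.
def tv_normals(m1, m2, lo=-12.0, hi=12.0, n=200_001):
    xs = np.linspace(lo, hi, n)
    f = np.exp(-0.5 * (xs - m1) ** 2) / np.sqrt(2.0 * np.pi)
    g = np.exp(-0.5 * (xs - m2) ** 2) / np.sqrt(2.0 * np.pi)
    y = np.abs(f - g)
    return float(np.sum((y[1:] + y[:-1]) * 0.5 * np.diff(xs)))

d = 0.7                                   # illustrative mean difference
tv = tv_normals(0.0, d)
bound = np.sqrt(2.0 / np.pi) * d          # the bound sqrt(2/pi) * |m1 - m2|
```

The numerical distance should fall strictly below the linear bound, with the two becoming close as the mean difference shrinks.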

In conclusion, Example 4.2 satisfies Assumption 3.2, so the result of Theorem 3.1 can be applied (see inequalities (3.1) and (3.2)) to bound the stability index using the Lévy-Prokhorov metric:

$\Delta V_\alpha \leq \hat{K}\, l_\pi(D_\xi, D_{\tilde{\xi}}),$

where

$\hat{K} = \frac{4\alpha}{(1 - \alpha)^3}\left\{ b + M_G(1 + L_1)\left[ L_0(1 - \alpha) + \alpha b \sqrt{\tfrac{2}{\pi}}\, (L_H + \theta L_G) \right] \right\}.$

5. Proofs

5.1. Some Preliminary Lemmas

For the proof of Theorem 3.1, the following lemmas will be used.

Lemma 5.1. Under Assumption 3.2, the value function $V_\alpha^*$ defined in Equation (2.5) satisfies the Lipschitz condition on the state space $X$ with the constant

$L = (L_1 + 1)\left[ L_0 + \frac{\alpha b L_2}{1 - \alpha} \right].$

Proof. By Assumption 3.2 clause (1), for each $\pi \in \Pi$ the cost $V_\alpha(x, \pi) = E_x^\pi \sum_{t=1}^{\infty} \alpha^{t-1} c(x_{t-1}, a_t)$ is bounded by $\frac{b}{1 - \alpha}$; hence $\|V_\alpha\| \leq \frac{b}{1 - \alpha}$.

On the other hand, in [16] it is proved that the following operators:

$T u(x) := \inf_{a \in A(x)} \{c(x, a) + \alpha E\, u[F(x, a, \xi)]\}; \quad \tilde{T} u(x) := \inf_{a \in A(x)} \{c(x, a) + \alpha E\, u[F(x, a, \tilde{\xi})]\}$ (5.1)

are contractions on the Banach space $\mathbb{B}$ with modulus $\alpha$.
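The contraction property of T in Equation (5.1) is what makes successive approximations converge geometrically to the fixed point; the sketch below demonstrates this on a small randomly generated finite model. The state/action counts, costs, and discrete noise distribution are illustrative assumptions.

```python
import numpy as np

alpha = 0.9
n_states, n_actions = 20, 5
rng = np.random.default_rng(3)

# Bounded one-step costs c(x, a) in [0, 1] (illustrative)
cost = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
# Discrete noise with three outcomes; next_state[x, a, j] plays F(x, a, xi_j)
noise_p = np.array([0.5, 0.3, 0.2])
next_state = rng.integers(0, n_states, size=(n_states, n_actions, 3))

def T(u):
    # (T u)(x) = min_a { c(x, a) + alpha * E u[F(x, a, xi)] }
    eu = (u[next_state] * noise_p).sum(axis=2)
    return (cost + alpha * eu).min(axis=1)

u = np.zeros(n_states)
for _ in range(300):          # value iteration: error shrinks like alpha^n
    u = T(u)

residual = float(np.max(np.abs(T(u) - u)))   # near zero at the fixed point
```

The fixed point of T approximates the value function of the discretized model, and the residual after iteration gives an a posteriori check of the contraction argument.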

Now, the terms inside the braces of these operators are used to define the following function:

$g(x, a) \equiv g(k) := c(k) + \alpha E\, V_\alpha^*[F(k, \xi)], \quad k \in \mathbb{K}.$

It is claimed that the function $g(k)$ is Lipschitz with the constant $\bar{L} = L_0 + \frac{\alpha b L_2}{1 - \alpha}$.

To prove this, let $k = (x, a), k' = (x', a') \in \mathbb{K}$; then

$|g(k) - g(k')| = |c(k) + \alpha E\, V_\alpha^*[F(k, \xi)] - c(k') - \alpha E\, V_\alpha^*[F(k', \xi)]| \leq |c(k) - c(k')| + \alpha\, |E\, V_\alpha^*[F(k, \xi)] - E\, V_\alpha^*[F(k', \xi)]|.$

Applying Assumption 3.2 clause (2) and the fact that $\|V_\alpha^*\| \leq \frac{b}{1 - \alpha}$, we get that

$|g(k) - g(k')| \leq L_0\, \nu(k, k') + \frac{\alpha b}{1 - \alpha}\, V(F(k, \xi), F(k', \xi)).$

Then, applying Assumption 3.2 clause (4), the previous inequality becomes

$|g(k) - g(k')| \leq L_0\, \nu(k, k') + \frac{\alpha b}{1 - \alpha}\, L_2\, \nu(k, k') = \left[ L_0 + \frac{\alpha b L_2}{1 - \alpha} \right] \nu(k, k') = \bar{L}\, \nu(k, k');$

therefore $g(k) \in Lip(\bar{L})$ on $\mathbb{K}$, where $\bar{L} = L_0 + \frac{\alpha b L_2}{1 - \alpha}$.

Since the operators given in Equation (5.1) are contractions, we get that $V_\alpha^* = T V_\alpha^* = \inf_{a \in A(x)} \{g(x, a)\}$.

Then, to prove that the function $V_\alpha^*$ is Lipschitz on $X$ with the constant $L = (L_1 + 1)\left[ L_0 + \frac{\alpha b L_2}{1 - \alpha} \right] = (L_1 + 1)\bar{L}$, it is enough to prove the following:

For a function $g : \mathbb{K} \to \mathbb{R}$ such that $|g(k) - g(k')| \leq \bar{L}\, \nu(k, k')$ for $k, k' \in \mathbb{K}$, it follows that for all $x, y \in X$:

$\left| \inf_{a \in A(x)} g(x, a) - \inf_{a \in A(y)} g(y, a) \right| \leq L\, \rho(x, y) = \bar{L}(L_1 + 1)\, \rho(x, y).$ (5.2)

Remark 5.1. Observe that

$\left| \inf_{a \in A(x)} g(x, a) - \inf_{a \in A(y)} g(y, a) \right| = \left| \inf_{a \in A(x)} \{c(x, a) + \alpha E\, V_\alpha^*[F(x, a, \xi)]\} - \inf_{a \in A(y)} \{c(y, a) + \alpha E\, V_\alpha^*[F(y, a, \xi)]\} \right|,$

so we have that

$\left| \inf_{a \in A(x)} g(x, a) - \inf_{a \in A(y)} g(y, a) \right| = |V_\alpha^*(x) - V_\alpha^*(y)|.$

So if inequality (5.2) is met, then it is true that

$|V_\alpha^*(x) - V_\alpha^*(y)| \leq \bar{L}(L_1 + 1)\, \rho(x, y),$

which allows us to conclude that $V_\alpha^*$ is Lipschitz on $X$.

Next, the proof of inequality (5.2) is presented.

Let $q(x, y) := \left| \inf_{a \in A(x)} g(x, a) - \inf_{a \in A(y)} g(y, a) \right|$.

Then, by the triangle inequality, we get the following:

$q(x, y) \leq \left| \inf_{a \in A(x)} g(x, a) - \inf_{a \in A(x)} g(y, a) \right| + \left| \inf_{a \in A(x)} g(y, a) - \inf_{a \in A(y)} g(y, a) \right|,$

hence

$q(x, y) \leq \bar{L}\, \rho(x, y) + I,$

where

$I = \left| \inf_{a \in A(x)} g(y, a) - \inf_{a \in A(y)} g(y, a) \right|.$ (5.3)

It will be proven that

$I \leq \bar{L} L_1\, \rho(x, y).$ (5.4)

The proof proceeds by contradiction. Assuming that inequality (5.4) does not hold, there is an $\varepsilon > 0$ such that the following is satisfied:

$I > \bar{L} L_1\, \rho(x, y) + \varepsilon.$ (5.5)

Due to the compactness of the sets $A(x)$, $A(y)$ and the continuity of $g$, there are elements $a_x \in A(x)$, $a_y \in A(y)$ at which the infima in $I$ are attained; see Equation (5.3).

If it is admitted, for example, that

$I = \inf_{a \in A(x)} g(y, a) - \inf_{a \in A(y)} g(y, a) > \bar{L} L_1\, \rho(x, y) + \varepsilon,$

then

$g(y, a_x) - g(y, a_y) > \bar{L} L_1\, \rho(x, y) + \varepsilon, \quad \text{i.e.,} \quad g(y, a_x) > g(y, a_y) + \bar{L} L_1\, \rho(x, y) + \varepsilon.$ (5.6)

Now, by Assumption 3.2 clause (3), since $h(A(x), A(y)) \leq L_1\, \rho(x, y)$, there exists $\bar{a}_x \in A(x)$ such that $l(a_y, \bar{a}_x) \leq L_1\, \rho(x, y)$, and consequently we get that

$|g(y, \bar{a}_x) - g(y, a_y)| \leq \bar{L}\, \nu\big((y, \bar{a}_x), (y, a_y)\big) = \bar{L}\, l(\bar{a}_x, a_y) \leq \bar{L} L_1\, \rho(x, y).$

The above implies that

$g(y, \bar{a}_x) - g(y, a_y) \leq \bar{L} L_1\, \rho(x, y),$

that is,

$g(y, a_y) \geq g(y, \bar{a}_x) - \bar{L} L_1\, \rho(x, y);$

if this last inequality is substituted into inequality (5.6), we obtain

$g(y, a_x) > g(y, \bar{a}_x) + \varepsilon,$

which contradicts the fact that $a_x$ is the element at which the minimum of $g(y, a)$ over $A(x)$ is attained. Therefore, the assumption made in inequality (5.5) is false. Then we get that $I \leq \bar{L} L_1\, \rho(x, y)$, which implies that $q(x, y) \leq \bar{L}\, \rho(x, y) + \bar{L} L_1\, \rho(x, y)$, and consequently

$q(x, y) \leq (1 + L_1)\left[ L_0 + \frac{\alpha b L_2}{1 - \alpha} \right] \rho(x, y).$

Finally, because of the observations made in Remark 5.1, we get that

$V_\alpha^* \in Lip\left( (1 + L_1)\left[ L_0 + \frac{\alpha b L_2}{1 - \alpha} \right] \right),$

which proves Lemma 5.1.

Lemma 5.2. Under Assumption 3.2, the function $s \mapsto V_\alpha^*[F(k, s)]$ satisfies the Lipschitz condition on the space $S$ with the constant

$L^* = (1 + L_1)\left[ L_0 + \frac{\alpha b L_2}{1 - \alpha} \right] L_3.$

Proof. The following function will be used: for $k \in \mathbb{K}$, define the function $\varphi_k$ as $\varphi_k(s) := V_\alpha^*[F(k, s)]$, $s \in S$.

Let $k \in \mathbb{K}$, $s, s' \in S$. By the definition of the function $\varphi_k$ we get that

$|\varphi_k(s) - \varphi_k(s')| = |V_\alpha^*[F(k, s)] - V_\alpha^*[F(k, s')]|,$

and because of Lemma 5.1, we arrive at the following inequality:

$|\varphi_k(s) - \varphi_k(s')| \leq (1 + L_1)\left[ L_0 + \frac{\alpha b L_2}{1 - \alpha} \right] \rho(F(k, s), F(k, s')).$

Now, applying Assumption 3.2 clause (5) to the previous inequality, we get the following:

$|\varphi_k(s) - \varphi_k(s')| \leq (1 + L_1)\left[ L_0 + \frac{\alpha b L_2}{1 - \alpha} \right] L_3\, r(s, s'),$

namely,

$\varphi_k \in Lip\left( (1 + L_1)\left[ L_0 + \frac{\alpha b L_2}{1 - \alpha} \right] L_3 \right) \text{ on } S.$

5.2. The Proof of the Theorem 3.1

To prove inequality (3.1), we take advantage of the method proposed in [7]; nevertheless, this technique needs to be modified: the combination of certain Lyapunov-like conditions used in the results of that paper is replaced here by the use of the contractive properties of the operators related to the discounted cost optimality equations. The following are the incorporations and developments required for the proof of Theorem 3.1 of this article, with a bounded cost function.

Let $\pi = \{f, f, \ldots\}$, $\tilde{\pi} = \{\tilde{f}, \tilde{f}, \ldots\}$ be the optimal stationary policies for the processes given in Equations (2.3) and (2.6), respectively, and $V_\alpha^*$, $\tilde{V}_\alpha^*$ the corresponding value functions.

Then (see Chapter 8 of [16]) $V_\alpha^*$, $\tilde{V}_\alpha^*$ and $f$, $\tilde{f}$ satisfy the following optimality equations (moreover, $V_\alpha^*$, $\tilde{V}_\alpha^*$ are the unique solutions of these equations):

$V_\alpha^*(x) := \inf_{a \in A(x)} \{c(x, a) + \alpha E\, V_\alpha^*[F(x, a, \xi)]\} = c(x, f(x)) + \alpha E\, V_\alpha^*[F(x, f(x), \xi)],$ (5.7)

$\tilde{V}_\alpha^*(x) := \inf_{a \in A(x)} \{c(x, a) + \alpha E\, \tilde{V}_\alpha^*[F(x, a, \tilde{\xi})]\} = c(x, \tilde{f}(x)) + \alpha E\, \tilde{V}_\alpha^*[F(x, \tilde{f}(x), \tilde{\xi})].$ (5.8)

For all $(x, a) \in \mathbb{K}$, define

$H(x, a) := c(x, a) + \alpha E\, V_\alpha^*[F(x, a, \xi)],$

$\tilde{H}(x, a) := c(x, a) + \alpha E\, \tilde{V}_\alpha^*[F(x, a, \tilde{\xi})].$

As has been proved in [7], the stability index given in Equation (2.8) can be represented as

$\Delta V_\alpha(x) = \sum_{t=1}^{\infty} \alpha^{t-1} E_x^{\tilde{\pi}} \Lambda_t,$ (5.9)

where

$\Lambda_t := H(x_{t-1}, a_t) - \inf_{a \in A(x_{t-1})} H(x_{t-1}, a),$ (5.10)

and $\{x_{t-1}, a_t,\ t \geq 1\}$ is the trajectory of the process given in Equation (2.3) under the control policy $\tilde{\pi} = \{\tilde{f}, \tilde{f}, \ldots\}$.

By the definition given in Equation (5.10), Equation (5.8), and the fact that π̃ is optimal for the process given in Equation (2.6), we have

Λ_t = H(x_{t−1}, a_t) − H̃(x_{t−1}, a_t) + inf_{a∈A(x_{t−1})} H̃(x_{t−1}, a) − inf_{a∈A(x_{t−1})} H(x_{t−1}, a),

which implies

|Λ_t| ≤ 2 sup_{a∈A(x_{t−1})} |H(x_{t−1}, a) − H̃(x_{t−1}, a)|,

and by the definition of the functions H and H ˜

|Λ_t| ≤ 2α sup_{a∈A(x_{t−1})} |E V_α*[F(x_{t−1}, a, ξ)] − E Ṽ_α*[F(x_{t−1}, a, ξ̃)]|.

Then, applying the triangle inequality,

|Λ_t| ≤ 2α sup_{a∈A(x_{t−1})} |E V_α*[F(x_{t−1}, a, ξ)] − E V_α*[F(x_{t−1}, a, ξ̃)]| + 2α sup_{a∈A(x_{t−1})} |E V_α*[F(x_{t−1}, a, ξ̃)] − E Ṽ_α*[F(x_{t−1}, a, ξ̃)]|. (5.11)

Define now the following pseudo-metric:

μ(D_ξ, D_ξ̃) := sup{|E V_α*[F(x, a, ξ)] − E V_α*[F(x, a, ξ̃)]| : (x, a) ∈ K}. (5.12)

From Equation (5.12), the first summand on the right side of Equation (5.11) is bounded by 2αμ(D_ξ, D_ξ̃).

On the other hand, taking the supremum over x_{t−1} in Equation (5.11), the second term on the right side satisfies

2α sup_{a∈A(x_{t−1})} |E V_α*[F(x_{t−1}, a, ξ̃)] − E Ṽ_α*[F(x_{t−1}, a, ξ̃)]| ≤ 2α‖V_α* − Ṽ_α*‖. (5.13)

As already mentioned, the operators given in Equation (5.1) are contractive on B with modulus α. So Equations (5.7) and (5.8) can be expressed as V_α* = T V_α* and Ṽ_α* = T̃ Ṽ_α*. Since V_α* and Ṽ_α* are the fixed points of these operators, we get that

‖V_α* − Ṽ_α*‖ = ‖T V_α* − T̃ Ṽ_α*‖;

now, applying the triangle inequality,

‖V_α* − Ṽ_α*‖ = ‖T V_α* − T̃ V_α* + T̃ V_α* − T̃ Ṽ_α*‖ ≤ ‖T V_α* − T̃ V_α*‖ + α‖V_α* − Ṽ_α*‖, whence ‖V_α* − Ṽ_α*‖ ≤ (1/(1 − α))‖T V_α* − T̃ V_α*‖ ≤ (α/(1 − α)) sup_{(x,a)∈K} |E V_α*[F(x, a, ξ)] − E V_α*[F(x, a, ξ̃)]|.

Using the definition given in Equation (5.12), the previous inequality can be expressed as

‖V_α* − Ṽ_α*‖ ≤ (α/(1 − α)) μ(D_ξ, D_ξ̃).
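This contraction bound can be checked numerically. The sketch below builds the Bellman operators T and T̃ for two nearby transition kernels (a hypothetical two-state, two-action model; the numbers are invented for illustration) and verifies that the distance between their fixed points is at most (1/(1 − α))‖T V_α* − T̃ V_α*‖:

```python
# Numerical check of ||V* - Vt*|| <= (1/(1 - alpha)) ||T V* - Tt V*||
# for the Bellman operators T, Tt induced by two nearby noise laws.
# Hypothetical finite model; all numbers are illustrative.

alpha = 0.9
c = [[1.0, 2.0], [0.5, 1.5]]
P  = [[[0.8, 0.2], [0.3, 0.7]], [[0.6, 0.4], [0.1, 0.9]]]   # law under xi
Pt = [[[0.7, 0.3], [0.4, 0.6]], [[0.5, 0.5], [0.2, 0.8]]]   # law under xi~ (perturbed)

def bellman(V, kernel):
    return [min(c[x][a] + alpha * sum(kernel[x][a][y] * V[y] for y in range(2))
                for a in range(2))
            for x in range(2)]

def fixed_point(kernel):
    V = [0.0, 0.0]
    for _ in range(2000):        # alpha-contraction: converges geometrically
        V = bellman(V, kernel)
    return V

V, Vt = fixed_point(P), fixed_point(Pt)
lhs = max(abs(V[x] - Vt[x]) for x in range(2))      # ||V* - Vt*||
TtV = bellman(V, Pt)                                # Tt applied to V*
# Since T V* = V*, ||T V* - Tt V*|| = ||V* - Tt V*||:
rhs = max(abs(V[x] - TtV[x]) for x in range(2)) / (1 - alpha)
```

The inequality lhs ≤ rhs holds exactly for the true fixed points; with 2000 iterations the numerical error is far below floating-point tolerance.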

Substituting this last expression into inequality (5.13), the second term on the right side of inequality (5.11) is bounded by (2α²/(1 − α)) μ(D_ξ, D_ξ̃), and so inequality (5.11) yields

sup_X |Λ_t| ≤ 2αμ(D_ξ, D_ξ̃) + (2α²/(1 − α))μ(D_ξ, D_ξ̃) = (2α/(1 − α))μ(D_ξ, D_ξ̃). (5.14)
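Written out, the two summands in (5.14) collapse by direct arithmetic:

```latex
2\alpha\,\mu(D_{\xi},D_{\tilde{\xi}})
  + \frac{2\alpha^{2}}{1-\alpha}\,\mu(D_{\xi},D_{\tilde{\xi}})
  = 2\alpha\,\frac{(1-\alpha)+\alpha}{1-\alpha}\,\mu(D_{\xi},D_{\tilde{\xi}})
  = \frac{2\alpha}{1-\alpha}\,\mu(D_{\xi},D_{\tilde{\xi}}).
```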

Finally, substituting inequality (5.14) into Equation (5.9), we obtain

sup_X ΔV_α(x) ≤ ∑_{t=1}^∞ α^{t−1} (2α/(1 − α))μ(D_ξ, D_ξ̃) = (2α/(1 − α)²)μ(D_ξ, D_ξ̃). (5.15)

To bound μ(D_ξ, D_ξ̃), we use the definition of the Dudley metric (δ) on the space of distributions on (S, r):

δ(D_ξ, D_ξ̃) = δ(P_X, P_Y) := sup_{φ∈D} |Eφ(X) − Eφ(Y)|,

where

D := {φ : S → ℝ : ‖φ‖ + ‖φ‖_L ≤ 1},

and

‖φ‖ = sup_{x∈S} |φ(x)|; ‖φ‖_L = sup_{x≠y} |φ(x) − φ(y)|/r(x, y).

(See [15] for the definition and properties of δ.)
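On a small finite metric space the supremum defining δ can be approximated by brute force. The sketch below uses a hypothetical two-point space S = {0, 1} with r(0, 1) = 1 (not from the paper), with P_X the point mass at 0 and P_Y the point mass at 1; for this pair the exact value is δ = 2/3, attained at φ(0) = 1/3, φ(1) = −1/3:

```python
# Brute-force approximation of the Dudley metric
#   delta(P_X, P_Y) = sup { |E phi(X) - E phi(Y)| : ||phi|| + ||phi||_L <= 1 }
# on the two-point space S = {0, 1} with r(0, 1) = 1 (illustrative example).
# P_X = point mass at 0, P_Y = point mass at 1, so
#   E phi(X) - E phi(Y) = phi(0) - phi(1).

N = 300                                    # grid resolution: phi values are k/N
best = 0.0
for i in range(-N, N + 1):
    for j in range(-N, N + 1):
        a, b = i / N, j / N                # candidate values phi(0), phi(1)
        sup_norm = max(abs(a), abs(b))     # ||phi||
        lip_norm = abs(a - b)              # ||phi||_L, since r(0, 1) = 1
        if sup_norm + lip_norm <= 1.0 + 1e-12:   # feasibility (float tolerance)
            best = max(best, abs(a - b))
delta = best                               # approximately 2/3
```

The optimizer φ(0) = 1/3, φ(1) = −1/3 gives ‖φ‖ = 1/3 and ‖φ‖_L = 2/3, saturating the constraint, so the grid search recovers δ ≈ 2/3.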

By Lemma 5.2, V_α* ∈ Lip((1 + L_1)[L_0 + αbL_2/(1 − α)]L_3), and since ‖V_α*‖ ≤ b/(1 − α), the stability index can be bounded in terms of Dudley's metric as follows:

ΔV_α(x) ≤ (2α/(1 − α)²)[b/(1 − α) + (1 + L_1)[L_0 + αbL_2/(1 − α)]L_3] δ(D_ξ, D_ξ̃).

Now, using the well-known relationship δ ≤ 2l_π between the Dudley metric and the Lévy-Prokhorov metric (see [18]), and after some direct calculations, the desired inequality (3.1) is obtained with the constant given in Equation (3.2).

6. Conclusions

Despite the vast literature on Markov control processes, few studies have addressed the estimation of stability. The study of stability for Markov control processes is challenging from both a theoretical and a practical point of view, and proposing appropriate probabilistic metrics that yield the so-called stability inequalities requires additional effort.

In this article, conditions were found to obtain the stability of a Markov control process under the optimization criterion of expected total α-discounted cost with a bounded cost function using the Lévy-Prokhorov metric.

The importance of being able to use the Lévy-Prokhorov metric lies in the fact that, in applied problems, it allows the stability index to be estimated using empirical distributions for the random elements, since empirical distributions converge weakly, and hence in this metric, to the distributions they estimate (unlike the so-called "strong metrics").

On the other hand, since in applications no company can bear unlimited (unbounded) costs, the results of this work, obtained with simple techniques such as contractive operators, provide an estimate of the increase in cost (the stability index) incurred by controlling the "original process" with the optimal policy of the "approximate process". Of course, the stability constant K̄_α affects this stability index; specifically, it was found here that this constant is of order O((1 − α)^{−3}) as α → 1. There are arguments supporting the hypothesis that, for each fixed initial state x, the left side of inequality (3.1) satisfies ΔV_α → ∞ as α → 1 with the distributions of ξ and ξ̃ fixed. The rate of this growth is not clear. It is therefore proposed that future research verify the growth rate of ΔV_α through computational experiments and process simulation based on particular (and simple) control processes.
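Such computational experiments could start from a minimal sketch like the following (a hypothetical two-state, two-action model; the costs, kernels, and discount values are invented for illustration): compute the optimal policy f̃ of a perturbed kernel, evaluate it on the original kernel, and record the stability index ΔV_α = V_α(x, π̃) − V_α*(x) as α grows:

```python
# Sketch of the proposed experiment: measure the stability index
#   Delta V_alpha = V_alpha(x, pi~) - V_alpha*(x)
# on a toy finite model for increasing discount factors alpha.
# Hypothetical model; all numbers are illustrative.

c = [[1.0, 2.0], [0.5, 1.5]]
P  = [[[0.8, 0.2], [0.3, 0.7]], [[0.6, 0.4], [0.1, 0.9]]]   # original kernel
Pt = [[[0.6, 0.4], [0.5, 0.5]], [[0.4, 0.6], [0.3, 0.7]]]   # perturbed kernel

def value_iteration(kernel, alpha, iters=5000):
    """Optimal value function and a minimizing stationary policy."""
    V = [0.0, 0.0]
    for _ in range(iters):
        V = [min(c[x][a] + alpha * sum(kernel[x][a][y] * V[y] for y in range(2))
                 for a in range(2)) for x in range(2)]
    policy = [min(range(2), key=lambda a: c[x][a] +
                  alpha * sum(kernel[x][a][y] * V[y] for y in range(2)))
              for x in range(2)]
    return V, policy

def evaluate(policy, kernel, alpha, iters=5000):
    """Discounted cost of a fixed stationary policy on `kernel`."""
    V = [0.0, 0.0]
    for _ in range(iters):
        V = [c[x][policy[x]] + alpha * sum(kernel[x][policy[x]][y] * V[y]
                                           for y in range(2)) for x in range(2)]
    return V

stability = {}
for alpha in (0.5, 0.9, 0.99):
    V_star, _ = value_iteration(P, alpha)       # optimum of the original process
    _, f_tilde = value_iteration(Pt, alpha)     # optimal policy of the approximation
    V_tilde_on_P = evaluate(f_tilde, P, alpha)  # that policy used on the original
    stability[alpha] = max(V_tilde_on_P[x] - V_star[x] for x in range(2))
```

By optimality of V_α*, each recorded index is nonnegative; plotting it against α on richer models would give empirical evidence about the growth rate discussed above.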

Acknowledgements

The author is particularly grateful to Professor Edgar Vladyvosky M.S. for his instructive discussions on a generalization of the Markov processes and properties of the Lévy-Prokhorov metric.

Conflicts of Interest

The author declares no conflicts of interest regarding the publication of this paper.

References

[1] Dynkin, E.B. and Yushkevich, A.A. (1979) Controlled Markov Processes. Springer-Verlag, New York.
https://doi.org/10.1007/978-1-4615-6746-2
[2] Hernandez-Lerma, O. (1989) Adaptive Markov Control Processes. Springer-Verlag, New York.
https://doi.org/10.1007/978-1-4419-8714-3
[3] Gordienko, E.I. (1988) Stability Estimates for Controlled Markov Chains with a Minorant. Journal of Soviet Mathematics, 40, 481-486.
https://doi.org/10.1007/BF01083641
[4] Gordienko, E.I. (1992) An Estimate of the Stability of Optimal Control of Certain Stochastic and Deterministic Systems. Journal of Soviet Mathematics, 59, 891-899.
https://doi.org/10.1007/BF01099115
[5] Gordienko, E.I. and Salem, F.S. (1998) Robustness Inequality for Markov Control Processes with Unbounded Costs. Systems & Control Letters, 33, 125-130.
https://doi.org/10.1016/S0167-6911(97)00077-7
[6] Gordienko, E.I. and Yushkevich, A. (2003) Stability Estimates in the Problem of Optimal Switching of a Markov Chain. Mathematical Methods of Operations Research, 57, 345-365.
https://doi.org/10.1007/s001860200258
[7] Gordienko, E.I., Lemus-Rodriguez, E. and Montes-de-Oca, R. (2008) Discounted Cost Optimality Problem: Stability with Respect to Weak Metrics. Mathematical Methods of Operations Research, 68, 77-96.
https://doi.org/10.1007/s00186-007-0171-z
[8] Rachev, S.T. and Rüschendorf, L. (1998) Mass Transportation Problems, Vol. II: Applications. Springer, New York.
[9] Gordienko, E.I., Lemus-Rodriguez, E. and Montes-de-Oca, R. (2009) Average Cost Markov Control Processes: Stability with Respect to the Kantorovich Metric. Mathematical Methods of Operations Research, 70, 13-33.
https://doi.org/10.1007/s00186-008-0229-6
[10] Montes-de-Oca, R. and Salem-Silva, F. (2005) Estimates for Perturbations of Average Markov Decision Processes with a Minimal State and Upper Bounded by Stochastically Ordered Markov Chains. Kybernetika, 41, 757-772.
[11] Aziz, M.M. and Merie, D.M. (2020) Stability and Adaptive Control with Synchronization of 3-D Dynamical System. Open Access Library Journal, 7, e6075.
https://doi.org/10.4236/oalib.1106075
[12] Yilmaz, S., Büyükköroglu, T. and Dzhafarov, V. (2015) On Asymptotic Stability of Linear Control Systems. Applied Mathematics, 6, 71-77.
https://doi.org/10.4236/am.2015.61008
[13] Engblom, S. (2014) On the Stability of Stochastic Jump Kinetics. Applied Mathematics, 5, 3217-3239.
https://doi.org/10.4236/am.2014.519300
[14] Hernandez-Lerma, O. and Lasserre, J.B. (1996) Discrete-Time Markov Control Processes. Basic Optimality Criteria. Springer, New York.
https://doi.org/10.1007/978-1-4612-0729-0
[15] Rachev, S.T. (1991) Probability Metrics and the Stability of Stochastic Models. Wiley, Chichester.
[16] Hernandez-Lerma, O. and Lasserre, J.B. (1999) Further Topics on Discrete-Time Markov Control Processes. Springer, New York.
https://doi.org/10.1007/978-1-4612-0561-6
[17] Yakowitz, S. (1982) Dynamic Programming Applications in Water Resources. Water Resources Research, 18, 673-696.
https://doi.org/10.1029/WR018i004p00673
[18] Dudley, R.M. (1989) Real Analysis and Probability. Wadsworth & Brooks/Cole, Pacific Grove.
[19] Arapostathis, A., Borkar, V.S., Fernandez-Gaucherand, E., Ghosh, M.K. and Marcus, S.I. (1993) Discrete-Time Controlled Markov Processes with Average Cost Criterion: A Survey. SIAM Journal on Control and Optimization, 31, 282-344.
https://doi.org/10.1137/0331018

  

Copyright © 2020 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.