Estimation of Distribution Function Based on Presmoothed Relative-Risk Function

Abstract

In this article, lifetime data subject to random right censoring are considered. We discuss nonparametric estimation of the distribution function based on presmoothed estimation of the relative-risk function, and study the properties of the resulting estimator by numerical modeling. Within the model under consideration, the estimators are compared numerically to determine which of them performs better.

Share and Cite:

Abdushukurov, A. , Bozorov, S. and Mansurov, D. (2022) Estimation of Distribution Function Based on Presmoothed Relative-Risk Function. Applied Mathematics, 13, 191-204. doi: 10.4236/am.2022.132015.

1. Introduction

Censored data occur in survival analysis, biomedical trials and industrial experiments. There are several censoring schemes (from the right, from the left, from both sides, mixed with competing risks, and others). In the statistical literature, random right censoring is the most widespread, since it is easy to describe from a methodological point of view. We also consider this kind of censoring here, so that our results can be compared with others.

Let $X_1, X_2, \ldots$ and $Y_1, Y_2, \ldots$ be two independent sequences of independent and identically distributed (i.i.d.) random variables (r.v.-s) with common unknown continuous distribution functions (d.f.-s) $F$ and $G$, respectively. Let the $X_j$ be censored on the right by $Y_j$, so that the observations available at the $n$-th stage consist of the sample $C^{(n)} = \{(Z_j, \delta_j), 1 \le j \le n\}$, where $Z_j = \min(X_j, Y_j)$ and $\delta_j = I(X_j \le Y_j)$, with $I(A)$ denoting the indicator of the event $A$. The main problem is the nonparametric estimation of the d.f. $F$, with $G$ as a nuisance d.f., based on the censored sample $C^{(n)}$, where the number of observed $X_j$-s, $\nu_n = \delta_1 + \cdots + \delta_n$, is random.

Kaplan and Meier [1] were the first to suggest the product-limit (PL) estimator $F_n^{PL}$ of $F$, defined as

$$F_n^{PL}(t) = \begin{cases} 1 - \prod\limits_{\{j:\, Z_{(j)} \le t\}} \left(1 - \dfrac{\delta_{(j)}}{n-j+1}\right), & t \le Z_{(n)}, \\ 1, & t > Z_{(n)},\ \delta_{(n)} = 1, \\ \text{undefined}, & t > Z_{(n)},\ \delta_{(n)} = 0, \end{cases} \tag{1}$$

where $Z_{(1)} \le \cdots \le Z_{(n)}$ are the order statistics of the Z-sample $\{Z_j, 1 \le j \le n\}$ and $\{\delta_{(j)}, 1 \le j \le n\}$ is the sequence of indicators associated with the ordered Z-sample. There are different versions of the PL-estimator; they fail to coincide only when the largest $Z_j$ is a censoring time. There is an enormous literature on the properties of PL-estimators and their applications to statistical problems, especially under random right censoring. However, $F_n^{PL}$ is not the only estimator of the d.f. $F$. Abdushukurov [2] [3] proposed another estimator of $F$, of relative-risk power type:

$$F_n^{RR}(t) = 1 - (1 - H_n(t))^{R_n(t)} = \begin{cases} 0, & t < Z_{(1)}, \\ 1 - \left(\dfrac{n-j}{n}\right)^{R_n(t)}, & Z_{(j)} \le t < Z_{(j+1)},\ 1 \le j \le n-1, \\ 1, & t \ge Z_{(n)}, \end{cases} \tag{2}$$

where $H_n(t) = \frac{1}{n}\sum_{j=1}^{n} I(Z_j \le t)$, $t \in \mathbb{R}^1 = (-\infty; +\infty)$, is the empirical estimator of the d.f. $P(Z_j \le t) = 1 - (1 - F(t))(1 - G(t)) = H(t)$ and $R_n(t) = (\Lambda_n(t))^{-1}\Lambda_{1n}(t)$ is an estimator of the relative-risk function $R(t) = (\Lambda(t))^{-1}\Lambda_1(t)$, $t \in \mathbb{R}^1$. Here the cumulative hazard functions (c.h.f.-s) $\Lambda$, $\Lambda_0$ and $\Lambda_1$, corresponding to the d.f.-s $H$, $G$ and $F$, are defined as

$$\Lambda(t) = \int_{-\infty}^{t} \frac{dH(u)}{1 - H(u)} = \Lambda_0(t) + \Lambda_1(t), \quad \Lambda_0(t) = \int_{-\infty}^{t} \frac{dG(u)}{1 - G(u)} = \int_{-\infty}^{t} \frac{dH_0(u)}{1 - H(u)}, \quad \Lambda_1(t) = \int_{-\infty}^{t} \frac{dF(u)}{1 - F(u)} = \int_{-\infty}^{t} \frac{dH_1(u)}{1 - H(u)}, \tag{3}$$

with subdistribution functions $H_0(t) + H_1(t) = H(t)$, $t \in \mathbb{R}^1$,

$$H_0(t) = P(Z_j \le t, \delta_j = 0) = \int_{-\infty}^{t} (1 - F(u))\,dG(u), \quad H_1(t) = P(Z_j \le t, \delta_j = 1) = \int_{-\infty}^{t} (1 - G(u))\,dF(u).$$

The corresponding estimators of c.h.f.-s (3) are

$$\Lambda_n(t) = \int_{-\infty}^{t} \frac{dH_n(u)}{1 - H_n(u-)} = \Lambda_{0n}(t) + \Lambda_{1n}(t),$$

where

$$\Lambda_{kn}(t) = \int_{-\infty}^{t} \frac{dH_{kn}(u)}{1 - H_n(u-)}, \quad k = 0, 1; \qquad H_{kn}(t) = \frac{1}{n}\sum_{j=1}^{n} I(Z_j \le t, \delta_j = k)$$

are the empirical counterparts of $H_k(t)$, $k = 0, 1$, with $H_{0n}(t) + H_{1n}(t) = H_n(t)$, $t \in \mathbb{R}^1$.
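As an illustration of how estimator (2) can be evaluated, the following is a minimal Python sketch. It assumes there are no ties among the $Z_j$ (consistent with continuous $F$ and $G$) and that the denominators in the empirical c.h.f.-s are taken as left limits, so that the jump of $\Lambda_n$ at $Z_{(j)}$ equals $1/(n-j+1)$. The function name `relative_risk_estimator` is hypothetical and is not taken from the paper.

```python
def relative_risk_estimator(z, delta):
    """Evaluate F_n^{RR} of (2) at the ordered observations Z_(1) < ... < Z_(n).

    Assumes no ties. Returns (sorted z, estimates) with
    estimates[j-1] = F_n^{RR}(Z_(j)).
    """
    n = len(z)
    pairs = sorted(zip(z, delta))
    lam = 0.0    # Lambda_n(t): sum of 1/(n-j+1) over Z_(j) <= t
    lam1 = 0.0   # Lambda_1n(t): the same sum over uncensored points only
    estimates = []
    for j, (_, d) in enumerate(pairs, start=1):
        jump = 1.0 / (n - j + 1)   # dH_n(u) / (1 - H_n(u-)) at u = Z_(j)
        lam += jump
        lam1 += d * jump
        if j == n:
            estimates.append(1.0)  # F_n^{RR}(t) = 1 for t >= Z_(n)
        else:
            r = lam1 / lam         # relative-risk estimate R_n(Z_(j))
            estimates.append(1.0 - ((n - j) / n) ** r)
    return [t for t, _ in pairs], estimates


# Small usage example with four observations, one of them censored.
points, values = relative_risk_estimator([0.5, 1.2, 0.3, 2.0], [1, 0, 1, 1])
```

When no observation is censored, $R_n \equiv 1$ and the sketch reduces to the ordinary empirical d.f. $j/n$ at $Z_{(j)}$, which is a convenient sanity check.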

In [2] [3] [4] it was shown that estimators (1) and (2) have similar asymptotic properties and tend to the same limiting Gaussian process. However, the relative-risk power estimator (2) has some small-sample advantages over the PL-estimator (1). For example, it is not sensitive to censoring at the last observed point $Z_{(n)}$, since $F_n^{RR}(Z_{(n)}) = 1$, and it is identifiable with the model: $(1 - F_n^{RR}(t))(1 - G_n^{RR}(t)) = 1 - H_n(t)$, $n \ge 1$, $t \in \mathbb{R}^1$, where $G_n^{RR}(t) = 1 - (1 - H_n(t))^{1 - R_n(t)}$ is the corresponding estimator of the d.f. $G(t)$. Several extended versions of estimator (2), for generalized models of incomplete observations mixed with competing risks, were proposed in [4]. These estimators have also been studied extensively in a number of statistical problems. It is not difficult to see that estimator (2) is a natural extension of the well-known ACL (Abdushukurov-Cheng-Lin) estimator of $F$ in the simple Proportional Hazards Model (PHM):

$$F_n^{ACL}(t) = 1 - (1 - H_n(t))^{p_n}, \quad t \in \mathbb{R}^1,$$

where $p_n = \nu_n / n$ is an estimator of the probability $p = P(\delta_j = 1)$, which is the value of the constant relative-risk function $R(t) \equiv p$ (since in the PHM $\Lambda_1(t) = p\Lambda(t)$, $t \in \mathbb{R}^1$). Note that $F_n^{ACL}$ was independently proposed and studied by Abdushukurov [5] and by Cheng and Lin [6] (for more information, see also Csörgő [7]). This estimator has been studied, extended and used by many other authors up to the present. The main property of the PHM is its characterization by the independence of the subsamples $\{Z_1, \ldots, Z_n\}$ and $\{\delta_1, \ldots, \delta_n\}$. This property is equivalent to the relation $1 - G(t) = (1 - F(t))^{\beta}$, $t \in \mathbb{R}^1$, for some positive $\beta$. In the PHM, $p = \frac{1}{1+\beta}$, and therefore $\beta$ is a censoring parameter. In the PHM the estimator $F_n^{ACL}$ is asymptotically efficient with respect to $F_n^{PL}$. This advantage is well preserved for plug-in estimators of many functionals (see [2] [5] [7]). That is why, in this framework, the conditional probability that a datum is not censored given its observed value

$$p(t) = P(\delta_j = 1 \mid Z_j = t) = E[\delta_j \mid Z_j = t], \quad t \in \mathbb{R}^1, \tag{4}$$

is a very important function, which in the PHM is the constant $p(t) = \frac{1}{1+\beta}$, $t \in \mathbb{R}^1$. Moreover, probability (4) plays the key role in expressing the c.h.f. $\Lambda_1$ via $\Lambda$ as

$$\Lambda_1(t) = \int_{-\infty}^{t} p(u)\,d\Lambda(u), \quad t \in \mathbb{R}^1,$$

and, therefore, the relative-risk function is given by

$$R(t) = (\Lambda(t))^{-1}\int_{-\infty}^{t} p(u)\,d\Lambda(u), \quad t \in \mathbb{R}^1.$$

Probability (4) is a regression of δ j on Z j . Hence, it can be estimated by some regression statistics. We have used the following nonparametric regression estimator of Nadaraya [8] and Watson [9]:

$$p_n(t) = \left[\frac{1}{nh(n)}\sum_{j=1}^{n} k\left(\frac{t - Z_j}{h(n)}\right)\right]^{-1}\left[\frac{1}{nh(n)}\sum_{j=1}^{n} \delta_j\,k\left(\frac{t - Z_j}{h(n)}\right)\right], \tag{5}$$

where the kernel $k(\cdot)$ is a given probability density function and $\{h = h(n), n \ge 1\}$ is a bandwidth sequence such that $h(n) \to 0$ as $n \to \infty$. If probability (4) depends on unknown parameters, it may be estimated parametrically (see Dikta [10] in this context). Cao et al. [11] proposed the following presmoothed PL-estimator of the d.f. $F$, obtained by replacing the censoring indicators $\delta_{(j)}$ in the expression of the PL-estimator (1) by the estimator (5) evaluated at the observed data points:

$$F_n^{p}(t) = 1 - \prod_{\{j:\, Z_{(j)} \le t\}} \left(1 - \frac{p_n(Z_{(j)})}{n-j+1}\right), \quad t \in \mathbb{R}^1. \tag{6}$$

Some asymptotic properties of estimator (6) were investigated in [11] [12]. Taking into account some advantages of estimator (2) with respect to (1), we propose a new presmoothed relative-risk power (PRRP) estimator:

$$F_n^{PR}(t) = 1 - (1 - H_n(t))^{R_n^{p}(t)} = \begin{cases} 0, & t < Z_{(1)}, \\ 1 - \left(\dfrac{n-j}{n}\right)^{R_n^{p}(t)}, & Z_{(j)} \le t < Z_{(j+1)},\ 1 \le j \le n-1, \\ 1, & t \ge Z_{(n)}, \end{cases} \tag{7}$$

where

$$R_n^{p}(t) = (\Lambda_n(t))^{-1}\Lambda_{1n}^{p}(t) = (\Lambda_n(t))^{-1}\int_{-\infty}^{t} p_n(u)\,d\Lambda_n(u), \quad t \in \mathbb{R}^1,$$

is a partially presmoothed analogue of the estimator $R_n(t)$: the smooth estimator (5) of the conditional probability (4) is used in the formula for the c.h.f. $\Lambda_1(t)$, while $\Lambda_n$ itself is not smoothed. Consequently, the estimator (7) is not smooth. Note that estimator (7) is also well defined on the whole line without any conditions on the censoring.
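To make the construction of (5), (6) and (7) concrete, here is a minimal Python sketch that evaluates the presmoothed estimators at the ordered observations. Several choices are assumptions made for illustration only: the triweight kernel (a symmetric, compactly supported, twice continuously differentiable density), a fixed bandwidth `h` supplied by the caller, the absence of ties, and the left-limit convention that gives $\Lambda_n$ a jump of $1/(n-j+1)$ at $Z_{(j)}$. The function names are hypothetical.

```python
def triweight(u):
    """Triweight kernel: an illustrative choice of the density k in (5)."""
    return (35.0 / 32.0) * (1.0 - u * u) ** 3 if abs(u) <= 1.0 else 0.0


def nw_probability(t, z, delta, h, kernel=triweight):
    """Nadaraya-Watson estimate p_n(t) of P(delta = 1 | Z = t), formula (5)."""
    weights = [kernel((t - zj) / h) for zj in z]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observations within one bandwidth of t")
    return sum(w * d for w, d in zip(weights, delta)) / total


def presmoothed_estimators(z, delta, h, kernel=triweight):
    """Return (sorted z, F_n^p at Z_(j), F_n^{PR} at Z_(j)) for (6) and (7)."""
    n = len(z)
    zs = sorted(z)
    surv = 1.0           # running product in (6)
    lam = lam1p = 0.0    # Lambda_n and its presmoothed analogue Lambda_1n^p
    f_p, f_pr = [], []
    for j, t in enumerate(zs, start=1):
        p = nw_probability(t, z, delta, h, kernel)
        jump = 1.0 / (n - j + 1)
        surv *= 1.0 - p * jump
        f_p.append(1.0 - surv)
        lam += jump
        lam1p += p * jump          # integral of p_n with respect to Lambda_n
        if j == n:
            f_pr.append(1.0)       # (7) equals 1 for t >= Z_(n)
        else:
            f_pr.append(1.0 - ((n - j) / n) ** (lam1p / lam))
    return zs, f_p, f_pr


# Usage on a toy sample with one censored observation.
points, f_p, f_pr = presmoothed_estimators([0.5, 1.2, 0.3, 2.0], [1, 0, 1, 1], h=1.0)
```

The kernel weight at $t = Z_{(j)}$ always includes $k(0) > 0$, so the ratio in (5) is well defined at every observed point. The only difference between the two estimators in the sketch is how the same smoothed weights $p_n(Z_{(j)})$ enter: through a product in (6) and through the exponent $R_n^p$ in (7).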

2. Asymptotic Properties of PRRP Estimator

Let us denote $r(n) = h^2(n) + (nh(n))^{-1/2}(\log n)^{1/2}$. In order to investigate the properties of estimator (7) we need the following conditions:

(C1) $(F, G) \in K = \{(F, G) : N_F \cap N_G \ne \emptyset,\ P(X_j \le Y_j) \in (0, 1)\}$, where $N_F = \{t : 0 < F(t) < 1\}$ and $N_G = \{t : 0 < G(t) < 1\}$;

(C2) The numbers $\alpha$, $\beta$ and $\gamma$ are such that $\min\{H(\alpha), 1 - H(\beta)\} \ge \gamma \in (0, 1)$, $\alpha > \tau_H = \sup\{t : H(t) = 0\}$ and $\beta < T_H = \inf\{t : H(t) = 1\}$, so that $[\alpha, \beta] \subset (\tau_H, T_H)$;

(C3) For all $n \ge 1$, $P(0 < \nu_n < n) = 1$;

(C4) $k$ is a symmetric, twice continuously differentiable density function of bounded variation with compact support;

(C5) The density $q(t) = H'(t)$ exists, is four times continuously differentiable at $t \in [\alpha, \beta]$ and $\sup_{\alpha \le t \le \beta} q(t) > 0$;

(C6) $p(t)$ is four times continuously differentiable at $t \in [\alpha, \beta]$;

(C7) $n^{1-\varepsilon}h(n) \to \infty$ for some $\varepsilon > 0$, $\sum_{n=1}^{\infty} h^{\lambda}(n) < \infty$ for some $\lambda > 0$ and $h^2(n) = o\left((nh(n))^{-1/2}\left(\log\frac{1}{h(n)}\right)^{1/2}\right)$.

Consider the random functions

$$\varphi_1(t; Z) = \frac{p(t)}{1 - H(t)}\left(I(Z \le t) - H(t)\right), \quad \varphi_2(t; Z) = \int_{-\infty}^{t} \frac{I(Z \le u) - H(u)}{1 - H(u)}\,p'(u)\,du, \quad \varphi_3(t; Z, \delta) = \int_{-\infty}^{t} \frac{1}{h}\,k\left(\frac{u - Z}{h}\right)\frac{\delta - p(u)}{1 - H(u)}\,du.$$

In the next theorem, we show that the PRRP estimator can be approximated by a sum of i.i.d. random functions of $t$, with a remainder term that tends to zero almost surely as $n \to \infty$.

Theorem 1. If conditions (C1)-(C7) are fulfilled, then

$$F_n^{PR}(t) - F(t) = (1 - F(t))\,\Omega_n(t) + Q_n(t) \tag{8}$$

with $\sup_{\alpha \le t \le \beta} |Q_n(t)| \stackrel{a.s.}{=} O\left(\max\left\{(r(n)\log n)^2, \frac{\log n}{n}\right\}\right)$, where $\Omega_n(t) = \frac{1}{n}\sum_{i=1}^{n}\left[\varphi_1(t; Z_i) - \varphi_2(t; Z_i) + \varphi_3(t; Z_i, \delta_i)\right]$.

The following Lemmas allow us to prove Theorem 1.

Lemma 1. (Corollary of Lemma 3.2 in [12]). Assume that conditions (C1)-(C7) are fulfilled. Then the following estimate holds:

$$\sup_{\alpha \le t \le \beta} \left|\Lambda_{1n}^{p}(t) - \Lambda_1(t)\right| \stackrel{a.s.}{=} O(r(n)\log n). \tag{9}$$

Lemma 2. (Theorem 3.4 in [12]). Assume that conditions (C1)-(C7) are fulfilled. Then for $t \in [\alpha, \beta]$

$$\Lambda_{1n}^{p}(t) - \Lambda_1(t) \stackrel{a.s.}{=} \Omega_n(t) + O(r^2(n)). \tag{10}$$

Lemma 3. (Dvoretzky-Kiefer-Wolfowitz inequality with tight constant $d = 2$ from [13]). For all $n \ge 1$, some $\gamma > 0$ and $\varepsilon = \left((1 + \gamma)(2n)^{-1}\log n\right)^{1/2}$ the following estimate holds:

$$P\left(\sup_{-\infty < t < \infty} |H_n(t) - H(t)| > \varepsilon\right) \le 2n^{-(1+\gamma)}. \tag{11}$$
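The bound (11) is the Dvoretzky-Kiefer-Wolfowitz inequality in its standard form, $P(\sup_t |H_n(t) - H(t)| > \varepsilon) \le 2e^{-2n\varepsilon^2}$, evaluated at the stated $\varepsilon$. The short computation below is added as a worked step and is not part of the original text.

```latex
% Substituting \varepsilon^2 = (1+\gamma)\,\frac{\log n}{2n} into 2\exp(-2n\varepsilon^2):
\[
  2\exp\!\left(-2n\varepsilon^{2}\right)
  = 2\exp\!\left(-2n\cdot\frac{(1+\gamma)\log n}{2n}\right)
  = 2\exp\!\left(-(1+\gamma)\log n\right)
  = 2\,n^{-(1+\gamma)} .
\]
% Since \sum_{n\ge 1} n^{-(1+\gamma)} < \infty, the Borel-Cantelli lemma gives
\[
  \sup_{-\infty<t<\infty}\lvert H_n(t)-H(t)\rvert
  \stackrel{a.s.}{=} O\!\left(\Bigl(\frac{\log n}{n}\Bigr)^{1/2}\right).
\]
```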

Lemma 4. (Lemma on page 53 of [14]). For $\gamma > 0$ the following estimate holds:

$$P\left(\sup_{\alpha \le t \le \beta} \left|\int_{-\infty}^{t} \frac{(H_n(u) - H(u))\,d(H_n(u) - H(u))}{(1 - H(u))^2}\right| > A\,(1 - H(\beta))^{-2}\,n^{-1}\log n\right) \le B\,n^{-(1+\gamma)}, \tag{12}$$

where $A = A(\gamma)$ and $B$ are some positive constants.

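Lemma 5. 1) If conditions (C1)-(C3) are fulfilled, then the following estimates hold:

a) $\sup_{\alpha \le t \le \beta} \left|\dfrac{\Lambda(t)}{\Lambda_n(t)} - 1\right| \stackrel{a.s.}{=} O\left(\left(\dfrac{\log n}{n}\right)^{1/2}\right)$;

b) $\sup_{\alpha \le t \le \beta} \left|\dfrac{\Lambda_1(t)}{\Lambda_n(t)} - R(t)\right| \stackrel{a.s.}{=} O\left(\left(\dfrac{\log n}{n}\right)^{1/2}\right)$;

2) If conditions (C4)-(C7) are additionally fulfilled, then the following estimate is also valid:

c) $\sup_{\alpha \le t \le \beta} \left|R_n^{p}(t) - R(t)\right| \stackrel{a.s.}{=} O\left(\max\left\{r(n)\log n, \left(\dfrac{\log n}{n}\right)^{1/2}\right\}\right)$.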

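Proof of Lemma 5. Observe that

$$\sup_{\alpha \le t \le \beta} \left|\frac{\Lambda(t)}{\Lambda_n(t)} - 1\right| \le \sup_{\alpha \le t \le \beta} (H_n(\alpha))^{-1}\left|\Lambda_n(t) - \Lambda(t)\right| \le 3\left[H_n(\alpha)(1 - H_n(\beta))(1 - H(\beta))\right]^{-1}\sup_{-\infty < t < \infty} |H_n(t) - H(t)| \stackrel{a.s.}{=} O\left(\left(\frac{\log n}{n}\right)^{1/2}\right), \tag{13}$$

where the last equality follows from (11) and the Borel-Cantelli lemma. Further, b) is a consequence of (13) and of the fact that $R(t) \le 1$:

$$\sup_{\alpha \le t \le \beta} \left|\frac{\Lambda_1(t)}{\Lambda_n(t)} - R(t)\right| = \sup_{\alpha \le t \le \beta} R(t)\left|\frac{\Lambda(t)}{\Lambda_n(t)} - 1\right| \le \sup_{\alpha \le t \le \beta} \left|\frac{\Lambda(t)}{\Lambda_n(t)} - 1\right| \stackrel{a.s.}{=} O\left(\left(\frac{\log n}{n}\right)^{1/2}\right).$$

For c) we have, by the triangle inequality,

$$\begin{aligned} \sup_{\alpha \le t \le \beta} \left|R_n^{p}(t) - R(t)\right| &\le \sup_{\alpha \le t \le \beta} \left|R_n^{p}(t) - \frac{\Lambda_1(t)}{\Lambda_n(t)}\right| + \sup_{\alpha \le t \le \beta} \left|\frac{\Lambda_1(t)}{\Lambda_n(t)} - R(t)\right| \\ &\le \sup_{\alpha \le t \le \beta} (\Lambda_n(t))^{-1}\left[\sup_{\alpha \le t \le \beta} \left|\Lambda_{1n}^{p}(t) - \Lambda_1(t)\right| + \sup_{\alpha \le t \le \beta} \left|\Lambda_n(t) - \Lambda(t)\right|\right] \\ &\stackrel{a.s.}{\le} (H_n(\alpha))^{-1}\left[O(r(n)\log n) + O\left(\left(\frac{\log n}{n}\right)^{1/2}\right)\right] \stackrel{a.s.}{=} O\left(\max\left\{r(n)\log n, \left(\frac{\log n}{n}\right)^{1/2}\right\}\right), \end{aligned}$$

where we have used Lemma 1 and the estimate (13). Lemma 5 is proved.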

Proof of Theorem 1. By a two-term Taylor expansion of the difference $F_n^{PR}(t) - F(t)$ we obtain

$$F_n^{PR}(t) - F(t) = (1 - F(t))\,M_n(t) + L_n(t), \tag{14}$$

where

$M_n(t) = -\left[R_n^{p}(t)\log(1 - H_n(t)) - R(t)\log(1 - H(t))\right]$, $L_n(t) = -\frac{1}{2}M_n^2(t)\exp(\chi_n(t))$, and $\chi_n(t)$ lies between $\log(1 - F_n^{PR}(t))$ and $-\Lambda_1(t) = \log(1 - F(t))$. For $M_n(t)$ we have the representation

$$M_n(t) = \left(\Lambda_{1n}^{p}(t) - \Lambda_1(t)\right)\frac{\Lambda(t)}{\Lambda_n(t)} - \left(\Lambda_n(t) - \Lambda(t)\right)\frac{\Lambda_1(t)}{\Lambda_n(t)} + \left[-\log(1 - H_n(t)) + \log(1 - H(t))\right]R_n^{p}(t). \tag{15}$$
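Representation (15) can be checked directly. The verification below is added as a worked step and uses only three relations from Section 1: $\log(1 - H) = -\Lambda$ (for continuous $H$), $R\Lambda = \Lambda_1$ and $R_n^p = \Lambda_{1n}^p / \Lambda_n$; the argument $t$ is suppressed.

```latex
% Verification of (15), with M_n = -[R_n^p \log(1-H_n) - R \log(1-H)].
\begin{align*}
&(\Lambda_{1n}^{p}-\Lambda_1)\frac{\Lambda}{\Lambda_n}
 -(\Lambda_n-\Lambda)\frac{\Lambda_1}{\Lambda_n}
 +\bigl[-\log(1-H_n)+\log(1-H)\bigr]R_n^{p} \\
&\quad= R_n^{p}\Lambda-\frac{\Lambda_1\Lambda}{\Lambda_n}
 -\Lambda_1+\frac{\Lambda\Lambda_1}{\Lambda_n}
 -R_n^{p}\log(1-H_n)-R_n^{p}\Lambda \\
&\quad= -R_n^{p}\log(1-H_n)-\Lambda_1 \\
&\quad= -\bigl[R_n^{p}\log(1-H_n)-R\log(1-H)\bigr]=M_n ,
\end{align*}
% where the last line uses R\log(1-H) = -R\Lambda = -\Lambda_1.
```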

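Now we show that in (15) the first summand is the main term and that the sum of the other two terms tends to zero as $n \to \infty$. Consider the first term, which can be decomposed as

$$\left(\Lambda_{1n}^{p}(t) - \Lambda_1(t)\right)\frac{\Lambda(t)}{\Lambda_n(t)} = \left(\Lambda_{1n}^{p}(t) - \Lambda_1(t)\right) + S_n(t), \tag{16}$$

where

$$S_n(t) = \left(\Lambda_{1n}^{p}(t) - \Lambda_1(t)\right)\left(\frac{\Lambda(t)}{\Lambda_n(t)} - 1\right).$$

Hence by (9) and part a) of Lemma 5, we have

$$\sup_{\alpha \le t \le \beta} |S_n(t)| \stackrel{a.s.}{=} O\left(r(n)(\log n)^{3/2}n^{-1/2}\right). \tag{17}$$

Then from (10), (16) and (17) we obtain for all $t \in [\alpha, \beta]$

$$\left(\Lambda_{1n}^{p}(t) - \Lambda_1(t)\right)\frac{\Lambda(t)}{\Lambda_n(t)} \stackrel{a.s.}{=} \Omega_n(t) + O(r_1(n)),$$

where $r_1(n) = \max\left\{r^2(n), r(n)(\log n)^{3/2}n^{-1/2}\right\}$ collects the remainders of (10) and (17).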

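For the other two terms of (15) we have

$$\begin{aligned} &-\left(\Lambda_n(t) - \Lambda(t)\right)\frac{\Lambda_1(t)}{\Lambda_n(t)} + \left[-\log(1 - H_n(t)) + \log(1 - H(t))\right]R_n^{p}(t) \\ &\quad= R_n^{p}(t)\left\{-\left(\Lambda_n(t) - \Lambda(t)\right) + \left[-\log(1 - H_n(t)) + \log(1 - H(t))\right] - \left(\Lambda_n(t) - \Lambda(t)\right)\left(\frac{\Lambda_1(t)}{\Lambda_{1n}^{p}(t)} - 1\right)\right\} \\ &\quad= R_n^{p}(t)\left[A_n(t) + B_n(t) + C_n(t)\right]. \end{aligned} \tag{18}$$

Hence by (9) and (13), and since $R_n^{p}(t) \le 1$, we obtain

$$\sup_{\alpha \le t \le \beta} R_n^{p}(t)|C_n(t)| \le \sup_{\alpha \le t \le \beta} |C_n(t)| \le \sup_{\alpha \le t \le \beta} \left(\Lambda_{1n}^{p}(t)\right)^{-1}\sup_{\alpha \le t \le \beta} \left|\Lambda_n(t) - \Lambda(t)\right|\sup_{\alpha \le t \le \beta} \left|\Lambda_{1n}^{p}(t) - \Lambda_1(t)\right| \stackrel{a.s.}{=} O\left(r(n)(\log n)^{3/2}n^{-1/2}\right). \tag{19}$$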

Now, by simple algebra and integration by parts for $A_n(t)$, and by a Taylor expansion for $B_n(t)$, we get the following chain of equalities (all integrals are taken over $(-\infty, t]$):

$$\begin{aligned} A_n(t) + B_n(t) &= -\left(\Lambda_n(t) - \Lambda(t)\right) + \left[-\log(1 - H_n(t)) + \log(1 - H(t))\right] \\ &= -\left[\int \frac{dH_n(u)}{1 - H_n(u-)} - \int \frac{dH(u)}{1 - H(u)}\right] + \left[\frac{H_n(t) - H(t)}{1 - H(t)} + \frac{1}{2}\frac{(H_n(t) - H(t))^2}{(1 - \theta_n(t))^2}\right] \\ &= -\left[\int \frac{dH_n(u)}{1 - H_n(u-)} - \int \frac{dH_n(u)}{1 - H_n(u)}\right] - \left[\int \frac{dH_n(u)}{1 - H_n(u)} - \int \frac{dH(u)}{1 - H(u)}\right] + \left[\frac{H_n(t) - H(t)}{1 - H(t)} + \frac{1}{2}\frac{(H_n(t) - H(t))^2}{(1 - \theta_n(t))^2}\right] \\ &= -\left[\int \frac{(H_n(u) - H(u))\,dH(u)}{(1 - H_n(u))(1 - H(u))} + \frac{H_n(t) - H(t)}{1 - H(t)} - \int \frac{(H_n(u) - H(u))\,dH(u)}{(1 - H(u))^2} + \int \frac{(H_n(u) - H(u))\,d(H_n(u) - H(u))}{(1 - H_n(u))(1 - H(u))} - D_{1n}(t)\right] + \left[\frac{H_n(t) - H(t)}{1 - H(t)} + D_{2n}(t)\right] \\ &= -\int \frac{(H_n(u) - H(u))\,dH(u)}{(1 - H_n(u))(1 - H(u))} + \int \frac{(H_n(u) - H(u))\,dH(u)}{(1 - H(u))^2} - \int \frac{(H_n(u) - H(u))\,d(H_n(u) - H(u))}{(1 - H_n(u))(1 - H(u))} + D_{1n}(t) + D_{2n}(t), \end{aligned} \tag{20}$$

where

$$D_{1n}(t) = \int_{-\infty}^{t} \frac{(H_n(u) - H_n(u-))\,dH_n(u)}{(1 - H_n(u))(1 - H_n(u-))}, \quad D_{2n}(t) = \frac{1}{2}\frac{(H_n(t) - H(t))^2}{(1 - \theta_n(t))^2},$$

$$\theta_n(t) \in \left[\min\{H_n(t), H(t)\}, \max\{H_n(t), H(t)\}\right].$$

Hence, using (11) we have

$$\sup_{\alpha \le t \le \beta} |D_{1n}(t)| \stackrel{a.s.}{=} O\left(\frac{1}{n}\right), \quad \sup_{\alpha \le t \le \beta} |D_{2n}(t)| \stackrel{a.s.}{=} O\left(\frac{\log n}{n}\right). \tag{21}$$

Consider the equality

$$\frac{1}{1 - H_n(u)} = \frac{1}{1 - H(u)} + \frac{H_n(u) - H(u)}{(1 - H_n(u))(1 - H(u))} \tag{22}$$

and its integral form

$$\int_{-\infty}^{t} \frac{(H_n(u) - H(u))\,dH(u)}{(1 - H_n(u))(1 - H(u))} = \int_{-\infty}^{t} \frac{(H_n(u) - H(u))\,dH(u)}{(1 - H(u))^2} + \int_{-\infty}^{t} \frac{(H_n(u) - H(u))^2\,dH(u)}{(1 - H(u))^2(1 - H_n(u))}. \tag{23}$$

Using (22) and (23) in the third and first integrals in (20), and taking (21) into account, we obtain

$$A_n(t) + B_n(t) \stackrel{a.s.}{=} -\int_{-\infty}^{t} \frac{(H_n(u) - H(u))^2\,dH(u)}{(1 - H(u))^2(1 - H_n(u))} - \int_{-\infty}^{t} \frac{(H_n(u) - H(u))^2\,d(H_n(u) - H(u))}{(1 - H_n(u))(1 - H(u))^2} - \int_{-\infty}^{t} \frac{(H_n(u) - H(u))\,d(H_n(u) - H(u))}{(1 - H(u))^2} + O\left(\frac{\log n}{n}\right). \tag{24}$$

Applying the estimate (11) to the first and second integrals in (24), and (12) to the third, gives

$$\sup_{\alpha \le t \le \beta} \left|\int_{-\infty}^{t} \frac{(H_n(u) - H(u))^2\,dH(u)}{(1 - H(u))^2(1 - H_n(u))}\right| \le \left[(1 - H(\beta))^2(1 - H_n(\beta))\right]^{-1}\left[\sup_{-\infty < u < \infty} |H_n(u) - H(u)|\right]^2 \stackrel{a.s.}{=} O\left(\frac{\log n}{n}\right), \tag{25}$$

$$\sup_{\alpha \le t \le \beta} \left|\int_{-\infty}^{t} \frac{(H_n(u) - H(u))^2\,d(H_n(u) - H(u))}{(1 - H_n(u))(1 - H(u))^2}\right| \le 2\left[(1 - H(\beta))^2(1 - H_n(\beta))\right]^{-1}\left[\sup_{-\infty < u < \infty} |H_n(u) - H(u)|\right]^2 \stackrel{a.s.}{=} O\left(\frac{\log n}{n}\right), \tag{26}$$

$$\sup_{\alpha \le t \le \beta} \left|\int_{-\infty}^{t} \frac{(H_n(u) - H(u))\,d(H_n(u) - H(u))}{(1 - H(u))^2}\right| \stackrel{a.s.}{=} O\left(\frac{\log n}{n}\right). \tag{27}$$

Thus, combining (18), (19) and (24)-(27), we derive

$$\sup_{\alpha \le t \le \beta} R_n^{p}(t)\left|A_n(t) + B_n(t) + C_n(t)\right| \stackrel{a.s.}{=} O\left(\max\left\{r(n)(\log n)^{3/2}n^{-1/2}, \frac{\log n}{n}\right\}\right).$$

Then, by virtue of (9) and (16), from (15) we have

$$\sup_{\alpha \le t \le \beta} |M_n(t)| \stackrel{a.s.}{=} O\left(\max\left\{r(n)\log n, \frac{\log n}{n}\right\}\right)$$

and, consequently,

$$\sup_{\alpha \le t \le \beta} |L_n(t)| \le \left[\sup_{\alpha \le t \le \beta} |M_n(t)|\right]^2 \stackrel{a.s.}{=} O\left(\left[\max\left\{r(n)\log n, \frac{\log n}{n}\right\}\right]^2\right). \tag{28}$$

Finally, the desired result (8) follows from (15)-(17) and (28). The proof is complete.

As a consequence, the strong uniform consistency of the PRRP estimator can now be obtained.

Theorem 2. Let the assumptions of Theorem 1 be fulfilled. Then, as $n \to \infty$,

$$\sup_{\alpha \le t \le \beta} \left|F_n^{PR}(t) - F(t)\right| \stackrel{a.s.}{=} O\left(\max\left\{r(n)\log n, \left(\frac{\log n}{n}\right)^{1/2}\right\}\right). \tag{29}$$

Proof of Theorem 2. Using the inequality $|u - v| \le |\log u - \log v|$, $0 < u, v \le 1$, we have the following chain of relations:

$$\begin{aligned} \sup_{\alpha \le t \le \beta} \left|F_n^{PR}(t) - F(t)\right| &\le \sup_{\alpha \le t \le \beta} \left|-\log(1 - F_n^{PR}(t)) + \log(1 - F(t))\right| \\ &= \sup_{\alpha \le t \le \beta} \left|-R_n^{p}(t)\log(1 - H_n(t)) + R(t)\log(1 - H(t))\right| \\ &\le (1 - H(\beta))^{-1}\left[\sup_{\alpha \le t \le \beta} \left|R_n^{p}(t) - R(t)\right| + (1 - H(\beta))^{-1}\sup_{\alpha \le t \le \beta} |H_n(t) - H(t)|\right] \\ &\stackrel{a.s.}{=} O\left(\max\left\{r(n)\log n, \left(\frac{\log n}{n}\right)^{1/2}\right\}\right), \end{aligned}$$

where the last equality is obtained by using Lemma 3 and part c) of Lemma 5. This completes the proof of Theorem 2.

The approximating sequence of normalized sums of random functions $\Omega_n(t)$ in Theorem 1 is the same as for the presmoothed PL-estimator (6). Therefore, taking the representation (8) into account, the asymptotic normality of the PRRP estimator follows from Theorem 3.7 in [12].

Theorem 3. Let the assumptions of Theorem 1 be fulfilled and

(C8) $nh^2(n)(\log n)^{-6} \to \infty$, $nh^8(n)(\log n)^4 \to 0$ and $h^3(n)(\log n)^5 \to 0$ as $n \to \infty$.

Then for any $t \in [\alpha, \beta]$:

1) If $nh^4(n) \to 0$, then $n^{1/2}\left(F_n^{PR}(t) - F(t)\right) \stackrel{d}{\to} N(0, \sigma^2(t))$,

2) If $nh^4(n) \to C^4$, then $n^{1/2}\left(F_n^{PR}(t) - F(t)\right) \stackrel{d}{\to} N(b(t), \sigma^2(t))$,

where

$$b(t) = C^2(1 - F(t))\,\alpha(t)\,d(k),$$

$$d(k) = \int u^2 k(u)\,du,$$

$$\alpha(t) = \int_{-\infty}^{t} \left(\frac{1}{2}p''(u)q(u) + p'(u)q'(u)\right)\frac{du}{1 - H(u)},$$

$$\sigma^2(t) = (1 - F(t))^2\gamma(t),$$

$$\gamma(t) = \int_{-\infty}^{t} \mu(u)\,du, \quad \mu(t) = \frac{p(t)q(t)}{(1 - H(t))^2}.$$

3. Numerical Study of Estimators

In this section, we investigate the above estimators numerically. Using the Python programming language, we generate a sample of size $n = 500$ from the d.f. $F(t, c) = 1 - e^{-t^{c}}$, $c = 1.79$ ($t \ge 0$). This sample is censored from the right by r.v.-s with d.f. $G(t) = 1 - e^{-t}$ ($t \ge 0$). The resulting sample has a censoring degree of 47%. We study the above estimators on this sample.
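The sampling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: it reads the lifetime d.f. as a Weibull law $F(t, c) = 1 - e^{-t^c}$, draws both lifetimes and censoring times by inverse-transform sampling, and uses a hypothetical function name and seed. The censoring proportion of any particular simulated sample depends on the seed and on this reading of $F$, so it need not equal the 47% reported in the text.

```python
import math
import random


def simulate_censored_sample(n=500, c=1.79, seed=0):
    """Draw (Z_j, delta_j), j = 1..n, under random right censoring.

    Assumed model: X ~ F(t, c) = 1 - exp(-t**c), Y ~ G(t) = 1 - exp(-t),
    with X and Y independent.
    """
    rng = random.Random(seed)
    z, delta = [], []
    for _ in range(n):
        # Inverse transform: if E is standard exponential, E**(1/c) has d.f. F(t, c).
        x = (-math.log(1.0 - rng.random())) ** (1.0 / c)
        y = -math.log(1.0 - rng.random())      # standard exponential censoring time
        z.append(min(x, y))                    # observed value Z_j
        delta.append(1 if x <= y else 0)       # 1 = uncensored, 0 = censored
    return z, delta


z, delta = simulate_censored_sample()
censoring_degree = 1.0 - sum(delta) / len(delta)
```

Fixing the seed makes the comparison of different estimators reproducible, since all of them are then evaluated on the same censored sample.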

The red line in the figure shows the theoretical d.f. $F(t, c)$ and the green line shows the Kaplan-Meier estimate (Figure 1). One disadvantage of this estimate is that it may be undefined beyond the last observation $Z_{(n)}$, namely when that observation is censored (see (1)).

Now we plot (Figure 2) the estimator (2) proposed by Abdushukurov. In the figure, the red line shows the theoretical d.f. and the blue line shows Abdushukurov's estimate. The graphs show that both estimates are very good. In practice, however, it is difficult to see from a graph which estimate is better. Therefore, we study the sum $\sum_{i=1}^{n-1} \left(F_n(Z_i) - F(Z_i)\right)^2$ and compile the corresponding tables.
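The comparison criterion can be sketched in Python as below. The sketch assumes that the sum runs over the ordered observations with the largest one excluded (where the Kaplan-Meier estimate may be undefined), that there are no ties, and that the true d.f. is the Weibull reading $F(t, c) = 1 - e^{-t^{c}}$ with $c = 1.79$. The helper names are hypothetical, and the four-point sample is a toy input rather than the data behind the paper's tables.

```python
import math


def kaplan_meier(z, delta):
    """F_n^{PL} of (1) evaluated at the ordered observations (no ties assumed)."""
    n = len(z)
    pairs = sorted(zip(z, delta))
    surv, estimates = 1.0, []
    for j, (_, d) in enumerate(pairs, start=1):
        surv *= 1.0 - d / (n - j + 1)
        estimates.append(1.0 - surv)
    return [t for t, _ in pairs], estimates


def squared_error_sum(points, estimates, true_cdf):
    """Sum of (F_n(Z_(i)) - F(Z_(i)))**2 over the first n-1 ordered points."""
    return sum((e - true_cdf(t)) ** 2
               for t, e in zip(points[:-1], estimates[:-1]))


def true_cdf(t, c=1.79):
    """Assumed theoretical d.f. F(t, c) = 1 - exp(-t**c)."""
    return 1.0 - math.exp(-t ** c)


# Toy usage: a smaller value of the criterion means a closer fit to F.
points, km = kaplan_meier([0.5, 1.2, 0.3, 2.0], [1, 0, 1, 1])
km_score = squared_error_sum(points, km, true_cdf)
```

The same `squared_error_sum` helper can be applied to the values of any other estimator evaluated at the same ordered points, which is what makes the scores comparable.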

From Table 1, it can be concluded that the estimate (2) proposed by Abdushukurov is closer to the d.f. $F(t, c)$.

Now we plot the estimates (6) (Figure 3) and (7) (Figure 4).

As can be seen from the graphs, despite the high level of censoring, both estimates are very close to the theoretical d.f. Table 2 shows that the value of the criterion depends on the selected bandwidth sequence.

From Table 2, we can conclude that the $F_n^{PR}$-estimator is better than the $F_n^{p}$-estimator.

Figure 1. $F_n^{PL}$-estimator (Kaplan-Meier).

Table 1. Comparison of the $F_n^{PL}(t)$-estimate with the $F_n^{RR}(t)$-estimate.

Figure 2. $F_n^{RR}$-estimator (Abdushukurov).

Figure 3. $F_n^{p}$-estimator.

Figure 4. $F_n^{PR}$-estimator.

Table 2. Comparison of the $F_n^{p}(t)$-estimate with the $F_n^{PR}(t)$-estimate.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Kaplan, E.L. and Meier, P. (1958) Nonparametric Estimation from Incomplete Observations. Journal of the American Statistical Association, 53, 457-481.
https://doi.org/10.1080/01621459.1958.10501452
[2] Abdushukurov, A.A. (1998) Nonparametric Estimation of the Distribution Function Based on Relative-Risk Function. Communications in Statistics: Theory and Methods, 27, 1991-2012.
https://doi.org/10.1080/03610929808832205
[3] Abdushukurov, A.A. (1999) On Nonparametric Estimation of Reliability Indices by Censored Samples. Theory of Probability & Its Applications, 43, 3-11.
https://doi.org/10.1137/S0040585X97976702
[4] Abdushukurov, A.A. (2011) Estimation of Unknown Distributions from Incomplete Observations and their Properties. LAP Lambert Academic, Saarbrücken. (in Russian)
[5] Abdushukurov, A.A. (1987) Nonparametric Estimation in Proportional Hazards Model of Random Censorship. VINITI 3448 (B87).
[6] Cheng, P.E. and Lin, G.D. (1987) Maximum Likelihood Estimation of Survival Function under the Koziol-Green Proportional Hazards Model. Statistics & Probability Letters, 5, 75-80.
https://doi.org/10.1016/0167-7152(87)90030-7
[7] Csörgő, S. (1988) Estimation in the Proportional Hazards Model of Random Censorship. Statistics, 19, 437-463.
https://doi.org/10.1080/02331888808802115
[8] Nadaraya, E.A. (1964) On Estimating Regression. Probability Theory and Related Fields, 61, 405-415.
[9] Watson, G.S. (1964) Smooth Regression Analysis. Sankhya: The Indian Journal of Statistics, Series A, 26, 359-372.
[10] Dikta, J. (1998) On Semiparametric Random Censorship Models. Journal of Statistical Planning and Inference, 66, 253-279.
https://doi.org/10.1016/S0378-3758(97)00091-8
[11] Cao, R., Lopez-de-Ullibarri, I., Janssen, P. and Veraverbeke, N. (2005) Presmoothed Kaplan-Meier and Nelson-Aalen Estimators. Journal of Nonparametric Statistics, 17, 31-56.
https://doi.org/10.1080/10485250410001713981
[12] Jacome, M.A. and Cao, R. (2007) Almost Sure Asymptotic Representation for the Presmoothed Distribution and Density Estimators for Censored Data. Statistics, 41, 517-534.
https://doi.org/10.1080/02331880701529522
[13] Massart, P. (1990) The Tight Constant in the Dvoretzky-Kiefer-Wolfowitz Inequality. Annals of Probability, 18, 1269-1283.
https://doi.org/10.1214/aop/1176990746
[14] Burke, M.D., Csörgő, S. and Horvath, L. (1988) A Correction to and Improvement of “Strong Approximations of Some Biometric Estimates under Random Censorship”. Probability Theory and Related Fields, 79, 51-57.
https://doi.org/10.1007/BF00319103

Copyright © 2024 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.