Functional Kernel Estimation of the Conditional Extreme Quantile under Random Right Censoring

Justin Ushize Rutikanga; Aliou Diop

doi:10.4236/ojs.2021.111009

Open Journal of Statistics > Vol.11 No.1, February 2021


Justin Ushize Rutikanga¹, Aliou Diop²*
¹Institut de Mathématiques et de Sciences Physiques (IMSP-UAC), Porto-Novo, Bénin.
²LERSTAD, Université Gaston Berger, Saint Louis, Sénégal.

Abstract

The estimation of conditional extreme quantiles in incomplete-data frameworks is of growing interest. In particular, the estimation of the extreme value index under censoring has been the subject of many investigations when finite-dimensional covariate information is considered. In this paper, the estimation of the conditional extreme quantile of a heavy-tailed distribution is discussed when some functional random covariate (i.e. valued in some infinite-dimensional space) information is available and the scalar response variable is right-censored. A Weissman-type estimator of conditional extreme quantiles is proposed and its asymptotic normality is established under mild assumptions. A simulation study is conducted to assess the finite-sample behavior of the proposed estimator and a comparison with two simple estimation strategies is provided.

Keywords

Kernel Estimator, Functional Data, Censored Data, Conditional Extreme Quantile, Heavy-Tailed Distributions

Share and Cite:

Rutikanga, J. and Diop, A. (2021) Functional Kernel Estimation of the Conditional Extreme Quantile under Random Right Censoring. Open Journal of Statistics, 11, 162-177. doi: 10.4236/ojs.2021.111009.

1. Introduction

The estimation of extreme quantiles is central to the study of rare events, which occur infrequently but can have a large impact. Extreme Value Theory (EVT) provides the tools for modeling such events, in particular the estimation of the tail index and of the associated extreme quantiles. The study of extreme events is attracting attention in many fields of applied statistics. In hydrology, for example, one may wish to estimate the maximum level reached by seawater along a coast over a given period, or a conditional quantile of rainfall for a given region, see [1]. In medicine, conditional quantiles have been used to determine the survival probability of patients with AIDS within different age groups; for more details see [2].

When studying rare events, the main interest is not in the estimation of “central” parameters of the random variable such as the mean, mode and median; rather, researchers are interested in understanding the behavior of its right tail. One of the best-known results in extreme value theory is the Fisher-Tippett-Gnedenko Theorem [3] [4]. Let $Y_{1}, \dots, Y_{n}$ be independent and identically distributed random variables with distribution function $F$. Suppose there exist sequences of constants ${(a_{n})}_{n \geq 1}$, with $a_{n} > 0$, and ${(b_{n})}_{n \geq 1}$, with $b_{n} \in ℝ$, and a non-degenerate distribution $H$ such that for all $x \in ℝ$ ,

$\lim_{n \to \infty} ℙ (\frac{\max (Y_{1}, \dots, Y_{n}) - b_{n}}{a_{n}} \leq x) = \lim_{n \to \infty} F^{n} (a_{n} x + b_{n}) = H (x),$

then $H$ belongs to the type¹ of one of the following three distribution functions:

$Φ_{α} (x) = \begin{cases} 0 & \text{if } x \leq 0 \\ \exp (- x^{- α}) & \text{if } x > 0, \; α > 0 \end{cases}$ (Fréchet)

$Ψ_{α} (x) = \begin{cases} 1 & \text{if } x \geq 0 \\ \exp (- {(- x)}^{α}) & \text{if } x < 0, \; α > 0 \end{cases}$ (Weibull)

$Λ (x) = \exp (- \exp (- x))$ for all $x \in ℝ$ (Gumbel)

The three above distribution functions $Λ, Ψ_{α}$ and $Φ_{α}$ are the only possible limit laws of the normalized maximum of a sample of independent and identically distributed random variables. They are referred to as the Extreme Value Distribution (EVD). A parametrization of these three distributions into a single formula called Generalized Extreme Value Distribution (GEV) is given by:

$H_{γ} (x) = \begin{cases} \exp (- {(1 + γ x)}^{- \frac{1}{γ}}) & \text{for all } x \text{ such that } 1 + γ x > 0, \text{ if } γ \neq 0 \\ \exp (- \exp (- x)) & \text{for all } x \in ℝ, \text{ if } γ = 0. \end{cases}$
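As an illustration, the GEV distribution function can be evaluated numerically. The following Python sketch is not part of the original study: the function name `gev_cdf` is hypothetical, and the values 0 and 1 returned outside the set where $1 + γ x > 0$ follow the usual convention for extending $H_{γ}$ to the whole real line.

```python
import numpy as np

def gev_cdf(x, gamma):
    """Evaluate H_gamma(x) in the parametrization given above."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    if gamma == 0.0:
        return np.exp(-np.exp(-x))                 # Gumbel case
    z = 1.0 + gamma * x
    # Outside the support: 0 below it (gamma > 0), 1 above it (gamma < 0).
    out = np.full(x.shape, 0.0 if gamma > 0 else 1.0)
    inside = z > 0
    out[inside] = np.exp(-z[inside] ** (-1.0 / gamma))
    return out

# The gamma != 0 branch approaches the Gumbel branch as gamma -> 0.
print(gev_cdf([1.0], 1e-8), gev_cdf([1.0], 0.0))
```

A positive `gamma` corresponds to the Fréchet case, which is the heavy-tailed setting considered in the rest of the paper.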

The parameter $γ$, called the extreme-value index or tail index, characterizes the behaviour of the tail of the distribution F. Its sign determines the domain of attraction: Fréchet for $γ > 0$, Gumbel for $γ = 0$ and Weibull for $γ < 0$.

The estimation of $γ$ is a cornerstone of many problems in extreme value analysis, such as the estimation of conditional extreme quantiles of a random variable in the presence of covariates. When some covariate information X is available and the distribution of Y depends on X, the problem is to estimate the conditional extreme-value index and conditional extreme quantiles.

In this paper, we consider this situation and focus on the problem of estimating a conditional extreme quantile of a heavy-tailed distribution when some functional covariate information $X \in E$ is available, where $E$ is an infinite-dimensional space equipped with a semi-metric $d (\cdot, \cdot)$ .

In the literature, many studies have addressed the estimation of conditional extreme quantiles of a random variable Y. Daouia et al. [5] introduced a kernel-type estimator of the extreme conditional quantile $q (α_{n} | x)$ for heavy-tailed distributions, which belong to the Fréchet maximum domain of attraction, while [6] proposed a new procedure for estimating the conditional survival function $\bar{F} (y | x) : = 1 - F (y | x)$ based on a double-kernel estimator. They also proposed a Weissman-type estimator of the conditional extreme quantile.

In practice, the variable of interest may be only partially observed. In classical applications such as the analysis of lifetime data (survival analysis, reliability theory, insurance), a typical feature is censoring. For example, in medical follow-up, the response variable Y represents the time elapsed from the entry of a patient into, say, a follow-up study until death. If, at the time the data are collected, the patient is still alive or has withdrawn from the study for some reason, the variable of interest Y is not observed. Many authors have addressed this issue; see, among others, [7] [8].

Recently, many authors have been interested in the estimation of the extreme value index and extreme quantiles. Among them, [9] [10] [11] [12] considered the estimation of the extreme value index and extreme quantiles from censored data when covariate information is not available. In [9] the authors proposed to estimate the extreme value index using a modified version of Hill’s estimator. In [13] [14] [15] the authors proposed Bayesian estimators of the extreme value index and extreme quantiles for uncensored data. [16] [17] [18] investigated the estimation of the extreme value index and extreme quantiles when there is no covariate information and censoring is taken into account. [8] investigated the estimation of the conditional extreme value index and conditional extreme quantiles under random right censoring in the presence of a finite-dimensional covariate.

However, to the best of our knowledge, the estimation of the conditional extreme quantile of a heavy-tailed distribution under random right censoring with a functional covariate has not yet been addressed, which motivates the present work. In our methodology, we consider the kernel conditional Kaplan-Meier estimator of the conditional survival function in the presence of a functional (infinite-dimensional) covariate. We then construct Weissman-type estimators of the conditional extreme quantile $F^{\leftarrow} (1 - α | x)$ under censoring and establish their asymptotic normality. Finally, the finite-sample performance of these estimators is assessed via simulations and compared with alternative estimators.

The remainder of this paper is organized as follows. Section 2 introduces the notation and describes the framework of the study. In Section 3, the construction of our estimator of the functional conditional extreme quantile is presented and its asymptotic normality is established; the proofs are given in Section 4. In Section 5, we assess via simulations the finite-sample behavior of our estimator. The conclusion and some perspectives are presented in Section 6.

2. Framework

In this section, we describe the nonparametric estimator of the conditional quantile based on the Kaplan-Meier estimator when the covariate is a functional (infinite-dimensional) random variable and the data are censored; for more details, see [19].

Let $(X_{i}, Y_{i}), i = 1, \dots, n$ be independent copies of the random pair $(X, Y)$ , where Y is a positive real random variable and X is a functional random variable taking values in $E$ , an infinite-dimensional space equipped with a semi-metric $d (\cdot, \cdot)$ . We assume that Y may be randomly right-censored by a positive random variable C. Therefore, we observe the independent triples $(X_{i}, δ_{i}, Z_{i})$ , where $Z_{i} = \min (Y_{i}, C_{i})$ and $δ_{i} = 1_{\{Y_{i} \leq C_{i}\}}$ for $i = 1, \dots, n$ , and $1_{\{A\}}$ is the indicator function of the event A. The random variable C is defined on the same probability space $(Ω, ℂ, ℙ)$ as Y. We assume that Y and C are independent given $X = x$ , and that $C_{1}, \dots, C_{n}$ are mutually independent.

Let $F (\cdot | x)$ and $G (\cdot | x)$ be the conditional cumulative distribution functions of Y and C given $X = x$ , respectively.

Let $\bar{F} (\cdot | x) = 1 - F (\cdot | x)$ and $\bar{G} (\cdot | x) = 1 - G (\cdot | x)$ be the conditional survival functions of Y and C given $X = x$ , respectively. Since we are dealing with heavy tails, we assume that the following condition is satisfied:

(A1)

$\bar{F} (t | x) = r_{1} (x) \exp \left\{ - \int_{1}^{t} \left( \frac{1}{γ_{1} (x)} - ε_{1} (μ | x) \right) \frac{d μ}{μ} \right\}$ (1)

and

$\bar{G} (t | x) = r_{2} (x) \exp \left\{ - \int_{1}^{t} \left( \frac{1}{γ_{2} (x)} - ε_{2} (μ | x) \right) \frac{d μ}{μ} \right\}$ (2)

where $γ_{1} (x), γ_{2} (x)$ are positive unknown functions of the covariate x, $r_{1}, r_{2}$ are positive functions and $| ε_{1} (μ | x) |, | ε_{2} (μ | x) |$ are continuous and ultimately decreasing to zero. From Equations (1) and (2), the conditional distribution functions of Y and C given $X = x$ are in the Fréchet maximal domain of attraction. Thus, $γ_{1} (x)$ and $γ_{2} (x)$ are the conditional extreme-value (tail) index functions. Therefore, $\bar{F} (\cdot | x)$ and $\bar{G} (\cdot | x)$ are regularly varying functions at infinity with index $- \frac{1}{γ_{1} (x)}$ and $- \frac{1}{γ_{2} (x)}$ respectively. Thus, for all $u > 0$ ,

$\bar{F} (u | x) = u^{- \frac{1}{γ_{1} (x)}} L_{1} (u | x) \quad \text{and} \quad \bar{G} (u | x) = u^{- \frac{1}{γ_{2} (x)}} L_{2} (u | x)$

where, for fixed x, $L_{1} (\cdot | x)$ and $L_{2} (\cdot | x)$ are slowly varying functions at infinity, that is, for all $λ > 0$ ,

$\lim_{u \to \infty} \frac{L_{i} (λ u | x)}{L_{i} (u | x)} = 1, \quad i = 1, 2.$

By the conditional independence of Y and C, the conditional survival function $\bar{H} (\cdot | x)$ of Z given $X = x$ is also regularly varying at infinity, with index $- \frac{1}{γ (x)}$ :

$\begin{matrix} \bar{H} (z | x) = 1 - H (z | x) = \bar{F} (z | x) \bar{G} (z | x) \\ = r (x) \exp \left\{ - \int_{1}^{z} \left( \frac{1}{γ (x)} - ε (μ | x) \right) \frac{d μ}{μ} \right\} \end{matrix}$

with $γ (x) = γ_{1} (x) p (x)$ , where $p (x) = \frac{γ_{2} (x)}{γ_{1} (x) + γ_{2} (x)}$ is the ultimate proportion of uncensored observations among $Z_{i}, i = 1, \dots, n$ ; the proof of this statement is beyond the scope of this paper (see [2] [10] for more details). Here $r (x) = r_{1} (x) r_{2} (x)$ and $ε (μ | x) = ε_{1} (μ | x) + ε_{2} (μ | x)$ . In the sequel, we further assume that $L_{i} (u | x), i = 1, 2$ belong to the Hall class of slowly varying functions.
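As a brief sketch of where the index of $\bar{H} (\cdot | x)$ comes from, using only the regular-variation representations of $\bar{F}$ and $\bar{G}$ and the standard fact that a product of slowly varying functions is slowly varying:

$\bar{H} (u | x) = \bar{F} (u | x) \bar{G} (u | x) = u^{- \left( \frac{1}{γ_{1} (x)} + \frac{1}{γ_{2} (x)} \right)} L_{1} (u | x) L_{2} (u | x),$

so that

$\frac{1}{γ (x)} = \frac{1}{γ_{1} (x)} + \frac{1}{γ_{2} (x)} \quad \Longrightarrow \quad γ (x) = \frac{γ_{1} (x) γ_{2} (x)}{γ_{1} (x) + γ_{2} (x)} = γ_{1} (x) p (x) .$

For instance, with the illustrative values $γ_{1} (x) = 0.5$ and $γ_{2} (x) = 2$ (chosen here only as an example), one gets $p (x) = 0.8$ and $γ (x) = 0.4$ .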

Formally, let $(X, Y)$ be an $E \times ℝ$ -valued random element, where $E$ is a semi-metric space with semi-metric $d (\cdot, \cdot)$ , and suppose that we observe a sequence $(X_{i}, Y_{i}), i \geq 1$ of copies of $(X, Y)$ .

In this paper, we are interested in the estimation of the conditional extreme quantile $q (α_{n} | x)$ of order $(1 - α_{n})$ associated with the conditional survival function $\bar{F} (\cdot | x)$ of Y given $X = x$ , which satisfies

$\bar{F} (q (α_{n} | x) | x) = α_{n}, α_{n} \to 0, n \to \infty .$

In the random right-censoring model $(Z, δ)$ , Z is the observed variable and $δ$ is the censoring indicator: $δ$ equals one if $Y \leq C$ and zero otherwise, in which case Y is right-censored by C. The conditional cumulative hazard function is given by:

$Λ (y_{n} | x_{i}) = - \log (1 - F (y_{n} | x_{i})) = \int_{0}^{y_{n}} \frac{d F (s | x_{i})}{1 - F (s | x_{i})} = \int_{0}^{y_{n}} \frac{d H_{1} (s | x_{i})}{1 - H (s | x_{i})} .$

Therefore, the estimator ${\hat{Λ}}_{n} (y_{n} | x_{i})$ of $Λ (y_{n} | x_{i})$ is given by

(3)

where $H_{n} (x) = B_{n i} (x, h) 1_{\{Z_{i} > y_{n}\}}$ and $H_{n 1} (x) = B_{n i} (x, h) 1_{\{Z_{i} > y_{n}, δ_{i} = 1\}}$ , $δ_{i}$ is the censoring indicator associated with $Z_{i}$ , and $B_{n i} (x, h)$ is the Nadaraya-Watson weight [20] [21], expressed as follows:

$B_{n i} (x, h) = K (d (x, X_{i}) / h) / \sum_{j = 1}^{n} K (d (x, X_{j}) / h),$

where K is a kernel density and h is a bandwidth parameter such that $h \to 0$ as $n \to \infty$ .
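The weights $B_{n i} (x, h)$ can be computed directly once the curves are discretized. The following Python sketch is illustrative only: the function names are hypothetical, the curves are assumed to be observed on a common grid `t`, the plain $L^{2}$ distance is used as the semi-metric, and the kernel is the linear kernel $K (u) = (1.9 - 1.8 u) 1_{[0,1]} (u)$ used in the simulation section.

```python
import numpy as np

def l2_semimetric(x, curves, t):
    """L2 distance between the curve x and each row of curves on the grid t."""
    diff2 = (np.asarray(curves, dtype=float) - np.asarray(x, dtype=float)) ** 2
    # Trapezoidal rule written out explicitly.
    integral = np.sum((diff2[:, 1:] + diff2[:, :-1]) * np.diff(t), axis=1) / 2.0
    return np.sqrt(integral)

def linear_kernel(u):
    """K(u) = 1.9 - 1.8 u on [0, 1] and zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where((u >= 0) & (u <= 1), 1.9 - 1.8 * u, 0.0)

def nw_weights(x, curves, t, h):
    """Weights B_ni(x, h) = K(d(x, X_i)/h) / sum_j K(d(x, X_j)/h)."""
    k = linear_kernel(l2_semimetric(x, curves, t) / h)
    total = k.sum()
    if total == 0.0:
        raise ValueError("no curve within distance h of x; increase h")
    return k / total
```

Because the kernel has support $[0,1]$ , only the curves within semi-metric distance h of x receive a positive weight.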

From Equation (3), the estimator of the survival function may be expressed as

$\begin{matrix} {\hat{\bar{F}}}_{n} (y_{n} | x_{i}) = 1 - {\hat{F}}_{n} (y_{n} | x_{i}) = \exp (- {\hat{Λ}}_{n} (y_{n} | x_{i})) \\ = \exp \left( - \sum_{i = 1}^{n} \frac{B_{n i} (x, h) 1_{\{Z_{i} > y_{n}, δ_{i} = 1\}}}{1 - \sum_{j = 1}^{n} B_{n j} (x, h) 1_{\{Z_{j} \leq Z_{i}\}}} \right) \\ = \prod_{i = 1}^{n} \exp \left( - \frac{B_{n i} (x, h) 1_{\{Z_{i} > y_{n}, δ_{i} = 1\}}}{1 - \sum_{j = 1}^{n} B_{n j} (x, h) 1_{\{Z_{j} \leq Z_{i}\}}} \right) . \end{matrix}$

Applying the first-order Taylor expansion of $\exp (- y)$ around $y = 0$ , namely $\exp (- y) \approx 1 - y$ , we obtain

${\hat{\bar{F}}}_{n} (y_{n} | x_{i}) \approx \prod_{i = 1}^{n} \left( 1 - \frac{B_{n i} (x, h) 1_{\{Z_{i} > y_{n}, δ_{i} = 1\}}}{1 - \sum_{j = 1}^{n} B_{n j} (x, h) 1_{\{Z_{j} \leq Z_{i}\}}} \right) .$
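To make the product form concrete, here is a minimal Python sketch of a kernel conditional Kaplan-Meier estimate. It is an illustration under stated assumptions rather than the authors' implementation: it follows the usual Beran-type product-limit convention (the product runs over uncensored observations with $Z_{i} \leq y$ , and the denominator uses the weight mass strictly before $Z_{i}$ ), which may differ from the indicators as printed above; it also assumes no ties among the $Z_{i}$ . The function name is hypothetical.

```python
import numpy as np

def beran_survival(y, z, delta, weights):
    """Beran-type estimate of P(Y > y | X = x).

    z: observed times Z_i; delta: 1 if uncensored, 0 if censored;
    weights: kernel weights B_ni(x, h), assumed to sum to one.
    """
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(z, kind="stable")
    z, delta, w = z[order], delta[order], w[order]
    # Weight mass strictly before each ordered observation (no ties assumed).
    mass_before = np.concatenate(([0.0], np.cumsum(w)[:-1]))
    at_risk = 1.0 - mass_before
    surv = 1.0
    for zi, di, wi, ri in zip(z, delta, w, at_risk):
        if zi > y:
            break                      # later observations do not contribute
        if di == 1 and ri > 0:
            surv *= max(0.0, 1.0 - wi / ri)
    return surv
```

With equal weights $1 / n$ this reduces to the classical Kaplan-Meier estimator, which gives a simple sanity check.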

We denote the conditional quantile of order $α_{n} \to 0$ as $n \to \infty$ of the random variable Y given $X = x$ by

$q (α_{n} | x) = {\bar{F}}^{\leftarrow} (α_{n} | x) = \inf \{ y_{n} : \bar{F} (y_{n} | x) \leq α_{n} \}$

Therefore, a natural estimator of $q (α_{n} | x)$ is given by

${\hat{q}}_{n} (α_{n} | x) = {\hat{\bar{F}}}_{n}^{\leftarrow} (α_{n} | x) = \inf \{ y_{n} : {\hat{\bar{F}}}_{n} (y_{n} | x) \leq α_{n} \} .$
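The generalized inverse can be evaluated on a finite set of candidate thresholds. The sketch below is a hypothetical helper, not taken from the paper; it assumes the survival estimate has already been evaluated on a sorted grid and is nonincreasing along it.

```python
import numpy as np

def quantile_from_survival(alpha, grid, surv_values):
    """Return inf{y in grid : S(y) <= alpha} for a nonincreasing S on a sorted grid."""
    grid = np.asarray(grid, dtype=float)
    surv_values = np.asarray(surv_values, dtype=float)
    hits = np.nonzero(surv_values <= alpha)[0]
    if hits.size == 0:
        return np.inf        # the estimate never drops to alpha on this grid
    return grid[hits[0]]
```

Since the estimated survival function is a step function that jumps only at observed values, taking the sorted $Z_{i}$ as the grid is a natural choice.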

Let $B (t, r)$ denote the ball centered at $t \in E$ with radius r, defined by

$B (t, r) = \{ x \in E, d (x, t) \leq r \}$

and let h be a positive sequence tending to zero as $n \to \infty$ . In the moving-window approach adopted in [22], the responses $Y_{i}$ that are used are those whose covariates $x_{i}$ belong to the ball $B (t, h)$ .

For $x \in E$ , we denote the conditional probability distribution function of Y given $X = x$ by

$\forall y \in ℝ, F (y | x) = P (Y \leq y | X = x) .$

With $B (x, h)$ the ball of center x and radius h, we denote by $φ_{x} (h) = P (X \in B (x, h))$ the small-ball probability of X.

3. Estimation of Conditional Extreme Quantile

We now investigate the estimation of the large conditional quantile $q (α_{n} | x)$ of order $1 - α_{n}$ of $F (\cdot | x)$ for Y given $X = x$ , defined by $1 - F (q (α_{n} | x) | x) = α_{n}$ with $α_{n} \to 0$ as $n \to \infty$ . To define our estimator, we first define ${\hat{q}}_{n}^{c} (α_{n} | x)$ , the functional estimator of a large conditional quantile $q (α_{n} | x)$ within the sample.

Let us consider the kernel conditional Kaplan-Meier estimator of the conditional survival function $1 - F (\cdot | x)$ , defined for all $x \in E$ and $y_{n} \in (0, \infty)$ as follows:

${\hat{\bar{F}}}_{n} (y_{n} | x) = \prod_{i = 1}^{n} \left( 1 - \frac{B_{n i} (x, h) 1_{\{Z_{i} > y_{n}, δ_{i} = 1\}}}{1 - \sum_{j = 1}^{n} B_{n j} (x, h) 1_{\{Z_{j} \leq Z_{i}\}}} \right) .$

This function may be rewritten as

(4)

and zero otherwise, where $Z_{(1)} \leq \dots \leq Z_{(n)}$ denote the order statistics of $Z_{1}, \dots, Z_{n}$ .

Using the estimator in Equation (4), we propose to estimate the conditional quantile $q (α_{n} | x)$ within the sample (i.e. for fixed $α_{n} \in (0,1)$ ) by the generalized inverse of ${\hat{\bar{F}}}_{n} (\cdot | x)$ :

${\hat{q}}_{n}^{c} (α_{n} | x) = {\hat{\bar{F}}}_{n}^{\leftarrow} (α_{n} | x) = \inf \{ u : {\hat{\bar{F}}}_{n} (u | x) \leq α_{n} \} .$

When $α_{n} \to 0$ as $n \to \infty$ , we propose to estimate the conditional extreme quantile $q (α_{n} | x)$ by the Weissman-type estimator

${\hat{q}}_{n}^{c, W} (α_{n} | x) = {\hat{q}}_{n}^{c} ({\hat{\bar{F}}}_{n} (Z_{(n - k)} | x)) {(\frac{{\hat{\bar{F}}}_{n} (Z_{(n - k)} | x)}{α_{n}})}^{{\hat{γ}}_{n}^{c, H} (x)} .$ (5)

The term ${(\frac{{\hat{\bar{F}}}_{n} (Z_{(n - k)} | x)}{α_{n}})}^{{\hat{γ}}_{n}^{c, H} (x)}$ is an extrapolation factor that allows the estimation of arbitrarily large quantiles, and ${\hat{γ}}_{n}^{c, H} (x)$ is the estimator of the censored functional conditional extreme value index $γ_{1} (x)$ .
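The extrapolation step in Equation (5) is a one-line computation once an anchor quantile and a tail-index estimate are available. The following Python sketch uses hypothetical argument names; `q_anchor` stands for the within-sample quantile estimate at the intermediate level `beta`, and `gamma_hat` for any estimate of $γ_{1} (x)$ .

```python
def weissman_quantile(q_anchor, beta, alpha, gamma_hat):
    """Extrapolate a quantile from an intermediate level beta to an extreme level alpha."""
    if not (0.0 < alpha <= beta < 1.0):
        raise ValueError("expected 0 < alpha <= beta < 1")
    return q_anchor * (beta / alpha) ** gamma_hat

# Sanity check on an exact Pareto tail, S(y) = y**(-1/gamma), q(a) = a**(-gamma):
gamma = 0.5
beta, alpha = 0.05, 0.001
print(weissman_quantile(beta ** (-gamma), beta, alpha, gamma), alpha ** (-gamma))
```

For an exact Pareto tail the extrapolation is exact, so any error in practice comes from the anchor quantile, the tail-index estimate and the deviation of the true tail from a pure power law.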

Some regularity conditions are needed for proving our results (these conditions are adapted from [5] and [23] ). We also require some Lipschitz conditions to be fulfilled.

(A2) K is a function with support $[0,1]$ and there exist $0 < c_{1} < c_{2} < \infty$ such that $c_{1} \leq K (t) \leq c_{2}$ for all $t \in [0,1]$ .

(A3) For fixed $x \in E$ , the conditional quantile function $α \in (0,1) \mapsto q (α | x) \in (0, + \infty)$ is differentiable and the function defined by $α \in (0,1) \mapsto Δ (α | x) = γ_{1} (x) + α \frac{\partial (\log q (α | x))}{\partial α}$ is continuous and such that $\lim_{α \to 0} Δ (α | x) = 0$ .

Hypothesis (A3) controls the first derivative of the log-quantile function; it is a necessary and sufficient condition to obtain the heavy-tail property.

The largest oscillation of the log-quantile function with respect to its second variable is defined for all $a \in (0, 1 / 2)$ as

$ω_{n} (a) = \sup \left\{ \left| \log \frac{q (α_{n} | x)}{q (α_{n} | x^{'})} \right|, α_{n} \in (a,1 - a), (x, x^{'}) \in B {(t, h)}^{2} \right\} .$

Theorem 1. Assume that (A1)-(A3) hold. Let $x \in E$ , let $β_{n} = {\hat{\bar{F}}}_{n} (Z_{(n - k)} | x)$ and let $α_{n}$ be a sequence such that $β_{n} / α_{n} \to 0$ , $σ_{n} (x) \to 0$ and $σ_{n}^{- 1} (x) ε (q (β_{n} | x) | x) \to 0$ as $n \to \infty$ . Consider ${\hat{γ}}_{n}^{c, H} (x)$ such that $σ_{n}^{- 1} (x) ({\hat{γ}}_{n}^{c, H} (x) - γ_{1} (x)) \overset{d}{\to} N (0, AV (x))$ with $AV (x) \geq 0$ and

$σ_{n} (x) = {(n \bar{H} (y_{n} | x) \frac{{(μ_{x}^{(1)} (h))}^{2}}{μ_{x}^{(2)} (h)})}^{- 1 / 2}$ .

Let $ζ_{n} (x) = {(n φ_{x} (h) β_{n})}^{1 / 2} \log (β_{n} / α_{n})$ and assume that $σ_{n}^{- 1} (x) ζ_{n}^{- 1} (x) \to 0$ as $n \to \infty$ . Then

$\frac{σ_{n}^{- 1} (x)}{\log (β_{n} / α_{n})} \log (\frac{{\hat{q}}_{n}^{c, W} (α_{n} | x)}{q (α_{n} | x)}) \to N (0, \frac{γ_{1}^{3} (x)}{γ (x)}) .$

4. Proofs

4.1. Preliminary Results

Lemma 2. Let $T_{1 n}$ and $T_{2 n}$ be two sequences of random variables. Suppose there exists an event $B_{n}$ such that $(T_{1 n} | B_{n}) =^{d} (T_{2 n} | B_{n})$ with $ℙ (B_{n}) \to 1$ . Then $T_{1 n} \overset{d}{\to} T_{1}$ implies $T_{2 n} \overset{d}{\to} T_{1}$ .

Proof of Lemma 2: See [1].

Lemma 3. Suppose (A1) and (A2) hold. Let $0 \leq β_{n} \leq α_{n}$ be such that $α_{n} \to 0$ as $n \to \infty$ . Then

$| \log \frac{q (α_{n} | x)}{q (β_{n} | x)} + γ_{1} (x) \log (\frac{α_{n}}{β_{n}}) | = O (\log (\frac{α_{n}}{β_{n}}) ε (q (α_{n} | x) | x))$

Proof of Lemma 3: See [5].

Proposition 4. Let $m_{x} = n φ_{x} (h)$ be the nonrandom number of observations in the slice $(0, \infty) \times B (x, h)$ . Let $α_{n}$ be a sequence satisfying $α_{n} \to 0$ and $m_{x} α_{n} \to \infty$ . If ${(m_{x} α_{n})}^{2} ω (m_{x}^{- (1 + ε)}) \to 0$ as $n \to \infty$ for some $ε > 0$ , then

${(m_{x} α_{n})}^{1 / 2} (\frac{{\hat{q}}^{c} (α_{n} | x)}{q (α_{n} | x)} - 1) \overset{d}{\to} N (0, γ_{1}^{2} (x)) .$

Proof of Proposition 4

The proof is similar to proof of Theorem 1 in [24] and is therefore omitted.

4.2. Main Result

Proof of Theorem 1

Let $α_{n} > β_{n}$ . Taking logarithms in the definition (5) of the Weissman-type estimator gives

$\log ({\hat{q}}_{n}^{c, W} (α_{n} | x)) = \log {\hat{q}}_{n}^{c} (β_{n} | x) + {\hat{γ}}_{n}^{c, H} (x) \log (β_{n} / α_{n})$

then,

$\begin{array}{l} \log (\frac{{\hat{q}}_{n}^{c, W} (α_{n} | x)}{q (α_{n} | x)}) = \log {\hat{q}}_{n}^{c} (β_{n} | x) + {\hat{γ}}_{n}^{c, H} (x) \log (β_{n} / α_{n}) - \log q (α_{n} | x) \\ = \log {\hat{q}}_{n}^{c} (β_{n} | x) - \log q (β_{n} | x) + ({\hat{γ}}_{n}^{c, H} (x) - γ_{1} (x)) \log (β_{n} / α_{n}) \\ - \log q (α_{n} | x) + \log q (β_{n} | x) + γ_{1} (x) \log (β_{n} / α_{n}) \\ = \log (\frac{{\hat{q}}_{n}^{c} (β_{n} | x)}{q (β_{n} | x)}) + ({\hat{γ}}_{n}^{c, H} (x) - γ_{1} (x)) \log (β_{n} / α_{n}) \\ + \log (\frac{q (β_{n} | x)}{q (α_{n} | x)}) + γ_{1} (x) \log (β_{n} / α_{n}) . \end{array}$

Therefore

$\begin{array}{l} \frac{σ_{n}^{- 1} (x)}{\log (β_{n} / α_{n})} \log (\frac{{\hat{q}}_{n}^{c, W} (α_{n} | x)}{q (α_{n} | x)}) \\ = \frac{σ_{n}^{- 1} (x)}{\log (β_{n} / α_{n})} \log (\frac{{\hat{q}}_{n}^{c} (β_{n} | x)}{q (β_{n} | x)}) + σ_{n}^{- 1} (x) ({\hat{γ}}_{n}^{c, H} (x) - γ_{1} (x)) \\ + \frac{σ_{n}^{- 1} (x)}{\log (β_{n} / α_{n})} [\log (\frac{q (β_{n} | x)}{q (α_{n} | x)}) + γ_{1} (x) \log (β_{n} / α_{n})] \\ = A_{1, n} + A_{2, n} + A_{3, n} \end{array}$

with,

$A_{1, n} = \frac{σ_{n}^{- 1} (x)}{\log (β_{n} / α_{n})} \log (\frac{{\hat{q}}_{n}^{c} (β_{n} | x)}{q (β_{n} | x)})$

$A_{2, n} = σ_{n}^{- 1} (x) ({\hat{γ}}_{n}^{c, H} (x) - γ_{1} ( x ))$

$A_{3, n} = \frac{σ_{n}^{- 1} (x)}{\log (β_{n} / α_{n})} [\log (\frac{q (β_{n} | x)}{q (α_{n} | x)}) + γ_{1} (x) \log (β_{n} / α_{n})]$

Under the assumptions of Theorem 1, applying Proposition 4 yields

${(n φ_{x} (h) β_{n})}^{1 / 2} [\log (\frac{{\hat{q}}_{n}^{c} (β_{n} | x)}{q (β_{n} | x)})] = O_{p} (1) .$ (6)

By the definition of $ζ_{n} (x)$ , we see that

$A_{1, n} = σ_{n}^{- 1} (x) ζ_{n}^{- 1} (x) {(n φ_{x} (h) β_{n})}^{1 / 2} [\log (\frac{{\hat{q}}_{n}^{c} (β_{n} | x)}{q (β_{n} | x)})] .$

Equation (6) and the hypotheses of Theorem 1 then give $A_{1, n} \to 0$ in probability as n goes to infinity. By the assumptions of Theorem 1, $A_{2, n}$ converges in distribution to a centered Gaussian distribution with variance $AV (x)$ (see [25] ). Finally, by Lemma 2(i) in [6], Lemma 3 and the assumptions of Theorem 1, we have $A_{3, n} = σ_{n}^{- 1} (x) O (ε (q (β_{n} | x) | x))$ , which goes to zero. This concludes the proof.

5. Simulation Studies

5.1. Simulation Design

The main purpose of this simulation study is to assess the performance of the proposed estimators. We compare the results with two simple estimation approaches based on the tail index of a heavy-tailed distribution under random right censoring. The theoretical distributions of Y given X and of C given X are assumed to be known. We simulate $N = 500$ replications of samples of size $n = 200$ and $n = 500$ of random triples $\{ (X_{i}, Z_{i}, δ_{i}), i = 1, \dots, n \}$ from $(X, Z, δ)$ to construct the estimates.

The curve $X_{i}$ is a realisation of the functional covariate $X \in E$ defined by

$X (t) = Ω (2 - \cos (π W t)) + (1 - Ω) \cos (π W t)$

for all $t \in [0,1]$ , where W is normally distributed on $[0,1]$ and $Ω$ is a random variable which follows a Bernoulli distribution with parameter $1 / 2$ , as adapted from [26]. Figure 1 shows a sample of 300 curves representing a realisation of the functional random variable X.

The conditional distribution of Y given $X = x$ is a Burr distribution with parameters $τ (x) = 2$ and $λ (x) = 2 / (8 {‖ X ‖}_{2}^{2} - 3)$ , which implies that $γ_{1} (x) = \frac{1}{τ (x) λ (x)}$ , with

${‖ X ‖}_{2}^{2} = \int_{0}^{1} X^{2} (t) d t = 4 Ω^{2} - 4 Ω (2 Ω - 1) \frac{\sin (π W)}{π W} + {(2 Ω - 1)}^{2} [\frac{1}{2} + \frac{\sin (2 π W)}{4 π W}] .$

Figure 1. A sample of curves $X_{i} (t), t \in [0, 1], i = 1, \dots, 300$ .
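The closed-form expression for ${‖ X ‖}_{2}^{2}$ can be checked against numerical integration. In the Python sketch below, the function names are hypothetical and the values of $Ω$ and W are passed as arguments, so no assumption is made about how W is drawn.

```python
import numpy as np

def curve(t, omega, w):
    """X(t) = Omega (2 - cos(pi W t)) + (1 - Omega) cos(pi W t)."""
    c = np.cos(np.pi * w * t)
    return omega * (2.0 - c) + (1.0 - omega) * c

def sq_norm_closed_form(omega, w):
    """Closed-form squared L2 norm of X on [0, 1] (requires w != 0)."""
    a = np.sin(np.pi * w) / (np.pi * w)
    b = 0.5 + np.sin(2.0 * np.pi * w) / (4.0 * np.pi * w)
    return (4.0 * omega**2 - 4.0 * omega * (2.0 * omega - 1.0) * a
            + (2.0 * omega - 1.0) ** 2 * b)

def sq_norm_numeric(omega, w, n_grid=20001):
    """Trapezoidal approximation of the integral of X(t)^2 over [0, 1]."""
    t = np.linspace(0.0, 1.0, n_grid)
    f = curve(t, omega, w) ** 2
    return np.sum((f[1:] + f[:-1]) * np.diff(t)) / 2.0

def gamma1(omega, w, tau=2.0):
    """gamma_1 = 1 / (tau * lambda) with lambda = 2 / (8 ||X||^2 - 3)."""
    lam = 2.0 / (8.0 * sq_norm_closed_form(omega, w) - 3.0)
    return 1.0 / (tau * lam)

print(sq_norm_closed_form(1, 0.5), sq_norm_numeric(1, 0.5), gamma1(1, 0.5))
```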

The conditional distribution of C given $X = x$ is also a Burr distribution, where the parameter $γ_{2} (x) = \frac{1}{τ (x) λ_{2} (x)}$ is chosen to yield various values for the overall censoring percentage c $(c = 10 \%, 20 \%, 30 \%, 40 \%)$ . Recall that $γ (x) = γ_{1} (x) p (x)$ , where $p (x) = \frac{γ_{2} (x)}{γ_{1} (x) + γ_{2} (x)} = \frac{λ_{1} (x)}{λ_{1} (x) + λ_{2} (x)}$ is the ultimate proportion of uncensored observations among $Z_{i}$ for $i = 1, \dots, n$ . Once $γ_{1} (x)$ is fixed, we choose $γ_{2} (x)$ such that $1 - p (x)$ is approximately equal to the target censoring percentage $(10 \%, 20 \%, 30 \%, 40 \%)$ .
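Solving $1 - p (x) = c$ for $γ_{2} (x)$ gives $γ_{2} (x) = γ_{1} (x) (1 - c) / c$ . The Python sketch below illustrates this choice and simulates censored data. It is an illustration only: the function names are hypothetical, and the Burr survival function is assumed to be ${(1 + y^{τ})}^{- λ}$ , a parametrization consistent with $γ = 1 / (τ λ)$ but not stated explicitly in the text.

```python
import numpy as np

def gamma2_for_censoring(gamma1, c):
    """Solve 1 - p = gamma1 / (gamma1 + gamma2) = c for gamma2."""
    if not 0.0 < c < 1.0:
        raise ValueError("c must lie in (0, 1)")
    return gamma1 * (1.0 - c) / c

def burr_sample(rng, size, tau, lam):
    """Inverse-transform sampling, assuming survival function (1 + y**tau)**(-lam)."""
    u = 1.0 - rng.uniform(size=size)          # uniform on (0, 1]
    return (u ** (-1.0 / lam) - 1.0) ** (1.0 / tau)

rng = np.random.default_rng(0)
tau, g1, c = 2.0, 0.5, 0.2
g2 = gamma2_for_censoring(g1, c)
y = burr_sample(rng, 200_000, tau, 1.0 / (tau * g1))   # lifetimes
cens = burr_sample(rng, 200_000, tau, 1.0 / (tau * g2))  # censoring times
z, delta = np.minimum(y, cens), (y <= cens).astype(int)
print(g2, 1.0 - delta.mean())   # empirical censoring rate, to compare with c
```

Under the assumed parametrization and a common $τ$ , the two survival functions are powers of the same function, so the censoring proportion equals $λ_{2} / (λ_{1} + λ_{2}) = 1 - p$ at every level and not only in the tail.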

In practice, some parameters have to be fixed. The kernel K is the asymmetric linear kernel defined by $K (u) = (1.9 - 1.8 u) 1_{[0,1]} (u)$ . The estimator ${\hat{q}}_{n}^{c, W} (x)$ depends on the bandwidth $h_{k} = h$ , which is chosen using the cross-validation method implemented in [6]:

$h^{o p t} = \arg \min \sum_{i = 1}^{n} \sum_{j = 1}^{n} {(1_{{Z_{i} > Z_{j}}} - {\hat{\bar{F}}}_{n - i} (Z_{j} | x_{i}))}^{2},$

where ${\hat{\bar{F}}}_{n - i}$ is the kernel conditional Kaplan-Meier estimator presented in Equation (4), as adopted in [8], which depends on the parameter h. This estimator is computed on the sample $(X_{j}, δ_{j}, Z_{j}), j = 1, \dots, n$ , $j \neq i$ . We take h in a regular grid $H = \{ h_{1} \leq h_{2} \leq \dots \leq h_{S} \}$ where $h_{1} = 1 / 100$ and $h_{S} = 0.2$ with $S = 20$ .

Once the bandwidth has been selected, the next step is to determine the number of threshold excesses k. Different methods have been proposed in the literature; in this paper we adopt the method used in [8], described as follows.

We first form successive blocks of the estimates in Equation (5) computed with $y_{k}$ , for $k = 1, \dots, n - 1$ , each block having size $⌊ \sqrt{k_{\max}} ⌋$ . We then compute the standard deviation of the estimates in each block; the median of the estimates in the block with minimum standard deviation determines the optimal k.
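One possible reading of this block-based rule is sketched below in Python. The details are assumptions made for illustration and are not taken from [8]: the blocks are taken to be consecutive and non-overlapping, a trailing incomplete block is ignored, and the selected k is the one whose estimate is closest to the median of the most stable block. The function name is hypothetical.

```python
import math
import numpy as np

def select_k(estimates):
    """estimates[k - 1] is the quantile estimate obtained with k threshold excesses."""
    est = np.asarray(estimates, dtype=float)
    k_max = est.size
    block = max(2, math.isqrt(k_max))        # block length floor(sqrt(k_max))
    best_sd, best_k = np.inf, None
    for start in range(0, k_max - block + 1, block):
        chunk = est[start:start + block]
        sd = chunk.std(ddof=1)               # sample standard deviation of the block
        if sd < best_sd:
            # position in the block of the estimate closest to the block median
            j = int(np.argmin(np.abs(chunk - np.median(chunk))))
            best_sd, best_k = sd, start + j + 1
    return best_k
```

The idea behind such rules is that a good k lies in a region where the estimate is stable as a function of k.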

Another point to discuss is the choice of the semi-metric, which plays a key role in the behavior of nonparametric statistics for functional data; for more details see [27]. Since the curves $X (t)$ are smooth, following [6] we use the semi-metric based on derivatives to measure the distance between two curves, defined as follows:

$d_{s m} (X_{1}, X_{2}) = \sqrt{\int_{T} {(X_{1}^{(q)} (t) - X_{2}^{(q)} (t))}^{2} d t},$ (7)

where $X^{(q)}$ denotes the q-th derivative of X and q is the order of the derivative.
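For discretized curves, the semi-metric (7) can be approximated numerically. The Python sketch below is an illustration under assumptions: the curves are observed on a common grid `t`, and derivatives are approximated by finite differences via `numpy.gradient`, which stands in for whatever smoothing method (for example a spline basis) is used in practice. The function name is hypothetical.

```python
import numpy as np

def derivative_semimetric(x1, x2, t, q=1):
    """Approximate the L2 distance between the q-th derivatives of two curves."""
    t = np.asarray(t, dtype=float)
    # Differentiation is linear, so differentiate the difference of the curves.
    d = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    for _ in range(q):
        d = np.gradient(d, t, edge_order=2)   # finite-difference derivative
    integrand = d ** 2
    # Trapezoidal rule written out explicitly.
    return np.sqrt(np.sum((integrand[1:] + integrand[:-1]) * np.diff(t)) / 2.0)

t = np.linspace(0.0, 1.0, 1001)
print(derivative_semimetric(t**2 + 3.0, t**2, t, q=0),   # vertical shift is seen
      derivative_semimetric(t**2 + 3.0, t**2, t, q=1))   # vertical shift is ignored
```

With $q \geq 1$ , two curves that differ only by a vertical shift are at distance zero, which is why $d_{s m}$ is a semi-metric rather than a metric.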

5.2. Estimation of Conditional Quantile

In order to check the finite-sample performance of the extreme conditional quantile estimator in Equation (5), we performed the simulation experiments described in Section 5.1. First, to evaluate the impact of the order of the derivative in the choice of the semi-metric, as [27] advises for practical cases, we computed the extreme conditional quantile for different orders of derivative, as presented in Table 1. Secondly, we examine the behavior of our estimator according to both the censoring rate and the sample size. Finally, the accuracy of our estimators is measured by the Mean Squared Error (MSE) and Mean Absolute Error (MAE) for each scenario with 500 replications, as reported in Table 1.

To assess the performance of our estimator, we make a comparison with two simple estimation strategies. The first one is a complete-case procedure (“CC” for short): we remove all censored observations from the simulated samples and then compute the tail index estimator proposed in [23], which does not take censoring into account. The second strategy ignores censoring: we set $δ_{i} = 1$ for all $i = 1, \dots, n$ , that is, we treat the observations $Z_{i}, i = 1, \dots, n$ as if they were uncensored. This strategy is called the censoring-ignored case (“CI” for short).

To illustrate the asymptotic normality result for our estimators, we use the Kolmogorov-Smirnov test, as presented in Table 2.

The P-values of the Kolmogorov-Smirnov test are all greater than 0.05, as shown in Table 2, so normality is not rejected at the 5% level. The simulation results also indicate that the quality of the normal approximation is linked to the censoring rate, the order of the derivative and the sample size. The results in Table 2 are thus consistent with our theoretical results.

5.3. Discussion

The performance of our estimator ${\hat{q}}_{n}^{c, W} (α_{n} | x)$ defined in (5) is evaluated using Mean Squared Error (MSE) and Mean Absolute Error (MAE). We also provide the averaged value (over the N samples) of the number of threshold excesses $k^{*}$ . The accuracy of our estimator depends on the censoring percentage and on the degree of derivative of the semi-metric $d_{s m} (\cdot, \cdot)$ defined in Equation (7).

To assess the accuracy of the proposed estimator, we compare it with the complete-case and censoring-ignored strategies described in Section 5.2.

The proposed estimator of conditional extreme quantiles performs well at low censoring rates when the sample size is large enough, as illustrated in Table 1, which reports the empirical Mean Squared Error (MSE) and empirical Mean Absolute Error (MAE) of the estimators for different sample sizes, censoring rates and orders of derivative. Our estimator performs well for high orders of derivative in the semi-metric.

Regarding the choice of the semi-metric, our simulation results show that the order of the derivative plays a key role: since the functional curves are smooth, the semi-metric with a high order of derivative performs better than one with a low order, as illustrated in Table 1; the interested reader can see [27].

Table 1. MSE and MAE of the estimators for sample sizes $n = 200$ and $n = 500$ with $N = 500$ replications, $α_{n} = 1 / 1000$ and $Ω = 1$ . $k^{⋆}$ is the average number of threshold excesses.

Table 2. Kolmogorov-Smirnov P-values for the distribution of the Weissman quantile estimator at 10%, 20%, 30% and 40% censoring rates for sample sizes 200 and 500 with 500 replications.

Considering the results in Table 1, CI and CC estimators of $q (α_{n} | x)$ are quietly biased, even though when censoring is moderate. As result, our estimator in Equation (5) proved a significant result regarding the issues of estimating the functional conditional extreme quantile under censorship.

The Kolmogorov-Smirnov test was performed to check the asymptotic normality of the proposed estimator; the results are reported in Table 2. All P-values are greater than the 0.05 significance level, so normality is not rejected, which is consistent with the theoretical results.
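A normality check of this kind can be sketched as follows, assuming the replicated estimates are centered and scaled by their own sample mean and standard deviation before being compared with the standard normal distribution. This standardization is an assumption made for illustration (the normalization behind Table 2 is not detailed here), and estimating the mean and variance from the same sample makes the resulting P-value only approximate. The helper name `ks_normality_pvalue` is hypothetical.

```python
import numpy as np
from scipy import stats

def ks_normality_pvalue(estimates):
    """Kolmogorov-Smirnov P-value of standardized estimates against N(0, 1)."""
    x = np.asarray(estimates, dtype=float)
    # Standardize with the sample mean and the unbiased sample standard deviation.
    z = (x - x.mean()) / x.std(ddof=1)
    return float(stats.kstest(z, "norm").pvalue)

# Toy example: 500 simulated "estimates" drawn from a normal distribution.
rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=0.5, size=500)
p_value = ks_normality_pvalue(sample)
print(p_value)
```

A P-value above the chosen significance level means the test does not reject normality; it does not by itself prove that the estimator is normally distributed.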

6. Conclusions and Perspectives

We considered a functional kernel Weissman-type estimator of the conditional extreme quantile when some functional random covariate (i.e. valued in some infinite-dimensional space) information is available and the scalar response variable is right-censored. Its asymptotic properties were established and its finite-sample performance was illustrated in a simulation study. A comparison with two simple estimation strategies was also provided.

Future work will focus on the estimation of conditional extreme values of the Weibull distribution under random right censoring when the covariate is a functional random variable, and on establishing the asymptotic behavior of the resulting estimator.

Acknowledgements

The authors acknowledge an anonymous Associate Editor and an anonymous reviewer for their helpful comments that led to an improved version of this paper.

NOTES

¹Two non-degenerate distribution functions I and J are of the same type if and only if there exist $a > 0$ and $b \in ℝ$ such that $I (a x + b) = J (x)$ for all $x \in ℝ$ .

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1]	Gardes, L. and Girard, S. (2010) Conditional Extremes from Heavy-Tailed Distributions: An Application to the Estimation of Extreme Rainfall Return Levels. Extremes, 13, 177-204. https://doi.org/10.1007/s10687-010-0100-z
[2]	Ndao, P. (2015) Modélisation de valeurs extrêmes conditionnelles en présence de censure. PhD Thesis, Université Gaston Berger de Saint-Louis.
[3]	Fisher, R.A. and Tippett, L.C. (1928) Limiting Forms of the Frequency Distribution of the Largest or Smallest Member of a Sample. Mathematical Proceedings of the Cambridge Philosophical Society, 24, 180-190. https://doi.org/10.1017/S0305004100015681
[4]	Gnedenko, B. (1943) Sur la distribution limite du terme maximum d’une série aléatoire. Annals of Mathematics, 44, 423-453. https://doi.org/10.2307/1968974
[5]	Daouia, A., Gardes, L., Girard, S. and Lekina, A. (2011) Kernel Estimators of Extreme Level Curves. Test, 20, 311-333. https://doi.org/10.1007/s11749-010-0196-0
[6]	Gardes, L. and Girard, S. (2012) Functional Kernel Estimators of Large Conditional Quantiles. Electronic Journal of Statistics, 6, 1715-1744. https://doi.org/10.1214/12-EJS727
[7]	Stupfler, G. (2016) Estimating the Conditional Extreme-Value Index under Random Right-Censoring. Journal of Multivariate Analysis, 144, 1-24. https://doi.org/10.1016/j.jmva.2015.10.015
[8]	Ndao, P., Diop, A. and Dupuy, J.F. (2016) Nonparametric Estimation of the Conditional Extreme-Value Index with Random Covariates and Censoring. Journal of Statistical Planning and Inference, 168, 20-37. https://doi.org/10.1016/j.jspi.2015.06.004
[9]	Beirlant, J., Guillou, A., Dierckx, G. and Fils-Villetard, A. (2007) Estimation of the Extreme Value Index and Extreme Quantiles under Random Censoring. Extremes, 10, 151-174. https://doi.org/10.1007/s10687-007-0039-x
[10]	Einmahl, J.H., Fils-Villetard, A. and Guillou, A. (2008) Statistics of Extremes under Random Censoring. Bernoulli, 14, 207-227. https://doi.org/10.3150/07-BEJ104
[11]	Brahimi, B., Meraghni, D. and Necir, A. (2013) On the Asymptotic Normality of Hill’s Estimator of the Tail Index under Random Censoring.
[12]	Gomes, M.I. and Neves, M.M. (2010) A Note on Statistics of Extremes for Censoring Schemes on a Heavy Right Tail. 32nd IEEE International Conference on Information Technology Interfaces, Cavtat, 21-24 June 2010, 539-544.
[13]	Cabras, S. and Castellanos, M.E. (2011) A Bayesian Approach for Estimating Extreme Quantiles under a Semiparametric Mixture Model. ASTIN Bulletin, 41, 87-106.
[14]	Coles, S.G. and Powell, E.A. (1996) Bayesian Methods in Extreme Value Modelling: A Review and New Developments. International Statistical Review, 64, 119-136. https://doi.org/10.2307/1403426
[15]	Stephenson, A. and Tawn, J. (2004) Bayesian Inference for Extremes: Accounting for the Three Extremal Types. Extremes, 7, 291-307. https://doi.org/10.1007/s10687-004-3479-6
[16]	Worms, J. and Worms, R. (2014) New Estimators of the Extreme Value Index under Random Right Censoring, for Heavy-Tailed Distributions. Extremes, 17, 337-358. https://doi.org/10.1007/s10687-014-0189-6
[17]	Gomes, M.I. and Neves, M.M. (2011) Estimation of the Extreme Value Index for Randomly Censored Data. Biometrical Letters, 48, 1-22.
[18]	Matthys, G., Delafosse, E., Guillou, A. and Beirlant, J. (2004) Estimating Catastrophic Quantile Levels for Heavy-Tailed Distributions. Insurance: Mathematics and Economics, 34, 517-537. https://doi.org/10.1016/j.insmatheco.2004.03.004
[19]	Gonzalez-Manteiga, W. and Cadarso-Suarez, C. (1994) Asymptotic Properties of a Generalized Kaplan-Meier Estimator with Some Applications. Journal of Nonparametric Statistics, 4, 65-78. https://doi.org/10.1080/10485259408832601
[20]	Nadaraya, E.A. (1964) On Estimating Regression. Theory of Probability & Its Applications, 9, 141-142. https://doi.org/10.1137/1109020
[21]	Watson, G.S. (1964) Smooth Regression Analysis. Sankhyā: The Indian Journal of Statistics, Series A, 26, 359-372.
[22]	Gardes, L. and Girard, S. (2008) A Moving Window Approach for Nonparametric Estimation of the Conditional Tail Index. Journal of Multivariate Analysis, 99, 2368-2388. https://doi.org/10.1016/j.jmva.2008.02.023
[23]	Goegebeur, Y., Guillou, A. and Schorgen, A. (2014) Nonparametric Regression Estimation of Conditional Tails: The Random Covariate Case. Statistics, 48, 732-755. https://doi.org/10.1080/02331888.2013.800064
[24]	Gardes, L., Girard, S. and Lekina, A. (2010) Functional Nonparametric Estimation of Conditional Extreme Quantiles. Journal of Multivariate Analysis, 101, 419-433. https://doi.org/10.1016/j.jmva.2009.06.007
[25]	Rutikanga, J. and Diop, A. (2020) Functional Kernel Estimation of the Conditional Extreme Value Index under Random Right Censoring. https://hal.archives-ouvertes.fr/hal-02955521
[26]	Chaouch, M. and Khardani, S. (2015) Randomly Censored Quantile Regression Estimation Using Functional Stationary Ergodic Data. Journal of Nonparametric Statistics, 27, 65-87. https://doi.org/10.1080/10485252.2014.982651
[27]	Ferraty, F. and Vieu, P. (2006) Nonparametric Functional Data Analysis: Theory and Practice (Springer Series in Statistics). Springer-Verlag, Berlin.
