Kumaraswamy-Odd Rayleigh-G Family of Distributions with Applications
1. Introduction
Researchers use different approaches to introduce additional parameters into continuous classes of distributions, mainly because classical probability distributions often fail to fit real-life data adequately. All of these approaches extend a classical baseline distribution by adding one or more parameters, which makes the extended distribution flexible enough to fit a wider range of data arising in practice. Following this approach, several generalized families of distributions have been proposed and applied to real-life data in areas such as engineering, the life sciences, environmental sciences, finance and medicine.
In recent years, there have been many attempts in the statistical literature to generalize distributions. Most of these generalizations build on methodologies proposed by several researchers, as in [1]. The most frequently used is the T-X approach of [2]. Generalized families based on this approach include the Weibull-G family [3], the Lomax generator of distributions [4], the Odd Generalized Exponential family [5], the Odd Lindley-G family [6], the Gompertz-G family [7], the Zubair-G family [8], the Odd Fréchet-G family [9], the Power Lindley-G family [10], the Topp-Leone Exponentiated-G family [11], the Odd Chen-G family [12], the Burr X Exponential-G family [13] and the Inverse Lomax-G family [14].
The objective of this paper is to propose a new family of distributions, called the Kumaraswamy Odd Rayleigh-G (KORG) family, which can provide more flexible compound probability distributions for modelling real-life data sets. The new family adds three parameters to the baseline distribution.
The rest of this article is structured as follows. Section 2 defines the Kumaraswamy Odd Rayleigh-G family. Section 3 derives some sub-models of the KORG family. Section 4 presents the maximum likelihood method used to estimate the parameters of the KORG family. Section 5 reports a Monte Carlo simulation study using the Kumaraswamy Odd Rayleigh Log-Logistic (KORLL) model. In Section 6, we apply the KORLL model to five real-life data sets and compare its performance with some existing distributions. Finally, Section 7 concludes the paper.
2. Kumaraswamy Odd Rayleigh-G (KORG) Family
Attempts have been made to define new families of probability distributions that enhance the flexibility of well-known baseline distributions in practical data modelling. In the spirit of the T-X approach of [2], this paper defines the cumulative distribution function (cdf) as
$$F(x)=\int_{a}^{W[H(x)]} z(t)\,dt, \tag{1}$$
where $W[H(x)]$ is a function of the baseline cdf $H(x)$ of any continuous random variable X, and $z(t)$ is the probability density function (pdf) of a continuous random variable T defined on the closed interval $[a,b]$. The function $W[H(x)]$ must satisfy the following conditions:
(a) $W[H(x)] \in [a,b]$;
(b) $W[H(x)]$ is differentiable and monotonically non-decreasing;
(c) $W[H(x)] \to a$ as $x \to -\infty$;
(d) $W[H(x)] \to b$ as $x \to \infty$.
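In 2011, [15] introduced the Kumaraswamy-G family of distributions. Its pdf and cdf are given by
$$f(x)=\alpha\beta\,g(x;\xi)\,G(x;\xi)^{\alpha-1}\left[1-G(x;\xi)^{\alpha}\right]^{\beta-1}, \tag{2}$$
$$F(x)=1-\left[1-G(x;\xi)^{\alpha}\right]^{\beta}, \tag{3}$$
where $\alpha>0$ and $\beta>0$ are shape parameters, and $G(x;\xi)$ and $g(x;\xi)$ are the cdf and pdf of the baseline distribution with parameter vector $\xi$.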
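As a quick illustration of the Kumaraswamy-G construction, the sketch below codes the cdf F(x) = 1 − [1 − G(x)^α]^β and the pdf αβ g(x) G(x)^(α−1) [1 − G(x)^α]^(β−1) for any baseline supplied as a pair of Python functions, and checks numerically that the pdf is the derivative of the cdf. The exponential baseline with rate 0.5 (`exp_cdf`, `exp_pdf`) and all function names are our own illustrative choices, not part of [15].

```python
import math
from typing import Callable

Dist = Callable[[float], float]


def kw_g_cdf(x: float, alpha: float, beta: float, G: Dist) -> float:
    """Kumaraswamy-G cdf: 1 - [1 - G(x)**alpha]**beta."""
    return 1.0 - (1.0 - G(x) ** alpha) ** beta


def kw_g_pdf(x: float, alpha: float, beta: float, G: Dist, g: Dist) -> float:
    """Kumaraswamy-G pdf: alpha*beta*g(x)*G(x)**(alpha-1)*[1 - G(x)**alpha]**(beta-1)."""
    Gx = G(x)
    return alpha * beta * g(x) * Gx ** (alpha - 1.0) * (1.0 - Gx ** alpha) ** (beta - 1.0)


# Illustrative baseline (an assumption, not from the paper): exponential, rate 0.5.
RATE = 0.5


def exp_cdf(x: float) -> float:
    return 1.0 - math.exp(-RATE * x)


def exp_pdf(x: float) -> float:
    return RATE * math.exp(-RATE * x)


if __name__ == "__main__":
    x, alpha, beta, eps = 1.3, 2.0, 3.0, 1e-6
    # A central difference of the cdf should match the closed-form pdf.
    numeric = (kw_g_cdf(x + eps, alpha, beta, exp_cdf)
               - kw_g_cdf(x - eps, alpha, beta, exp_cdf)) / (2.0 * eps)
    print(numeric, kw_g_pdf(x, alpha, beta, exp_cdf, exp_pdf))
```

Passing `alpha = beta = 1` returns the baseline cdf unchanged, which is a convenient sanity check for any baseline plugged in.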
The Odd Rayleigh-G family has pdf and cdf given by
(4)
(5)
where
.
Lemma I
The cdf of the proposed KORG family of distributions is given by
(6)
where
,
the vector
is the parameter vector of the baseline distribution
and
.
Proof
From Equation (1),
(7)
let
if
, then
and if
,
. So,
(8)
and
(9)
and this can be written as
(10)
which completes the proof. Moreover, if
,
and if
,
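The key step of the proof is the identity ∫₀^w αβ t^(α−1) (1 − t^α)^(β−1) dt = 1 − (1 − w^α)^β, obtained with the substitution u = 1 − t^α. This assumes that T in Equation (1) follows the Kumaraswamy distribution on [0, 1] (so a = 0 and b = 1) with shape parameters α and β (our labels), and that w = W[H(x)] is the Odd Rayleigh-G cdf. The minimal sketch below checks the identity with composite Simpson's rule. Only the value w enters, so the check does not depend on the exact form of the Odd Rayleigh-G cdf.

```python
def kumaraswamy_pdf(t: float, alpha: float, beta: float) -> float:
    """Kumaraswamy density z(t) on [0, 1]."""
    return alpha * beta * t ** (alpha - 1.0) * (1.0 - t ** alpha) ** (beta - 1.0)


def tx_cdf_numeric(w: float, alpha: float, beta: float, n: int = 2000) -> float:
    """Equation (1) with a = 0: integrate z(t) over [0, w] by composite Simpson's rule.

    Simpson's rule evaluates z(0), so this sketch assumes alpha >= 1 and w < 1.
    """
    if n % 2:
        n += 1  # Simpson's rule needs an even number of panels
    h = w / n
    total = kumaraswamy_pdf(0.0, alpha, beta) + kumaraswamy_pdf(w, alpha, beta)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * kumaraswamy_pdf(i * h, alpha, beta)
    return total * h / 3.0


def tx_cdf_closed(w: float, alpha: float, beta: float) -> float:
    """Closed form after substituting u = 1 - t**alpha: 1 - (1 - w**alpha)**beta."""
    return 1.0 - (1.0 - w ** alpha) ** beta


if __name__ == "__main__":
    for w in (0.1, 0.5, 0.9):
        print(w, tx_cdf_numeric(w, 2.0, 3.0), tx_cdf_closed(w, 2.0, 3.0))
```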
From Equation (7), the pdf of the KORG family can be written as
(11)
Substituting Equations (4) and (5) into Equation (11) yields
(12)
Similarly, differentiating Equation (6) with respect to x also yields Equation (12). Figure 1(a) illustrates the density function
for different parameter values. The graph illustrates that
values of x.
Figure 1. Density and hazard rate plots of KORLL distribution with varying parameter values.
To evaluate this integral,
let
Then
and if
, then
if
, then
Therefore,
which shows that
is a pdf for the continuous random variable X. The hazard function
and survival function
of the KORG family are given by
(13)
and
(14)
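The following is a minimal sketch of the KORG cdf, pdf, survival and hazard functions. Because the Odd Rayleigh-G formulas are not reproduced here, it assumes the Odd Rayleigh-G cdf W(x) = 1 − exp{−θ[H(x)/(1 − H(x))]²} with θ > 0, whose x-derivative is 2θ h(x) H(x)/[1 − H(x)]³ · exp{−θ[H(x)/(1 − H(x))]²}; the paper's scaling of θ may differ. With α and β as the Kumaraswamy shape parameters, the code uses F(x) = 1 − [1 − W^α]^β, S(x) = 1 − F(x) and the hazard f(x)/S(x). The log-logistic baseline H(x) = (x/s)^k/[1 + (x/s)^k] with s = 1 and k = 2 is likewise an illustrative assumption.

```python
import math
from typing import Callable

Dist = Callable[[float], float]


def odd_rayleigh_cdf(Hx: float, theta: float) -> float:
    """Assumed Odd Rayleigh-G cdf: 1 - exp(-theta * (H / (1 - H))**2)."""
    v = Hx / (1.0 - Hx)
    return -math.expm1(-theta * v * v)


def odd_rayleigh_pdf(Hx: float, hx: float, theta: float) -> float:
    """x-derivative of odd_rayleigh_cdf: 2*theta*h*H/(1-H)**3 * exp(-theta*v**2)."""
    v = Hx / (1.0 - Hx)
    return 2.0 * theta * hx * Hx / (1.0 - Hx) ** 3 * math.exp(-theta * v * v)


def korg_sf(x: float, alpha: float, beta: float, theta: float, H: Dist) -> float:
    """Survival function S(x) = [1 - W**alpha]**beta."""
    w = odd_rayleigh_cdf(H(x), theta)
    return (1.0 - w ** alpha) ** beta


def korg_cdf(x: float, alpha: float, beta: float, theta: float, H: Dist) -> float:
    return 1.0 - korg_sf(x, alpha, beta, theta, H)


def korg_pdf(x: float, alpha: float, beta: float, theta: float, H: Dist, h: Dist) -> float:
    Hx = H(x)
    w = odd_rayleigh_cdf(Hx, theta)
    return (alpha * beta * odd_rayleigh_pdf(Hx, h(x), theta)
            * w ** (alpha - 1.0) * (1.0 - w ** alpha) ** (beta - 1.0))


def korg_hf(x: float, alpha: float, beta: float, theta: float, H: Dist, h: Dist) -> float:
    """Hazard function f(x) / S(x)."""
    return korg_pdf(x, alpha, beta, theta, H, h) / korg_sf(x, alpha, beta, theta, H)


# Illustrative log-logistic baseline (assumed parameterization, scale 1, shape 2).
def ll_cdf(x: float, scale: float = 1.0, shape: float = 2.0) -> float:
    z = (x / scale) ** shape
    return z / (1.0 + z)


def ll_pdf(x: float, scale: float = 1.0, shape: float = 2.0) -> float:
    z = (x / scale) ** shape
    return shape * z / (x * (1.0 + z) ** 2)


if __name__ == "__main__":
    for x in (0.5, 1.0, 1.5):
        print(x, korg_pdf(x, 1.5, 2.0, 0.7, ll_cdf, ll_pdf),
              korg_hf(x, 1.5, 2.0, 0.7, ll_cdf, ll_pdf))
```

Here the hazard is computed as f/S rather than from a separate closed form, so a closed-form expression such as Equation (13) can be checked against it numerically.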
Quantile Function of KORG
The quantile function of the KORG model is given by
(15)
where
is the quantile function of the baseline distribution.
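Under an assumed Odd Rayleigh-G cdf W(x) = 1 − exp{−θ[H(x)/(1 − H(x))]²}, with α and β the Kumaraswamy shape parameters, the quantile function follows from three inversions: solve 1 − (1 − w^α)^β = u for w, solve W = w for y = H/(1 − H), and apply the baseline quantile Q_H to y/(1 + y). The sketch below codes that chain for any baseline quantile and checks the round trip F(Q(u)) = u with a log-logistic baseline of scale 1 and shape 2. All names and the log-logistic parameterization are illustrative assumptions.

```python
import math


def korg_quantile(u, alpha, beta, theta, baseline_quantile):
    """Invert the KORG cdf under the assumed Odd Rayleigh-G form."""
    w = (1.0 - (1.0 - u) ** (1.0 / beta)) ** (1.0 / alpha)  # solve 1-(1-w**alpha)**beta = u
    y = math.sqrt(-math.log1p(-w) / theta)                  # solve W = w for y = H/(1-H)
    return baseline_quantile(y / (1.0 + y))                 # x = Q_H(H)


def ll_quantile(p, scale=1.0, shape=2.0):
    """Log-logistic quantile (assumed parameterization)."""
    return scale * (p / (1.0 - p)) ** (1.0 / shape)


def ll_cdf(x, scale=1.0, shape=2.0):
    z = (x / scale) ** shape
    return z / (1.0 + z)


def korg_cdf(x, alpha, beta, theta, H):
    """Forward cdf, used only to check the inversion."""
    v = H(x) / (1.0 - H(x))
    w = -math.expm1(-theta * v * v)
    return 1.0 - (1.0 - w ** alpha) ** beta


if __name__ == "__main__":
    for u in (0.05, 0.5, 0.95):
        x = korg_quantile(u, 1.5, 2.0, 0.7, ll_quantile)
        print(u, x, korg_cdf(x, 1.5, 2.0, 0.7, ll_cdf))
```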
3. Sub-Models of KORG Family
In this section, we consider two sub-models of the KORG family: the Kumaraswamy Odd Rayleigh Log-Logistic (KORLL) and the Kumaraswamy Odd Rayleigh Inverse Rayleigh (KORIR) distributions.
3.1. The KORLL Model
The cdf and pdf of the log-logistic (LL) distribution are given by
and
The quantile function of the LL distribution is given by
where u is uniformly distributed on the interval (0, 1). The cdf of the KORLL distribution is then given by
(16)
The corresponding pdf of Equation (16) is given below:
(17)
The hazard function (hf) and survival function (sf) are given below:
(18)
(19)
3.1.1. Quantile Function of KORLL
Lemma II
Let the random variable u be uniformly distributed on
. Define the random variable y as
(20)
then the random variable x defined as
(21)
has a Kumaraswamy Odd Rayleigh Log-Logistic distribution, i.e.
. Moreover, when
, x is distributed as
.
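Lemma II makes simulation straightforward: draw u uniformly on (0, 1) and map it through the quantile function. The sketch below assumes the Odd Rayleigh-G cdf W = 1 − exp{−θ[H/(1 − H)]²} and a log-logistic baseline with scale s and shape k for which H/(1 − H) = (x/s)^k; under these assumptions the map is x = s·y^(1/k) with y = {−ln(1 − w)/θ}^(1/2) and w = [1 − (1 − u)^(1/β)]^(1/α), where α and β are the Kumaraswamy shape parameters. In the special case α = β = 1 the KORG cdf reduces to W itself, and with this baseline W is a Weibull cdf with shape 2k and scale s·θ^(−1/(2k)), which gives an easy check. The function names are hypothetical.

```python
import math
import random


def korll_quantile(u, alpha, beta, theta, scale, shape):
    """Map u in (0, 1) to a KORLL variate by inversion (assumed parameterizations)."""
    w = (1.0 - (1.0 - u) ** (1.0 / beta)) ** (1.0 / alpha)
    y = math.sqrt(-math.log1p(-w) / theta)  # y = H/(1-H) = (x/scale)**shape
    return scale * y ** (1.0 / shape)


def open_uniform(rng):
    """Draw u from (0, 1); random.random() may return exactly 0.0."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def korll_sample(n, alpha, beta, theta, scale, shape, seed=None):
    """Generate n KORLL variates with a reproducible generator."""
    rng = random.Random(seed)
    return [korll_quantile(open_uniform(rng), alpha, beta, theta, scale, shape)
            for _ in range(n)]


if __name__ == "__main__":
    print(korll_sample(5, alpha=1.5, beta=2.0, theta=0.7, scale=1.0, shape=2.0, seed=42))
```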
Figure 1 illustrates the shapes of the density and hazard functions of the KORLL distribution for various parameter values. Depending on the parameters, the density can be unimodal and either symmetric or skewed, and the hazard function can take several shapes, including J-shaped and non-decreasing forms.
Table 1 presents the skewness and kurtosis of both the baseline log-logistic distribution and the KORLL distribution, computed from the quantile function in Equation (21) using Equations (22) and (23), respectively. For the chosen parameter values, the skewness of the log-logistic distribution ranged from −1.4352 to −0.0686, whereas that of the KORLL distribution ranged from −0.0696 to 0.3479. In terms of skewness, the KORLL model can therefore be negatively or positively skewed, whereas the log-logistic values are all negative. Similarly, for the chosen parameter values, the kurtosis of the baseline and of the extended distribution ranged from −2.9641 to −0.1024 and from −0.1646 to 31.0576, respectively. This much wider range further suggests the flexibility of the KORLL over the log-logistic distribution.
Table 1. Skewness and Kurtosis using different parameter values.
Figure 2. cdf and sf plots of KORLL distribution with varying parameter values.
3.1.2. Skewness and Kurtosis
The skewness and kurtosis of the KORLL distribution can easily be computed from its quantile function Q(u) in Equation (21). Bowley's skewness [16] is based on quartiles and is defined as
$$B=\frac{Q(3/4)+Q(1/4)-2Q(1/2)}{Q(3/4)-Q(1/4)}, \tag{22}$$
and Moors' kurtosis [17] is based on octiles and is given by
$$M=\frac{Q(7/8)-Q(5/8)+Q(3/8)-Q(1/8)}{Q(6/8)-Q(2/8)}. \tag{23}$$
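Both measures need only a quantile function, so they can be evaluated directly. The sketch below codes Bowley's B = [Q(3/4) + Q(1/4) − 2Q(1/2)]/[Q(3/4) − Q(1/4)] and Moors' M = [Q(7/8) − Q(5/8) + Q(3/8) − Q(1/8)]/[Q(6/8) − Q(2/8)] for any callable Q, and applies them to a KORLL quantile that assumes W = 1 − exp{−θ[H/(1 − H)]²} and a log-logistic baseline with H/(1 − H) = (x/s)^k; the parameter values are arbitrary. Two standard facts serve as checks: whenever the denominators are positive, Bowley's measure lies in [−1, 1] and Moors' measure is non-negative, and Moors' measure is about 1.2331 for the normal distribution.

```python
import math
from functools import partial
from statistics import NormalDist
from typing import Callable

Quantile = Callable[[float], float]


def bowley_skewness(Q: Quantile) -> float:
    """Quartile-based skewness: [Q(3/4) + Q(1/4) - 2Q(1/2)] / [Q(3/4) - Q(1/4)]."""
    q1, q2, q3 = Q(0.25), Q(0.5), Q(0.75)
    return (q3 + q1 - 2.0 * q2) / (q3 - q1)


def moors_kurtosis(Q: Quantile) -> float:
    """Octile-based kurtosis: [Q(7/8) - Q(5/8) + Q(3/8) - Q(1/8)] / [Q(6/8) - Q(2/8)]."""
    e = [Q(i / 8.0) for i in range(1, 8)]  # e[i] holds Q((i + 1) / 8)
    return (e[6] - e[4] + e[2] - e[0]) / (e[5] - e[1])


def korll_quantile(u, alpha, beta, theta, scale, shape):
    """KORLL quantile, assuming W = 1 - exp(-theta*(H/(1-H))**2) and H/(1-H) = (x/scale)**shape."""
    w = (1.0 - (1.0 - u) ** (1.0 / beta)) ** (1.0 / alpha)
    y = math.sqrt(-math.log1p(-w) / theta)
    return scale * y ** (1.0 / shape)


if __name__ == "__main__":
    # Reference value: Moors' measure is about 1.2331 for the standard normal.
    print(round(moors_kurtosis(NormalDist().inv_cdf), 4))
    Q = partial(korll_quantile, alpha=1.5, beta=2.0, theta=0.7, scale=1.0, shape=2.0)
    print(bowley_skewness(Q), moors_kurtosis(Q))
```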
3.2. The KORIR Model
The cdf and pdf of the baseline Inverse Rayleigh distribution are given as
and
is the scale parameter. The quantile function (qf) is given by
where u is uniformly distributed on (0, 1). The cdf and pdf of the KORIR distribution are given by
(24)
(25)
Quantile Function of KORIR
Lemma III
Let the random variable u be uniformly distributed on
. Define the random variable y as in Equation (20), then the random variable
defined as
(26)
has a Kumaraswamy Odd Rayleigh Inverse Rayleigh distribution, i.e.
. Moreover, when
, x is distributed as
.
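The same inversion applies to the KORIR model. The sketch assumes the inverse Rayleigh cdf H(x) = exp(−λ/x²) for x > 0, so that Q_H(p) = (−λ/ln p)^(1/2), together with an assumed Odd Rayleigh-G form W = 1 − exp{−θ[H/(1 − H)]²}; α and β are the Kumaraswamy shape parameters. Both parameterizations and all function names are illustrative. The round trip F(Q(u)) = u guards against sign slips in the chain.

```python
import math


def ir_cdf(x, lam=1.0):
    """Assumed inverse Rayleigh cdf: exp(-lam / x**2) for x > 0."""
    return math.exp(-lam / (x * x))


def ir_quantile(p, lam=1.0):
    """Inverse of ir_cdf for p in (0, 1)."""
    return math.sqrt(-lam / math.log(p))


def korir_cdf(x, alpha, beta, theta, lam=1.0):
    H = ir_cdf(x, lam)
    v = H / (1.0 - H)
    w = -math.expm1(-theta * v * v)  # assumed Odd Rayleigh-G cdf
    return 1.0 - (1.0 - w ** alpha) ** beta


def korir_quantile(u, alpha, beta, theta, lam=1.0):
    w = (1.0 - (1.0 - u) ** (1.0 / beta)) ** (1.0 / alpha)  # invert the Kumaraswamy layer
    y = math.sqrt(-math.log1p(-w) / theta)                  # invert W for y = H/(1-H)
    return ir_quantile(y / (1.0 + y), lam)                  # x = Q_H(y / (1 + y))


if __name__ == "__main__":
    for u in (0.1, 0.5, 0.9):
        x = korir_quantile(u, 1.5, 2.0, 0.7)
        print(u, x, korir_cdf(x, 1.5, 2.0, 0.7))
```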
4. Estimation
In this section, the parameters of the KORG family are estimated by the method of maximum likelihood. Given a random sample
of size n with parameters
and
from the KORG family of distributions, the pdf of the KORG family can be written as
(27)
where
.
Let
be the (p × 1) parameter vector. Then the log-likelihood function based on Equation (27) is given by
(28)
Partially differentiating the log-likelihood function yields the components of the score function
as follows
(29)
(30)
(31)
(32)
where
,
,
, and
.
The maximum likelihood estimates of the parameters are obtained by setting Equations (29)-(32) to zero and solving the resulting system numerically, using the Newton-Raphson method or another iterative method.
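As a minimal illustration of solving a score equation by Newton-Raphson, the sketch below estimates only θ for the KORLL model, holding α, β and the log-logistic parameters fixed at their true values; a full fit would solve all score equations jointly (for instance with the maxLik package used in Section 6). It assumes W = 1 − exp{−θ[H/(1 − H)]²} and a log-logistic baseline with H/(1 − H) = (x/s)^k, under which the θ-score of one observation is 1/θ − z² + (α − 1)W′/W − (β − 1)αW^(α−1)W′/(1 − W^α), with z = (x/s)^k and W′ = ∂W/∂θ = z²e^(−θz²). The derivative of the score is approximated by a central difference, and steps that would make θ non-positive are halved. Function names and settings are hypothetical.

```python
import math
import random


def korll_quantile(u, alpha, beta, theta, scale, shape):
    """Inverse cdf of KORLL under the assumed parameterizations."""
    w = (1.0 - (1.0 - u) ** (1.0 / beta)) ** (1.0 / alpha)
    y = math.sqrt(-math.log1p(-w) / theta)
    return scale * y ** (1.0 / shape)


def korll_sample(n, alpha, beta, theta, scale, shape, seed=None):
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u = rng.random()
        if u > 0.0:  # random() can return exactly 0.0
            out.append(korll_quantile(u, alpha, beta, theta, scale, shape))
    return out


def score_theta(theta, xs, alpha, beta, scale, shape):
    """d(log-likelihood)/d(theta), with alpha, beta, scale and shape held fixed."""
    total = 0.0
    for x in xs:
        z2 = ((x / scale) ** shape) ** 2               # (H/(1-H))**2, log-logistic baseline
        w = -math.expm1(-theta * z2)                   # W, the assumed Odd Rayleigh-G cdf
        dw = z2 * math.exp(-theta * z2)                # dW/d(theta)
        one_minus_wa = -math.expm1(alpha * math.log(w))  # 1 - W**alpha
        total += (1.0 / theta - z2 + (alpha - 1.0) * dw / w
                  - (beta - 1.0) * alpha * w ** (alpha - 1.0) * dw / one_minus_wa)
    return total


def newton_theta(xs, alpha, beta, scale, shape, theta0=1.0, tol=1e-10, max_iter=50):
    """Newton-Raphson on the theta-score; its derivative is a central difference."""
    theta = theta0
    for _ in range(max_iter):
        s = score_theta(theta, xs, alpha, beta, scale, shape)
        d = 1e-6 * theta
        ds = (score_theta(theta + d, xs, alpha, beta, scale, shape)
              - score_theta(theta - d, xs, alpha, beta, scale, shape)) / (2.0 * d)
        if ds >= 0.0:
            raise RuntimeError("score not decreasing here; try another theta0")
        step = -s / ds
        while theta + step <= 0.0:  # halve steps that would make theta non-positive
            step /= 2.0
        theta += step
        if abs(step) <= tol * theta:
            break
    return theta


if __name__ == "__main__":
    xs = korll_sample(3000, 1.5, 2.0, 0.7, 1.0, 2.0, seed=7)
    print(newton_theta(xs, 1.5, 2.0, 1.0, 2.0))  # compare with the true theta = 0.7
```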
5. Monte Carlo Simulation
A Monte Carlo simulation is conducted, and the bias and root mean squared error (RMSE) of the estimated parameters are presented in Table 2. The aim of the simulation study is to assess the performance of the maximum likelihood estimates and to check whether the estimated values of the model parameters approach the true parameter values. The Monte Carlo simulation proceeds as follows:
(a) For known parameter values, i.e.
, samples of different sizes from the KORLL distribution were generated (
,
,
, and
) using the quantile function defined in Equation (21).
(b) Using the maximum likelihood method, we compute the MLE of
,
,
, and
for the ith replicate.
(c) Steps (a) and (b) are replicated N = 1000 times.
Table 2. Simulation results for the KORLL distribution.
(d) The bias and RMSE for each sample size n are computed as
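$$\mathrm{Bias}(\hat{\vartheta})=\frac{1}{N}\sum_{i=1}^{N}\left(\hat{\vartheta}_i-\vartheta\right),\qquad \mathrm{RMSE}(\hat{\vartheta})=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{\vartheta}_i-\vartheta\right)^{2}}, \tag{33}$$
where $\vartheta$ denotes any one of the model parameters and $\hat{\vartheta}_i$ is its MLE in the $i$th replicate, $i=1,\dots,N$. The simulation results in Table 2 show that, for the chosen parameter values, the estimated biases decrease as the sample size n increases. In addition, the estimated root mean squared errors decay towards zero as the sample size increases. These two observations illustrate the consistency of the maximum likelihood estimates.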
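The loop in steps (a)-(d) can be sketched as follows. To keep the example short and exact, it estimates only β, treating α, θ and the log-logistic parameters as known; in that case setting ∂ℓ/∂β = n/β + Σ log(1 − W_i^α) = 0 gives the closed form β̂ = −n/Σ log(1 − W_i^α), so no iterative solver is needed. This simplification, the assumed Odd Rayleigh-G form W = 1 − exp{−θ[H/(1 − H)]²}, the log-logistic parameterization, the sample sizes and the number of replicates are our illustrative choices, not the design behind Table 2. Bias and RMSE are the average of (β̂_i − β) and the square root of the average of (β̂_i − β)² over the N replicates.

```python
import math
import random


def korll_quantile(u, alpha, beta, theta, scale, shape):
    """Inverse cdf of KORLL under the assumed parameterizations."""
    w = (1.0 - (1.0 - u) ** (1.0 / beta)) ** (1.0 / alpha)
    y = math.sqrt(-math.log1p(-w) / theta)
    return scale * y ** (1.0 / shape)


def beta_mle(xs, alpha, theta, scale, shape):
    """Closed-form MLE of beta when alpha, theta, scale and shape are known."""
    total = 0.0
    for x in xs:
        z = (x / scale) ** shape
        log_w = math.log(-math.expm1(-theta * z * z))  # log W
        total += math.log(-math.expm1(alpha * log_w))  # log(1 - W**alpha)
    return -len(xs) / total


def bias_rmse(estimates, true_value):
    """Bias and RMSE over the N replicates, as in Equation (33)."""
    n_rep = len(estimates)
    bias = sum(e - true_value for e in estimates) / n_rep
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n_rep)
    return bias, rmse


def simulate(sizes, n_rep, alpha, beta, theta, scale, shape, seed=2024):
    """Steps (a)-(d): generate, estimate, repeat n_rep times, summarise per sample size."""
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        estimates = []
        for _ in range(n_rep):
            xs = []
            while len(xs) < n:
                u = rng.random()
                if u > 0.0:  # random() can return exactly 0.0
                    xs.append(korll_quantile(u, alpha, beta, theta, scale, shape))
            estimates.append(beta_mle(xs, alpha, theta, scale, shape))
        results[n] = bias_rmse(estimates, beta)
    return results


if __name__ == "__main__":
    for n, (bias, rmse) in simulate((25, 100, 400), 300, 1.5, 2.0, 0.7, 1.0, 2.0).items():
        print(f"n={n:4d}  bias={bias:+.4f}  rmse={rmse:.4f}")
```

In this simplified setting the behaviour can be predicted exactly: since (1 − W^α)^β = S(X) is uniform when the model is correct, Σ[−log(1 − W_i^α)] has a gamma distribution with shape n and rate β, so E(β̂) = nβ/(n − 1). The bias is therefore positive and of order 1/n, and both the printed bias and RMSE should shrink as n grows.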
6. Application
Here, we illustrate the applicability of the KORLL distribution using five data sets. Data set I represents the survival times of 121 patients with breast cancer, as reported by [18]. Data set II represents marine water data, as reported by [19]. Data set III consists of 101 observations on the stress-rupture life of Kevlar 49/epoxy strands that were subjected to constant sustained pressure at the 90 percent stress level until all had failed, as in [20]. Data set IV represents the death times (in weeks) of patients with cancer of the tongue with an aneuploid DNA profile, as reported by [21]. Data set V is due to [21] and consists of lifetimes (in months, from 1 January 2013 to 31 July 2018) of 105 patients who were diagnosed with hypertension and received at least one hypertension-related treatment in the hospital, with death as the event of interest.
We used the maxLik package [22] in R [23]. The criteria used to compare model fit are the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC); smaller values of either criterion indicate a better fit. The competing models are as follows:
(i) The Marshall-Olkin Extended Log-Logistic (MOELL) as in [24] with cdf
(ii) The Kumaraswamy Log-Logistic (KUMLL) as in [25] with cdf
(iii) The Zografos-Balakrishnan Log-Logistic (ZBLL) as in [26] with cdf
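The criteria can be computed directly from the −ll values reported in Tables 3-7, using the standard definitions AIC = 2k − 2ℓ̂ and BIC = k ln n − 2ℓ̂, where ℓ̂ is the maximized log-likelihood, k the number of estimated parameters and n the sample size. The small helper below (its name and input format are our own) ranks candidate models by AIC and reports BIC alongside. BIC penalizes each additional parameter by ln n instead of 2, i.e., more heavily whenever n ≥ 8.

```python
import math


def aic(loglik: float, k: int) -> float:
    """Akaike Information Criterion: 2k - 2 log L."""
    return 2.0 * k - 2.0 * loglik


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian Information Criterion: k log(n) - 2 log L."""
    return k * math.log(n) - 2.0 * loglik


def rank_by_aic(fits: dict, n: int) -> list:
    """fits maps model name -> (negative log-likelihood, number of parameters).

    Returns (name, AIC, BIC) tuples sorted so that the smallest AIC comes first.
    """
    rows = [(name, aic(-nll, k), bic(-nll, k, n)) for name, (nll, k) in fits.items()]
    return sorted(rows, key=lambda row: row[1])
```

Because the tables list −ll rather than ℓ̂, the helper takes the negative log-likelihood directly, which avoids sign errors when transcribing values.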
Based on these criteria, the proposed KORLL model provides the best fit to all five real-life data sets analysed, as shown in Tables 3-7, outperforming the competing extensions of the log-logistic distribution considered.
Table 3. MLEs of the parameters with SEs (in parentheses), BIC, −ll and AIC values for Data set I.
Table 4. MLEs of the parameters with SEs (in parentheses), BIC, −ll and AIC values for Data set II.
Table 5. MLEs of the parameters with SEs (in parentheses), BIC, −ll and AIC values for Data set III.
Table 6. MLEs of the parameters with SEs (in parentheses), BIC, −ll and AIC values for Data set IV.
Table 7. MLEs of the parameters with SEs (in parentheses), BIC, −ll and AIC values for Data set V.
7. Conclusion
In this paper, a new family of distributions, called the Kumaraswamy Odd Rayleigh-G family, which introduces three additional parameters to the baseline distribution, is proposed and studied. The new family offers more flexibility and, among the models compared, gave the best fit to a range of data from practical situations. The Monte Carlo simulation results indicated that the estimated parameters of a sub-model of this family approached the true values as the sample size increased, and that the root mean squared errors decayed towards zero as the sample size became large. These facts suggest the consistency of the estimates. Based on the considered criteria, we conclude that the proposed family, represented in this study by the Kumaraswamy Odd Rayleigh Log-Logistic distribution, provided the best fit to the five analysed real-life data sets, which include the survival times of 121 patients with breast cancer and the death times (in weeks) of patients with cancer of the tongue with an aneuploid DNA profile.