1. Introduction
The Fréchet distribution is the type 2 member of the generalized extreme value (GEV) family of distributions, which also includes the Gumbel distribution (type 1) and the Weibull distribution (type 3) [1]. Extreme value distributions are useful for analyzing data from extreme phenomena and have applications in a variety of fields.
They are frequently used to model phenomena such as extreme floods, wind speeds, earthquakes, horse racing, large insurance losses, and sea currents, among others [2]. Over the years, several authors have introduced new distributions, using various methods, in search of distributions that are more flexible in terms of hazard rate shapes than those that already exist.
Commonly used methods for adding a parameter share a stability property: if the method is applied twice, nothing new is obtained the second time [3]. Many researchers have made various distributions more flexible by introducing an additional parameter using the method introduced in [3]. Among these distributions are the generalized exponential Marshall-Olkin distribution [4], the Marshall-Olkin Generalized Pareto distribution [5], the Marshall-Olkin alpha power family of distributions, an extension of the alpha power transform class [6], the Inverse Weibull Marshall-Olkin distribution [7], the Extended Exponentiated Weibull distribution [8], the Marshall-Olkin exponential Weibull distribution [9], the Marshall-Olkin extended Burr type XII distribution [10], the Marshall-Olkin Fréchet distribution [11], the Marshall-Olkin extended uniform distribution [12] and the Marshall-Olkin extended Pareto distribution [13]. These authors investigated properties of these distributions such as moments, moment-generating functions, and order statistics. In all cases, the parameters were estimated by maximum likelihood, and in a few cases Bayesian estimation was used as well. For some of these distributions, simulations were carried out to assess their performance, and applications to various data sets were presented.
The Exponentiated Fréchet distribution, a generalization of the classical Fréchet distribution, was studied in [14] in the same way that the Exponentiated exponential distribution, a generalization of the traditional exponential distribution, was introduced in [15] .
The cdf and pdf of the Exponentiated Fréchet distribution, as given in [14], are respectively:
(1.1)
and
(1.2)
where
and
are the shape parameters and
is the scale parameter.
As the literature above shows, one of the main concerns of researchers is the search for new distributions that are more flexible than existing ones, so that they can be used to model data sets from various fields. One approach to making a distribution more flexible is to add an extra parameter, which typically adds flexibility to a class of distribution functions and can be very useful for data analysis. Some distributions admit only decreasing or inverted-bathtub hazard rate shapes, which poses a challenge for data sets from diverse fields that exhibit an increasing hazard rate. This paper therefore proposes a new distribution based on the Marshall-Olkin method, the Marshall-Olkin Exponentiated Fréchet (MOEFr) distribution, which is more flexible than the Exponentiated Fréchet distribution. Its hazard rate can be increasing, decreasing, or unimodal, so it can accommodate data with increasing as well as decreasing failure rates, which the Exponentiated Fréchet distribution cannot fully handle. Some fundamental mathematical and statistical properties of the new distribution are derived, including the hazard rate function, reversed hazard rate function, survival function, cumulative hazard function, odds function, quantile function, moments, and order statistics.
The structure of the article is as follows: In Section 2, the Marshall-Olkin Generator Method and Exponentiated Fréchet distribution are presented. The proposed distribution and some of the mathematical and statistical properties are derived in Section 3. The estimation of the parameters of the Marshall-Olkin Exponentiated Fréchet (MOEFr) distribution is presented in Section 4. In Section 5, the performance of the proposed distribution is assessed using Monte Carlo simulation. The goodness-of-fit of the model is illustrated via three applications to real data sets in Section 6. Finally, Section 7 gives the concluding remarks of the article.
2. Marshall-Olkin Generator Method and Baseline Distribution
2.1. Marshall-Olkin Generator Method Theory
A method for establishing more flexible new families of distributions that represent different types of behavior than the baseline distributions was proposed in [3] . Considering
as a survival function, they introduced this new distribution family by introducing an additional parameter
with the following survival function:
(2.1)
where
,
and
.
Then
(2.2)
It is obvious that, if
, we have
.
The cumulative distribution function is given by:
(2.3)
And the probability density function is given by:
(2.4)
The hazard rate function is given by:
(2.5)
with
,
and
.
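For reference, the Marshall-Olkin construction of [3] can be written out as follows. The notation is an assumption, since the symbols of the original equations are not reproduced above: $\bar F = 1 - F$ is the baseline survival function, $f$ and $h_F$ are the baseline pdf and hazard rate, and $\theta > 0$ is the additional parameter, with $\bar\theta = 1 - \theta$.

```latex
\begin{align*}
\bar G(x;\theta) &= \frac{\theta\,\bar F(x)}{1-\bar\theta\,\bar F(x)}, \\
G(x;\theta) &= 1-\bar G(x;\theta) = \frac{F(x)}{1-\bar\theta\,\bar F(x)}, \\
g(x;\theta) &= \frac{\theta\, f(x)}{\left[1-\bar\theta\,\bar F(x)\right]^{2}}, \\
h_G(x;\theta) &= \frac{g(x;\theta)}{\bar G(x;\theta)} = \frac{h_F(x)}{1-\bar\theta\,\bar F(x)} .
\end{align*}
```

Setting $\theta = 1$ gives $\bar\theta = 0$ and hence $\bar G = \bar F$, so the baseline distribution is recovered as a special case.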
2.2. Exponentiated Fréchet Distribution
The Exponentiated Fréchet (EFr) distribution as a generalization of the classical Fréchet distribution was introduced in [14] . The pdf and the cdf of the distribution are respectively given by:
(2.6)
and
(2.7)
The Exponentiated Fréchet distribution has some appealing physical interpretations and shares many applications in extreme value theory such as supermarket queues, wind speeds, accelerated life testing, sea currents, earthquakes, horse racing, floods, track race records, and rainfall, among others [16] .
Other functions associated with the given distribution are:
Survival function:
Hazard rate function [14] :
Cumulative hazard function:
(2.8)
Reversed hazard rate function:
Odds function:
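As a hedged illustration of how these functions follow from the cdf and pdf, the block below writes them out under an assumed parameterization: shape parameters $\alpha, \lambda > 0$ and scale parameter $\sigma > 0$, with $F(x) = 1 - [1 - e^{-(\sigma/x)^{\lambda}}]^{\alpha}$ for $x > 0$. These symbols are assumptions and may differ from the notation of [14].

```latex
\begin{align*}
f(x) &= \alpha\lambda\sigma^{\lambda} x^{-(1+\lambda)} e^{-(\sigma/x)^{\lambda}}
        \left[1-e^{-(\sigma/x)^{\lambda}}\right]^{\alpha-1}, \\
S(x) &= 1-F(x) = \left[1-e^{-(\sigma/x)^{\lambda}}\right]^{\alpha}, \\
h(x) &= \frac{f(x)}{S(x)}
      = \frac{\alpha\lambda\sigma^{\lambda} x^{-(1+\lambda)} e^{-(\sigma/x)^{\lambda}}}
             {1-e^{-(\sigma/x)^{\lambda}}}, \\
H(x) &= -\ln S(x) = -\alpha \ln\!\left[1-e^{-(\sigma/x)^{\lambda}}\right], \\
r(x) &= \frac{f(x)}{F(x)}, \qquad
O(x) = \frac{F(x)}{S(x)} = \left[1-e^{-(\sigma/x)^{\lambda}}\right]^{-\alpha} - 1 .
\end{align*}
```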
3. Marshall-Olkin Exponentiated Fréchet Distribution and Its Properties
A common practice in statistical distribution theory is to add a new parameter to an existing family of distribution functions. Adding an extra shape parameter often adds flexibility to a class of distribution functions, which can be highly beneficial for data analysis. Let us consider the Exponentiated Fréchet distribution with the following survival function:
, where
,
,
and
.
Therefore, substituting the survival function in Equations (2.3) and (2.4), the cdf and pdf are given respectively by:
(3.1)
and
(3.2)
where
and
.
From Equation (2.5), the hazard rate function of the MOEFr distribution is:
(3.3)
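The substitution described above can be sketched numerically. The following is a minimal sketch, not the authors' implementation: it assumes the Exponentiated Fréchet survival function $S(x) = [1 - e^{-(\sigma/x)^{\lambda}}]^{\alpha}$ and uses the hypothetical names `alpha`, `lam`, `sigma` for the baseline parameters and `theta` for the Marshall-Olkin parameter.

```python
import math


def efr_sf(x, alpha, lam, sigma):
    """Assumed EFr survival function S(x) = [1 - exp(-(sigma/x)^lam)]^alpha, x > 0."""
    return (-math.expm1(-((sigma / x) ** lam))) ** alpha


def efr_pdf(x, alpha, lam, sigma):
    """Assumed EFr density, obtained by differentiating 1 - S(x)."""
    t = (sigma / x) ** lam
    u = -math.expm1(-t)  # 1 - exp(-t), computed stably for small t
    return alpha * lam * sigma ** lam * x ** (-(1.0 + lam)) * math.exp(-t) * u ** (alpha - 1.0)


def moefr_sf(x, alpha, lam, sigma, theta):
    """Marshall-Olkin survival function theta*S / (1 - (1 - theta)*S)."""
    s = efr_sf(x, alpha, lam, sigma)
    return theta * s / (1.0 - (1.0 - theta) * s)


def moefr_cdf(x, alpha, lam, sigma, theta):
    """MOEFr cdf as one minus the survival function."""
    return 1.0 - moefr_sf(x, alpha, lam, sigma, theta)


def moefr_pdf(x, alpha, lam, sigma, theta):
    """MOEFr density theta*f / (1 - (1 - theta)*S)^2."""
    s = efr_sf(x, alpha, lam, sigma)
    return theta * efr_pdf(x, alpha, lam, sigma) / (1.0 - (1.0 - theta) * s) ** 2


def moefr_hazard(x, alpha, lam, sigma, theta):
    """MOEFr hazard rate as the ratio of the density to the survival function."""
    return moefr_pdf(x, alpha, lam, sigma, theta) / moefr_sf(x, alpha, lam, sigma, theta)
```

With `theta = 1` the functions reduce to the baseline EFr functions, which gives a quick sanity check on any implementation.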
Figure 1 and Figure 2 show some possible shapes of the MOEFr pdf and hrf for various parameter values. The MOEFr pdf can be asymmetrical, a J shape, or reversed-J shape, and the MOEFr hazard rate function can have an increasing, inverted-bathtub, or decreasing shape.
3.1. Some Useful Functions
Other functions associated with the MOEFr distribution are given below: the survival function, which gives the probability that an item survives beyond a certain period of time; the cumulative hazard function; the reversed hazard rate, defined as the ratio of the pdf to the cdf; and the odds function, defined as the ratio of the cdf to the survival function.
Survival function:
(3.4)
The cumulative hazard function:
(3.5)
The reversed hazard rate function:
(3.6)
Figure 1. Pdf of the MOEFr for various values of
, and
.
Figure 2. Plot of possible shapes of hrf for the MOEFr for various values of
and
.
Odds function:
(3.7)
3.2. Quantile Function
The quantile function is essential for simulating random samples from a given distribution. It can also be used to describe distribution characteristics such as the median, skewness, and kurtosis.
Theorem 1 Let Z be the random variable where
, then the quantile function is given by:
(3.8)
where
represents the inverse distribution function and
.
Proof: Using the cdf of the MOEFr defined in Equation (3.1), the quantile function is obtained by solving for x the equation:
(3.9)
The lower quartile, the median and the upper quartile are obtained by replacing q by the values 1/4, 1/2, and 3/4 in the quantile function, respectively.
Hence, the lower quartile is:
(3.10)
The median is:
(3.11)
And the upper quartile is:
(3.12)
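The inversion used in the proof can be sketched in code. This is a sketch under assumptions rather than the paper's own formula: it takes the baseline survival function to be $S(x) = [1 - e^{-(\sigma/x)^{\lambda}}]^{\alpha}$ and the Marshall-Olkin cdf to be $G = 1 - \theta S / (1 - (1 - \theta) S)$, with hypothetical parameter names `alpha`, `lam`, `sigma`, `theta`. Solving $G(x) = q$ first for $S$ and then for $x$ gives a closed form.

```python
import math
import random


def moefr_quantile(q, alpha, lam, sigma, theta):
    """Solve G(x) = q for x under the assumed MOEFr parameterization."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    # Step 1: baseline survival value implied by G(x) = q.
    s = (1.0 - q) / (1.0 - (1.0 - theta) * q)
    # Step 2: invert S(x) = [1 - exp(-(sigma/x)^lam)]^alpha for x.
    w = s ** (1.0 / alpha)
    return sigma * (-math.log1p(-w)) ** (-1.0 / lam)


def moefr_sample(n, alpha, lam, sigma, theta, seed=None):
    """Draw n values by applying the quantile function to uniform draws."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()  # uniform on [0, 1)
        if u > 1e-12:  # skip the extreme left tail, where rounding can break the inversion
            draws.append(moefr_quantile(u, alpha, lam, sigma, theta))
    return draws


# Lower quartile, median and upper quartile for one illustrative parameter set.
quartiles = [moefr_quantile(q, 2.0, 3.0, 1.2, 0.5) for q in (0.25, 0.5, 0.75)]
```

The same `moefr_sample` idea is what an inverse-cdf simulation study relies on: uniform draws are pushed through the quantile function.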
3.3. Skewness and Kurtosis
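The Galton skewness (also known as Bowley's skewness) and the Moors kurtosis of the MOEFr distribution are based on quantiles and are given by:
$$\text{Skewness} = \frac{Q(3/4) - 2\,Q(1/2) + Q(1/4)}{Q(3/4) - Q(1/4)}, \qquad \text{Kurtosis} = \frac{Q(7/8) - Q(5/8) + Q(3/8) - Q(1/8)}{Q(6/8) - Q(2/8)}$$
(3.13)
where Q(.) is the quantile function evaluated at the indicated quartiles and octiles.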
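Both measures depend only on a quantile function, so they can be computed for any distribution whose quantiles are available. The sketch below uses the standard quartile-based and octile-based definitions; the function names are illustrative, and the exponential quantile function in the usage lines is only a stand-in example.

```python
import math


def galton_skewness(Q):
    """Galton (Bowley) skewness from a quantile function Q(p)."""
    return (Q(0.75) - 2.0 * Q(0.5) + Q(0.25)) / (Q(0.75) - Q(0.25))


def moors_kurtosis(Q):
    """Moors kurtosis from the octiles of a quantile function Q(p)."""
    return (Q(0.875) - Q(0.625) + Q(0.375) - Q(0.125)) / (Q(0.75) - Q(0.25))


def exp_quantile(p):
    """Quantile function of the standard exponential distribution (example only)."""
    return -math.log(1.0 - p)


# Usage: a right-skewed distribution yields a positive Galton skewness.
skew_exp = galton_skewness(exp_quantile)
kurt_exp = moors_kurtosis(exp_quantile)
```

Because these measures use quantiles rather than moments, they remain defined for heavy-tailed distributions whose higher moments may not exist.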
The quantiles for various values of the parameters are given in Table 1.
3.4. Moments
Moments are extremely important since they can be used to calculate many important characteristics and properties of a probability distribution, such as mean, variance, kurtosis, and skewness.
Theorem 2 Let Z be the random variable where
, so the rth moment is given by:
(3.14)
Table 1. Quantiles of the MOEFr distribution for various parameter values.
Proof: We have the rth moment of the MOEFr given by:
(3.15)
Using the binomial expansion, we get:
(3.16)
Replacing
in Equation (3.16), we have:
From Table 2, we see that the MOEFr distribution is versatile in terms of mean and variance. The values of the coefficient of skewness (CS) show that it can be skewed to the right, skewed to the left, or nearly symmetrical. Based on the coefficient of kurtosis (CK) values, the MOEFr distribution can be mesokurtic, leptokurtic, or platykurtic. These characteristics demonstrate the MOEFr distribution's flexibility, which is appealing for modeling purposes.
Table 2. First five moments, Skewness and Kurtosis of the MOEFr distribution for various parameter values.
3.5. Order Statistics
Assume we have a finite random sample
from a probability density function. Based on this sample, we define
; where
is the sample ith order statistic.
The expression of the pdf of the ith order statistic with
is given by:
(3.17)
Hence, the pdf of the ith order statistic of the MOEFr is given by:
(3.18)
The minimum order statistic for the MOEFr is obtained when i = 1 and is given by:
(3.19)
and the maximum order statistic for the MOEFr is obtained when i = n and is given by:
(3.20)
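The general order-statistic density used here is the standard one, $f_{i:n}(x) = \frac{n!}{(i-1)!\,(n-i)!}\, f(x)\, F(x)^{i-1}\, [1 - F(x)]^{n-i}$. As a small sketch, it can be coded for any pdf and cdf pair; the function name and its arguments are illustrative, and the uniform distribution in the usage lines is only a check case.

```python
import math


def order_stat_pdf(x, i, n, pdf, cdf):
    """Density of the ith order statistic of an i.i.d. sample of size n."""
    if not 1 <= i <= n:
        raise ValueError("i must satisfy 1 <= i <= n")
    F = cdf(x)
    coef = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))
    return coef * pdf(x) * F ** (i - 1) * (1.0 - F) ** (n - i)


def unif_pdf(x):
    """Density of the uniform distribution on (0, 1) (check case only)."""
    return 1.0 if 0.0 < x < 1.0 else 0.0


def unif_cdf(x):
    """Cdf of the uniform distribution on (0, 1) (check case only)."""
    return min(max(x, 0.0), 1.0)


# Minimum (i = 1) and maximum (i = n) of a uniform sample of size 3 at x = 0.5.
f_min = order_stat_pdf(0.5, 1, 3, unif_pdf, unif_cdf)
f_max = order_stat_pdf(0.5, 3, 3, unif_pdf, unif_cdf)
```

Passing the MOEFr pdf and cdf in place of the uniform pair would give the densities of its order statistics, with `i = 1` and `i = n` corresponding to the minimum and the maximum.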
4. Maximum Likelihood Estimation (MLE)
The estimation of the MOEFr parameters is presented in this section. The parameters were estimated using the maximum likelihood method. Consider the random sample
of size n drawn from the MOEFr distribution. The likelihood function of the sample is given by:
(4.1)
The corresponding log-likelihood function is expressed as:
(4.2)
The nonlinear equations obtained by differentiating the log-likelihood function with respect to
and
are:
(4.3)
(4.4)
(4.5)
and
(4.6)
These nonlinear equations cannot be solved in closed form, so the BFGS algorithm, an iterative quasi-Newton method, is used to solve them numerically.
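To make the objective that such an optimizer works on concrete, here is a sketch of the log-likelihood under an assumed parameterization: baseline survival function $S(x) = [1 - e^{-(\sigma/x)^{\lambda}}]^{\alpha}$ and MOEFr density $\theta f(x) / [1 - (1 - \theta) S(x)]^{2}$. The parameter names `alpha`, `lam`, `sigma`, `theta` are hypothetical and may differ from the paper's notation.

```python
import math


def moefr_loglik(data, alpha, lam, sigma, theta):
    """Log-likelihood of an i.i.d. sample under the assumed MOEFr density."""
    if min(alpha, lam, sigma, theta) <= 0.0 or min(data) <= 0.0:
        return -math.inf  # outside the parameter or sample space
    n = len(data)
    # Terms that do not depend on the individual observations.
    ll = n * (math.log(theta) + math.log(alpha) + math.log(lam) + lam * math.log(sigma))
    for x in data:
        t = (sigma / x) ** lam
        u = -math.expm1(-t)  # 1 - exp(-t)
        ll += -(1.0 + lam) * math.log(x) - t + (alpha - 1.0) * math.log(u)
        ll -= 2.0 * math.log(1.0 - (1.0 - theta) * u ** alpha)
    return ll


# Example evaluation on a small made-up sample (illustration only).
example_ll = moefr_loglik([0.8, 1.3, 2.1, 3.4], 2.0, 1.5, 1.2, 0.7)
```

In practice the negative of this function would be handed to a numerical minimizer; optimizing over the logarithms of the parameters is one common way to keep them positive.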
5. Simulation Study
In this section, a Monte Carlo simulation is carried out to investigate the average biases (ABs) and root mean squared errors (RMSEs) of the maximum likelihood estimators (MLEs) of the model parameters (
, and
). The inverse of the cdf given in Equation (3.8) was used to generate random samples from the MOEFr.
The simulation was run for several sample sizes and two sets of parameter values. For each configuration, we performed 1000 replications with sample sizes of
. Two sets of parameter scenarios were employed: Set I with
,
,
, and
, and Set II with
,
,
, and
.
The simulation proceeds as follows:
1) Set the initial values for the distribution parameters
2) Use the inverse cdf to generate samples
3) Compute the maximum likelihood estimates of the parameters from each generated sample
4) Assess the inferential properties of the estimates through their average biases and root mean squared errors
The following formulas were used to compute AB and RMSE:
(5.1)
where
is the parameter in question.
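The two summary measures can be sketched in a few lines. The definitions below are the usual ones and are an assumption about Equation (5.1): for $N$ replications with estimates $\hat\theta_1, \dots, \hat\theta_N$ of a true value $\theta$, $\mathrm{AB} = \frac{1}{N}\sum_{j}(\hat\theta_j - \theta)$ and $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{j}(\hat\theta_j - \theta)^2}$.

```python
import math


def average_bias(estimates, true_value):
    """Mean signed deviation of the estimates from the true parameter value."""
    return sum(e - true_value for e in estimates) / len(estimates)


def rmse(estimates, true_value):
    """Square root of the mean squared deviation from the true parameter value."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))


# Made-up estimates from three replications of a parameter whose true value is 2.
ab = average_bias([1.9, 2.1, 2.3], 2.0)
err = rmse([1.9, 2.1, 2.3], 2.0)
```

In a full study these functions would be applied once per parameter and per sample size, giving the entries of a table of ABs and RMSEs.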
Since the likelihood equations cannot be solved in closed form, the iterative BFGS algorithm was used to solve them numerically.
Numerical computations were carried out in R using the function nlminb() with the argument method = “BFGS”.
Table 3 shows the MLE, AB, and RMSE values of the parameters
, and
for various sample sizes.
The results in Table 3 show that the MLEs approach the true parameter values as the sample size n increases, as reflected in the decreasing ABs. The RMSEs also decrease as the sample size increases. Figures 3-6 display the Monte Carlo simulation results graphically. These results suggest that the MLEs are consistent and asymptotically unbiased estimators of the parameters.
Table 3. Simulation results for the MOEFr distribution: MLEs, ABs and RMSEs.
Figure 3. Plot of Average Biases of the MOEFr model parameter estimates for set I.
Figure 4. Plot of RMSEs of the MOEFr model parameter estimates for set I.
Figure 5. Plot of Average Biases of the MOEFr model parameter estimates for set II.
Figure 6. Plot of RMSEs of the MOEFr model parameter estimates for set II.
6. Applications
In this section, the MOEFr distribution is fitted to three real data sets. Its goodness-of-fit is compared with that of existing distributions, namely the Fréchet (Fr), Exponentiated Fréchet (EFr), Marshall-Olkin Fréchet (MOFr), Beta Fréchet (BFr) and Beta Exponential Fréchet (BEFr) distributions, using the Kolmogorov-Smirnov (K-S) test, the Cramér-von Mises test, and the Anderson-Darling (AD) test. Information criteria, namely the Akaike Information Criterion (AIC), Consistent AIC (CAIC), Bayesian (Schwarz) Information Criterion (BIC), and Hannan-Quinn Information Criterion (HQIC), are also used to select the best-fitting model.
The AIC is given by:
(6.1)
The BIC is given by:
(6.2)
The CAIC is given by:
(6.3)
The HQIC is given by:
(6.4)
where k is the number of estimated parameters in the model,
is the maximum value of the likelihood function for the model and n is the number of observations. The Kolmogorov-Smirnov (K-S) test statistic is computed by:
(6.5)
where
is the cdf of the hypothesized distribution,
the empirical distribution function of the observed data. The Cramér-von Mises(
) test statistic is computed using:
(6.6)
where
is the ith observation in the sample, n is the number of observations, and F is the specified cumulative distribution function.
The Anderson-Darling (
) test statistic is given by:
(6.7)
where n is the sample size,
is the cdf of the specified distribution, and i indexes the observations after the data are sorted in ascending order.
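The three statistics can be sketched from their classical computational forms. This is an illustration under assumptions: the paper's versions may include small-sample corrections that are not shown here, and the function names are hypothetical. With ordered observations $x_{(1)} \le \dots \le x_{(n)}$, the forms used are $D = \max_i \max\{ i/n - F(x_{(i)}),\; F(x_{(i)}) - (i-1)/n \}$, $W^2 = \frac{1}{12n} + \sum_i [F(x_{(i)}) - \frac{2i-1}{2n}]^2$ and $A^2 = -n - \frac{1}{n}\sum_i (2i-1)[\ln F(x_{(i)}) + \ln(1 - F(x_{(n+1-i)}))]$.

```python
import math


def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov distance between the empirical and hypothesized cdfs."""
    xs = sorted(data)
    n = len(xs)
    return max(max(i / n - cdf(x), cdf(x) - (i - 1) / n)
               for i, x in enumerate(xs, start=1))


def cramer_von_mises(data, cdf):
    """Classical Cramér-von Mises statistic."""
    xs = sorted(data)
    n = len(xs)
    return 1.0 / (12 * n) + sum((cdf(x) - (2 * i - 1) / (2 * n)) ** 2
                                for i, x in enumerate(xs, start=1))


def anderson_darling(data, cdf):
    """Classical Anderson-Darling statistic."""
    xs = sorted(data)
    n = len(xs)
    total = sum((2 * i - 1) * (math.log(cdf(xs[i - 1])) + math.log(1.0 - cdf(xs[n - i])))
                for i in range(1, n + 1))
    return -n - total / n


# Check case: three points tested against the uniform(0, 1) cdf F(x) = x.
sample = [0.5, 0.1, 0.9]
d_stat = ks_statistic(sample, lambda x: x)
w_stat = cramer_von_mises(sample, lambda x: x)
a_stat = anderson_darling(sample, lambda x: x)
```

Smaller values of all three statistics indicate closer agreement between the fitted cdf and the data.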
The model with the lowest values of
,
, and K-S test statistics, along with the lowest AIC, CAIC, BIC, and HQIC, is considered the best. The best model also has the highest value of the log-likelihood function and the highest p-value for the K-S statistic.
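The information criteria can be computed directly from the maximized log-likelihood. The sketch below uses the standard forms of AIC, BIC and HQIC. For CAIC several definitions are in use, and the form $-2\ell + 2kn/(n - k - 1)$ adopted here is an assumption that may differ from Equation (6.3). The function name and the numbers in the usage line are illustrative.

```python
import math


def information_criteria(loglik, k, n):
    """Return AIC, BIC, HQIC and (one variant of) CAIC.

    loglik: maximized log-likelihood, k: number of parameters, n: sample size.
    """
    return {
        "AIC": 2 * k - 2 * loglik,
        "BIC": k * math.log(n) - 2 * loglik,
        "HQIC": 2 * k * math.log(math.log(n)) - 2 * loglik,
        # Assumed CAIC variant; other definitions exist.
        "CAIC": -2 * loglik + 2 * k * n / (n - k - 1),
    }


# Made-up example: log-likelihood -100 for a 4-parameter model and 128 observations.
crit = information_criteria(-100.0, 4, 128)
```

For a fixed data set, all four criteria penalize the same log-likelihood by a different function of k and n, so they can rank candidate models differently when the number of parameters varies.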
The distributions to which the MOEFr was compared in this section are: the Fréchet, Exponentiated Fréchet [14] , Marshall-Olkin Fréchet [11] , Beta Fréchet (BFr) [17] and Beta Exponential Fréchet [18] with the respective pdfs:
Fr:
(6.8)
EFr:
(6.9)
MOFr:
(6.10)
BFr:
(6.11)
BEFr:
(6.12)
6.1. Data set I: Bladder Cancer Data Set
The first data set considered in this study is the remission times (in months) of a sample of 128 bladder cancer patients given in Table 4. This data can be found in [19] .
The Total Time on Test (TTT) transform was used to produce a plot that reveals the shape of the hazard (failure) rate function. Knowing this shape is critical when choosing a model for a data set. The TTT plot is given by:
(6.13)
where
represent the order statistics.
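As an illustration, the scaled TTT transform can be computed as follows. The sketch assumes the standard empirical form $T(i/n) = \left[\sum_{j=1}^{i} x_{(j)} + (n - i)\, x_{(i)}\right] / \sum_{j=1}^{n} x_{(j)}$, which may differ in notation from Equation (6.13); the function name is hypothetical.

```python
def ttt_transform(data):
    """Return the points (i/n, T(i/n)) of the scaled TTT plot."""
    xs = sorted(data)
    n = len(xs)
    total = sum(xs)
    points = []
    running = 0.0  # cumulative sum of the i smallest observations
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, (running + (n - i) * x) / total))
    return points


# Tiny made-up sample to show the output format.
pts = ttt_transform([3.0, 1.0, 2.0])
```

Plotting these points against the diagonal gives the TTT plot: a concave curve above the diagonal is usually read as an increasing hazard rate, a convex curve below it as a decreasing one, and a curve that is first concave and then convex as an inverted bathtub shape.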
Table 5 provides the descriptive statistics for data set I. The data are right-skewed (coefficient of skewness = 3.325333) and heavy-tailed (coefficient of kurtosis = 16.15128).
The TTT, histogram, violin, and box plots for data set I are shown in Figure 7 and Figure 8. The TTT plot shows that the hazard rate function has an inverted bathtub shape.
The histogram shows that the data are right-skewed, and the box plot indicates that there are some outliers. The violin plot shows that most values are concentrated around the median.
Figure 9 shows the estimated PDF and CDF of the MOEFr distribution for the bladder cancer data set, and Figure 10 shows the Kaplan-Meier and PP plots. The Kaplan-Meier curve closely follows the survival function of the fitted model, indicating that the model fits the data well; the PP plot likewise shows that the empirical and fitted distributions are very close. Furthermore, Figure 11 shows the fitted pdfs of the distributions considered in this study together with the histogram of the observed data.
Table 4. The remission times of a sample of 128 bladder cancer patients.
Table 5. Descriptive statistics of the bladder cancer data set.
(a) TTT plot of data set I (b) Histogram of data set I
Figure 7. TTT and histogram plots of bladder cancer data set.
The AIC, CAIC, BIC, and HQIC values, as well as the
,
, and K-S test statistics, are shown in Table 6 and Table 7. According to these results, the MOEFr is the best model because it has the lowest AIC, CAIC, BIC, HQIC,
,
and K-S values as well as the highest log-likelihood function value and p-value for the K-S statistic.
(a) Violin plot of data set I (b) Box plot of data set I
Figure 8. Violin and box plots of bladder cancer data set.
(a) Plot of estimated PDF (b) Plot of estimated CDF
Figure 9. Estimated PDF and CDF of the MOEFr distribution for bladder cancer data set.
(a) Plot of Kaplan-Meier (b) PP plot
Figure 10. Kaplan-Meier and PP plots of the MOEFr distribution for bladder cancer data set.
6.2. Data Set II: Carbone Data
The data set “carbone”, which is available in the R package AdequacyModel, was used to demonstrate the applicability of the MOEFr. It consists of uncensored data on the breaking stress of carbon fibres (in GPa) studied in [20]. Figure 12 and Figure 13 give the TTT, histogram, violin and box plots for the data set. The TTT plot indicates that the hazard rate function is increasing.
The histogram shows that the data are nearly symmetrical, and the violin plot illustrates that the values are concentrated around the median. Outliers are revealed by the box plot.
Figure 11. Estimated fitted densities of bladder cancer data set for various distributions.
Table 6. Maximum likelihood estimates of the model parameters, the log-likelihood, and goodness-of-fit statistics for the bladder cancer data set.
Table 7. Values of information criteria for various distributions for the bladder cancer data set.
(a) TTT plot of data set II (b) Histogram of data set II
Figure 12. TTT and histogram plots of carbone data set.
(a) Violin plot of data set II (b) Box plot of data set II
Figure 13. Violin and box plots of carbone data set.
The estimated PDF and CDF of the MOEFr distribution for the carbone data set are given in Figure 14, and the Kaplan-Meier and PP plots are shown in Figure 15. The Kaplan-Meier curve closely approximates the survival function of the model, and the PP plot shows close agreement between the empirical and fitted distributions.
Moreover, Figure 16 shows the fitted pdfs together with the histogram of the observed data.
The descriptive statistics of the carbone data are given in Table 8. The data are nearly symmetrical with a skewness coefficient of 0.3737844 and platykurtic with a kurtosis coefficient of 0.1728682.
The values of the AIC, CAIC, BIC, and HQIC, as well as
,
, and K-S tests statistics are given respectively in Table 9 and Table 10. Thereby, the MOEFr is the best model since it has the lowest AIC, CAIC, BIC, HQIC,
,
and K-S values together with the highest value of the log-likelihood function and highest p-value for K-S statistic.
6.3. Data set III: Wheaton River Data
The data displayed in Table 11 show the exceedances of flood peaks (in m3/s) of the Wheaton River near Carcross in Yukon Territory, Canada, for the years 1958 to 1984. This data set is available in [21] .
Figure 17 and Figure 18 give the TTT, histogram, violin and box plots for the Wheaton River data set. The TTT plot of the data set indicates that the hazard rate function has an inverted bathtub shape.
The histogram shows that the data is skewed to the right, while the violin and box plot show that values are concentrated around the median and the presence of outliers, respectively.
Figure 19 depicts the estimated PDF and CDF of the MOEFr distribution for the Wheaton River data set, and Figure 20 depicts the Kaplan-Meier and PP plots for the MOEFr distribution. Figure 21 depicts the fitted pdfs of the observed data.
The descriptive statistics of the exceedances of flood peaks (in m3/s) of the Wheaton River data are given in Table 12. The data are highly skewed with the Coefficient of Skewness equal to 1.49711 and nearly mesokurtic with the Coefficient of Kurtosis equal to 3.121607.
(a) Plot of estimated PDF (b) Plot of estimated CDF
Figure 14. Estimated PDF and CDF of the MOEFr distribution for carbone data set.
(a) Plot of Kaplan-Meier (b) PP plot
Figure 15. Kaplan-Meier and PP plots of the MOEFr distribution for carbone data set.
Figure 16. Estimated fitted densities of carbone data set for various distributions.
Table 8. Descriptive statistics of the carbone data set.
Table 9. Maximum likelihood estimates of the model parameters, the log-likelihood, and goodness-of-fit statistics for the carbone data set.
Table 10. Values of information criteria for various distributions for the carbone data set.
Table 11. The exceedances of flood peaks (in m3/s) of the Wheaton River.
(a) TTT plot of data set III (b) Histogram of data set III
Figure 17. TTT and histogram plots of Wheaton River data set.
(a) Violin plot of data set III (b) Box plot of data set III
Figure 18. Violin and box plots of Wheaton River data set.
The values of the AIC, CAIC, BIC, and HQIC are given in Table 13. The MOEFr distribution has the lowest AIC, CAIC, BIC, and HQIC; hence, it is preferred over the other distributions considered. In addition, Table 14 shows that the MOEFr outperforms the other models studied here, since it has the lowest
,
, and K-S tests statistics plus the highest value of the log-likelihood function and highest p-value for K-S statistic.
(a) Plot of estimated PDF (b) Plot of estimated CDF
Figure 19. Estimated PDF and CDF of the MOEFr distribution for Wheaton River data set.
(a) Plot of Kaplan-Meier (b) PP plot
Figure 20. Kaplan-Meier and PP plots of the MOEFr distribution for Wheaton River data set.
Figure 21. Estimated fitted densities of Wheaton River data set for various distributions.
Table 12. Descriptive statistics of the Wheaton River data.
Table 13. Values of information criteria for various distributions for the Wheaton River data set.
Table 14. Maximum likelihood estimates of the model parameters, the log-likelihood, and goodness-of-fit statistics for the Wheaton River data set.
7. Conclusions
This study aimed to introduce the MOEFr distribution, which increases the flexibility of the EFr distribution using the Marshall-Olkin method. The mathematical and statistical properties of the MOEFr, including the hazard rate function, survival function, reversed hazard rate function, cumulative hazard function, odds function, quantile function and its associated results, moments, and order statistics, were derived.
Maximum likelihood was used to estimate the model's parameters, and Monte Carlo simulation was used to evaluate the estimators' behavior. The MLEs perform well in estimating the model parameters: they tend to the true parameter values as the sample size increases, as evidenced by decreasing ABs, and the RMSEs also decrease as the sample size increases. The new distribution was also applied to three real data sets: the bladder cancer, carbone and Wheaton River data sets. Based on goodness-of-fit statistics, log-likelihood values, and information criteria, it provides a better fit than the other distributions considered in this study, namely the EFr, MOFr, BEFr, BFr and Fr. The MOEFr distribution is therefore recommended for analyzing these data sets.
Based on its performance, this new distribution is recommended for use in practice, particularly in survival analysis. It can be used for data from various fields including engineering sciences, medical sciences, extreme events among others.
It is suggested that future research investigate the performance of the MOEFr distribution under Bayesian estimation and apply the model to censored data.
Author Contribution
Conceptualization: AN and LOO, methodology: AN, LOO, EN and AH, software: AN, AH and AHM, validation: AN, LOO, EN, data curation: AN, AHM and AH, formal analysis: AN, LOO, EN and AH, resources: AN, LOO, EN, AHM and AH, writing original draft preparation: AN, LOO and EN, writing review and editing: AN and AH, supervision: LOO and EN, project administration: AN.
Data Availability
The real data used to illustrate the proposed distribution are within the manuscript.
Acknowledgements
The corresponding author would like to express his appreciation to the Pan African University, Institute for Basic Sciences, Technology and Innovation (PAUSTI) for supporting this work.