1. Introduction
Probability distributions are essential tools in a variety of disciplines because they provide statistical interpretations that help make sense of data. However, existing distributions such as the exponential distribution have limitations that make them inadequate for modeling many kinds of data.
The exponential distribution is a continuous probability distribution commonly used in statistics and probability theory. It is frequently used to describe the time between events in a process where events occur independently at a fixed average rate [1].
To overcome these limitations, many authors have proposed modifications of existing distributions. Gupta and Kundu [2] modified the exponential distribution by adding a shape parameter. However, their model is not flexible enough to control both skewness and kurtosis or to accommodate non-monotonic hazard rate shapes.
In the theory of probability distributions, the choice of a particular distribution to represent real-life phenomena may be motivated by two factors: tractability and flexibility [3]. Although tractability may be advantageous in theory because a tractable distribution is simple to use, particularly when simulating random samples, practitioners and other stakeholders may be more interested in flexibility. In practice, it is preferable to employ a probability distribution that fits the given data set well rather than to transform the data, because a transformation may compromise the data set’s original features. For this reason, numerous attempts have been made in recent years to improve and extend the standard theoretical distributions [4] [5] [6]. Marshall and Olkin [7] proposed and studied a new method for adding a parameter to a family of distributions. Margaretha, et al. [8] discussed a Bayesian methodology for estimating the parameters of the Exponentiated Exponential distribution from left-censored data. Mahdavi and Kundu [9] introduced a new method for generating distributions and applied it to the exponential distribution; this generator was reported to control skewness when the data are not normally distributed. Many researchers have made significant advances in developing and extending various distributions.
Singh, et al. [10] and Niyoyunguruza, et al. [11] used the Marshall-Olkin generator to extend distributions and estimated the parameters by Maximum Likelihood Estimation (MLE). By applying these extended distributions to a range of data sets, they demonstrated that the new distributions provided a better fit than the baseline distributions. Yahaya and Ieren [12] proposed the odd generalized exponential Gumbel distribution (OGEGD) for modeling lifetime data, Salem and Selim [13] proposed the Generalized Weibull-Exponential Distribution (GWED), and Modi [14] proposed the power Exponentiated family. Uwadi, et al. [15] introduced the Exponentiated Gumbel Exponential (EGuE) distribution and studied its mathematical and statistical properties. Modi, et al. [16] proposed and studied a new family of distributions, called the Modi Exponential distribution, and applied it to two real data sets. In this study, we introduce the Modi Exponentiated Exponential distribution, which has four parameters. This distribution can be used for fitting and analyzing data in various fields.
This paper is structured as follows: Section 2 defines the Modi generator, and Section 3 introduces the Exponentiated Exponential distribution. In Section 4, we present the Modi Exponentiated Exponential distribution and develop its cumulative distribution function (CDF), probability density function (PDF), hazard rate function, and survival function; we also derive some of its statistical and mathematical properties. The estimation of the model parameters is given in Section 5. The simulation study, the application to real data sets, and the conclusion are given in Sections 6, 7, and 8, respectively.
2. Modi Family
Modi, et al. [16] proposed and studied the Modi family of distributions, which is flexible and can be used to model a wide range of phenomena in various fields, including engineering, economics, and finance. The CDF, F(x), and PDF, f(x), of the Modi family are given, respectively, by:
(2.1)
(2.2)
for all
, where
,
and S(x) is the CDF of the existing distribution, and s(x) is the PDF of the existing distribution.
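As an illustration of how such a generator wraps a baseline distribution, the following minimal sketch assumes the Modi-family CDF takes the commonly reported form F(x) = (1 + α^β) S(x) / (α^β + S(x)), whose derivative is f(x) = (1 + α^β) α^β s(x) / (α^β + S(x))^2. The function names and the exponential baseline are hypothetical choices made only for this example.

```python
import math


def modi_cdf(S, alpha, beta):
    """Modi-family CDF evaluated from a baseline CDF value S = S(x).

    Assumed form: (1 + alpha**beta) * S / (alpha**beta + S).
    """
    a = alpha ** beta
    return (1.0 + a) * S / (a + S)


def modi_pdf(S, s, alpha, beta):
    """Modi-family PDF from baseline CDF value S and baseline PDF value s."""
    a = alpha ** beta
    return (1.0 + a) * a * s / (a + S) ** 2


# Hypothetical baseline for the illustration: exponential with rate `rate`.
def exp_cdf(x, rate):
    return 1.0 - math.exp(-rate * x)


def exp_pdf(x, rate):
    return rate * math.exp(-rate * x)


if __name__ == "__main__":
    x, rate, alpha, beta = 1.0, 0.5, 2.0, 1.5
    print(modi_cdf(exp_cdf(x, rate), alpha, beta))
    print(modi_pdf(exp_cdf(x, rate), exp_pdf(x, rate), alpha, beta))
```

Under this assumed form, the generator maps S = 0 to 0 and S = 1 to 1, so the result is again a valid CDF whenever the baseline is.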
3. Exponentiated Exponential Distribution
The practice of adding a new parameter to an existing family of distribution functions is a common one in statistical distribution theory.
Gupta and Kundu [2] proposed the Exponentiated Exponential (EE) distribution. The CDF and PDF of the EE distribution are given, respectively, by:
(3.1)
where
;
(3.2)
where
is a scale parameter, and
is a shape parameter.
4. Modi Exponentiated Exponential (MEE) Distribution
In the field of statistical distribution theory, it is common to enhance the flexibility of a class of distribution functions by introducing an additional parameter. This practice can be very useful for data analysis, as it allows for greater versatility in modeling various types of data.
4.1. Cumulative Distribution Function and Survival Function
From Equations (2.1) and (3.1), the cumulative distribution function of the MEE distribution is given by:
(4.1)
for all
and
,
and the survival function, obtained as the complement of the CDF in Equation (4.1), is:
(4.2)
for all
and
.
4.2. Probability Density Function and Hazard Rate Function
The PDF plays a crucial role in the modeling and analysis of continuous random variables.
The PDF of the MEE distribution is obtained by differentiating Equation (4.1) with respect to x:
(4.3)
for all
and
.
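The CDF, survival function, and PDF can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the Modi-family form F(x) = (1 + α^β) S(x) / (α^β + S(x)) with the EE baseline S(x) = (1 − e^{−λx})^γ, and the parameter names `alpha`, `beta`, `lam`, and `gamma` are labels chosen for this example.

```python
import math


def ee_cdf(x, lam, gamma):
    """Exponentiated Exponential CDF: (1 - exp(-lam * x)) ** gamma, for x > 0."""
    return (1.0 - math.exp(-lam * x)) ** gamma


def ee_pdf(x, lam, gamma):
    """Exponentiated Exponential PDF, for x > 0."""
    e = math.exp(-lam * x)
    return gamma * lam * e * (1.0 - e) ** (gamma - 1.0)


def mee_cdf(x, alpha, beta, lam, gamma):
    """Assumed MEE CDF: Modi generator applied to the EE baseline."""
    a = alpha ** beta
    S = ee_cdf(x, lam, gamma)
    return (1.0 + a) * S / (a + S)


def mee_sf(x, alpha, beta, lam, gamma):
    """Survival function 1 - F(x), simplified to a * (1 - S) / (a + S)."""
    a = alpha ** beta
    S = ee_cdf(x, lam, gamma)
    return a * (1.0 - S) / (a + S)


def mee_pdf(x, alpha, beta, lam, gamma):
    """Derivative of the assumed MEE CDF with respect to x."""
    a = alpha ** beta
    S = ee_cdf(x, lam, gamma)
    s = ee_pdf(x, lam, gamma)
    return (1.0 + a) * a * s / (a + S) ** 2


if __name__ == "__main__":
    params = (2.0, 1.5, 0.5, 2.0)  # hypothetical (alpha, beta, lam, gamma)
    for x in (0.5, 1.0, 2.0, 5.0):
        print(x, mee_cdf(x, *params), mee_sf(x, *params), mee_pdf(x, *params))
```

Writing the survival function in the simplified form avoids subtracting two nearly equal numbers in the right tail.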
Figure 1 depicts several possible forms of the PDF of the MEE distribution for different parameter values. The MEE PDF can be symmetric or unimodal, or it can take a reversed-J shape.
The parameter values were chosen through sensitivity analysis, a technique that examines how changes in the input variables affect the model output and thereby gives insight into the associated uncertainties.
The hazard rate function is obtained as the ratio of the PDF in Equation (4.3) to the survival function in Equation (4.2):
(4.4)
where
is a scale parameter and
are shape parameters, for all
, and
.
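The ratio of PDF to survival function can be evaluated directly. The sketch below again assumes the form F(x) = (1 + a) S(x) / (a + S(x)) with a = α^β and the EE baseline S(x) = (1 − e^{−λx})^γ, in which case the ratio simplifies to h(x) = (1 + a) s(x) / ((a + S(x)) (1 − S(x))). Parameter names are assumptions made for this example.

```python
import math


def mee_hazard(x, alpha, beta, lam, gamma):
    """Hazard rate f(x) / (1 - F(x)) under the assumed MEE form, for x > 0."""
    a = alpha ** beta
    e = math.exp(-lam * x)
    S = (1.0 - e) ** gamma                                # EE baseline CDF
    s = gamma * lam * e * (1.0 - e) ** (gamma - 1.0)      # EE baseline PDF
    return (1.0 + a) * s / ((a + S) * (1.0 - S))


def mee_log_sf(x, alpha, beta, lam, gamma):
    """Log survival function, used only to cross-check the hazard numerically."""
    a = alpha ** beta
    S = (1.0 - math.exp(-lam * x)) ** gamma
    return math.log(a * (1.0 - S) / (a + S))


if __name__ == "__main__":
    # Hypothetical parameter sets, chosen only to show different values.
    for params in [(2.0, 1.5, 0.5, 2.0), (0.5, 2.0, 1.0, 0.7)]:
        print(params, [round(mee_hazard(x, *params), 4) for x in (0.5, 1.0, 2.0, 4.0)])
```

One consistency check on this assumed form: with gamma = 1 and a very large alpha ** beta, the CDF reduces to the exponential baseline and the hazard becomes approximately constant at lam.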
The shapes of hazard rate function for the MEE are depicted in Figure 2. It demonstrates that the hazard rate function can take various shapes such as increasing, decreasing, constant, inverted bathtub, etc.
Figure 1. Plot of PDF of the MEE for various values of
and
.
Figure 2. Plot of hazard rate function of the MEE for various values of
and
.
4.3. Statistical and Mathematical Properties of the MEE Distribution
4.3.1. Quantile Function
The quantile function is crucial in generating random samples from a specified distribution.
Suppose that X is the random variable where
, so to obtain the quantile function we solve u = F(x) for x, where F is the cumulative distribution function (CDF) of the MEE distribution defined in Equation (4.1):
Let
Then,
(4.5)
where
is the inverse of the distribution function of MEE and
. To obtain the values of the lower quartile, median, and upper quartile, one can use the quantile function by replacing u with 1/4, 1/2, and 3/4, respectively, giving:
The lower quartile as:
(4.6)
The median as:
(4.7)
The upper quartile as:
(4.8)
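The inversion can be sketched as follows, assuming the CDF form F(x) = (1 + a) S(x) / (a + S(x)) with a = α^β and S(x) = (1 − e^{−λx})^γ. Solving u = F(x) gives S = a u / (1 + a − u) and then x = −ln(1 − S^{1/γ}) / λ. The function and parameter names are assumptions introduced for this illustration.

```python
import math


def mee_cdf(x, alpha, beta, lam, gamma):
    """Assumed MEE CDF, included so the inversion can be checked by round trip."""
    a = alpha ** beta
    S = (1.0 - math.exp(-lam * x)) ** gamma
    return (1.0 + a) * S / (a + S)


def mee_quantile(u, alpha, beta, lam, gamma):
    """Inverse of the assumed MEE CDF for 0 < u < 1."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    a = alpha ** beta
    S = a * u / (1.0 + a - u)          # baseline CDF value that yields F = u
    return -math.log(1.0 - S ** (1.0 / gamma)) / lam


if __name__ == "__main__":
    params = (2.0, 1.5, 0.5, 2.0)      # hypothetical (alpha, beta, lam, gamma)
    q1, med, q3 = (mee_quantile(u, *params) for u in (0.25, 0.5, 0.75))
    print("lower quartile:", q1)
    print("median:", med)
    print("upper quartile:", q3)
```

Setting u to 1/4, 1/2, and 3/4 reproduces the three quartiles discussed above.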
4.3.2. Skewness and Kurtosis
The Moors kurtosis and the Galton skewness of the MEE distribution, both based on quantiles, are given by:
(4.9)
where Q(.) denotes the quartile and octile values.
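Both measures can be computed from any quantile function. The sketch below uses the standard definitions, Galton skewness = [Q(6/8) − 2Q(4/8) + Q(2/8)] / [Q(6/8) − Q(2/8)] and Moors kurtosis = [Q(7/8) − Q(5/8) + Q(3/8) − Q(1/8)] / [Q(6/8) − Q(2/8)]. The MEE quantile function shown here rests on the assumed CDF form F = (1 + a) S / (a + S) with a = α^β, and its parameter names are hypothetical.

```python
import math


def galton_skewness(Q):
    """Quartile-based skewness for a quantile function Q(u)."""
    return (Q(6 / 8) - 2.0 * Q(4 / 8) + Q(2 / 8)) / (Q(6 / 8) - Q(2 / 8))


def moors_kurtosis(Q):
    """Octile-based kurtosis for a quantile function Q(u)."""
    return (Q(7 / 8) - Q(5 / 8) + Q(3 / 8) - Q(1 / 8)) / (Q(6 / 8) - Q(2 / 8))


def mee_quantile(u, alpha, beta, lam, gamma):
    """Inverse of the assumed MEE CDF (see the assumptions stated above)."""
    a = alpha ** beta
    S = a * u / (1.0 + a - u)
    return -math.log(1.0 - S ** (1.0 / gamma)) / lam


if __name__ == "__main__":
    params = (2.0, 1.5, 0.5, 2.0)  # hypothetical parameter values

    def Q(u):
        return mee_quantile(u, *params)

    print("Galton skewness:", galton_skewness(Q))
    print("Moors kurtosis:", moors_kurtosis(Q))
```

Because both measures depend only on quantiles, they remain defined even when moments are difficult to evaluate.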
Table 1 provides information about how quantiles of the MEE distribution vary for different combinations of parameter values. It can be used to look up specific quantile values based on the desired parameters, which can be useful for statistical analysis and modeling.
4.3.3. The rth Moments of MEE Distribution
Calculation of the moments of a distribution is crucial for statistical analysis, especially in practical applications. Moments are used to determine various statistical measures such as measures of central tendency, dispersion, and shape.
The mathematical expression for the rth moment is given by
(4.10)
where
is the PDF of the distribution.
Substituting (4.3) into (4.10), we get
Using the binomial expansion, we get
Using the gamma function, the integral becomes:
since
Comparing with this, we can identify t as x, p as (r + 1), and q as
. Finally,
(4.11)
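A closed-form series such as this can be cross-checked numerically. The sketch below does not implement the series; it uses a different technique, the identity E[X^r] = ∫₀¹ Q(u)^r du evaluated with a midpoint rule, where Q is the quantile function under the assumed CDF form F = (1 + a) S / (a + S), a = α^β. Function names, parameter names, and the grid size `n` are assumptions made for this example.

```python
import math


def mee_quantile(u, alpha, beta, lam, gamma):
    """Inverse of the assumed MEE CDF for 0 < u < 1."""
    a = alpha ** beta
    S = a * u / (1.0 + a - u)
    return -math.log(1.0 - S ** (1.0 / gamma)) / lam


def mee_raw_moment(r, alpha, beta, lam, gamma, n=100_000):
    """Approximate E[X**r] as the integral of Q(u)**r over (0, 1).

    A midpoint rule is used so the endpoints u = 0 and u = 1 are never evaluated.
    """
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        total += mee_quantile(u, alpha, beta, lam, gamma) ** r
    return total * h


if __name__ == "__main__":
    params = (2.0, 1.5, 0.5, 2.0)  # hypothetical parameter values
    m1 = mee_raw_moment(1, *params)
    m2 = mee_raw_moment(2, *params)
    print("mean:", m1)
    print("variance:", m2 - m1 ** 2)
```

The approximation error grows with r because higher powers put more weight on the right tail, so a finer grid may be needed for higher moments.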
Table 2 illustrates the adaptability of the MEE distribution with respect to its mean and variance. The Coefficient of Skewness (CS) values indicate that it can be right-skewed or nearly symmetric. Similarly, the Coefficient of Kurtosis (CK) values suggest that the MEE distribution can be mesokurtic, leptokurtic, or platykurtic. These attributes highlight the versatility of the MEE distribution, making it an attractive choice for modeling.
Table 1. Quantiles of the MEE distribution for some parameter values.
Table 2. The first five moments, skewness, and kurtosis of the MEE distribution for different parameter values.
4.3.4. Order Statistics
Consider a finite random sample
drawn from a probability density function. From this sample, we define the jth order statistic,
, where
represents the smallest value in the sample,
represents the second smallest value, and so on, up to
representing the largest value. The Probability Density Function (PDF) of the jth order statistic, with
, can be expressed as follows:
(4.12)
By substituting Equations (4.3) and (4.2) into (4.12), the PDF of the jth order statistic for the MEE can be expressed as follows:
(4.13)
The PDF of the smallest order statistic for the MEE is obtained by setting j = 1 and is given by
(4.14)
The PDF of the largest order statistic for the MEE is obtained by setting j = n and is given by
(4.15)
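The order-statistic densities can be evaluated with the standard formula f_{j:n}(x) = n! / ((j − 1)! (n − j)!) · f(x) · F(x)^{j−1} · (1 − F(x))^{n−j}. The sketch below applies it to the assumed MEE form F = (1 + a) S / (a + S) with a = α^β and the EE baseline; function and parameter names are hypothetical.

```python
import math


def mee_cdf(x, alpha, beta, lam, gamma):
    """Assumed MEE CDF."""
    a = alpha ** beta
    S = (1.0 - math.exp(-lam * x)) ** gamma
    return (1.0 + a) * S / (a + S)


def mee_pdf(x, alpha, beta, lam, gamma):
    """Assumed MEE PDF (derivative of mee_cdf)."""
    a = alpha ** beta
    e = math.exp(-lam * x)
    S = (1.0 - e) ** gamma
    s = gamma * lam * e * (1.0 - e) ** (gamma - 1.0)
    return (1.0 + a) * a * s / (a + S) ** 2


def order_stat_pdf(x, j, n, alpha, beta, lam, gamma):
    """Density of the jth order statistic in a sample of size n (1 <= j <= n)."""
    if not 1 <= j <= n:
        raise ValueError("j must satisfy 1 <= j <= n")
    F = mee_cdf(x, alpha, beta, lam, gamma)
    f = mee_pdf(x, alpha, beta, lam, gamma)
    coeff = j * math.comb(n, j)        # equals n! / ((j - 1)! (n - j)!)
    return coeff * f * F ** (j - 1) * (1.0 - F) ** (n - j)


if __name__ == "__main__":
    params = (2.0, 1.5, 0.5, 2.0)      # hypothetical parameter values
    print("minimum, n = 5:", order_stat_pdf(1.0, 1, 5, *params))
    print("maximum, n = 5:", order_stat_pdf(1.0, 5, 5, *params))
```

Setting j = 1 or j = n in `order_stat_pdf` gives the densities of the sample minimum and maximum, respectively.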
5. Maximum Likelihood Estimation (MLE)
Maximum Likelihood Estimation (MLE) is a method of finding the values of the parameters that maximize the likelihood function. If we have n values
that are randomly selected from the MEE distribution, the log-likelihood function denoted by
is given by:
(5.1)
where
Substituting Equation (4.3) into (5.1), the log-likelihood function associated with these values is
(5.2)
(5.3)
Differentiating (5.3) partially with respect to each parameter and equating the result to zero gives
(5.4)
(5.5)
(5.6)
(5.7)
Equations (5.4)-(5.7) cannot be solved analytically, so a numerical optimization technique is needed to find their solutions. In this study, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm was employed to estimate the parameters of the MEE distribution.
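The objective passed to such an optimizer is the log-likelihood itself. The sketch below writes it out under the assumed density f(x) = (1 + a) a γ λ e^{−λx} (1 − e^{−λx})^{γ−1} / (a + (1 − e^{−λx})^γ)^2 with a = α^β; the parameter names and the sample values are hypothetical, and the optimization step is left out.

```python
import math


def mee_log_likelihood(data, alpha, beta, lam, gamma):
    """Log-likelihood of positive observations under the assumed MEE density."""
    a = alpha ** beta
    n = len(data)
    # Terms that do not depend on the individual observations.
    total = n * (math.log(1.0 + a) + math.log(a) + math.log(gamma) + math.log(lam))
    for x in data:
        one_minus_e = 1.0 - math.exp(-lam * x)
        total += -lam * x
        total += (gamma - 1.0) * math.log(one_minus_e)
        total -= 2.0 * math.log(a + one_minus_e ** gamma)
    return total


def mee_neg_log_likelihood(theta, data):
    """Objective in the form most numerical minimizers expect."""
    alpha, beta, lam, gamma = theta
    if min(theta) <= 0.0:
        return math.inf                # keep the search inside the parameter space
    return -mee_log_likelihood(data, alpha, beta, lam, gamma)


if __name__ == "__main__":
    sample = [0.8, 1.3, 2.6, 4.2, 7.1]             # illustrative values only
    print(mee_neg_log_likelihood((2.0, 1.5, 0.5, 2.0), sample))
```

A general-purpose quasi-Newton routine can then minimize `mee_neg_log_likelihood` from a chosen starting point, for example `scipy.optimize.minimize` with `method="BFGS"` in Python or `optim` with `method = "BFGS"` in R.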
6. Simulation Study
This section describes a Monte Carlo simulation study conducted to examine the accuracy of the Maximum Likelihood Estimators (MLEs) of the four model parameters (
,
,
, and
) in terms of their average biases (ABs) and root mean squared errors (RMSEs). In order to obtain random samples from the MEE, the inverse of the CDF presented in Equation 4.5 was utilized. To accomplish this, we generated 1000 replications for each of the sample sizes
, and
using Equation 4.5 for various combinations of parameter values (
,
,
,
).
The parameter values were provided in two different sets.
Set I:
,
,
,
Set II:
,
,
,
The average biases were computed by:
(6.1)
and the root mean square errors by:
(6.2)
where
is the parameter in question, while
is its estimated value at the ith replication at each sample size, and R is the total number of replications.
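The two accuracy measures and the inverse-transform sampling step can be sketched as follows. The average bias is taken as the mean of (estimate − true value) over the R replications and the RMSE as the square root of the mean squared difference. The sampler relies on the assumed quantile form derived from F = (1 + a) S / (a + S), a = α^β; all names, the seed, and the numbers in the usage example are illustrative, not results from the study.

```python
import math
import random


def average_bias(estimates, true_value):
    """Mean of (estimate - true value) over all replications."""
    return sum(est - true_value for est in estimates) / len(estimates)


def rmse(estimates, true_value):
    """Square root of the mean squared deviation from the true value."""
    return math.sqrt(sum((est - true_value) ** 2 for est in estimates) / len(estimates))


def mee_quantile(u, alpha, beta, lam, gamma):
    """Inverse of the assumed MEE CDF for 0 <= u < 1."""
    a = alpha ** beta
    S = a * u / (1.0 + a - u)
    return -math.log(1.0 - S ** (1.0 / gamma)) / lam


def mee_sample(n, alpha, beta, lam, gamma, rng):
    """Draw n values by applying the quantile function to uniform variates."""
    return [mee_quantile(rng.random(), alpha, beta, lam, gamma) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(2024)                      # fixed seed for reproducibility
    print(mee_sample(5, 2.0, 1.5, 0.5, 2.0, rng))
    made_up_estimates = [1.1, 0.9, 1.2]            # not simulation output
    print(average_bias(made_up_estimates, 1.0), rmse(made_up_estimates, 1.0))
```

In a full study, each replication would draw a sample with `mee_sample`, fit the model, and append the resulting estimate to the list passed to `average_bias` and `rmse`.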
The following table presents the MLEs, ABs, and RMSEs values for different sample sizes, corresponding to the parameters,
,
,
,
.
Table 3 presents the simulation results for the Maximum Likelihood Estimators (MLEs), Average Biases (ABs), and Root Mean Square Errors (RMSEs) for the parameter values in Set I and Set II. For Set I, the MLEs approach the true parameter values of the MEE distribution as the sample size increases, and the ABs and RMSEs of the parameter estimators generally decrease.
The results for Set II show a similar pattern: as the sample size increases, the MLEs move closer to the true parameter values, and the ABs and RMSEs tend to decrease, indicating improved accuracy and precision.
The parameter values were chosen through sensitivity analysis, and the Monte Carlo simulation and parameter estimation were carried out in R.
7. Application to Real Data Set
In this section, we fit the Modi Exponentiated Exponential distribution to two real data sets and compare its flexibility with that of other well-known distributions. The analysis was conducted using R software. We calculated several information criteria: the Akaike Information Criterion (AIC), Hannan-Quinn Information Criterion (HQIC), Bayesian Information Criterion (BIC), and Consistent Akaike Information Criterion (CAIC). Additionally, we performed the Kolmogorov-Smirnov (K-S), Cramér-von Mises (W*), and Anderson-Darling (A*) tests to assess the goodness of fit of the considered distributions. The distribution with the highest log-likelihood, the highest p-value for the K-S test, and the lowest AIC, BIC, HQIC, CAIC, W*, A*, and K-S values was considered the best.
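The information criteria are simple functions of the maximized log-likelihood ℓ, the number of parameters k, and the sample size n. The sketch below uses definitions common in the distribution-fitting literature: AIC = −2ℓ + 2k, BIC = −2ℓ + k ln n, HQIC = −2ℓ + 2k ln(ln n), and CAIC = −2ℓ + 2kn / (n − k − 1). The CAIC form in particular is an assumption, since several definitions are in use, and the numbers in the usage example are made up.

```python
import math


def information_criteria(log_lik, k, n):
    """Return AIC, BIC, HQIC, and CAIC for a fitted model.

    log_lik: maximized log-likelihood, k: number of parameters, n: sample size.
    The CAIC expression is one of several definitions in use (assumption).
    """
    return {
        "AIC": -2.0 * log_lik + 2.0 * k,
        "BIC": -2.0 * log_lik + k * math.log(n),
        "HQIC": -2.0 * log_lik + 2.0 * k * math.log(math.log(n)),
        "CAIC": -2.0 * log_lik + 2.0 * k * n / (n - k - 1),
    }


if __name__ == "__main__":
    # Made-up log-likelihood for a four-parameter model and 128 observations.
    for name, value in information_criteria(-100.0, 4, 128).items():
        print(name, round(value, 4))
```

Lower values indicate a better trade-off between fit and the number of parameters, which is the basis for the ranking described above.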
The PDFs of the existing distributions compared with the Modi Exponentiated Exponential distribution are presented in Table 4.
Data set I: This data set presented in [17] contains information on the remission times, measured in months, for a group of 128 individuals diagnosed with bladder cancer. 3.88, 5.32, 7.39, 10.34, 14.83, 34.26, 0.90, 2.69, 4.18, 5.34, 7.59, 10.66, 15.96, 36.66, 1.05, 2.69, 4.23, 5.41, 7.62, 10.75, 16.62, 43.01, 1.19, 2.75, 4.26, 5.41, 7.63, 17.12, 46.12, 1.26, 2.83, 4.33, 5.49, 7.66, 11.25, 17.14, 79.05, 1.35, 2.87, 5.62, 7.87, 11.64, 17.36, 1.40, 3.02, 4.34, 5.71, 7.93, 0.08, 2.09, 3.48, 4.87, 6.94, 8.66, 13.11, 23.63, 0.20, 2.23, 3.5, 4.98, 6.97, 9.02, 13.29, 0.40, 2.26, 3.57, 5.06, 7.09, 9.22, 13.80, 25.74, 0.50, 2.46, 3.64, 5.09, 7.26, 9.47, 14.24, 25.82, 0.51, 2.54, 3.70, 5.17, 7.28, 9.74, 14.76, 26.31, 0.81, 2.62, 3.82, 5.32, 7.32, 10.06, 14.77, 32.15, 2.64, 11.79, 18.10, 1.46, 4.40, 5.85, 8.26, 11.98, 19.13, 1.76, 3.25, 4.50, 6.25, 8.37, 12.02, 2.02, 3.31, 4.51, 6.54, 8.53, 12.03, 20.28, 2.02, 3.36, 6.76, 12.07, 21.73, 2.00, 3.36, 6.93, 8.65, 12.63, and 22.69.
Table 3. Outcomes of the Monte Carlo simulation for the parameters in Set I and Set II.
Table 4. The existing distributions compared with Modi Exponentiated Exponential distribution.
Table 5 presents the summary characteristics of data set I. The data is skewed to the right, as indicated by a skewness coefficient of 3.325, and it exhibits significant tailing in its distribution, with a kurtosis coefficient of 16.15.
The kurtosis coefficient was computed from the data set using descriptive statistics in R.
In Figure 3, the histogram shows that the data are right-skewed, and the box plot reveals the presence of outliers.
In Figure 4, the TTT plot of the data set indicates that the hazard rate function has an inverted bathtub shape, while the violin plot shows that the majority of values are concentrated around the median.
Table 6 and Table 7 provide the AIC, HQIC, BIC, and CAIC values, along with the K-S, W*, and A* tests. Based on these results, the MEE model emerges as the most favorable choice because it has the lowest values for AIC, HQIC, BIC, CAIC, W*, K-S and A*, indicating better goodness-of-fit. Additionally, it exhibits the highest p-value for the K-S statistic and log-likelihood function value, further supporting its superiority.
Figure 5 illustrates a plot of fitted densities, comparing the MEE distribution to its sub-models using the bladder cancer data set. The plot reveals that the MEE distribution demonstrates a favorable and encouraging fit when compared to the existing distributions.
Data set II: This data set consists of the waiting times (in minutes) of one hundred bank customers before they receive service. This data set has been previously analyzed by Ghitany, et al. [18] . They have fitted both the Lindley distribution and the exponential distribution to this data. The data set is provided below:
0.8, 0.8, 1.3, 1.5, 1.8, 1.9, 1.9, 2.1, 2.6, 2.7, 2.9, 3.1, 3.2, 3.3, 3.5, 3.6, 4.0, 4.1, 4.2, 4.2, 4.3, 4.3, 4.4, 4.4, 4.6, 4.7, 4.7, 4.8, 4.9, 4.9, 5, 5.3, 5.5, 5.7, 5.7, 6.1, 6.2, 6.2, 6.2, 6.3, 6.7, 6.9, 7.1, 7.1, 7.1, 7.1, 7.4, 7.6, 7.7, 8, 8.2, 8.6, 8.6, 8.6, 8.8, 8.8, 8.9, 8.9, 9.5, 9.6, 9.7, 9.8, 10.7, 10.9, 11, 11, 11.1, 11.2, 11.2, 11.5, 11.9, 12.4, 12.5, 12.9, 13, 13.1, 13.3, 13.6, 13.7, 13.9, 14.1, 15.4, 15.4, 17.3, 17.3, 18.1, 18.2, 18.4, 18.9, 19, 19.9, 20.6, 21.3, 21.4, 21.9, 23.0, 27, 31.6, 33.1, 38.5.
Table 8 presents the summary characteristics of data set II. The data is skewed to the right, as indicated by a skewness coefficient of 1.47277, and it is platykurtic, with a kurtosis coefficient of 2.54029. The kurtosis coefficient was computed from the data set using descriptive statistics in R.
Table 5. Comprehensive overview of the bladder cancer data set: Descriptive analysis.
(a) Histogram of cancer data set (b) Box plot of cancer data set
Figure 3. Histogram and box plots of bladder cancer data set.
(a) TTT plot of cancer data set (b) Violin plot of cancer data set
Figure 4. TTT and violin plots of bladder cancer data set.
In Figure 6, the TTT plot of the data set indicates an increasing hazard rate function, and the histogram shows that the data are right-skewed.
Table 6. Maximum Likelihood Estimates and goodness-of-fit tests for data set I.
Table 7. A summary of the results from the information criteria analysis conducted on data set I.
Figure 5. The densities of the bladder cancer data set estimated using different distribution models.
Table 8. Investigation of customer waiting duration for bank services: A descriptive analysis.
(a) TTT plot of waiting time data set (b) Histogram of waiting time data set
Figure 6. TTT and histogram plots of waiting time data set.
In Figure 7, the violin plot shows that most values are centered around the median, and the box plot reveals the presence of outliers in the data.
The MEE model is considered the best model based on the information provided in Table 9 and Table 10. This is because it has the lowest values for AIC, HQIC, BIC, W*, A* and CAIC, indicating better model fit. Additionally, it has the highest p-value for the K-S statistic and log-likelihood function value, further supporting its superiority compared to other models.
Figure 8 illustrates a plot of fitted densities, comparing the MEE distribution to its sub-models using the waiting time data set. The plot reveals that the MEE distribution demonstrates a favorable and encouraging fit when compared to the existing distributions.
8. Conclusion
In this paper, we introduced a new four-parameter model called the Modi Exponentiated Exponential (MEE) distribution and applied it to two real data sets. We examined the mathematical and statistical properties of the proposed distribution and derived expressions for its rth moment, survival function, hazard rate function, cumulative distribution function, and quantile function. Furthermore, various plots showed that the MEE distribution takes different shapes, indicating its versatility in fitting data sets with diverse distributions. We also obtained the Probability Density Function (PDF) of its minimum and maximum order statistics.
(a) Violin plot of waiting data set (b) Box plot of waiting data set
Figure 7. Violin and box plots of waiting data set.
Table 9. Maximum likelihood estimation and goodness-of-fit analysis for data set II.
Table 10. A summary of the results from the information criteria analysis conducted on data set II.
Figure 8. Histogram and fitted densities of the waiting time data set for different distributions.
To estimate the parameters of the MEE distribution, we employed the method of maximum likelihood estimation. Monte Carlo simulation was used to assess the performance of the MLEs. The study found that the MLEs estimate the model parameters accurately and consistently: as the sample size increases, the MLEs approach the true parameter values, as indicated by the decreasing ABs, and the RMSEs also decrease. Our analysis demonstrates that the MEE distribution outperforms the existing distributions in modeling the two data sets considered in this study.
Acknowledgements
The authors would like to acknowledge and thank the Pan African University, Institute for Basic Sciences, Technology, and Innovation (PAUSTI) for their support in conducting this study.