Bayesian Predictive Analyses for Logarithmic Non-Homogeneous Poisson Process in Software Reliability

1. Introduction
Software has become a driver of nearly every activity in the 21st century, from elementary education to genetic engineering. Because of this dependency, computer systems have grown in size and complexity, which makes their reliability a serious problem, since failures are prone to occur during operation. To avoid failures and faults, software reliability needs to be studied during development so that reliable software is produced. Software reliability is therefore of great concern to developers.
Software reliability is defined as the probability of failure-free software operation for a specified period of time in a specified environment [1] . With the increasing demand for software with zero defects, predicting the reliability of software systems is gaining importance [2] . Software reliability is achieved through testing during the software development stage [3] . Software reliability modeling estimates the form of the failure rate curve by statistically estimating the parameters of the selected model. In most cases, the reliability development of a complex system takes place by testing the system until it fails, making repairs and design changes, and then testing it again. This process continues until a desired level of reliability is achieved [4] . The purpose of such modeling is to estimate the extra execution time during test required to meet a specified reliability objective and to identify the expected reliability of the software when the product is released. During reliability modeling, software systems are tested in an environment that resembles the operational environment [5] .
Over the past decades, many software reliability models that can be used for predictive analyses have been proposed by different authors [6] [7] . The Musa-Okumoto reliability model, also known as the logarithmic model, was developed by Musa and Okumoto in 1984, who reported it to be more accurate than the exponential model. The Musa-Okumoto software reliability model is a non-homogeneous Poisson process (NHPP) software model with intensity function given by
(1)
The model is based on the following assumptions: failures observed during execution time are caused by faults remaining in the software; whenever a failure is observed, an instantaneous effort is made to find its cause, and the faults are removed prior to future tests; and, unlike in other models, each repair reduces the number of future faults. A good reliability model has two main aspects: it must remain stable during the entire testing period for any particular testing environment, and it must provide a reasonably accurate prediction of reliability [8] . The Musa-Okumoto (1984) model has been used in various testing environments, and in many instances it provides good estimation and prediction of software reliability. When compared with other models on industrial data sets, the Musa-Okumoto model was reported to be the best performer in terms of fit and predictive capability [5] .
The Musa-Okumoto software reliability growth model has been applied widely, as it is one of the best predictive models and belongs to the models selected in the AIAA recommended practice standard on software reliability [9] , [10] . The Musa-Okumoto model has also been used in software cost estimation models with high accuracy [11] [12] [13] . Critical reviews and categorizations of software reliability have been carried out by many researchers [14] [15] . Predictive analyses for this model are, however, missing in the literature. This paper therefore presents Bayesian single-sample predictive inference for the Musa-Okumoto software reliability model.
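To make the model concrete, the following minimal sketch evaluates the Musa-Okumoto intensity and mean value functions. It assumes the standard textbook parameterization, with initial failure intensity `lam0` and intensity decay parameter `theta`; these names are illustrative and may differ from the symbols used in Equation (1).

```python
import math


def mo_intensity(t, lam0, theta):
    """Failure intensity lam0 / (1 + lam0 * theta * t) at execution time t.

    Assumed standard Musa-Okumoto form; lam0 and theta are illustrative names.
    """
    return lam0 / (1.0 + lam0 * theta * t)


def mo_mean(t, lam0, theta):
    """Expected cumulative number of failures by time t: ln(1 + lam0*theta*t) / theta."""
    return math.log1p(lam0 * theta * t) / theta


def mo_reliability(s, t, lam0, theta):
    """Probability of no failure in (t, t + s] for an NHPP with this mean function."""
    expected = mo_mean(t + s, lam0, theta) - mo_mean(t, lam0, theta)
    return math.exp(-expected)
```

The intensity decreases with `t`, which reflects reliability growth: the expected number of failures keeps increasing, but only logarithmically in time.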
2. Bayesian Methodology
The Bayesian method owes its name to the fundamental role of Bayes’ theorem. In Bayesian reasoning, uncertainty is attributed not only to the data but also to the parameters. Therefore, all parameters are modelled by distributions. Before any data are obtained, the knowledge about the parameters of a problem is expressed in the prior distribution of the parameters. Given actual data, the prior distribution and the data are combined into the posterior distribution of the parameters. The posterior distribution summarizes our knowledge about the parameters after observing the data.
In this paper we assume that reliability growth testing is performed on a computer software system and that the number of failures in the time interval
, denoted by
is observed. We also assume that
follows the NHPP with intensity given in Equation (1). Let
be the successive failure times. When testing stops after a pre-determined number n of failures has been observed, the failure data are said to be failure-truncated. We denote the n failure times by
where
. The data are said to be time-truncated when testing is observed for a fixed time t. We denote the corresponding observed data by
, where
.
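The two truncation schemes can be illustrated by simulating failure times. The sketch below assumes the standard Musa-Okumoto mean value function ln(1 + lam0·theta·t)/theta (the names `lam0` and `theta` are illustrative) and uses the time-transformation method: arrival times of a unit-rate homogeneous Poisson process are mapped through the inverse mean value function.

```python
import math
import random


def mo_mean_inverse(u, lam0, theta):
    """Time t at which the assumed mean value function equals u."""
    return math.expm1(theta * u) / (lam0 * theta)


def simulate_failure_truncated(n, lam0, theta, rng):
    """Failure-truncated data: stop after exactly n failures."""
    times, u = [], 0.0
    for _ in range(n):
        u += rng.expovariate(1.0)  # next arrival of a unit-rate Poisson process
        times.append(mo_mean_inverse(u, lam0, theta))
    return times


def simulate_time_truncated(t_end, lam0, theta, rng):
    """Time-truncated data: keep all failures observed in (0, t_end]."""
    times, u = [], 0.0
    while True:
        u += rng.expovariate(1.0)
        t = mo_mean_inverse(u, lam0, theta)
        if t > t_end:
            return times
        times.append(t)
```

In the failure-truncated case the number of failures is fixed and the stopping time is random; in the time-truncated case the stopping time is fixed and the number of failures is random.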
2.1. Issues in Prediction
In this paper we address four issues in single-sample prediction, listed as 1) to 4) below, which are closely associated with the development testing program of a software system. We consider a single software system and assume that its cumulative failure times follow the Musa-Okumoto software reliability growth model, with observed data either
or
. Based on
or
, we are interested in the following problems:
1) What is the probability that at most k software failures will occur in the future time period
with
?
2) Given that the pre-determined target value
for the failure rate of the software undergoing development testing is not achieved at time T, what is the probability that the target value
will be achieved at time
?
3) Suppose that the target value
for the software failure rate is not achieved at time T, how long will it take for the software failure rate to reach
?
4) What is the upper prediction limit (UPL) of
with level
.
being a pre-determined value greater than T?
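As a point of comparison for issues 1) and 3), the sketch below gives simple plug-in answers that treat the parameters as known. It is not the Bayesian solution developed in this paper; it assumes the standard Musa-Okumoto forms with illustrative parameter names `lam0` and `theta`, and it ignores parameter uncertainty entirely.

```python
import math


def mo_mean(t, lam0, theta):
    """Assumed mean value function ln(1 + lam0*theta*t) / theta."""
    return math.log1p(lam0 * theta * t) / theta


def prob_at_most_k(k, t_now, t_future, lam0, theta):
    """Plug-in P(at most k failures in (t_now, t_future]) for an NHPP."""
    mu = mo_mean(t_future, lam0, theta) - mo_mean(t_now, lam0, theta)
    term, total = math.exp(-mu), 0.0
    for j in range(k + 1):
        total += term            # Poisson probability of exactly j failures
        term *= mu / (j + 1)
    return total


def time_to_reach_intensity(target, lam0, theta):
    """Time at which the assumed intensity lam0 / (1 + lam0*theta*t) equals target."""
    if target >= lam0:
        return 0.0               # target already met at time zero
    return (1.0 / target - 1.0 / lam0) / theta
```

The Bayesian treatment replaces these point values with probabilities averaged over the posterior distribution of the parameters, which is what issues 2) and 4) require.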
2.2. Prior, Posterior and Predictive Distributions
Let
represent
or
. The joint density of
is therefore
(2)
Case 1:
, the shape parameter, is known. We adopt the following non-informative prior distribution for
:
(3)
The posterior distribution of
is thus given by
(4)
Let
be the random variable being predicted. The predictive density of
is
(5)
Hence, the Bayesian UPL of
with level
, denoted as
, must satisfy
(6)
Case 2: The shape parameter
is unknown; we consider the following joint prior distribution of
and
where both parameters are assumed to be independent.
(7)
Thus the corresponding joint posterior distribution for
and
is given as
(8)
Equation (8) is similar to Equation (4). Let
be the random variable predicted. The predictive density of
is
(9)
and the Bayesian UPL denoted by
of
with level
similar to Equation (6) is
(10)
3. Main Results for Prediction Using Non-Informative Priors
In this section we address the four issues stated in Section 2.1 using the Bayesian approach. The main results are presented as propositions, and their proofs are given in the Appendix. Below, we use
to represent
the percentage point of the chi-square distribution with n degrees of freedom such that
, and define Poisson
and gamma
. In all subsequent propositions, the prior is assumed to be Equation (3) when the shape parameter is known and Equation (7) when it is unknown.
Proposition 1 (issue 1)
The probability that at most k failures will occur in the time interval
with
is
(11)
Proposition 2 (issue 2)
The probability that the target value
will be achieved at time
(
) is
(12)
Proposition 3 (issue 3)
For a given level
, the time
required to attain
is
(13)
Remark 1: For the second part of Equation (13),
is the solution to the equation
. (14)
Proposition 4 (issue 4)
The Bayesian UPL of
with level
is
(15)
Remark 2: For the second part of Equation (15),
is the solution to
(16)
4. Real Example
We used the time-between-failures data described in [16] to illustrate the methodology developed for single-sample Bayesian predictive analysis. We conducted the goodness-of-fit test presented in [17] and found that the data follow the Musa-Okumoto process. On the basis of this data set, the maximum likelihood estimates for the parameters
and
of the Musa-Okumoto growth model were obtained as
and
, respectively.
1) Suppose we are interested in the probability
that at most k failures will occur in a future time period
. a) For the case
known, we take its maximum likelihood estimate as its true value, i.e.
. Using the first formula in Equation (11), we have
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
. b) When
is unknown, from the second formula of Equation (11), we obtain
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
Figure 1 shows the graph of desired probabilities when
is known and when it is unknown.
From the graph it can be seen that the probability that at most 15 failures will occur during that time interval is higher when
is unknown as compared to when it is known. 2) Suppose the target value is given by
chosen arbitrarily. At the time
, the MLE of the achieved failure rate for this software is
, which is greater than
, so the target has not been achieved at time
and development testing will continue. Suppose we want to find the probability that the target value
will be achieved at the time
. a) When
is known (say,
), from the first formula in Equation (12), we obtain
, which is very small, and hence the target value is unlikely to be achieved. b) When
is unknown, from the second formula in Equation (12) we have
computed by the Monte Carlo method of integration based on a sample of size
. This shows that, when
is unknown, there is a possibility of achieving the target value at time
.
Figure 1. The graph of the probabilities
that at most k failures will occur in the time interval (180, 250] for the cases of
known and unknown.
3) Since the target value
was not achieved at
, we want to know how long it will take for the target value to be achieved. a) When
is known (say,
), let
, from the first formula in Equation (13) we obtain
. This means that it will take another 538.7523 hours to achieve the desired failure rate. b) When
is unknown, from the second formula in Equation (13) and Remark 1, we obtain
. Thus, it takes another 414 hours to achieve the desired failure rate when
is unknown, which is a considerable reduction in time compared to when
is known. 4) Given
, from the first formula in Equation (15),
the Bayesian UPL of
with level
is given by
.
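The Monte Carlo integration used for the unknown-parameter cases can be illustrated on a simpler problem where the exact answer is available for checking. The sketch below is a toy example, not the integrand of Equation (11) or Equation (12): it assumes a rate with a gamma posterior (hypothetical `shape` and `rate` values) and a Poisson count over a hypothetical `exposure`, averages the Poisson probability over posterior draws, and compares the result with the closed-form negative binomial probability.

```python
import math
import random


def poisson_cdf(k, mu):
    """P(N <= k) for N ~ Poisson(mu)."""
    term, total = math.exp(-mu), 0.0
    for j in range(k + 1):
        total += term
        term *= mu / (j + 1)
    return total


def mc_prob_at_most_k(k, shape, rate, exposure, n_draws, rng):
    """Monte Carlo average of the Poisson CDF over gamma posterior draws."""
    total = 0.0
    for _ in range(n_draws):
        lam = rng.gammavariate(shape, 1.0 / rate)  # second argument is the scale
        total += poisson_cdf(k, lam * exposure)
    return total / n_draws


def exact_prob_at_most_k(k, shape, rate, exposure):
    """Closed form (negative binomial) for integer shape, used as a check."""
    p = rate / (rate + exposure)
    return sum(math.comb(shape + j - 1, j) * p**shape * (1.0 - p)**j
               for j in range(k + 1))
```

The same averaging idea applies when no closed form exists: draw parameter values from the posterior, evaluate the conditional probability for each draw, and take the mean.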
5. Conclusions
In software development, predictive analysis is very important, as it helps the software developer to make trade-off decisions at the right time. In this paper, explicit solutions to predictive issues that may arise during the development process were derived using the Bayesian approach. These solutions are helpful to software developers in many instances, such as allocating resources, deciding when to terminate the testing process, and identifying modifications needed in the software before termination.
The study used the Bayesian approach with non-informative priors to derive these solutions. In all cases where the shape parameter was known, the posterior and predictive distributions had closed forms, whereas when it was unknown they had no closed forms and the study used Markov Chain Monte Carlo (MCMC). The Bayesian approach was used because it has advantages over the classical approach: it is applicable to small sample sizes, allows the input of prior information about the reliability growth process, and provides full posterior and predictive distributions [6] .
It would also be interesting to look at two-sample prediction for the Musa-Okumoto (1984) model, following the procedures used in [3] . The procedures presented in this paper can also be extended to other NHPP models, such as the Cox-Lewis process and the delayed S-shaped process. This is left open for future research.
Conflicts of Interest
The authors declare no conflicts of interest regarding the publication of this paper.
Appendix: Proofs of Propositions 1-4
We first state the following identity without proof:
(A.1)
where m is any positive integer, a and b are two real numbers such that
,
is an increasing and differentiable function and
.
Proof of Proposition 1
The probability that at most k failures will occur in the interval
is
. When
is known, we have
. (A.2)
where
is given by Equation (4) and
(A.3)
From Equation (2), we have
, and
Thus Equation (A.3) becomes
(A.4)
And hence Equation (A.2) becomes
(A.5)
The integral in Equation (A.5) equals 1, since the integrand is a gamma density with parameters j and
and hence Equation (A.5) reduces to
. (A.6)
This is the first formula of Equation (11).
When
is unknown, noting that
and
are given by Equation (A.4) and Equation (8) respectively, we obtain
(A.7)
Since the summation index k runs from n to
and the two uses of k are not the same, we replace the letter k with d in Equation (A.7), where
as used in Equation (8). Equation (A.7) then implies the second formula in Equation (11).
Proof of Proposition 2
Let
denote the posterior of
. Hence, the probability that the target value
will be achieved at time
is given by
(A.8)
When
is known, making the transformation
, we have
and
. Consequently, the posterior density of
is
(A.9)
From Equation (A.9), it can easily be noted that
has gamma distribution with parameters n and
. The gamma and Poisson distributions are related by
. (A.10)
By substituting Equation (A.9) and Equation (A.10) into Equation (A.8), we obtain the first formula of Equation (12).
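The gamma-Poisson relationship used in this step can be checked numerically. The sketch below uses the standard identity for an integer shape n and unit scale, namely that the gamma distribution function at x equals the probability that a Poisson variable with mean x is at least n; the exact form written in Equation (A.10) may be arranged differently, so the function names and this arrangement are assumptions.

```python
import math


def gamma_cdf_numeric(x, n, steps=20000):
    """Integrate the Gamma(n, 1) density over [0, x] with the trapezoid rule."""
    def pdf(y):
        return y ** (n - 1) * math.exp(-y) / math.factorial(n - 1)

    h = x / steps
    total = 0.5 * (pdf(0.0) + pdf(x))
    for i in range(1, steps):
        total += pdf(i * h)
    return total * h


def poisson_tail(x, n):
    """P(N >= n) for N ~ Poisson(x), computed as one minus the lower sum."""
    lower = sum(math.exp(-x) * x ** j / math.factorial(j) for j in range(n))
    return 1.0 - lower
```

For example, `gamma_cdf_numeric(2.5, 3)` and `poisson_tail(2.5, 3)` agree to within the numerical integration error, which is why gamma posterior probabilities can be written as finite Poisson sums.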
When
is unknown, making the transformations
and
, we obtain
and
. Note that the Jacobian is
. From Equation (8), the joint posterior density of
is
.
(A.11)
By substituting Equation (A.10) and Equation (A.11) into Equation (A.8), we obtain the second formula of Equation (12).
Proof of Proposition 3
For given level
, the time required to attain the target value
is
, where
satisfies Equation (44). When
is known, from Equation (46), it can easily be seen that
follows a chi-square distribution with 2n degrees of freedom. Thus we have
. (A.12)
and Equation (13) follows immediately.
The time required to attain the target
with level
when
is unknown is
where
is the solution to
. (A.13)
Proof of Proposition 4
For a pre-determined
, the Bayesian upper prediction limit for
with level
is
satisfying
. From Equation (A.8) and Equation (A.12), we have
, thus follows Equation (15). The second part follows similarly.