A Study of Quantitative Progress Evaluation Models for Open Source Projects

Abstract

Open source software (OSS) has become an indispensable part of society, for both personal and corporate use. Projects that develop and operate OSS are called open source projects, and their number is increasing. However, because anyone can participate in an open source project, project progress is uncertain owing to differences in project members' skills, development environments, and time zones of activity. Many users and companies therefore need to understand the development and operation status of open source projects, so that developers can make careful decisions on upgrading or installing new OSS. In this paper, we focus on maintenance effort estimation for open source projects under uncertainty, and we evaluate the project quantitatively using Earned Value Management (EVM). Moreover, we examine the appropriateness of the models for predicting maintenance effort expenditures. Furthermore, we discuss the appropriateness of this EVM method.


Sone, H., Tamura, Y. and Yamada, S. (2022) A Study of Quantitative Progress Evaluation Models for Open Source Projects. Journal of Software Engineering and Applications, 15, 183-196. doi: 10.4236/jsea.2022.155010.

1. Introduction

Open source software (OSS) is code designed to be accessible to everyone: anyone can view, modify, and distribute it as desired. It is often cheaper, more flexible, and longer-lasting than proprietary software, because it is developed by an open source community rather than a single author or company. However, because anyone can join an open source community, there is a high degree of uncertainty about project progress due to differences in project members' skills, development environments, and time frames of activity. It is therefore difficult to predict progress in open source projects, and many users and companies need to understand the development and operation status of open source projects in order to make decisions on upgrading or installing OSS.

Such issues have motivated research on progress forecasting in open source projects [1] [2] [3] [4]. Many of these studies estimate the effort required to resolve reported faults [1] [2] [3], and some estimate the maintenance effort required by individual developers [4].

On the other hand, few studies estimate the maintenance effort of an entire project and evaluate its stability. Tamura and Yamada [5] studied time-series prediction of maintenance effort for an entire open source project, but their evaluation of project progress is limited to simple effort prediction.

In this paper, we examine a method for evaluating project stability based on maintenance effort in open source projects. In particular, we use software reliability growth models [6] [7] [8] [9] to predict the maintenance effort of open source projects while considering uncertainty; stochastic approaches based on stochastic differential equations have also been applied in other research areas [10]. We then evaluate project stability quantitatively by using earned value management (EVM) [11]. Finally, we discuss which model is appropriate for predicting maintenance effort within this method.

2. Evaluation Approach for Open Source Project Stability

2.1. EVM: Overview

In this paper, we use EVM to evaluate the stability of open source projects. EVM is a project management methodology for measuring project performance and progress. Progress evaluation with EVM is used not only for software development but also for open source projects in various fields.

Using EVM, we can grasp the current cost and schedule status of a project. EVM measures project progress and performance with three basic indices, Earned Value (EV), Planned Value (PV), and Actual Cost (AC), as shown in Figure 1. By comparing these three indices, as in Table 1, we can quantitatively assess the current status of the project.
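The comparison of EV, PV, and AC is usually expressed through a few derived indices. The sketch below uses the conventional EVM definitions (schedule variance SV = EV − PV, cost variance CV = EV − AC, SPI = EV/PV, CPI = EV/AC); Table 1 may list a different selection, and the function names and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EvmIndices:
    sv: float   # schedule variance, EV - PV (negative: behind schedule)
    cv: float   # cost variance, EV - AC (negative: over budget)
    spi: float  # schedule performance index, EV / PV
    cpi: float  # cost performance index, EV / AC


def evm_indices(ev: float, pv: float, ac: float) -> EvmIndices:
    """Conventional EVM variances and performance indices at one status date."""
    if pv <= 0 or ac <= 0:
        raise ValueError("PV and AC must be positive to form ratio indices")
    return EvmIndices(sv=ev - pv, cv=ev - ac, spi=ev / pv, cpi=ev / ac)


def status(idx: EvmIndices) -> str:
    """Plain-language reading of the signs of SV and CV."""
    schedule = "on or ahead of schedule" if idx.sv >= 0 else "behind schedule"
    cost = "on or under budget" if idx.cv >= 0 else "over budget"
    return f"{schedule}, {cost}"


if __name__ == "__main__":
    # Hypothetical values in man-days, not taken from the LibreOffice data.
    idx = evm_indices(ev=80.0, pv=100.0, ac=90.0)
    print(idx)
    print(status(idx))  # behind schedule, over budget
```

An SPI or CPI below 1 carries the same message as a negative SV or CV, but as a ratio it is easier to compare across projects of different size.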

Sone et al. [12] verified the applicability of EVM to open source projects; however, PV could not be derived because of a difficulty in the method used to derive the effort. In this paper, we aim to derive the EVM indices, including PV, properly.

2.2. SRGM: Overview

In the testing process of software development, the number of potential faults in the software decreases as testing progresses, because considerable resources are spent on fault detection and correction. As a result, the probability of fault occurrence decreases with testing time, while software reliability and the time between fault occurrences increase. A model that describes this fault phenomenon is called a software reliability growth model (SRGM).

Figure 1. An example of EVM.

Table 1. Several examples of the indices used in EVM.

In this paper, we use three well-known SRGMs: the exponential model, the delayed S-shaped model, and the inflection S-shaped model. We apply these models to evaluate open source projects with EVM.

2.3. Effort Prediction Modeling for Open Source Projects

In the operation phase of open source projects, the time-dependent expenditure of maintenance effort is irregular because the skill levels of project members vary. As a result, effort expenditure in the operation phase becomes unstable.

The operation phases of many open source projects are influenced by external factors, triggered for example by differences in skill and by time lags in development and maintenance activities. Considering these points, we apply stochastic differential equation modeling to the management of open source projects. Let Ω(t) be the cumulative maintenance effort expenditure, covering activities such as finding software faults and improving functionality, up to operational time t (t ≥ 0) in the open source project. Suppose that Ω(t) takes continuous real values and gradually increases as operation proceeds. Based on the SRGM approach [6] [7], the following linear differential equation for the maintenance effort expenditure can be formulated:

$$\frac{d\Omega(t)}{dt} = \beta(t)\{\alpha - \Omega(t)\}, \tag{1}$$

where β(t) is a non-negative function representing the increase rate of maintenance effort at operational time t, and α is the estimated total maintenance effort expenditure required until the end of operation.

To account for irregular fluctuations in effort expenditure, we extend Equation (1) to the following stochastic differential equation with Brownian motion [13]:

$$\frac{d\Omega(t)}{dt} = \{\beta(t) + \sigma\nu(t)\}\{\alpha - \Omega(t)\}, \tag{2}$$

where σ is a positive constant representing the magnitude of the irregular fluctuation, and ν(t) is a standardized Gaussian white noise. Using Itô's formula [14], we obtain the solution of Equation (2) under the initial condition Ω(0) = 0 as follows:

$$\Omega(t) = \alpha\left[1 - \exp\left\{-\int_0^t \beta(s)\,ds - \sigma\omega(t)\right\}\right], \tag{3}$$

where ω(t) is the Wiener process, formally defined as the integral of the white noise ν(t) with respect to time t. Moreover, following [15], we define the increase rate of maintenance effort β(t) through:

$$\int_0^t \beta(s)\,ds \equiv \int_0^t \frac{dF^*(s)/ds}{\alpha - F^*(s)}\,ds. \tag{4}$$

In this paper, we adopt the following SRGM-based functions as F*(t), the cumulative maintenance effort expenditure function of the proposed model:

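$$F_e(t) \equiv \alpha\left(1 - e^{-\beta t}\right), \tag{5}$$

$$F_s(t) \equiv \alpha\left\{1 - (1 + \beta t)e^{-\beta t}\right\}, \tag{6}$$

$$F_i(t) \equiv \frac{\alpha\left\{1 - \exp(-\beta t)\right\}}{1 + c\exp(-\beta t)}, \tag{7}$$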

Here, Ω_e(t), Ω_s(t) and Ω_i(t) denote the cumulative maintenance effort expenditures for the exponential, delayed S-shaped, and inflection S-shaped software reliability growth models, obtained with F_e(t), F_s(t) and F_i(t), respectively, and c is the inflection parameter of the inflection S-shaped model.
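As a check on Equation (4), the sketch below evaluates the three mean value functions (5)-(7) together with the rate β(t) = {dF*(t)/dt}/{α − F*(t)} that Equation (4) implies for each model. The closed forms β, β²t/(1 + βt), and β/(1 + c e^{−βt}) follow from differentiating Equations (5)-(7); the function names and parameter values are illustrative assumptions, and the finite-difference comparison is only a sanity test.

```python
import math


def f_exp(t, alpha, beta):
    """Exponential SRGM mean value function, Equation (5)."""
    return alpha * (1.0 - math.exp(-beta * t))


def f_delayed(t, alpha, beta):
    """Delayed S-shaped SRGM mean value function, Equation (6)."""
    return alpha * (1.0 - (1.0 + beta * t) * math.exp(-beta * t))


def f_inflection(t, alpha, beta, c):
    """Inflection S-shaped SRGM mean value function, Equation (7)."""
    e = math.exp(-beta * t)
    return alpha * (1.0 - e) / (1.0 + c * e)


# Closed-form rates beta(t) = F'(t) / (alpha - F(t)); alpha cancels out.
def rate_exp(t, beta):
    return beta


def rate_delayed(t, beta):
    return beta * beta * t / (1.0 + beta * t)


def rate_inflection(t, beta, c):
    return beta / (1.0 + c * math.exp(-beta * t))


def numeric_rate(f, t, alpha, h=1e-6):
    """Central-difference estimate of F'(t) / (alpha - F(t))."""
    deriv = (f(t + h) - f(t - h)) / (2.0 * h)
    return deriv / (alpha - f(t))


if __name__ == "__main__":
    alpha, beta, c, t = 1000.0, 0.05, 3.0, 20.0  # hypothetical parameter values
    checks = [
        (lambda x: f_exp(x, alpha, beta), rate_exp(t, beta)),
        (lambda x: f_delayed(x, alpha, beta), rate_delayed(t, beta)),
        (lambda x: f_inflection(x, alpha, beta, c), rate_inflection(t, beta, c)),
    ]
    for f, closed_form in checks:
        print(closed_form, numeric_rate(f, t, alpha))
```

The constant rate of the exponential model is what yields the βt term in Equation (8), whereas both S-shaped models have rates that start low and rise toward β, which matches a slow start in effort expenditure.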

Substituting Equations (5)-(7) into Equation (3) through Equation (4), the cumulative maintenance efforts Ω_e(t), Ω_s(t) and Ω_i(t) up to time t are obtained as follows:

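$$\Omega_e(t) = \alpha\left[1 - \exp\{-\beta t - \sigma\omega(t)\}\right], \tag{8}$$

$$\Omega_s(t) = \alpha\left[1 - (1 + \beta t)\exp\{-\beta t - \sigma\omega(t)\}\right], \tag{9}$$

$$\Omega_i(t) = \alpha\left[1 - \frac{1 + c}{1 + c\exp(-\beta t)}\exp\{-\beta t - \sigma\omega(t)\}\right]. \tag{10}$$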
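Using the rate definition in Equation (4), the step from Equation (3) to Equations (8)-(10) can be written out explicitly. This short derivation is added for clarity and relies only on F*(0) = 0, which holds for all of Equations (5)-(7):

$$\int_0^t \beta(s)\,ds = \int_0^t \frac{dF^*(s)/ds}{\alpha - F^*(s)}\,ds = -\ln\left\{1 - \frac{F^*(t)}{\alpha}\right\} \;\Longrightarrow\; \Omega(t) = \alpha\left[1 - \left\{1 - \frac{F^*(t)}{\alpha}\right\}e^{-\sigma\omega(t)}\right],$$

$$1 - \frac{F_e(t)}{\alpha} = e^{-\beta t}, \qquad 1 - \frac{F_s(t)}{\alpha} = (1 + \beta t)e^{-\beta t}, \qquad 1 - \frac{F_i(t)}{\alpha} = \frac{(1 + c)e^{-\beta t}}{1 + c e^{-\beta t}}.$$

Inserting each of the three factors into the expression for Ω(t) gives Equations (8), (9), and (10), respectively.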

In these models, we assume that σ reflects the noise caused by external factors arising from various triggers in open source projects. Because ω(t) is normally distributed with mean 0 and variance t, E[exp{−σω(t)}] = exp(σ²t/2), so the expected cumulative maintenance effort expenditures up to time t are obtained as follows:

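$$E[\Omega_e(t)] = \alpha\left[1 - \exp\left\{-\beta t + \frac{\sigma^2}{2}t\right\}\right], \tag{11}$$

$$E[\Omega_s(t)] = \alpha\left[1 - (1 + \beta t)\exp\left\{-\beta t + \frac{\sigma^2}{2}t\right\}\right], \tag{12}$$

$$E[\Omega_i(t)] = \alpha\left[1 - \frac{1 + c}{1 + c\exp(-\beta t)}\exp\left\{-\beta t + \frac{\sigma^2}{2}t\right\}\right]. \tag{13}$$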
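The expectation step from Equation (8) to Equation (11) can be checked by simulation. The sketch below draws ω(t) from N(0, t), the marginal distribution of a standard Wiener process at time t, evaluates the sample-path formula (8) directly, and compares the sample mean with Equation (11). It does not integrate Equation (2) numerically; the parameter values and function names are hypothetical.

```python
import math
import random


def omega_e(t, alpha, beta, sigma, w):
    """Sample value of Equation (8) for a given Wiener-process value w = omega(t)."""
    return alpha * (1.0 - math.exp(-beta * t - sigma * w))


def expected_omega_e(t, alpha, beta, sigma):
    """Closed-form expectation, Equation (11)."""
    return alpha * (1.0 - math.exp(-beta * t + 0.5 * sigma ** 2 * t))


def monte_carlo_mean(t, alpha, beta, sigma, n=200_000, seed=1):
    """Average Equation (8) over n independent draws of omega(t) ~ N(0, t)."""
    rng = random.Random(seed)
    total = sum(omega_e(t, alpha, beta, sigma, rng.gauss(0.0, math.sqrt(t)))
                for _ in range(n))
    return total / n


if __name__ == "__main__":
    t, alpha, beta, sigma = 10.0, 500.0, 0.1, 0.05  # hypothetical values
    print(monte_carlo_mean(t, alpha, beta, sigma), expected_omega_e(t, alpha, beta, sigma))
```

With σ = 0 the noise vanishes and Equation (11) reduces to the deterministic exponential model, which is a quick way to confirm the sign of the σ²t/2 term.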

2.4. Derivation of EVM for Open Source Project

In EVM for open source projects, the data periods used for Planned Value (PV) and Actual Cost (AC) differ. Both PV and AC are derived from bug tracking system data on the effort expended by fault reporters and fault correctors. We assume that the project period runs from the OSS release to its EOL (End of Life). To derive PV, we fit Equations (8)-(13) to the maintenance effort data up to the OSS release. In particular, the parameter α in Equations (8)-(13), estimated at the time of the OSS release, represents the estimated total maintenance effort. Therefore, α can be regarded as the Budget at Completion (BAC) in EVM. AC, by contrast, uses the maintenance effort data including the period after the OSS release. However, the data used to derive PV and AC start at the same time.

Earned Value (EV) is the cumulative maintenance effort expenditure expressed on the same scale as the project budget (BAC). Therefore, if the OSS development effort increases but faults are not resolved, EV remains small, and the project is regarded as inefficient. To derive EV, we use the number of potential faults predicted from the fault data reported up to the OSS release; this number is predicted with Equations (8)-(13). We then derive the fault resolving cost, i.e., the BAC divided by the number of potential faults, as follows:

$$\gamma = \frac{\mathrm{BAC}}{p}. \tag{14}$$

where γ is the fault resolving cost and p is the number of potential faults at OSS release. Using γ and the cumulative number of resolved faults up to operating time t, we derive EV for the cases of F_e(t), F_s(t) and F_i(t) as follows:

$$EV_e(t) = \gamma\,\alpha_f\left[1 - \exp\{-\beta_f t - \sigma_f\omega(t)\}\right], \tag{15}$$

$$EV_s(t) = \gamma\,\alpha_f\left[1 - (1 + \beta_f t)\exp\{-\beta_f t - \sigma_f\omega(t)\}\right], \tag{16}$$

$$EV_i(t) = \gamma\,\alpha_f\left[1 - \frac{1 + c_f}{1 + c_f\exp(-\beta_f t)}\exp\{-\beta_f t - \sigma_f\omega(t)\}\right]. \tag{17}$$

Here, α_f, β_f, c_f and σ_f are the parameters of the model for the cumulative number of resolved faults at time t. The expected EV values up to operational time t are then obtained as follows:

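$$E[EV_e(t)] = \gamma\,\alpha_f\left[1 - \exp\left\{-\beta_f t + \frac{\sigma_f^2}{2}t\right\}\right], \tag{18}$$

$$E[EV_s(t)] = \gamma\,\alpha_f\left[1 - (1 + \beta_f t)\exp\left\{-\beta_f t + \frac{\sigma_f^2}{2}t\right\}\right], \tag{19}$$

$$E[EV_i(t)] = \gamma\,\alpha_f\left[1 - \frac{1 + c_f}{1 + c_f\exp(-\beta_f t)}\exp\left\{-\beta_f t + \frac{\sigma_f^2}{2}t\right\}\right]. \tag{20}$$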

A fault is counted as resolved when its status in the bug tracking system is Closed.
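Equations (14) and (18)-(20) combine into a small helper. In this minimal sketch, `fault_resolving_cost` implements Equation (14) and `expected_ev` evaluates Equations (18)-(20). The function names, the model keys, and all numbers are hypothetical and are not the fitted LibreOffice values.

```python
import math


def fault_resolving_cost(bac, potential_faults):
    """Equation (14): gamma = BAC / p."""
    if potential_faults <= 0:
        raise ValueError("number of potential faults must be positive")
    return bac / potential_faults


def expected_resolved_faults(t, model, alpha_f, beta_f, sigma_f, c_f=0.0):
    """Bracketed term alpha_f[...] of Equations (18)-(20)."""
    noise = math.exp(-beta_f * t + 0.5 * sigma_f ** 2 * t)
    if model == "exponential":
        factor = 1.0
    elif model == "delayed":
        factor = 1.0 + beta_f * t
    elif model == "inflection":
        factor = (1.0 + c_f) / (1.0 + c_f * math.exp(-beta_f * t))
    else:
        raise ValueError(f"unknown model: {model}")
    return alpha_f * (1.0 - factor * noise)


def expected_ev(t, gamma, model, alpha_f, beta_f, sigma_f, c_f=0.0):
    """Equations (18)-(20): expected EV = gamma times expected resolved faults."""
    return gamma * expected_resolved_faults(t, model, alpha_f, beta_f, sigma_f, c_f)


if __name__ == "__main__":
    gamma = fault_resolving_cost(bac=3000.0, potential_faults=300.0)  # 10.0 per fault
    for week in (10, 50, 100):
        print(week, expected_ev(week, gamma, "inflection", 800.0, 0.04, 0.01, c_f=5.0))
```

Because EV scales with the resolved-fault count rather than with effort spent, EV can exceed BAC whenever the resolved faults outnumber the potential faults p estimated at release.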

In this paper, PV, AC, and EV are all derived from the data set obtained from the bug tracking system. Table 2 summarizes the EVM terms as we interpret them for open source projects, given how these indices are derived.

Table 2. Explanation for EVM used in this research.

3. Numerical Examples

3.1. Data Set

In this paper, we use an open source project data set to derive the EVM indices. To apply the proposed model to actual project data, we use data for LibreOffice [16] obtained from Bugzilla. LibreOffice is an open source office suite provided by The Document Foundation. The effort and fault data obtained from Bugzilla for estimating PV and AC are for version 7.2. The cumulative numbers of reported faults in the data used for PV and AC are 298 and 878, respectively. To estimate PV, we use about 39 weeks of project data before the release of LibreOffice version 7.2; to estimate AC, we use about 112 weeks of project data. Each data point covers one week.
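Weekly cumulative series of this kind can be built from per-bug records along the following lines. This is a sketch under assumptions: the `(week_index, amount)` record layout is hypothetical, since the paper does not describe its export format, and real Bugzilla data would first need the relevant dates (for example, the transition to Closed) mapped to week indices.

```python
from collections import defaultdict


def weekly_cumulative(events, n_weeks):
    """Accumulate (week_index, amount) pairs into a cumulative weekly series.

    Use amount = 1 to count faults, or an effort figure (e.g. man-days)
    to build a cumulative effort series. Weeks outside [0, n_weeks) are ignored.
    """
    per_week = defaultdict(float)
    for week, amount in events:
        if 0 <= week < n_weeks:
            per_week[week] += amount
    series, running = [], 0.0
    for week in range(n_weeks):
        running += per_week[week]
        series.append(running)
    return series


if __name__ == "__main__":
    # Hypothetical week indices at which bugs reached the Closed status.
    closed = [(0, 1), (0, 1), (1, 1), (3, 1), (3, 1), (3, 1), (5, 1)]
    print(weekly_cumulative(closed, n_weeks=6))  # [2.0, 3.0, 3.0, 6.0, 6.0, 7.0]
```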

3.2. Estimation of EVM Indices

In this section, we estimate the parameters of the three SRGMs for the maintenance effort and the number of faults in the LibreOffice version 7.2 project, and we compare the models to identify the most appropriate one.

Table 3 shows the parameter estimates for maintenance effort in terms of PV, together with the AIC (Akaike's Information Criterion) used to compare the models. The parameter α in the PV data can be regarded as the BAC. In terms of AIC, the delayed S-shaped model fits the PV data best. Figure 2 shows the results of applying the delayed S-shaped model to the open source project data.
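The estimation procedure behind Table 3 is not detailed here, so the following is only a sketch of one common way to compare the models by AIC. It assumes least-squares fitting of the mean value functions (5)-(7), ignoring σ, and a Gaussian-error AIC, AIC = n ln(RSS/n) + 2k. For each β (and c) on a coarse grid, α is solved in closed form. The data are synthetic, and a proper optimizer would replace the grid in practice.

```python
import math


def shape(model, t, beta, c=0.0):
    """Normalized mean value function F*(t)/alpha for Equations (5)-(7)."""
    e = math.exp(-beta * t)
    if model == "exponential":
        return 1.0 - e
    if model == "delayed":
        return 1.0 - (1.0 + beta * t) * e
    if model == "inflection":
        return (1.0 - e) / (1.0 + c * e)
    raise ValueError(f"unknown model: {model}")


def fit(model, ts, ys, betas, cs=(0.0,)):
    """Grid search over beta (and c); alpha is the least-squares optimum for each."""
    best = None
    for beta in betas:
        for c in cs:
            g = [shape(model, t, beta, c) for t in ts]
            alpha = sum(y * gi for y, gi in zip(ys, g)) / sum(gi * gi for gi in g)
            rss = sum((y - alpha * gi) ** 2 for y, gi in zip(ys, g))
            if best is None or rss < best[0]:
                best = (rss, alpha, beta, c)
    return best


def aic(rss, n, k):
    """Gaussian-error AIC up to an additive constant shared by all models."""
    return n * math.log(rss / n) + 2 * k


if __name__ == "__main__":
    ts = list(range(1, 40))
    # Synthetic, slightly perturbed delayed S-shaped data (hypothetical).
    ys = [300.0 * shape("delayed", t, 0.12) * (1.0 + 0.01 * math.sin(t)) for t in ts]
    betas = [0.01 * i for i in range(1, 51)]
    for model, cs, k in (("exponential", (0.0,), 2), ("delayed", (0.0,), 2),
                         ("inflection", tuple(0.5 * j for j in range(21)), 3)):
        rss, alpha, beta, c = fit(model, ts, ys, betas, cs)
        print(model, round(alpha, 1), beta, c, round(aic(rss, len(ts), k), 2))
```

The 2k penalty is why the three-parameter inflection S-shaped model must reduce the residual error noticeably before it is preferred over the two-parameter models.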

Next, Table 4 shows the parameter estimates for AC, together with the AIC. Here, the parameter α can be regarded as the project's estimated AC. In terms of AIC, the delayed S-shaped model fits the AC data best. Figure 3 shows the results of applying the delayed S-shaped model to the open source project data.

Table 3. Parameter estimation of maintenance effort in terms of PV.

Figure 2. The cumulative maintenance effort expenditures as PV in the LibreOffice Ver. 7.2 project, using Equations (9) and (12).

Table 4. Parameter estimation of maintenance effort in terms of AC.

Figure 3. The cumulative maintenance effort expenditures as AC in the LibreOffice Ver. 7.2 project, using Equations (9) and (12).

Table 5 shows the parameter estimates for the number of potential faults at OSS release, together with the AIC. We use the estimated parameter α to derive the fault resolving cost. The AIC values of the three models do not differ significantly, so it is difficult to identify a suitable model for these data. For convenience, we adopt the exponential model, which has the smallest AIC. Figure 4 shows the results of applying the exponential model to the open source project data.

Finally, Table 6 shows the parameter estimates for the number of resolved faults to date, together with the AIC. In terms of AIC, the inflection S-shaped model is the best model for estimating the number of resolved faults. Figure 5 shows the results of applying the inflection S-shaped model to the open source project data.

Comparing the AIC values of the three models shows that the delayed S-shaped and inflection S-shaped models are appropriate. In the LibreOffice version 7.2 project data, the growth rates of maintenance effort and of the number of faults are small at the start of the maintenance phase, and these S-shaped models fit such data well.

In open source projects, the number of fault reports increases as the number of OSS users grows after the release of a particular version, and the effort required for fault maintenance increases accordingly. We therefore expect the same models to be appropriate for many other open source project data sets.

Table 5. Parameter estimation of number of potential faults in case of LibreOffice.

Figure 4. The cumulative estimated number of potential faults, using Equations (8) and (11).

Table 6. Parameter estimation of number of resolved faults in case of LibreOffice.

Figure 5. The cumulative estimated number of resolved faults, using Equations (10) and (13).

In this research, we derive the EVM indices by using the best-fitting model for each data set. The fault resolving cost γ = 662.419 man-days, one of the EVM indices, is required to derive EV. Figure 6 shows the estimated EV, AC, and PV.

Figure 6 shows that both EV and AC are larger than PV. In particular, EV is very large, because the number of resolved faults is estimated to be higher than the number of potential faults. On the other hand, around week 50, the time of the version 7.2 release, EV is lower than PV and AC, indicating that the project is in a delayed state. After version 7.2 was released, the project became more active as the number of users of that version increased.
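The reading of Figure 6 rests on the sign of the schedule variance EV − PV. A minimal sketch of flagging the delayed weeks from two cumulative series follows; the function name and the sample series are hypothetical and are not read from Figure 6.

```python
def delayed_weeks(pv, ev):
    """Return week indices where EV < PV, i.e. the schedule variance is negative."""
    if len(pv) != len(ev):
        raise ValueError("PV and EV series must have the same length")
    return [week for week, (p, e) in enumerate(zip(pv, ev)) if e < p]


if __name__ == "__main__":
    # Hypothetical cumulative series in man-days.
    pv = [0.0, 10.0, 20.0, 30.0, 40.0]
    ev = [0.0, 12.0, 18.0, 25.0, 45.0]
    print(delayed_weeks(pv, ev))  # [2, 3]
```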

Figure 6. EVM estimation results in LibreOffice project.

4. Conclusions

In this paper, we have examined a method for evaluating project stability based on SRGMs in open source projects. Using AIC, we have identified the appropriate models for the open source project and found that the delayed S-shaped and inflection S-shaped models fit best. We expect similar results for other open source projects, because of the characteristic way the number of OSS users changes over time. We have also derived the EVM indices by using the appropriate SRGMs. As a result, we have found that open source projects become more active after the release of a particular version.

Research on stability evaluation methods for open source projects has often focused on the resolution of individual faults. The practical application of EVM to evaluating the stability of open source projects as a whole is therefore expected to contribute to the future development of OSS. On the other hand, because the proposed method evaluates stability based on the cost of the entire open source project, it is difficult to identify the factors affecting project stability at the level of individual faults. We therefore consider that combining the proposed method with fault-based project evaluation methods will provide a better tool for evaluating project stability.

As only one open source project data set has been used in this paper, it is necessary to verify the characteristics of the trends in maintenance effort and number of faults by using multiple project data sets in the future.

Acknowledgements

This work was supported in part by the JSPS KAKENHI Grant No. 20K11799 in Japan.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Hooimeijer, P. and Weimer, W. (2007) Modeling Bug Report Quality. Proceedings of the Twenty-Second IEEE/ACM International Conference on Automated Software Engineering (ASE ’07), Georgia, 34-43.
[2] Giger, E., Pinzger, M. and Gall, H. (2010) Predicting the Fix Time of Bugs. Proceedings of the 2nd International Workshop on Recommendation Systems for Software Engineering, Cape Town, May 2010, 52-56.
https://doi.org/10.1145/1808920.1808933
[3] Marks, L., Zou, Y. and Hassan, A.E. (2011) Studying the Fix-Time for Bugs in Large Open Source Projects. Proceedings of the 7th International Conference on Predictive Models in Software Engineering (Promise ’11), Banff, Alberta, September 2011, Article No. 11.
https://doi.org/10.1145/2020390.2020401
[4] Mishra, R. and Sureka, A. (2014) Mining Peer Code Review System for Computing Effort and Contribution Metrics for Patch Reviewers. Proceedings of the 2014 IEEE 4th Workshop on Mining Unstructured Data, Victoria, 30 September 2014, 11-15.
https://doi.org/10.1109/MUD.2014.11
[5] Tamura, Y. and Yamada, S. (2017) Open Source Software Cost Analysis with Fault Severity Levels Based on Stochastic Differential Equation Models. Journal of Life Cycle Reliability and Safety Engineering, 6, 31-35.
https://doi.org/10.1007/s41872-017-0009-5
[6] Yamada, S. (2014) Software Reliability Modeling: Fundamentals and Applications, Springer-Verlag, Tokyo/Heidelberg.
[7] Lyu, M.R. (1996) Handbook of Software Reliability Engineering. IEEE Computer Society Press, Los Alamitos.
[8] Musa, J.D., Iannino, A. and Okumoto, K. (1987) Software Reliability: Measurement, Prediction, Application. McGraw-Hill, New York.
[9] Kapur, P.K., Pham, H., Gupta, A. and Jha, P.C. (2011) Software Reliability Assessment with OR Applications. Springer-Verlag, London.
https://doi.org/10.1007/978-0-85729-204-9
[10] Trung, N. (2018) Modeling Election Problem by a Stochastic Differential Equation. American Journal of Operations Research, 8, 441-447.
https://doi.org/10.4236/ajor.2018.86024
[11] Fleming, Q.E. and Koppelman, J.M. (2010) Earned Value Project Management. 4th Edition, PMI, Newtown Square.
[12] Sone, H., Tamura, Y. and Yamada, S. (2019) Statistical Maintenance Time Estimation Based on Stochastic Differential Equation Models in OSS Development Project. Computer Reviews Journal, 5, 126-140.
[13] Wong, E. (1971) Stochastic Processes in Information and Systems. McGraw-Hill, New York.
[14] Arnold, L. (1971) Stochastic Differential Equations: Theory and Applications. John Wiley & Sons, New York.
[15] Yamada, S., Kimura, M., Tanaka, H. and Osaki, S. (1994) Software Reliability Measurement and Assessment with Stochastic Differential Equations. IEICE Transactions on Fundamentals, E77-A, 109-116.
[16] LibreOffice.
https://ja.libreoffice.org/

Copyright © 2024 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.