TITLE:
An Approach to Detect Structural Development Defects in Object-Oriented Programs
AUTHORS:
Maxime Seraphin Gnagne, Mouhamadou Dosso, Mamadou Diarra, Souleymane Oumtanaga
KEYWORDS:
Object-Oriented Programming, Structural Development Defect Detection, Software Maintenance, Pre-Trained Models, Features Extraction, Bagging, Neural Network
JOURNAL NAME:
Open Journal of Applied Sciences, Vol.14 No.2, February 29, 2024
ABSTRACT: Structural development defects essentially refer to
code structures that violate object-oriented design principles. They make
program maintenance challenging and deteriorate software quality over time.
Various detection approaches, ranging from traditional heuristic algorithms to
machine learning methods, are used to identify these defects. Ensemble learning
methods have strengthened the detection of these defects. However, existing
approaches do not simultaneously exploit the ability of pre-trained models to
extract relevant features and the performance of neural networks for the
classification task. Therefore, our goal was to design a model that combines a
pre-trained model, which extracts relevant features from code excerpts through
transfer learning, with a bagging method whose base estimator is a dense neural
network, for defect classification. To achieve this, we drew multiple samples
of the same size, with replacement, from the imbalanced MLCQ dataset. For all
the samples, we used the CodeT5-small variant to extract features and trained a
bagging method, with the Roberta Classification Head neural network as its base
estimator, to classify defects based on these features. We then compared this
model to RandomForest, one of the ensemble methods that yields good results. Our
experiments showed that the number of base estimators to use for bagging
depends on the defect to be detected. Next, we observed that it was not
necessary to use a data balancing technique with our model when the imbalance
rate was 23%. Finally, for Blob detection, RandomForest had a median MCC value
of 0.36 compared to 0.12 for our method. However, our method performed better in
Long Method detection, with a median MCC value of 0.53 compared to 0.42 for
RandomForest. These results suggest that the performance of ensemble methods in
detecting structural development defects depends on the specific defect.
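
The sampling, aggregation, and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, majority voting is an assumed aggregation rule (the abstract does not state one), and feature extraction with CodeT5-small and the neural base estimator are left out. It only shows how equal-size samples drawn with replacement, a vote over base estimators, and the Matthews correlation coefficient (MCC) fit together for a binary defect label (1 = defect, 0 = no defect).

```python
import math
import random
from collections import Counter


def bootstrap_samples(dataset, n_samples, sample_size, seed=0):
    """Draw n_samples samples of equal size from dataset, with replacement."""
    rng = random.Random(seed)
    return [
        [rng.choice(dataset) for _ in range(sample_size)]
        for _ in range(n_samples)
    ]


def majority_vote(predictions_per_estimator):
    """Combine one prediction list per base estimator into one label per item.

    Assumption: plain majority voting. On a tie, the label seen first wins,
    so an odd number of estimators avoids ties for binary labels.
    """
    return [
        Counter(votes).most_common(1)[0][0]
        for votes in zip(*predictions_per_estimator)
    ]


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = defect)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0.0 when the denominator is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which makes it a reasonable choice for an imbalanced dataset; this is consistent with the abstract reporting median MCC values rather than accuracy.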