ABSTRACT
Introduction: The present work compared the
predictive power of different data mining techniques used to develop an
HIV testing prediction model. Four popular data mining algorithms (decision
tree, Naive Bayes, neural network, and logistic regression) were used to build a
model that predicts whether an adult in Ethiopia had been tested for HIV, using
data from the 2011 Ethiopia Demographic and Health Survey (EDHS 2011). The final
experimental results indicated that
the decision tree (random tree algorithm) performed best, with an accuracy of
96%; the decision tree induction method (J48) was second best,
with a classification accuracy of 79%, followed by the neural network (78%).
Logistic regression achieved the lowest classification accuracy, 74%.
Objectives: The objective of this study was to compare the predictive power of
different data mining techniques used to develop an HIV testing
prediction model. Methods: The Cross-Industry Standard Process for Data Mining
(CRISP-DM) was followed to build the HIV testing prediction model and to explore
association rules between HIV testing and the selected attributes. Data
preprocessing was performed, and missing values of categorical variables
were replaced by the modal value of each variable. Different data mining
techniques were used to build the predictive model. Results: The target dataset
contained 30,625 study participants, of whom 16,515 (54%)
were women and 14,110 (46%) were men. Participants' ages
ranged from 15 to 59 years, with a modal age group of 15-19 years.
Among the study participants, 17,719 (58%) had never been tested for HIV, while
the remaining 12,906 (42%) had been tested. Residence, educational level, wealth
index, HIV-related stigma, HIV-related knowledge, region, age group, risky
sexual behaviour attributes, knowledge of where to get tested for HIV, and
knowledge of family planning through mass media were found to be predictors of
HIV testing. Conclusion and Recommendation: The results obtained from this
research reveal that data mining is crucial for extracting relevant information
for the effective utilization of HIV testing services, which has clinical,
community, and public health importance at all levels. It is vital to apply
different data mining techniques to the same setting and to compare model
performance based on accuracy, sensitivity, and specificity.
Furthermore, this study invites interested researchers to explore
the application of data mining techniques in the healthcare industry and
in similar settings.
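
As an illustration of two steps named in the Methods and Conclusion, the following minimal Python sketch shows modal-value imputation for a categorical variable and the computation of accuracy, sensitivity, and specificity from predicted and actual labels. It is not the study's implementation: the function names `impute_mode` and `evaluate`, the use of `None` as the missing-value marker, and the label `"tested"` as the positive class are all assumptions made for illustration.

```python
from collections import Counter


def impute_mode(values, missing=None):
    """Replace missing entries with the most frequent observed value.

    Assumption: missing entries are marked with `missing` (None by default).
    On ties, the value seen first among the most frequent ones is used.
    """
    observed = [v for v in values if v is not missing]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is missing else v for v in values]


def evaluate(actual, predicted, positive="tested"):
    """Compute accuracy, sensitivity, and specificity for a binary classifier.

    Assumption: both classes occur in `actual`, so no denominator is zero.
    """
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # tested, predicted tested
        elif a == positive:
            fn += 1  # tested, predicted not tested
        elif p == positive:
            fp += 1  # not tested, predicted tested
        else:
            tn += 1  # not tested, predicted not tested
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }
```

For example, `impute_mode(["urban", None, "rural", "urban"])` fills the gap with `"urban"`, and calling `evaluate` once per classifier on the same held-out labels puts the models on a common footing for comparison.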