TITLE:
Visualizing Random Forest’s Prediction Results
AUTHORS:
Hudson F. Golino, Cristiano Mauro Assis Gomes
KEYWORDS:
Machine Learning, Assessment, Prediction, Visualization, Networks, Cluster
JOURNAL NAME:
Psychology, Vol.5 No.19, December 5, 2014
ABSTRACT: The current paper proposes a new visualization tool to help check
the quality of random forest predictions by plotting the proximity matrix as a
weighted network. This new visualization technique is compared with the
traditional multidimensional scaling plot. The present paper also introduces a
new accuracy index (proportion of misplaced cases), and compares it to total
accuracy, sensitivity and specificity. It also applies clustering coefficients
to weighted graphs, in order to understand how well the random forest algorithm is
separating two classes. Two datasets were analyzed, one from medical research
(breast cancer) and the other from psychology research (medical students’
academic achievement), varying the sample sizes and the predictive accuracy.
With different numbers of observations and different possible prediction
accuracies, it was possible to compare how each visualization technique behaves
in each situation. The results indicated that random forest’s predictive
performance was easier and more intuitive to interpret using the weighted
network of the proximity matrix than using the multidimensional scaling plot.
The proportion of misplaced cases was highly related to total accuracy,
sensitivity and specificity. This strategy, together with the computation of
Zhang and Horvath’s (2005) clustering coefficient for weighted graphs, can be
very helpful in understanding how well a random forest is performing in terms
of classification.
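
As an illustration of the clustering coefficient mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the proximity matrix is available as a symmetric NumPy array with entries in [0, 1], and it uses the usual statement of Zhang and Horvath's (2005) coefficient for node i: the sum over j and k of w_ij * w_jk * w_ki, divided by (sum_j w_ij)^2 - sum_j w_ij^2. The function name and the toy matrix are hypothetical.

```python
import numpy as np


def zhang_horvath_clustering(weights):
    """Per-node weighted clustering coefficient (Zhang & Horvath, 2005).

    `weights` is assumed to be a symmetric matrix with entries in [0, 1],
    such as a random forest proximity matrix. Self-proximities on the
    diagonal are ignored.
    """
    w = np.asarray(weights, dtype=float).copy()
    np.fill_diagonal(w, 0.0)  # drop self-loops

    # diag(W^3)[i] = sum over j, k of w_ij * w_jk * w_ki
    numerator = np.diag(w @ w @ w)

    # (sum_j w_ij)^2 - sum_j w_ij^2
    strength = w.sum(axis=1)
    denominator = strength ** 2 - (w ** 2).sum(axis=1)

    # Nodes with fewer than two neighbours have a zero denominator;
    # their coefficient is set to 0 by convention in this sketch.
    coeff = np.zeros_like(numerator)
    valid = denominator > 0
    coeff[valid] = numerator[valid] / denominator[valid]
    return coeff


# Hypothetical toy proximity matrix: cases 0-2 and cases 3-5 form two
# tight groups with weak proximity between the groups.
toy_proximity = np.full((6, 6), 0.1)
toy_proximity[:3, :3] = 0.9
toy_proximity[3:, 3:] = 0.9
np.fill_diagonal(toy_proximity, 1.0)

print(zhang_horvath_clustering(toy_proximity))
```

Under these assumptions, the coefficient could be computed separately within each class to summarize how tightly its cases group together in the proximity network.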