TITLE:
A Statistical Analysis of Textual E-Commerce Reviews Using Tree-Based Methods
AUTHORS:
Jessica Kubrusly, Ana Luiza Neves, Thamires Louzada Marques
KEYWORDS:
Text Mining, Supervised Classification, Tree-Based Methods, Classification Trees, Random Forest, Gradient Boosting, XGBoost
JOURNAL NAME:
Open Journal of Statistics, Vol.12 No.3, June 14, 2022
ABSTRACT:
With the increasing interest in e-commerce shopping, customer reviews have become one of the most important elements in determining customer satisfaction with products, which underlines the importance of Text Mining. This study is based on the Women’s Clothing E-Commerce Reviews database, which consists of reviews written by real customers. The aim of this paper is to apply a Text Mining approach to a set of customer reviews. Each review was classified as either positive or negative by a classification method. Four tree-based methods were applied to the classification problem: Classification Tree, Random Forest, Gradient Boosting and XGBoost. The dataset was split into training and test sets. The results indicate that Random Forest overfits, XGBoost overfits if the number of trees is too high, Classification Tree is good at detecting negative reviews but poor at detecting positive ones, and Gradient Boosting shows stable values, with quality measures above 77% on the test dataset. The applied methods agree on which terms are important for classification.
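
The comparison between training and test performance described in the abstract can be illustrated with a small sketch. The abstract does not name the quality measures it uses, so accuracy, sensitivity and specificity are assumed here. The function names `quality_measures` and `overfitting_gap` and the toy labels are hypothetical and are not taken from the paper or its data.

```python
from collections import Counter


def quality_measures(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity and specificity for binary labels.

    Assumption: these three measures stand in for the paper's
    "quality measures", which the abstract does not list.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count (is actually positive, is predicted positive) pairs.
    counts = Counter(
        (truth == positive, pred == positive)
        for truth, pred in zip(y_true, y_pred)
    )
    tp = counts[(True, True)]
    fn = counts[(True, False)]
    fp = counts[(False, True)]
    tn = counts[(False, False)]
    total = tp + fn + fp + tn
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / total if total else nan,
        # Share of positive reviews that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Share of negative reviews that were detected.
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


def overfitting_gap(train_measures, test_measures, key="accuracy"):
    """Training value minus test value; a large positive gap suggests overfitting."""
    return train_measures[key] - test_measures[key]


# Toy labels (1 = positive review, 0 = negative review), invented for illustration.
train_true = [1, 1, 1, 0, 0, 0]
train_pred = [1, 1, 1, 0, 0, 0]
test_true = [1, 1, 1, 1, 0, 0, 0, 0]
test_pred = [1, 1, 0, 0, 0, 0, 0, 1]

train_q = quality_measures(train_true, train_pred)
test_q = quality_measures(test_true, test_pred)
gap = overfitting_gap(train_q, test_q)
print(train_q, test_q, gap)
```

In this toy case the classifier is perfect on the training labels but weaker on the test labels, the pattern the abstract reports for Random Forest. Specificity is higher than sensitivity, which mirrors a model that detects negative reviews better than positive ones.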