TITLE:
A Stylometric Investigation of Linguistic Styles Based on a Vietnamese Corpus
AUTHORS:
Tuyet-Nhung Nguyen, Dien Dinh
KEYWORDS:
Stylometry, Vietnamese Corpus, Correspondence Analysis
JOURNAL NAME:
Open Journal of Social Sciences, Vol.9 No.12, December 7, 2021
ABSTRACT: Stylometric methods have received increasing attention across a number of disciplines in recent years, particularly in forensic linguistics. This study assesses the value of correspondence analysis, a stylometric method, for Vietnamese text analysis. Using a dataset extracted from the VnExpress Viewpoint Corpus (VVC), a 1.3-million-token corpus of Vietnamese opinion articles, we examine seven part-of-speech features in search of relational features that characterize authorial style. The analysis focuses on feature effects, with the aim of shedding light on whether linguistic features of writing style are consistent across authors’ genders and professions. Taken together, the seven features produce encouraging results for what is acknowledged to be a difficult problem for the Vietnamese language. When correspondence analysis is applied to the seven features with the dataset grouped by authors’ gender, conjunctions and verbs perform best. When the dataset is grouped by authors’ profession, conjunctions and pronouns offer a striking improvement in the stylometric investigation. The discriminating ability was particularly impressive, suggesting that, collectively, part-of-speech features provide a good set of stylistic markers.
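
For readers unfamiliar with the method, the following is a minimal sketch of correspondence analysis applied to a table of part-of-speech counts. It is not the authors' implementation. The function name `correspondence_analysis`, the group labels, the choice of four tags, and all counts are hypothetical and serve only to illustrate the technique. The sketch uses the standard formulation: a singular value decomposition of the standardized residuals of the contingency table.

```python
import numpy as np

def correspondence_analysis(counts):
    """Simple correspondence analysis of a contingency table.

    Returns principal coordinates for rows and columns, plus the
    principal inertia (squared singular value) of each axis.
    """
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)            # proportions expected under independence
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    return row_coords, col_coords, sv ** 2

# Hypothetical counts: rows are author groups, columns are POS tags.
# These numbers are invented for illustration, not taken from VVC.
groups = ["group_A", "group_B", "group_C"]
pos_tags = ["conjunction", "verb", "pronoun", "noun"]
counts = np.array([
    [120, 340,  80, 460],
    [ 95, 310, 130, 465],
    [150, 290,  60, 500],
])

row_coords, col_coords, inertia = correspondence_analysis(counts)

# Share of total inertia carried by each axis.
explained = inertia / inertia.sum()
for name, xy in zip(groups, row_coords[:, :2]):
    print(name, np.round(xy, 4))
for name, xy in zip(pos_tags, col_coords[:, :2]):
    print(name, np.round(xy, 4))
print("explained inertia:", np.round(explained, 4))
```

In a plot of the first two axes, tags that lie far from the origin in the same direction as a group are over-represented in that group relative to independence. This is the sense in which individual features can be said to discriminate between groups.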