International Journal of Intelligence Science

Volume 2, Issue 4 (October 2012)

ISSN Print: 2163-0283   ISSN Online: 2163-0356

Gender Prediction on Twitter Using Stream Algorithms with N-Gram Character Features

PP. 143-148
DOI: 10.4236/ijis.2012.224019

ABSTRACT

The rapid growth of social networks has produced an unprecedented amount of user-generated data, which provides an excellent opportunity for text mining. Authorship analysis, an important part of text mining, attempts to learn about the author of a text through subtle variations in writing style that occur among gender, age, and social groups. Such information has a variety of applications, including advertising and law enforcement. One of the most accessible sources of user-generated data is Twitter, which makes the majority of its user data freely available through its data access API. In this study we seek to identify the gender of users on Twitter using the Perceptron and Naive Bayes algorithms with selected 1- through 5-gram character features from tweet text. Stream versions of these algorithms were employed for gender prediction to handle the speed and volume of tweet traffic. Because informal text, such as tweets, cannot be easily evaluated using traditional dictionary methods, n-gram features were used in this study to represent streaming tweets. The number of possible 1- through 5-grams is so large that only a subset can be used in gender classification; for this reason, informative n-gram features were chosen using multiple selection algorithms. In the best case, the Naive Bayes and Perceptron algorithms produced accuracy, balanced accuracy, and F-measure above 99%.
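
To make the approach in the abstract concrete, the following is a minimal sketch of character n-gram extraction (lengths 1 through 5) feeding a perceptron that is updated one tweet at a time. It is an illustration under stated assumptions, not the authors' implementation: the names `char_ngrams` and `StreamPerceptron`, the use of raw n-gram counts as feature values, the +1/-1 label encoding, and the optional `vocabulary` filter (standing in for the paper's feature selection step, which the abstract does not detail) are all hypothetical.

```python
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=5):
    """Return every character n-gram of length n_min..n_max found in text."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            grams.append(text[i:i + n])
    return grams


class StreamPerceptron:
    """Binary perceptron trained one example at a time (labels are +1 or -1).

    Assumption: `vocabulary`, if given, is a pre-selected set of informative
    n-grams; n-grams outside it are ignored.
    """

    def __init__(self, vocabulary=None):
        self.weights = defaultdict(float)
        self.bias = 0.0
        self.vocabulary = vocabulary

    def _features(self, text):
        # Count each retained n-gram in the text.
        grams = char_ngrams(text)
        if self.vocabulary is not None:
            grams = [g for g in grams if g in self.vocabulary]
        return Counter(grams)

    def _score(self, feats):
        return self.bias + sum(self.weights.get(g, 0.0) * c for g, c in feats.items())

    def predict(self, text):
        return 1 if self._score(self._features(text)) > 0 else -1

    def learn(self, text, label):
        """Process one labeled tweet; update weights only on a mistake."""
        feats = self._features(text)
        predicted = 1 if self._score(feats) > 0 else -1
        if predicted != label:
            for g, c in feats.items():
                self.weights[g] += label * c
            self.bias += label


if __name__ == "__main__":
    model = StreamPerceptron()
    # Toy stream of (text, label) pairs; the labels are arbitrary placeholders.
    for text, label in [("aaa", 1), ("bbb", -1)]:
        model.learn(text, label)
    print(model.predict("aaa"), model.predict("bbb"))
```

Because each tweet is seen once and then discarded, memory use depends on the number of retained n-grams rather than on the length of the stream, which is why restricting the model to a selected feature subset matters in this setting.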

Cite as:

Miller, Z., Dickinson, B. and Hu, W. (2012) Gender Prediction on Twitter Using Stream Algorithms with N-Gram Character Features. International Journal of Intelligence Science, 2, 143-148. doi: 10.4236/ijis.2012.224019.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.