A Comparative Survey on Arabic Stemming: Approaches and Challenges - Intelligent Information Management

Intelligent Information Management, Vol. 9, No. 2, March 2017



Pages 39-67

DOI: 10.4236/iim.2017.92003

Author(s)

Mohammad Mustafa¹, Afag Salah Eldeen², Sulieman Bani-Ahmad³, Abdelrahman Osman Elfaki⁴

Affiliation(s)

¹Department of Computer Information Systems, Faculty of Computers and Information Technology, University of Tabuk, Tabuk, Saudi Arabia.
²Department of Computer Science, College of Computer Science and Information Technology, Sudan University of Science and Technology, Khartoum State, Sudan.
³Department of Computer Information Systems, School of Information Technology, Al-Balqa Applied University, Salt, Jordan.
⁴Department of Information Technology, Faculty of Computers and Information Technology, University of Tabuk, Tabuk, Saudi Arabia.

ABSTRACT

Arabic, as one of the Semitic languages, has a very rich and complex morphology that is radically different from that of European and East Asian languages. The derivational system of Arabic is based on roots, which are often inflected to compose words using a relatively large set of affixes (e.g., antefixes, prefixes, and suffixes). Stemming is the process of reducing all the inflected forms of a word to a common canonical form. It is one of the early and major phases in natural language processing, machine translation, and information retrieval tasks. A number of approaches to Arabic stemming have been proposed, including light stemming, morphological analysis, statistical stemming, N-grams, and parallel corpora (collections). Motivated by the results reported in the literature, this paper attempts to exhaustively review current achievements in stemming Arabic texts, and a variety of algorithms are discussed. The main contribution of the paper is to provide a better understanding of existing approaches, with the hope of building an error-free and effective Arabic stemmer in the near future.
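To make the idea of light stemming more concrete, the following Python sketch strips a small set of common Arabic prefixes and suffixes from a single word. The affix lists, the minimum stem length `MIN_STEM`, and the function name `light_stem` are illustrative assumptions, not the rules of any particular stemmer reviewed in the survey. The general strategy (remove a leading conjunction, then at most one prefix and one suffix, while keeping a minimum stem length) follows the broad idea of affix-stripping light stemmers.

```python
# Minimal light-stemming sketch. The affix lists and MIN_STEM are
# illustrative assumptions, not those of any specific published stemmer.

# Prefixes (mostly forms of the definite article), longest first so that
# a three-letter prefix is preferred over the plain article "ال".
PREFIXES = ["بال", "كال", "فال", "لل", "ال"]

# Common inflectional suffixes, checked in this order; only one is removed.
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]

MIN_STEM = 2  # hypothetical minimum number of letters kept after stripping


def light_stem(word: str) -> str:
    """Strip at most one leading waw, one prefix, and one suffix."""
    # Drop a leading waw ("and") only if at least three letters remain,
    # so short words that merely begin with waw are left untouched.
    if word.startswith("و") and len(word) - 1 >= 3:
        word = word[1:]

    # Remove the first matching prefix, if enough of the word remains.
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            word = word[len(prefix):]
            break

    # Remove the first matching suffix, if enough of the word remains.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            word = word[: -len(suffix)]
            break

    return word
```

For example, `light_stem("المعلمون")` returns `"معلم"` and `light_stem("والكتاب")` returns `"كتاب"`. Because a light stemmer only strips affixes, it does not reduce a broken plural such as `كتب` ("books") to its singular `كتاب`, nor either form to the root ك-ت-ب; that is the kind of case root-based (morphological) stemmers aim to handle.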

KEYWORDS

Arabic Language, Light Stemming, Root-Based Stemming, Co-Occurrence, Artificial Intelligence Stemming

Cite as:

Mustafa, M., Eldeen, A., Bani-Ahmad, S. and Elfaki, A. (2017) A Comparative Survey on Arabic Stemming: Approaches and Challenges. Intelligent Information Management, 9, 39-67. doi: 10.4236/iim.2017.92003.
