A Hybrid Algorithm for Stemming of Nepali Text - Intelligent Information Management

IIM > Vol.5 No.4, July 2013

A Hybrid Algorithm for Stemming of Nepali Text

PP. 136-139

DOI: 10.4236/iim.2013.54014

Author(s)

Chiranjibi Sitaula

Affiliation(s)

Central Department of Computer Science and Information Technology, Tribhuvan University, Kathmandu, Nepal.

ABSTRACT

In this paper, a new context-free stemmer is proposed that combines a traditional rule-based system with a string similarity approach; the resulting algorithm can be called a hybrid algorithm. It is a language-dependent algorithm. A context-free stemmer is one that stems a word without regard to its context, i.e., the same rules are applied in every context. Stripping words with the traditional context-free rule-based approach may over-stem or under-stem inflected words; this is overcome by applying a string similarity function computed with dynamic programming. Edit distance is used as the string similarity measure. The stripped inflected word is compared with the words stored in an available text database, and the word with the minimum distance is taken as the substitute for the stripped inflected word, which yields its stem. The approach thus draws heavily on both the traditional rule-based system and the corpus-based approach. The algorithm is tested on the Nepali language, which is written in the Devanagari script. It gives better results than the traditional rule-based system, although this has been shown for Nepali only. The total accuracy of the hybrid algorithm is 70.10%, whereas that of the traditional rule-based system is 68.43%.
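
The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the suffix list `SUFFIXES`, the word list `LEXICON`, and the function names are all hypothetical stand-ins, since the abstract does not give the actual rule set or text database. The edit distance here is plain Levenshtein distance over Unicode code points, which is one common choice and is assumed rather than confirmed by the source.

```python
# Illustrative data only: the paper's real stripping rules and text
# database are not listed in the abstract.
SUFFIXES = ["हरू", "लाई", "को", "ले", "मा"]   # a few common Nepali suffixes
LEXICON = ["घर", "केटा", "किताब"]            # stand-in for the text database


def edit_distance(a, b):
    """Levenshtein distance by dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def strip_suffix(word, suffixes=SUFFIXES):
    """Context-free rule step: remove the longest matching suffix, once."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


def hybrid_stem(word, lexicon=LEXICON, suffixes=SUFFIXES):
    """Strip by rule, then substitute the lexicon word at minimum edit distance."""
    stripped = strip_suffix(word, suffixes)
    return min(lexicon, key=lambda entry: edit_distance(stripped, entry))


if __name__ == "__main__":
    # The single rule pass leaves the plural marker in place (under-stemming);
    # the nearest-word lookup then maps the result onto a lexicon entry.
    print(strip_suffix("घरहरूमा"))
    print(hybrid_stem("घरहरूमा"))
```

In this sketch, ties in distance are broken by lexicon order, and a word far from every lexicon entry is still mapped to its nearest entry; a distance threshold would be one way to handle that case, but the abstract does not say how the paper treats it.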

Share and Cite:

Sitaula, C. (2013) A Hybrid Algorithm for Stemming of Nepali Text. Intelligent Information Management, 5, 136-139. doi: 10.4236/iim.2013.54014.
