A HMM-Based System To Diacritize Arabic Text

DOI: 10.4236/jsea.2012.512B024

ABSTRACT

Arabic belongs to the Semitic language family and, from a Natural Language Processing perspective, has an entirely different sentence structure. In such languages, two different words may share the same spelling yet differ entirely in pronunciation and meaning. To resolve this ambiguity, special marks are placed above or below the letters to indicate the correct pronunciation. These marks are called diacritics, and a language that uses them is called a diacritized language. This paper presents a system for diacritizing Arabic text using Hidden Markov Models (HMMs). The system employs the well-known HMM Toolkit (HTK). Each diacritic is represented by a separate model. The sequence of output models is then combined with the input character sequence to form the fully diacritized text. The performance of the proposed system is evaluated on a data corpus of more than 24,000 sentences.
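To make the decoding idea more concrete, here is a minimal sketch in Python. It is not the paper's HTK-based implementation: as a simplifying assumption it swaps in a single discrete HMM whose hidden states are diacritics and whose observations are the undiacritized characters, decodes the most likely diacritic sequence with the Viterbi algorithm, and then interleaves the result with the input characters. The function names (`viterbi`, `merge`) and all probabilities are hypothetical toy values; a real system would estimate them from a training corpus.

```python
import math

# Hypothetical hidden states: four Arabic diacritics (Unicode combining marks).
FATHA, DAMMA, KASRA, SUKUN = "\u064E", "\u064F", "\u0650", "\u0652"
STATES = [FATHA, DAMMA, KASRA, SUKUN]

# Toy input: the undiacritized word k-t-b (kaf, teh, beh).
CHARS = ["\u0643", "\u062A", "\u0628"]

# Hypothetical toy parameters (assumed, not trained): fatha is favored
# at the start and after every diacritic; emissions are uniform.
START_P = {s: (0.7 if s == FATHA else 0.1) for s in STATES}
TRANS_P = {p: {s: (0.7 if s == FATHA else 0.1) for s in STATES} for p in STATES}
EMIT_P = {s: {c: 0.3 for c in CHARS} for s in STATES}
UNSEEN = 1e-6  # smoothing probability for characters missing from EMIT_P


def viterbi(chars, states, start_p, trans_p, emit_p):
    """Return the most likely diacritic sequence for a character sequence."""
    # Work in log space to avoid numerical underflow on long inputs.
    score = [{s: math.log(start_p[s]) + math.log(emit_p[s].get(chars[0], UNSEEN))
              for s in states}]
    backptr = [{}]
    for t in range(1, len(chars)):
        score.append({})
        backptr.append({})
        for s in states:
            # Best predecessor state for reaching s at position t.
            prev, best = max(
                ((p, score[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            score[t][s] = best + math.log(emit_p[s].get(chars[t], UNSEEN))
            backptr[t][s] = prev
    # Trace back from the best final state.
    path = [max(score[-1], key=score[-1].get)]
    for t in range(len(chars) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    return path[::-1]


def merge(chars, diacritics):
    """Interleave each character with its predicted diacritic."""
    return "".join(c + d for c, d in zip(chars, diacritics))


if __name__ == "__main__":
    diacritics = viterbi(CHARS, STATES, START_P, TRANS_P, EMIT_P)
    print(merge(CHARS, diacritics))
```

With these toy parameters every position receives a fatha, so the sketch prints كَتَبَ (kataba). With trained transition and emission probabilities, the surrounding context would drive the choice of diacritic at each position instead.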

Cite this paper

M. Khorsheed, "A HMM-Based System To Diacritize Arabic Text," Journal of Software Engineering and Applications, Vol. 5 No. 12B, 2012, pp. 124-127. doi: 10.4236/jsea.2012.512B024.

Copyright © 2020 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.