Journal of Computer and Communications

Volume 3, Issue 12 (December 2015)

ISSN Print: 2327-5219   ISSN Online: 2327-5227

Google-based Impact Factor: 1.12  Citations  

Enhancing Amharic Information Retrieval System Based on Statistical Co-Occurrence Technique

HTML  XML Download Download as PDF (Size: 670KB)  PP. 67-76  
DOI: 10.4236/jcc.2015.312006    4,136 Downloads   5,617 Views  Citations

ABSTRACT

Information retrieval (IR) systems are designed to help information seekers retrieving relevant information from vast document. The need for relevant information from a vast amount of document gave birth to IR systems. Even though different IR systems exist, they cannot meet all users’ expectations. A different level of users’ knowledge makes queries to be expressed in different ways. As a result, the system may miss the core meaning of users query and retrieve dissatisfactory results. This happens mainly because of the ambiguities of words involved in the natural languages and expression mismatch among users and authors. The existing ambiguities in Amharic language have negative impacts on the performance of Amharic IR system. Some of the ambiguities for this type of problem are: spelling variants of the same word, polysemous and synonymous terms. If users are not fully knowledgeable about the information domain area, they will mostly formulate weak queries to retrieve documents. Thus, they end up frustrated with the results found from an IR system. This research has been conducted, aiming at augmenting the recall of previous work. Statistical co-occurrence technique has been used in order to expand query terms. The main reason for performing query expansion is to provide relevant documents as per users’ query that can satisfy their information need. Statistical co-occurrence method considers, frequently appearing terms with the query term, regardless of their position. The efficiency of proposed technique has been tested on the prototype system and the result found compared with the result of previous study. Accordingly, 6% recall and 2% f-measure improvement has been made. Hence, the statistical co-occurrence method outperformed the bi-gram based IR system.

Share and Cite:

Bruck, A. and Tilahun, T. (2015) Enhancing Amharic Information Retrieval System Based on Statistical Co-Occurrence Technique. Journal of Computer and Communications, 3, 67-76. doi: 10.4236/jcc.2015.312006.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.