TITLE:
Enhancing Amharic Information Retrieval System Based on Statistical Co-Occurrence Technique
AUTHORS:
Abey Bruck, Tulu Tilahun
KEYWORDS:
Statistical, Co-Occurrence, Information Retrieval, Query Expansion, Amharic
JOURNAL NAME:
Journal of Computer and Communications, Vol.3 No.12, December 16, 2015
ABSTRACT: Information retrieval (IR) systems are
designed to help information seekers retrieve relevant information from vast
document collections. The need for relevant information from a vast number of
documents gave birth to IR systems. Even though different IR systems exist, they cannot
meet all users’ expectations. Differing levels of user knowledge cause
queries to be expressed in different ways. As a result, the system may miss the
core meaning of a user’s query and retrieve unsatisfactory results. This happens
mainly because of the ambiguities of words in natural languages
and expression mismatch between users and authors. The existing ambiguities in
the Amharic language have a negative impact on the performance of Amharic IR systems.
Some of the ambiguities behind this problem are spelling variants of the
same word and polysemous and synonymous terms. If users are not fully
knowledgeable about the information domain, they will mostly formulate
weak queries to retrieve documents. Thus, they end up frustrated with the
results returned by an IR system. This research aims at
improving the recall of previous work. A statistical co-occurrence technique has
been used to expand query terms. The main reason for performing query
expansion is to provide relevant documents that satisfy users’
information needs. The statistical co-occurrence method considers terms that
frequently appear with the query term, regardless of their position. The
efficiency of the proposed technique has been tested on a prototype system and
the results compared with those of the previous study. Accordingly, improvements
of 6% in recall and 2% in F-measure were obtained. Hence, the statistical
co-occurrence method outperformed the bi-gram based IR system.
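
The idea of expanding a query with terms that frequently co-occur with it, regardless of position, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes document-level co-occurrence scored by raw counts, whitespace tokenization, and a toy English corpus standing in for Amharic text. The function names `build_cooccurrence` and `expand_query` and the `top_k` parameter are hypothetical.

```python
from collections import Counter


def build_cooccurrence(docs):
    """For each term, count the documents it shares with every other term.

    Position inside the document is ignored: two terms co-occur
    whenever they appear anywhere in the same document.
    """
    cooc = {}
    for doc in docs:
        terms = set(doc.split())  # assumed: whitespace tokenization
        for t in terms:
            counts = cooc.setdefault(t, Counter())
            for u in terms:
                if u != t:
                    counts[u] += 1
    return cooc


def expand_query(query, cooc, top_k=2):
    """Append the top_k terms that co-occur most often with the query terms."""
    query_terms = query.split()
    scores = Counter()
    for q in query_terms:
        for term, n in cooc.get(q, {}).items():
            if term not in query_terms:
                scores[term] += n
    # Highest count first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return query_terms + [t for t, _ in ranked[:top_k]]


# Toy corpus (illustrative only, not data from the study).
docs = [
    "coffee export ethiopia market",
    "coffee farm ethiopia harvest",
    "football league match",
    "coffee export price",
]
cooc = build_cooccurrence(docs)
expanded = expand_query("coffee", cooc)
print(expanded)
```

In this sketch the query "coffee" gains the two terms that share the most documents with it. A real system would likely normalize the counts (for example with a Dice or mutual-information score) and apply Amharic-specific stemming and stop-word removal first; those steps are omitted here.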