TITLE:
Performance Analysis of Optimized Content Extraction for Cyrillic Mongolian Learning Text Materials in the Database
AUTHORS:
Bat-Erdene Nyandag, Ru Li, G. Indruska
KEYWORDS:
Cyrillic Mongolian Language, Content Extraction Formatting, Learning Text Materials Style
JOURNAL NAME:
Journal of Computer and Communications, Vol.4 No.10, August 31, 2016
ABSTRACT: This paper develops and tests an optimized content extraction
algorithm that uses NLP methods, TF-IDF for term weighting, the vector space
model (VSM) for information retrieval, and cosine similarity for comparing
learning documents in a distance learning system database.
The tests covered the following: 1) parsing word structure in the distance
learning system database documents and in the Cyrillic Mongolian language
documents in the section, and forming new documents with an algorithm that
identifies word stems; 2) testing optimized content extraction from text
material based on e-test results (key word, correct answer, base form with
affix, and new form built from the word stem without affix) in the distance
learning system, and searching for key words selected automatically by a word
extraction algorithm; 3) testing Boolean and probabilistic retrieval methods
through an extended vector space retrieval method.
This chapter covers the document content extraction and retrieval algorithm
and proposes query recommendations based on word stems, independent of word
position, taking into account the distinctive features of Cyrillic Mongolian
language documents.
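
The combination of TF-IDF weighting, the vector space model, and cosine
similarity named in the abstract can be illustrated with a minimal sketch. It
is not the authors' implementation. The function names, the log(N/df) form of
IDF, the whitespace tokenizer, and the toy documents are all assumptions made
for illustration, and the tokens are assumed to be already reduced to word
stems.

```python
import math
from collections import Counter

# Toy corpus (invented for illustration; tokens are assumed to be stems already).
DOCS = [
    "багш ном уншина",
    "сурагч сургууль явна",
    "багш сургууль ажиллана",
]

def tokenize(text):
    # Assumption: lowercase whitespace splitting stands in for real stemming.
    return text.lower().split()

def build_tfidf(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in tf.items()})
    return vectors, idf

def vectorize_query(query, idf):
    """Weight query terms with the corpus IDF; unseen terms are dropped."""
    tokens = tokenize(query)
    tf = Counter(tokens)
    return {t: (c / len(tokens)) * idf[t] for t, c in tf.items() if t in idf}

def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query, docs):
    """Return (score, doc_index) pairs sorted by descending similarity."""
    vectors, idf = build_tfidf(docs)
    q = vectorize_query(query, idf)
    scores = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))

if __name__ == "__main__":
    for score, index in rank("багш ном", DOCS):
        print(index, round(score, 3), DOCS[index])
```

Because each vector is a bag of weighted terms, the score does not depend on
word position, which matches the position-independent querying described
above. A stemming step would replace the placeholder tokenize function.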