Performance Analysis of Optimized Content Extraction for Cyrillic Mongolian Learning Text Materials in the Database - Journal of Computer and Communications

JCC > Vol.4 No.10, August 2016

PP. 79-89

DOI: 10.4236/jcc.2016.410009

Author(s)

Bat-Erdene Nyandag¹, Ru Li¹, G. Indruska²

Affiliation(s)

¹School of Computer Sciences, Inner Mongolia University, China.
²IT Consultant, Destination Consulting Co., Hattisaar Hub, Putalisadak, Kathmandu, Nepal.

ABSTRACT

This paper develops and tests an optimized content extraction algorithm for learning documents stored in a distance learning system database. The algorithm combines NLP methods, TF-IDF for term weighting, the vector space model (VSM) for information retrieval, and cosine similarity for measuring how similar two documents are. The tests covered the following: 1) parsing the word structure of the documents in the distance learning system database and of the Cyrillic Mongolian language documents in the section, and forming new documents with a word-stem identification algorithm; 2) testing optimized content extraction from text material based on e-test results (keyword, correct answer, base form with affix, and new form consisting of the word stem without affix) in the distance learning system, and searching for keywords selected automatically by a word extraction algorithm; 3) testing Boolean and probabilistic retrieval methods through an extended vector space retrieval method. The work also covers the development of the document content extraction and retrieval algorithm, and proposes query recommendations based on word stems, independent of word position, that reflect the distinctive features of Cyrillic Mongolian language documents.
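
The combination of TF-IDF weighting, the vector space model and cosine similarity named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `tf_idf_vectors` and `cosine`, the unsmoothed formula tf × log(N/df), and the toy stemmed documents are assumptions chosen only to show the general technique on already-stemmed tokens.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document.

    Assumed weighting: relative term frequency times log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either has zero length."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical, already-stemmed toy documents (illustrative tokens, not data from the paper).
docs = [
    ["ном", "унш", "сурагч"],
    ["багш", "ном", "заа"],
    ["цаг", "агаар", "дулаан"],
]
vecs = tf_idf_vectors(docs)
shared = cosine(vecs[0], vecs[1])    # the two documents share the stem "ном"
disjoint = cosine(vecs[0], vecs[2])  # no stems in common
```

Because the vectors are built from bags of stems, the similarity score does not depend on where a word appears in the document, which matches the position-independent, stem-based querying described in the abstract.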

KEYWORDS

Cyrillic Mongolian Language, Content Extraction Formatting, Learning Text Materials Style

Share and Cite:

Nyandag, B., Li, R. and Indruska, G. (2016) Performance Analysis of Optimized Content Extraction for Cyrillic Mongolian Learning Text Materials in the Database. Journal of Computer and Communications, 4, 79-89. doi: 10.4236/jcc.2016.410009.
