Mathematical Expression Extraction in Text Fields of Documents Based on HMM

PP. 1-13
DOI: 10.4236/jcc.2017.514001

ABSTRACT

Mathematical expressions embedded in the unstructured text fields of documents are difficult to extract automatically, quickly, and effectively. To address this problem, a method based on the Hidden Markov Model (HMM) is proposed. First, the HMM is trained using the symbol combination features of mathematical expressions. Next, the text is preprocessed by steps such as removing labels and filtering words. Finally, the preprocessed text is converted into an observation sequence and fed to the HMM, which determines which parts are mathematical expressions and extracts them. The experimental results show that the proposed method can effectively extract mathematical expressions from the text fields of documents and achieves relatively high accuracy and recall rates.
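
The pipeline outlined in the abstract (preprocess, convert to an observation sequence, decode with an HMM, extract) can be illustrated with a minimal sketch. It is not the authors' implementation. The two hidden states, the four observation classes, the tokenizer, and every probability value below are assumptions chosen by hand for illustration; in the proposed method the model parameters are learned from the symbol combination features of mathematical expressions. Decoding here uses the standard Viterbi algorithm in log space.

```python
import math
import re

STATES = ("TEXT", "MATH")

# Hand-picked illustrative probabilities (assumptions, not values from the paper).
START = {"TEXT": 0.8, "MATH": 0.2}
TRANS = {
    "TEXT": {"TEXT": 0.8, "MATH": 0.2},
    "MATH": {"TEXT": 0.2, "MATH": 0.8},
}
EMIT = {
    "TEXT": {"word": 0.80, "var": 0.10, "num": 0.05, "op": 0.05},
    "MATH": {"word": 0.05, "var": 0.35, "num": 0.25, "op": 0.35},
}

# Markup labels such as <p> or </b>; a bare "<" followed by a space is kept.
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")
# Words, numbers, and a small set of operator symbols; other punctuation is dropped.
TOKEN_RE = re.compile(r"[A-Za-z]+|\d+(?:\.\d+)?|[+\-*/=^<>()]")


def preprocess(text):
    """Remove markup labels and split the remaining text into tokens."""
    return TOKEN_RE.findall(TAG_RE.sub(" ", text))


def observe(token):
    """Map a token to one of four coarse observation classes."""
    if token[0].isdigit():
        return "num"
    if token.isalpha():
        return "var" if len(token) == 1 else "word"
    return "op"


def viterbi(obs):
    """Return the most likely hidden state sequence for a non-empty obs list."""
    scores = [{s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}]
    back = []
    for o in obs[1:]:
        prev, cur, ptr = scores[-1], {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            cur[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][o])
            ptr[s] = best
        scores.append(cur)
        back.append(ptr)
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):  # follow back-pointers from the last token
        state = ptr[state]
        path.append(state)
    return path[::-1]


def extract_expressions(text):
    """Return maximal runs of tokens that the HMM labels as MATH."""
    tokens = preprocess(text)
    if not tokens:
        return []
    labels = viterbi([observe(t) for t in tokens])
    spans, current = [], []
    for tok, lab in zip(tokens, labels):
        if lab == "MATH":
            current.append(tok)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


print(extract_expressions("<p>The relation x = 2 * y + 1 holds for all inputs.</p>"))
# prints ['x = 2 * y + 1']
```

Because the transition matrix favors staying in the current state, isolated single-letter words tend to remain labeled as text, while a run of variables, numbers, and operators is grouped into one expression. A trained model would replace the hand-set tables with estimated parameters.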

Share and Cite:

Tian, X.D., Bai, R.H., Yang, F., Bai, J.Y. and Li, X.F. (2017) Mathematical Expression Extraction in Text Fields of Documents Based on HMM. Journal of Computer and Communications, 5, 1-13. doi: 10.4236/jcc.2017.514001.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.