Threshold Selection Study on Fisher Discriminant Analysis Used in Exon Prediction for Unbalanced Data Sets

Yutao Ma; Yanbing Fang; Ping Liu; Jianfu Teng

doi:10.4236/cn.2013.53B2108

Communications and Network, Vol. 5 No. 3C, September 2013

School of Electronic Information Engineering, Tianjin University, Tianjin, China.
School of Mathematics and Computer Science, Ningxia University, Yinchuan, China.
School of Physics and Electrical Information Engineering, Ningxia University, Yinchuan, China.

Abstract

In gene prediction, Fisher discriminant analysis (FDA) is used to separate protein-coding regions (exons) from non-coding regions (introns). Usually, the positive and negative data sets are the same size when enough data are available. When the data are insufficient or the two sets are unequal in size, however, the threshold used in FDA may have an important influence on the prediction results. This paper presents a study on the selection of the threshold. The feature vector of each exon/intron sequence is computed using the Z-curve method with 69 variables. The experimental results suggest that the size of the data sets, their standard deviation, and the threshold are the three key elements to consider in order to improve the prediction results.
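To make the role of the threshold concrete, the following is a minimal sketch and not the authors' implementation. It assumes toy two-feature samples in place of the 69 Z-curve variables, and it compares three commonly used threshold rules (midpoint, size-weighted, standard-deviation-weighted) that are assumptions chosen for illustration; the abstract does not say which rules the paper evaluates. All function and variable names are hypothetical.

```python
from statistics import mean, pstdev

def fisher_direction(pos, neg):
    """Return w = S_w^{-1} (m_pos - m_neg) for two-feature samples."""
    m_pos = [mean(col) for col in zip(*pos)]
    m_neg = [mean(col) for col in zip(*neg)]
    # Within-class scatter matrix S_w (2x2), summed over both classes.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for data, m in ((pos, m_pos), (neg, m_neg)):
        for x in data:
            d = [x[0] - m[0], x[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = [m_pos[0] - m_neg[0], m_pos[1] - m_neg[1]]
    # Apply the explicit inverse of the 2x2 matrix S_w to diff.
    return [(s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
            (-s[1][0] * diff[0] + s[0][0] * diff[1]) / det]

def project(w, data):
    """Project each sample onto the discriminant direction w."""
    return [w[0] * x[0] + w[1] * x[1] for x in data]

def thresholds(z_pos, z_neg):
    """Three candidate thresholds on the projected scores (illustrative)."""
    mu_p, mu_n = mean(z_pos), mean(z_neg)
    n_p, n_n = len(z_pos), len(z_neg)
    sd_p, sd_n = pstdev(z_pos), pstdev(z_neg)
    return {
        # Halfway between the two projected class means.
        "midpoint": (mu_p + mu_n) / 2,
        # Class means weighted by class size (the pooled mean).
        "size_weighted": (n_p * mu_p + n_n * mu_n) / (n_p + n_n),
        # Point at equal standardized distance from both class means.
        "std_weighted": (sd_n * mu_p + sd_p * mu_n) / (sd_p + sd_n),
    }

def evaluate(threshold, z_pos, z_neg):
    """Return (sensitivity, specificity) when scores > threshold are positive."""
    sens = sum(z > threshold for z in z_pos) / len(z_pos)
    spec = sum(z <= threshold for z in z_neg) / len(z_neg)
    return sens, spec

# Hypothetical unbalanced toy data: 4 positive ("exon") vs 12 negative ("intron").
pos = [(2.0, 2.1), (2.5, 1.9), (3.0, 2.6), (2.2, 2.8)]
neg = [(0.1, 0.3), (-0.4, 0.2), (0.5, -0.1), (0.0, 0.6), (0.7, 0.4), (-0.2, -0.5),
       (0.3, 0.9), (-0.6, 0.1), (0.9, -0.3), (0.2, 0.0), (-0.1, 0.5), (0.4, 0.7)]

w = fisher_direction(pos, neg)
z_pos, z_neg = project(w, pos), project(w, neg)
t = thresholds(z_pos, z_neg)
for name, value in t.items():
    print(name, round(value, 4), evaluate(value, z_pos, z_neg))
```

The three rules coincide only when the two classes have equal size and equal spread on the projected axis. With unbalanced sets they diverge, which is the situation the abstract identifies as sensitive to threshold choice.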

Keywords

Fisher Discriminant Analysis; Threshold Selection; Gene Prediction; Z-Curve; Size of Data Set

Share and Cite:

Ma, Y., Fang, Y., Liu, P. and Teng, J. (2013) Threshold Selection Study on Fisher Discriminant Analysis Used in Exon Prediction for Unbalanced Data Sets. Communications and Network, 5, 601-605. doi: 10.4236/cn.2013.53B2108.

Conflicts of Interest

The authors declare no conflicts of interest.

