Threshold Selection Study on Fisher Discriminant Analysis Used in Exon Prediction for Unbalanced Data Sets

Abstract

In gene prediction, Fisher discriminant analysis (FDA) is used to separate protein-coding regions (exons) from non-coding regions (introns). When enough data are available, the positive and negative data sets are usually chosen to be the same size. When the data are insufficient or the two sets differ in size, however, the threshold used in FDA can strongly influence the prediction results. This paper presents a study on the selection of that threshold. The feature values of each exon/intron sequence are computed using the Z-curve method with 69 variables. The experimental results suggest that the size of the data sets, their standard deviation, and the threshold are the three key elements to consider in order to improve the prediction results.
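To make the role of the threshold concrete, the following is a minimal sketch of a two-class Fisher discriminant with three candidate thresholds on the projected scores. It is an illustration under stated assumptions, not the procedure used in the paper: the data are synthetic 2-dimensional Gaussians rather than 69-variable Z-curve features, and the three threshold rules (midpoint, size-weighted, standard-deviation-weighted) and all function names are chosen here for illustration only.

```python
import numpy as np


def fisher_direction(pos, neg):
    """Return the Fisher direction w = Sw^{-1} (mean_pos - mean_neg)."""
    mean_pos, mean_neg = pos.mean(axis=0), neg.mean(axis=0)
    dev_pos, dev_neg = pos - mean_pos, neg - mean_neg
    # Within-class scatter: sum of the scatter matrices of both classes.
    scatter_within = dev_pos.T @ dev_pos + dev_neg.T @ dev_neg
    return np.linalg.solve(scatter_within, mean_pos - mean_neg)


def candidate_thresholds(pos_scores, neg_scores):
    """Three illustrative (assumed) threshold rules on the projected scores."""
    m_p, m_n = pos_scores.mean(), neg_scores.mean()
    s_p, s_n = pos_scores.std(ddof=1), neg_scores.std(ddof=1)
    n_p, n_n = len(pos_scores), len(neg_scores)
    return {
        # Halfway between the two projected class means.
        "midpoint": (m_p + m_n) / 2,
        # Projected mean of the pooled data; pulled toward the larger class.
        "size_weighted": (n_p * m_p + n_n * m_n) / (n_p + n_n),
        # Point equally many standard deviations away from each class mean.
        "std_weighted": (s_n * m_p + s_p * m_n) / (s_p + s_n),
    }


# Synthetic, unbalanced stand-in data (assumption: not real exon/intron features).
rng = np.random.default_rng(0)
pos = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(50, 2))    # small "exon" set
neg = rng.normal(loc=[0.0, 0.0], scale=2.0, size=(500, 2))   # large "intron" set

w = fisher_direction(pos, neg)
pos_scores, neg_scores = pos @ w, neg @ w
thresholds = candidate_thresholds(pos_scores, neg_scores)

for name, t in thresholds.items():
    sensitivity = np.mean(pos_scores > t)    # positives scored above threshold
    specificity = np.mean(neg_scores <= t)   # negatives scored at or below it
    print(f"{name:>13}: t={t:.4f}  Sn={sensitivity:.3f}  Sp={specificity:.3f}")
```

With unequal class sizes and spreads, the three rules generally place the threshold at different points between the projected class means, which shifts the balance between sensitivity and specificity.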

Share and Cite:

Ma, Y., Fang, Y., Liu, P. and Teng, J. (2013) Threshold Selection Study on Fisher Discriminant Analysis Used in Exon Prediction for Unbalanced Data Sets. Communications and Network, 5, 601-605. doi: 10.4236/cn.2013.53B2108.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.