Threshold Selection Study on Fisher Discriminant Analysis Used in Exon Prediction for Unbalanced Data Sets
ABSTRACT
In gene prediction, Fisher discriminant analysis (FDA) is used to
separate protein-coding regions (exons) from non-coding regions (introns).
When enough data are available, the positive and negative data sets are
usually of the same size. When the data are insufficient or the two sets
are unequal in size, however, the threshold used in FDA may strongly
influence the prediction results. This paper presents a study on the
selection of the threshold. The eigenvalue of each exon/intron sequence
is computed using the Z-curve method with 69 variables. The experimental
results suggest that the size of the data sets, their standard deviation, and
the threshold are the three key elements to take into consideration to
improve the prediction results.
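To make the role of the threshold concrete, the following is a minimal sketch of two-class FDA on unbalanced data. It is an illustration under stated assumptions, not the authors' implementation: the function names, the ridge term, the synthetic 5-variable data (standing in for the 69 Z-curve variables), and the three candidate threshold rules are all hypothetical choices, since the abstract does not state which thresholds the paper compares.

```python
import numpy as np


def fisher_direction(pos, neg):
    """Fisher discriminant direction w = S_w^{-1} (m_pos - m_neg)."""
    m_pos, m_neg = pos.mean(axis=0), neg.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices.
    s_w = (pos - m_pos).T @ (pos - m_pos) + (neg - m_neg).T @ (neg - m_neg)
    # Small ridge term (an assumption) keeps s_w invertible for small samples.
    s_w = s_w + 1e-6 * np.eye(s_w.shape[0])
    return np.linalg.solve(s_w, m_pos - m_neg)


def candidate_thresholds(pos, neg, w):
    """Three illustrative threshold rules on the projected scores."""
    p, n = pos @ w, neg @ w
    mp, mn = p.mean(), n.mean()
    sp, sn = p.std(ddof=1), n.std(ddof=1)
    return {
        # Ignores both class size and spread.
        "midpoint": (mp + mn) / 2,
        # Overall mean of all projected scores; shifts toward the larger class.
        "size_weighted": (len(p) * mp + len(n) * mn) / (len(p) + len(n)),
        # Distance from each class mean is proportional to that class's spread.
        "std_weighted": (sn * mp + sp * mn) / (sp + sn),
    }


def predict(x, w, threshold):
    """Label a sample positive (exon-like) when its score exceeds the threshold."""
    return x @ w > threshold


# Synthetic, unbalanced data: 200 positive and 50 negative samples (hypothetical).
rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, size=(200, 5))
neg = rng.normal(0.0, 2.0, size=(50, 5))

w = fisher_direction(pos, neg)
thresholds = candidate_thresholds(pos, neg, w)
for name, c in thresholds.items():
    sensitivity = predict(pos, w, c).mean()
    specificity = (~predict(neg, w, c)).mean()
    print(f"{name}: threshold={c:.4f} sn={sensitivity:.3f} sp={specificity:.3f}")
```

With equal class sizes and equal spreads the three rules coincide; they diverge only when the sets differ in size or standard deviation, which is the situation the abstract describes.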
Share and Cite:
Ma, Y., Fang, Y., Liu, P. and Teng, J. (2013) Threshold Selection Study on Fisher Discriminant Analysis Used in Exon Prediction for Unbalanced Data Sets. Communications and Network, 5, 601-605. doi: 10.4236/cn.2013.53B2108.