TITLE:
Threshold Selection Study on Fisher Discriminant Analysis Used in Exon Prediction for Unbalanced Data Sets
AUTHORS:
Yutao Ma, Yanbing Fang, Ping Liu, Jianfu Teng
KEYWORDS:
Fisher Discriminant Analysis; Threshold Selection; Gene Prediction; Z-Curve; Size of Data Set
JOURNAL NAME:
Communications and Network, Vol.5 No.3C, October 9, 2013
ABSTRACT:
In gene prediction, Fisher discriminant analysis
(FDA) is used to separate protein-coding regions (exons) from non-coding
regions (introns). When enough data are available, the positive and negative
data sets are usually chosen to be of the same size. When the data are
insufficient or the two sets differ in size, however, the threshold used in FDA
may have an important influence on the prediction results. This paper presents a study
on the selection of the threshold. The feature vector of each exon/intron sequence
is computed using the Z-curve method with 69 variables. The experimental results
suggest that the size of the data sets, their standard deviation, and the
threshold are the three key elements to take into consideration to improve
the prediction results.
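
To make the role of the threshold concrete, the following is a minimal sketch of two-class FDA on unbalanced data. It is not the authors' implementation: the three candidate thresholds (midpoint, size-weighted, and standard-deviation-weighted), the function names, the small ridge term, and the synthetic 5-dimensional data standing in for the 69 Z-curve variables are all assumptions chosen for illustration.

```python
import numpy as np

def fisher_direction(pos, neg):
    """Return the Fisher projection vector w = Sw^{-1} (m_pos - m_neg)."""
    m_pos, m_neg = pos.mean(axis=0), neg.mean(axis=0)
    # Within-class scatter: sum of the scatter matrices of both classes.
    s_w = (pos - m_pos).T @ (pos - m_pos) + (neg - m_neg).T @ (neg - m_neg)
    # Small ridge term (an assumption) keeps s_w invertible for small samples.
    s_w = s_w + 1e-6 * np.eye(s_w.shape[0])
    return np.linalg.solve(s_w, m_pos - m_neg)

def candidate_thresholds(pos, neg, w):
    """Three illustrative thresholds on the projected axis (hypothetical choices)."""
    p, n = pos @ w, neg @ w
    mp, mn = p.mean(), n.mean()
    sp, sn = p.std(ddof=1), n.std(ddof=1)
    n_p, n_n = len(p), len(n)
    return {
        # Halfway between the projected class means.
        "midpoint": 0.5 * (mp + mn),
        # Class means weighted by class size (pulled toward the larger class).
        "size_weighted": (n_p * mp + n_n * mn) / (n_p + n_n),
        # Point lying the same number of standard deviations from each mean.
        "std_weighted": (sn * mp + sp * mn) / (sp + sn),
    }

def evaluate(pos, neg, w, c):
    """Sensitivity and specificity when 'projection > c' predicts an exon."""
    sensitivity = float(np.mean(pos @ w > c))
    specificity = float(np.mean(neg @ w <= c))
    return sensitivity, specificity

# Synthetic, unbalanced stand-in data: 50 "exons" versus 500 "introns".
rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, size=(50, 5))
neg = rng.normal(0.0, 2.0, size=(500, 5))

w = fisher_direction(pos, neg)
for name, c in candidate_thresholds(pos, neg, w).items():
    sens, spec = evaluate(pos, neg, w, c)
    print(f"{name:>14}: threshold={c:.4f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

Running the sketch with different class sizes or spreads shows how each threshold trades sensitivity against specificity, which is the kind of dependence on data-set size and standard deviation that the abstract describes.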