Data Fusion with Optimized Block Kernels in LS-SVM for Protein Classification

DOI: 10.4236/eng.2013.510B048

Abstract

In this work, we develop a method to efficiently optimize the kernel function for data combined from several different sources whose individual kernels are already available. The combined data are vectorized by a weighted concatenation of the existing data vectors. This induces a kernel matrix composed of the existing kernels as blocks along the main diagonal, weighted according to the corresponding subspaces spanned by the data. The induced block kernel matrix is optimized within the least-squares support vector machine (LS-SVM) framework at the same time as the LS-SVM is trained, by solving an extended set of linear equations rather than a quadratically constrained quadratic program (QCQP) as in a previous method. The method is tested on a benchmark dataset, and data fusion improves performance significantly: the ROC score rises from 0.84, the highest achieved with any individual data source, to 0.92.
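The two ingredients named in the abstract, weighted concatenation of data vectors and LS-SVM training by a linear solve, can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the base kernels are linear, the weights are fixed by hand rather than optimized during training, the toy data are invented, and all function names are hypothetical. The sketch does not construct the paper's block kernel matrix or its extended system of equations. It only shows that scaling each source by the square root of its weight before concatenation yields a weighted sum of the source Gram matrices under a linear kernel, and that a standard LS-SVM classifier (in the Suykens and Vandewalle formulation) is then obtained from a single linear system.

```python
import math

def linear_kernel(X):
    """Gram matrix of dot products between the rows of X."""
    return [[sum(a * b for a, b in zip(u, v)) for v in X] for u in X]

def combine_kernels(kernels, weights):
    """Weighted sum of Gram matrices: K = sum_s weights[s] * kernels[s]."""
    n = len(kernels[0])
    return [[sum(w * Ks[i][j] for w, Ks in zip(weights, kernels))
             for j in range(n)] for i in range(n)]

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def train_lssvm(K, y, gamma=10.0):
    """Solve the LS-SVM classifier system
        [ 0   y^T             ] [ b     ]   [ 0 ]
        [ y   Omega + I/gamma ] [ alpha ] = [ 1 ]
    where Omega[i][j] = y[i] * y[j] * K[i][j]."""
    n = len(y)
    A = [[0.0] + [float(v) for v in y]]
    for i in range(n):
        A.append([float(y[i])] +
                 [y[i] * y[j] * K[i][j] + (1.0 / gamma if i == j else 0.0)
                  for j in range(n)])
    sol = solve(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]

def decision_values(K, y, b, alpha):
    """f(x_i) = sum_j alpha_j * y_j * K(x_i, x_j) + b on the training points."""
    return [sum(a * yj * kij for a, yj, kij in zip(alpha, y, row)) + b
            for row in K]

# Invented toy data: two "sources" describing the same four samples.
X1 = [[1.0], [2.0], [-1.0], [-2.0]]
X2 = [[0.5, 1.0], [1.0, 0.5], [-0.5, -1.0], [-1.0, -0.5]]
y = [1, 1, -1, -1]
weights = [0.5, 0.5]  # assumption: fixed by hand, not learned

# Weighted concatenation: scale each source by sqrt(weight), then join.
Z = [[math.sqrt(weights[0]) * v for v in r1] +
     [math.sqrt(weights[1]) * v for v in r2] for r1, r2 in zip(X1, X2)]
K_concat = linear_kernel(Z)
K = combine_kernels([linear_kernel(X1), linear_kernel(X2)], weights)

b, alpha = train_lssvm(K, y)
scores = decision_values(K, y, b, alpha)
```

In this toy setting `K_concat` and `K` agree up to rounding, which is the sense in which concatenating weighted vectors induces a weighted combination of the existing kernels. Swapping in precomputed Gram matrices for `linear_kernel(X1)` and `linear_kernel(X2)` would leave `train_lssvm` unchanged, since it only sees the combined matrix.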


Share and Cite:

Liao, L. (2013) Data Fusion with Optimized Block Kernels in LS-SVM for Protein Classification. Engineering, 5, 223-236. doi: 10.4236/eng.2013.510B048.

Conflicts of Interest

The authors declare no conflicts of interest.

  

Copyright © 2020 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.