Incorporating heterogeneous biological data sources in clustering gene expression data

PP. 17-23
DOI: 10.4236/health.2009.11004

ABSTRACT

In this paper, a similarity measure between genes based on protein-protein interactions is proposed. The ChIP-chip data are converted into the same form as gene expression data, with Pearson correlation as the similarity measure. From the similarity measures for the protein-protein interaction data and the ChIP-chip data, a combined dissimilarity measure is defined. Introducing this combined measure into the K-means method yields an improved K-means method. The improved K-means method and three other clustering methods are evaluated on a real dataset. Performance of these methods is assessed by a prediction accuracy analysis using known gene annotations. Our results show that the improved K-means method outperforms the other clustering methods. The performance of the improved K-means method is also tested by varying the tuning coefficients of the combined dissimilarity measure. The results show that incorporating heterogeneous data sources in clustering gene expression data is helpful and meaningful, and that the coefficients for genome-wide or complete data sources should be given larger values when constructing the combined dissimilarity measure.
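
The abstract does not state the form of the combined dissimilarity measure. As a rough illustration only, the sketch below assumes a weighted sum of three per-source dissimilarities, each taken as one minus a similarity. The function names, the `weights` parameter, the assumption that the protein-protein interaction similarity lies in [0, 1], and the handling of constant profiles are all hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        # Assumption: a constant profile is treated as uncorrelated.
        return 0.0
    return cov / (sx * sy)


def combined_dissimilarity(expr_i, expr_j, chip_i, chip_j, ppi_sim,
                           weights=(1.0, 1.0, 1.0)):
    """Weighted sum of per-source dissimilarities (hypothetical form).

    expr_*  : expression profiles of genes i and j
    chip_*  : ChIP-chip profiles of genes i and j
    ppi_sim : interaction-based similarity, assumed to lie in [0, 1]
    weights : tuning coefficients for the three sources
    """
    w_expr, w_chip, w_ppi = weights
    d_expr = 1.0 - pearson(expr_i, expr_j)
    d_chip = 1.0 - pearson(chip_i, chip_j)
    d_ppi = 1.0 - ppi_sim
    return w_expr * d_expr + w_chip * d_chip + w_ppi * d_ppi
```

Under these assumptions, raising one entry of `weights` lets that data source dominate the pairwise distance, which is one way to read the abstract's advice to give larger coefficients to genome-wide or complete sources. Standard K-means needs centroids, so using a pairwise measure like this would require an adapted assignment or update step that the abstract does not describe.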

Share and Cite:

Li, G. and Wang, Z. (2009) Incorporating heterogeneous biological data sources in clustering gene expression data. Health, 1, 17-23. doi: 10.4236/health.2009.11004.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.