TITLE:
A Low-Memory-Requiring and Fast Approach to Cluster Large-Scale Decoy Protein Structures
AUTHORS:
Yate-Ching Yuan, Yingzi Shang, Hongzhi Li
KEYWORDS:
Protein Structure Prediction; Protein Structure Cluster; Principal Component Analysis; Low-Memory-Requiring Clustering; Ultra-Fast Clustering
JOURNAL NAME:
Open Journal of Biophysics, Vol.2 No.3, July 27, 2012
ABSTRACT: This work demonstrates that the PCAC (Protein principal Component Analysis Clustering) method, which uses principal component analysis (PCA) to cluster large-scale decoy protein structures in protein structure prediction, is an ultra-fast and low-memory clustering method. It can be two orders of magnitude faster than the commonly used pairwise RMSD clustering (pRMSD) when an enormous number of decoys is involved. Whereas pRMSD needs N(N – 1)/2 least-squares-fitting RMSD calculations and N² memory units to store the pairwise RMSD values, PCAC requires only N RMSD calculations and N × P memory units, where N is the number of structures to be clustered and P is the number of preserved eigenvectors. Furthermore, PCAC based on the covariance matrix of Cartesian coordinates generates essentially the same result as the reference RMSD clustering (rRMSD). In a test on 41 protein decoy sets, when the eigenvectors that together account for 90% of the total eigenvalue sum are preserved, the PCAC method reproduces the near-native selections obtained from rRMSD.
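
The data flow summarized in the abstract (N least-squares fits to a single reference, a covariance matrix of Cartesian coordinates, and retention of the eigenvectors covering 90% of the eigenvalue sum) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `kabsch_align` and `pcac_project` are hypothetical, the choice of the first decoy as the reference is an assumption, the decoys are synthetic, and the clustering step applied to the reduced coordinates is not described in the abstract and is therefore left out.

```python
import numpy as np


def kabsch_align(mobile, reference):
    """Least-squares superposition of `mobile` onto `reference`.

    Both arrays have shape (n_atoms, 3). Returns the centered, rotated
    copy of `mobile`; the reference is treated as centered at the origin.
    """
    mob_c = mobile - mobile.mean(axis=0)
    ref_c = reference - reference.mean(axis=0)
    # 3x3 correlation matrix between the two centered coordinate sets
    H = mob_c.T @ ref_c
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return mob_c @ R.T


def pcac_project(decoys, keep_fraction=0.90):
    """Project decoys of shape (N, n_atoms, 3) onto leading eigenvectors.

    Returns an (N, P) array and P, where P is the smallest number of
    eigenvectors whose eigenvalues sum to at least `keep_fraction`
    of the total.
    """
    n = len(decoys)
    reference = decoys[0]  # assumption: first decoy serves as the reference
    # N superpositions instead of N(N - 1)/2 pairwise fits
    aligned = np.array([kabsch_align(d, reference) for d in decoys])
    X = aligned.reshape(n, -1)          # (N, 3 * n_atoms)
    X = X - X.mean(axis=0)              # center each coordinate
    cov = X.T @ X / (n - 1)             # covariance of Cartesian coordinates
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]   # sort eigenpairs, largest first
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    P = min(int(np.searchsorted(cumulative, keep_fraction)) + 1, len(eigvals))
    return X @ eigvecs[:, :P], P        # N x P values to store


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    native = rng.normal(size=(20, 3))   # synthetic 20-atom "structure"
    decoys = np.array(
        [native + 0.5 * rng.normal(size=native.shape) for _ in range(50)]
    )
    projections, P = pcac_project(decoys)
    print("stored values (N x P):", projections.size)
    print("pairwise matrix (N x N):", len(decoys) ** 2)
```

Any clustering applied afterwards operates on the N × P projection array rather than on an N × N table of pairwise RMSD values, which is where the memory saving stated in the abstract comes from.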