A Scalable Method for Cross-Platform Merging of SNP Array Datasets

PP. 502-508
DOI: 10.4236/eng.2013.510B103

ABSTRACT

The single nucleotide polymorphism (SNP) array is a recently developed biotechnology that is used extensively in the study of cancer genomes. The variety of available platforms makes cross-study validation and comparison difficult. Meanwhile, the sample sizes of these studies are increasing rapidly, which places a heavy computational burden on even the fastest PC. Here, we describe a novel method that generates a platform-independent dataset from SNP arrays measured on multiple platforms. It extracts the probesets common to the individual platforms and performs cross-platform normalization and summarization based on these probesets. Because different platforms may have different numbers of probes per probeset (PPP), these steps produce preprocessed signals whose noise levels differ between platforms. To handle this problem, we adopt a platform-dependent smoothing strategy and produce a preprocessed dataset that demonstrates uniform noise levels across individual samples. To scale the method to a large number of samples, we devised an algorithm that splits the samples into multiple tasks and the probesets into multiple segments before submitting them to a parallel computing facility. This scheme drastically reduces computation time and increases the ability to process ultra-large sample sizes and arrays.
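
The extraction of common probesets and the splitting of work into tasks and segments can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the platform and probeset identifiers, and the chunk sizes are all hypothetical, and pairing every sample task with every probeset segment is an assumption about how the jobs might be formed before submission to a parallel facility.

```python
from itertools import product


def common_probesets(platforms):
    """Return the sorted probeset IDs present on every platform."""
    sets = [set(ids) for ids in platforms.values()]
    return sorted(set.intersection(*sets))


def chunk(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def plan_jobs(samples, probesets, samples_per_task, probesets_per_segment):
    """Pair each sample task with each probeset segment (assumed scheme)."""
    tasks = chunk(samples, samples_per_task)
    segments = chunk(probesets, probesets_per_segment)
    return list(product(tasks, segments))


# Hypothetical probeset IDs for two platforms.
platforms = {
    "platform_A": ["ps1", "ps2", "ps3", "ps4"],
    "platform_B": ["ps2", "ps3", "ps4", "ps5"],
}
shared = common_probesets(platforms)

# Hypothetical sample IDs; each job is one (task, segment) pair.
jobs = plan_jobs(["s1", "s2", "s3"], shared,
                 samples_per_task=2, probesets_per_segment=2)
for task, segment in jobs:
    print(task, segment)
```

Under this assumed scheme, each job covers only a subset of the samples and a subset of the common probesets, so the jobs can be dispatched independently to separate workers.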

Share and Cite:

Chen, P. and Hung, Y. (2013) A Scalable Method for Cross-Platform Merging of SNP Array Datasets. Engineering, 5, 502-508. doi: 10.4236/eng.2013.510B103.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.