Pitfall of genome-wide association studies: Sources of inconsistency in genotypes and their effects


Personalized medicine will improve health outcomes and patient satisfaction. However, implementing personalized medicine based on individuals’ biological information is far from simple. It requires genetic biomarkers, which are mainly developed and used by pharmaceutical companies to select patients who benefit more from a particular drug or have a lower risk of adverse drug reactions. Genome-wide association studies (GWAS) aim to identify genetic variants across the human genome that might be used as genetic biomarkers for diagnosis and prognosis. During the last several years, high-density SNP genotyping arrays have facilitated GWAS that successfully identified common genetic variants associated with a variety of phenotypes. However, each identified genetic variant explains only a very small fraction of the underlying genetic contribution to the studied phenotypic trait. Replication studies have demonstrated that only a small portion of the loci associated in the initial GWAS can be replicated, even within the same populations. Given the complexity of GWAS, multiple sources of Type I (false positive) and Type II (false negative) errors exist. Inconsistency in genotypes, caused either by the genotyping experiment or by the genotype calling process, is a major source of false GWAS findings. Accurate and reproducible genotypes are paramount because inconsistency in genotypes can lead to an inflation of false associations. This article reviews the sources of inconsistency in genotypes and discusses their effects on GWAS findings.

Share and Cite:

Hong, H., Xu, L., Su, Z., Liu, J., Ge, W., Shen, J., Fang, H., Perkins, R., Shi, L. and Tong, W. (2012) Pitfall of genome-wide association studies: Sources of inconsistency in genotypes and their effects. Journal of Biomedical Science and Engineering, 5, 557-573. doi: 10.4236/jbise.2012.510069.

Conflicts of Interest

The authors declare no conflicts of interest.



Copyright © 2023 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.