Genetic Structure of Cartagena de Indias Population Using Hypervariable Markers of Y Chromosome
1. Introduction
Cartagena de Indias is a city in the north of Colombia, located on the shores of the Caribbean Sea, and is the capital of the Department of Bolivar. Founded in 1533 by Spanish conquerors, Cartagena is one of the oldest cities on the Colombian Caribbean coast [1]. The city was also the primary slaving port during the early colonial period (1570-1640), when more than 150,000 Africans from western Africa (principally Angola and the Guineas) were brought into the Americas [2]. As a consequence, the ethnic composition of Cartagena is the result of a three-way admixture: the native aborigines known as the Calamari; the European conquerors (principally from Spain); and the African slaves [3]. This ancestral admixture has predominantly resulted in a mulatto, mestizo, Amerindian, and Afro-descendant population. However, recent evidence shows a disproportionate contribution of European males and Amerindian females, producing a sex bias in admixture proportions [4] [5].
Genetic association studies are a powerful strategy for identifying genotype-phenotype associations in complex diseases. Recent findings confirm that populations with multiple ethnic origins show important differences in allelic and genotypic frequencies, which may inflate false positive rates and cause spurious associations because ancestry is unevenly distributed across subpopulations [6]. Therefore, allele differences between cases and controls could reflect systematic differences in ancestry, as well as dissimilar frequencies among population strata, rather than a real association of the genes with the disease [7] [8]. These differences are more important in young populations with complex patterns of admixture, such as Latin American populations [9]. Hence, population stratification is the principal confounding variable in genetic association studies, causing bias that may yield misleading results, which could in turn be used in the practice of medicine and public health [10].
To identify the patrilineal contribution in Cartagena’s population and to help avoid spurious associations, we determined the genetic structure of the Cartagena de Indias population using Y-chromosome short tandem repeats (Y-STRs). We analyzed 130 unrelated men using 15 Y-STR loci, which are routinely employed in human migration and evolutionary studies as well as in genetic structure and forensic analyses, among others [11] [12]. Our findings show that Cartagena’s population is highly diverse, with patrilineal ancestry from European, Middle Eastern, African and Amerindian populations. These findings highlight the importance of knowing local-specific patterns throughout the country in order to establish the population stratification and correct for the impact of admixture in future genetic association studies. In addition, genetic variability studies provide information about population history, as well as the relationship between some diseases and ancestral lineages, which could also be used in subsequent epidemiological studies [13].
2. Materials and Methods
2.1. Population
We analyzed the STR genetic data of 130 unrelated individuals born in Cartagena de Indias, Colombia with at least three generations of ancestors who had been born in this city (Figure 1). The questionnaire for choosing candidates was designed to ensure that the participants were not filially related, i.e., no one was son or brother of another. This study was approved by the Ethics Research Committee of the University of Cartagena, Colombia (Resolution Number 46, 2012). Each individual signed an informed consent to participate in this study.
Figure 1. Map of the Republic of Colombia showing Bolivar Department and the location of Cartagena de Indias.
2.2. Molecular Analysis
Genomic DNA was extracted from peripheral blood leukocytes using the QIAamp DNA Mini Kit (Qiagen, Düsseldorf, Germany). Approximately 10 ng of target DNA was amplified using fifteen previously described Y-chromosome short tandem repeat markers (Table 1). Amplicons were obtained using the multiplex reactions described in Table 2. The resulting amplicons were analyzed on the ABI Prism 3130XL Genetic Analyzer using GeneMapper ID v.3.2 software (Applied Biosystems, Carlsbad, CA, USA).
2.3. Quality Control
Control DNA 007 (Applied Biosystems, Carlsbad, CA, USA) was used as an internationally validated internal control.
2.4. Statistical Analysis
Allelic and haplotypic frequencies, number of alleles (k), haplotype diversity (HD), genetic diversity over loci (h), and mean pairwise differences (M) were estimated using Arlequin software v 3.5 [14]. The number of unique haplotypes (UH) was estimated by direct counting. In order to compare our data with other populations, haplotype and haplogroup information was collected from previous reports. A total of 1372 individuals from different Colombian populations were included in the database and used for further analysis [15] - [17]. Y haplogroups were determined from Y-STR haplotypes with haplogroup predictor software (http://www.hprg.com/hapest5/), using equal priors to estimate the probability of assignment to a particular haplogroup [18]. The phylogenetic relationship of STR haplotypes was analyzed with Network v 4.6.1.1 software [19]; the networks were built using a median joining approach with MP post-processing and drawn with Network Publisher software. Each haplotype was connected to all other haplotypes from which it differed by one repeat unit step at a single microsatellite locus. The Y-STR loci were weighted based on the inverse of their variances.
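The two network rules stated above (linking haplotypes that differ by one repeat unit at a single locus, and weighting loci by the inverse of their variance) can be illustrated with a minimal Python sketch. This is not the Network software's implementation. The locus names come from the marker set, but the haplotype labels and repeat counts are hypothetical, and the use of the population variance is an assumption made only for this illustration.

```python
from itertools import combinations
from statistics import pvariance

# Hypothetical haplotypes: repeat counts at three loci (illustrative values only).
LOCI = ["DYS19", "DYS390", "DYS393"]
HAPLOTYPES = {
    "H1": (14, 24, 13),
    "H2": (14, 25, 13),
    "H3": (15, 25, 13),
    "H4": (14, 22, 13),
}

def is_one_step(a, b):
    """True if a and b differ by exactly one repeat unit at exactly one locus."""
    diffs = [abs(x - y) for x, y in zip(a, b) if x != y]
    return diffs == [1]

def one_step_links(haplotypes):
    """Return the sorted pairs of haplotype names that are one-step neighbours."""
    return sorted(
        (m, n)
        for (m, a), (n, b) in combinations(sorted(haplotypes.items()), 2)
        if is_one_step(a, b)
    )

def inverse_variance_weights(haplotypes):
    """Weight each locus by 1 / variance of its repeat counts.

    Assumption: population variance is used; an invariant locus has no
    finite weight and is reported as None.
    """
    weights = []
    for column in zip(*haplotypes.values()):
        var = pvariance(column)
        weights.append(1.0 / var if var > 0 else None)
    return weights

links = one_step_links(HAPLOTYPES)
weights = dict(zip(LOCI, inverse_variance_weights(HAPLOTYPES)))
print(links)
print(weights)
```

In this toy sample only H1-H2 and H2-H3 are linked, and the less variable locus (DYS19) receives a larger weight than the more variable one (DYS390), which is the intended effect of inverse-variance weighting.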
3. Results
3.1. Y-Chromosome STR Diversity
The haplotype distribution of the 15 Y-STR loci in the 130 individuals studied is shown in Table 3. The distribution of allele frequencies, number of different alleles (k), and locus diversity (h) are shown in Table 4. The most diverse loci were DYS458 (h = 0.775), DYS438/DYS390 (h = 0.766), DYS635/DYS389II (h = 0.740), and DYS19 (h = 0.721); while DYS393 (h = 0.406) was the least diverse.
Table 1. Primer sequences used for the multiplex reactions.
Table 2. Cartagena amplification conditions for PCR multiplex of tested loci.
To estimate the haplotype distribution, locus diversity, and mean number of pairwise differences, we used only the complete haplotypes (n = 37). Among these 37 haplotypes, we found 36 different haplotypes, suggesting high haplotype diversity. In addition, locus diversity over loci showed the highest values (1.000 ± 0.0063), as did the mean number of pairwise differences (10.084 ± 4.7048), whereas the average gene diversity over loci was 0.6722 ± 0.3485.
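As a rough illustration of the summary statistics reported here, the sketch below computes haplotype diversity with Nei's unbiased estimator, HD = n/(n-1) × (1 - Σp_i²), and the mean number of pairwise differences counted as the number of loci at which two haplotypes differ. These are standard textbook definitions and are assumed here; Arlequin's exact procedures (including its standard errors) may differ. The three-locus sample is hypothetical.

```python
from collections import Counter
from itertools import combinations

def haplotype_diversity(haplotypes):
    """Nei's unbiased estimator: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)

def mean_pairwise_differences(haplotypes):
    """Average number of loci at which two haplotypes differ, over all pairs."""
    pairs = list(combinations(haplotypes, 2))
    total = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return total / len(pairs)

# Hypothetical three-locus sample (repeat counts are illustrative only).
sample = [(14, 24, 13), (14, 25, 13), (15, 25, 13), (14, 24, 13)]
hd_toy = haplotype_diversity(sample)
mpd_toy = mean_pairwise_differences(sample)

# 37 haplotypes of which 36 are distinct: exactly one haplotype is seen twice.
labels = list(range(36)) + [0]
hd_37 = haplotype_diversity(labels)

print(f"toy HD = {hd_toy:.4f}, toy mean pairwise differences = {mpd_toy:.4f}")
print(f"HD for 36 distinct haplotypes among 37 = {hd_37:.4f}")
```

Under this estimator, 36 distinct haplotypes among 37 give a value of about 0.998, since only one pair of individuals shares a haplotype.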
3.2. Genetic Structure
To characterize the genetic structure, we determined the frequency of haplogroups as well as the fitness score and Bayesian probability using Haplogroup predictor software (Table 5). Our results showed that Cartagena de Indias has an admixed population with ~80% European, ~10% Amerindian and ~10% African patrilineal ancestry. The most frequent haplogroups were R1b (~40%), I2a (xI2a1) (11%) and Q (~10%), as well as E1b1a (~5%) and E1b1b (~4%). Additional haplogroups were found at low to moderate frequencies (G2a ~1%, I1 ~2%, I2a1 ~3%, I2b ~2%, I2b1 ~9%, J2a1-bh ~2%, J2b ~1%, L ~5%, R1a ~1% and T ~5%).
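The step from haplogroup frequencies to approximate continental proportions can be sketched as a simple aggregation. The frequencies below are the rounded values quoted in the text. The mapping is an assumption made for illustration: Q is treated as Amerindian and E1b1a/E1b1b as African (following the grouping in Figure 3), and every other haplogroup is lumped under "European", which here also absorbs lineages that may be of Middle Eastern origin. Because the rounded percentages do not sum to exactly 100, the shares are renormalized.

```python
# Approximate haplogroup frequencies (%) as quoted in the text.
HAPLOGROUP_PCT = {
    "R1b": 40, "I2a(xI2a1)": 11, "Q": 10, "E1b1a": 5, "E1b1b": 4,
    "G2a": 1, "I1": 2, "I2a1": 3, "I2b": 2, "I2b1": 9,
    "J2a1-bh": 2, "J2b": 1, "L": 5, "R1a": 1, "T": 5,
}

# Assumption: Q -> Amerindian, E1b1a/E1b1b -> African (as in Figure 3);
# all remaining haplogroups are grouped as "European" for illustration only.
ANCESTRY = {"Q": "Amerindian", "E1b1a": "African", "E1b1b": "African"}

def ancestry_shares(pct, mapping, default="European"):
    """Sum haplogroup percentages per ancestry group and renormalize to 100."""
    totals = {}
    for haplogroup, value in pct.items():
        group = mapping.get(haplogroup, default)
        totals[group] = totals.get(group, 0) + value
    grand_total = sum(totals.values())
    return {group: 100.0 * value / grand_total for group, value in totals.items()}

shares = ancestry_shares(HAPLOGROUP_PCT, ANCESTRY)
for group, share in sorted(shares.items()):
    print(f"{group}: {share:.1f}%")
```

With these rounded inputs the grouping gives roughly 81% European, 10% Amerindian and 9% African, in line with the approximate proportions stated above.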
3.3. Comparison with Other Populations
We compared our data with previous results obtained in other Colombian populations. As can be seen in Figure 2, Cartagena’s population maintains a genetic relationship with Antioquia, Magdalena and other populations from the Department of Bolivar. Moreover, Cartagena’s population lies at the centre of the Colombian populations with different ancestries, underlining the complexity of this population.
Table 4. Allelic frequencies, descriptive statistical parameters and diversity index regarding the 15 STR loci of Cartagena de Indias population.
n: number of individuals studied; k: number of alleles; h: locus diversity.
Table 5. Cartagena de Indias haplogroup frequencies.
Figure 2. MDS plot of pairwise Rst differences among Colombian populations using 15 Y-STR loci.
3.4. Network Analysis
In order to establish the genetic relationship within each lineage, a median joining network was constructed (Figure 3). The R1b haplogroup shows a star-like network, indicating that Cartagena’s population is closely related to the Western European populations of Majorca and Valencia (Iberian Peninsula). In addition, a separate group of R1b Cartagena men is related to Sicily’s population, suggesting high genetic diversity even within this lineage (Figure 3(a)). Moreover, the I2a (xI2a1) lineage shows a star-like network, suggesting that Cartagena’s population could be related to a young population that may have undergone demographic events (e.g. bottleneck, genetic drift, and founder effects). With respect to the Q haplogroup, we compared our data with the Q-M242 and Q-M3 haplogroups described previously [3]. Cartagena’s population showed a lineage closely related to the Waunan (Q-M242) and Zenú ethnic groups. In addition, our Q haplogroups are related to Amerindian populations such as the Kogi and Arhuacos (Figure 3(b)). Nevertheless, these lineages remain poorly characterized because markers allowing a higher resolution were not available. Our results also point to the important African ancestry in Cartagena’s population, represented by the E1b1a and E1b1b lineages (Figure 3(c)). Interestingly, Cartagena’s population also has a patrilineal relationship with Senegal and Gabon.
Figure 3. Median joining network of ancestral lineages in Cartagena de Indias population. (a) European lineages: R1b and I2a (xI2a1); (b) Amerindian lineages: Q-M242 and Q-M3; (c) African lineages: E1b1a and E1b1b.
4. Discussion
Population stratification is one of the most important confounding factors in population-based genetic association studies, reportedly causing 40% of spurious associations [20]. These false associations are more frequent in Latino populations, which are heterogeneous populations in which dissimilar ancestry proportions mean that subpopulations are not equally represented [21] - [24]. Consequently, in recent years much research has focused on detecting population stratification before beginning genetic association studies of complex diseases in order to avoid spurious associations [5] [25] [26].
The contemporary Cartagena de Indias population emerged from recent miscegenation (500 years ago) in a cosmopolitan city where Spanish conquerors mixed with Native American people derived principally from the Karib, Malibu, Arawak and Chibcha language families [27]. As stated above, this admixture was asymmetrical, which agrees with our findings that the Cartagena sample studied comprises ~80% European, ~10% Amerindian and ~10% African ancestries. The European ancestry was principally represented by haplogroup R1b (hg-R1b), which accounted for ~50% of the total European haplogroups found in Cartagena’s population. The hg-R1b is the result of admixture with the Spanish conquerors during the colonial period, because Cartagena was one of the most important Spanish settlements in America [28] [29]. This haplogroup is currently present in more than 60% of the Spanish population [30], as well as in ~80% of the Basque Country population [17] [31]. However, hg-R1b could also be related to Mediterranean populations [17], since these showed a haplotype relationship with Cartagena’s population (Figure 3(a)). In addition, the Italian population could also have contributed to introducing the other important European lineage (haplogroup I2a (xI2a1)). This haplogroup is one of the most frequent on the island of Sardinia as well as in the Mediterranean region [15], and could be related to the Italian migrations of 200 years ago [28]. These Italian migrants came principally from Sicily and Cosenza and settled on the northern coast of Colombia (Barranquilla, Cartagena and Santa Marta) [32]. The other European lineages could be related to pirates and corsairs from England, France, Portugal and the Netherlands, who repeatedly attacked Cartagena de Indias because the city was the principal port for gold and silver during the colonial period [29] [33].
Apart from the European ancestry, Cartagena showed an important contribution from African lineages, which were represented by the haplogroups E1b1a and E1b1b. As mentioned before, thousands of African slaves, especially from Western and Central Africa, were introduced in the 16th century [2] and disembarked on the Pacific and Atlantic coasts [34]. The African lineages noticeably increased the diversity of Cartagena’s population because they represented different clans from Senegambia, Ivory Coast, Central Africa, Congo, Angola and Mozambique, among others; many of them were found in Cartagena’s population [3] [35] (Figure 3).
Both ancestries (European and African) were admixed with native Amerindian populations, which still maintain an ancestral relationship with Cartagena’s population. Our results suggest that the Amerindian diversity of Cartagena is related to the Waunan, Kogui, Chocó, Pequé and Zenú groups, all of which are related to the Q-M242/Q-M3 lineages, which represent the majority of Amerindian Y chromosomes [36]. Nevertheless, the Amerindian diversity of Cartagena’s population could show even more heterogeneity, which could be related to the diversity within the Q haplogroup [3].
The great genetic diversity of Cartagena de Indias’ population, represented by mestizo, Afro-Colombian, and Amerindian lineages, supports the importance of ancestry studies in admixed populations. Our results suggest an important degree of substructure within Cartagena’s population. These dissimilar ancestral proportions indicate the need to increase the resolution and to use different genetic markers in order to elucidate the complex population history of Cartagena.
Although different research groups have also studied Colombian and Cartagena population samples, the regional differences, the demographic events, and the complex patterns of diversity suggest that different samples of the same population should be examined in order to represent its whole genetic complexity [5] [37] - [39]. Nevertheless, these ancestral patterns should not be applied to the entire Colombian population, because the demographic events, and consequently the diversity patterns, are specific to each population.
In addition, our results emphasize the contribution of population genetics to population-based genetic association studies, in which ethnic self-identification is not adequate to correct for population stratification. Our data could help to avoid or reduce type I and type II statistical errors, which is fundamental in the search for disease biomarkers.
Acknowledgements
This study was partially supported by the Kellogg’s Nutritional and Health Institute-Mexico (to R.G.), Departamento Administrativo de Ciencia, Tecnología e Innovación, University of Cartagena, Colombia (grant Nº 110765741638), as well as the National University of Colombia. We also thank all the people of Cartagena whose enthusiastic participation and collaboration made this study possible, as well as Laboratorio de Genómica, Proteómica y Metabolómica from LaNSE-Cinvestav-México for help with the genotyping processes.
NOTES
*These authors equally contributed to this work.
#Corresponding author.