Integration between Genomic and Computational Statistical Surveys for the Screening of SNP Genetic Variants in Inflammatory Bowel Disease (IBD) Pediatric Patients
1. Introduction
Inflammatory bowel disease (IBD) is a chronic, disabling, cryptogenic disease that progresses through relapses interspersed with periods of remission. IBD is one of the major problems in hepato-gastroenterology. Numerous studies have reported a role for gut microbes in IBD, indicating that the gut microbiota is an essential component in the development of mucosal lesions [1]. However, it is still unclear whether a specific bacterial species causes IBD or only contributes to the exacerbation of its pathogenesis. In addition, according to [2], IBD can have a hereditary tendency and affects patients of all ages and genders, but it usually occurs before the age of 30, with a peak incidence in young people aged 14 to 24. IBD comprises two rare bowel diseases, Crohn’s disease (CD) and ulcerative colitis (UC), also called hemorrhagic recto-colitis, which are diagnosed on the basis of precise clinical, endoscopic, radiological and histological criteria [3]. Over the last few decades, the incidence of these two pathologies has changed profoundly according to a pattern specific to each disease, making them definitively distinct from each other [4]. Inflammation in UC is continuous and limited to the mucosal layer of the colon, whereas CD is characterized by segmental transmural lesions that can affect any part of the gastrointestinal tract [5]. IBD, and Crohn’s disease in particular, is becoming increasingly common, but there is no specific treatment for it and its etiology is unknown. Furthermore, UC and CD do not increase mortality, but because of their early onset and chronicity they induce high morbidity that impairs patients’ quality of life [4]. The highest incidence rates are traditionally reported in Northern and Western Europe and North America, whereas in Africa, South America and Asia (including China) the incidence of IBD has long been reported as low [4].
However, various studies have demonstrated the involvement, in varying proportions, of genetic predisposition together with environmental and immune factors, altered digestive bacterial flora and altered intestinal permeability, all of which contribute to the development of intestinal lesions [3] [6]. Because genetic predisposition is likely to play a greater role in the onset of inflammatory bowel disease, numerous studies have been conducted in this direction [7]-[9]. Genome-wide association studies (GWAS) have identified new single nucleotide polymorphisms (SNPs), confirmed numerous IBD susceptibility loci and identified genes such as NOD2 [10] [11], autophagy-related genes such as IRGM [12] and ATG16L1 [13] [14], and genes associated with autoimmune diseases such as IL23R [15] and PTPN2 [16]-[18]. Like many other researchers, we have also used next-generation sequencing (NGS) methods to improve our understanding of the genetic and molecular basis of inflammatory bowel diseases [6] [10] [11]. Computational statistical analysis of NGS data is gaining popularity in clinical oncology and in medicine more broadly, and a rigorous statistical approach is needed for correct inference from structural and functional genomic data. We previously developed a computational statistical script for assessing genetic variants in CD patients [6]. Since CD and UC, the main components of IBD, exhibit quite similar phenotypes, the present study set out to examine and evaluate both IBD risk factor and pathogenic SNP genetic variants for discriminating IBD phenotypes, using bioinformatics (i.e. NGS) tools and integrating functional genomic data with computational statistical analysis.
To this end, we considered genomic DNA read sequences from a clinical exome sequencing experiment on four (4) pediatric patients with an evident IBD phenotype (UC or CD) from the “Spedali Civili” of Brescia, Italy. We aimed to highlight the genomic functions that best fit each IBD phenotype by developing our own computational statistical script in the R programming environment.
2. Material and Methods
2.1. Inflammatory Bowel Disease (IBD) Patient’s Population
Genomic read sequences from clinical exome samples of four (4) pediatric patients with an IBD phenotype were processed. Patients IBD1 and IBD3 exhibit a Crohn’s disease phenotype, while patients IBD2 and IBD4 display ulcerative colitis and ulcerative recto-colitis phenotypes, respectively (Table 1). Age at onset of inflammatory bowel disease ranged from four (4) to 15 years. Of the IBD patients in this study, 50% were female and 50% male. The patients’ blood samples were collected from February to August 2018 in a reference center and shipped to our laboratory for clinical exome sequencing analysis [6]. DNA library preparation and clinical exome sequencing were performed following the Illumina MiSeq manufacturer’s instructions.
Table 1. Demographic and clinical features of the analyzed pediatric inflammatory bowel disease (IBD) patients.
IBD Patients | IBD patient 1 | IBD patient 2 | IBD patient 3 | IBD patient 4
Age (years) | 17 | 14 | 19 | 21
Age at onset (years) | 11 | 8 | 4 | 15
Gender | M | F | F | M
IBD type | Crohn’s disease | Ulcerative colitis | Crohn’s disease | Ulcerative recto colitis
Other pathologies | – | Recurrent infections | – | Autism
2.2. Genomic Read Sequences Quality Control, Alignment and Genetic Variants Calling Procedures
We performed quality control of the genomic read sequences obtained from clinical exome sequencing by running the Fast-Q quality control package in the R programming environment. Next, we aligned the genomic read sequences to the hg19 human genome with the Bowtie 2 package on the Galaxy bioinformatics platform, using standard parameters. We selected genomic read sequences with a length between 290 and 300 bp and a quality score ≥30 for the subsequent bioinformatics, genomic and computational statistical analyses (Figure S1 and Figure S2). We retrieved and characterized genetic variants in terms of single nucleotide polymorphisms (SNPs) for each IBD patient by processing the BAM files (compressed binary files storing sequence reads) and their indexes produced by the alignment to the hg19 human genome, running the following bioinformatics packages in sequence on the Galaxy platform:
FreeBayes
VcfAllelicPrimitives
SnpEff eff
VCFtoTab-delimited
SnpSift Extract Fields
GEMINI load
GEMINI database info
GEMINI query
We carried out genetic variant calling from the BAM files and their indices resulting from the alignment of the genomic read sequences to the hg19 reference genome. The bioinformatics packages used for variant calling were FreeBayes [19], VcfAllelicPrimitives, SnpEff eff [20], VCFtoTab-delimited, SnpSift Extract Fields, GEMINI load [21], GEMINI database info and GEMINI query. Read quality control, alignment, variant calling and variant characterization were all performed on the Galaxy platform. The output files of this analysis are in Variant Call Format (VCF). Figure 1 summarizes the experimental protocol of the bioinformatics and genomic analysis.
Figure 1. Bioinformatics and genomic experimental workflow for processing the genomic read sequences of pediatric inflammatory bowel disease (IBD) patients, for SNP genetic variant characterization and functional genomic analysis.
2.3. Assessment of IBD (CD and UC) Genetic Variants Genomic Functions
As a prelude to the statistical analyses, the functional genomic data for the SNPs retrieved from the analyzed IBD patients were characterized and structured. We considered only genetic variants covered by at least 20 genomic read sequences. SNPs were classified as intronic or exonic mutations. The genomic functions associated with the IBD patients’ SNPs are as follows:
1) Intronic mutation function: 3’ and 5’ untranslated regions (3’ and 5’ UTR), downstream gene variant, inter-genic region, intronic variant, upstream gene variant and splice region;
2) Exonic mutation function: induction of the initiator codon, nonsense mutation, loss of the stop codon, loss of the initiator codon, splice mutation and silent mutation.
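The selection and classification described above can be sketched as follows. This is a minimal illustration, not the authors' R script: the `Variant` record, the example identifiers and the mapping of annotation terms to the two classes are assumptions made for the example, and only the coverage threshold of 20 reads comes from the text.

```python
from dataclasses import dataclass

MIN_DEPTH = 20  # coverage threshold stated in Section 2.3

# Assumed mapping of annotation terms to the two classes used in the text.
NON_CODING_TERMS = {
    "3_prime_UTR_variant", "5_prime_UTR_variant", "downstream_gene_variant",
    "intergenic_region", "intron_variant", "upstream_gene_variant",
    "splice_region_variant",
}
CODING_TERMS = {
    "missense_variant", "synonymous_variant", "stop_gained",
    "stop_lost", "start_lost",
}


@dataclass
class Variant:
    """Hypothetical record for one annotated variant."""
    identifier: str
    depth: int    # number of reads covering the site
    effect: str   # annotation term


def classify(variant):
    """Return 'intronic', 'exonic', or None when the variant is not retained."""
    if variant.depth < MIN_DEPTH:
        return None
    if variant.effect in NON_CODING_TERMS:
        return "intronic"
    if variant.effect in CODING_TERMS:
        return "exonic"
    return None


def tally(variants):
    """Count retained variants per class."""
    counts = {"intronic": 0, "exonic": 0}
    for variant in variants:
        label = classify(variant)
        if label is not None:
            counts[label] += 1
    return counts


# Illustrative records only; these are not variants from the study.
example = [
    Variant("var_a", 35, "missense_variant"),
    Variant("var_b", 12, "intron_variant"),       # dropped: depth below 20
    Variant("var_c", 48, "intron_variant"),
    Variant("var_d", 20, "3_prime_UTR_variant"),
]
counts = tally(example)
```

Keeping the term sets as module-level constants makes it easy to adjust the mapping if a different annotation vocabulary is used.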
Figure 2. Multivariate statistical analysis weighing the variability of the inflammatory bowel disease patients’ data in terms of aligned genomic read sequences, prior to the genetic variant (SNP) calling survey.
2.4. Statistical and Functional Genomic Analysis of IBD Risk Factor and Pathogenic SNP Genetic Variants (SNPs)
We checked and quantified the genomic functions of the significant SNPs retrieved from the inflammatory bowel disease patients by processing the data from the previously obtained VCF files. Statistical analyses were performed in the R programming environment (version 3.6.2). Descriptive statistical analyses included Venn diagram, limma and density plot scripts, with the purpose of characterizing and assessing the distribution of genetic variants in the IBD patient population. We developed a script for endogenous variable assignment of variance for both the selected IBD risk factor and pathogenic genetic variants. Inferential statistical analysis included the following tests: i) Kruskal-Wallis non-parametric test for variance analysis [22]; ii) Shapiro-Wilk normality test; iii) multiple pairwise comparison analysis (Tukey’s test); iv) Fisher’s exact test; and v) Bartlett’s test for homogeneity of variances. From the output of the Kruskal-Wallis test, we checked for a significant difference between groups. A significant Kruskal-Wallis test is generally followed up by Dunn’s test to identify which groups differ. A test was considered significant at the threshold α = 0.05 (p ≤ 0.05). Genomic functions were assigned to the retrieved pathogenic and risk factor SNP genetic variants using the ENSEMBL genomic database. The statistical parameter Fitcon, which indicates the probability that a given mutation has a significant impact on the phenotype, was introduced to characterize the CD and UC phenotypes of the IBD patients. The following Fitcon thresholds were set: i) intronic risk factor and/or pathogenic genetic variants: 0.3 ≤ Fitcon ≤ 0.4; and ii) exonic risk factor and/or pathogenic genetic variants: Fitcon ≥ 0.6.
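Among the tests listed above, Fisher’s exact test is simple enough to write out in full. The sketch below is a plain Python illustration of the two-sided test on a 2×2 table, not the authors' R code; the counts are illustrative and the row and column labels (a mutation class against heterozygous and homozygous state) are an assumption about how such a table might be laid out.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        # when all margins are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of every table that is no more likely than the
    # observed one (small tolerance guards against floating-point ties).
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Illustrative counts only (not taken from the study):
#                 heterozygous  homozygous
# class of interest     3            1
# other classes         1            3
p_value = fisher_exact_two_sided(3, 1, 1, 3)
significant = p_value <= 0.05  # threshold stated in the text
```

With these toy counts the test is far from significant; the point is only to show how the p-value is built from the hypergeometric probabilities of all tables with the same margins.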
3. Results
3.1. IBD Patients Genomic Read Sequences Quality Control and Alignment Statistical Comparative Analysis
We checked the quality of the IBD patients’ genomic reads. The Phred quality score of the genomic read sequences is higher than 30 (Figure S1). The exome DNA read sequences range in size between 100 and 300 bp (Figure S2), and the majority of the genomic sequences for the four analyzed IBD patients have a size of 300 bp. Sequencing was carried out on the Illumina MiSeq sequencer. The total number of genomic read sequences obtained per IBD patient ranges between 1,830,550 and 2,128,442. The mean number of sequences was estimated at 1,958,395, with a coefficient of variation (CV) of 7.06%. Interestingly, more than 70% of the genomic read sequences aligned specifically to a single gene target, while about 9% aligned to more than one gene model target (Table 2). Taken together, about 80% of the IBD patients’ genomic read sequences aligned to the hg19 human genome, and approximately 20% did not (Table 2). We performed a multivariate comparative statistical analysis of the IBD patients based on the proportion of aligned and/or non-aligned read sequences, aiming to assess the distribution of read sequences on the hg19 reference genome prior to genetic variant calling and to the subsequent functional genomic analysis. The multivariate comparative analysis showed no difference in variance (p = 0.87) between the four analyzed IBD patients (Figure 2(A)). In other words, IBD patients 1, 2, 3 and 4 exhibited high variance homogeneity with respect to the read sequences aligned to the hg19 human genome (Figure 2(B)). Considered as a whole, the IBD patients exhibited homogeneous features for the purpose of the subsequent genetic variant calling and functional genomic analysis.
Table 2. Descriptive statistical analysis of IBD pediatric patients genomic read sequences aligned on hg19 human genome.
Genomic read sequences | IBD Patient 1 (IBD1) | IBD Patient 2 (IBD2) | IBD Patient 3 (IBD3) | IBD Patient 4 (IBD4)
Total number of analyzed genomic read sequences | 1,862,628 | 2,011,960 | 2,128,442 | 1,830,550
Number and proportion (%) of read sequences aligned 1 time | 1,335,762 (71.71%) | 1,431,974 (71.17%) | 766,939 (72.07%) | 656,309 (71.71%)
Number and proportion (%) of read sequences aligned > 1 time | 159,094 (8.54%) | 173,284 (8.61%) | 89,295 (8.39%) | 85,580 (9.35%)
Number and proportion (%) of read sequences aligned 0 times | 367,772 (19.75%) | 406,702 (20.22%) | 207,987 (19.54%) | 17,338 (18.94%)
Proportion (%) of aligned read sequences | 80.25% | 79.78% | 80.46% | 81.06%
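The mean and coefficient of variation quoted in Section 3.1 can be checked directly from the total read counts in Table 2. The short Python sketch below is an illustration rather than the authors' R script, and it assumes the CV was computed with the sample standard deviation (n − 1 in the denominator).

```python
from statistics import mean, stdev

# Total number of analyzed genomic read sequences per patient (Table 2).
total_reads = {
    "IBD1": 1_862_628,
    "IBD2": 2_011_960,
    "IBD3": 2_128_442,
    "IBD4": 1_830_550,
}

values = list(total_reads.values())
average = mean(values)
# Coefficient of variation in percent, using the sample standard deviation.
cv_percent = 100 * stdev(values) / average

print(f"mean = {average:,.0f} reads, CV = {cv_percent:.2f}%")
```

Under that assumption the sketch gives the same mean and a CV that rounds to the reported 7.06%.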
3.2. Analysis of the Typology of Genetic Mutations Observed in the IBD Pediatric Patients’ Population
The analysis revealed between 6465 and 8492 genetic variants covered by at least 20 genomic read sequences across the analyzed IBD patients. IBD patient 4 recorded the lowest number of genetic mutations (6465), while IBD patient 3 recorded the highest (8492). IBD patients 1 and 2 carried 7782 and 7924 genetic variants, respectively (Table 3). Next, we separated these genetic variants into heterozygous and homozygous mutations. We found 4037 heterozygous and 2444 homozygous mutations in IBD patient 1, 4212 and 2639 in IBD patient 2, 4650 and 2722 in IBD patient 3, and 3370 and 2215 in IBD patient 4 (Table 3). The genetic mutations retrieved in the IBD population are mainly single nucleotide polymorphisms (SNPs), which in some cases induce an amino acid change. We recorded 3669, 3829, 4054 and 3347 amino acid changes in IBD patients 1, 2, 3 and 4, respectively (Table 3). We performed a comparative genomic analysis between IBD patients 1, 2, 3 and 4 based on the heterozygous and homozygous genetic variants inducing amino acid changes. Tukey’s test revealed no significant difference in variability (p > 0.05) between IBD patients 1, 2, 3 and 4 (Figure 3). This result is consistent with the inflammatory bowel disease (IBD) phenotype shared by all of the patients.
Table 3. Analysis of inflammatory bowel disease pediatric population genetic variants inducing amino acid changes on polypeptide chain.
Patient | Genetic variants covered by ≥20 read sequences | Heterozygous | Homozygous | Mutation type | Amino acid change
IBD Patient 1 | 7782 | 4037 | 2444 | SNP | 3669
IBD Patient 2 | 7924 | 4212 | 2639 | SNP | 3829
IBD Patient 3 | 8492 | 4650 | 2722 | SNP | 4054
IBD Patient 4 | 6465 | 3370 | 2215 | SNP | 3347
Figure 3. Comparative genomic analysis based on homozygous and heterozygous mutations inducing amino acid changes in inflammatory bowel disease pediatric patients 1, 2, 3 and 4.
3.3. Distribution of Genomic Functions Induced by Exonic Single Nucleotide Polymorphism (SNPs) in Influencing IBD Population Variability
We assessed the variability of the IBD population through the genomic functions induced by exonic SNP genetic variants. The analysis revealed the following genomic functions: initiator codon induction, missense mutation, stop codon loss, initiator codon loss, splice mutation and silent mutation (Table 4). The most recurrent genomic functions induced by exonic SNPs in the analyzed IBD patient population are missense mutations and silent mutations, while the rare genomic functions are introduction of a stop codon and gain of an initiator codon (Table 4). The rarest genomic functions in the present analysis are i) stop codon loss in IBD patient 4 and ii) introduction of an initiator codon in IBD patient 3. Interestingly, the genomic analysis revealed a modest frequency of genetic mutations in several splice regions of the genomes. Fisher’s exact test showed that splice mutations were significantly more frequent in the heterozygous than in the homozygous state (p = 0.04). The same test showed no significant difference between heterozygosity and homozygosity for missense and silent mutations (p = 0.84). We checked the normality of the distribution of heterozygous and homozygous mutations inducing amino acid (aa) changes in the analyzed IBD population (Figure 4). The Shapiro-Wilk normality test indicated an asymmetric distribution (p < 0.01) of the genomic functions associated with heterozygous and homozygous exonic SNP mutations in the IBD population (Figure 4(A)). Density plot descriptive analysis of the distribution of these genomic functions showed a similar profile across the four analyzed IBD patients. The Kruskal-Wallis non-parametric test on the distribution of genomic functions (33 genomic functions) in this IBD population revealed no significant difference (p = 0.72) between IBD patients 1, 2, 3 and 4 (Figure 4(B)). Interestingly, the eta-squared estimate of the Kruskal-Wallis effect size indicated a small effect (0.01 - <0.06) of the distribution of these genomic functions on the variability of the IBD population.
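The link between the Kruskal-Wallis statistic and its eta-squared effect size can be made concrete with a short sketch. The Python code below is an illustration on toy data, not the authors' R script. It assumes the commonly used definition η²[H] = (H − k + 1)/(n − k), with k groups and n observations; the "small" band of 0.01 to <0.06 is the one quoted in the text, while the cut-offs for "moderate" and "large" (0.06 and 0.14) are conventional values assumed here.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of numeric samples."""
    pooled = sorted(x for group in groups for x in group)
    n = len(pooled)
    # Give each distinct value the mean of the ranks it occupies (mid-ranks).
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    rank_term = sum(
        sum(rank_of[x] for x in group) ** 2 / len(group) for group in groups
    )
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    tie_sizes = Counter(pooled).values()
    correction = 1 - sum(t**3 - t for t in tie_sizes) / (n**3 - n)
    # If every value is identical the statistic is undefined; return 0 here.
    return h / correction if correction > 0 else 0.0


def eta_squared_h(h, k, n):
    """Eta-squared based on H, assuming (H - k + 1) / (n - k)."""
    return (h - k + 1) / (n - k)


def effect_label(eta2):
    """Magnitude label; only the 'small' band comes from the text."""
    if eta2 < 0.01:
        return "negligible"
    if eta2 < 0.06:
        return "small"
    if eta2 < 0.14:
        return "moderate"
    return "large"


# Toy data (three fully separated groups), not values from the study.
toy_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat = kruskal_wallis_h(toy_groups)
eta2 = eta_squared_h(h_stat, k=len(toy_groups), n=sum(map(len, toy_groups)))
label = effect_label(eta2)
```

Because the toy groups do not overlap at all, the effect size comes out large; data with heavily overlapping groups would fall in the small band described in the text.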
Table 4. Assessment of heterozygous and homozygous exonic SNP mutations inducing and/or not inducing amino acid (aa) changes, for assessing the variability of inflammatory bowel disease (IBD) pediatric patients 1, 2, 3 and 4.
Heterozygous mutations (SNPs)
IBD patients | IC | MM | SpM | ICL | ICG | SCI | SCL | SM
IBD Patient 1 | 0 | 958 | 17 | 2 | 1 | 6 | 1 | 1327
IBD Patient 2 | 0 | 988 | 17 | 1 | 2 | 6 | 2 | 1372
IBD Patient 3 | 1 | 1021 | 18 | 0 | 2 | 7 | 3 | 1487
IBD Patient 4 | 0 | 883 | 13 | 1 | 0 | 7 | 1 | 1185
Homozygous mutations (SNPs)
IBD patients | IC | MM | SpM | ICL | ICG | SCI | SCL | SM
IBD Patient 1 | 0 | 546 | 4 | 1 | 3 | 1 | 0 | 802
IBD Patient 2 | 0 | 581 | 5 | 0 | 3 | 3 | 0 | 849
IBD Patient 3 | 0 | 618 | 5 | 0 | 3 | 3 | 0 | 886
IBD Patient 4 | 0 | 496 | 5 | 0 | 4 | 2 | 1 | 749
IC = Initiator codon; MM = Missense mutation; SpM = Splice mutation; ICL = Initiator codon loss; ICG = Initiator codon gain; SCI = Stop codon introduction; SCL = Stop codon loss; SM = Silent mutation.
Figure 4. (A) Density plot assessing the distribution of genomic functions induced by heterozygous and homozygous exonic SNP mutations in the IBD pediatric population (IBD pediatric patients 1, 2, 3 and 4). (B) Non-parametric test (Kruskal-Wallis test) assessing the variability of the IBD patient population by weighing the distribution of homozygous and heterozygous exonic SNP mutations in the IBD population genome.
3.4. Analysis of the Distribution of Genomic Functions Induced by Intronic SNP Genetic Variants Influencing IBD Population’s Variability
We analyzed the impact of intronic SNP genetic mutations on the genetic variability of the IBD patients, having first checked the normality of the exonic SNP distribution in the IBD population. Genomic functions in terms of intronic SNP mutations were observed in the following genome regions: i) 3’ and 5’ untranslated regions (3’ and 5’ UTR), ii) downstream gene variants, iii) inter-genic regions, iv) intronic variants, v) upstream gene variants and vi) splice regions (Table 5). The majority of the intronic SNPs retrieved in the IBD population are i) intronic variants and ii) mutations in splice and inter-genic regions (Table 5), which differed significantly between the heterozygous and homozygous states (p < 0.05). Intronic variants are more frequent in heterozygosity than in homozygosity (p = 0.02). The same analysis indicates a greater number of splice-region mutations in heterozygosity than in homozygosity (p = 0.0002). Inter-genic mutations are more frequent in homozygosity (p = 0.002). Mutations in the 3’ and 5’ untranslated regions (3’UTR and 5’UTR) have a significantly higher average frequency in heterozygosity than in homozygosity (p < 0.05) for each IBD patient (Table 5). Considered as a whole, the IBD population seems to exhibit variability in terms of intronic SNP variants detected in the heterozygous and homozygous states, in contrast to those detected in exonic regions. We checked the normality of the distribution of intronic SNPs (genomic functions) in the IBD patient population. The Shapiro-Wilk normality test revealed an asymmetric distribution of intronic SNPs (genomic functions) in the IBD population (p < 0.05) (Figure 5(A)). The Kruskal-Wallis non-parametric test indicated no difference between IBD patients in the distribution of intronic SNP genomic functions (p > 0.05) (Figure 5(B)).
Table 5. Genomic functions retrieved from non-coding exonic and intronic SNPs in IBD pediatric patient’s population (IBD1, IBD2, IBD3 and IBD4).
Heterozygous variants
Patient | 3’UTR | 5’UTR | DGV | IGR | IV | SIC | SAS | SDS | SR | UGV
P.1 | 57 | 74 | 3 | 88 | 1191 | 17 | 0 | 2 | 120 | 8
P.2 | 68 | 70 | 0 | 86 | 1262 | 26 | 2 | 1 | 120 | 8
P.3 | 80 | 82 | 3 | 86 | 1358 | 13 | 0 | 1 | 135 | 13
P.4 | 43 | 52 | 6 | 68 | 823 | 10 | 1 | 0 | 108 | 4
Homozygous variants
Patient | 3’UTR | 5’UTR | DGV | IGR | IV | SIC | SAS | SDS | SR | UGV
P.1 | 33 | 30 | 3 | 129 | 636 | 10 | 1 | 0 | 45 | 9
P.2 | 34 | 36 | 3 | 124 | 716 | 13 | 2 | 0 | 66 | 10
P.3 | 34 | 38 | 10 | 106 | 706 | 25 | 2 | 0 | 66 | 14
P.4 | 30 | 25 | 2 | 136 | 508 | 22 | 1 | 0 | 43 | 7
RNA message includes untranslated regions upstream (5’UTR) and downstream (3’UTR) of coding sequence; DGV = Downstream Gene Variant; IGR = Inter-genic Region; IV = Intronic Variant; SIC = Sequence variant that changes the non-coding exon sequence in a non-coding transcript; SAS = Splice Acceptor Site (3’ end of intron); SDS = Splice Donor Site (5’ end of intron); SR = Splice Region; UGV = Upstream Gene Variant.
Figure 5. (A) Density plot of the intronic SNP distribution in IBD patients; (B) Kruskal-Wallis non-parametric test assessing the influence of the intronic SNP genomic mutation distribution on IBD.
3.5. Clustering of Heterozygous and Homozygous SNP in Exonic and Intronic Genomic Regions for Evaluating IBD Pediatric Population Variability
We previously showed significant variability in the IBD pediatric population by comparing heterozygous and homozygous SNP variants in exonic genomic regions. The analysis also revealed variability in heterozygous non-coding region mutations (intronic SNPs) for splice, inter-genic, 3’UTR and 5’UTR genomic regions (Figure 5(A)). Here we characterized the IBD population by combining heterozygous and homozygous exonic and intronic SNP variants in order to estimate the variability of that population. The Shapiro-Wilk normality test indicated a non-normal distribution of intronic and exonic homozygous and heterozygous SNPs in the IBD patients’ genomes (Table 6). All analyzed IBD patients seem to exhibit the same statistical features regarding the distribution of heterozygous and homozygous intronic and exonic SNPs (Table 6, Figure 6(A)). Interestingly, after removing intronic variants as well as exonic missense and silent SNP mutations, statistical analysis revealed significant variability in the pediatric IBD patient population (Figure 6(B) and Figure 7). Wilcoxon pairwise comparisons indicated significant differences between i) IBD patients 1 and 2 (p = 0.04), ii) IBD patients 1 and 3 (p = 0.02), iii) IBD patients 2 and 3 (p = 0.03), iv) IBD patients 3 and 4 (p = 0.02) and v) IBD patients 1 and 4 (p = 0.04) (Figure 7 and Table S1). Intronic and non-coding exonic SNPs departed significantly from normality (p < 0.05) in IBD patients 1, 2, 3 and 4, in contrast to the other analyzed SNPs (Table S1).
Figure 6. (A) Multivariate analysis weighing IBD population variability with the Kruskal-Wallis test, processing homozygous and heterozygous intronic and exonic SNP genetic variants. (B) Multivariate statistical analysis assessing IBD population variability after removing missense, silent and intronic SNP variants. GFF = genomic function frequency induced by exonic and intronic SNPs.
Table 6. Shapiro test assessing exonic and intronic homozygote and heterozygote SNPs normality distribution in the IBD pediatric population.
Patient | Statistic | Exonic SNPs, heterozygous | Exonic SNPs, homozygous | Non-coding exonic and intronic SNPs, heterozygous | Non-coding exonic and intronic SNPs, homozygous
IBD patient 1 | p | 3.068e−05 | 0.00015 | 4.703e−06 | 1.64e−05
IBD patient 1 | w | 0.51 | 0.57 | 0.49 | 0.53
IBD patient 2 | p | 3.01e−05 | 0.0014 | 4.073e−06 | 1.379e−05
IBD patient 2 | w | 0.51 | 0.63 | 0.48 | 0.53
IBD patient 3 | p | 3.071e−05 | 0.0013 | 4.244e−06 | 9.427e−06
IBD patient 3 | w | 0.51 | 0.63 | 0.49 | 0.51
IBD patient 4 | p | 0.00019 | 0.00016 | 8.12e−06 | 4.492e−05
IBD patient 4 | w | 0.56 | 0.56 | 0.51 | 0.57
Figure 7. Hierarchical clustering analysis evaluating heterozygous and homozygous SNPs normality distribution in assessing IBD pediatric patient’s variability.
3.6. Introduction of Fitcon Parameter for Measuring SNPs Inducing Significant Genomic Function Change in Influencing IBD Population Variability
We introduced the Fitcon parameter in order to reveal the homozygous and heterozygous intronic and exonic SNPs that significantly influence the IBD patients’ genomic functions. Density plot analysis indicated an asymmetric distribution of the SNPs selected by the Fitcon parameter (Figure 8(A)). The same descriptive analysis showed a similar distribution of the heterozygous and homozygous SNP variants selected by the Fitcon parameter across the IBD population (Figure 8(A)). Hierarchical clustering analysis showed apparent variability in the IBD pediatric population when considering the homozygous and heterozygous intronic and exonic SNP mutations that significantly influence the patients’ genomic functions (Figure 8(B)). Interestingly, the eta-squared estimate of the Kruskal-Wallis effect size indicated a moderate effect of the distribution of these SNP genomic functions on IBD population variability. However, the Kruskal-Wallis test indicated that the Fitcon parameter reduces the variability of the IBD pediatric population, in contrast to the descriptive result of the hierarchical clustering analysis (Figure S3) and to the Kruskal-Wallis effect size (Figure 8(B)).
Figure 8. (A) Density plot descriptive statistical analysis assessing data normality for the heterozygous and homozygous SNPs selected by the Fitcon parameter. (B) Hierarchical clustering analysis evaluating IBD pediatric population variability.
3.7. Assessment of the Distribution of IBD Genetic Risk Factors and Pathogenic Variants in Weighing IBD Pediatric Population Variability
We analyzed the impact of pathogenic mutations and risk factor variants of inflammatory bowel disease on the genetic diversity of the pediatric IBD patient population. The analysis revealed 29 pathogenic and 64 risk factor IBD genetic variants (Table 7, Table S3 and Table S4). The Shapiro-Wilk normality test revealed a non-normal distribution of both the risk factor (bandwidth = 0.15) and pathogenic (bandwidth = 0.17) SNP genetic variants in the IBD population (Figure S4). Bartlett’s test revealed no significant difference in variance across the analyzed IBD patient population for the IBD genetic risk factor variants (p = 0.99) (Table S3). For the IBD pathogenic genetic variants, the same test gave a lower p-value than for the risk factor variants, although it remained above the significance threshold (p = 0.28) (Table S4). In addition, more than 95% of the IBD pathogenic and risk factor SNP mutations occurred in coding regions. After introducing the Fitcon parameter, more than 50% of the exonic mutations among the IBD pathogenic variants (69%) and risk factor variants (55%) had a high probability (Fitcon ≥ 0.6) of significantly impacting the phenotype of the IBD patient population (Table 7). Interestingly, the IBD pathogenic and risk factor SNP variants performed similarly when selected with the Fitcon parameter, supporting the Fitcon parameter as a normalizing factor in assessing IBD patient phenotype (Figure 9). The genetic variability of the IBD pediatric population, assessed through endogenous variable assignment of variance on the normalized IBD pathogenic and risk factor variants, indicated two clusters of IBD patients: group 1 including IBD patients 1 and 3, and group 2 including IBD patients 2 and 4 (Figure 9).
Table 7. Proportion estimation of IBD risk factor and pathogenic SNP genetic variants significantly affecting the IBD patients’ phenotype, selected by the Fitcon parameter.
Variant class | Exon mutations: total | Exon mutations: Fitcon ≥ 0.60 | Intron mutations: total | Intron mutations: 0.30 ≤ Fitcon ≤ 0.40
IBD genetic risk variants (SNP) | 64 | 35 | 2 | 2
IBD pathogenic genetic variants (SNP) | 29 | 20 | 2 | 2
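The Fitcon selection rule and the proportions derived from Table 7 can be written out as a short sketch. The Python code below is an illustration, not the authors' R script: the function names and the region labels "exon" and "intron" are assumptions for the example, while the thresholds and the counts are those given in Section 2.4 and Table 7.

```python
EXON_MIN = 0.6               # exonic variants: Fitcon >= 0.6
INTRON_RANGE = (0.3, 0.4)    # intronic variants: 0.3 <= Fitcon <= 0.4


def passes_fitcon(region, score):
    """Return True when a variant's Fitcon score meets its region's threshold."""
    if region == "exon":
        return score >= EXON_MIN
    if region == "intron":
        return INTRON_RANGE[0] <= score <= INTRON_RANGE[1]
    raise ValueError(f"unknown region: {region!r}")


def percent_selected(selected, total):
    """Share of selected variants, rounded to the nearest percent."""
    return round(100 * selected / total)


# Exonic counts from Table 7: (variants with Fitcon >= 0.60, total variants).
exonic_counts = {"risk factor": (35, 64), "pathogenic": (20, 29)}
shares = {
    name: percent_selected(selected, total)
    for name, (selected, total) in exonic_counts.items()
}
```

With the Table 7 counts this gives 55% for the exonic risk factor variants and 69% for the exonic pathogenic variants.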
Figure 9. Evaluation of genetic variability in IBD pediatric patient’s population through endogenous variable assignment of variance for selected IBD pathogenic SNPs genetic variants (A) and IBD risk factors (B) by introducing Fitcon parameter.
3.8. Distribution Analysis of Inflammatory Bowel Disease Risk Factors Genetic Variants (SNPs) in the Pediatric IBD Patient’s Population
Findings revealed 66 genetic risk factor SNP variants, including 14, 4 and 3 variants associated with metabolic, inflammatory bowel and autoimmune diseases, respectively, while 68% of the retrieved SNPs are associated with other diseases (Figure 10(A) and Table S3). The genetic variants rs429358 (c.466T > C/p.(Cys156Arg)) and rs1805097 (c.3170G > A/p.(Gly1057Asp)), from the APOE and IRS2 genes respectively and associated with metabolic disorders, were detected in IBD patient 1. IBD patient 2 exhibited three specific genetic variants associated with metabolic disorders, i.e. rs4880 (c.47T > C/p.(Val16Ala)), rs13266634 (c.973C > T/p.(Arg325Trp)) and rs231775 (c.49A > G/p.(Thr17Ala)), from the SOD2, SLC30A8 and CTLA4 genes, respectively. Three variants were detected specifically in IBD patient 4: rs2904552 (c.1292G > A/p.(Arg431His)) from the PRODH gene, involved in metabolic abnormalities; rs34911341 (c.152G > A/p.(Arg51Gln)) from the GHRL gene, a susceptibility factor for obesity; and rs1053874 (c.731G > A/p.(Arg244Gln)) from the DNASE1 gene, a susceptibility factor for systemic lupus erythematosus and body fat distribution. The same analysis indicated that the homozygous variant rs12150220 (c.464T > A/p.(Leu155His)) from the NLRP1 gene, involved in vitiligo associated with multiple autoimmune diseases, as well as rs373237 (c.935C > T/p.(Thr312Met)) and rs3732379 (c.841G > A/p.(Val281Ile)), both from the CX3CR1 gene, were found only in IBD patient 4 (Table S3). The analysis also showed IBD genetic risk factors shared across the IBD pediatric population and involved in metabolic, inflammatory bowel and autoimmune disorders.
Several variants are shared by all four analyzed IBD patients (Table S3): the homozygous variants rs450046 (c.1562G > A/p.(Arg521Gln)) and rs1799983 (c.894T > G/p.(Asp298Glu)), from the PRODH and NOS3 genes respectively, the rs237025 variant (c.163G > A/p.(Val55Met)) from the SUMO4 gene and the risk factor variant rs180223 (c.2200T > G/p.(Ser734Ala)), involved in metabolic syndrome; the homozygous variant rs853326 (c.3082A > G/p.(Met1028Val)) from the TG gene, associated with thyroid autoimmune diseases; and the rs2241880 variant (c.898A > G/p.(Thr300Ala)) from the ATG16L1 gene, a risk factor for Crohn’s disease and characteristic of inflammatory bowel diseases. IBD patients 1, 2 and 3 exhibited the homozygous variants rs861539 (c.722C > T/p.(Thr241Met)) and rs1044498 (c.517A > C/p.(Lys173Gln)), from the XRCC3 and ENPP1 genes respectively, both linked to metabolic disorders. IBD patients 2 and 3 share the rs1799945 (c.187C > G/p.(His63Asp)) variant of the HFE gene, a susceptibility factor for metabolic abnormalities, while the homozygous variant rs1131454 (c.484G > A/p.(Gly162Ser)) of the OAS1 gene, also a susceptibility factor for metabolic abnormalities, is shared by IBD patients 1 and 3 (Figure 10 and Table S3). The homozygous rs5219 (c.67A > G/p.(Lys23Glu)) variant of the KCNJ11 gene, involved in metabolic disorders, was found in IBD patients 1, 2 and 3 (Figure 10 and Table S3). IBD patients 2 and 4 share the rs2066844 (c.2104C > T/p.(Arg702Trp)) variant of the NOD2 gene, a Crohn’s disease risk factor and a well-known IBD biomarker (Figure 10, Table S3 and Figure S5). IBD patients 2, 3 and 4 share the homozygous rs7076156 (c.184A > G/p.(Thr62Ala)) variant of the ZNF365 gene, a risk factor for Crohn’s disease. IBD patients 1, 3 and 4 share the homozygous rs2227564 (c.422T > C/p.(Leu141Pro)) and rs1169288 (c.79A > C/p.(Ile27Leu)) variants of the PLAU and HNF1A genes, which are risk factors for Crohn’s disease and metabolic syndrome respectively.
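The sharing pattern described in this section can be tabulated with plain set operations. The following is a minimal Python sketch, not the procedure used in the study: the carrier sets are transcribed from the text above for a subset of the risk factor variants only (Table S3 holds the full list), and the function name `venn_regions` is an illustrative assumption.

```python
# Carriers of a few risk-factor variants, transcribed from the text of this
# section. This is an illustrative subset only, not the full Table S3.
carriers = {
    "rs2241880 (ATG16L1)": {1, 2, 3, 4},
    "rs450046 (PRODH)": {1, 2, 3, 4},
    "rs861539 (XRCC3)": {1, 2, 3},
    "rs5219 (KCNJ11)": {1, 2, 3},
    "rs7076156 (ZNF365)": {2, 3, 4},
    "rs2227564 (PLAU)": {1, 3, 4},
    "rs2066844 (NOD2)": {2, 4},
    "rs1799945 (HFE)": {2, 3},
    "rs1131454 (OAS1)": {1, 3},
    "rs429358 (APOE)": {1},
    "rs4880 (SOD2)": {2},
    "rs34911341 (GHRL)": {4},
}


def venn_regions(carriers):
    """Group variants by the exact set of patients that carry them.

    Each key of the returned dict corresponds to one region of a
    four-set Venn diagram (hypothetical helper, for illustration).
    """
    regions = {}
    for variant, patients in carriers.items():
        regions.setdefault(frozenset(patients), []).append(variant)
    return regions


regions = venn_regions(carriers)

# Print the regions from the most widely shared to the patient-specific ones.
for patients in sorted(regions, key=lambda p: (-len(p), sorted(p))):
    label = ", ".join(f"IBD {p}" for p in sorted(patients))
    print(f"{label}: {sorted(regions[patients])}")
```

Grouping by the exact carrier set, rather than by pairwise intersections, gives one entry per Venn region, which is the quantity a diagram such as Figure 10 displays.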
3.9. Distribution Analysis of Inflammatory Bowel Disease Pathogenic Genetic Variants (SNPs) in the Pediatric IBD Patient Population
The analysis revealed 31 IBD pathogenic genetic variants. Among them, nine (9) are involved in metabolic disorders, while one (1) is associated with inflammatory bowel disease (Figure 10(B)). The same analysis revealed 3 pathogenic variants associated with autoimmunity and 18 pathogenic variants linked to other diseases (Figure 10(B)). The heterozygous rs429358 (c.466T > C/p.(Cys156Arg)) pathogenic variant of the APOE gene, involved in metabolic abnormalities, was found in IBD patient 1. The rs2904552 (c.1292G > A/p.(Arg431His)) pathogenic variant of the PRODH gene, involved in metabolic abnormalities, and the rs3732378 (c.935C > T/p.(Thr312Met)) and rs3732379 (c.841G > A/p.(Val281Ile)) pathogenic variants of the CX3CR1 gene, a susceptibility factor for acquired immunodeficiency syndrome, were found only in IBD patient 4 (Table S4). IBD patients 1, 2, 3 and 4 share six (6) pathogenic genetic variants (Figure 10(B)); four of them, i.e. the homozygous rs450046 (c.1562G > A/p.(Arg521Gln)), rs820878 (c.185T > C/p.(Leu62Ser)), rs1169305 (c.1741A > G/p.(Ser581Gly)) and rs1799983 (c.894T > G/p.(Asp298Glu)) variants, are IBD pathogenic variants (SNPs) involved in metabolic pathologies (Table S4 and Figure 4(B)). Of note, IBD patients 3 and 4 share the pathogenic variant rs17580 (c.863A > T/p.(Glu288Val)) of the SERPINA1 gene, associated with the control of low-density lipoprotein cholesterol levels. IBD patients 2 and 3 share the rs1799945 (c.187C > G/p.(His63Asp)) pathogenic variant of the HFE gene, involved in metabolic disorders. The heterozygous rs10065172 pathogenic variant (c.313C > T/p.(Leu105Leu)) of the IRGM gene, associated with inflammatory bowel disease, was found in IBD patients 1, 2 and 4 (Figure S5).
IBD patients 1, 3 and 4 share the rs10010131 and rs351855 variants of the WFS1 and FGFR4 genes, susceptibility factors for metabolic syndrome and body fat distribution respectively. The pathogenic rs1805010 variant of the IL4R gene, involved in the acquired immunodeficiency process, was found in IBD patients 1, 2 and 4 (Table S4 and Figure 10(B)). Of note, statistical analysis evaluating variability in the IBD population by processing IBD pathogenic and risk factor genetic variants together highlighted a high similarity (p > 0.05) between the Crohn’s disease (CD) and ulcerative colitis (UC) phenotypes (Figure 10(C)).
Figure 10. Venn diagrams assessing the distribution of risk factor (A) and pathogenic (B) genetic variants (SNPs) in inflammatory bowel disease patients 1, 2, 3 and 4. Kruskal-Wallis multivariate analysis assessing variability among IBD patients based on the distribution of pathogenic and risk factor genetic variants in the pediatric inflammatory bowel disease patient population (C). IBD 1, 2, 3 and 4 refer to inflammatory bowel disease patients 1, 2, 3 and 4.
3.10. Multivariate Statistical Survey Evaluating IBD Patients’ Variability by IBD Pathogenic and Risk Factor SNP Genetic Variants
Here we set out to characterize variability among the pediatric IBD patients by processing IBD pathogenic and risk factor SNP genetic variants separately. Because both variant sets displayed an asymmetric distribution (Shapiro test, p < 0.05), we performed a non-parametric Kruskal-Wallis test to assess variability in the IBD patient population (Figure 11(A) and Figure 11(B)). The test suggested no significant variability between the four analyzed IBD pediatric patients (Figure 11(C) and Figure 11(D)). However, a comparison of the results for IBD pathogenic SNP genetic variants (p = 0.4) and IBD SNP risk factors (p = 0.76) suggested that pathogenic variants have a moderate ability to categorize the IBD patients’ phenotype (Figure 11(C) and Figure 11(D)). Interestingly, Euclidean distance clustering analysis suggested that IBD pathogenic SNP variants, unlike IBD risk factor variants, have a relatively high ability to categorize IBD patients into ulcerative colitis and Crohn’s disease phenotypes (Figure 12(A) and Figure 12(B)). In other words, the analyzed IBD patients appear to share the same phenotype when IBD risk factors are considered, whereas IBD pathogenic SNP variants clustered together i) IBD patients 2 and 4, recognized as exhibiting the ulcerative colitis phenotype, and ii) IBD patients 1 and 3, with the Crohn’s disease phenotype (Figure 12(B)). This result confirms the Fitcon parameter clustering analysis, which suggested two groups of IBD patients: i) IBD patients 1 and 3 and ii) IBD patients 2 and 4 (Figure 9).
Figure 11. Distribution of inflammatory bowel disease pathogenic and risk factor SNP genetic variants (A and B) and evaluation of IBD patient population variability by the non-parametric Kruskal-Wallis test (C and D).
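For readers who want to see what the Kruskal-Wallis comparison involves, the sketch below implements the rank-based H statistic with tie correction and its chi-square p-value using only the Python standard library. It is a minimal illustration under stated assumptions, not the authors' pipeline: the text does not say which software was used, the function names `chi2_sf` and `kruskal_wallis` are hypothetical, and the genotype codes are invented toy values rather than data from the study.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    # Closed forms for df = 1 and df = 2, then the standard recurrence
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    if df % 2:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, k = math.exp(-x / 2), 2
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def kruskal_wallis(groups):
    """Return (H, p) for the Kruskal-Wallis test with tie correction."""
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    counts = Counter(pooled)
    # Assign each distinct value the average of the ranks it occupies.
    rank_of, start = {}, 1
    for value in sorted(counts):
        rank_of[value] = start + (counts[value] - 1) / 2
        start += counts[value]
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    tie_term = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    if tie_term == 0:  # every observation identical: no variability at all
        return 0.0, 1.0
    h /= tie_term
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical genotype codes (0 = absent, 1 = heterozygous, 2 = homozygous)
# for the same eight variants in each patient. NOT data from the study.
patients = {
    "IBD 1": [2, 2, 1, 0, 1, 2, 0, 1],
    "IBD 2": [2, 2, 0, 1, 1, 2, 1, 0],
    "IBD 3": [2, 2, 1, 1, 0, 2, 0, 1],
    "IBD 4": [2, 2, 0, 1, 1, 1, 1, 2],
}
h_stat, p_value = kruskal_wallis(list(patients.values()))
print(f"H = {h_stat:.3f}, p = {p_value:.3f}")
```

A p-value above 0.05 from such a test is read, as in the section above, as no significant difference between the patients' variant distributions. With heavily tied data such as genotype codes, the tie correction matters, which is why it is included here.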
Figure 12. Dendrogram clustering analysis of inflammatory bowel disease patients by the Euclidean distance method, categorizing IBD patients into CD and UC phenotypes based on IBD SNP risk factors (A) and IBD pathogenic SNP genetic variants (B). The acronyms RF and PF refer to IBD risk factors and pathogenic factors respectively.
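The Euclidean distance clustering behind a dendrogram of this kind can be sketched as follows. This is an illustration only: the linkage rule (average linkage), the function names and the per-variant scores are all assumptions, since the text specifies only that Euclidean distance was used. The toy scores were deliberately chosen so that two groups emerge, so the grouping printed here is a property of the toy data and not a reproduction of Figure 12.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length score vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(profiles):
    """Agglomerative clustering; returns the merges in the order they occur.

    Each merge is a tuple (members_a, members_b, distance), where the
    distance is the mean pairwise Euclidean distance between the clusters.
    """
    clusters = {name: [name] for name in profiles}  # label -> member names
    merges = []

    def dist(a, b):
        pairs = [(x, y) for x in clusters[a] for y in clusters[b]]
        total = sum(euclidean(profiles[x], profiles[y]) for x, y in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        # Merge the two clusters that are currently closest to each other.
        a, b = min(combinations(clusters, 2), key=lambda ab: dist(*ab))
        merges.append((clusters[a], clusters[b], dist(a, b)))
        clusters[f"({a}+{b})"] = clusters.pop(a) + clusters.pop(b)
    return merges


# Hypothetical per-variant scores for four patients. NOT data from the study.
profiles = {
    "IBD 1": [0.9, 0.8, 0.1, 0.2],
    "IBD 2": [0.2, 0.1, 0.9, 0.7],
    "IBD 3": [0.8, 0.9, 0.2, 0.1],
    "IBD 4": [0.1, 0.2, 0.8, 0.9],
}

merges = average_linkage(profiles)
for left, right, distance in merges:
    print(f"merge {left} + {right} at distance {distance:.3f}")
```

The merge heights are what a dendrogram plots on its vertical axis: patients joined at a small distance have similar variant profiles, while the height of the final merge reflects how far apart the two groups are.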
4. Discussion
Inflammatory bowel diseases are multifactorial chronic diseases of the gastrointestinal tract that include ulcerative colitis and Crohn’s disease. Numerous studies have shown that genetic susceptibility, the gut microbiota and immune system dysfunction are involved in the onset of intestinal diseases [3]. Many studies point to genetic, immunological, environmental and microbiological factors, and to the interactions between them, in the occurrence of IBD. Indeed, the first genetic factor linked to the occurrence of IBD, specifically CD, was a mutation in the nucleotide-binding oligomerization domain-containing protein 2 (NOD2) gene. The NOD2 gene encodes a protein that functions as a receptor recognizing components of the bacterial cell wall. The main NOD2 variants associated with CD are R702W and G908R [23]. Alterations in genes responsible for autophagy that can predispose to IBD, e.g. autophagy-related 16-like 1 (ATG16L1), leucine-rich repeat kinase 2 (LRRK2) and immunity-related GTPase M (IRGM), are also reported in the literature [24]-[26]. Of note, IL-10 receptor mutations (IL10RA and IL10RB) are associated with colitis [27] [28]. Studies have revealed some 240 gene loci associated with predisposition to and occurrence of inflammatory bowel disease, confirming the genetic susceptibility of IBD. It is worth noting that, because of the abundance and complexity of high-throughput NGS data, genomic analysis and the interpretation of its results often require computational statistical analysis. Indeed, several authors have integrated genomic and computational statistical approaches to address complex questions in molecular genetics and molecular biology [29]-[31].
Thus, to assess the role of genetic predisposition in the onset of the inflammatory bowel disease phenotype, we screened a pediatric IBD patient population through a computational statistical survey, analyzing IBD risk factor and pathogenic SNP genetic variants obtained from clinical exome sequencing.
Bioinformatics analysis indicated that the genomic sequences were of high quality and aligned to the hg19 human genome with a high precision rate, ensuring reliable downstream structural and functional genomic analysis [32]. Statistical analysis showed high homogeneity between the analyzed IBD pediatric patients when considering intronic and exonic, homozygous and heterozygous, pathogenic and risk factor genetic variants covered by at least 20 sequence reads, since a sequencing depth of 20× is sufficient to guarantee efficient genomic data analysis and interpretation [33]. Monitoring the homogeneity of variance across biological samples is essential for data normalization, as a prelude to both statistical and comparative genomic analysis [30] [31] [34]-[36]. We selected and quantified the genomic functions related to IBD pathogenic and risk factor SNP variants in the IBD patient population; exonic SNP variants, unlike intronic variants, showed no significant difference in variance between IBD patient phenotypes. Indeed, computational statistical analysis suggested variability in the IBD population when homozygous and heterozygous intronic variants were clustered according to several genomic functions linked to IBD SNP variants (i.e. mutations in splice and intergenic regions, in 3’ and 5’ non-transcribed regions, and in 3’UTR and 5’UTR regions). Koufariotis et al. (2018) [37] showed that genetic variants in some functional classes, such as splice site regions, DNA methylated regions and long noncoding RNAs, explain more of the variance in complex animal genetic populations. Similarly, Kryukov et al. (2007) [38] suggested that most rare missense alleles are deleterious in humans and contribute to complex disease pathways. Of note, several studies have found that SNPs in splice site regions are significantly associated with genetic variability [39].
Li et al. (2016) provided evidence that splicing quantitative trait loci (QTL) make major contributions to complex traits in humans; these contributions are reported to be as significant as those of variants that affect gene expression [40]. Several studies have attributed gene regulation and/or gene expression profiles to polymorphisms within introns [41] [42]. Studies of genetic variation can successfully discriminate and identify functional elements in non-coding regions [43]. Taken as a whole, intronic mutations (SNPs) are able to explain variability when characterizing and clustering the phenotypes of a genetic population. However, once the Fitcon parameter was introduced, our findings revealed that IBD pathogenic variants, in contrast to risk factor variants, had a relative ability to capture variability in the analyzed IBD population and to cluster patients into CD and UC phenotypes. We then checked which genomic functions were significantly impacted by IBD risk factor and pathogenic variants by introducing the Fitcon statistical probability parameter, which measures their impact on the IBD patients’ genome and phenotype. Interestingly, 57.81% of the analyzed IBD risk factor SNP variants were flagged by the Fitcon parameter as significantly affecting the IBD patients’ phenotype: 3.12% are intronic and 54.69% are exonic. The same analysis suggested that 75.86% of the IBD pathogenic variants selected by the Fitcon statistical parameter contribute well to clustering IBD patients into CD and UC phenotypes. Of note, approximately 70% of the IBD pathogenic genetic variants retrieved by the Fitcon parameter are exonic, while 6.90% are intronic.
Indeed, evaluating genetic variability in the pediatric IBD patient population through endogenous variable assignment of variance for selected IBD pathogenic and risk factor variants, after introducing the Fitcon parameter, grouped IBD patients 1 and 3, who exhibit the Crohn’s disease phenotype, in one cluster and IBD patients 2 and 4, with ulcerative colitis and ulcerative recto-colitis phenotypes respectively, in another. Interestingly, clustering analysis based on endogenous variance assignment suggested a relatively stronger correlation between IBD patients 1 and 3, diagnosed with Crohn’s disease, than between IBD patients 2 and 4, diagnosed with ulcerative colitis and recto-colitis. It is worth noting that the difference in variance between these two clusters is not statistically significant, most likely because the symptoms of CD and UC are very similar. The non-significant genetic variability observed in the IBD population could therefore be explained by highly influential pathogenic variants such as the exonic synonymous variant rs10065172 (p.Leu105Leu) of the Crohn’s disease pathogenic IRGM gene [44], found in pediatric IBD patients 1, 2 and 4. Of note, the similar symptoms of CD and UC in the studied IBD population could be explained by the rs2227564 variant (c.422T > C/p.(Leu141Pro)) of the PLAU gene, a risk factor for Crohn’s disease detected in IBD patients 1, 3 and 4, who exhibit CD and UC phenotypes. The homozygous rs7076156 variant (c.184A > G/p.(Thr62Ala)) of the ZNF365 gene, a Crohn’s disease susceptibility factor, was reported in IBD pediatric patients 2 and 4 with the CD phenotype as well as in IBD pediatric patient 3 with the UC phenotype. In addition, the similarity between the four analyzed IBD pediatric patients could be supported by the heterozygous rs2241880 SNP variant (p.Thr300Ala) of the ATG16L1 gene, which was retrieved in IBD patients 1, 2, 3 and 4.
Indeed, susceptibility loci for the IBD and CD phenotypes have been identified in the ATG16L1 gene, which is involved in the cellular autophagy process [6] [15]. However, the clustering analysis focusing on significant pathogenic IBD variants suggested two phenotypes, i.e. CD and UC, in the analyzed IBD population. This result is consistent with the subtle differences between the two IBD pathologies: Crohn’s disease can cause inflammation anywhere in the gastrointestinal tract, from the mouth to the anus, whereas ulcerative colitis causes inflammation and ulceration in the large intestine. Alongside intronic variants, our study has clearly indicated the impact and role of pathogenic genetic variants on the genetic variability of the IBD population, since 6.90% of the significantly detected IBD pathogenic variants in the four analyzed IBD patients are intronic. We recorded the recurrence of the rs2066844 variant (p.Arg702Trp) of the NOD2 gene in IBD patients 2 and 4, who exhibit exclusively the CD phenotype. Indeed, the NOD2 gene is predominantly expressed by immune cells, i.e. macrophages, lymphocytes and dendritic cells, and by the intestinal epithelial cells known as Paneth cells; it codes for an intracellular receptor involved in the recognition of muramyl dipeptide motifs found in the bacterial wall [45] [46]. Numerous studies have shown that the Arg702Trp variant of the NOD2 gene, which affects the innate immune response, is one of the three best-known variants associated with inflammatory bowel diseases and specific to the Crohn’s disease phenotype [7] [47]. As more non-coding sequence data become available, genomic methods can be used to identify additional functional elements in the human genome and provide possible explanations for phenotypic associations. Interestingly, Parkes (2007) [48] showed that genetic variants in the IRGM gene play a key role in the autophagy mechanism and are strongly correlated with the Crohn’s disease phenotype [12].
Furthermore, IBD patient 2 carried the IBD risk factor rs231775 of the CTLA4 gene, implicated in metabolic disorder syndrome as well as in susceptibility to systemic lupus erythematosus. The CTLA4 gene is expressed on the surface of T helper cells and is essential for the function of the CD25+ CD4+ regulatory cells involved in controlling intestinal inflammation [49]. It has also been shown that the CTLA4 gene variant rs231775 (c.49A > G) can control the phenotype of Crohn’s disease [50]. Interestingly, our study revealed several genetic variants associated with metabolic disorders when characterizing pediatric IBD patients [6]. Sztembis et al. (2018) [51] argued that patients with Crohn’s disease have a different metabolic profile from those with ulcerative colitis. The same study showed that the occurrence of metabolic syndrome in patients with hemorrhagic colitis was higher in patients with the Crohn’s disease phenotype [51] [52]. Of note, genetic variants associated with inflammatory bowel diseases in the NOD2, ATG16L1 and IRGM genes affect cellular autophagy processes, which indicates that alterations in the intracellular fate of bacteria are a central element in the pathogenicity of CD [53]. These observations suggest an interaction between the occurrence of inflammatory bowel disease in general, and Crohn’s disease in particular, and susceptibility to autoimmune and metabolic disorders in the pediatric IBD population under investigation [6]. Taken as a whole, our study clearly discriminated two distinct phenotypes, i.e. CD and UC, in the four (4) analyzed IBD pediatric patients, supporting the integration of genomic and computational statistical approaches as an acceptable practice for improving the molecular diagnosis of rare genetic diseases, and of inflammatory bowel disease in particular. A limitation of our study is the small size of the studied population.
However, increasing the number of IBD patients could substantially improve statistical significance when distinguishing the two IBD phenotypes. Despite this limitation, we have proposed for the first time an integrative genomics and statistics approach for the phenotypic analysis of inflammatory bowel diseases through clinical exome sequencing analysis.
5. Conclusion
Inflammatory bowel diseases are multifactorial disorders influenced by genetic susceptibility, altered intestinal flora and immune dysfunction, which makes the diagnosis of IBD phenotypes challenging. Our study provides an integrative analysis combining genomics, bioinformatics and computational statistics to improve the molecular diagnosis of IBD. By statistically characterizing IBD risk factor and pathogenic genetic variants obtained from clinical exome analysis of four (4) IBD pediatric patients, it allowed the two IBD phenotypes, i.e. Crohn’s disease and ulcerative colitis, to be clearly distinguished.
Authors’ Contribution
RB set up the experimentation and the study. NDD proposed the protocol of the work as well as the organization of the article, figures and tables. NDD and KNBS wrote the article. MG, KNBS and DDN performed genomic data analysis and interpretation. NDD performed computational statistical analyses. NDD gave orientation for bioinformatics analysis. DD, KNBS and NDD performed the bioinformatics analyses. DO contributed to revising and adjusting the bibliographic references. All authors revised the paper and approved the final version of the article.
Acknowledgements
We thank all the health institutes in Italy that participated in this project, as well as the Institute of Molecular Medicine Angelo Nocivelli, University of Brescia and Children’s Hospital, ASST Spedali Civili, Brescia, Italy.
NOTES
*Genomic and Computational Statistical Surveys for the Screening of SNP Genetic Variants in Inflammatory Bowel Disease (IBD) Pediatric Patients.
#Corresponding author.