The Expression and Prognostic Significance of Major MicroRNA Genes in Breast Cancer Based on Bioinformatics Analysis ()
1. Introduction
At present, BC of female patients has surpassed lung cancer as the leading cause of global cancer incidence. Globally, with an estimated 2.3 million new cases being reported, which corresponds to 11.7% of all cancer cases, it is the fifth leading cause of cancer-related mortality, contributing to 685,000 deaths [1]. There are several histologic subtypes of invasive breast carcinoma, including, but not limited to the following: invasive carcinoma of no special type (commonly known as invasive ductal carcinoma); invasive lobular carcinoma; tubular carcinoma; carcinoma with medullary features (sometimes called medullary carcinoma); and metaplastic carcinoma. Among these subtypes, Invasive (or Infiltrating) Ductal Carcinoma (IDC) accounts for the majority (approximately 75%) of BCs, followed by Invasive Lobular Carcinoma (ILC) which accounts for 5% - 15% of BCs [2]. BC is a heterogeneous and complex disease with unique morphological and molecular features, unlike a disease where only a few genes, proteins, and/or signaling pathways contribute to its progression in a simple, independent, and autonomous manner. Previous studies showed that patients with the same type of BC can show differential treatment responses, which further contributes to the high heterogeneity in this disease [3]. Therefore, histological features have certain limitations for diagnosis, prognosis, and prediction of clinical outcomes for individual patients [4]. Many prognosis factors are available for BC; therefore, it is necessary to develop suitable prognostic biomarkers for the diagnosis and treatment of BC.
MicroRNAs (miRNAs) are a group of small non-coding RNAs that can interrupt the expression of protein-coding genes by binding to their mRNAs and inhibiting the subsequent protein translation [5]. miRNAs can regulate the expression of genes associated with various biological phenomena including homeostasis, development, proliferation, differentiation, and apoptosis [6]. Deregulated signaling and expression of miRNAs have been well-studied in the pathogenesis of various cancers. Aberrant expression of miRNAs is vital for the initiation and progression of human malignancies as they function as both tumor suppressors and oncogenes [7]. This study aimed at identifying the potential prognostic biomarkers to predict overall survival in BC. We also evaluated the clinical potential of miRNAs as biomarkers for early as well as differential diagnosis and prognosis of BC.
2. Methods
Data Source and Data Processing
We obtained the first dataset for our analyses from The Genomic Data Commons Data Portal (GDC, https://portal.gdc.cancer.gov/), a robust data-driven platform. MiRNA sequencing data for invasive carcinoma samples of the breast (103 normal vs 1057 tumor), mRNA sequencing data for invasive carcinoma samples of the breast (112 normal vs 1066 tumor), and the corresponding clinical information were obtained TCGA. Then we obtained the information on individual mature miRNA expression profiles through Perl language.
Survival Data and Identification of Differentially Expressed miRNAs and mRNAs
EdgeR package (R software v. 4.0.3) was used to compare the DEMs and DEGs in tumor group with normal group. The expression profiles of miRNAs and mRNAs were normalized, according to mean value > 1, p-value (FDR) < 0.05, and |log2FC| > 1. The clinical information of patients with survival time ≥ 30 days was obtained, and then we correspondingly combined with differentially expression profiles information of miRNA and mRNA.
Evaluation of the Prognostic Model
The Caret package was used for random division of the samples into two groups (training group and test group) with complete survival information and differentially miRNA expression profile information. We also performed univariate Cox regression analyses of miRNAs in the training group.
To reduce the number of miRNAs with a similar expression, we subjected these to a stepwise multivariate Cox regression and constructed the miRNA-based prognostic model with a p-value < 0.005. Then, the risk scores for the prognostic miRNA signature comprising multiple miRNAs were established based on the summation of the product of each miRNA and its coefficient. Then we tested the Proportional Hazards Assumption of the Cox model. Further, this model was used to evaluate the survival prognosis for each patient in the training and test groups. Furthermore, we classified the patients into the high-risk group and the low-risk group on median value grouping of risk score performed by Kaplan-Meier curve and log-rank tests. The predictive power of the miRNA signature was assessed using the “survival ROC” package by calculating the AUC of 3 year-ROC curves [8].
Independent Prognostic Evaluation of miRNA Signature for Clinical Variables
After the classification based on the age, gender, size, and/or extent of main cancer, clinical-stage, lymph nodes, distant metastasis, we used univariate Cox regression analysis for evaluating the prognostic value of miRNA signature and overall patient survival in the training group. Variables with p-value < 0.05 in univariate Cox regression were further used for multivariate Cox regression analysis which determined whether these variables could function as independent prognostic factors. Forest plots were used for comparison of the predictive power of this risk model with other clinical characteristics.
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) Enrichment Analyses
We utilized three different web-based tools, miRTarBase (https://mirtarbase.cuhk.edu.cn/), targetScan (http://www.targetscan.org) and miRDB (http://www.mirdb.org/), to identify the target genes of the miRNAs and collate the target genes of miRNA signature which were covered in at least 2 databases. Venn diagram and Cytoscape 3.7.1 were used to map the relationship between the miRNA and these target genes. We performed functional gene enrichment analyses to identify the major biological attributes of these genes. Annotation by Gene Ontology (GO) and enrichment analyses by Kyoto Encyclopedia of Genes and Genomes (KEGG) were performed through “clusterProfiler” package [9] and the “org.Hs.eg.db” package in R software, with p adjust < 0.05 and q value < 0.05, set as the cut-off criteria.
Protein-Protein Interaction (PPI) Network and Module Analyses
STRING database (http://string-db.org/) was used for information on PPIs. The medium confidence interval of 0.4 was used as the threshold. Cytoscape_v3.7.1 software was used to visualize the resulting PPI network, and we identified the top 10 hub genes which formed the hub. The Kaplan-Meier method with log-rank test < 0.05 was used to verify the intersection gene correlation with overall survival.
Statistical Analysis
All statistical analyses were performed by R language 4.0.3 version and attached packages.
3. Results
Identification of DEMs and DEGs
TCGA is an open-access database containing miRNA/mRNA profiles and their corresponding patient clinical information. Based on the screening criteria, 304 miRNAs were differentially expressed (DEmiRNAs) in 1057 tumor and 103 normal samples; 213 DEmiRNAs were upregulated and 91 were downregulated. 7364 mRNAs were differentially expressed between 1066 tumor and 112 normal samples; 4894 DEmRNAs were up-regulated while 2470 were downregulated (Figure 1).
Prognostic Model Construction
The miRNA mature expression profiles of the total group (N = 1033) were randomly divided into the training group (N = 517) and test group (N = 516). Based
on univariate Cox regression results for the training group, a total of fifteen miRNAs were significantly associated with overall patient survival (p-value < 0.05) (Table 1).
In our study, nine (hsa-miR-204-5p, hsa-miR-7706, hsa-miR-1247-3p, hsa-miR-20b-3p, hsa-miR-605-5p, hsa-miR-615-3p, hsa-miR-4652-5p, hsa-miR-9-5p, hsa-miR-2115-5p) miRNAs were selected (Table 2) by stepwise multivariate Cox regression analysis. A predictive miRNA signature model was established by the summation of the product of each miRNA and its multivariate Cox regression coefficient as follows: miRNA signature risk score = ((−0.249) × expression of hsa-miR-204-5p) + (0.228 × expression of hsa-miR-7706) + ((−0.243) × expression of hsa-miR-1247-3p) + ((−0.306) × expression of hsa-miR-20b-3p) + (0.485 × expression of hsa-miR-605-5p) + (0.136 × expression of hsa-miR-615-3p) + (0.194 × expression of hsa-miR-4652-5p) + (0.151 × expression of hsa-miR-9-5p) + (0.339 × expression of hsa-miR-2115-5p). Kaplan-Meier method showed that these nine miRNAs were significantly associated with overall patient survival (p value < 0.05; Figure 2).
Evaluation of Overall Survival
According to median value grouping based on the risk score in the training, test, and entire groups, Kaplan-Meier curves showed that indeed the high-risk group had poorer overall survival as compared to the low-risk group (p = 1.474E−08, p = 8.708E−03, and p = 7.825E−09, respectively; Figures 3(a)-(c)). The 5-year
Table 1. Univariate and multivariate Cox regression of differentially expressed miRNAs.
Table 2. 53 positive expression of genes correlate with survival prognosis.
overall survival for training-group patients was 69.8% and 92.7% for the high and low-risk groups, respectively. The 5-year overall survival for test group patients was 76.1% and 86.9% in the high and low-risk groups, respectively. The entire group showed a 5-year overall survival of 73.8% and 90.5% in the high and low-risk groups, respectively.
The AUC of ROC based on the nine-miRNA signature for three years were 0.804, 0.667, 0.739 in the training, test, and entire groups, respectively (Figures 3(d)-(f)), which showed that the model had better performance for prediction of BC patient survival risk. In addition, in all the three groups, the high-risk score patients had higher mortality rates as compared to their low-risk score counterparts (Figures 3(g)–(i)).
Independence of Clinical Factors
The nine-miRNA signature was evidently associated with overall patient survival in univariate Cox regression analysis (hazard ratio HR = 1.051, confidence interval 95% CI = 1.025 - 1.078, p = 1.148E−04). Multivariate Cox regression analysis showed that the nine-miRNA signature was also independent of the overall survival for conventional clinical factors (HR = 1.039, 95% CI = 1.012 - 1.066, p = 3.794E−03), such as T stage, lymph node status, and distant metastasis. This makes it a potential prognostic marker for BC. In addition, clinical stage and age were also found to be independent prognostic factors (HR = 1.738, 1.032, 95% CI = 1.036 - 2.916, 1.016 - 1.048, p = 0.036, 8.56E−05) (Figure 4).
Prediction of Target Genes
We predicted the target genes regulated by the nine miRNAs from at least 2 databases. The overlapping target genes were identified to further enhance the reliability of the bioinformatic analyses. Using the above three databases, the results indicated that 1314, 431, 1297, 609, 151, 381, 627, 331 and 82 genes overlapped for hsa-miR-9-5p, hsa-miR-20b-3p, hsa-miR-204-5p, hsa-miR-605-5p, hsa-miR-615-3p, hsa-miR-1247-3p, hsa-miR-2115-5p, hsa-miR-4652-5p, hsa-miR-7706, respectively, as shown using the Venn diagram (Figure 5) and the network map of miRNA target genes. A total of 15404 target genes were predicted for the nine miRNAs. To assess whether the target genes of these miRNAs participated in the progression of BC, the 7364 DEmRNAs (upregulated 4894, downregulated 2470) described above, were used for further analyses. The intersection of target mRNAs for upregulated miRNAs (hsa-miR-9-5p, hsa-miR-20b-3p, hsa-miR-2115-5p, hsa-miR-4652-5p, hsa-miR-7706, hsa-miR-615-3p) and downregulated mRNAs and target mRNAs for down-regulated miRNAs (hsa-miR-204-5p, hsa-miR-605-5p, hsa-miR-1247-3p) and upregulated mRNAs were taken. The results were obtained for a total of 555 genes including 261 upregulated genes, 294 downregulated genes. The sub-network of the nine miRNAs and their 555 target genes are shown in Figure 6.
GO Annotation and KEGG Enrichment Analyses
We performed GO and KEGG analyses for a detailed understanding of the selected 522 target genes associated with BC. The top thirty terms from the GO annotations for Biological Process (BP), Cellular Component (CC), and Molecular
(a)(b)
Figure 4. Comparison of risk score and clinical features in predicting the accuracy of patients’ survival prognosis. (a) Forest plot of univariate Cox regression analysis; (b) Forest plot of multivariate Cox regression analysis.
Function (MF) are shown in dotplot (Figures 7(a)-(c)). BP included 447 annotations such as axonogenesis, embryonic organ development, and sensory system development. CC contained 50 annotations including synaptic membrane, collagen-containing, and extracellular matrix transcription regulator complex. There were 25 MF annotations, including metal ion transmembrane transporter activity, ion channel activity, and actin-binding. A total of 433 KEGG pathways were enriched, of which 5 signaling pathways were significantly enriched including calcium signaling pathway, cytokine-cytokine receptor interaction, MAPK signaling pathway, proteoglycans in cancer, and cAMP signaling pathway. In addition, a readable graphic representation of the complex relationship between target genes and relative KEGG pathway using the “pathway-gene network” is shown in Figures 7(d)-(e).
PPI Network and Module Selection
Using the STRING database and Cytoscape software, a total of 546 target genes were incorporated into the PPI network consisting of 178 nodes and 327
Figure 6. Sub network map of miRNA regulating mRNA. The hexagon represents miRNA; The circle stands for mRNA; Red means upregulated; Green means downregulated.
edges. 10 hub genes (GNG7, CDK1, EGF, BIRC5, STAT1, ESR1, FZD4, DSN1, APOB, GNAI1) were identified as hubs (Figure 8). In addition, the Kaplan-Meier method showed 53 (Table 2) of the 546 gene expressions were positively correlated with survival prognosis; a high expression of ANO6, B4GALT3, C2CD2, CENPI, CHD5, CLEC3A, DLG3, FAM155B, FFAR2, KNSTRN, LRRC59, MRPS23, POP1, POU3F2, PTK6, RACGAP1, REEP1, SOX11, ZNF367 was associated with poorer overall survival in BC patients (Figure 9).
4. Discussion
Among women, the three most frequent malignancies are breast, lung, and colorectal cancers. These account for 50% of all new diagnoses; breast cancer alone accounts for 30% of female tumors [10]. Recently, with the advancement in early diagnoses and comprehensive treatment strategies, the mortality rate of BC has gradually decreased. However, distant metastasis is a major cause of death among these patients [11] [12]. Therefore, the identification of highly specific and sensitive prognostic markers for BC is important. Recently, miRNAs have been widely demonstrated to be associated with a variety of diseases including breast cancer [13]. A vast and growing body of evidence firmly supports the involvement of miRNAs in tumors including breast cancer [14]. However, most of these have studied signal genes. To capture the genes associated with DEMs in breast cancer, we screened the DEGs and DEMs and identified the key prognostic DEMs, all of which may be potential therapeutic targets. In the present study, from the TCGA database, the miRNA and mRNA expression profile data on invasive carcinoma of the breast were analyzed. DEMs and DEGs were analyzed using the edgeR package, and their targets were predicted. All the patients were randomly
Figure 8. Hub genes of PPI network. The darker the color, the bigger the degrees.
divided into training and test groups. We found a nine miRNA-based signature model (hsa-miR-204-5p, hsa-miR-7706, hsa-miR-1247-3p, hsa-miR-20b-3p, hsa-miR-605-5p, hsa-miR-615-3p, hsa-miR-4652-5p, hsa-miR-9-5p, hsa-miR-2115-5p), using univariate Cox regression and stepwise multivariate Cox regression in training group. We also validated the utility of the nine miRNA signature model in the test group and the entire group. The median value grouping based on the risk score, and the results from K-M analysis indicated that the high-risk group indeed had poorer overall survival as compared to the low-risk groups across all three groups. We used the ROC curve to evaluate the predictive power of the nine-miRNA signature for overall survival in the three groups. Our results indicated that our model had a reliable performance. The results of univariate Cox regression and multivariate Cox regression revealed that the nine-miRNA signature was independent of overall survival for the conventional clinical factors in breast cancer patients. Several studies on the nine miRNAs show their involvement in the initiation, progression, and prognosis of various cancers. miR-204-5p is involved in the regulation of multiple human malignant tumor progression and is also a tumor suppressor for some human cancers such as malignant melanoma, hepatocellular, and esophageal squamous cell cancer [15] [16] [17]. Some studies show that expression levels of miRNA-7706 significantly decrease in patients with lymph node metastasis. Expression levels of miRNA-299 and miRNA-7706 in tumor tissues were significantly lower than those in adjacent tissues. These data may provide new parameters for clinical staging and prediction of lymph node metastasis and these two miRNAs could effectively inhibit the proliferation and invasion of cancer cells [18]. Previous studies show that miR-1247 is aberrantly expressed in several cancers [9] [19] [20] [21]. miR-125b and miR-20b display context-dependent properties in regulating the tumorigenicity of various cancers. They are key mediators of Wnt signaling activity. Together, these three form a regulatory circuit [22] [23] [24] [25]. miR-605-5p promotes invasion and proliferation by targeting TNFAIP3 in NSCLC [26]. It also inhibits melanoma progression and glutamine catabolism by targeting GLS [27]. In a recent study, Lei et al. found that ectopic miR-615-3p expression in breast cancer cells promotes EMT-like traits. The tumor-promoting role of miR-615-3p is directly linked to its ability to target 3’-UTR of PICK1 [28]. MiR-4652, miR-9-5p, miR-2115 are expressed in several cancers. However, the current research mechanism of these three miRNA in tumors has not been reported yet. Thus, experiments in the future need to focus on the potential roles of these miRNA in BC.
To further understand the regulatory mechanism of the nine miRNA-based signatures in breast cancer, the target genes of nine miRNAs were predicted using three prediction databases. For breast cancer, we obtained the intersection of the target genes of these miRNAs and the differentially expressed genes from the TCGA database, and performed functional enrichment analyses for the overlapping genes. The GO annotations of the target genes were primarily associated with axonogenesis, embryonic organ development, sensory system development, synaptic membrane, collagen-containing, extracellular matrix transcription regulator complex, metal ion transmembrane transporter activity, ion channel activity, and actin-binding. KEGG analysis enriched signaling pathways included calcium signaling pathway, cytokine-cytokine receptor interaction, MAPK signaling pathway, proteoglycans in cancer and cAMP signaling pathway. Calcium signaling pathways are involved in proliferation, angiogenesis, invasion, immune evasion, disruption of cell death pathways, ECM remodelling, Epithelial-Mesenchymal Transition (EMT) and drug resistance [29]. Proteoglycans (PGs) and Glycosaminoglycans (GAGs) have important roles in cancer development, including regulation of metabolic patterns, angiogenesis and distant metastases, and treatment resistance [30]. Mitogen-Activated Protein Kinase (MAPK) pathways are among the most conserved signal transduction pathways with critical functions in different aspects of tumorigenesis such as tumor growth, apoptosis, angiogenesis, invasion, metastasis, and drug resistance. MAPK signaling plays critical functions in the various aspects of human malignancies [31]. These signaling pathways exert their effects on tumors to varying degrees. These five signaling pathways are only the tip of an iceberg of the target genes involved in signaling pathways. Thus, the construction of the miRNA prognosis model may have implycations in understanding the regulation of tumor signaling pathways.
To find the key regulatory nodes in the miRNA signature model breaast cancer, 10 hub genes (GNG7, CDK1, EGF, BIRC5, STAT1, ESR1, FZD4, DSN1, APOB, GNAI1) were identified with Cytoscape 3.7.1 and its plug-in (degree ranking of cytoHubba). Xu et al. show that GNG7 expression negatively correlates with patient grade and overall survival in clear cell renal cell carcinoma (ccRCC). Reduced expression of GNG7 was related to mTOR1, E2F, G2M, and MYC signaling pathways. In addition, ccRCC cells with lower GNG7 expression show an increased proportion of cells in of G2/M cell-cycle phase [32]. Currently, CDK1, EGF receptor expression are enhanced in several cancer types including gastric cancer, esophageal adenocarcinoma, oral squamous cell carcinoma, ovarian cancer, liver cancer, colorectal cancer, and breast cancer. CDK1-dysregulation leads to robust tumor growth, chromosomal instability, and a higher proliferation rate in cancer cells. Overexpression of EGF receptors can augment cell growth by increased active ligand: receptor complex formation [33] [34] [35] [36]. Survivin (also called BIRC5) is a part of several cancer-cell signaling networks. The current cancer therapeutic strategies use survivin as a target [37]. Previous studies show that Stat3 and ESR1 have their respective mechanisms in breast cancer [38] [39]. FZD4 (frizzled 4) is a transmembrane protein belonging to the frizzled receptor family. It is involved in the signaling processes in a variety of malignant tumors and participates in cell proliferation, trans-differentiation, and carcinogenesis [40]. DSN1 is involved in cell cycle transition, cell growth. Depletion of the DSN1 gene induced G2/M arrest and significantly decreased migration and invasion of colorectal cancer and hepatocellular carcinoma [41] [42] [43]. Heterotrimeric G proteins are crucial in multiple cellular signal transduction pathways [44]. The guanine nucleotide-binding protein G(i), α-1 subunit (GNAI1), belongs to the Gαi family [45]. These proteins primarily function as inhibitors of adenylyl cyclase. Gαi signaling can regulate cell proliferation and differentiation. It is taxol resistance in human ovarian carcinoma cells and hydrogen peroxide-induced apoptosis in human lung cancer cells [46] [47] [48] [49] [50]. The current mechanistic basis of APOB function in tumors has not yet been reported. Thus, the miRNA signature may affect the survival prognosis and progression in breast cancer patients by regulation through CDK1, EGF, BIRC5, STAT3, ESR1, FZD4. In addition, the Kaplan-Meier method showed that a positively associated expression of 53 genes with survival prognosis; however the high expression of ANO6, B4GALT3, C2CD2, CENPI, CHD5, CLEC3A, DLG3, FAM155B, FFAR2, KNSTRN, LRRC59, MRPS23, POP1, POU3F2, PTK6, RACGAP1, REEP1, SOX11, ZNF367 were associated with poorer overall survival outcomes.
5. Conclusion
In summary, we constructed a new predictive model based on miRNA signature through miRNA mature expression profiling for the prognosis of BC. By grouping, we also verified and evaluated the predictive ability of the model. The most important finding was its utility as an independent prognostic factor in BC. Additionally, the target genes of the model provide a new direction for investigations on the pathogenesis and diagnosis, and development of targeted inhibitors for the breast cancer. However, this is a bioinformatic study and extensive clinical validations are required to confirm our findings of a reliable predictor and therapeutic target signature for BC.
Acknowledgements
This project was sponsored by the Chengde Science and Technology Planning Project (grant nos. 202006A030).
Data Availability
Data underlying this study are provided in Supplementary Materials (Supplemental 1-4) and are available on TCGA (https://gdc.cancer.gov/), miRTarBase (https://mirtarbase.cuhk.edu.cn), targetScan (http://www.targetscan.org) and miRDB (http://www.mirdb.org/). The raw data supporting the conclusions of this manuscript will be made available by the authors, without undue reservation, to any qualified researcher.
NOTES
*Qiongge Wang and Jiuli Hu have contributed equally to this work.
#Chanchan Hu and Rui Wang have equally cooperated and should be considered the joint corresponding authors.