Cross-Dataset Transcriptomic Analysis Reveals Distinct Immune Regulatory Networks in Non-Tuberculous Mycobacterial Disease
1. Introduction
Non-tuberculous mycobacteria (NTM) represent a diverse group of environmental pathogens causing increasingly prevalent infections worldwide [1]. Unlike tuberculosis, NTM diseases exhibit remarkable clinical heterogeneity and variable treatment responses, suggesting complex host-pathogen interactions that remain poorly understood [2].
Recent epidemiological studies demonstrate rising NTM infection rates globally, with annual increases of 4.0% for infection and 4.1% for disease burden [1] [3]. This trend coincides with aging populations, increased immunosuppression, and enhanced diagnostic capabilities [4]. The clinical spectrum ranges from asymptomatic colonization to severe disseminated disease, particularly affecting individuals with underlying respiratory conditions or immune dysfunction [5]-[7].
The complexities of NTM pathogenesis stem from intricate immune evasion mechanisms and host susceptibility factors [8]. While innate immunity provides the first line of defense through pathogen recognition and phagocytosis, adaptive immune responses mediated by T-helper cells and cytotoxic lymphocytes are crucial for pathogen clearance [9]. However, NTM species demonstrate sophisticated strategies to subvert host immune responses, leading to chronic infections and tissue damage [10].
Diagnostic challenges remain substantial, with traditional culture-based methods requiring weeks to months for definitive identification and susceptibility testing [11]. The complexity of NTM taxonomy, encompassing over 190 recognized species with distinct pathogenic potential, further complicates clinical management [12]. Molecular diagnostic approaches offer promise for rapid identification but require continued validation and standardization [13] [14].
Previous transcriptomic studies have provided valuable insights into host immune responses to NTM infections. Cowman et al. conducted pioneering whole-blood gene expression analysis in pulmonary NTM disease, revealing downregulation of 213 transcripts enriched for T-cell signaling pathways, including interferon-gamma (IFNG) [15]. These findings suggested that NTM disease associates with compromised adaptive immune responses, potentially reflecting underlying host susceptibility or pathogen-induced immunosuppression.
However, single-dataset studies provide limited statistical power and may reflect population-specific effects rather than universal disease mechanisms. Cross-dataset analysis approaches, successfully applied in tuberculosis research [16], offer enhanced statistical robustness and the potential to identify core pathogenic signatures across diverse patient populations.
In this study, we addressed these knowledge gaps through comprehensive cross-dataset transcriptomic analysis of peripheral blood samples from NTM patients and controls (Figure 1). Our objectives were to: (1) identify core molecular signatures common across independent NTM patient cohorts; (2) construct and analyze protein-protein interaction networks to identify hub genes critical for disease pathogenesis; (3) characterize functional pathways dysregulated in NTM disease; and (4) evaluate potential biomarkers and therapeutic targets for precision medicine approaches.
2. Methods
2.1. Dataset Selection and Characteristics
We identified and analyzed two independent publicly available RNA-seq datasets from the Gene Expression Omnibus (GEO) database focusing on peripheral blood samples from NTM patients:
GSE97298: 32 NTM patients and 9 controls.
GSE290289: 18 NTM patients and 6 controls.
Dataset selection criteria included: (1) peripheral blood samples from NTM patients; (2) appropriate control groups; (3) high-quality RNA-seq or microarray data; and (4) sufficient sample sizes for statistical analysis. Both datasets utilized RNA-seq technology on whole blood samples and provided comprehensive transcriptomic data suitable for comparative analysis. The complete analytical workflow is shown in Figure 1.
2.2. Data Analysis Pipeline
We employed a streamlined bioinformatics pipeline for cross-dataset analysis:
Preprocessing: Raw sequencing data underwent quality control, batch effect correction, and normalization.
Differential expression: Analysis was performed using a fold change threshold of |log2FC| > 1.3 and P-value < 0.05. This threshold was selected to balance detection sensitivity with biological significance, enabling identification of moderate but consistent expression changes across independent datasets.
Cross-dataset comparison: Common differentially expressed genes were identified through intersection analysis.
Network construction: Protein-protein interaction networks were built using STRING database (confidence score > 0.4) and analyzed for centrality measures and functional enrichment.
All statistical analyses incorporated appropriate multiple testing corrections (Benjamini-Hochberg FDR < 0.05).
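The differential expression criteria above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names, the toy gene table, and the choice to apply the 0.05 cutoff to Benjamini-Hochberg adjusted values (the text mentions both a raw P-value and an FDR cutoff) are assumptions made here.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def call_degs(
    table: Dict[str, Tuple[float, float]],
    lfc_cutoff: float = 1.3,
    alpha: float = 0.05,
) -> Dict[str, str]:
    """Map gene -> 'up' or 'down' for genes passing both cutoffs.

    table maps gene -> (log2 fold change, raw p-value).
    Assumption: alpha is applied to the BH-adjusted p-value.
    """
    genes = list(table)
    fdr = benjamini_hochberg([table[g][1] for g in genes])
    calls = {}
    for gene, q in zip(genes, fdr):
        lfc = table[gene][0]
        if q < alpha and abs(lfc) > lfc_cutoff:
            calls[gene] = "up" if lfc > 0 else "down"
    return calls


# Hypothetical toy values, for illustration only (not study data).
toy_table = {
    "GENE_A": (2.0, 0.001),
    "GENE_B": (-1.5, 0.01),
    "GENE_C": (0.5, 0.0005),   # significant but below the fold-change cutoff
    "GENE_D": (1.6, 0.2),      # large change but not significant
}
calls = call_degs(toy_table)
print(calls)
```

Keeping the direction ("up" or "down") with each call makes the later cross-dataset comparison straightforward, since genes can then be matched on both identity and direction.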
Figure 1. Research workflow for NTM disease transcriptomic analysis. Overview of the complete analytical pipeline from data acquisition through network analysis and validation. The study analyzed two independent RNA-seq datasets from whole blood samples (GSE97298: 32 NTM vs 9 controls, GSE290289: 18 NTM vs 6 controls) using stringent differential expression criteria (|log2FC| > 1.3, P < 0.05), followed by intersection analysis, PPI network construction, and functional enrichment analysis.
3. Results
3.1. Dataset Characteristics and Quality Assessment
The analyzed datasets comprised a total of 65 whole blood samples across two independent cohorts, providing robust statistical power for cross-dataset analysis. Quality control metrics indicated high-quality data suitable for downstream analysis, with clear separation between NTM and control groups in principal component analysis.
Differential expression analysis revealed distinct transcriptomic signatures between NTM patients and controls (Figure 2). GSE97298 demonstrated broader transcriptomic changes, while GSE290289 showed a more focused response pattern. Functional pathway enrichment analysis revealed predominant dysregulation in immune system processes, cytokine signaling, T-cell activation, and pathogen recognition pathways (Figure 3).
Figure 2. Differential gene expression analysis for GSE97298 dataset. Volcano plot showing log2 fold change versus −log10 (P-value) for all genes. Red dots represent significantly upregulated genes (log2FC > 1.3, P < 0.05), blue dots represent significantly downregulated genes (log2FC < −1.3, P < 0.05), and gray dots represent non-significant genes.
Figure 3. Functional pathway enrichment analysis for GSE97298 dataset. GO biological process and KEGG pathway enrichment analysis showing the top significantly enriched pathways (FDR < 0.05). The analysis reveals predominant enrichment in immune system processes, cytokine signaling, T-cell activation, and pathogen recognition pathways.
3.2. Cross-Dataset Gene Intersection Analysis
Despite methodological differences between studies, we identified 10 genes commonly dysregulated across both datasets, representing core molecular changes in NTM disease detectable in peripheral blood (Figure 4). The identification of common dysregulated genes across independent datasets provides robust evidence for core pathogenic mechanisms in NTM disease, demonstrating the value of cross-dataset validation approaches for biomarker identification.
Figure 4. Cross-dataset gene intersection analysis. Venn diagram and summary statistics showing the overlap of differentially expressed genes between GSE97298 and GSE290289 datasets. The analysis identified 10 commonly dysregulated genes representing core molecular changes in NTM disease.
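The intersection step can be illustrated with a short Python sketch that also records whether a shared gene moves in the same direction in both datasets. The function name and the two placeholder genes are hypothetical; the directions shown for CD36, CD3E, and GZMK follow those reported in the text, and the remaining shared genes are not listed because they are not named here.

```python
from typing import Dict


def intersect_degs(calls_a: Dict[str, str], calls_b: Dict[str, str]) -> Dict[str, str]:
    """Return gene -> 'up', 'down', or 'opposite' for genes called in both datasets."""
    shared = {}
    for gene in sorted(calls_a.keys() & calls_b.keys()):
        if calls_a[gene] == calls_b[gene]:
            shared[gene] = calls_a[gene]      # concordant direction
        else:
            shared[gene] = "opposite"         # discordant direction
    return shared


# Directions for the three hub genes follow the text; the placeholder
# entries are hypothetical and stand in for dataset-specific genes.
calls_gse97298 = {"CD36": "up", "CD3E": "down", "GZMK": "down", "PLACEHOLDER_1": "up"}
calls_gse290289 = {"CD36": "down", "CD3E": "down", "GZMK": "down", "PLACEHOLDER_2": "down"}

shared = intersect_degs(calls_gse97298, calls_gse290289)
print(shared)
```

Labeling discordant genes explicitly, rather than dropping them, keeps cases such as CD36 visible for follow-up.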
3.3. Protein-Protein Interaction Network Analysis
Protein-protein interaction network analysis revealed coordinated immune dysregulation patterns (Figure 5). The network comprised 10 nodes and 12 edges, giving a density of 0.267 (12 of the 45 possible node pairs are directly connected). Key topological features included:
Average path length: 2.2 steps.
Network diameter: 4 steps.
Clustering coefficient: 0.35.
Connected components: 1 (every node is reachable from every other node).
The network topology characteristics suggest a highly coordinated immune response dysregulation in NTM disease, with efficient communication pathways between key regulatory nodes (Figure 6).
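The topology measures reported above (density, path length, diameter, degree centrality) can be computed from an edge list with a breadth-first search. The sketch below uses only the Python standard library; the four-node toy graph is hypothetical and is not the study network, whose edges are not listed in the text. Only the final density check uses the reported node and edge counts.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def density(n_nodes, n_edges):
    """Undirected graph density: 2E / (N(N - 1))."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def bfs_distances(adj, source):
    """Shortest-path lengths (in steps) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def topology(edges):
    """Summarize basic topology; nodes without edges are not represented."""
    adj = build_adjacency(edges)
    n = len(adj)
    n_edges = len({frozenset(e) for e in edges})
    all_dist = {u: bfs_distances(adj, u) for u in adj}
    lengths = [all_dist[u][v] for u, v in combinations(adj, 2) if v in all_dist[u]]
    return {
        "nodes": n,
        "edges": n_edges,
        "density": density(n, n_edges),
        "degree_centrality": {u: len(adj[u]) / (n - 1) for u in adj},
        "connected": all(len(d) == n for d in all_dist.values()),
        "avg_path_length": sum(lengths) / len(lengths),
        "diameter": max(lengths),
    }


# Hypothetical toy graph (not the study network).
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")]
summary = topology(toy_edges)
print(summary)

# Density for the reported counts of 10 nodes and 12 edges.
print(round(density(10, 12), 3))
```

A graph library would provide the same measures directly; the explicit version is shown only to make the definitions concrete.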
Figure 5. Protein-protein interaction network of 10 core intersection genes in non-tuberculous mycobacterial disease. Network visualization showing the 10 commonly dysregulated genes with their interaction patterns. Node size reflects degree centrality, with colors indicating regulation direction (red: upregulated, blue: downregulated, purple: opposite regulation). The network demonstrates high connectivity with CD36, CD3E, and GZMK as major hubs.
Figure 6. Comprehensive network analysis. Multi-panel analysis showing (A) Main PPI network topology, (B) Centrality analysis heatmap revealing key network hubs, (C) Node degree distribution, and (D) Network topology statistics summary. The analysis identifies CD36, CD3E, and GZMK as central regulatory nodes.
3.4. Hub Gene Identification and Functional Analysis
Three genes emerged as major network hubs based on centrality analysis:
3.4.1. CD36: Central Regulatory Hub
CD36 demonstrated the highest degree centrality (0.556) and betweenness centrality (0.722), positioning it as the most critical node in the network. Interestingly, CD36 showed opposite regulation patterns between the two datasets:
GSE97298: Upregulated (log2FC = +1.2, P < 0.001).
GSE290289: Downregulated (log2FC = −1.8, P < 0.001).
CD36, a scavenger receptor involved in lipid metabolism and pathogen recognition, plays crucial roles in mycobacterial infections through its involvement in fatty acid oxidation and inflammatory responses. The differential regulation may reflect different patient populations, disease stages, or therapeutic interventions between studies.
3.4.2. CD3E: T-Cell Signaling Center
CD3E exhibited high connectivity (degree centrality: 0.444) and was consistently downregulated across both datasets, indicating compromised T-cell receptor signaling in NTM disease. This finding aligns with previous observations of impaired T-cell responses in NTM patients.
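The degree centralities reported for CD36 (0.556) and CD3E (0.444) are consistent with the usual normalization by N − 1 in a 10-node network. Assuming that normalization, which the text does not state explicitly, the values correspond to 5 and 4 direct interaction partners; these degrees are inferred here rather than reported:

```latex
\[
\begin{aligned}
C_D(v) &= \frac{\deg(v)}{N - 1}, \qquad N = 10, \\
C_D(\mathrm{CD36}) &= \frac{5}{9} \approx 0.556, \\
C_D(\mathrm{CD3E}) &= \frac{4}{9} \approx 0.444 .
\end{aligned}
\]
```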
3.4.3. GZMK: Cytotoxic Function Node
GZMK showed significant connectivity and consistent downregulation across both cohorts, suggesting impaired cytotoxic T-cell and NK cell function. This finding is consistent with the known role of cytotoxic cells in mycobacterial clearance.
3.5. CD36 Comprehensive Analysis
The comprehensive analysis reveals several key aspects of CD36’s role in NTM disease (Figure 7).
3.5.1. Cross-Dataset Expression Patterns
The opposite regulation of CD36 between datasets suggests molecular heterogeneity in NTM disease, potentially reflecting:
Species-specific responses: Different NTM species may elicit distinct metabolic responses.
Disease progression stages: Early versus advanced disease may involve different metabolic demands.
Treatment effects: Therapeutic interventions may modulate CD36 expression.
3.5.2. Clinical and Therapeutic Implications
CD36’s central network position and functional diversity make it an attractive candidate for precision medicine applications in NTM disease:
Biomarker potential: CD36 expression patterns could potentially serve as biomarkers for patient stratification, though extensive validation in clinically well-characterized cohorts would be required.
Therapeutic implications: The central role of CD36 in our network analysis suggests it may represent a candidate for targeted therapeutic approaches, warranting further mechanistic and therapeutic studies.
Figure 7. CD36 comprehensive multi-dimensional analysis. Seven-panel analysis showing (A) Cross-dataset expression comparison, (B) Expression patterns across patient cohorts, (C) Functional pathway enrichment, (D) Expression dynamics, (E) Clinical relevance radar chart, (F) Drug target assessment, and (G) Regulatory mechanism model. The analysis reveals CD36’s central role in systemic immune responses with potential therapeutic implications.
4. Discussion
4.1. Novel Insights into NTM Disease Pathogenesis
To our knowledge, this study provides the first cross-dataset transcriptomic analysis of NTM disease using peripheral blood samples, revealing previously unrecognized immune regulatory networks. Our findings demonstrate that, despite differences in patient cohorts and methodologies, consistent patterns of immune dysfunction can be identified, supporting the existence of core pathogenic mechanisms in NTM disease.
The identification of a 10-gene regulatory network with high connectivity (density = 0.267) suggests coordinated dysregulation of immune responses in NTM disease. This network-based approach provides new insights into the systemic nature of immune dysfunction in NTM infections.
4.2. CD36 as a Central Regulatory Hub
The identification of CD36 as the central network hub represents a significant finding with multiple implications for understanding NTM pathogenesis. CD36’s role as a scavenger receptor involved in lipid metabolism, pathogen recognition, and inflammatory responses positions it at the intersection of multiple pathways critical for host-pathogen interactions.
The differential CD36 regulation between datasets may reflect fundamental aspects of NTM disease heterogeneity:
Species heterogeneity: Different NTM species exhibit distinct pathogenic mechanisms and treatment responses [17] [18].
Disease progression: Early-stage infections may require different metabolic adaptations compared to chronic, established diseases [19].
Treatment effects: Antimicrobial therapy may modulate host metabolic responses [20].
4.3. Adaptive Immune Dysfunction
The consistent downregulation of CD3E and GZMK across both datasets provides strong evidence for adaptive immune dysfunction in NTM disease. This finding aligns with previous observations by Cowman et al., who reported widespread downregulation of T-cell signaling pathways in pulmonary NTM patients [15].
The implications of adaptive immune suppression extend beyond immediate pathogen clearance difficulties:
Increased susceptibility: Compromised T-cell responses may predispose to secondary infections and disease progression [21].
Treatment challenges: Reduced cytotoxic function may limit the effectiveness of antimicrobial therapy [22].
Prognostic implications: The degree of adaptive immune suppression may predict clinical outcomes [23].
4.4. Clinical and Translational Implications
4.4.1. Biomarker Development
The expression patterns identified in this study offer multiple opportunities for biomarker development:
Diagnostic biomarkers: The 10-gene signature could complement current microbiological methods for rapid diagnosis.
Prognostic indicators: CD36 expression patterns may predict disease severity and treatment response.
Monitoring tools: Longitudinal gene expression monitoring could assess treatment efficacy.
4.4.2. Therapeutic Target Identification
Our findings identify multiple potential therapeutic targets:
CD36 modulators: For personalized immune modulation based on patient-specific expression profiles.
T-cell activators: To restore adaptive immune function.
Cytotoxic enhancers: To improve pathogen clearance capacity.
4.5. Limitations and Future Directions
Several limitations should be acknowledged:
Sample size: Despite cross-dataset validation, larger cohorts would provide greater statistical power.
Clinical data availability: Limited clinical information beyond basic demographics was available.
Functional validation: The identified networks require experimental validation in appropriate model systems.
Future research directions should include:
Prospective validation: Testing the identified biomarkers in prospective clinical cohorts.
Functional studies: Investigating the mechanistic roles of hub genes in NTM pathogenesis.
Therapeutic development: Translating network insights into targeted therapeutic interventions.
4.6. Technological and Methodological Advances
This study benefits from recent advances in computational biology and systems medicine approaches. The integration of machine learning algorithms [24] and multiple-criteria decision-making frameworks [25] represents a growing trend in precision medicine research. Additionally, emerging therapeutic approaches including bacteriophage therapy [26] and host-directed therapy targeting ferroptosis pathways [27] may complement the transcriptomic insights identified in this study.
5. Conclusions
This cross-dataset transcriptomic analysis reveals novel insights into NTM disease pathogenesis, identifying a core regulatory network of 10 genes dysregulated in peripheral blood across both datasets. CD36 emerges as a central regulatory hub with dataset-specific regulation patterns, highlighting the molecular heterogeneity of NTM disease.
The identified network demonstrates coordinated immune dysfunction characterized by adaptive immune suppression coupled with variable innate immune responses. These findings provide new mechanistic insights into the systemic immune dysfunction underlying NTM pathogenesis and identify potential biomarkers and therapeutic targets.
Our results support recognizing NTM disease as a molecularly heterogeneous condition with distinct signatures identifiable through peripheral blood transcriptomic profiling. This understanding provides a foundation for future research into precision medicine approaches, though extensive validation in clinically well-characterized cohorts will be required.
Data Availability
All data used in this study are publicly available through the Gene Expression Omnibus (GEO) database under accession numbers GSE97298 and GSE290289. Analysis scripts and processed data are available upon request from the corresponding author.
Author Contributions
Conceptualization: C.L., H.L.; Data curation: C.L., H.L., S.Y.; Formal analysis: C.L., H.L., N.Z.; Funding acquisition: C.L., S.M.; Investigation: C.L., H.L., N.Z., W.M.; Methodology: C.L., H.L., S.Y., Z.Y.; Project administration: C.L., S.M.; Resources: C.L., S.M.; Software: C.L., H.L., S.Y.; Supervision: C.L., S.M.; Validation: N.Z., W.M., S.Y., Z.Y.; Visualization: C.L., H.L., S.Y.; Writing-original draft: C.L., H.L.; Writing-review & editing: All authors.
Funding
No external funding was received for this research.
Acknowledgements
We thank the researchers who made their data publicly available through the GEO database, enabling this cross-dataset analysis. We also acknowledge the computational resources provided by our institutions and the valuable contributions of all study participants.
NOTES
*Co-first authors.
#Corresponding author.