1. Introduction
In Africa, an estimated 163 million malaria cases and 444,600 malaria deaths occurred in 2021 [1]. Malaria case-fatality rates are particularly high and may reach 80% among children aged under five years [2]. The malaria life cycle has two stages: a sexual phase in the invertebrate host (the Anopheles mosquito) and an asexual phase in the vertebrate host (humans). Plasmodium (P.) falciparum is a common parasite in Africa and is responsible for most fatal cases worldwide [3]. After inoculation into the human host through a mosquito bite, P. falciparum parasites migrate to the liver and multiply (exo-erythrocytic phase). The resulting merozoites are then released into the bloodstream, where they invade red blood cells. Once inside the erythrocytes, the parasites induce numerous modifications by exporting several proteins via complex trafficking mechanisms. These proteins are usually expressed at different stages of parasite development (trophozoite and schizont) and play a crucial role in the virulence and survival of the parasite [4]. Interactions between P. falciparum parasites and erythrocytes lead to consequential changes in the morphology and physiology of the host cell, particularly in its function [5].
In recent years, significant progress has been made in the fight against malaria, largely due to massive awareness and mobilization campaigns for insecticide-treated bed nets (ITNs) and environmental sanitation [6]. Nevertheless, malaria remains a considerable burden, in part due to the resistance of parasites to antimalarial drugs and the resistance of mosquitoes (Anopheles) to insecticides [7]. To overcome the natural immune defenses of the host, parasites use one of the most remarkable survival strategies, molecular mimicry, which is an adaptive system of imitation [8]. Short linear motifs (SLiMs) are short sequence segments that mediate protein interactions and can be identified computationally [9]. Identifying them is essential for understanding the interaction networks involved in host-pathogen interactions and for determining the mimicry process that Plasmodium undertakes during erythrocyte adhesion for its survival [10].
The identification and characterization of the key host and parasite protein interactions responsible for clinical forms during the intra-erythrocytic blood phase may facilitate the identification of new, targeted therapies. A literature mining process identified B3AT (current HUGO gene symbol SLC4A1), a predominant glycoprotein of the human red blood cell membrane known to be the main invasion route of human erythrocytes leading to malaria symptoms [11]-[13]. B3AT was found to interact with the well-known parasite protein A5K5E5 [14], although the ligand-binding domain remains unknown. This study aimed to apply SLiM predictors, molecular modeling, and docking predictors to ascertain whether mimicry mechanisms mediate the interaction between B3AT and A5K5E5, and to predict other interactions between B3AT and Plasmodium, in an effort to guide future studies for identifying new therapeutic agents.
2. Materials and Methods
2.1. Protein-Protein Interaction Network Building
The Agile Protein Interactomes DataServer (APID) protein interaction repository [15] was used to identify interactions between the human B3AT protein and other proteins. All reported protein pairs (Level 0, i.e., without quality filtering) were retained, provided that the interaction was supported by at least one experimental method. This search yielded 25 interactions, retrieved on October 7, 2021, which were recorded for further investigation.
2.2. SLIM Identification and Characterization
Short Linear Motif Finder (SLiMFinder) is a web server tool for finding Short Linear Motifs (SLIMs) shared by a set of sequences and thereby predicting protein interactors [16]. All interactors of the human B3AT protein recorded in the APID database were extracted. The sequence of each interactor was then retrieved from the Universal Protein Resource (UniProt) [17] and submitted to SLiMFinder for SLIM identification. The motifs identified by SLiMFinder were then searched in the sequences of Plasmodium proteins using the Patmatdb search engine to predict those interacting with B3AT.
2.3. Protein Model Construction
The human B3AT structure (PDB identification number 4YZF), solved at a resolution of 3.50 Å, contains four chains (labeled A, B, C, and D), all of which include segments involved in the interaction with parasite A5K5E5 [18]. Since the three-dimensional (3D) structure of A5K5E5 was unavailable, its structure was predicted with MODELLER (version 10.2, University of California San Francisco, San Francisco, CA) [19]. MODELLER is a homology (comparative) modeling program that models the 3D structures of proteins and their assemblies by satisfying spatial restraints. Three main inputs were required to perform the comparative modeling tasks. The first input was an alignment file (with a .pir file extension) containing the pairwise alignments between the target and template sequences. The second input was the cleaned and relaxed structure of the selected templates. The third input was a Python script file (with a .py file extension) to run the modeling processes [20]. It was necessary to generate a large number of models to distinguish decoy structures from native-like structures. Discrete Optimized Protein Energy (DOPE) scores calculated by MODELLER were used for this purpose. The overall quality of the selected model was then checked manually using the molecular visualizer PyMOL [21]. The stereochemical quality of the modeled structures was assessed using the SAVES server (https://saves.mbi.ucla.edu/), including the Ramachandran plot, which was essential for the correct interpretation of the model.
2.4. Protein-Protein Docking
High Ambiguity Driven protein-protein Docking (HADDOCK) is a server that predicts the interface between two interacting proteins using ambiguous interaction restraints (AIRs) derived from biochemical or biophysical data [22]. It performs a data-driven, flexible docking approach for modeling biomolecular complexes [23]. For this study, docking was guided by SLIMs. The aim was to uncover the molecular mechanism of the protein-protein interaction via the predicted complex.
2.5. Structure Preparation and Docking
PyMOL was used to remove water molecules and add hydrogen atoms, and the structures were then optimized by energy minimization. The B3AT structure was treated as the receptor and the A5K5E5 structure as the ligand. In line with the two-hybrid experiments, the SLIM-containing segment was included in the interaction interface between the two proteins [24]. HADDOCK then performed the molecular docking, using the 3D structures and the SLIM information to orient the molecules and construct the complex structures. The workflow was adapted to identify the interaction zones through protein-protein docking.
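The way a SLIM can guide the docking can be sketched as follows: each occurrence of a motif in a sequence is converted into the residue numbers it covers, which could then be supplied to HADDOCK as active residues. This is a minimal sketch under stated assumptions, not the script used in this study; the function names, the toy sequence, and the `first_residue_number` offset are hypothetical, and the motif is passed in as a regular expression.

```python
import re

def active_residues_from_motif(sequence, motif, first_residue_number=1):
    """Return the sorted residue numbers covered by any occurrence of `motif`.

    `motif` is a regular expression. `first_residue_number` (an assumption)
    is the number carried by the first residue of `sequence` in the
    structure file, so that positions map onto structure numbering.
    """
    # The lookahead keeps overlapping occurrences from being skipped.
    pattern = re.compile(f"(?=({motif}))")
    residues = set()
    for match in pattern.finditer(sequence):
        start = first_residue_number + match.start()
        residues.update(range(start, start + len(match.group(1))))
    return sorted(residues)

def as_residue_list(residues):
    """Format residue numbers as a comma-separated string."""
    return ",".join(str(r) for r in residues)

# Hypothetical toy sequence, not taken from B3AT or A5K5E5.
toy_sequence = "MKDIAARLLEIQQRG"
active = active_residues_from_motif(toy_sequence, r"[DE]I..R")
print(as_residue_list(active))
```

Residues outside the motif that lie near it on the surface would typically be treated separately (for example as passive residues); that choice is not described in the text and is left out of the sketch.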
3. Results
3.1. B3AT Protein Interactions
The APID repository permitted the identification of 25 interactions involving the B3AT protein, each validated by at least one experiment (Level 0: all reported protein pairs with no quality filter), including the interaction with the A5K5E5 protein (Table 1).
Table 1. List of interactions present in APID, proven by at least one experiment and involving B3AT.
UniProt ID | Protein | Tax ID | Experiments | Methods | Publications | 3D Structures |
P27105 | STOM_HUMAN | 9606 | 3 | 3 | 1 | 0 |
P11166 | GTR1_HUMAN | 9606 | 2 | 2 | 1 | 0 |
P05026 | AT1B1_HUMAN | 9606 | 5 | 6 | 1 | 0 |
P42771 | CDN2A_HUMAN | 9606 | 5 | 6 | 1 | 0 |
A5K5E5 | A5K5E5_PLAVS | 126793 | 2 | 2 | 1 | 0 |
O75955 | FLOT1_HUMAN | 9606 | 1 | 1 | 1 | 0 |
P02724 | GLPA_HUMAN | 9606 | 4 | 4 | 2 | 0 |
P00918 | CAH2_HUMAN | 9606 | 4 | 2 | 4 | 0 |
Q14254 | FLOT2_HUMAN | 9606 | 2 | 2 | 1 | 0 |
P04406 | G3P_HUMAN | 9606 | 2 | 2 | 1 | 0 |
Q9BWU0 | NADAP_HUMAN | 9606 | 2 | 3 | 1 | 0 |
P22748 | CAH4_HUMAN | 9606 | 2 | 3 | 1 | 0 |
P16157 | ANK1_HUMAN | 9606 | 2 | 2 | 3 | 0 |
Q9NP59 | S40A1_HUMAN | 9606 | 1 | 1 | 1 | 0 |
P29972 | AQP1_HUMAN | 9606 | 1 | 1 | 1 | 0 |
P18031 | PTN1_HUMAN | 9606 | 2 | 2 | 1 | 0 |
P16452 | EPB42_HUMAN | 9606 | 1 | 2 | 2 | 0 |
Q13336 | UT1_HUMAN | 9606 | 1 | 1 | 1 | 0 |
P02549 | SPTA1_HUMAN | 9606 | 1 | 1 | 1 | 0 |
P02730 | B3AT_HUMAN | 9606 | 1 | 2 | 1 | 0 |
P27824 | CALX_HUMAN | 9606 | 1 | 2 | 1 | 0 |
P10909 | CLUS_HUMAN | 9606 | 1 | 1 | 1 | 0 |
Q92831 | KAT2B_HUMAN | 9606 | 1 | 1 | 1 | 0 |
P04075 | ALDOA_HUMAN | 9606 | 1 | 1 | 1 | 0 |
P43405 | KSYK_HUMAN | 9606 | 1 | 2 | 2 | 0 |
Note. ID = identification number, 3D = three-dimensional.
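The evidence columns of Table 1 lend themselves to simple filtering. The sketch below is illustrative only: it hard-codes a five-row subset of Table 1, and the `Interaction` record, the `filter_interactions` helper, and the threshold of two experiments are hypothetical choices, not part of the APID interface.

```python
from typing import NamedTuple

class Interaction(NamedTuple):
    uniprot_id: str
    protein: str
    tax_id: int
    experiments: int
    methods: int

# Subset of Table 1 (B3AT interactors), copied by hand for illustration.
ROWS = [
    Interaction("P27105", "STOM_HUMAN", 9606, 3, 3),
    Interaction("A5K5E5", "A5K5E5_PLAVS", 126793, 2, 2),
    Interaction("O75955", "FLOT1_HUMAN", 9606, 1, 1),
    Interaction("P02724", "GLPA_HUMAN", 9606, 4, 4),
    Interaction("P29972", "AQP1_HUMAN", 9606, 1, 1),
]

HUMAN_TAX_ID = 9606

def filter_interactions(rows, min_experiments=1):
    """Keep interactions supported by at least `min_experiments` experiments."""
    return [row for row in rows if row.experiments >= min_experiments]

# A stricter filter than Level 0: at least two supporting experiments.
well_supported = filter_interactions(ROWS, min_experiments=2)
# Interactors that are not human proteins (here, the parasite protein).
non_human = [row for row in ROWS if row.tax_id != HUMAN_TAX_ID]

print([row.uniprot_id for row in well_supported])
print([row.uniprot_id for row in non_human])
```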
To better characterize the B3AT-A5K5E5 interaction, the sequences of the B3AT interactors were first screened for SLIMs that could mediate the interaction between B3AT and any other protein. SLiMFinder identified “[DE]I..R” as a SLIM (brackets denote either one of the enclosed amino acids, not both, and each dot denotes any amino acid). The PATMATDB program was then used to screen the human proteome from Ensembl [25] and the parasite proteome from PlasmoDB (release 55, The Plasmodium Genome Consortium) for the identified SLIM. This SLIM was highly represented in the human (7,424 proteins) and P. falciparum (584 proteins) proteomes, emphasizing the importance of the interaction between B3AT and Plasmodium. The statistics of the top 10 HADDOCK docking clusters are shown in Table 2.
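The motif syntax described above maps directly onto a regular expression, so a proteome screen of this kind can be sketched in a few lines. This is a minimal sketch, not the PATMATDB implementation; the function names and the three toy sequences are hypothetical.

```python
import re

# [DE]I..R: Asp or Glu, then Ile, then any two residues, then Arg.
# Wrapping the motif in a lookahead also reports overlapping occurrences.
SLIM_PATTERN = re.compile(r"(?=([DE]I..R))")

def find_slim(sequence):
    """Return (1-based start position, matched segment) for each occurrence."""
    return [(m.start() + 1, m.group(1))
            for m in SLIM_PATTERN.finditer(sequence.upper())]

def proteins_with_slim(proteome):
    """Map each protein name to its motif hits, skipping proteins without hits."""
    hits_by_protein = {}
    for name, sequence in proteome.items():
        hits = find_slim(sequence)
        if hits:
            hits_by_protein[name] = hits
    return hits_by_protein

# Hypothetical toy proteome; real input would be parsed from FASTA files.
toy_proteome = {
    "protein_A": "MKDIAARLLEIQQRG",
    "protein_B": "MSTNPKPQRKTKRNT",
    "protein_C": "mgeiwfrs",
}
screened = proteins_with_slim(toy_proteome)
print(len(screened), "of", len(toy_proteome), "toy proteins contain the motif")
```

Counting the keys of the returned dictionary gives the number of proteins carrying the motif, which is the kind of figure reported for the human and parasite proteomes.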
Table 2. Statistics of the top 10 clusters generated by the HADDOCK approach.
Characteristic | Cluster 5 | Cluster 3 | Cluster 9 | Cluster 7 | Cluster 12 | Cluster 6 | Cluster 1 | Cluster 2 | Cluster 8 | Cluster 11 |
Cluster size | 10 | 15 | 6 | 7 | 4 | 8 | 49 | 23 | 6 | 5 |
Z-score | −2.2 | −0.8 | −0.1 | −0.1 | −0.1 | −0.0 | 0.2 | 0.5 | 0.9 | 1.8 |
HADDOCK score¹ | −201.3 (15.8) | −166.6 (2.0) | −148.8 (16.4) | −148.7 (15.6) | −146.6 (19.8) | −145.6 (4.9) | −140.8 (5.9) | −132.5 (2.9) | −121.5 (10.5) | −98.6 (14.9) |
RMSD from the overall lowest energy structure¹ | 0.7 (0.4) | 11.3 (0.1) | 9.4 (0.2) | 16.8 (0.5) | 24.0 (0.1) | 22.8 (0.5) | 7.6 (0.8) | 16.7 (0.2) | 19.9 (0.1) | 7.5 (0.1) |
Van der Waals energy | −117.9 (10.4) | −95.2 (4.3) | −89.8 (11.9) | −89.7 (11.9) | −92.7 (6.7) | −102.9 (16.3) | −84.9 (3.8) | −99.1 (2.7) | −79.4 (5.5) | −80.3 (11.3) |
Electrostatic energy¹ | −361.1 (65.0) | −405.6 (48.3) | −125.4 (37.4) | −354.4 (42.9) | −182.6 (44.2) | −258.9 (18.0) | −280.4 (55.9) | −64.4 (44.6) | −246.8 (33.9) | −55.4 (50.5) |
Desolvation energy¹ | −75.5 (7.5) | −42.7 (11.4) | −105.8 (4.2) | −69.9 (6.8) | −101.5 (12.2) | −44.6 (19.6) | −50.4 (12.5) | −96.5 (6.0) | −51.6 (4.4) | −78.0 (14.1) |
Restraints violation energy¹ | 642.5 (59.05) | 523.8 (61.40) | 719.2 (42.64) | 817.6 (91.63) | 841.2 (160.14) | 536.4 (54.50) | 505.8 (11.29) | 760.7 (90.09) | 588.4 (68.53) | 708.5 (49.24) |
Buried Surface Area¹ | 3873.7 (257.7) | 3516.2 (108.1) | 3303.3 (151.1) | 3116.1 (249.3) | 3232.5 (85.0) | 3571.3 (327.0) | 3037.9 (157.6) | 3175.5 (127.7) | 2755.1 (116.8) | 2857.4 (262.3) |
Note. RMSD = root-mean-square deviation. The Z-score indicates how many standard deviations the score of a cluster lies from the average score of all clusters (lower values correspond to better performance). ¹Results are expressed as scores or responses with their associated standard deviations shown in parentheses.
The top-ranked cluster (cluster 5) was the most reliable according to the HADDOCK approach. The best-oriented complex provided by this approach had the lowest HADDOCK score (−201.3, standard deviation 15.8) and the lowest Z-score (−2.2) of all clusters.
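The HADDOCK score is a weighted sum of the energy terms listed in Table 2. The sketch below recomputes it for four clusters, assuming the default HADDOCK 2.x weights for the final water-refinement stage (1.0 for van der Waals, 0.2 for electrostatics, 1.0 for desolvation, and 0.1 for restraint violation energy). These weights are not stated in the text and are an assumption here; the function and variable names are hypothetical.

```python
# Assumed weights (HADDOCK 2.x defaults, water-refinement stage).
WEIGHTS = {"vdw": 1.0, "elec": 0.2, "desolv": 1.0, "air": 0.1}

def haddock_score(vdw, elec, desolv, air, weights=WEIGHTS):
    """Weighted sum of the four energy terms."""
    return (weights["vdw"] * vdw
            + weights["elec"] * elec
            + weights["desolv"] * desolv
            + weights["air"] * air)

# cluster: (van der Waals, electrostatic, desolvation,
#           restraints violation, reported HADDOCK score), from Table 2.
TABLE2 = {
    5: (-117.9, -361.1, -75.5, 642.5, -201.3),
    3: (-95.2, -405.6, -42.7, 523.8, -166.6),
    9: (-89.8, -125.4, -105.8, 719.2, -148.8),
    1: (-84.9, -280.4, -50.4, 505.8, -140.8),
}

differences = {}
for cluster, (vdw, elec, desolv, air, reported) in TABLE2.items():
    computed = haddock_score(vdw, elec, desolv, air)
    differences[cluster] = abs(computed - reported)
    print(f"cluster {cluster}: computed {computed:.2f}, reported {reported:.1f}")
```

Under these assumed weights, the recomputed values agree with the reported scores to within rounding, which suggests that the tabulated scores follow the default weighting.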
Next, the B3AT and A5K5E5 proteins were analyzed to better understand their functions and their involvement in the mechanisms by which the parasite invades human red blood cells. The glycoprotein B3AT comprises two domains: a cytosolic N-terminal domain at positions 1-360 and an integral membrane domain at positions 361-911. This protein has a dual role and serves primarily as an exchanger of HCO3−/Cl− anions across the cell membrane of red blood cells (RBCs) and hepatocytes [26]. The tryptophan-rich antigen protein A5K5E5 is classified among the “Pv-fam-a” family of antigens. Some of these proteins are potential drug or vaccine targets, but their functional role(s) remain largely unexplored [27] [28].
3.2. Model Prediction
MODELLER was set to generate 100 models from the primary sequence of the tryptophan-rich antigen (A5K5E5). The 10 best conformations were selected based on the lowest system energy, as described by the DOPE score. Following manual checks with the molecular visualization software PyMOL [29], the selected conformation was adopted for further structural investigation (Figure 1).
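The selection step can be sketched as a simple ranking, since a lower (more negative) DOPE score indicates a more favorable model. This is an illustrative sketch only: the model names and score values below are invented placeholders, not output from MODELLER, and `select_best_models` is a hypothetical helper.

```python
def select_best_models(dope_scores, n=10):
    """Return the n (model name, DOPE score) pairs with the lowest scores."""
    return sorted(dope_scores.items(), key=lambda item: item[1])[:n]

# Placeholder scores for 100 hypothetical models (values are invented).
dope_scores = {
    f"model_{i:03d}": -30000.0 - (i * 37 % 100)
    for i in range(1, 101)
}

best = select_best_models(dope_scores, n=10)
for name, score in best:
    print(name, score)
```

The ranked shortlist would then be inspected manually, as described above, before a single conformation is retained.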
Figure 1. Optimal model for the tryptophan-rich antigen predicted by MODELLER.
3.3. Molecular Docking
Before proceeding to the in silico docking experiments, the 3D structures were prepared and verified. The tryptophan-rich antigen protein (A5K5E5) structure was validated using Ramachandran plots and subsequently refined. The B3AT protein structure was extracted from the Protein Data Bank, cleaned, and refined, and the amino acid residues of the SLIM were highlighted on the 3D structure (Figure 2).
The SLIM residues were represented on the structures as red spheres. After numerous docking attempts and several rounds of refinement of the in silico interaction process, the most stable docking position was determined among hundreds of poses and molecular orientations. The HADDOCK approach grouped 149 structures into 12 clusters, representing 74.5% of the water-refined models generated (i.e., 149 of 200 models). Figure S1 illustrates the molecular interactions between human B3AT and P. falciparum A5K5E5. The relevant residues involved in the B3AT-A5K5E5 docking and a summary of the types of interaction between them are provided as supplemental material (Tables S1 and S2).
A contact map of the complex was generated to analyze and visualize the interface between the two proteins. This output provided detailed information about the interacting residues within a cut-off distance of 6 Å (Figure S2). The contact map showed a distinct interaction pattern between the A5K5E5 and B3AT proteins on the side of the B3AT SLIM (Figure 3).
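The principle of a distance-based contact map can be sketched as follows: every residue pair across the two molecules whose distance does not exceed the cut-off is reported as a contact. This is a minimal sketch and not the COCOMAPS implementation; it assumes one representative point per residue (for example the CA atom), and the coordinates below are invented for illustration.

```python
import math

def intermolecular_contacts(coords_a, coords_b, cutoff=6.0):
    """List (residue in A, residue in B, distance) for pairs within `cutoff` Å.

    `coords_a` and `coords_b` map residue labels to (x, y, z) coordinates
    of one representative atom per residue (an assumption of this sketch).
    """
    contacts = []
    for res_a, xyz_a in coords_a.items():
        for res_b, xyz_b in coords_b.items():
            distance = math.dist(xyz_a, xyz_b)
            if distance <= cutoff:
                contacts.append((res_a, res_b, round(distance, 2)))
    return contacts

# Invented coordinates (in Å); residue labels are only illustrative.
chain_a = {"LYS301": (0.0, 0.0, 0.0), "VAL305": (10.0, 0.0, 0.0)}
chain_b = {"ILE753": (3.0, 4.0, 0.0), "ASN593": (10.0, 0.0, 5.5)}

contacts = intermolecular_contacts(chain_a, chain_b, cutoff=6.0)
for res_a, res_b, distance in contacts:
    print(res_a, res_b, distance)
```

A real analysis would read the coordinates from the docked PDB file and would usually consider all atoms of each residue rather than a single point.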
Figure 2. SLIMs identification for the human B3AT protein.
Figure 3. Predicted B3AT-A5K5E5 docking complex. Human B3AT is colored white, and the parasite protein A5K5E5 is shown in surface representation in green. The protein-protein interaction is mediated by the SLIM with residues EI..R.
4. Discussion
This study aimed to understand the mechanism of interaction of P. falciparum, the most clinically significant malaria pathogen, with its host at the intra-erythrocytic stage, which causes the symptoms of the disease in humans. Several studies have demonstrated that the interaction between B3AT and the P. falciparum tryptophan-rich antigen (accession code A5K5E5) is involved in the invasion of human erythrocytes by malaria parasites, which causes the clinical symptoms of malaria [30]. Indeed, the B3AT protein is the main invasion route and the most abundant membrane protein in human erythrocytes (covering more than 25% of the erythrocyte surface). This protein is considered the preferred receptor used by P. falciparum and P. vivax [31]. The results showed that the five-residue motif [DE]I..R detected by SLiMFinder mediated the B3AT-A5K5E5 interaction. The proteome analysis with PATMATDB identified 7,424 human proteins and 584 parasite proteins that contain this SLIM and may therefore interact with B3AT. These results could provide insight into the molecular mimicry strategy used by P. falciparum to escape the immune system, invade the erythrocytes undetected, and destroy them [32].
P. falciparum has three main families of variant antigens, PfEMP1, RIFIN, and STEVOR, each transcribed in chronological order during the asexual cycle of the parasite. Previous findings suggest that PfEMP1 is transcribed first, that RIFIN then appears in the early trophozoite stages, and that STEVOR finally appears at the end of the trophozoite stage. Despite this sequence, all three variant antigens are present on the surface of the parasite and show the genetic capacity of the parasite to develop three different mechanisms that facilitate rosetting [33] [34]. Further analysis of the SLIM sequence showed that it is mainly present in the Subtelomeric Variable Open Reading Frame (STEVOR) proteins and the Repetitive Interspersed Families of Polypeptides (RIFIN). These proteins belong to variant antigen families, like PfEMP1, which interact with erythrocyte glycophorin C.
SLIMs were used to guide the protein-protein docking between B3AT and A5K5E5. The docking pose showed that A5K5E5 interacts with the SLIMs present in B3AT. These interactions should be investigated further to understand the interaction mechanism between human erythrocytes and malaria parasites. Despite the availability of some treatments, Plasmodium still succeeds in resisting antimalarial drugs, likely because of the complexity of its mechanism of action [35] [36].
SLIMs play an essential role in the life of the parasite, and their identification is therefore key to understanding complex protein-protein interactions. The most important roles of SLIMs are the recognition of functionally pertinent residues in proteins of known and unknown function [37] and the provision of an effective mechanism for parasites to modify pathways in host cells [38]. Blocking the interaction of the docked complex predicted here may make it possible to interrupt the parasite life cycle and fight malaria. In the absence of an experimental structure of the P. falciparum tryptophan-rich antigen, the interaction model proposed here can be used as a reference for clarifying the structure-function relationship and can inform the design of new drugs. The observed interactions can serve as a starting point for follow-up experiments that could provide a better understanding of disease pathogenesis and parasite biology and offer new opportunities for identifying therapeutic targets for diseases such as malaria [39].
Availability of Data and Materials
All individual data analyzed for this study were abstracted from publicly available data sources. Data tabulations are provided in this article and its supplemental materials.
Acknowledgments
The authors thank the Fogarty International Center of the National Institutes of Health and the eLwazi Open Data Science Platform and Coordinating Center for financial support for our trainees, who contributed substantially to this effort. We also thank the African Center of Excellence in Bioinformatics (ACE) for supporting infrastructure and computer laboratories. We dedicate this work to those impacted by malaria. We sincerely hope this work serves as a building block towards the ultimate worldwide eradication of malaria.
Authors’ Contributions
FGF, AK, CC, AB, JGS, SD, MW conceived and wrote the manuscript;
FGF, AB, SD, MW performed the initial reviews of the data;
FGF, AK, CC, OS, HO carried out statistical analyses;
FGF, AB, AK, OS, HO performed the secondary reviews of the data;
MW, CC, SD, JGS assisted in carrying out the training program;
FGF, AK, CC, OS, AB, HD, MS, JGS, SD, MW prepared the final draft of the manuscript, which was then reviewed and approved by all authors.
Funding
This study was supported by the United States Fogarty International Center and National Institute of Allergy and Infectious Diseases sections of the National Institutes of Health under award number U2RTW010673 for the West African Center of Excellence for Global Health Bioinformatics Research Training and by the United States National Institutes of Health, National Institute of Biomedical Imaging and Bioengineering grant 5U2CEB032224-eLwazi Open Data Science Platform. Additional support was provided by the Third World Academy of Sciences (grant no. 16-218 RG/PHA/AF/AC_C-FR3240293272).
Supplementary Material
Figure S1. The workflow adopted to identify the interaction zones through the protein-protein docking
Figure S2. Distance contact maps of B3AT-A5K5E5 complex. Intermolecular contact map of B3AT-A5K5E5 with SLIM obtained by COCOMAPS software.
Table S1. List of relevant residues in B3AT-A5K5E5 docking and making H-bond interaction.
Donor residue | Donor residue no. | Donor atom | Donor chain | Acceptor residue | Acceptor residue no. | Acceptor atom | Acceptor chain | H-bond distance (Å) | CA-CA distance (Å) |
SER | 17 | OG | A | SER | 594 | OG | B | 2.74 | 7.07 |
LYS | 300 | NZ | A | GLN | 754 | OE1 | B | 2.88 | 8.43 |
LYS | 301 | NZ | A | ILE | 753 | O | B | 2.83 | 5.83 |
VAL | 305 | N | A | ASN | 593 | O | B | 3.32 | 5.92 |
LYS | 590 | NZ | B | GLU | 218 | O | A | 2.67 | 9.33 |
LYS | 590 | NZ | B | ASP | 219 | OD1 | A | 2.71 | 9.27 |
ARG | 589 | NH2 | B | GLN | 302 | OE1 | A | 2.82 | 12.57 |
ASN | 593 | ND2 | B | GLN | 302 | OE1 | A | 2.77 | 8.49 |
ARG | 602 | NH2 | B | ASN | 304 | OD1 | A | 2.89 | 11.70 |
ARG | 602 | NH1 | B | VAL | 305 | O | A | 2.99 | 10.91 |
ARG | 602 | NH2 | B | VAL | 305 | O | A | 2.73 | 10.91 |
ARG | 603 | NH2 | B | GLU | 309 | OE1 | A | 2.70 | 10.58 |
PHE | 597 | N | B | VAL | 316 | O | A | 2.94 | 4.80 |
The residues listed are the most essential for the formation of the complex in the B3AT-A5K5E5 docking.
Table S2. Types of interactions between residues involved in the formation of the B3AT-A5K5E5 complex.
Title | Value |
Number of interacting residues Molecule1 | 92 |
Number of interacting residues Molecule2 | 78 |
Number of hydrophilic-hydrophobic interaction | 230 |
Number of hydrophilic-hydrophilic interaction | 175 |
Number of hydrophobic-hydrophobic interaction | 46 |