1. Introduction
Epitope-based vaccines are a new generation of vaccines which are very well tolerated and have fewer side effects than conventional vaccines. The in silico prediction of peptide binding affinities to MHC proteins is a very important first step in the process of vaccine design and development. Peptides which act as T-cell epitopes bind to MHC molecules; thus all T-cell epitopes are MHC binders but not all MHC binders are T-cell epitopes. Binding to an MHC protein is a necessary but not a sufficient condition for a peptide to be an epitope. Peptides presented by an MHC on the cell surface have either an intracellular or an extracellular origin. MHC class I molecules, present on most cell types, present peptides primarily from proteins synthesized within the cell (endogenous processing pathway). MHC class II molecules, expressed on a restricted number of cell types, such as dendritic cells, B cells and macrophages, can present peptides derived from endocytosed extracellular proteins (exogenous processing pathway) [1]. A principal feature of MHC molecules is their allelic polymorphism: 3,411 human leukocyte antigens (HLA) class I and 1,222 HLA class II molecules were listed by the ImMunoGeneTics/HLA database in July 2010 [2].
Peptides binding to MHC class II proteins are typically between 10 and 20 residues in length. The complex comprising peptide and class II molecule is expressed on the cell surface and interacts exclusively with CD4+ T cells (helper T cells, TH cells). TH cells help to trigger an appropriate immune response which may include localized inflammation and swelling due to recruitment of phagocytes or may lead to an antibody-mediated immune response via B-cell activation. X-ray data from peptide-MHC class II [3-6] and TCR-peptide-MHC class II complexes [7-9] indicate that nine amino acids are bound in an extended conformation within the peptide binding groove of the MHC. The MHC class II binding groove is formed by two separate protein chains: α and β [10]. A dozen hydrogen bonds are formed between the MHC and the peptide main-chain carbonyl and amide groups. There are five pockets in the binding groove: one deep pocket (denoted p1), and four shallow pockets (p4, p6, p7 and p9), that accept peptide side chains. Peptide side chains at p2, p3, p5 and p8 project outward toward the T-cell. Unlike that of MHC class I, the MHC class II peptide binding groove is open at both ends. The degree to which this allows many potential registers in which a peptide might bind remains a controversial issue [11,12].
Systematic mapping of peptides binding to MHC proteins involves the synthesis and testing of all overlapping peptides spanning the whole length of a target antigen: a costly and time-consuming task. Alternatively, computational methods which can predict the best binding peptides can be used before any experimental work, followed by synthesis and testing of a tiny subset of the potential peptides. There are now several servers for MHC class II binding prediction. A recent study indicates that only certain servers perform sufficiently well to be useful and useable [13].
Combining results from multiple prediction tools often increases overall accuracy. Such a consensus strategy was proposed by Mallios [14], who combined SYFPEITHI [15], ProPred [16] and the iterative stepwise discriminant analysis meta-algorithm [17]. MULTIPRED [18] integrates hidden Markov models (HMMs) and artificial neural networks (ANN). Six MHC class II predictors were combined by Karpenko et al. [19], who based the overall prediction on the probability distributions of the different scores. Wang et al. [20] applied a consensus method to calculate the median rank of the top three predictive methods for each MHC class II protein initially evaluated so as to rank all possible 8-, 9- and 10-mers from one protein. This rank was then used to select the top 1% of peptides within each protein.
Here, we explore the effectiveness of different strategies for combining five servers for MHC class II binding prediction: ProPred [16], RANKPEP [21], NetMHCIIpan [22], NetMHCII [23], and EpiTOP [24,25]. Previous work identified these servers as the best single predictors available [26]. Our aim here is to test their combined use, with the hope of generating more accurate and more reliable overall predictions than when used individually. The servers were used in union and intersection modes. Union output compiles the results of two or three servers, while the intersection output selects only commonly predicted binders.
2. Materials and Methods
2.1. Test Set
The test set comprised 4540 binders of different lengths, originating from 167 proteins. The data were extracted from the Immune Epitope Database (December 2009) [27]. The study was performed on 12 DRB1 alleles. The peptides bind the following alleles: DRB1*0101 (2051 binders), DRB1*0301 (190 binders), DRB1*0401 (392 binders), DRB1*0404 (159 binders), DRB1*0405 (244 binders), DRB1*0701 (336 binders), DRB1*0802 (153 binders), DRB1*0901 (160 binders), DRB1*1101 (275 binders), DRB1*1201 (24 binders), DRB1*1302 (243 binders) and DRB1*1501 (313 binders). Some of the servers do not predict binding to all DRB1 alleles used in the test set. Only NetMHCIIpan and EpiTOP make predictions for all 12 DRB1 alleles. Although many methods give quantitative predictions, in our evaluation they were used as classification methods. Each server was evaluated only on the alleles it predicts. The test set is available as supporting information.
2.2. Servers Used in the Study
The five best performing servers from our preliminary study were used here (Table 1) [26]. NetMHCIIpan and NetMHCII are ANN-based servers. ProPred predicts MHC class II binding peptides using quantitative matrices (QM) based on pocket profiles [28]. RANKPEP uses position-specific scoring matrices (PSSM) which represent the observed sequence-weighted frequency of all amino acids in every position of a sequence alignment. EpiTOP is a newly developed method for MHC class II binding prediction based on proteochemometrics [25]. It is a matrix-based method which considers the contributions of both peptide and protein binding-site amino acids.
Table 1. Servers for MHC class II binders prediction used in the present study.
2.3. Union Method
The complete sequence of each protein was submitted to each server and the results recorded. The top 5% of the ranked predicted binding nonamers was used as a threshold. Two- and three-server combinations were inspected. The top 5% of the best predicted binders from each server were compiled into one set and compared to the set of known binders originating from the same protein. An identified binder was considered to be any nonamer sequence available within the tested binder peptide, which may be of arbitrary but longer length. Identified binders are shown as a percentage of all binders (sensitivity). Additionally, to test the precision of prediction, a positive predictive value (PPV) was calculated as a ratio of true binders to all predicted binders included in the top 5%.
In mathematical terms, the union method corresponds to applying the logical operator OR (∪). If the predicted top 5% best binders generated by server A form set a, and the top 5% of the best binders predicted by server B form set b, then the union set c = a ∪ b.
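The thresholding and union steps can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: the function names, the convention that a higher score means a better predicted binder, and the rounding of the 5% cut-off (rounded up, with at least one nonamer kept) are all assumptions.

```python
import math

def top_fraction(scores, fraction=0.05):
    """Return the best-scoring nonamers of one protein as a set.

    `scores` maps nonamer -> predicted score; higher is assumed to be better.
    The cut-off is rounded up and keeps at least one nonamer (an assumption).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = max(1, math.ceil(len(ranked) * fraction))
    return set(ranked[:keep])

def union_of_servers(per_server_scores, fraction=0.05):
    """Union (logical OR) of the top-fraction sets of several servers."""
    combined = set()
    for scores in per_server_scores.values():
        combined |= top_fraction(scores, fraction)
    return combined

# Toy example with made-up scores and a 50% cut-off so the sets are non-trivial.
server_a = {"AAAAAAAAA": 0.9, "CCCCCCCCC": 0.8, "DDDDDDDDD": 0.2, "EEEEEEEEE": 0.1}
server_b = {"AAAAAAAAA": 0.1, "CCCCCCCCC": 0.9, "DDDDDDDDD": 0.8, "EEEEEEEEE": 0.2}
union_set = union_of_servers({"A": server_a, "B": server_b}, fraction=0.5)
```

Because every server contributes its own top-ranked nonamers, the union set can only grow as servers are added.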
2.4. Intersection Method
The same sets of the top 5% of the best predicted binders generated by each server were used in the intersection method. Intersection sets contain only common nonamer binders identified from a particular protein, as predicted by different numbers of servers. Two-, three-, four- and five-server combinations were inspected. The common binders were compared to the set of known binders originating from the same protein. An identified binder was considered to be any nonamer sequence available within the tested binder peptide, which may be of arbitrary but longer length. The final sensitivity and PPV were assessed from the number of true binders identified by two, three, four, or five servers simultaneously.
In mathematical terms, the intersection method corresponds to the logical operator AND (∩). If the predicted top 5% best binders generated by server A form set a, and the top 5% of the best binders predicted by server B form set b, the intersection set c = a ∩ b.
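As an illustration of the intersection mode, the following Python sketch enumerates the server combinations and intersects their top-ranked sets. The function names and the toy nonamers are hypothetical; only the five server names and the combination sizes come from the study.

```python
from itertools import combinations

SERVERS = ["NetMHCIIpan", "NetMHCII", "ProPred", "RANKPEP", "EpiTOP"]

def server_combinations(servers, sizes=(2, 3, 4, 5)):
    """All combinations of the requested sizes (10 + 10 + 5 + 1 = 26 for five servers)."""
    return [combo for k in sizes for combo in combinations(servers, k)]

def intersection_set(top_sets, combo):
    """Nonamers present in the top-ranked set of every server in `combo` (logical AND)."""
    return set.intersection(*(top_sets[name] for name in combo))

# Toy top-ranked sets for one protein (made-up nonamers).
top_sets = {
    "NetMHCIIpan": {"AAAAAAAAA", "CCCCCCCCC"},
    "NetMHCII": {"AAAAAAAAA", "DDDDDDDDD"},
    "ProPred": {"AAAAAAAAA", "CCCCCCCCC"},
    "RANKPEP": {"CCCCCCCCC"},
    "EpiTOP": {"AAAAAAAAA"},
}
common_by_combo = {
    combo: intersection_set(top_sets, combo)
    for combo in server_combinations(SERVERS)
}
```

Each additional server can only remove nonamers from the common set, which is why the intersection shrinks as the combination grows.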
2.5. Performance Measures
Sensitivity is the proportion of experimentally determined binders that are predicted as binders. It is defined as true positives/(true positives + false negatives). Positive predictive value (PPV) is defined as true positives / (true positives + false positives). Server performance was assessed using the sensitivity and PPV at the top 5% of the best predicted binders.
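The two measures can be sketched in Python as follows. This is a hedged illustration, not the evaluation code of the study: the function names are hypothetical, and the rule used for PPV (a predicted nonamer counts as a true positive if it occurs within any known binder) is our reading of the matching rule described for identified binders.

```python
NONAMER = 9

def nonamers_of(peptide):
    """All overlapping nonamers contained in a peptide (empty if shorter than 9)."""
    return {peptide[i:i + NONAMER] for i in range(len(peptide) - NONAMER + 1)}

def sensitivity(known_binders, predicted):
    """Fraction of known binders that contain at least one predicted nonamer."""
    if not known_binders:
        return 0.0
    found = sum(1 for p in known_binders if nonamers_of(p) & predicted)
    return found / len(known_binders)

def ppv(known_binders, predicted):
    """Fraction of predicted nonamers that occur within a known binder (assumed rule)."""
    if not predicted:
        return 0.0
    true_nonamers = set()
    for peptide in known_binders:
        true_nonamers |= nonamers_of(peptide)
    return len(predicted & true_nonamers) / len(predicted)

# Toy example: two known binders, two predicted nonamers, one of them correct.
known = ["ACDEFGHIKLMN", "PQRSTVWYAC"]
predicted = {"CDEFGHIKL", "WWWWWWWWW"}
```

In this toy case one of the two known binders is recovered and one of the two predictions is correct, so both measures equal 0.5.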
3. Results
For this assessment, servers were selected on the basis of the following criteria: computational or machine-learning method-based, free web access, and the ability to predict binding to at least 9 of the 12 HLA-DRB1 alleles considered in this study. Given these criteria, previous studies indicated that the following were the best performing servers: NetMHCIIpan, NetMHCII, ProPred, RANKPEP, and EpiTOP.
3.1. Single Predictor Performance
The performance of single predictors is shown in Figure 1. At the top 5% threshold, NetMHCIIpan and NetMHCII perform best with 55% sensitivity. ProPred, EpiTOP and RANKPEP perform almost as well with sensitivities of 46%, 45% and 44%, respectively. PPVs range from 8% for RANKPEP and EpiTOP to 10% for NetMHCIIpan, NetMHCII and ProPred.
3.2. Union Method Performance
The results of combining two or three servers are shown in Figures 2 and 3. Any union combination performs better than single predictors. At the top 5% level, sensitivity ranges from 65% to 70% in two-server combinations and from 72% to 79% in three-server combinations. However, in terms of PPV, union outputs perform poorly. The highest PPV is 8%.
Figure 2. Two-server union method performance: 1 - NetMHCIIpan + NetMHCII; 2 - NetMHCIIpan + ProPred; 3 - NetMHCIIpan + RANKPEP; 4 - NetMHCIIpan + EpiTOP; 5 - NetMHCII + ProPred; 6 - NetMHCII + RANKPEP; 7 - NetMHCII + EpiTOP; 8 - ProPred + RANKPEP; 9 - ProPred + EpiTOP; 10 - RANKPEP + EpiTOP.
Figure 3. Three-server union method performance: 1 - NetMHCIIpan + NetMHCII + ProPred; 2 - NetMHCIIpan + NetMHCII + RANKPEP; 3 - NetMHCIIpan + NetMHCII + EpiTOP; 4 - NetMHCIIpan + ProPred + RANKPEP; 5 - NetMHCIIpan + ProPred + EpiTOP; 6 - NetMHCIIpan + RANKPEP + EpiTOP; 7 - NetMHCII + ProPred + RANKPEP; 8 - NetMHCII + ProPred + EpiTOP; 9 - NetMHCII + RANKPEP + EpiTOP; 10 - ProPred + RANKPEP + EpiTOP.
3.3. Intersection Method Performance
When servers are used in intersection mode, their sensitivities are typically poor (Figure 4). The common binders predicted by five servers identify only 4% of the known binders at the top 5% threshold. The four-server combinations recognize between 5% and 14%, three-server combinations identify between 6% and 27%, and two-server combinations yield between 9% and 41% of the known binders. In contrast to the union mode, increasing the number of servers within the intersection decreases the sensitivity; yet the opposite is true for PPVs: increasing the number of servers increases the precision of the prediction, and PPV reaches 31% for the five-server combination.
Figure 4. Intersection method performance: 1 - NetMHCIIpan + NetMHCII + ProPred + RANKPEP + EpiTOP; 2 - NetMHCIIpan + NetMHCII + ProPred + RANKPEP; 3 - NetMHCIIpan + NetMHCII + ProPred + EpiTOP; 4 - NetMHCIIpan + NetMHCII + RANKPEP + EpiTOP; 5 - NetMHCIIpan + ProPred + RANKPEP + EpiTOP; 6 - NetMHCII + ProPred + RANKPEP + EpiTOP; 7 - NetMHCIIpan + NetMHCII + ProPred; 8 - NetMHCIIpan + NetMHCII + RANKPEP; 9 - NetMHCIIpan + NetMHCII + EpiTOP; 10 - NetMHCIIpan + ProPred + RANKPEP; 11 - NetMHCIIpan + ProPred + EpiTOP; 12 - NetMHCIIpan + RANKPEP + EpiTOP; 13 - NetMHCII + ProPred + RANKPEP; 14 - NetMHCII + ProPred + EpiTOP; 15 - NetMHCII + RANKPEP + EpiTOP; 16 - ProPred + RANKPEP + EpiTOP; 17 - NetMHCIIpan + NetMHCII; 18 - NetMHCIIpan + ProPred; 19 - NetMHCIIpan + RANKPEP; 20 - NetMHCIIpan + EpiTOP; 21 - NetMHCII + ProPred; 22 - NetMHCII + RANKPEP; 23 - NetMHCII + EpiTOP; 24 - ProPred + RANKPEP; 25 - ProPred + EpiTOP; 26 - RANKPEP + EpiTOP.
4. Discussions
In the present study, the impact on predictive accuracy of combining up to five servers for MHC class II binding prediction was evaluated. The top 5% of the best predicted binders for each server were combined using union and intersection operators. The outputs were evaluated in terms of sensitivity and positive predictive value. Union outputs showed high sensitivities (65% - 79%) and low PPVs (6% - 8%), while intersection outputs had low sensitivities (4% - 41%) but high PPVs (14% - 31%). The trade-off between sensitivity and PPV thus defines a combination. Uniting the outputs of different predictors brings more “noise” than “signal” to the resulting set of predicted binders. Conversely, selecting only the common predicted binders increases the probability of identifying true binders.
The predictive ability of any model depends strongly on the data used to derive it. Models work better interpolating between data than extrapolating from them. Generally, data used in MHC binding prediction methods fall into two categories: ligand-based and structure-based. Ligand-based data are focused on the structure of binding peptides, while structure-based data make use of the 3D structures of target macromolecules, especially the structures of their binding sites. The nature of any analysis determines the choice of data and methods. Limitations in the quality and availability of data can have profound consequences for the efficiency and predictivity of resulting methods.
The aim of both ligand-based and structure-based MHC binding prediction is to identify viable biophores that interact with the great variety of binding sites implicit within the population of MHC molecules. In immunology, such biophores are typically called motifs. Ligand-based methods achieve this using sets of binding (and non-binding) peptides. Most of the known predictors are ligand-based, starting with motif-searching algorithms, progressing through different quantitative matrices, to more sophisticated machine-learning methods, such as ANN, HMM and SVM. Among the servers used in the present study, NetMHCII and RANKPEP are ligand-based methods for MHC binding prediction.
In contrast, structure-based methods identify biophores using the structures of MHC binding sites. Sturniolo’s method based on MHC pocket profiles is an example of a structure-based method for MHC binding prediction [28]. Each MHC pocket on the binding site is determined by a set of amino acids; some are conserved, others are polymorphic. The interactions made by all natural amino acids with a given pocket establish the pocket profile. Pocket profiles are nearly independent of the remaining MHC binding site. QMs were generated for a large number of HLA class II alleles based on different combinations of pocket profiles. Among the servers used in the present study, ProPred uses QMs based on Sturniolo’s pocket profiles. NetMHCIIpan and EpiTOP are mixed ligand- and structure-based methods, because they consider information from both binding ligands and binding sites.
Apart from using different training sets, MHC class II binding predictors also differ in methodology. Each method can be evaluated in two ways: interpretative ability and predictive ability. The interpretation of models is a vital issue in immunoinformatics. In terms of interpretation, methods used for prediction can be classified into “easy to interpret” and “black boxes”. Motif-based search and QMs are “easy to interpret” methods, while machine-learning methods, like ANN, HMM and SVM, belong to the “black box” class. Because of many approximations, “easy to interpret” models often have moderate predictive ability, especially those using ligand-based training sets, while “black box” methods have a high capacity for identifying binders [29]. Thus there is often a trade-off between the interpretative and predictive abilities of MHC binding prediction methods. One must choose between methods that are easy to interpret but moderately predictive and methods that are highly predictive but uninterpretable. Using methods from the two groups in combination is a sensible compromise.
Peptides which bind to MHC proteins are extremely flexible molecules with very many low-energy conformations. Also, the binding site on class II proteins is open-ended, which potentially allows a peptide to bind in several different registers [11,12]. To reduce this inherent uncertainty, MHC class II predictors use the “one binder-one pattern” assumption. Another approximation used by QM methods is the additivity concept based on the hypothesis of independent binding of residues [30], which considers the binding affinity as a linear sum of the binding affinities at each peptide position. Of the servers tested, only EpiTOP avoids this assumption by including cross terms. Peptide binding to MHC molecules is neither single pattern-based, nor position-independent, nor linearly additive.
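The additivity concept can be made concrete with a short Python sketch of quantitative-matrix scoring. The matrix values are invented for illustration, and the form of the cross terms shown here (a correction for a pair of peptide positions) is purely an assumption to show how a non-additive term enters the sum; it is not taken from EpiTOP or any other server.

```python
def additive_score(nonamer, matrix):
    """Linear sum of independent per-position contributions (additivity concept).

    `matrix` is a list of nine dicts mapping amino acid -> contribution;
    amino acids absent from a position contribute 0.0.
    """
    return sum(matrix[pos].get(aa, 0.0) for pos, aa in enumerate(nonamer))

def score_with_cross_terms(nonamer, matrix, cross_terms):
    """Additive score plus corrections for pairs of positions (hypothetical form)."""
    score = additive_score(nonamer, matrix)
    for (i, j), pair_weights in cross_terms.items():
        score += pair_weights.get((nonamer[i], nonamer[j]), 0.0)
    return score

# Made-up matrix: contributions only at p1, p4 and p9 (indices 0, 3 and 8).
matrix = [{} for _ in range(9)]
matrix[0] = {"Y": 1.0, "F": 0.75}
matrix[3] = {"D": 0.5}
matrix[8] = {"L": 0.25}

# Made-up cross term: Y at p1 together with D at p4 is penalized.
cross_terms = {(0, 3): {("Y", "D"): -0.5}}

additive = additive_score("YAADAAAAL", matrix)
with_cross = score_with_cross_terms("YAADAAAAL", matrix, cross_terms)
```

Under pure additivity the residue at one position contributes the same amount whatever its neighbours are; a cross term lets the contribution depend on a second position.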
Because of its complexity, predicting MHC binding is seemingly beyond the power of a single predictor. We can seek to overcome such limitations by combining several predictors, each using different training sets and implementing different methods. In terms of the training sets used to develop the predictors, NetMHCII and RANKPEP are pure ligand-based, ProPred is pure structure-based, while NetMHCIIpan and EpiTOP are mixed methods. In terms of methodology used, ProPred, RANKPEP and EpiTOP use QMs, while NetMHCIIpan and NetMHCII use ANN. ProPred and RANKPEP apply the hypothesis of independent binding of peptide side chains, NetMHCIIpan and NetMHCII as ANN methods represent nonlinear relationships, while EpiTOP is a QM containing cross terms to capture the non-linearity of binding.
Considering all these differences between the predictors, it is not surprising that the intersection of the five servers has low sensitivity but high PPV. Single servers do not make good or bad predictions; they just make quite different predictions. Only 4% of the known binders in the test set are predicted by all five servers at the top 5% threshold. Four servers identify up to 14%, three servers find up to 27%, and two servers recognize up to 41% of the known binders. At the same time, PPV increased with the number of servers: from 14% for two-server combinations to 31% for the five-server combination. Thus, using the predictors in an intersection mode decreases the overall number of identified binders but increases the precision of prediction. Each prediction is much more likely to be correct, though many binders will be missed.
Results are quite different for the approach using data union. The sensitivity of all two-server combinations ranges from 65% to 70% at the top 5% threshold. The three-server combinations reach 79% sensitivity at the same level. Such sensitivity is currently beyond that of any single predictor. The best performing combination is NetMHCIIpan, RANKPEP and EpiTOP, followed closely by the NetMHCII, RANKPEP and EpiTOP combination with 78% sensitivity. Unfortunately, the precision of the predictions made by the union method is very low; the highest PPV is 8%. This means that increasing the number of predicted binders also increases false positives relative to the number of true positives, adding more “noise” than “signal”.
Due to the high resource implications of experimental testing, when scanning a large proteome high numbers of false positives present a greater problem than high numbers of false negatives. Taking into account only the best predicted binders significantly reduces the number of false positives. With this in mind, three conclusions could be derived from the present study. First, combinations of different servers work better than single servers. Second, when the aim of the immunological study is to identify as many binders as possible, servers should be used in union mode. Third, when efficiency is a priority, and experimental work aims to pick out only a few, highly probable binders, server outputs should be combined using the intersection mode.
Meta-prediction is now a well-established strategy within bioinformatic prediction [31-33]. This approach seeks to amalgamate the output of various predictors, typically internet servers, in an intelligent way so that the combined results possess greater accuracy than that of any individual predictor. Within immunoinformatics, Trost et al. have used a heuristic method to address class I peptide-MHC binding [34], while Dai and co-workers have applied these methods to predicting peptides binding to class II MHCs [35]. Making use of such a tactic may in time prove of significant utility. In the current work, we have explored the optimality of combining server results, and largely verified the sagacity of this approach. In future work, the possibility remains to leverage the protocol we have developed in order to coalesce diverse server outputs in a similar reinforcing manner. Our work lays a solid foundation upon which to build future success. While limits exist to what computational vaccinology can achieve, it certainly offers tools and methods that can transform wider clinical and experimental endeavour. Immunoinformatic techniques are of true utilitarian value and can be used to foster and facilitate the design and discovery of vaccines, diagnostics, and reagents.
Supporting Information: The test set used in the study is given as supporting information.
5. Acknowledgements
The study was supported by a grant from the National Research Fund of Ministry of Education and Science, Bulgaria (Grant 02-1/2009).