Development of Predictive QSPR Model of the First Reduction Potential from a Series of Tetracyanoquinodimethane (TCNQ) Molecules by the DFT (Density Functional Theory) Method ()
1. Introduction
Conjugated simple organic molecules carrying both electron donors and acceptors have recently attracted a lot of attention because of their various and interesting properties. Non-linear optical properties [1], molecular electronic devices [2], artificial photosynthesis models [3] and solvatochromic effects [4] are among their potential applications.
Intramolecular electron transfer processes are one of the main topics of current interest in physic organic chemistry [5], particularly regarding tetracyanoquinodimethane (TCNQ)-based charge transfer complexes. In fact, TCNQ is an organic electron acceptor with a high electron affinity [6] [7] [8]. This electron acceptor can react according to an oxidation-reduction process with electron donors to form charge transfer complexes that display electrical properties and various applications. Indeed, it has been used for the synthesis of a large number of charge transfer compounds that have been widely explored as molecular electronics building blocks [9] [10], non-linear optics [11] and organic semiconductors [12] [13]. Existing TCNQ molecules have generally exhibited exemplary redox properties. Improving their properties and finding molecules with even better properties is therefore a challenge for scientific research. However, in the synthesis of these complexes, the objective of organic chemists is to synthesize thermodynamically stable radical species, which is not an easy task. Also, the two molecules constituting the charge transfer complex must have moderate donor and acceptor powers [14]. Under these conditions, the use of alternative methods for experimentation becomes essential. Among these, QSPR (Quantitative Structure-Property Relationships) methods are of great interest and even recommended according to new regulations [15] [16]. They make it possible to develop mathematical models linking physico-chemical properties with molecular structure. They either explain the origin of these properties or predict them for the molecules whose experimental data are not available. Quantum chemistry provides access to a large number of descriptors through its different methods.
The objective of this work is to develop a predictive QSPR model of the first reduction potential from a series of TCNQ molecules using quantum descriptors, to explain and predict the first reduction potential of the future TCNQ molecules of this same family belonging to its applicability domain.
2. Computational Details
2.1. Training Set and Test Set
In the development of the predictive QSPR model of the first reduction potential, we considered a series of forty Tetracyanoquinodimethane derivatives codified TCNQ [17] - [23]. The choice of these molecules is due to the availability of their experimental first reduction potentials. These properties have been all determined by cyclic voltammetry in acetonitrile. These molecules have constituted our database. Thirty of which (75% of the database) were used for the training set and ten molecules (25% of the database) were used for the test set. Table 1 presents these different molecules with their corresponding experimental first reduction potentials expressed in volts (V).
2.2. Computational Theory Level and Softwares
The GaussView 5.0 [24] software was used to represent the 3D structure and visualize the studied molecules. Then, the Gaussian 09 software [25] was used for optimization and frequency calculation (temperature 298.15 Kevin, pressure 1 atmosphere, in vacuum). The theory level used is B3LYP/6-31G(d,p). As for 2D structures, they have been designed with ChemSketch [26]. The EXCEL software [27] was used for graphic representation. The XLSTAT software [28] was used for modeling and statistical tests. For the calculation of the observation levers, the Minitab 18 [29] software was used.
2.3. Statistical Analysis
To develop a QSPR model, a data analysis method is required. This method quantifies the relationship between the studied property and the molecular structure (descriptors). There are several methods for the implementation of a model and the analysis of its statistical data. But the one we used in our study is Simple Linear Regression (SLR) (a single explanatory variable). Generally speaking, the equation of the simple regression is of the form:
(1)
with Y standing for the studied property, X represents the explanatory variable in correlation with the studied property and
are the model regression constants.
The selection of descriptors is a crucial step in QSPR modeling. In this study, the selection of descriptors was based on two criteria described as follows:
• Criterion 1
There must be a linear dependence relationship between the first reduction potential and the descriptors. Under these conditions we shall have
[30] with R, the linear correlation coefficient of the line
.
• Criterion 2
The descriptors must be independent from one another. To do this, the partial correlation coefficient
between the descriptors i and j must be less than 0.70 (
) [30]. For a multilinear regression, the coefficients R and
are expressed as follows:
(2)
and
(3)
The relationships (4), (5), (6) and (7) were used to calculate many statistical and validation parameters:
(4)
(5)
(6)
(7)
where TSS is total sum of squares, ESS stands for extended sum of squares and RSS is residual sum of squares.
• Determination coefficient (R2) [31]
The determination coefficient is given by the following relationship:
(8)
with
(9)
• Standard deviation (s) [32]
It is an indicator of dispersion. It provides information on how the distribution of data is performed around the average. The closer its value is to 0, the better the adjustment and the more reliable will be the prediction.
(10)
• Adjusted determination coefficient (
) [33]
It allows to measure the robustness of a model unlike
. This coefficient is used in multiple regressions because it considers the number of descriptors parameters of the model.
(11)
• Fisher-Snedecor coefficient (F) [34]
It allows to test the global significance of linear regression. A globally significant regression equation contains at least a relevant explanatory variable to explain the dependent variable. The Fisher-Snedecor coefficient is related to the determination coefficient by the following relationship:
(12)
• Kubinyi Criterion (FIT) [35]
It measures the size or robustness of the model. The smaller the FIT, the more robust the model is, meaning that the model has more variables.
(13)
• Cross-validation coefficient (
) [36]
It measures the accuracy of the prediction on the data of the training set
(14)
• Cross-validation criteria (PRESS) [36]
As the sum of the quadratic prediction errors, PRESS (Prediction Sum of Squares) is defined by the relationship:
(15)
This criterion is used to select models with good predictive power (we always look for the smallest PRESS). A Standard Deviation of Error of Prediction (SDEP) is calculated from PRESS:
(16)
In these expressions, n is the number of molecules in the training set, p is the number of explanatory variables.
and
are respectively the experimental and predicted values of property for molecule i and
is the average value of the property for the training set.
• Todeschini’s parameter (
) [37]
is the corrected form of P.P. Roy’s parameter noted
[38]. It allows to know if the model is due to chance correlations or not. If this parameter is greater than 0.50, the model is not due to a chance correlations. It is defined as:
(17)
with
, the average value of
of the models obtained with the randomized property.
• External validation coefficient (
) [39]
It measures the accuracy of the prediction on the test set data.
(18)
here, next refers to the number of test set compounds.
• Parameter (RMSEP) [39]
External predictive ability of QSPR model may further be determined by the Root Mean Square Error in Prediction given by:
(19)
• Roy K. and al. parameters (
and
) [40]
For the acceptable prediction, the value of
should preferably be lower than 0.20 when the value of
is more than 0.50.
(20)
(21)
here
(22)
and
(23)
The parameters
and
are the determination coefficients between the observed and predicted values of the compounds (training set or test set) with and without intercept, respectively. The parameter
bears the same meaning but uses the reversed axes.
• External validation criteria or “Tropsha’s criteria” [36] [41]
There are five such criteria:
v Criterion 1:
v Criterion 2:
v Criterion 3:
and
v Criterion 4:
and
v Criterion 5:
where,
stands for the determination coefficient of molecules for the test set;
represents the determination coefficient of the regression between predicted and experimental values for the test set without intercept;
is the determination coefficient of the regression between experimental and predicted values for the test set without intercept; k stands for the slope of the correlation line (values predicted according to the experimental values with intercept = 0) and
is the slope of the correlation line (experimental values according to the predicted values with intercept = 0). Ouanlo Ouattara et al. [42] reported that if at least 3/5 of the Tropsha’s criteria are verified, the QSPR model developed is considered as a successful model in predicting of the studied property.
• Lever (hii) [43]
The lever is a kind of distance from the barycentre of the points in the space of the explanatory variables. It identifies observations that are abnormally far from others. For observation i
(24)
where xi is the line vector of the descriptors of compound i and X is the matrix of the model derived from the values of the descriptors of the training set. The index T refers to the transposed matrix/vector. The critical value of lever h* is, in
general, set to
[44], where n is the number of compounds in the
training set and p is the number of model descriptors. If a compound has a residual and a lever that exceeds the critical value h* then this compound is considered outside the applicability domain of the developed model.
2.4. Calculation of Molecular Descriptor
The descriptor considered in this work is electronic affinity (EA). This descriptor has been calculated according to Koopmans [45] approach: the electronic affinity is the opposite of LUMO energy.
(25)
where LUMO is the Lowest Unoccupied Molecular Orbital. Table 2 reports the values of this descriptor for both the training set and the test set.
2.5. Submission of the Descriptor to the Selection Criterion 1
The calculated descriptor (electronic affinity) will be subject to selection criterion 1 because it is the lone considered descriptor (Table 3).
3. Resultats and Discussion
3.1. QSPR Model
The regression equation of the predictive QSPR (Quantitative Structure-Property Relationship) model of the first reduction potential dependent to electronic affinity (EA) is given below:
Table 2.Descriptor values expressed in eV, at B3LYP/6-31G(d,p) theory level.
Table 3.Submission of the descriptor to the selection criterion 1.
The positive sign of the coefficient of the EA in the regression equation of model shows that the first reduction potential increases with electronic affinity. There is therefore a direct correlation between the explanatory variable and the studied property. Examination of the above parameters shows that the correlation coefficient is very high (
). This high value indicates that there is a strong correlation between the first reduction potential and the selected descriptor. The determination coefficient
shows that 92.25% of the experimental variance of the first reduction potential is explained by the model's descriptor alone. In addition, the standard deviation (
) tends towards 0, indicating a good fit and high reliability of the prediction. The p-value is less than 0.0001 so
(5% risk). It is therefore clear that the regression equation of the model is highly significant for predicting the first reduction potential of the series of studied molecules. This global significance is confirmed by the very high Fischer value (F = 333.3279). Under these conditions, the only explanatory variable (electronic affinity) of the regression equation is very relevant to explain the studied property (first reduction potential). In addition, the experimental variance is TSS = 1.7407 when the theoretical variance due to the model is ESS = 1.6058. It is important to note that this relationship of dependence between the first reduction potential and electronic affinity has been corroborated by the work of Peter W. Kenny [46] who showed that the first reduction potential is a function of LUMO energy. He developed a predictive QSPR model dependent only on LUMO energy calculated at HF/6-31G(d) theory level, from a series of sixteen analogous TCNQ molecules with statistical parameters (
;
;
;
;
). However, the internal and external validations of this model have not been studied. It is also important to note that a QSPR (Quantitative Structure-Property Relationship) model can be obtained in a hazardous way. Therefore, one must always make sure of its stability. To do this, both internal and external validations methods are performed.
3.2. Internal Validation of the Model
For internal validation, the Leave-One-Out (LOO) procedure and the property of the randomization test have been used.
• Leave-One-Out procedure
Table 4 indicates that the value of
. The model is therefore excellent as seen
[47]. In addition, 91.36 % of the molecules in the training set have their redox potentials predicted by this model. With regard to the molecules of the training set, this model therefore has a high predictive power. This result shows that model is not very sensitive to this operation of setting apart a molecule and putting it back into the training set (Leave-One-Out procedure). This justifies the stability of this model. For
, its value is greater than 0.50 when that of
is less than 0.20. Consequently, for the prediction of the redox potential, the model is acceptable. Moreover, to ensure that the model is not due to chance correlations, the Y-randomization test of the property has been realized. A circular permutation of the property has been made (29 iterations).
• Y-randomization test
The average values of the Y-randomization parameters are shown in Table 5.
Table 5 shows that the average value of
tends to 0 (
), showing that the equation of the regression line only determines 6.00% of the point distribution (redox potential). In addition, there is scatter around the regression line confirmed by a high standard deviation (
). The very low value of the statistic
shows that the equation of the model obtained with the randomized property is not significant. As for Todeschini’s parameter
, its value is greater than 0.50 (
). This confirms that the established model is not due to chance correlations.
3.3. External Validation of the Model
The external validation only concerns the molecules of the test set. Table 6 reports the statistical parameters of the external validation of the model.
Table 4. Statistical parameters of the LOO internal validation of the model.
Table 5. Mean values of the randomization parameters.
Table 6. Statistical parameters of the external validation of the model.
From the analysis of the data in Table 6, it appears that the model has a very high predictive power because
. This shows that, 95.04% of molecules of the test set have their redox potentials predicted by the model. Also, 96.17 % of the experimental variance of the first reduction potential is explained by the descriptor model. For
, its value is greater than 0.50 while that of
is less than 0.2. Thus, this model is acceptable for the prediction of the redox potential of the test set molecules. In addition, the five (05) criteria of external validation (Tropsha’s criteria) have been verified.
Verification of Tropsha’s criteria
Criterion 1:
Criterion 2:
Criterion 3:
and
avec
Criterion 4:
and
avec
Criterion 5:
At this level, we see that all five (05) Tropsha criteria are verified. As a result, the developed model is very efficient in predicting the first reduction potential of the series of studies molecules.
3.4. Correlation between the Predicted Values by the Model and the Experimental Values
In Figure 1, all points tend to approach the regression line. This figure therefore shows a strong linear correlation between the predicted values of the first reduction potential by model and the experimental values. As for Figure 2, it shows that the predicted values by the model and the experimental values evolve in a similar way, particularly for the test set. Thus, these graphs confirm that the model is validated and is very efficient in predicting the redox potential. This reflects the adequacy of the theory level used to develop this model.
Figure 1.
scatter diagram of the model.
Figure 2. Similarity between model-predicted values and experimental values.
3.5. Model Normality Tests
• Shapiro-Wilk’s test [48]
The data in Table 7 shows that the calculated p-value is greater than
(5% threshold). Thus, the theoretical values of the first reduction potential obtained from the model follow a normal distribution law. This normal distribution is confirmed by the distribution of the point cloud according to the first bisector in Figure 3.
• Durbin-Watson’s test [49]
The values in Table 8 show that the calculated p-value is greater than
(5% threshold). It is therefore clear that the residues are not autocorrelated (zero correlation). Under these conditions, these residues do not contain information that can influence the model’s prediction of the first reduction potential. This interpretation is confirmed by the random distribution of the point cloud in Figure 4.
Table 7. Values of the parameters of Shapiro-Wilk’s test.
Table 8. Values of the parameters of Durbin-Watson’s test.
Figure 3. P-P plot (
) graph of the model.
Figure 4. Normalized residue =
graph of the model.
3.6. Applicability Domain (AD) of the Model
The Applicability Domain (AD) has been determined by analyzing Williams’s diagram of Figure 5.
Figure 5. Williams diagram of the model.
The examination of the Williams diagram shows that for training and test set, all observations have their standardized residuals between ±3 standard deviation units (±3σ) [50]. This justifies the absence of outliers. The choice “3 units of standard deviation” was made because our data follow a normal distribution law. Indeed, for leverage effect, a value of 3 is commonly used as a limit value for accepting predictions because the points between ±3 standard deviation units cover on average 99% of the data that follow a normal distribution law [51]. With regard to the levers of the training set, except for the observation TCNQ_28, all the others have their levers below the threshold value (h* = 0.2000). In the case of the test set, it is observation TCNQ_34, which has its lever above the critical value. However, the value of a lever above the critical value does not always indicate an outlier for the developed model. Compounds of training set with levers above the threshold value with low residues stabilize the model and increase its accuracy. They are called “good influential points”. On the other hand, compounds with hii greater than the critical value h* with large residues are called “bad influencing points” [51]. As a result, our elaborate QSPR (Quantitative Structure-Property Relationship) model does not show any evidence of aberrant observation of molecules in either set. The molecule TCNQ_28 is a “good influence point”. The results of the external validation showed that the model is suitable for predicting future redox potentials of TCNQ of this same family belonging to its applicability domain.
4. Conclusion
The objective of this study was to develop a predictive QSPR (Quantitative Structure-Property Relationship) model linking the first reduction potential from a series of tetracyanoquinodimethane molecules analogous to quantum descriptors from the conceptual density functional theory. A predictive QSPR model dependent to electronic affinity has been developed. The determination coefficient
of this model shows that 92.25% of the experimental variance of the first reduction potential is explained by the model’s descriptor alone. The Fisher coefficient of this model is very high (
) indicating that the regression equation is highly significant. The standard deviations (
) are well below 0.50 indicating a good fit and high reliability of the prediction. Regarding the parameters of the internal and external validations, they revealed that the model is validated and is assumed to predict efficiently the first reduction potential. The cross-validation coefficient
indicates that 91.36% of molecules of the training set have their predicted first reduction potential. Regarding the external validation coefficient,
, it shows that 95.04% of the test set molecules have their predicted first reduction potentials. Thus, to search for new tetracyanoquinodimethane (TCNQ) acceptors of this same family with the desired first reduction potentials, one can play on electronic affinity.