Prediction Model of Compressive Strength of Fly Ash-Slag Concrete Based on Multiple Adaptive Regression Splines

Abstract

Accurate prediction of the compressive strength of concrete is a key issue in the concrete industry. This paper proposes a method for predicting the compressive strength of fly ash-slag concrete based on multivariate adaptive regression splines (MARS), and establishes the modeling procedure from the principle of the algorithm. Using the UCI Concrete Compressive Strength dataset, a MARS model for compressive strength prediction was constructed with cement content, blast furnace slag powder content, fly ash content, water content, water-reducing agent content, coarse aggregate content, fine aggregate content and age as independent variables. Its predictions were compared with those of a back-propagation neural network (BP), random forest (RF), support vector machine (SVM), extreme learning machine (ELM), and multiple nonlinear regression (MnLR). The MARS and RF models showed clear advantages in prediction accuracy and model stability, and the overall performance of the MARS model was slightly better than that of the RF model. Finally, the explicit expression of the MARS model for compressive strength is given, providing an effective method for predicting the compressive strength of fly ash-slag concrete.

Share and Cite:

Dong, J. , Xie, H. , Dai, Y. and Deng, Y. (2022) Prediction Model of Compressive Strength of Fly Ash-Slag Concrete Based on Multiple Adaptive Regression Splines. Open Journal of Applied Sciences, 12, 284-300. doi: 10.4236/ojapps.2022.123021.

1. Introduction

Many researchers have attempted to improve the sustainability of concrete by reducing the amount of carbon dioxide emitted during the production of Portland cement. One common strategy is to use recycled aggregates and mineral admixtures such as fly ash and finely ground blast furnace slag as partial replacements for cement or aggregates in concrete [1] [2] [3]. Fly ash-slag concrete has a promising future due to its low porosity, erosion resistance, excellent workability and compaction properties.

Compressive strength is one of the basic performance indicators of concrete, and scholars worldwide have developed a number of empirical equations and mathematical models to predict it and so reduce the cost in labor, material and time [4] [5] [6]. These equations usually take the form of regressions based on a series of experimental results. However, selecting an appropriate regression equation for each analysis requires considerable experience and a variety of techniques, and the accuracy of the analysis decreases as the number of explanatory variables increases. Moreover, the compressive strength of concrete is affected by many factors, each of which has a strongly nonlinear effect, which makes it difficult to establish an accurate analytical model of how each factor influences strength [7]. In contrast, artificial neural networks and machine learning can uncover deeper patterns in the data and, through training, yield reliable prediction models, so some scholars have applied them to the prediction of concrete compressive strength. Ma [8] established a BP neural network model to predict the compressive strength of CFRP-confined concrete; Hu et al. [9] proposed a random forest (RF) based method for predicting concrete compressive strength; Cao et al. [10] proposed a concrete strength prediction method based on the support vector machine (SVM) algorithm, using a weighted support vector regression (MWSVR) based on the Mahalanobis distance; Yaseen et al. [11] used the extreme learning machine (ELM) model to predict the compressive strength of lightweight foamed concrete; Xu et al. [12] used stepwise regression analysis and multiple nonlinear regression (MnLR) in SPSS to establish a strength prediction model for lithium-slag concrete.

In concrete with high contents of fly ash and slag, the larger number of components means that the hydration mechanism is not yet fully understood, and the influencing factors are complex and interacting, exhibiting high-dimensional nonlinear behavior [13]. Most recent neural network and machine learning models for regression on high-dimensional nonlinear data are “black box” models: although they offer high accuracy and good stability, their decision process is not easy to interpret. Multivariate adaptive regression splines (MARS) were proposed by Friedman of Stanford University in the 1990s. MARS approximates the true curve or surface by fitting piecewise linear segments, and can handle the variable interactions and transformations hidden in the complex structure of high-dimensional data. In addition, a MARS model gives an explicit expression for high-dimensional data and is therefore easy to interpret. In this paper, a MARS nonparametric regression model is constructed as an alternative method for predicting the compressive strength of fly ash-slag concrete. Eight factors (mix proportions and age) that strongly influence the compressive strength of fly ash-slag concrete were selected as inputs, with compressive strength as the evaluation index, to establish the prediction system. Finally, the prediction results are compared with those of a back-propagation neural network (BP), random forest (RF), support vector machine (SVM), extreme learning machine (ELM) and multiple nonlinear regression (MnLR) to verify the feasibility of the proposed MARS model for fly ash-slag concrete strength prediction and to provide new ideas for predicting the strength of such concrete.

2. Method and Principle

2.1. Basic Principle of MARS Algorithm

MARS is a nonlinear, nonparametric regression method that models the nonlinear response between the inputs and outputs of a system through a series of piecewise linear segments (splines) of different gradients. No specific assumptions about the underlying functional relationship between the input variables and the output are required. The endpoints of the segments are called nodes (often called knots in the MARS literature). A node marks the end of one data region and the beginning of another. The resulting piecewise curves (called basis functions) give the model greater flexibility, allowing for bends, thresholds, and other deviations from linear functions.

MARS generates basis functions by a stepwise search, and an adaptive regression algorithm is used to select the locations of the nodes. The MARS model is constructed in two phases. The forward phase adds basis functions and searches for potential nodes to improve model performance, producing a deliberately overfitted model. The backward phase prunes the basis functions that contribute least, yielding the final optimal model.

For a single output variable, MARS assumes a model of the form [14]

$$y = f(X_1, \ldots, X_p) + e = f(X) + e \tag{1}$$

where $y$ is the target output, $X = (X_1, \ldots, X_p)$ is the matrix of $p$ input variables, and $e$ is the error term.

MARS approximates the function $f$ by applying basis functions (BFs). BFs are spline functions, which may be piecewise linear or piecewise cubic. For simplicity, only piecewise linear functions are considered in this paper, which can be expressed as

$$\max(0, x - t) = \begin{cases} x - t, & x \ge t \\ 0, & x < t \end{cases} \tag{2}$$

where $t$ is the node location. When $x < t$, the function takes the value zero. The basis function can therefore be used to partition the data into disjoint regions, each of which is then treated independently.
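
As a minimal illustration of Equation (2), the following Python sketch evaluates the piecewise linear basis function with NumPy. The function name `hinge`, the `direction` argument and the example node t = 3 are illustrative assumptions and do not come from the original study; the mirrored form max(0, t - x) is the usual companion term in Friedman's MARS and is included only for completeness.

```python
import numpy as np

def hinge(x, t, direction=1):
    """Piecewise linear basis function.

    direction=+1 gives max(0, x - t) as in Equation (2);
    direction=-1 gives the mirrored form max(0, t - x).
    """
    x = np.asarray(x, dtype=float)
    return np.maximum(0.0, direction * (x - t))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
right = hinge(x, t=3.0)                # zero for x < 3, then x - 3
left = hinge(x, t=3.0, direction=-1)   # 3 - x for x < 3, then zero
```

Each basis function is active on only one side of the node, which is what lets the model change slope from one data region to the next.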

The MARS model $f(X)$ is constructed as a linear combination of BFs and their interactions, and can be expressed as

$$f(X) = \beta_0 + \sum_{m=1}^{M} \beta_m \lambda_m(X) \tag{3}$$

where $\lambda_m(X)$ is a basis function, either a single spline function or the product of two or more spline functions already included in the model. The coefficient $\beta_0$ is a constant, and $\beta_m$ is the coefficient of the $m$-th basis function, estimated using the least squares method.
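
To make Equation (3) concrete, the sketch below builds a small matrix whose columns are a constant term and one piecewise linear basis function, and estimates the coefficients by least squares with `numpy.linalg.lstsq`. The data are synthetic and the node at x = 4 is a made-up value chosen for illustration; neither is taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=200)
# Synthetic target with a kink at x = 4 (illustrative only, not the concrete data).
y = 2.0 + 1.5 * np.maximum(0.0, x - 4.0) + rng.normal(0.0, 0.1, size=200)

# Columns: constant term (beta_0) and one basis function lambda_1(x) = max(0, x - 4).
B = np.column_stack([np.ones_like(x), np.maximum(0.0, x - 4.0)])

# Least-squares estimate of (beta_0, beta_1).
beta, *_ = np.linalg.lstsq(B, y, rcond=None)

# Model prediction f(X) = beta_0 + sum_m beta_m * lambda_m(X).
y_hat = B @ beta
```

Once the basis functions are fixed, the model is linear in the coefficients, so an ordinary least-squares solve is sufficient.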

MARS modeling is a data-driven process. To fit the model in Equation (3), a forward pass is first performed on the training data. The model starts with only the intercept, and at each step the basis function that gives the largest reduction in training error is added. Given a current model with M basis functions, the next basis function is added to the model in the form

(4)

The least squares method is used to estimate each β. As basis functions are added to the model space, interactions between different basis functions are also considered. Basis functions are added until the specified maximum number of terms, $N_{max}$, is reached. At this point the MARS model is an overfitted model.
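
The forward search described above can be sketched for a single input variable as follows. This is a simplified illustration under stated assumptions, not the implementation used in the paper: every interior observed value is tried as a candidate node, a mirrored pair of piecewise linear functions is added for each candidate (the usual choice in Friedman's MARS), and the candidate with the lowest residual sum of squares is kept. The helper names `rss` and `forward_step` and the synthetic data are hypothetical.

```python
import numpy as np

def rss(B, y):
    """Residual sum of squares after a least-squares fit of y on the columns of B."""
    beta, *_ = np.linalg.lstsq(B, y, rcond=None)
    r = y - B @ beta
    return float(r @ r)

def forward_step(B, x, y):
    """Try each interior observed value of x as a node.

    Returns (best_rss, best_node, new_B), where new_B is B extended with the
    pair max(0, x - t) and max(0, t - x) for the best node t.
    """
    best = (np.inf, None, None)
    for t in np.unique(x)[1:-1]:
        cand = np.column_stack([B, np.maximum(0.0, x - t), np.maximum(0.0, t - x)])
        err = rss(cand, y)
        if err < best[0]:
            best = (err, t, cand)
    return best

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=150)
# Synthetic response that is flat up to x = 6 and then rises linearly.
y = np.where(x > 6.0, 3.0 * (x - 6.0), 0.0) + rng.normal(0.0, 0.05, size=150)

B0 = np.ones((x.size, 1))              # intercept-only starting model
err, node, B1 = forward_step(B0, x, y)
```

A full forward pass would repeat this step over all variables and existing basis functions (to form interaction terms) until the maximum number of terms is reached.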

The backward (pruning) phase simplifies the model by removing the basis functions that contribute least to it. Candidate sub-models are compared using the computationally inexpensive generalized cross-validation (GCV) criterion. GCV is a goodness-of-fit measure that penalizes a large number of basis functions and thus helps reduce overfitting. For training data with N observations, the GCV of the model is calculated as

$$\mathrm{GCV} = \frac{\frac{1}{N}\sum_{i=1}^{N}\left[y_i - f(X_i)\right]^2}{\left[1 - \frac{C(M)}{N}\right]^2} \tag{5}$$

where N is the total number of observations in the training dataset, $y_i$ is the $i$-th observed output and $f(X_i)$ the corresponding model prediction; C(M) is the penalty function of model complexity, and its calculation formula is given in Equation (6).

(6)

where M is the number of non-constant basis functions and b is the penalty factor (set to 3 in this paper; see Section 3.1). The maximum interaction degree k constrains the interactions between basis functions by limiting the maximum number of variables allowed to appear in any one basis function [15].
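
The GCV criterion can be sketched in Python as below, using the commonly stated form in which the mean squared residual is divided by (1 - C(M)/N) squared. The penalty C(M) is passed in as an argument because the exact expression of Equation (6) is not reproduced here; the helper `complexity` shows one Friedman-style choice, C(M) = (M + 1) + b·M, purely as an assumption, and the paper's own definition may differ.

```python
import numpy as np

def gcv(y, y_hat, c_m):
    """Generalized cross-validation score: mean squared error inflated by C(M)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    mse = np.mean((y - y_hat) ** 2)
    return float(mse / (1.0 - c_m / n) ** 2)

def complexity(m, b=3.0):
    """ASSUMED penalty C(M) = (M + 1) + b * M; not confirmed by the source."""
    return (m + 1) + b * m

# Toy example: four observations, one prediction off by 1, and C(M) = 2.
score = gcv([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0], c_m=2.0)
```

Because the denominator shrinks as C(M) grows, a model with more basis functions must reduce the training error enough to offset the penalty before it is preferred.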

After the best MARS model has been determined, all BFs involving a single variable are grouped together, and BFs involving pairwise (or higher-order) interactions are likewise grouped together, in a process called analysis of variance (ANOVA) decomposition. The decomposition is used to assess the contributions of the input variables and the BFs by testing the variables for statistical significance.

The details of the MARS algorithm are shown in Figure 1.

Figure 1. MARS modeling process.

2.2. Model Database Description and Analysis

The data for model validation in this paper were obtained from the UCI Concrete Compressive Strength dataset, which contains 1030 samples, each consisting of 9 parameters: 8 input parameters and 1 output parameter. The first seven input parameters are the contents of each ingredient per cubic meter of concrete (cement, blast furnace slag powder, fly ash, water, water-reducing agent, coarse aggregate and fine aggregate), the eighth input parameter is the age in days, and the output parameter is the compressive strength of the concrete in MPa.

The eight input parameters were denoted as independent variables X1, X2, X3, X4, X5, X6, X7, and X8, and the output dependent variable, compressive strength, as Y1. To provide a detailed description of the database, the minimum, mean, median, maximum, standard deviation, skewness, and kurtosis of all variables were computed, as shown in Table 1. In addition, the histograms of the input independent variables and the output dependent variable are shown in Figure 2.

Table 1. Statistical parameters of input and output variables.

Note: - indicates no practical significance; Std denotes standard deviation; Kur denotes kurtosis.

Figure 2. Histogram of data for each variable.

The Pearson correlation coefficient was used to analyze the correlation among the independent variables and between the independent variables and the dependent variable; the results, computed and plotted in RStudio, are shown in Figure 3. As can be seen from Figure 3, the correlations between the input variables are low, so multicollinearity is not a concern.
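
A comparable correlation check can be sketched in Python with `numpy.corrcoef` (the paper itself used RStudio). The array `X` below is random placeholder data standing in for the eight input variables, so the numbers it produces say nothing about the real dataset; the variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1030, 8))            # placeholder for the eight input variables

# 8 x 8 Pearson correlation matrix (columns are variables).
corr = np.corrcoef(X, rowvar=False)

# Pairwise correlations between distinct inputs only (diagonal excluded).
off_diag = corr[~np.eye(8, dtype=bool)]
max_abs_corr = float(np.max(np.abs(off_diag)))
```

A small value of `max_abs_corr` is one simple indication that no pair of inputs is close to collinear.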

Of the 1030 sets of fly ash-slag concrete data, 70% were randomly selected as training samples and the remaining 30% were used as test samples. The MARS model and all control models were trained and tested using this same criterion.
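
The random 70/30 split can be sketched as follows. The seed and variable names are arbitrary assumptions, and the paper does not state how its random selection was generated; the sketch only shows that a 70% share of 1030 samples gives 721 training and 309 test samples.

```python
import numpy as np

n_samples = 1030                          # dataset size stated in the text
rng = np.random.default_rng(42)           # arbitrary seed, for reproducibility only
idx = rng.permutation(n_samples)          # shuffled sample indices

n_train = n_samples * 7 // 10             # 70% of the data, using integer arithmetic
train_idx, test_idx = idx[:n_train], idx[n_train:]
```

Reusing the same `train_idx` and `test_idx` for every model keeps the comparison between models on an equal footing.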

2.3. Performance Indicators of the Model

In this paper, five statistical indicators were selected to assess the prediction accuracy of the MARS model and the control models, namely, the coefficient of determination (R2), the root mean square error (RMSE), the mean absolute error (MAE), the mean absolute deviation percentage error (RMAE), and the efficiency coefficient (E), which are calculated as shown in Table 2.

The coefficient of determination R2 is widely used in regression problems to estimate the correlation between the target and predicted values; RMSE and MAE are two criteria used to measure the average size of the error between the target and predicted outputs; RMAE indicates the average percentage size of the total absolute deviation error between the target and predicted outputs; and the efficiency coefficient E indicates the predictive accuracy of the model. Numerically, R2 values close to 1, RMSE, MAE, and RMAE values close to 0, and E values of 90% or more indicate a higher accuracy of the model [16].
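
Three of these indicators can be sketched in Python as below. RMSE and MAE follow their standard definitions. For R2, the squared Pearson correlation between target and predicted values is used here as an assumption, since the formulas of Table 2 are not reproduced in this text; RMAE and E are omitted for the same reason. The function names are illustrative.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))

def r2_pearson(y, y_hat):
    """ASSUMED definition: squared Pearson correlation of target and prediction."""
    return float(np.corrcoef(y, y_hat)[0, 1] ** 2)

y_true = [10.0, 20.0, 30.0, 40.0]         # toy strengths in MPa
y_pred = [12.0, 18.0, 33.0, 39.0]
scores = (r2_pearson(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```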

3. Model Validation

3.1. Parameter Selection

In this paper, the upper limit of the model term Nmax is set to 20, the pre-step threshold d is set to 0.001, and the penalty factor b is set to 3 [17] [18].

Table 2. Statistical index calculation formula.

$y_i$ is the actual target value, $\bar{y}$ is the average of $y_i$, $\hat{y}_i$ is the predicted value, and $\bar{\hat{y}}$ is the average of $\hat{y}_i$; N is the number of samples.

Figure 3. Heat map of correlation among variables.

Figure 4 shows that the R2 of the MARS model is highest, and the RMSE and MAE values are lowest, when the maximum number of terms retained after backward pruning (N-prune) is 18 and the degree of basis function interaction (k) is 2.

3.2. Training MARS Model and Result Analysis

In this paper, 721 sets of data were randomly selected from the model database as training samples for the MARS model, and the remaining 309 sets were used as test samples. The MARS model for fly ash-slag concrete strength prediction was built in MATLAB from eight features: cement content, blast furnace slag powder content, fly ash content, water content, water-reducing agent content, coarse aggregate content, fine aggregate content, and age. The open-source MARS code written by Jekabsons (2010) was used to construct the model. As described in the previous section, the four model parameters were set to N-max = 20, k = 2, N-prune = 18, and b = 3. Because the training and test sets are large, 25 samples from each were selected for plotting to illustrate the fit, as shown in Figure 5 and Figure 6, respectively.

Figure 4. Statistical indicators of MARS models with different degrees of basis function interaction versus the maximum number of terms after backward pruning (N-prune).

As can be seen from Figure 5 and Figure 6, the predicted and actual values are close to each other, i.e., the MARS model predicts accurate results with small errors and high fitting accuracy.

3.3. Model Performance Comparison

The prediction performance of the MARS model was compared with that of five other models (BP, RF, SVM, ELM, and MnLR), and the fitting results are shown below.

Figure 5. MARS model training sample fitting effect.

Figure 6. MARS model test sample fitting effect.

As seen in Figure 7, most of the scatter points for the MARS prediction model lie on or close to the 1:1 line (predicted value equal to actual value), while the points for BP, RF, SVM, ELM, and MnLR are more widely scattered. When a straight line is fitted to the data in Figure 7, the slope for the MARS model is closer to 1 and the intercept is closer to 0 than for the five control models, which verifies that the MARS model has higher prediction accuracy and smaller error.

3.4. Statistical Indicators of the Model

Based on the evaluation index formula described in the previous section, the calculation results are shown in Table 3.

From Table 3, it can be concluded that MARS and RF prediction models have higher R2 and E(%) values and smaller RMSE, MAE, and RMAE values than the other models, which means that MARS and RF have superior prediction performance on this database.

From Figure 8, the relative errors of the MARS model and the RF model are mostly concentrated within ±20%, accounting for 81.0% and 80.7% of the total data volume, respectively. In terms of the overall relative error distribution, the prediction performance of MARS is slightly better than that of the RF model.
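
The share of samples within a relative error band can be computed as in the sketch below. The relative error is taken here as (predicted - actual) / actual, which is an assumption about the definition used in Figure 8; the function name and the toy values are illustrative and are not the paper's data.

```python
import numpy as np

def share_within(y, y_hat, tol=0.20):
    """Fraction of samples whose relative error lies within +/- tol."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    rel_err = (y_hat - y) / y
    return float(np.mean(np.abs(rel_err) <= tol))

# Toy example: relative errors are +10%, +50%, +2.5% and -30%.
share = share_within([10.0, 20.0, 40.0, 50.0], [11.0, 30.0, 41.0, 35.0])
```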

Table 3. Results of R2, RMSE, MAE, RMAE and E(%) for different models.

Figure 7. Scatter plot of predicted and actual values of each model.

Figure 8. The relative error variation of each model.

3.5. Parameter Sensitivity Analysis

Table 4 shows the ANOVA decomposition data of the established MARS model. The first column lists the ANOVA decomposition function numbers. The second column lists the GCV scores of the model after removing the BFs, indicating the importance of the ANOVA decomposition function corresponding to the removed BFs. The third column provides the standard deviation of the function, indicating the importance to the overall model. The fourth column gives the number of BFs included in the ANOVA decomposition function. The last column gives the specific input variables corresponding to this function.

The importance of each variable was assessed by the increase in GCV caused by removing that variable from the established MARS model, and the results are shown in Figure 9. It is clear that the compressive strength of fly ash-slag concrete is most sensitive to age (X8), that the coarse aggregate content (X5) has the least effect, and that the water-reducing agent content (X6) has no effect on the compressive strength.

3.6. Explicit Expression of the MARS Model

Table 5 lists the BFs of the developed MARS model and their corresponding equations. Table 5 shows that interactions occur between the BFs (4 out of 16 BFs are interaction terms). The presence of interactions indicates that the developed MARS model is not purely additive and that interactions play an important role in building an accurate model for compressive strength prediction. This again shows that MARS is able to capture the high-dimensional nonlinear relationships between compressive strength and multiple influencing factors without making any specific assumptions about the underlying functional relationships between the input variables and the associated responses. The MARS prediction model for the compressive strength of fly ash-slag concrete is given in Equation (7).

Table 4. ANOVA decomposition of the MARS model.

Table 5. Equation of BFs for MARS.

Figure 9. Relative importance of input variables.

(7)

4. Conclusions

1) In this paper, a fly ash-slag concrete compressive strength prediction model based on the multivariate adaptive regression spline (MARS) model is developed to solve the complex high-dimensional nonlinear relationship between fly ash-slag concrete mix ratio and compressive strength for high accuracy prediction. Based on the Concrete Compressive Strength dataset from UCI, the dataset was first statistically analyzed, and then correlation analysis was performed using Pearson coefficients to determine that the input variables were representative and did not have multicollinearity problems.

2) Eight factors (cement content, blast furnace slag powder content, fly ash content, water content, water-reducing agent content, coarse aggregate content, fine aggregate content and age) were used as input variables, and the MARS model predicted the compressive strength of fly ash-slag concrete with high accuracy. To further verify the reliability of the MARS model, its predictions were compared with those of the back-propagation neural network (BP), random forest (RF), support vector machine (SVM), extreme learning machine (ELM) and multiple nonlinear regression (MnLR) models. The MARS and RF models outperformed the other models in R2, RMSE, MAE, RMAE and E, and the overall performance of MARS was slightly better than that of RF.

3) ANOVA decomposition of the established MARS model shows that the compressive strength of fly ash-slag concrete is most sensitive to age (X8), that the coarse aggregate content (X5) has the least effect, and that the water-reducing agent content (X6) has no effect on the compressive strength.

4) The explicit expression of the MARS prediction model for the compressive strength of fly ash-slag concrete is derived, which further illustrates the reliability and accuracy of the model and supports its application in engineering practice.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Monteiro, P.J., Miller, S.A. and Horvath, A. (2017) Towards Sustainable Concrete. Nature Materials, 16, 698-699.
https://doi.org/10.1038/nmat4930
[2] Tang, Y.C., Li, L.J., Feng, W.X., et al. (2018) Study of Seismic Behavior of Recycled Aggregate Concrete-Filled Steel Tubular Columns. Journal of Constructional Steel Research, 148, 1-15.
https://doi.org/10.1016/j.jcsr.2018.04.031
[3] Tang, Y.C., Li, L.J., Wang, C.L., et al. (2019) Real-Time Detection of Surface Deformation and Strain in Recycled Aggregate Concrete-Filled Steel Tubular Columns via Four-Ocular Vision. Robotics and Computer Integrated Manufacturing, 59, 36-46.
https://doi.org/10.1016/j.rcim.2019.03.001
[4] Neville, A.M. (1963) Properties of Concrete. Pearson Education, Bengaluru.
[5] Oner, A. and Akyuz, S. (2007) An Experimental Study on Optimum Usage of GGBS for the Compressive Strength of Concrete. Cement and Concrete Composites, 29, 505-514.
https://doi.org/10.1016/j.cemconcomp.2007.01.001
[6] Papadakis, V.G. and Tsimas, S. (2002) Supplementary Cementing Materials in Concrete: Part I: Efficiency and Design. Cement and Concrete Research, 32, 1525-1532.
https://doi.org/10.1016/S0008-8846(02)00827-X
[7] Jin, J.W., Dong, C.F. and Feng, G.H. (2015) Prediction of Concrete Compressive Strength Based on Grey Relational-Support Vector Machine. Journal of Zhengzhou University (Natural Science Edition), 47, 59-63.
[8] Ma, G. and Liu, K. (2021) Prediction of Compressive Strength of CFRP-Confined Concrete Columns Based on BP Neural Network. Journal of Hunan University (Natural Sciences), 48, 88-97.
[9] Hu, Y., Zhang, L.S., Yuan, M.Y., Li, T.J., Wu, X.G. and Deng, T.T. (2020) Prediction of Concrete Strength Based on Random Forest. Construction Technology, 49, 89-94.
[10] Cao, F., Zhou, Y., Wang, C.X., Ren, M.Y. and Zhou, F. (2021) An Improved Support Vector Regression Method for Concrete Strength Prediction. Bulletin of the Chinese Ceramic Society, 40, 90-97.
[11] Yaseen, Z.M., Deo, R.C., Hilal, A., et al. (2018) Predicting Compressive Strength of Lightweight Foamed Concrete Using Extreme Learning Machine Model. Advances in Engineering Software, 115, 112-125.
https://doi.org/10.1016/j.advengsoft.2017.09.004
[12] Xu, K.C., Bi, L.P. and Chen, M.C. (2017) Prediction Model of Compressive Strength of Lithium Slag Concrete Based on SPSS Regression Analysis. Journal of Architecture and Civil Engineering, 34, 15-24.
[13] Liao, X.H., Huang, X., Shi, J.L. and Sun, Y.L. (2010) Forecast Model about Compressive Strength of Recycle Aggregate Concrete Base on BP Neutral Network. Journal of Nanjing Forestry University (Natural Science Edition), 34, 105-108.
[14] Zhang, W.G. (2020) MARS Applications in Geotechnical Engineering Systems. Science Press, Beijing.
https://doi.org/10.1007/978-981-13-7422-7
[15] Ji, Y., Wang, S.Q., Cheng, L., Xu, X.Z. and Zheng, J.Y. (2018) Control Strategy of Grid-Connected Photovoltaic Power System Based on Modular Multilevel Converter. High Voltage Apparatus, 54, 146-153.
[16] Ly, H.-B., et al. (2021) Development of Deep Neural Network Model to Predict the Compressive Strength of Rubber Concrete. Construction and Building Materials, 301, Article ID: 124081.
https://doi.org/10.1016/j.conbuildmat.2021.124081
[17] Li, Y., Su, Y. and Shu, L. (2014) An ARMAX Model for Forecasting the Power Output of a Grid Connected Photovoltaic System. Renewable Energy, 66, 78-89.
https://doi.org/10.1016/j.renene.2013.11.067
[18] Boehmke, B. and Greenwell, B. (2019) Multivariate Adaptive Regression Splines. In: Boehmke, B. and Greenwell, B., Eds., Hands-On Machine Learning with R, Chapman and Hall/CRC, New York.
https://doi.org/10.1201/9780367816377

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.