Visible and Near-Infrared Spectroscopic Discriminant Analysis Applied to Identification of Soy Sauce Adulteration
1. Introduction
High-quality brewed soy sauce is made mainly from soybeans, starch and wheat, which are fermented by microorganisms to form a flavorful liquid condiment. Soy sauce is one of the most widely used condiments in the world. Because the consumer market is huge, counterfeiting and adulteration for profit occur frequently. For example, blended soy sauce made up of salt water, monosodium glutamate and caramel color is sold as brewed soy sauce, or is mixed into high-quality brewed soy sauce. Such adulteration may lead to excessive levels of some components or introduce harmful components into the soy sauce. Identifying soy sauce adulteration helps to prevent fraud, protects the rights and interests of producers and consumers, and contributes to food safety.
Identification of soy sauce adulteration normally relies on the quantitative analysis of characteristic components [1] [2] [3]. This usually requires quantifying a variety of trace characteristic components, which involves multiple high-end measurement methods that are complex and expensive. A quick and easy detection method that can be used in the field would therefore have important application value.
Near-infrared (NIR) spectroscopy mainly reflects the vibration absorption of the overtones and combination frequencies of the hydrogen-containing group X-H; thus, most agricultural products and foods absorb significantly in this region. The method usually does not require reagents and can measure samples directly, which makes it quick and easy. Combined with the visible light region, visible and near-infrared (Vis-NIR) spectroscopy has been applied in many fields, such as agriculture and food [4] [5] [6] [7], environment [8] [9], and biomedicine [10] [11] [12] [13].
Spectral qualitative discriminant analysis is based on the spectral similarity of samples of the same type and the spectral difference between samples of different types. For the identification of samples with small differences in component content, qualitative discriminant analysis is more convenient than quantitative analysis. It has been an active research direction in recent years and has been applied in many fields, such as melon genotype [14], edible oil types [15], transgenic sugarcane leaf [16] [17], milk powder adulteration identification [18] [19], wine identification [20] [21] [22], rice seed authenticity identification [23] as well as thalassemia screening [17] [24].
Vis-NIR spectroscopy has high application potential, but its application to soy sauce analysis has mainly focused on the quantitative analysis of quality indicators, e.g., total acid, total nitrogen and amino acid nitrogen [25] - [30]. Few studies on the qualitative discriminant analysis of soy sauce have been reported; examples include the detection of adulteration of soy sauce by brine [31] and the classification of soy sauce [30].
Partial least squares-discriminant analysis (PLS-DA) is an effective spectral discriminant analysis method. A PLS regression is performed on the sample categorical variable (positive 1, negative 0), and the samples are then classified according to their predicted categorical values. Based on matrix theory and applied examples, some previous studies have shown that the discrimination effect of the PLS-DA method is superior to that of PCA-LDA in most cases [32] [33] [34]. Standard Normal Variate (SNV) is a common spectral preprocessing method. It centers and scales each spectrum individually, which reduces scatter-related variation that is unrelated to the component concentrations and can thereby improve the robustness and prediction ability of the models [35] [36] [37]. In the present study, SNV combined with PLS-DA, denoted as SNV-PLS-DA, was used to establish the Vis-NIR discriminant analysis models.
In the present paper, Vis-NIR spectroscopy combined with the SNV-PLS-DA method was used to establish discriminant analysis models for adulterated and brewed soy sauces. Soy sauces of the well-known Chubang brand were used as the identification samples; blended soy sauce made up of salt water, monosodium glutamate and caramel color was mixed into the Chubang samples in different proportions to produce the adulteration samples.
In addition, spectral experiments on water-based samples showed that a short transmission optical path avoids saturated absorption in the long-wavelength region, whereas a long transmission optical path highlights absorption differences in the short-wavelength region. Cuvettes of 1 mm and 10 mm were therefore used to measure the transmission spectra of the soy sauce samples. For the 1 mm case, five waveband models (visible, short-NIR, long-NIR, whole NIR and whole scanning regions) were established and compared; for the 10 mm case, three waveband models (visible, short-NIR and visible-short-NIR regions) covering the unsaturated absorption region were established and compared. The results provide a valuable reference for designing small dedicated spectrometers with different measurement modes and different spectral regions.
2. Materials and Methods
2.1. Experimental Materials, Instruments, and Measurement Methods
The Chubang soy sauce was collected through regular sales channels as the identification brand (70 bottles, 1 sample per bottle; negative samples). The mother solution (100 ml) of “blended soy sauce” was prepared from salt water (NaCl, 15 wt%), monosodium glutamate (C5H10NNaO5, 10 g) and caramel color (C6H8O3, 4 g). The concentrations of salt water and monosodium glutamate were based on the salt (18 g/100 ml) and amino acid nitrogen (≥1.3 g/100 ml) contents of Chubang soy sauce; the content of caramel color was chosen to match the color of Chubang soy sauce. The adulteration samples (positive) were prepared by mixing the Chubang soy sauce with the blended soy sauce at different adulteration rates (3% to 69% in steps of 3%, plus 100%). This gave 24 adulteration rates with 3 samples per rate, 72 samples in total. Together, the two types of samples (142 in total) were used for spectral measurement.
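The sample counts above follow from simple arithmetic, which can be checked with a short sketch. The variable names are illustrative only and do not come from the original work.

```python
# Adulteration rates in percent: 3, 6, ..., 69, plus the undiluted
# blended mother solution (100), as described in Section 2.1.
rates = list(range(3, 70, 3)) + [100]

samples_per_rate = 3
n_positive = len(rates) * samples_per_rate   # adulterated samples
n_negative = 70                              # one sample per bottle of brewed soy sauce
n_samples = n_negative + n_positive

print(len(rates), n_positive, n_samples)     # 24 72 142
```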
The XDS Rapid Content™ Liquid Grating Spectrometer (FOSS, Denmark) with transmission accessories and 1 mm and 10 mm cuvettes was used for spectral measurement. The spectral range was 400 - 2498 nm with a 2 nm wavelength interval. The wavebands of 400 - 1100 nm and 1100 - 2498 nm were detected with Si and PbS detectors, respectively. Each sample was measured five times with each of the two optical path lengths (1 mm, 10 mm). The experimental temperature and humidity were 25˚C ± 1˚C and 45% ± 1%, respectively. The spectral data set obtained for each measurement mode (1 mm, 10 mm; negative 350, positive 360, 710 spectra in total) was used for modeling and validation.
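As an illustration of the resulting data dimensions, the following sketch builds the wavelength axis and counts the spectra per measurement mode. Assigning the shared boundary point at 1100 nm to the PbS segment is an assumption made here for the example; the source only states the two ranges.

```python
import numpy as np

# Wavelength axis: 400, 402, ..., 2498 nm (2 nm interval).
wavelengths = np.arange(400, 2500, 2)
n_points = wavelengths.size

# Detector segments. Putting 1100 nm itself into the PbS segment
# is an assumption for this sketch, not a statement from the paper.
si_mask = wavelengths < 1100
pbs_mask = ~si_mask

# Five repeated measurements per sample and per optical path length.
repeats = 5
n_spectra = {"negative": 70 * repeats, "positive": 72 * repeats}
n_spectra["total"] = n_spectra["negative"] + n_spectra["positive"]

print(n_points, int(si_mask.sum()), int(pbs_mask.sum()), n_spectra)
```

Each measurement mode therefore yields a 710 x 1050 spectral matrix before any waveband selection.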
2.2. Calibration-Prediction-Validation Design and Evaluation Indicators
A rigorous calibration-prediction-validation “three-stage” sample design was adopted. The samples of each category were randomly divided into calibration, prediction, and validation sets. The calibration and prediction sets (each including both types of samples) were used for modeling and parameter optimization, and the independent validation samples (also including both types), which were not involved in modeling, were used to validate the selected models and thereby obtain an objective evaluation.
The negative samples (70 samples, 350 spectra) were randomly divided into calibration (24 samples, 120 spectra), prediction (24 samples, 120 spectra) and validation (22 samples, 110 spectra) sets. For the positive samples, three samples had been prepared at each adulteration rate, and one of them was assigned to each of the calibration, prediction, and validation sets. The positive samples were thus divided into calibration (24 samples, 120 spectra), prediction (24 samples, 120 spectra), and validation (24 samples, 120 spectra) sets. The calibration-prediction-validation division for the spectra of the two types of samples is shown in Table 1.
In summary, a total of 710 spectra (negative 350, positive 360) were used for calibration (negative 120, positive 120, a total of 240), prediction (negative 120, positive 120, a total of 240) and validation set (negative 110, positive 120, a total of 230).
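A minimal sketch of such a division is given below. It splits at the sample level, so that the five replicate spectra of one sample always stay in the same set. The random seed, the sample indexing and the random assignment of the three samples within each adulteration rate are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen only for reproducibility
REPEATS = 5                      # spectra per sample

# Negative samples: 70 bottles, shuffled and cut into 24 / 24 / 22.
neg_ids = rng.permutation(70)
neg_split = {
    "calibration": neg_ids[:24],
    "prediction": neg_ids[24:48],
    "validation": neg_ids[48:],
}

# Positive samples: 24 rates x 3 samples; hypothetical id = rate_index * 3 + k.
# Within each rate, one sample goes to each of the three sets.
pos_split = {"calibration": [], "prediction": [], "validation": []}
for rate_index in range(24):
    order = rng.permutation(3)
    for set_name, k in zip(pos_split, order):
        pos_split[set_name].append(rate_index * 3 + int(k))

# Number of spectra per set as (negative, positive).
counts = {
    name: (len(neg_split[name]) * REPEATS, len(pos_split[name]) * REPEATS)
    for name in neg_split
}
print(counts)
```

Splitting by sample rather than by spectrum keeps replicate measurements of a validation sample out of the modeling sets.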
2.3. PLS-DA
Based on the calibration and prediction sets of Vis-NIR spectra, the framework of the PLS-DA algorithm is as follows. (1) Each positive and negative sample was assigned the categorical variable ($C$) value 1 or 0, respectively. (2) The number of PLS latent variables (LV) was set from 1 to 10. Based on the spectra and categorical variables of the calibration samples, the PLS coefficients for each LV were calculated. (3) Based on the spectrum of each sample (calibration, prediction) and the obtained PLS coefficients, the prediction value ($\tilde{C}$) of the categorical variable for the sample was calculated for each LV. (4) When $\tilde{C} > 0.5$, the sample was judged to be positive, and when $\tilde{C} \le 0.5$, the sample was judged to be negative. (5) The optimal LV was determined according to the optimal recognition-accuracy rate of the prediction set.
Table 1. Calibration-prediction-validation division for the spectra of two types of samples.
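Steps (1) to (5) can be sketched as follows. This is an illustrative Python version under stated assumptions, not the authors' MATLAB implementation: the PLS coefficients come from a generic textbook NIPALS PLS1 routine, the spectra are synthetic stand-ins produced by the hypothetical helper `make_set`, and the cut-off of 0.5 (the midpoint of the 0/1 coding) is assumed.

```python
import numpy as np

def pls1_fit(X, c, n_lv):
    """PLS1 regression coefficients via NIPALS (generic textbook variant)."""
    x_mean, c_mean = X.mean(axis=0), c.mean()
    Xr, cr = X - x_mean, c - c_mean            # centered blocks, deflated below
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ cr
        w = w / np.linalg.norm(w)              # weight vector
        t = Xr @ w                             # scores
        tt = t @ t
        p = Xr.T @ t / tt                      # spectral loadings
        qk = cr @ t / tt                       # regression of c on the scores
        Xr = Xr - np.outer(t, p)               # deflate spectra
        cr = cr - qk * t                       # deflate categorical variable
        W.append(w); P.append(p); q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)        # b = W (P^T W)^-1 q
    return b, x_mean, c_mean

def pls_da_predict(X, model, threshold=0.5):
    """Steps (3)-(4): predict the categorical variable, then threshold it."""
    b, x_mean, c_mean = model
    c_pred = (X - x_mean) @ b + c_mean
    return (c_pred > threshold).astype(int)

def make_set(rng, n_per_class, n_points=80):
    """Synthetic stand-in spectra: positives carry an extra absorption bump."""
    c = np.repeat([0, 1], n_per_class)         # step (1): negative 0, positive 1
    axis = np.linspace(0.0, 1.0, n_points)
    bump = np.exp(-((axis - 0.5) / 0.1) ** 2)
    X = 1.0 + 0.5 * axis + 0.3 * c[:, None] * bump
    return X + rng.normal(0.0, 0.02, X.shape), c

rng = np.random.default_rng(0)
X_cal, c_cal = make_set(rng, 30)
X_pred, c_pred_true = make_set(rng, 30)

# Steps (2) and (5): scan LV = 1..10 and keep the best prediction-set accuracy.
accuracy = {}
for n_lv in range(1, 11):
    model = pls1_fit(X_cal, c_cal.astype(float), n_lv)
    hits = pls_da_predict(X_pred, model) == c_pred_true
    accuracy[n_lv] = float(np.mean(hits))
best_lv = max(accuracy, key=accuracy.get)
print(best_lv, accuracy[best_lv])
```

On ties, `max` returns the smallest LV, which favors the simpler model; the source does not state how ties were resolved.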
2.4. Model Evaluation Indicators
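According to the actual category (positive or negative) of the samples, nine recognition-accuracy rates (RARs), corresponding to the positive, negative or all samples in the calibration set, the prediction set or both sets combined, were proposed and calculated as follows:
$$\mathrm{RAR}_{C}^{+} = \frac{\tilde{N}_{C}^{+}}{N_{C}^{+}}, \quad \mathrm{RAR}_{C}^{-} = \frac{\tilde{N}_{C}^{-}}{N_{C}^{-}}, \quad \mathrm{RAR}_{C} = \frac{\tilde{N}_{C}^{+} + \tilde{N}_{C}^{-}}{N_{C}^{+} + N_{C}^{-}} \tag{1}$$
$$\mathrm{RAR}_{P}^{+} = \frac{\tilde{N}_{P}^{+}}{N_{P}^{+}}, \quad \mathrm{RAR}_{P}^{-} = \frac{\tilde{N}_{P}^{-}}{N_{P}^{-}}, \quad \mathrm{RAR}_{P} = \frac{\tilde{N}_{P}^{+} + \tilde{N}_{P}^{-}}{N_{P}^{+} + N_{P}^{-}} \tag{2}$$
$$\mathrm{RAR}_{\mathrm{Total}}^{+} = \frac{\tilde{N}_{C}^{+} + \tilde{N}_{P}^{+}}{N_{C}^{+} + N_{P}^{+}}, \quad \mathrm{RAR}_{\mathrm{Total}}^{-} = \frac{\tilde{N}_{C}^{-} + \tilde{N}_{P}^{-}}{N_{C}^{-} + N_{P}^{-}}, \quad \mathrm{RAR}_{\mathrm{Total}} = \frac{\tilde{N}_{C}^{+} + \tilde{N}_{C}^{-} + \tilde{N}_{P}^{+} + \tilde{N}_{P}^{-}}{N_{C}^{+} + N_{C}^{-} + N_{P}^{+} + N_{P}^{-}} \tag{3}$$
where $N_{C}^{+}$, $N_{C}^{-}$, $N_{P}^{+}$, and $N_{P}^{-}$ were the numbers of actual positive and negative samples in the calibration and prediction sets, respectively; $\tilde{N}_{C}^{+}$, $\tilde{N}_{C}^{-}$, $\tilde{N}_{P}^{+}$, and $\tilde{N}_{P}^{-}$ were the numbers of correctly recognized positive and negative samples in the calibration and prediction sets. From the negative, positive, calibration and prediction aspects, these indicators comprehensively evaluated the prediction effect of the discrimination model. The optimal LV was determined according to the maximum $\mathrm{RAR}_{\mathrm{Total}}$.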
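Next, the validation samples were identified following the previous steps. Referring to the actual category, the positive and negative validation RARs ($\mathrm{RAR}_{V}^{+}$ and $\mathrm{RAR}_{V}^{-}$) and the total validation RAR ($\mathrm{RAR}_{V}$) were calculated as follows:
$$\mathrm{RAR}_{V}^{+} = \frac{\tilde{N}_{V}^{+}}{N_{V}^{+}}, \quad \mathrm{RAR}_{V}^{-} = \frac{\tilde{N}_{V}^{-}}{N_{V}^{-}}, \quad \mathrm{RAR}_{V} = \frac{\tilde{N}_{V}^{+} + \tilde{N}_{V}^{-}}{N_{V}^{+} + N_{V}^{-}} \tag{4}$$
where $N_{V}^{+}$ and $N_{V}^{-}$ were the numbers of actual positive and negative samples in the validation set, respectively; and $\tilde{N}_{V}^{+}$, $\tilde{N}_{V}^{-}$ were the numbers of correctly recognized positive and negative samples in the validation set, respectively.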
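The recognition-accuracy rates are ratios of correctly recognized to actual sample counts, so for any one set they can be computed as in the sketch below. The function name and the toy labels are hypothetical; the calibration, prediction, combined and validation rates follow by passing the corresponding subsets.

```python
import numpy as np

def recognition_rates(actual, predicted):
    """Positive, negative and total recognition-accuracy rates for one set.

    `actual` and `predicted` hold categorical values (positive 1, negative 0).
    A class with no samples yields NaN for its rate.
    """
    actual = np.asarray(actual)
    predicted = np.asarray(predicted)
    correct = actual == predicted
    return {
        "positive": float(correct[actual == 1].mean()),
        "negative": float(correct[actual == 0].mean()),
        "total": float(correct.mean()),
    }

# Toy example: one of four positives is misjudged as negative.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0]
rates_example = recognition_rates(actual, predicted)
print(rates_example)   # {'positive': 0.75, 'negative': 1.0, 'total': 0.875}
```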
The algorithms for the above-mentioned methods were implemented in MATLAB v7.6.
3. Results and Discussion
3.1. PLS-DA Modeling
In the case of the 1 mm transmission measurement mode, the Vis-NIR spectra of all soy sauce samples in the whole scanning region (400 - 2498 nm) are shown in Figure 1(a). In the case of the 10 mm transmission measurement mode, the spectra of all samples are displayed only in the unsaturated spectral region (400 - 1388 nm) because of saturated absorption and noise in the long-wavelength region (1400 - 2498 nm), as shown in Figure 1(b). The green and red lines represent the negative and positive samples, respectively. In both cases, the spectra of the negative sample group lay at the edge of the spectra of the positive sample group and closely resembled the edge spectra of the positive group in shape. No significant spectral differences between the two types of samples were observed, so discriminant analysis based on chemometric methods was required.
SNV is an effective spectral preprocessing method. It can reduce the spectral error caused by scattering and improve the prediction accuracy and stability of the model. In the present paper, SNV was used to correct the spectra of the soy sauce samples. The SNV spectra for the two measurement modes are shown in Figure 2. The average SNV spectra of the negative and positive samples for the two modes were also calculated (Figure 3). In the case of 1 mm, only a small difference was observed between the average spectra of the two types of samples. In the case of 10 mm, the average spectra differed clearly, but the positive spectra fluctuated so widely that the two spectral groups still could not be separated by inspection.
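For reference, the SNV transform centers each spectrum on its own mean and divides it by its own standard deviation. A minimal sketch is given below; the use of the sample standard deviation (`ddof=1`) is a common convention assumed here, as the source does not specify it.

```python
import numpy as np

def snv(spectra):
    """Standard Normal Variate: row-wise centering and scaling.

    `spectra` is an (n_spectra, n_wavelengths) array; each row is one spectrum.
    """
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)   # ddof=1 is an assumption
    return (spectra - mean) / std

# SNV removes additive offsets and multiplicative scaling of a whole spectrum,
# which is why it suppresses scatter-like baseline effects.
rng = np.random.default_rng(0)
raw = rng.random((4, 50))
shifted = 2.0 * raw + 3.0
print(np.allclose(snv(raw), snv(shifted)))   # True
```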
Figure 1. Vis-NIR spectra of two types of soy sauce samples: (a) 1 mm; (b) 10 mm.
Figure 2. SNV spectra of two types of soy sauce samples: (a) 1 mm; (b) 10 mm.
Figure 3. Average SNV spectra of two types of samples: (a) 1 mm; (b) 10 mm.
Firstly, the direct PLS-DA models were established using the raw spectra. In the case of 1 mm, the direct PLS-DA models were established based on five spectral regions (visible, short-NIR, long-NIR, whole NIR and whole scanning regions), respectively. The results of the corresponding nine indicators are summarized in Table 2. Among them, the model of 400 - 780 nm obtained the optimal $\mathrm{RAR}_{\mathrm{Total}}$ (99.4%). In the case of 10 mm, the direct PLS-DA models were established based on three spectral regions (visible, short-NIR and visible-short-NIR regions), respectively. The results of the corresponding nine indicators are summarized in Table 3. Among them, the models of 400 - 780 nm and 400 - 1388 nm obtained the optimal $\mathrm{RAR}_{\mathrm{Total}}$ (100%).
Secondly, the SNV-PLS-DA models were established using the SNV spectra. In the case of 1 mm, the SNV-PLS-DA models were established based on five spectral regions (visible, short-NIR, long-NIR, whole NIR and whole scanning regions), respectively. The results of the corresponding nine indicators are summarized in Table 4. Among them, the model of 400 - 780 nm obtained the optimal $\mathrm{RAR}_{\mathrm{Total}}$ (100%). In the case of 10 mm, the SNV-PLS-DA models were established based on three spectral regions (visible, short-NIR and visible-short-NIR regions), respectively. The results of the corresponding nine indicators are summarized in Table 5. Among them, the models of 400 - 780 nm and 400 - 1388 nm obtained the optimal $\mathrm{RAR}_{\mathrm{Total}}$ (100%).
Table 2. Modelling effects of the direct PLS-DA models for the case of 1 mm.
Table 3. Modelling effects of the direct PLS-DA models for the case of 10 mm.
Table 4. Modelling effects of the SNV-PLS-DA models for the case of 1 mm.
Table 5. Modelling effects of the SNV-PLS-DA models for the case of 10 mm.
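In summary, the discriminant effect of the SNV spectra was slightly better than that of the raw spectra: in the case of 1 mm, the optimal total recognition-accuracy rate rose from 99.4% to 100%, while in the case of 10 mm both the raw and the SNV spectra reached 100%.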
3.2. Independent Validation
The 230 validation spectra, which did not participate in the modeling process, were used to validate the optimal SNV-PLS-DA models. The validation effects ($\mathrm{RAR}_{V}^{+}$, $\mathrm{RAR}_{V}^{-}$, $\mathrm{RAR}_{V}$) for the cases of 1 mm and 10 mm are summarized in Table 6.
Table 6. Validation effects of the SNV-PLS-DA models for the cases of 1 mm and 10 mm.
The results showed that the SNV-PLS-DA models achieved good validation discriminant effects for both the 1 mm and 10 mm cases. In the case of 1 mm, the visible model achieved the optimal validation effect, while in the case of 10 mm, both the visible and visible-short-NIR models achieved the optimal validation effect (see Table 6 for the corresponding values). These experimental results provide a valuable reference for designing small dedicated spectrometers with different measurement modes and different spectral regions.
4. Conclusions
In this paper, based on two measurement modes (1 mm, 10 mm), Vis-NIR spectroscopy combined with the SNV-PLS-DA method was used to establish discriminant analysis models for adulterated and brewed soy sauces. In the case of 1 mm, five waveband models were established and compared; in the case of 10 mm, three waveband models covering the unsaturated absorption region were established and compared.
The validation results showed that the models of all wavebands in the cases of 1 mm and 10 mm achieved good discrimination effects. In the case of 1 mm, the visible model achieved the optimal validation effect, while in the case of 10 mm, both the visible and visible-short-NIR models achieved the optimal validation effect (see Table 6). These experimental results provide a valuable reference for designing small dedicated spectrometers with different measurement modes and different spectral regions. The detection method requires no reagents and is fast and simple, which makes it easy to promote in practical applications.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No.61078040), and the Science and Technology Project of Guangdong Province of China (No.2014A020213016, No.2014A020212445).