Validation and Comparison of Calibration Techniques for Measurements of Carbon Dioxide in Atmospheric Air Standards
1. Introduction
The growing requirements for measurement accuracy, traceability and worldwide comparability are a considerable challenge for Inmetro, the Brazilian National Metrology Institute (NMI), which is responsible for providing the scientific basis for the consistency and accuracy of all measurements for Brazilian and South American society.
Moreover, implementing efficient measures to protect biodiversity and to act against climate change requires an effective, demand-oriented quality infrastructure (QI). Such a QI makes it possible to fulfil obligations laid down in international conventions such as the United Nations Framework Convention on Climate Change (UNFCCC) and to use the instruments provided within the scope of the World Trade Organization (WTO). Research laboratories across the region will benefit from the reliable, accredited measurement and testing capacities that are needed to ensure sustainable management.
Carbon dioxide (CO2) is the largest anthropogenic contributor to the greenhouse effect, and its concentration in the atmosphere has been monitored using high-accuracy techniques such as gas chromatography, FTIR and cavity ring-down spectroscopy, instead of the technique traditionally used for this type of monitoring, nondispersive infrared (ND-IR) gas analysis. This is needed because measurements of CO2 in the atmosphere require the uncertainty in the concentration of reference gas mixtures to be lower than 0.1 μmol∙mol−1 [1].
In line with Inmetro’s mission to promote citizens’ quality of life and with the challenges of global metrology in air quality analysis, the Gas Analysis Laboratory (Lanag) has been working on the development and certification of standard gas mixtures for atmospheric and air quality monitoring [2] [3] [4] [5]. The objective of this study is to develop methodologies based on a cavity ring-down system (CRDS) and on gas chromatography with a flame ionization detector coupled to a methanizer catalyst (GC-FIDmeth), and then to compare the results of their method validations. The aim is to identify the better technique using primary standard gas mixtures, which are not easily available or widely applied in laboratories that perform atmospheric analysis of this component. The evaluation is based on the validation of the methods developed, using primary standard gas mixtures of carbon dioxide at atmospheric levels in a synthetic clean dry air matrix. Earlier analyses of this type of standard gas mixture, usually by GC-FIDmeth, were reported in the key comparison CCQM-K52 [32] among several national metrology institutes, which showed an average relative expanded uncertainty of 0.25%, while Inmetro estimated an uncertainty of 1%. The method was optimized after that comparison. In addition, a new technique, CRDS, was implemented and used for comparison with those preliminary results. To date, no studies have been published on the analysis of CO2 in atmospheric air standards by this specific spectroscopic technique.
This study therefore addresses the quantification of carbon dioxide in synthetic clean dry air over the range of 370 to 835 μmol∙mol−1 by both techniques, GC-FIDmeth and CRDS. The methods were validated and compared in order to provide reliability for this type of measurement in greenhouse gas analysis.
2. Materials and Methods
2.1. Methanizer Gas Chromatography with FID Detector
Low concentrations of atmospheric carbon dioxide must be accurately quantified using methods with the highest sensitivity and the lowest possible detection limit. The flame ionization detector (FID) detects CH4, but CO and CO2 have to be reduced to CH4 before they can be detected. The reduction is achieved with a methanizer, a catalyst (usually metallic nickel) held at high temperature in the presence of hydrogen and placed between the analytical column outlet and the detector [6].
Online catalytic reduction of carbon monoxide to methane for detection by FID was described previously [7], suggesting that both carbon dioxide and hydrocarbons could also be converted to methane with the same nickel catalyst. This was confirmed by determining the optimum operating parameters for each of the gases.
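$$\mathrm{CO} + 3\,\mathrm{H_2} \rightarrow \mathrm{CH_4} + \mathrm{H_2O} \qquad (1)$$
$$\mathrm{CO_2} + 4\,\mathrm{H_2} \rightarrow \mathrm{CH_4} + 2\,\mathrm{H_2O} \qquad (2)$$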
The catalyst consists of a 2% coating of Ni, in the form of nickel nitrate, on Chromosorb G. A 1/2” long bed is packed around the bend of an 8” × 1/8” stainless steel U-tube. The tube is clamped in a block so that the ends protrude down into the column oven for easy connection between the column or TCD outlet and the FID base. Heat is provided by a pair of cartridge heaters and regulated by a temperature controller. Hydrogen for the reduction is added via a tee at the inlet to the catalyst.
Methanizer efficiency should be monitored when analyzing trace gases with GC-FID. It can be calculated by comparing the ratio of the CO2 and CH4 peaks with the ratio of their concentrations, and it can be compromised by high concentrations of hydrocarbons blocking the active sites on the catalyst. The loss of efficiency is reported to be reversible: with the column bypassed, air is used as the carrier gas to burn the carbon film off the catalyst, as shown in Equation (3) [8].
$$\mathrm{C} + \mathrm{O_2} \rightarrow \mathrm{CO_2} \qquad (3)$$
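The efficiency check described above can be sketched as follows. This is a minimal illustration, assuming that the FID responds equally to methane formed from CO2 and to native CH4, so that complete conversion makes the peak-area ratio equal to the mole-fraction ratio. The function name and the numbers are hypothetical and are not taken from the measurements of this study.

```python
def methanizer_efficiency(area_co2, area_ch4, x_co2, x_ch4):
    """Return the conversion efficiency as a fraction (1.0 = complete).

    Assumption: equal FID response per mole of CH4, whether it was formed
    from CO2 in the methanizer or was present as CH4 in the sample.
    """
    if min(area_ch4, x_co2, x_ch4) <= 0:
        raise ValueError("CH4 area and both mole fractions must be positive")
    area_ratio = area_co2 / area_ch4
    concentration_ratio = x_co2 / x_ch4
    return area_ratio / concentration_ratio


# Hypothetical peak areas and mole fractions (umol/mol), for illustration only.
efficiency = methanizer_efficiency(
    area_co2=3900.0, area_ch4=20.0, x_co2=400.0, x_ch4=2.0
)
print(f"methanizer efficiency: {efficiency:.1%}")
```

A value that drifts downward between runs would be a hint that the catalyst needs regeneration.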
The GC system used was a Varian GC3800 equipped with a Carbobond column (Varian) and a flame ionization detector (FID) coupled to a methanizer catalyst. The method parameters for the analysis of carbon dioxide at atmospheric levels are presented in Table 1, and the scheme of the GC is shown in Figure 1.
2.2. Cavity Ringdown System
Nearly every small gas-phase molecule (e.g., CO2) has a unique near-infrared absorption spectrum. At sub-atmospheric pressure, this consists of a series of narrow, well-resolved, sharp lines, each at a characteristic wavelength. Because these lines are well spaced and their wavelengths are well known, the concentration of any species can be determined by measuring the strength of this absorption, i.e., the height of a specific absorption peak. In conventional infrared spectrometers, however, trace gases provide far too little absorption to measure, which typically limits sensitivity to the parts-per-million level at best. Cavity ring-down spectroscopy (CRDS) avoids this sensitivity limitation by using an effective path length of many kilometers. It enables gases to be monitored in seconds or less at the parts-per-billion level [9].
Table 1. GC method parameters for CO2/SCDA analysis.
CRDS therefore works by tuning the laser light to the unique absorption fingerprint of the sample species. Measuring the time the light takes to fade, or “ring down”, yields the concentration of the absorbing molecules within milliseconds. The light decay time thus provides a non-invasive and rapid means of detecting trace species in air.
In operation, a computer-controlled system tunes the laser off the absorption peak of the sample species to determine τempty, which is equivalent to a zero baseline correction. It then tunes back to the absorption peak to determine τ, which depends on the concentration of the sample species. Based on Beer’s law, the value obtained constitutes an absolute measurement and is unaffected by losses outside the ring-down cavity. Figure 2 depicts the ring-down decay within the cavity after the laser source is shuttered: as the laser light bounces back and forth between the ultra-high-reflectivity mirrors, the sample species absorbs the light energy until it has fully decayed. The CRDS schematic is presented in Figure 3.
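The relation between the two ring-down times and the amount of absorber can be sketched with the commonly used CRDS expression α = (1/c)(1/τ − 1/τempty), followed by Beer's law to convert the absorption coefficient α into a number density. This is a minimal sketch under stated assumptions: the function names, the ring-down times and the absorption cross-section are hypothetical, and the processing implemented in the instrument used here is not described in the text.

```python
SPEED_OF_LIGHT_CM_PER_S = 2.99792458e10


def absorption_coefficient(tau, tau_empty):
    """Absorption coefficient in cm^-1 from ring-down times in seconds.

    tau is measured on the absorption peak, tau_empty off the peak.
    """
    if tau <= 0 or tau_empty <= 0:
        raise ValueError("ring-down times must be positive")
    return (1.0 / tau - 1.0 / tau_empty) / SPEED_OF_LIGHT_CM_PER_S


def number_density(tau, tau_empty, cross_section_cm2):
    """Absorber number density in molecules per cm^3 (Beer's law)."""
    if cross_section_cm2 <= 0:
        raise ValueError("cross-section must be positive")
    return absorption_coefficient(tau, tau_empty) / cross_section_cm2


# Hypothetical values: 30 us on the peak, 40 us off the peak,
# and an illustrative cross-section of 1e-22 cm^2.
alpha = absorption_coefficient(tau=30e-6, tau_empty=40e-6)
density = number_density(tau=30e-6, tau_empty=40e-6, cross_section_cm2=1e-22)
print(f"alpha = {alpha:.3e} cm^-1, N = {density:.3e} molecules/cm^3")
```

Because only the difference of the two decay rates enters the result, losses that are common to both measurements cancel out.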
2.3. Method Validation
Science should be used to improve quality of life. In this regard, scientists must guarantee the quality of their results, which must provide relevant information not just for the scientific community but also for citizens. Demonstrating that an analytical method provides reliable results is of great importance to ensure quality, safety and efficacy in the analysis. Consequently, before an analytical method is implemented for routine use, it must be validated to demonstrate that it is suitable for its intended purpose. The final goal of the validation of an analytical method is to ensure that every future measurement in routine analysis will give a true assessment of the actual content of an analyte in the sample [10] [11] [12] [13].
All accredited Brazilian testing laboratories must comply with the requirements of ABNT NBR ISO/IEC 17025 [14] [15] in order to demonstrate their technical competence, and one of these requirements is method validation (item 5.4.5). It is essential that laboratories have objective means and criteria to demonstrate, through validation, that the test methods they perform lead to reliable results of the desired quality. When standardized methods are employed, the laboratory must demonstrate that it can operate them properly within its own installations and facilities before deploying them, although not all validation parameters need to be evaluated. In this work, the parameters chosen to characterize the validation of each method were: selectivity, linearity, working and linear range, acceptability assessment (such as detection and quantification limits), trend and regression analysis, and comparison of the precision between the methods, as established in document DOC-CGCRE-08 [16]. Precision calculations are also based on ISO 5725-2:1994 [17] [18], and the uncertainty estimation derived from regression of the calibration curve was based on ISO 6143:2001 [19], using the XGenline software from NPL [20] [21] [22]. The limits of detection and quantification were calculated according to the International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH) [23].
3. Results and Discussion of the Methods Validation
With the methods developed for the two techniques, calibration curves were built for each selected range, using primary standards produced in-house both to compose the curve and as samples, as well as certified reference materials to verify the fit of the proposed calibration curve [24] [25] [26] [27]. The calibration curve represents the set of points (xi, yi), in which the known concentrations, xi, are plotted against the instrument responses, yi, obtained under independent and repeated conditions. A linear relationship between the concentrations and the measurement results is fitted by the least squares method [19].
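As an illustration of the calibration step, the sketch below fits a straight line to (xi, yi) points by ordinary least squares and then inverts it to assign a mole fraction to a sample response. It is a simplified stand-in: ordinary least squares ignores the uncertainties of the standards, whereas the ISO 6143 regression used in this work takes the uncertainties of both coordinates into account. The function names and the data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_concentration(response, slope, intercept):
    """Invert the calibration line to obtain the mole fraction of a sample."""
    return (response - intercept) / slope


# Hypothetical standards (umol/mol) and instrument responses.
standards = [370.0, 395.0, 420.0]
responses = [741.0, 791.0, 841.0]
slope, intercept = fit_line(standards, responses)
sample = predict_concentration(761.0, slope, intercept)
print(f"slope={slope:.4f}, intercept={intercept:.4f}, sample={sample:.2f}")
```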
The CRDS method was validated using independent results under conditions of repeatability (5 replicates) and reproducibility (3 different days). Three primary standard gas mixtures produced by Lanag were selected, covering a concentration range of 370 to 420 μmol∙mol−1, which is set by the specifications of the equipment used. These standards were analyzed and fitted with a linear calibration model, and two standards were selected as samples: one prepared by Lanag at a mole fraction of 380 μmol∙mol−1 and a CRM from the Dutch metrology institute, VSL, at a mole fraction of 300 μmol∙mol−1, both at the lower end of the range, in order to determine the limits of quantification of the technique.
The GC method with a methanizer catalyst before the FID was validated using independent results under conditions of repeatability (4 replicates) and reproducibility (2 different days). Because chromatographic peak areas are known to respond non-linearly, especially when a wide range is fitted, five primary standard gas mixtures from Lanag were selected, covering a concentration range of 450 to 835 μmol∙mol−1, since the specifications of the equipment allow the upper limit of the range to be extended. These standards were analyzed and fitted with a quadratic calibration model, and two standards were selected as samples: one produced by Lanag at a mole fraction of 555 μmol∙mol−1 and a CRM, also from VSL, at a mole fraction of 600 μmol∙mol−1, both in the middle of the range, in order to demonstrate the good fit of the specified function. Table 2 presents the primary standard mixtures used in the calibration curve fitted for each method, with the relative standard deviation, or coefficient of variation (CV), obtained for each standard mixture.
Linearity was assessed by repeated injections of five primary standards produced at different concentrations spanning the selected range. It was evaluated with two statistics: the coefficient of correlation and the goodness of fit, the latter derived from the validation of the response model in ISO 6143. To test the compatibility of a prospective analysis function, the goodness-of-fit (GOF) measure is calculated, defined as the maximum value of the weighted differences, $|\hat{x}_i - x_i|/u(x_i)$ and $|\hat{y}_i - y_i|/u(y_i)$, between the coordinates of the measured and adjusted calibration points ($i = 1, \ldots, n$). A function is admissible if GOF < 2 and r2 > 0.99.
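A minimal sketch of the goodness-of-fit check follows. It assumes the ISO 6143 form of the measure, in which each difference between a measured and an adjusted coordinate is divided by the standard uncertainty of the measured coordinate, and the largest of these ratios is taken. The adjusted points would normally come from the regression software; here they are supplied by hand. All names and numbers are hypothetical.

```python
def goodness_of_fit(measured, adjusted):
    """Largest uncertainty-weighted difference over all calibration points.

    measured: list of (x, u_x, y, u_y) tuples for the calibration standards
    adjusted: list of (x_hat, y_hat) tuples returned by the regression
    """
    if len(measured) != len(adjusted) or not measured:
        raise ValueError("need matching, non-empty point lists")
    gamma = 0.0
    for (x, u_x, y, u_y), (x_hat, y_hat) in zip(measured, adjusted):
        gamma = max(gamma, abs(x_hat - x) / u_x, abs(y_hat - y) / u_y)
    return gamma


# Hypothetical calibration points and adjusted points.
measured = [(400.0, 0.2, 800.0, 1.0), (420.0, 0.2, 841.5, 1.0)]
adjusted = [(400.1, 800.5), (419.9, 840.0)]
gamma = goodness_of_fit(measured, adjusted)
print(f"GOF = {gamma:.2f}, admissible: {gamma < 2}")
```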
Table 2. Primary standard mixtures from calibration curve from each method validation.
Table 3 shows the results for the primary standards used as samples, evaluated against the calibration curve fitted to the responses of the primary standards for each technique. It presents the relative deviation of the calibrated mole fraction (xc), obtained by regression, from the gravimetric mole fraction (xg); the relative expanded uncertainty (U) calculated from the fitted calibration curve; and the number of response repetitions (N) on each day of analysis for each technique.
The regression analysis of all data from both techniques was performed with the trend line regression tool of Microsoft Excel. The GOF was obtained by applying the regression of ISO 6143 [19], with the calibration curve fitted by the XGenline software developed by NPL, the UK national metrology institute [20]. The linearity results for the coefficient of correlation, r2, and the goodness of fit (GOF) are presented in Table 4.
To verify whether the regression of the adopted calibration curve is significant, an analysis of variance (ANOVA) was performed. It yields the F statistic, given by the ratio of the mean square of the regression to the mean square of the residuals. This F value is compared with the tabulated value Fcritic at the selected confidence level (95%). If F > Fcritic, it is accepted, at the selected confidence level, that a ≠ 0, which means that the slope of the regression line is not zero, that is, the regression is significant. If F ≤ Fcritic, there is no indication of a linear relationship between the variables x (concentration values) and y (measurement responses) [16].
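The significance test described above can be sketched for a straight-line fit as follows. The F statistic is the regression mean square (1 degree of freedom) divided by the residual mean square (n − 2 degrees of freedom). The critical value is passed in from a table rather than computed, because the Python standard library has no F distribution. The function name and the data are hypothetical.

```python
def regression_f_statistic(x, y):
    """F = MS_regression / MS_residual for a least-squares straight line."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three (x, y) pairs of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * xi for xi in x]
    ss_regression = sum((fi - mean_y) ** 2 for fi in fitted)
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    if ss_residual == 0:
        raise ValueError("perfect fit: F is undefined")
    return (ss_regression / 1) / (ss_residual / (n - 2))


# Hypothetical data; 10.13 is the tabulated F(0.95; 1, 3) value.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
f_value = regression_f_statistic(x, y)
f_critical = 10.13
print(f"F = {f_value:.1f}, significant: {f_value > f_critical}")
```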
Another evaluation derived from the linear regression, which is presented in Figure 4, was the plot of residuals, which shows whether there is any trend in the results. According to Figure 5, no trend is observed for either GC-FIDmeth or CRDS.
Precision evaluates the dispersion of results between independent, repeated assays. Repeatability is the degree of agreement between the results of successive measurements under the same measurement conditions. Reproducibility is the degree of agreement between the results of successive measurements under varying measurement conditions. Reproducibility was determined internally, within Inmetro’s laboratory only, by having different analysts inject standards and samples on different days (intermediate precision). Repeatability and intermediate precision were evaluated with the following statistics: the relative standard deviation (RSD %), or coefficient of variation (CV), the relative standard deviation of repeatability (sr) and the relative standard deviation of reproducibility (sR). The acceptance criteria for these parameters are RSD < 1%, sr < 2% and sR < 5%.
Table 3. Results from the sample evaluated at the calibration curve.
Table 4. Results of methods validation evaluated for CRDS and GC-FIDmeth.
Figure 4. Validation parameter: Linearity (a) CRDS; (b) GC-FIDmeth.
Figure 5. Validation parameter: Residual trend (a) CRDS; (b) GC-FIDmeth.
Another parameter evaluated was the repeatability limit (r), which is given by 2.8 times the relative standard deviation of repeatability (sr) for a confidence level of 95%. The reproducibility limit (R) is likewise given by 2.8 times the relative standard deviation of reproducibility (sR), i.e., 2.8 times the square root of the reproducibility variance.
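The precision statistics can be sketched for a balanced design (the same number of replicates on every day), following the usual ISO 5725-2 decomposition: the repeatability variance is the pooled within-day variance, the between-day variance is estimated from the variance of the day means, and the reproducibility variance is their sum. The sketch works in absolute units; dividing by the grand mean would give the relative values used in the text. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def precision_limits(days):
    """Return s_r, s_R and the limits r = 2.8*s_r and R = 2.8*s_R.

    days: list of lists, one list of replicate results per day (balanced).
    """
    n = len(days[0])
    if len(days) < 2 or n < 2 or any(len(d) != n for d in days):
        raise ValueError("need >= 2 days with the same number (>= 2) of replicates")
    var_repeat = mean([variance(d) for d in days])
    var_of_means = variance([mean(d) for d in days])
    # A negative estimate of the between-day variance is set to zero.
    var_between = max(0.0, var_of_means - var_repeat / n)
    s_r = sqrt(var_repeat)
    s_R = sqrt(var_repeat + var_between)
    return {"s_r": s_r, "s_R": s_R, "r": 2.8 * s_r, "R": 2.8 * s_R}


# Hypothetical results (umol/mol) for two days with three replicates each.
limits = precision_limits([[380.1, 380.3, 380.2], [380.4, 380.6, 380.5]])
print({name: round(value, 4) for name, value in limits.items()})
```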
Accuracy is the agreement between a test result and the reference value accepted as true. It was evaluated through the relative difference (Δ) between the gravimetric concentration and the analytical concentration derived from the calibration curve, obtained with the XGenline software, as shown in Table 4. Other statistical parameters calculated for the accuracy evaluation, namely recovery, relative error (ER) and normalized error (EN), are described, with their acceptance criteria and calculation, in Equations (4) to (6):
・ 90% < Recovery < 110%
・
(4)
・
(5)
・
is satisfactory, (6)
where xc is the analytical and xg the gravimetric concentration or mole fraction, Uc and Ug are the expanded uncertainties of the analysis and of the gravimetry, respectively, and s can be the standard deviation of the analysis responses or the combined uncertainty of the samples evaluated.
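The accuracy parameters named above can be illustrated with the sketch below. It assumes the conventional definitions: recovery as the analytical value expressed as a percentage of the gravimetric value, relative error as their percentage difference, and normalized error as the difference divided by the root sum of squares of the two expanded uncertainties, with |EN| ≤ 1 as the usual acceptance criterion. These forms and the function names are assumptions made for illustration, and the numbers are hypothetical.

```python
from math import sqrt


def recovery_percent(x_c, x_g):
    """Analytical value as a percentage of the gravimetric value."""
    return 100.0 * x_c / x_g


def relative_error_percent(x_c, x_g):
    """Signed difference relative to the gravimetric value, in percent."""
    return 100.0 * (x_c - x_g) / x_g


def normalized_error(x_c, x_g, U_c, U_g):
    """Difference divided by the combined expanded uncertainties."""
    return (x_c - x_g) / sqrt(U_c ** 2 + U_g ** 2)


# Hypothetical sample: mole fractions and expanded uncertainties in umol/mol.
x_c, x_g, U_c, U_g = 380.4, 380.0, 1.5, 0.2
print(f"recovery = {recovery_percent(x_c, x_g):.2f} %")
print(f"relative error = {relative_error_percent(x_c, x_g):.3f} %")
print(f"normalized error = {normalized_error(x_c, x_g, U_c, U_g):.3f}")
```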
The limit of detection (LOD) is the lowest concentration of analyte that can be detected, but not necessarily quantified. The limit of quantification (LOQ) is the lowest concentration of analyte that can be quantified with acceptable precision and accuracy. There are two main methods for determining the LOD and LOQ: the signal-to-noise ratio method and the method based on the standard deviation of the response and the slope of the calibration curve [24]. The calculation selected is given in Equations (7) and (8), and the results are presented in Table 4:
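$$\mathrm{LOD} = \frac{3.3\,\sigma}{S} \qquad (7)$$
$$\mathrm{LOQ} = \frac{10\,\sigma}{S} \qquad (8)$$
where σ is the standard deviation of the response and S is the slope of the calibration curve.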
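For the approach based on the standard deviation of the response and the slope of the calibration curve, the ICH guideline uses the factors 3.3 for the LOD and 10 for the LOQ. A minimal sketch under that assumption follows; the function name and the input values are hypothetical.

```python
def detection_limits(sigma, slope):
    """Return (LOD, LOQ) in concentration units.

    sigma: standard deviation of the response (response units)
    slope: slope of the calibration curve (response per concentration unit)
    """
    if sigma < 0 or slope == 0:
        raise ValueError("sigma must be non-negative and slope non-zero")
    lod = 3.3 * sigma / abs(slope)
    loq = 10.0 * sigma / abs(slope)
    return lod, loq


# Hypothetical response standard deviation and calibration slope.
lod, loq = detection_limits(sigma=0.5, slope=2.0)
print(f"LOD = {lod:.3f}, LOQ = {loq:.3f}")
```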
For both techniques, r2 > 0.99 and GOF < 2 on most of the days analyzed, which means that the calibration curves were generally satisfactory; the exception was the third day of CRDS analysis, which showed a poor fit of the curve. The best function for GC was a quadratic model, since the GOF results obtained with the ISO 6143 regression were better than the r2 results obtained with the linear regression fit.
According to Table 4, all data for both methods present a significant linear regression, as the F values are much higher than Fcritic. It can also be observed that the results from the third day of CRDS are more homogeneous and less dispersed.
The repeatability results for the GC analysis days exceed the established criterion, i.e., the relative standard deviation is higher than 1%, which means that the process and/or the method should be optimized and more replicates should be taken. Nevertheless, the repeatability and reproducibility results among days are below the established criteria, although the GC values are much higher than the CRDS values.
For all days and both methods, the recovery results were between 99.5% and 100.5%, which means that the calibrated results agree with the gravimetric concentrations; all other accuracy parameters tested were also satisfactory.
The mole fractions obtained for LOD and LOQ are considered acceptable, as they are much lower than the concentration range selected for each technique.
In summary, most of the parameters evaluated met the acceptance criteria, and the method validation for both techniques was considered satisfactory.
Finally, the results obtained on different days were also compared through a single-factor ANOVA test, which analyzes the variance of two or more samples. The acceptance criterion was the following: if F < Fcritic and P-value > 0.05, the averages of the sample on different days are equivalent. This analysis tests the hypothesis that each sample comes from the same underlying probability distribution. The single-factor ANOVA results are presented in Table 5.
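The single-factor ANOVA statistic can be sketched as the ratio of the between-day mean square to the within-day mean square. The sketch returns only F, to be compared with a tabulated critical value; the P-value is left out because it requires the F distribution, which the Python standard library does not provide. The function name and the data are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic for a single-factor ANOVA over the given groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need >= 2 groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        raise ValueError("no within-group variation: F is undefined")
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical results (umol/mol) for three days; 5.14 is the tabulated
# F(0.95; 2, 6) value.
days = [
    [380.1, 380.3, 380.2],
    [380.2, 380.4, 380.3],
    [380.0, 380.2, 380.1],
]
f_value = one_way_anova_f(days)
f_critical = 5.14
print(f"F = {f_value:.2f}, day averages equivalent: {f_value < f_critical}")
```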
According to the single-factor ANOVA criteria, the day averages for GC-FIDmeth are borderline, as F < Fcritic and the P-value is only slightly higher than 0.05. The CRDS ANOVA results, on the other hand, are satisfactory for both criteria.
The GC results can also be compared with, and validated against, Lanag’s previous results in key comparisons with other national metrology institutes for the parameters evaluated [28] [29] [30] [31] [32].
4. Conclusion
Method validation using primary standard gas mixtures was applied to carbon dioxide at atmospheric levels. Earlier key comparisons of this component at such nominal concentrations reported differing uncertainty estimates, and GC-FIDmeth was the technique usually employed. No studies have been published so far on the analysis of this standard by CRDS, which is now used by well-known national metrology institutes. This paper therefore compared the method validation results for GC-FIDmeth and CRDS over a range of 370 to 835 μmol∙mol−1 of carbon dioxide in synthetic clean dry air primary standard mixtures. In terms of linearity and calibration curve fit, CRDS and GC are equivalent, with GOF below the criterion value of 2 for both. The relative standard deviation for CRDS is four times lower than for GC, and the repeatability and reproducibility results for CRDS are better as well. The limit of quantification (LOQ) obtained was 50 μmol∙mol−1 for CRDS and 120 μmol∙mol−1 for GC. According to the single-factor ANOVA criteria, the day averages for GC-FIDmeth are borderline, as F < Fcritic and the P-value is only slightly higher than 0.05, whereas the CRDS results are satisfactory for both criteria. Finally, the relative expanded uncertainty of a sample evaluated against the calibration curve of each method was estimated at 2.7% for GC-FIDmeth and around 0.4% for CRDS. The GC-FIDmeth method should therefore be optimized to obtain better uncertainty estimates, especially in view of the earlier GC results from key comparisons, whose average relative uncertainty was 10 times lower than the 2.7% found in this study. Comparing the two validations leads to the conclusion that CRDS, owing to the robustness of the spectroscopic technique, performs better than GC-FIDmeth, especially when uncertainty is the decision criterion. This conclusion is consistent with the wide use of CRDS in atmospheric monitoring, where it achieves low detection limits combined with low uncertainty and a fast response.
Table 5. ANOVA single factor results.