Improvement of Countrywide Vegetation Mapping over Japan and Comparison to Existing Maps
1. Introduction
Satellite data are a major source of information for the classification and mapping of vegetation types. However, classifying and mapping vegetation types from satellite data remains challenging at broad scales. Supervised classification of remotely sensed data is a commonly used technique for timely vegetation classification and mapping. A number of supervised classifiers, such as decision trees, random forests, support vector machines, and neural networks, have been employed for this purpose [1]-[8].
The Moderate Resolution Imaging Spectroradiometer (MODIS) Land Cover Type product (MCD12Q1) and the Global Land Cover by National Mapping Organizations (GLCNMO) product are recently available global land cover maps from which information on vegetation types can be obtained. The MCD12Q1 product utilizes an ensemble-based supervised classification algorithm (decision trees) supported by training data from 1860 sites distributed across the Earth’s land areas [9]. However, MCD12Q1 is primarily a land cover product and has not proven effective for discriminating the vegetation types in Japan [10]. The vegetation types in the GLCNMO product have been mapped using a decision tree method with the support of ground truth data [11]. However, to the best of our knowledge, the accuracy of the GLCNMO map for the classification of vegetation types has not yet been assessed in Japan.
The objectives of the research were: 1) to compare the MODIS Nadir BRDF-Adjusted Reflectance (MCD43A4) product and the conventional surface reflectance (MOD09A1/MYD09A1) products for vegetation classification, 2) to produce an improved vegetation map at a national scale using the better MODIS product, and 3) to compare the newly produced map to extant moderate resolution land cover maps. This research deals with the classification and mapping of eight classes: seven vegetation types (evergreen coniferous forest, evergreen broadleaf forest, deciduous coniferous forest, deciduous broadleaf forest, shrubs, herbaceous, and arable) and non-vegetation.
2. Methodology
2.1. Processing of Satellite Data
We processed the MCD43A4 product over Japan for 2013. Six bands (Red, Near Infrared, Blue, Green, Mid Infrared, and Shortwave Infrared) were used, and three spectral indices (Normalized Difference Vegetation Index [12], Superfine Water Index [13], and Urban Built-Up Index [14]) were calculated, giving nine variables per observation date. These 8-day data were composited in two ways: as monthly medians (January to December) and as eleven percentiles (0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100), giving 23 composites per variable. In this way, a rich set of 207 features (9 variables × 23 composites) was prepared. We also processed the MOD09A1/MYD09A1 products of 2013 over Japan. The MOD09A1/MYD09A1 products provide an estimate of the surface spectral reflectance as it would be measured at ground level in the absence of atmospheric scattering and absorption. From these products, 207 features were prepared in the same manner as for the MCD43A4 product.
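The compositing step can be illustrated with a minimal sketch. It assumes NumPy, synthetic reflectance values in place of real MODIS tiles, and 46 observation dates spaced 8 days apart in 2013; the function names `ndvi` and `composite_features` are hypothetical and are not taken from the study. Only NDVI is computed here, since its formula, (NIR - Red) / (NIR + Red), is standard; the other two indices are omitted.

```python
from datetime import date, timedelta

import numpy as np


def ndvi(red, nir):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = nir + red
    safe = np.where(denom == 0, 1.0, denom)  # avoid division by zero
    return np.where(denom == 0, np.nan, (nir - red) / safe)


def composite_features(series, months):
    """Composite a (n_dates, n_pixels) series into 23 features per pixel:
    12 monthly medians followed by 11 percentiles (0, 10, ..., 100)."""
    series = np.asarray(series, dtype=float)
    months = np.asarray(months)
    monthly = np.vstack(
        [np.nanmedian(series[months == m], axis=0) for m in range(1, 13)]
    )
    percentiles = np.nanpercentile(series, np.arange(0, 101, 10), axis=0)
    return np.vstack([monthly, percentiles])  # shape (23, n_pixels)


# Assumed acquisition calendar: 46 dates, 8 days apart, starting 1 Jan 2013.
dates = [date(2013, 1, 1) + timedelta(days=8 * i) for i in range(46)]
months = np.array([d.month for d in dates])

# Synthetic reflectance for 5 pixels (stand-in for real MODIS data).
rng = np.random.default_rng(0)
red = rng.uniform(0.01, 0.3, size=(46, 5))
nir = rng.uniform(0.1, 0.6, size=(46, 5))

ndvi_series = ndvi(red, nir)
features = composite_features(ndvi_series, months)

# Nine variables (six bands + three indices) times 23 composites each.
n_features_total = 9 * features.shape[0]
print(features.shape, n_features_total)
```

Applying `composite_features` to each of the nine variables and stacking the results would give the full per-pixel feature vector.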
2.2. Machine Learning, Mapping and Comparison
Ground truth data from a previous study [10] were expanded and refined by visual interpretation of Google Earth imagery. A total of 500 ground truth points, located all over Japan, were established for each class.
We compared the performance of MCD43A4 features to MOD09A1/MYD09A1 features for the classification of vegetation types. A Random Forests classifier-based cross-validation approach was employed for the quantitative comparison of the two feature sets, with the ground truth data serving as labels for supervised classification. The Random Forests classifier uses bootstrap aggregating (bagging) to form an ensemble of trees; each node is split on the best of a random subset of the features, which reduces the correlation between the trees [15]. The performance of each dataset was compared by a 10-fold cross-validation approach: in each round, the model was trained on nine folds and validated on the remaining fold [16]. The optimum number of features was determined using the built-in feature importance module of the Random Forests algorithm [17]. The hyper-parameters of the Random Forests classifier were tuned by repeated trial and error with reference to the validation metrics.
The Random Forests classifier was employed for producing the seamless countrywide vegetation map. The newly produced map was compared to the extant MODIS Land Cover Type product (MCD12Q1) and Global Land Cover by National Mapping Organizations (GLCNMO) product of 2013. For the comparison, the extant maps (MCD12Q1 and GLCNMO) were remapped according to the legend established in the research. Then, the accuracy metrics (overall accuracy and kappa coefficient) were calculated for all maps using the ground truth data prepared in the research.
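A 10-fold cross-validation of a Random Forests classifier can be sketched as follows. This is an illustration under stated assumptions, not the configuration used in the study: it assumes scikit-learn, a small synthetic dataset in place of the 207-feature ground truth table, stratified folds, and arbitrary hyper-parameters (`n_estimators=50`). How the feature ranking was combined with cross-validation in the study is not stated, so the ranking below is simply computed from a fit on all samples.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in: 8 classes, 40 features (the study used 207 features
# and 500 ground truth points per class).
X, y = make_classification(
    n_samples=800,
    n_features=40,
    n_informative=12,
    n_classes=8,
    n_clusters_per_class=1,
    random_state=0,
)

clf = RandomForestClassifier(n_estimators=50, random_state=0)

# Each sample is predicted by a model trained on the other nine folds.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
y_pred = cross_val_predict(clf, X, y, cv=cv)

overall_accuracy = accuracy_score(y, y_pred)
kappa = cohen_kappa_score(y, y_pred)
cm = confusion_matrix(y, y_pred)

# Rank features by impurity-based importance (most important first).
ranking = np.argsort(clf.fit(X, y).feature_importances_)[::-1]

print(f"overall accuracy = {overall_accuracy:.2f}, kappa = {kappa:.2f}")
print("top five feature indices:", ranking[:5])
```

Repeating the cross-validation on the top-k ranked features for increasing k is one way to look for an optimum number of features.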
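The two accuracy metrics can be computed from a confusion matrix as in the sketch below. It uses only the Python standard library; the function names and the two-class example are hypothetical and serve only to show the arithmetic. Overall accuracy is the proportion of agreeing points, and the kappa coefficient corrects that proportion for the agreement expected by chance from the row and column totals.

```python
def build_confusion_matrix(reference, mapped, classes):
    """Rows are reference (ground truth) classes, columns are map classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for ref, pred in zip(reference, mapped):
        matrix[index[ref]][index[pred]] += 1
    return matrix


def overall_accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix."""
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    if expected == 1.0:  # degenerate case: only one class present
        return observed, 0.0
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical two-class example with 100 ground truth points.
reference = ["forest"] * 50 + ["arable"] * 50
mapped = ["forest"] * 45 + ["arable"] * 5 + ["forest"] * 10 + ["arable"] * 40

cm = build_confusion_matrix(reference, mapped, ["forest", "arable"])
oa, kappa = overall_accuracy_and_kappa(cm)
print(cm, round(oa, 2), round(kappa, 2))
```

For a remapped product, `mapped` would hold the remapped class at each ground truth point, so that all maps are scored against the same reference labels.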
3. Results and Discussion
3.1. MOD09A1/MYD09A1 versus MCD43A4
The confusion matrices obtained with the optimum number of features using Random Forests-based 10-fold cross-validation approach are plotted in Figure 1.
The MCD43A4 features yielded better cross-validation results (overall accuracy = 0.73; kappa coefficient = 0.69) than the MOD09A1/MYD09A1 features (overall accuracy = 0.70; kappa coefficient = 0.66). As seen in Figure 1, the MCD43A4 features also discriminated all land cover types slightly better than the MOD09A1/MYD09A1 features. The cross-validation results obtained from the two datasets (MCD43A4 versus MOD09A1/MYD09A1) are also summarized in Table 1.
Figure 1. Confusion matrices computed: (a) in the case of MOD09A1/MYD09A1 features, (b) in the case of MCD43A4 features.
Table 1. Comparison of MCD43A4 and MOD09A1/MYD09A1.
We carried out McNemar’s test to determine whether the two classification results differ significantly. McNemar’s test is a statistical test used on paired nominal data to determine whether the row and column marginal frequencies are equal [18]. It has been effectively used for comparing accuracy assessments of image classifications [19] [20] [21]. McNemar’s test (Table 1) showed a significant difference (p-value < 0.05) between the classifications obtained from the two datasets. Together, the cross-validation results and McNemar’s test indicated that the MCD43A4 features are superior to the MOD09A1/MYD09A1 features for the classification of vegetation types.
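McNemar’s test for two classifications of the same ground truth points can be sketched as follows. The source does not state which variant of the test was used; this sketch assumes the chi-square form with continuity correction and one degree of freedom, uses only the Python standard library, and runs on made-up labels. The function name `mcnemar` is hypothetical. The test depends only on the discordant points: those classified correctly by one dataset and incorrectly by the other.

```python
import math


def mcnemar(reference, pred_a, pred_b):
    """McNemar's chi-square test with continuity correction (assumed variant).

    b counts points where A is correct and B is wrong; c counts the reverse.
    Returns (b, c, chi2, p_value).
    """
    b = c = 0
    for ref, a, p in zip(reference, pred_a, pred_b):
        if a == ref and p != ref:
            b += 1
        elif a != ref and p == ref:
            c += 1
    if b + c == 0:  # no discordant points: no evidence of a difference
        return b, c, 0.0, 1.0
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return b, c, chi2, p_value


# Made-up example with 100 points: 50 both correct, 30 only A correct,
# 10 only B correct, 10 both wrong.
reference = [1] * 100
pred_a = [1] * 50 + [1] * 30 + [0] * 10 + [0] * 10
pred_b = [1] * 50 + [0] * 30 + [1] * 10 + [0] * 10

b, c, chi2, p_value = mcnemar(reference, pred_a, pred_b)
print(b, c, round(chi2, 3), p_value < 0.05)
```

In a cross-validation setting, `pred_a` and `pred_b` would be the out-of-fold predictions from the two feature sets at the same ground truth points.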
3.2. Vegetation Map and Comparison Results
The Random Forests model established with MCD43A4 features was used for the production of a countrywide vegetation map. The newly produced map is displayed in Figure 2.
Comparison results of the newly produced vegetation map with the extant MODIS Land Cover Type product (MCD12Q1) and Global Land Cover by National Mapping Organizations (GLCNMO) are shown in Table 2. This comparison is based on the ground truth data prepared in the research. The newly produced map using the MCD43A4 features by employing Random Forests classifier showed better accuracy than the extant maps in Japan.
Figure 2. Countrywide vegetation map produced through the research: (a) Display over the national territory; (b) Zoomed in over the black polygon region in (a). The national boundary is based on Global Administrative Areas database (GADM) version 3.6, May 2018.
Table 2. Performance of newly produced vegetation map over the extant maps.
4. Conclusion
The cross-validation results indicated better performance of the MCD43A4 product than the MOD09A1/MYD09A1 products for the classification of vegetation types. The MOD09A1/MYD09A1 products do not account for the dependence of observed reflectance on the view and solar angles, but instead select the best observation from the eight-day cycle based on high observation coverage, low view angle, and the absence of clouds or aerosols. In contrast, the MCD43A4 product provides Nadir Bidirectional Reflectance Distribution Function (BRDF)-adjusted reflectance by modeling the anisotropic characteristics of the scattering using multi-date (16 days) quality observations. Therefore, in contrast to the previous study [10], MCD43A4 features were used to produce a new vegetation map for all of Japan in this research. The mapping was carried out by employing a Random Forests-based supervised classification approach with the support of ground truth data. The newly produced map demonstrated better accuracy than the extant maps (MCD12Q1 and GLCNMO). However, moderate resolution satellite data, such as MODIS, have limited ability to map fragmented vegetation. In such cases, mapping with high spatial resolution satellite data is suggested.
Acknowledgements
This research was supported by a Japan Society for the Promotion of Science (JSPS) grant-in-aid for scientific research (No. P17F17109). The MODIS data used in the research were obtained from the NASA EOSDIS Land Processes Distributed Active Archive Center (LP DAAC), USGS/Earth Resources Observation and Science (EROS) Center, Sioux Falls, South Dakota.