Subject Areas: Ecosystem Science, Plant Science
 

 
1. Introduction
 
The docks and sorrels, genus Rumex L., comprise about 200 species of annual, biennial and perennial herbs in the buckwheat family Polygonaceae and are widely distributed around the world. Members of this genus are very common perennial herbs growing mainly in the northern hemisphere, but various species have been introduced almost everywhere. Some are nuisance weeds, sometimes called dock weed, while others are grown for their edible leaves. Rumex is the ancient Latin name for the docks or sorrels, while the species epithet vesicarius, from vesica (a bladder), refers to the inflated pods that follow the flowers on these herbs. Rumex vesicarius L. is an annual, pale green, glabrous herb branched from the root [1]. It is known by various names in different languages, viz., Chukkakura (Telugu), Bladder dock (English), Chooka (Hindi), Chuka (Marathi), Shakkankirai (Tamil), Kattamitha (Punjabi), Chukra or Amlavetasa (Sanskrit), and Sukki soppu (Kannada) [2]. It is cultivated throughout the world as a pot herb. It is sparsely cultivated as a minor green leafy vegetable crop in kitchen, market and truck gardens in many states of south India. Leaves are fleshy, sour, alternate, elliptic-ovate to broadly ovate, entire, acute or obtuse, and cordate at the base with a long petiole. It has vegetable and medicinal uses and has always contributed to the nation’s nutritional and health security. Sorrel is a well-known commodity of Indian cuisine. It is eaten fresh [3] or cooked [4]. It is considered a complementary dietary plant, since it is a rich source of β-carotenes [5]. It is a good source of minerals such as Na, K, Ca, Fe, and Mn in different organs at different stages of development, from flowering to fruiting, during the spring, autumn and winter seasons [6] - [8]. The genus Rumex includes many edible plant species that have medicinal importance for the treatment of some of the most dangerous diseases [9] [10]. Sorrel is a valuable, potent medicinal herb possessing antimicrobial, anti-inflammatory, antidiarrhoeal and antioxidant properties.
 
This plant is an environmental weed, with the potential to have a significant impact on the natural flora and fauna in areas where it grows. It is also a wild edible plant [3]. It can grow in moist, moderately fertile, well-drained soil in a sunny position [11]. It is found in the wild state in West Punjab, the Trans-Indus Hills, Afghanistan, Persia and North Africa. Sorrel has been an important traditional leafy vegetable crop of India. It is grown in garden lands at any time of the year in the Bombay presidency. India is one of the most important sorrel-producing countries. Although sorrel occurs frequently in the wild state, it is not yet grown in all potentially suitable habitats. The sorrel production scenario in India until the 1980s was quite dismal, primarily because of low production and productivity. From 1990 onwards, efforts have been made to augment domestic production by introducing sorrel cultivation in non-traditional areas of India. In order to plan and manage this crop effectively, it is necessary to investigate its potential geographical distribution nationwide. A comprehensive inventory of its distribution does not currently exist. Understanding the potential spread of an underutilized and underexploited species such as sorrel is essential for land managers who wish to promote its commercial cultivation. Identification of potential new growing areas is the most important aspect of sorrel cultivation under climate-smart agriculture; it plays a vital role in strengthening the leafy vegetable industry and helps the country to diversify its leafy vegetable basket.
 
Species distribution models (SDMs) are predictive models that estimate the relationship between species records at sites and the environmental and/or spatial characteristics of those sites [12] and are widely used for many purposes in biogeography, conservation biology and ecology [13] [14]. SDMs provide useful information for exploring and predicting species distributions [15]. Habitat suitability models provide a tool for researchers and managers to understand the potential extent of a species’ spread [16] [17]. Modeling techniques that require only presence data are extremely valuable [13]. Habitat suitability models can fill data gaps in survey records and can highlight priority locations for future surveying and monitoring [18]. Maximum entropy (MaxEnt) modeling is one of a suite of habitat suitability modeling techniques requiring only presence locations [19]. MaxEnt’s predictive performance is consistently competitive with the highest performing methods [20]. MaxEnt uses a background sample in computing the maximum entropy distribution [19]. MaxEnt is a machine learning method that compares environmental variables at presence locations with those across the study area, using principles of maximum entropy to generate predictions of suitable habitat in unsampled regions [17]. It is user-friendly, produces robust metrics to evaluate model fit and has proven effective in predicting the distribution of habitat-specific species at small spatial extents [21]. A presence-only method such as MaxEnt is appropriate for modeling species with unstable distributions, such as environmental weed species, because true absence data can be difficult to obtain. If a species is absent at a location, it could be either because it has not yet invaded or because the location is unsuitable, and these two options are often indistinguishable for invasive species. MaxEnt has many advantages over other ecological niche models in predicting the potential distribution of species. Earlier researchers have described MaxEnt as estimating a distribution across geographic space [19] [22].
 
Our goal was to provide agricultural planners in India with a preliminary state-wise climate suitability map depicting the potential new growing areas for sorrel, using the MaxEnt niche modeling approach. The method presented in this manuscript and made accessible in MaxEnt provides a step forward.
 
2. Materials and Methods
 
We were interested in predicting the potential habitat distribution of sorrel from a set of occurrence localities (occurrence data), together with a set of environmental variables (climate data), using the MaxEnt model along the lines of earlier work on vegetable roselle [23] and Indian spinach [24].
 
2.1. Occurrence Data
 
Each occurrence locality is simply a latitude-longitude pair denoting a site where the species has been observed; such geo-referenced occurrence records often derive from specimens in natural history museums and herbaria [25] [26]. The occurrence locations of sorrel were mainly based on two extensive exploration surveys conducted during 2010-2011 by the National Bureau of Plant Genetic Resources Regional Station, Rajendranagar, in collaboration with the Vegetable Research Station, Dr. Y. S. R. Horticultural University, Rajendranagar. Following a random sampling strategy, crop presence data for sorrel were collected from 21 points covering four districts of Andhra Pradesh and two of Karnataka, India (Table 1). The geographical coordinates (longitude and latitude) of the occurrence locations were recorded using a Global Positioning System receiver (Garmin GPS-12). From this source, a total of 21 distributional localities (n = 21 records) of sorrel were compiled into a database to generate a preliminary, state-wise, national-level map of the potential distribution of sorrel in India, thus making use of the best available data.
 
 
 
 Table 1. Sorrel (Rumex vesicarius L.) presence locations used for MaxEnt analysis.
 
  
2.2. Climate Data
 
We obtained 19 bioclimatic data layers from the WorldClim dataset [27] at 1 km spatial resolution to represent current climatic conditions. The WorldClim dataset was generated by an interpolation technique that uses altitude together with monthly temperature and precipitation records from 1950 to 2000. The 19 bioclimatic variables, which describe general trends, seasonality and extremes, are considered biologically more meaningful than simple monthly or annual averages of temperature and precipitation in defining a species’ ecophysiological tolerances [28] [29].
 
2.3. MaxEnt Analysis
 
MaxEnt provides species distribution information based only on known presences (recorded occurrences). MaxEnt performs extremely well in predicting occurrences relative to other common approaches, and it is also designed to integrate with GIS software such as Arc products, which makes data input and predicted (mapped) output easier to handle. MaxEnt works by finding the largest spread (maximum entropy) in a geographic dataset of species presences in relation to a set of “background” environmental variables.
 
For the MaxEnt analysis, we used MaxEnt software (version 3.2.19) [19], a set of environmental variables and a dataset of occurrence records for training and testing. The algorithm runs for up to 1000 iterations or until convergence, whichever comes first. The model produced prediction values ranging from 0 to 100, representing cumulative probabilities of occurrence. Predictions were mapped in DIVA-GIS (version 5.2) [30].
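The cumulative output scale mentioned above can be illustrated with a minimal sketch. It assumes the usual definition of the cumulative format, in which a cell's value is 100 times the summed raw probability of all cells whose raw value is no greater than its own; the function name and the five raw probabilities are hypothetical and serve only as an illustration.

```python
def cumulative_output(raw):
    """Convert raw probabilities to cumulative percentages (0-100).

    Assumed definition: a cell's cumulative value is 100 times the total
    raw probability of all cells whose raw value is less than or equal
    to that cell's raw value.
    """
    total = sum(raw)
    return [100.0 * sum(q for q in raw if q <= p) / total for p in raw]


# Hypothetical raw probabilities for five grid cells (illustrative only).
raw = [0.05, 0.10, 0.15, 0.30, 0.40]
for p, c in zip(raw, cumulative_output(raw)):
    print(f"raw={p:.2f} cumulative={c:.1f}")
```

Under this definition the most suitable cell always receives a value of 100, and a cumulative threshold of c is expected to omit about c% of records drawn from the modeled distribution.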
 
2.4. Statistical Analysis of MaxEnt Model
 
MaxEnt runs were performed using 30% of the points chosen randomly as the test data and the remaining 70% as the training data. Default settings were used in MaxEnt so that the complexity of the model varied depending upon the number of data points used for model fitting. Two measures of model skill were used: the area under the ROC (receiver operating characteristic) curve (AUC) and the defined thresholds.
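As an illustration of the random partition described above, the following sketch splits 21 records into training and test sets. It is not the MaxEnt implementation: the record identifiers, the fixed seed and the rounding rule (truncating 30% of 21 to 6 test records) are assumptions made for the example.

```python
import random


def split_records(records, test_fraction=0.30, seed=0):
    """Randomly partition presence records into (training, test) lists."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)
    # Assumption: the test-set size is truncated to a whole number.
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical identifiers standing in for the 21 presence localities.
records = [f"site_{i:02d}" for i in range(1, 22)]
train, test = split_records(records)
print(len(train), "training records;", len(test), "test records")
```

With so few records, the composition of the test set can change the test statistics noticeably, so repeating the split with different seeds is one way to gauge that variability.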
 
3. Results and Discussion
 
Knowledge of species occurrence is a prerequisite for efficient and effective conservation and management. Unfortunately, such knowledge is usually insufficient, so models that combine environmental predictors with species occurrence records are used to predict species occurrence. Predicting the occurrence of sorrel is often difficult because sampling data insufficiently describe species occurrence and important environmental conditions, and because predictive models insufficiently describe the relations between species and environmental conditions. The availability of detailed environmental data, together with inexpensive and powerful computers, has fueled a rapid increase in predictive modeling of geographic distributions. For some species, detailed occurrence (presence) data are available, allowing the use of a variety of standard statistical techniques. Many methods are used to predict species occurrence. In this paper, we used the MaxEnt method to model the geographic distribution of Rumex vesicarius L. with presence-only data.
 
3.1. Analysis of Worldwide Climate Suitability Map Generated Using MaxEnt Model
 
Figure 1 is a map of the worldwide geographical distribution of sorrel generated using the MaxEnt model. The information available about the target distribution of sorrel landraces is presented as a set of real-valued variables, called “features”, and the constraints are that the expected value of each feature should match its empirical average. The program starts with a uniform probability distribution and works in cycles, adjusting the probabilities towards maximum entropy. It iteratively alters one weight at a time to maximize the likelihood of reaching the optimum probability distribution. The probability assigned to each location is the exponential of the weighted sum of the features, divided by a scaling constant that ensures the probability values range from 0 to 1. Warmer colors show areas with better predicted conditions. White dots show the presence locations used for training, while violet dots show test locations. The red color indicates areas with a high probability of occurrence for sorrel, the blue and green represent moderately high probability of occurrence, the yellow color represents low probability of occurrence and the white indicates areas not suitable for sorrel. In fact, this worldwide climate suitability map can be used in countries that lack precise coordinates of sorrel occurrences to generate a preliminary climate suitability map of sorrel, because it may be too late to wait for precise coordinates of sorrel occurrences before generating a perfect climate suitability map.

 Figure 1. The worldwide geographical distribution map of sorrel generated using MaxEnt model.
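The fitting procedure described in this section can be written compactly. The notation below is a sketch in the standard maximum-entropy form, and the symbols are assumptions introduced here rather than taken from the text: X is the set of grid cells in the study area, x_1, ..., x_m are the presence records, f_j are the features and \lambda_j are the feature weights.

```latex
% Exponential (Gibbs) form of the maximum-entropy distribution over cells x in X
q_\lambda(x) = \frac{\exp\left(\sum_{j} \lambda_j f_j(x)\right)}{Z_\lambda},
\qquad
Z_\lambda = \sum_{x' \in X} \exp\left(\sum_{j} \lambda_j f_j(x')\right)

% Constraint: the model expectation of each feature matches its empirical average
\sum_{x \in X} q_\lambda(x)\, f_j(x) = \frac{1}{m} \sum_{i=1}^{m} f_j(x_i)
\quad \text{for every feature } j
```

With all weights set to zero, q_\lambda is the uniform distribution, which corresponds to the uniform starting point mentioned above; Z_\lambda is the scaling constant that makes the probabilities sum to 1. In practice, regularized implementations require the two sides of the constraint to agree only approximately.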
 
3.2. Analysis of State-Wise Indian National Level Climate Suitability Map Generated Using MaxEnt Model and DIVA-GIS
 
The preliminary climate suitability map for sorrel cultivation in India was generated using MaxEnt and DIVA-GIS (Figure 2). We classified climatic zones in terms of their suitability for sorrel cultivation, based on the probability of occurrence determined using the MaxEnt model. The geographical ranges of the excellent area (0.7087 - 1.0), optimum area (0.5315 - 0.7087), suitable area (0.3543 - 0.5315), less suitable area (0.1772 - 0.3543) and unsuitable area (0.0 - 0.1772) are shown in different colours in the climate suitability map of sorrel (Figure 2). The image uses colours to indicate the predicted probability that conditions are suitable, with red indicating the highest probability (0.7087 - 1.0) of suitable conditions for these sorrel landraces, green indicating conditions typical of those where the species is found and lighter shades of green indicating low predicted probability of suitable conditions. The excellent area in this study lies slightly southward, and it includes most parts of Andhra Pradesh, Karnataka and Orissa. These states have potential regions for introducing and cultivating the sorrel landraces and for planning in-situ on-farm conservation sites in the light of climate change scenarios. In addition, most of the northern, western and north-eastern regions were unsuitable. The potential impact of climate change on agricultural crop production varies spatially and depends on crop-specific biophysical constraints [31]. Because sorrel’s extensive adaptation would prove vital in meeting the food, nutritional and economic security of the people, use of the MaxEnt model is highly warranted in preserving the important sorrel landraces.
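The five suitability classes above can be expressed as a simple lookup. The sketch below uses the class breaks and labels given in the text; the function name is illustrative, and assigning a value that falls exactly on a break to the upper class is an assumption, since the text does not say which side the boundaries belong to.

```python
from bisect import bisect_right

# Class breaks and labels as given in the text.
BREAKS = [0.1772, 0.3543, 0.5315, 0.7087]
LABELS = ["unsuitable", "less suitable", "suitable", "optimum", "excellent"]


def suitability_class(p):
    """Map a predicted probability in [0, 1] to one of five suitability classes."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # Assumption: a value exactly on a break is placed in the upper class.
    return LABELS[bisect_right(BREAKS, p)]


for p in (0.05, 0.20, 0.40, 0.60, 0.75):
    print(p, suitability_class(p))
```

The four breaks are evenly spaced at roughly 0.177, which suggests equal-interval classes on the mapped values.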
 
3.3. Evaluation of Quality of MaxEnt Model
 
The first step in evaluating the model was to verify that it performed significantly better than random. For this purpose, we first used a threshold-dependent binomial test based on omission and predicted area. However, this test does not allow comparisons between algorithms, as its significance is highly dependent on predicted area. We then used threshold-independent receiver operating characteristic analysis, which characterizes the performance of a model at all possible thresholds by a single number, the area under the curve, which may then be compared between algorithms.
 
3.3.1. Receiver Operating Characteristic (ROC) Curve
 
 Figure 2. Climate suitability map of sorrel cultivation in India using MaxEnt software and DIVA-GIS.

The value “30” entered for “random test percentage” commands the program to randomly set aside 30% of the sample records for testing. This allows the program to do some simple statistical analysis. Much of the analysis uses a threshold to make a binary prediction, with suitable conditions predicted above the threshold and unsuitable below. Figure 3 shows how the training omission rate, the test omission rate and the predicted area vary with the choice of cumulative threshold. The omission rate is calculated both on the training presence records (70% of presence records) and on the test records (30% of presence records). The omission rate should be close to the predicted omission, because of the definition of the cumulative threshold. Here, we see that the omission on test samples (sky blue line) is a very good match to the predicted omission rate (black line), that is, the omission rate for test data drawn from the MaxEnt distribution itself. The predicted omission rate is a straight line, by definition of the cumulative output format. In some situations, the test omission line lies well below the predicted omission line, while in others it lies well above it; a common reason is that the test and training data are not independent, for example if they derive from the same spatially auto-correlated presence data. The MaxEnt model was significantly better than random in the binomial test of omission and predicted area. Because we have only occurrence data and no absence data, “fractional predicted area” (the fraction of the total study area predicted present) is used instead of the more standard commission rate (fraction of absences predicted present).

 Figure 3. Omission and predicted area for MaxEnt model on the first random partition of occurrence records of sorrel.
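The two quantities discussed here, omission rate and fractional predicted area, can be computed directly once a threshold is chosen. The following is a minimal sketch; the function names and all scores are hypothetical, and a cell is treated as predicted present when its score is at or above the threshold, which is an assumption about how ties are handled.

```python
def omission_rate(presence_scores, threshold):
    """Fraction of presence records scored below the threshold (predicted absent)."""
    missed = sum(1 for s in presence_scores if s < threshold)
    return missed / len(presence_scores)


def fractional_predicted_area(cell_scores, threshold):
    """Fraction of all study-area cells scored at or above the threshold."""
    predicted = sum(1 for s in cell_scores if s >= threshold)
    return predicted / len(cell_scores)


# Hypothetical cumulative scores (0-100), for illustration only.
test_scores = [12, 35, 48, 60, 77, 91]              # withheld presence records
cell_scores = [1, 3, 5, 8, 10, 15, 22, 40, 65, 90]  # every cell in the study area
threshold = 20

print("test omission rate:", omission_rate(test_scores, threshold))
print("fractional predicted area:", fractional_predicted_area(cell_scores, threshold))
```

Raising the threshold shrinks the predicted area but increases omission, which is the trade-off plotted against the cumulative threshold in Figure 3.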
 
MaxEnt also calculates the area under the receiver operating characteristic curve to evaluate the performance, or simulation accuracy, of the model [32]. A ROC plot was built by plotting sensitivity against the false positive fraction for all available probability thresholds [33]. The area under the ROC curve (AUC), a threshold-independent measure of model performance, indicates the predictive accuracy of the model and determines how well a model discriminates between presence locations and other locations in the area of interest. AUC values can range between 0.5 and 1.0, with 0.5 indicating no discrimination ability; values below 0.7 are low, values between 0.7 and 0.9 are useful in some cases, and values > 0.9 indicate high discrimination [34]. The value of AUC indicates the following degrees of predictive accuracy [34]: 0.50 - 0.60 (fail), 0.60 - 0.70 (poor), 0.70 - 0.80 (fair), 0.80 - 0.90 (good), and 0.90 - 1.0 (excellent). A model with an AUC value approaching 1.0 is usually considered a good model, while a model with an AUC value close to 0.5 is considered no better than random.
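One way to see what the AUC measures is to compute it as the probability that a randomly chosen presence location receives a higher score than a randomly chosen background location, with ties counted as one half. The sketch below does this by direct pairwise comparison; it is an illustration under that interpretation rather than the MaxEnt implementation, and the scores are hypothetical.

```python
def auc(presence_scores, background_scores):
    """AUC as the share of (presence, background) pairs ranked correctly.

    A pair counts 1 when the presence score is higher and 0.5 when tied.
    """
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical scores, for illustration only.
presence = [0.9, 0.8, 0.4]
background = [0.1, 0.3, 0.5, 0.2]
print(round(auc(presence, background), 3))
```

In this toy example, 11 of the 12 pairs are ranked correctly; a model that scored every location identically would give exactly 0.5.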
 
Figure 4 shows the receiver operating characteristic curve for the training and test data. We calculated an AUC for the training dataset and an AUC for the test data withheld from model fitting. In this study, the AUC value for the training data was 0.993 and the AUC value for the test data was 0.985, indicating a high level of accuracy for the MaxEnt predictions (Figure 4). Note that the specificity is defined using predicted area, rather than true commission. This implies that the maximum achievable AUC is less than 1. If test data are drawn from the MaxEnt distribution itself, then the maximum possible test AUC would be 0.987 rather than 1; in practice the test AUC may exceed this bound. From the AUC values of this study, it is evident that the MaxEnt model discriminated well between presence locations and other locations in the area of interest and had an excellent degree of predictive accuracy. The MaxEnt model was significantly better than random in receiver operating characteristic analyses. An earlier study also reported very high model accuracy (test AUCs of 0.945) for a MaxEnt model constructed to predict the occurrence of several invasive plant species in riparian areas along Nebraska’s North Platte River using local environmental layers assembled at a 30 m cell resolution [35]. In general, AUC inaccuracies are most apparent in imbalanced samples and smaller samples. Caution is required in the use of AUC measures unless the sample size is very large, and it would be useful to have a simple rule of thumb to determine whether a sample is sufficiently large [36].
 
 
 
 Figure 4. ROC curve of sensitivity versus specificity for MaxEnt model on the first random partition of occurrence records of sorrel.
 
  
3.3.2. Threshold
 
Some common thresholds and the corresponding omission rates are given in Table 2. Because test data are available, binomial probabilities are calculated exactly if the number of test samples is at most 30, and otherwise using a normal approximation to the binomial. These are one-sided P-values for the null hypothesis that test points are predicted no better than by a random prediction with the same fractional predicted area. The “Balance” threshold minimizes 6 × training omission rate + 0.04 × cumulative threshold + 1.6 × fractional predicted area.
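The exact one-sided binomial test described above can be sketched as follows. Under the null hypothesis, each test point falls inside the predicted area with probability equal to the fractional predicted area, so the P-value is the probability of predicting at least as many test points correctly by chance. The function name and the example numbers are hypothetical and are not values from Table 2.

```python
from math import comb


def binomial_p_value(n_test, n_correct, predicted_area):
    """Exact one-sided P(X >= n_correct) for X ~ Binomial(n_test, predicted_area)."""
    return sum(
        comb(n_test, k) * predicted_area**k * (1 - predicted_area) ** (n_test - k)
        for k in range(n_correct, n_test + 1)
    )


# Hypothetical case: all 6 test points fall inside a predicted area
# that covers 10% of the study region.
p = binomial_p_value(n_test=6, n_correct=6, predicted_area=0.10)
print(f"{p:.2e}")
```

Because the P-value depends so strongly on the predicted area, it shows that a model beats random prediction but is not suited to ranking models against each other.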
 
3.3.3. Strengths and Weaknesses of MaxEnt Model
 
The efficacy of predictive models depends on the quantity and quality of the occurrence data. Recently, several comparative analyses have investigated the efficacy of different methods for modeling species’ distributions. MaxEnt modeling has frequently outperformed a number of other approaches that rely on presence-only data [19] [37]; it is relatively insensitive to spatial errors associated with location data [13], and it can produce useful models with as few as five locations [38] [39]. MaxEnt is robust to small sample sizes, but it is affected by the way background data points are selected [40] [41]. Specifically, it is possible to run models with small numbers of sample localities in MaxEnt [20] [42]. In the present study, the dataset consists of a relatively small number of sample localities (n = 21). The number of occurrence localities may be too low to estimate the parameters of the model reliably [26]. Therefore, more distribution records should be added to obtain more reliable predictions. On the other hand, it is important to use knowledge of a species’ natural history and patterns of habitat use to examine the predictions made by MaxEnt. A current land cover classification derived from remotely sensed data can be used to exclude habitats highly altered by humans [43]. More research is needed into how models perform with biased datasets like those generally available for crop species across large spatial extents. Most of our data were compiled from disparate efforts, each with unique sampling goals and strategies. We cannot differentiate between poorly sampled areas, areas that could be invaded but have not been yet, and true absence areas. Sampling incompleteness and uncertainty aggravate the issues related to assessing sampling bias. The preliminary climate suitability map developed in this study can be refined to district scales by integrating more detailed species occurrence data collected using scientifically designed field surveys [44] and higher resolution predictor variables. The MaxEnt modeling approach can be used in its present form for identifying potential new growing areas for sorrel in India with presence-only datasets, and merits further research and development.
 
 
 
 Table 2. Common thresholds and corresponding omission rates for sorrel.
 
  
4. Conclusion
 
We examined the performance of MaxEnt, a presence-only SDM that has typically been used to model plant and animal distributions in the natural environment, as a tool for modeling relative land suitability for sorrel. Our goal of using the MaxEnt modeling approach to map the state-wise potential distribution areas of sorrel in India was successfully achieved. The MaxEnt model was significantly better than random in both the binomial tests of omission and the receiver operating characteristic analyses. The AUC was high (0.993 for the training data and 0.985 for the test data), indicating good discrimination of suitable versus unsuitable areas for the species. The results indicate that sorrel will potentially be able to colonize five states in India, viz., Andhra Pradesh, Karnataka, Maharashtra, Orissa and West Bengal. This preliminary state-wise climate suitability map will be useful for state-wise planning of sorrel-based farming systems in India. Further, our approach can be used in other countries that lack precise coordinates of sorrel occurrences to generate a preliminary climate suitability map of sorrel, because it may be too late to wait for precise coordinates of sorrel occurrences before generating a perfect climate suitability map.
     