Predicting the Drought-Induced Yield Loss of Cotton in the Southern Plains Region of the United States Using a Drought Index
1. Introduction
The semi-arid Southern Plains region of the United States (US), encompassing northwestern Texas and southwestern Oklahoma, represents a substantial portion of the agricultural land in the US and is responsible for a large share of its cotton (Gossypium hirsutum L.) production. For several decades, Texas has consistently accounted for approximately one-half of the total cotton production area in the US [1]. The High Plains region within Texas is particularly well-suited for cotton production and contributes the largest portion of the state's total cotton production. In Oklahoma, cotton cultivation is concentrated in the southwestern part of the state because of improved production efficiency and the deep-rooted traditions of producers [2].
The Southern Plains region is vulnerable to persistent drought, which can develop rapidly when low precipitation is combined with high temperatures. Research indicates that this region will experience a drier climate in the future [3]. Elevated temperatures due to climate change are expected to cause more intense droughts [4]-[6] and longer dry spells between rain events [7].
The primary source of water for nearly all irrigated agriculture in the Southern Plains region is the underlying Ogallala Aquifer. However, ongoing depletion of this aquifer due to unsustainable water use has reduced the future potential for irrigating cotton in this region [8]. Continued urbanization will place further stress on the water supply of this region, particularly in the heavily irrigated areas that are already plagued by water shortages [9]. These circumstances have forced cotton producers in this region to manage both irrigated and rainfed farming systems efficiently [8].
Drought in this region may occur at any time of the year, but droughts that occur during the critical phenological phases of a crop can be especially costly [9]. Many local economies and communities in the region have faced major challenges caused by drought for several decades. In Texas alone, drought has cost up to $50 billion over the last 40 years [9].
Although an imminent drought cannot be prevented, its impacts can be minimized through mitigation measures if it is predicted in advance [10]. Predicting yield loss due to an impending drought is therefore a crucial need of farmers, because it can help stakeholders decide which mitigation measures to apply. The impact of drought depends on the sensitivity of a crop to water stress, which differs across phenological stages [11]-[13]. The yield loss caused by drought may be predicted using yield models based on various methods, including drought indices [14]. Relative to other methods, drought indices are simpler, yet they provide a comprehensive overview of drought conditions [15] [16].
A drought can be meteorological, hydrological, or agricultural. An agricultural drought is a temporary condition in which the amount of plant-available water in the soil cannot meet the consumptive demand of crops [16]. A number of drought indices are available to monitor or predict an agricultural drought [15] [17]. However, because yield formation is a plant physiological process, a plant physiology-based drought index, such as the Agricultural Reference Index for Drought (ARID) [18], is better suited to predicting the yield loss due to drought. The ARID is biophysically sound, computationally simple, and generally applicable, and it characterizes an agricultural drought better than many similar indices [19]. It is calculated at a daily time step as the ratio of crop water deficit to crop water need:
$$\mathrm{ARID}_d = 1 - \frac{T_d}{ET_d} \tag{1}$$
where the subscript d stands for the d-th day, T is transpiration, and ET is the reference grass evapotranspiration [20] [21]. The ARID value ranges from 0 (no water stress) to 1 (full water stress).
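To make Equation (1) concrete, the minimal Python sketch below turns daily transpiration and reference evapotranspiration into daily ARID values. It assumes T and ET are already available as daily series (in ARID, T comes from a simple soil water balance [18] that is not reproduced here). The function name `daily_arid`, the clamping to the 0-1 range, and the treatment of days with zero ET are illustrative assumptions, not part of the published procedure.

```python
def daily_arid(transpiration, ref_et):
    """Daily ARID from Equation (1): ARID_d = 1 - T_d / ET_d.

    transpiration, ref_et: equal-length sequences of daily values (e.g., mm/day).
    Returns daily ARID values between 0 (no stress) and 1 (full stress).
    """
    if len(transpiration) != len(ref_et):
        raise ValueError("transpiration and ref_et must have the same length")
    arid = []
    for t, et in zip(transpiration, ref_et):
        if et <= 0.0:
            # Assumption: with no evaporative demand there is no water deficit.
            arid.append(0.0)
        else:
            # Clamp to [0, 1] to guard against small numerical overshoots.
            arid.append(min(1.0, max(0.0, 1.0 - t / et)))
    return arid


# Hypothetical three-day example (mm/day).
print(daily_arid([4.0, 2.0, 0.0], [5.0, 5.0, 4.0]))
```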
To estimate the water-stressed yield relative to the non-water-stressed yield, hereafter referred to as the relative yield (Ý) and, ultimately, the fraction of drought-induced yield loss (1 − Ý) for a flowering crop having several distinct phenological phases with differing drought sensitivities, [22] developed Equation (2) using the phenological phase-specific values of ARID.
$$\acute{Y} = \prod_{p=1}^{P} \left(1 - \mathrm{ARID}_p\right)^{\lambda_p} \tag{2}$$
where Ý is the relative yield of a crop, the symbol Π indicates a product, p is a phenological phase, P is the total number of phases considered during the crop season, and λp is the relative sensitivity of the crop to water stress during the p-th phase.
The objective of this study was to develop an ARID-based relative yield model for predicting the drought-induced yield loss of cotton in the Southern Plains region of the US by estimating the phase-specific drought sensitivity coefficients (λp) for the phenological phases (p) of cotton used in the yield model (Equation 2).
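Equation (2) can be evaluated in a few lines. The sketch below is hypothetical (`relative_yield` is not from the study) and uses made-up phase-mean ARID values and λp values purely for illustration; it returns Ý and the drought-induced yield-loss fraction 1 − Ý. It also shows why sign matters: a positive λp lowers Ý as ARID rises, while a negative λp raises it.

```python
def relative_yield(arid_by_phase, sensitivities):
    """Equation (2): relative yield = product over phases of (1 - ARID_p) ** lambda_p."""
    if len(arid_by_phase) != len(sensitivities):
        raise ValueError("need one ARID value per sensitivity coefficient")
    y = 1.0
    for arid_p, lam_p in zip(arid_by_phase, sensitivities):
        # A negative lambda with ARID_p == 1 would divide by zero; not handled here.
        y *= (1.0 - arid_p) ** lam_p
    return y


# Hypothetical phase-mean ARID values and sensitivities for a two-phase crop.
y_rel = relative_yield([0.2, 0.5], [0.1, 0.2])
print(f"relative yield = {y_rel:.3f}, yield-loss fraction = {1 - y_rel:.3f}")
```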
2. Materials and Methods
2.1. Obtaining the Yield Data
To calculate the relative yields of cotton, absolute lint yields (kg ha−1) under both irrigated and dryland conditions, with identical inputs except water, were needed. In the Southern Plains region, such yields were available only for Lubbock and Lamesa in northwestern Texas and for Chickasha and Fort Cobb in southwestern Oklahoma. These data (Table 1) were obtained from the Cotton Performance Trials conducted by Texas A&M AgriLife Research at Lubbock (https://lubbock.tamu.edu/programs/crops/cotton/general-production/cotton-performance-trials/) and from the USDA ARS National Cotton Variety Test website (https://www.ars.usda.gov/southeast-area/stoneville-ms/crop-genetics-research/docs/national-cotton-variety-test/). At these locations, a trial season involved up to 55 cultivars. However, because new cultivars were introduced every year, no single cultivar was grown under both dryland and irrigated conditions for a period long enough for statistical analyses. To maximize consistency in the data, minimize variability, and ensure that the data reflected standard management, the cultivars whose irrigated yield was within 95% of the highest irrigated yield at a location in a given year were selected and considered the high-yielding cultivars for that location-year. Because maximizing crop productivity is a primary objective of farmers, cotton producers are most interested in high-yielding cultivars; we therefore used such cultivars to develop the yield model. From the irrigated and unirrigated (dryland) yields of each of these cultivars, the relative yield of that cultivar for a given location-year was calculated using Equation (3).
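$$\acute{Y}_{c,L,y} = \frac{Y^{U}_{c,L,y}}{Y^{I}_{c,L,y}} \tag{3}$$
where $\acute{Y}_{c,L,y}$, $Y^{U}_{c,L,y}$, and $Y^{I}_{c,L,y}$, respectively, are the relative, unirrigated, and irrigated yields of the c-th high-yielding cultivar at the L-th location in the y-th year.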
The $\acute{Y}_{c,L,y}$ values of all the high-yielding cultivars in the y-th year at the L-th location were then averaged. This average was assumed to be the relative yield of a general, high-yielding cotton cultivar grown in the y-th year at the L-th location. Accordingly, there were 12 values of Ý for Lubbock, 7 for Lamesa, 6 for Chickasha, and 1 for Fort Cobb, totaling 26 for the study region (Table 1). These Ý values were used as the relative yields of cotton in further analyses.
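The cultivar screening and averaging described above can be sketched as follows. This is a minimal illustration, assuming "within 95% of the highest yield" means an irrigated yield of at least 0.95 times the location-year maximum; the function name, cultivar names, and yields are all hypothetical.

```python
def location_year_relative_yield(trials, threshold=0.95):
    """Mean relative yield of the high-yielding cultivars in one location-year.

    trials: dict mapping cultivar name -> (irrigated_yield, dryland_yield) in kg/ha.
    A cultivar counts as high-yielding if its irrigated yield is at least
    `threshold` times the highest irrigated yield in that location-year.
    """
    top = max(irrigated for irrigated, _ in trials.values())
    selected = sorted(name for name, (irrigated, _) in trials.items()
                      if irrigated >= threshold * top)
    # Equation (3): dryland yield divided by irrigated yield, per cultivar.
    ratios = [trials[name][1] / trials[name][0] for name in selected]
    return sum(ratios) / len(ratios), selected


# Hypothetical trial results (kg/ha) for a single location-year.
trials = {"Cultivar A": (1500.0, 600.0),
          "Cultivar B": (1450.0, 500.0),
          "Cultivar C": (1200.0, 700.0)}
mean_ry, chosen = location_year_relative_yield(trials)
print(chosen, round(mean_ry, 3))
```

Cultivar C is excluded because its irrigated yield falls below 95% of the best irrigated yield, even though its dryland yield is the highest.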
2.2. Obtaining the Weather Data
The daily weather data on precipitation, solar radiation, windspeed, and ambient temperatures (minimum, maximum, and dewpoint) needed for calculating ARID and the thermal time (used to estimate the durations of the phenological phases of cotton) for the locations and years associated with the yield data (Table 1) were obtained online from various sources, including the National Centers for Environmental Information (NCEI; https://www.ncei.noaa.gov/access/search/data-search/daily-summaries). Because measured solar radiation data were lacking, daily solar radiation for these locations and years was generated using a reliable global solar radiation model developed for the southeastern US [23]. Of the 16 models that reference [23] evaluated across 30 locations in the southeastern United States, this model performed the best, with the lowest root mean square error (RMSE) of 0.29, the highest R² of 0.85, and the highest modeling efficiency of 0.83.
Daily data on some meteorological variables, especially temperature and precipitation, were missing for some locations (weather stations) in certain years. Those gaps were filled with the corresponding data from the closest weather station. A missing value of average daily windspeed for a given day of a given year at a given location was estimated by averaging the windspeed data for that day from the other years in which such data were available. Missing daily dewpoint temperatures were estimated from the corresponding minimum temperatures.
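The windspeed gap-filling rule above can be sketched as a same-day-of-year average across the other years. The text does not say how dewpoint was derived from minimum temperature; the `estimate_dewpoint` helper below assumes the common approximation that dewpoint is close to the daily minimum temperature (the approach FAO-56 suggests when humidity data are missing), with an optional offset. Both function names and the data layout are hypothetical.

```python
def fill_windspeed(wind, year, doy):
    """Estimate a missing daily windspeed from the same day of year in other years.

    wind: dict mapping (year, day_of_year) -> mean daily windspeed, or None if missing.
    Returns the mean over other years with data, or None if no year has data.
    """
    donors = [v for (y, d), v in wind.items()
              if d == doy and y != year and v is not None]
    if not donors:
        return None  # leave the gap; nothing to borrow from
    return sum(donors) / len(donors)


def estimate_dewpoint(tmin, offset=0.0):
    """Assumed approximation: dewpoint ~ daily minimum temperature minus an optional offset."""
    return tmin - offset


# Hypothetical station record with a gap on day 150 of 2012.
wind = {(2010, 150): 5.0, (2011, 150): 3.0, (2012, 150): None, (2012, 151): 4.0}
print(fill_windspeed(wind, 2012, 150), estimate_dewpoint(18.5))
```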
Table 1. The number of seasons and years for which both dryland and irrigated cotton lint yield data were available at four locations in the Southern Plains region of the United States.
| Location | Lat., Lon. | Seasons | Years |
|---|---|---|---|
| Lubbock, TX | 33.69, −101.82 | 12 | 2009 - 2011, 2014 - 2018, 2021 - 2024 |
| Lamesa, TX | 32.74, −101.95 | 7 | 2009, 2010, 2014 - 2017, 2019 |
| Chickasha, OK | 35.05, −97.94 | 6 | 1999 - 2004 |
| Fort Cobb, OK | 35.10, −98.44 | 1 | 2021 |
2.3. Estimating Phenological Phase Durations
The sensitivity of a crop to drought stress varies across phenological phases [15]-[17]. For phenological phase-specific drought sensitivity analyses, therefore, the cotton seasons involved in the trials (Table 1) needed to be split into multiple phenological phases. Based on the limited phase duration data available for cotton in the Southern Plains region and the phases considered in the literature [24]-[26], six phenological phases for cotton were considered in this study: 1) planting-emergence; 2) emergence-pinhead square; 3) pinhead square-first bloom; 4) first bloom-peak bloom; 5) peak bloom-first open boll; and 6) first open boll-harvest.
To split the cotton-growing seasons associated with the yield data into these phases, the planting and harvesting dates corresponding to the yield data were used, and the number of days required to complete each phase was estimated. Because temperature is the key factor determining the timespan of a developmental phase, a logical approach to this estimation was to use thermal time, also known as growing degree-days (GDD). Accordingly, using the total thermal time (TTT; ˚C d) needed for each phase [24]-[26] (Table 2), the number of days taken to complete each of the six phenological phases was estimated using Equation (4).
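$$D_p = d_{p,n} - d_{p,1} + 1, \quad \text{with } d_{p,n} \text{ the first day for which } \sum_{d=d_{p,1}}^{d_{p,n}} TT_{p,d} \geq TTT_p \tag{4}$$
where $D_p$ is the duration (days) of the p-th phenological phase (p = 1, 2, …, 6); d is the d-th day of the p-th phase; $d_{p,1}$ is the first day of the p-th phase; $d_{p,n}$ is the n-th (final) day of the p-th phase; $TTT_p$ is the total thermal time needed to complete the p-th phase (Table 2); and $TT_{p,d}$ is the thermal time (˚C d) on the d-th day of the p-th phase, which, in turn, was calculated as: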
$$TT_{p,d} = \max\left(T_{avg,p,d} - T_{base},\ 0\right) \tag{5}$$
where $T_{base}$ is the base temperature below which plant development stops, assumed to be 15.56˚C (60˚F) for cotton [24]-[26]; and $T_{avg,p,d}$ is the average temperature on the d-th day of the p-th phase, which, in turn, was calculated using Equation (6).
$$T_{avg,p,d} = \frac{T_{max,p,d} + T_{min,p,d}}{2} \tag{6}$$
where $T_{max,p,d}$ and $T_{min,p,d}$ are the maximum and minimum temperatures on the d-th day of the p-th phase, respectively. Using these phase durations (Dp), each cotton season at each location was split into the six phenological phases considered above.
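Equations (4)-(6) amount to accumulating daily thermal time until each phase's TTT in Table 2 is reached, then starting the next phase. The minimal sketch below assumes daily Tmax/Tmin series (˚C) that start on the planting date. Two rules are assumptions because the text does not state them: thermal time left over on a phase's last day is not carried into the next phase, and if the series ends (for example, at harvest) before a phase is complete, that phase receives the remaining days.

```python
T_BASE = 15.56  # base temperature for cotton, ˚C (Section 2.3)
TTT = [60, 230, 225, 200, 370, 200]  # total thermal time per phase, ˚C d (Table 2)


def daily_thermal_time(tmax, tmin, t_base=T_BASE):
    t_avg = (tmax + tmin) / 2.0        # Equation (6)
    return max(t_avg - t_base, 0.0)    # Equation (5): no development below t_base


def phase_durations(tmax_series, tmin_series, ttt=TTT):
    """Split a season (daily series starting on the planting date) into phases.

    Returns the duration in days of each phase, in order. Thermal time left
    over on a phase's final day is not carried into the next phase. If the
    series ends before a phase reaches its TTT, that phase gets the remaining
    days and later phases are omitted.
    """
    durations = []
    day = 0
    for target in ttt:
        start, accumulated = day, 0.0
        while day < len(tmax_series) and accumulated < target:
            accumulated += daily_thermal_time(tmax_series[day], tmin_series[day])
            day += 1
        durations.append(day - start)  # Equation (4): days needed to reach TTT_p
        if accumulated < target:
            break
    return durations


# Hypothetical constant weather: Tmax 35 ˚C, Tmin 25 ˚C for 120 days.
print(phase_durations([35.0] * 120, [25.0] * 120))
```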
2.4. Computing Phenological Phasic Values of ARID
The daily ARID values for each cotton growing season (year) at each location with both dryland and irrigated yield data (Table 1) were computed from the daily weather data, using the ARID equations described by [18] and the computational procedure (MATLAB program) provided by [16]. The daily ARID values for each location-year were then averaged for each phenological phase. Consequently, there were 72 phasic ARID values for Lubbock (12 seasons × 6 phases), 42 for Lamesa (7 × 6), 36 for Chickasha (6 × 6), and 6 for Fort Cobb (1 × 6). Finally, the phasic ARID values were converted into the corresponding phasic values of "1 – ARID", which were then used in the yield model (Equation 2).
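Averaging daily ARID by phase and converting to "1 – ARID" can be sketched as below. The function name is hypothetical, and it assumes the phase durations come from Equation (4) and the daily series starts on the planting date.

```python
def phasic_one_minus_arid(daily_arid, durations):
    """Average daily ARID within each phase, then return 1 - mean ARID per phase.

    daily_arid: daily ARID values for the season, starting at planting.
    durations: phase lengths in days (e.g., from Equation 4), in order.
    """
    if sum(durations) > len(daily_arid):
        raise ValueError("phase durations exceed the length of the ARID series")
    values, start = [], 0
    for d in durations:
        chunk = daily_arid[start:start + d]
        if not chunk:
            raise ValueError("each phase must contain at least one day")
        values.append(1.0 - sum(chunk) / len(chunk))
        start += d
    return values


# Hypothetical five-day season split into a 2-day and a 3-day phase.
print(phasic_one_minus_arid([0.1, 0.3, 0.5, 0.5, 0.8], [2, 3]))
```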
2.5. Developing the Yield Model
Once the phasic values of “1 – ARID” were calculated as explained above and the relative yields of cotton were calculated using Equation (3) for each year at each location, a dataset matrix of 12 rows (years) and 7 columns (yield + phases) was prepared for Lubbock. Accordingly, the matrices of 7 rows × 7 columns for Lamesa, 6 rows × 7 columns for Chickasha, and 1 row × 7 columns for Fort Cobb were created. For developing the yield model for the Southern Plains region, these four matrices were combined to produce a single matrix of 26 rows and 7 columns. The first column in the matrix contained relative yields (output variable), and the other columns contained the corresponding phasic values of “1 – ARID” for the six phases (input variables).
Developing the yield model required estimating the phenological phase-specific drought sensitivity coefficients (λp), which in turn required regressing the linearized form of Equation (2), given as Equation (7). Accordingly, all the values in the matrix prepared for the region were converted into natural logarithms (ln), and this transformed matrix was used in the R software to estimate the λp values through multiple linear regression.
Table 2. Total thermal time (TTT) needed to complete each phenological phase of cotton in the Southern Plains region of the United States.
| Phenological phase | p | TTT (˚C d) |
|---|---|---|
| Planting to emergence | 1 | 60 |
| Emergence to pinhead square | 2 | 230 |
| Pinhead square to first bloom | 3 | 225 |
| First bloom to peak bloom | 4 | 200 |
| Peak bloom to first open boll | 5 | 370 |
| First open boll to harvest | 6 | 200 |
The linearized form of Equation (2) is as follows (Equation 7).
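$$\ln\left(\acute{Y}_{L,y}\right) = \sum_{p=1}^{6} \lambda_p \ln\left(1 - \mathrm{ARID}_{L,y,p}\right) \tag{7}$$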
where Ý is the relative yield of cotton; and the subscripts L, y, and p stand for the L-th location, the y-th year, and the p-th phenological phase of cotton, respectively.
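The study ran this regression in R. As a dependency-free stand-in, the sketch below fits ln Ý on the ln(1 − ARIDp) columns by solving the ordinary least squares normal equations directly. Table 3 reports an intercept, so the sketch fits one by default (`intercept=True`); how that term maps back into Equation (2) is left as an assumption. All names and the synthetic data are hypothetical, and every input must be strictly positive for the logarithms to exist.

```python
import math


def solve_linear(a, b):
    """Solve the small square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_sensitivities(rel_yields, one_minus_arid, intercept=True):
    """OLS fit of ln(Y) on ln(1 - ARID_p), as in Equation (7).

    rel_yields: relative yields, one per location-year (row).
    one_minus_arid: list of rows, each holding the phasic 1 - ARID values.
    Returns [intercept, lambda_1, ..., lambda_P] if intercept else [lambda_1, ...].
    """
    rows = [[math.log(v) for v in row] for row in one_minus_arid]
    if intercept:
        rows = [[1.0] + row for row in rows]
    y = [math.log(v) for v in rel_yields]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


# Synthetic, noise-free data generated from Y = x1**0.2 * x2**-0.1 (hypothetical).
X = [[0.5, 0.9], [0.8, 0.4], [0.3, 0.6], [0.9, 0.7]]
Y = [x1 ** 0.2 * x2 ** -0.1 for x1, x2 in X]
print(fit_sensitivities(Y, X))  # approximately [0.0, 0.2, -0.1]
```

Solving the normal equations is adequate for a handful of predictors; for larger or ill-conditioned problems a QR-based solver (as R's `lm` uses) is the safer choice.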
2.6. Evaluating the Yield Model
Evaluating the performance of the cotton yield model requires independent datasets for model development and model evaluation. However, splitting the available data into two fixed subsets would further reduce the size of the already small model development set. Given the limited number of years available, the leave-one-out technique of cross-validation was used instead. Following this technique, the available dataset (the transformed matrix) was divided into two parts: one for model development and the other for evaluation. That is, of the 26 input-output combinations (rows), the first 25 were used as the model development set for estimating the λp values through the regression of Equation (7), and the remaining one was used as the evaluation set, for which the yield was estimated using the just-estimated λp values in Equation (2). The development and evaluation sets were then shifted forward one row at a time, 25 times, so that each combination was left out exactly once. Each shift created a new development set and a new evaluation set, which produced a new set of λp values and, finally, a new yield estimate. This process thus provided 26 yield estimates. Finally, using the mean absolute error, the root mean square error (RMSE), the Nash-Sutcliffe Index [27], and the Willmott Index [28] as measures of fit, the relative yields estimated using Equation (2) were compared with the corresponding observed relative yields to evaluate the performance of the cotton yield model developed for the study region.
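The leave-one-out loop can be sketched as follows. To keep the example short and self-contained, it is simplified to a single phase (P = 1) fitted through the origin, so each fold estimates one λ; the study's version regresses all six phases. Holding out each row in turn produces the same development/evaluation splits as the shifting window described above. Function names and data are hypothetical.

```python
import math


def fit_single_lambda(pairs):
    """Through-origin least squares for ln(Y) = lam * ln(1 - ARID), a one-phase Equation (7)."""
    sxy = sum(math.log(x) * math.log(y) for x, y in pairs)
    sxx = sum(math.log(x) ** 2 for x, _ in pairs)
    return sxy / sxx


def leave_one_out(pairs):
    """pairs: list of (one_minus_arid, relative_yield). Returns one prediction per row."""
    predictions = []
    for i in range(len(pairs)):
        development = pairs[:i] + pairs[i + 1:]   # every row except the held-out one
        lam = fit_single_lambda(development)
        held_out_x = pairs[i][0]
        predictions.append(held_out_x ** lam)      # Equation (2) with P = 1
    return predictions


# Hypothetical noise-free data generated from Y = (1 - ARID) ** 0.5.
data = [(x, x ** 0.5) for x in (0.5, 0.3, 0.8, 0.6)]
print(leave_one_out(data))
```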
3. Results and Discussion
3.1. The Cotton Yield Model
Table 3 shows the phenological phase-specific drought sensitivity coefficients estimated for the cotton yield model for the Southern Plains region of the US. The use of these coefficients in Equation (2) resulted in the relative yield model for cotton for this region (Equation 8).
(8)
where Ý is the relative yield of cotton, and the subscripts PE, ES, SF, FP, PO, and OH denote the planting-emergence, emergence-pinhead square, pinhead square-first bloom, first bloom-peak bloom, peak bloom-first open boll, and first open boll-harvest ready phases, respectively. Note that this yield model is specific to high-yielding cultivars. Because cultivars vary in drought sensitivity, the drought sensitivity coefficients for other types of cultivars, if estimated, could differ substantially from those estimated here, so the model might not be applicable to other varieties.
The values of all sensitivity coefficients except λ2 were positive, indicating that drought stress occurring during most phenological phases of cotton reduces yields. In contrast, the negative value of λ2 suggests that water stress during the period from emergence through the pinhead square stage tends to increase cotton yields. These results agree with those of [22], who carried out a similar study using data from several places in the southeastern US and also found that the drought sensitivity coefficients for the second month of the growing season, during which the pre-flowering period of cotton occurs [29], were negative, whereas those for the other growth periods were positive.
Of all the phenological phases of cotton studied, the pinhead square-first bloom phase had the largest drought sensitivity coefficient (λ3 = 0.162), indicating that this phase is the most sensitive to water stress. The literature also shows that water stress during this phase is the most detrimental of all phases of cotton. Through their reviews, [30] [31] demonstrated that the cotton crop was most sensitive to drought stress following seed germination and seedling establishment. In a technical report prepared to provide cotton producers in the southeastern US with key concepts of water management for cotton, [13] stated that the period from first square to first bloom is a critical time for avoiding severe water stress, because vegetative and root growth are very rapid and the number of potential fruiting sites is determined during this period; severe water stress during this period is especially damaging in short-season environments [13]. Reference [12] observed that water supply during the period from square initiation to first flower had the highest correlation with lint yield and was strongly correlated with boll number and boll size (lint per boll). References [30] and [31] also demonstrated that pre-flowering is a critical period for yield determination, because fiber production depends on the number of ovules in a boll, which is determined around 15 to 25 days before anthesis. In experiments conducted in Lubbock and New Deal in the Texas High Plains region, reference [32] also found that the early flowering stage was the most sensitive to drought stress and that drought at this stage produced the lowest yields; accordingly, reference [32] suggested that, when a drought event occurs, cotton producers prioritize preserving water resources for this critical growth stage. In an experiment conducted in Halfway, Texas, [33] concluded that the flower initiation to early bloom and peak bloom stages were the most sensitive to water stress.
Table 3. The phenological phase-specific drought sensitivity coefficient (λp) values for the cotton relative yield model for the Southern Plains region of the United States.
| Phenological phase (p) | Coefficient (λp) | Value |
|---|---|---|
| | Intercept | 0.637 |
| Planting to emergence (p = 1) | λ1 | 0.013 |
| Emergence to pinhead square (p = 2) | λ2 | −0.111 |
| Pinhead square to first bloom (p = 3) | λ3 | 0.162 |
| First bloom to peak bloom (p = 4) | λ4 | 0.092 |
| Peak bloom to first open boll (p = 5) | λ5 | 0.057 |
| First open boll to harvest (p = 6) | λ6 | 0.082 |
The first bloom-peak bloom phase (λ4 = 0.092) was the second most drought-sensitive phase, after the pinhead square-first bloom phase. Previous researchers observed similar results. Reference [13] showed that water stress during the first flower to peak bloom stages would reduce the number of fruiting sites initiated, and that severe water stress during this phase could also reduce boll number through the shedding of young bolls, resulting in substantial yield loss. Reference [31], through a review, demonstrated that water deficit stress during flowering could reduce cotton yields significantly. In an experiment conducted in Apodi, Brazil, reference [34] observed the lowest yields when water supply was suppressed during the first flower to peak bloom stages.
The sensitivity coefficient values indicated that the planting-emergence, peak bloom-first open boll, and first open boll-harvest ready phases were less sensitive to water stress. These results are in line with those of previous studies. Reference [13] reported that cotton uses little water from planting to emergence, and that water stress during the peak bloom-first open boll phase is less critical than during the squaring and early flowering stages, when water stress can cause the shedding of squares and young bolls. Water is nonetheless critical for germination, so reference [13] recommended that cotton growers irrigate before planting. In an experiment conducted in Apodi, Brazil, [34] found that the reduction in cotton yield was not significant when water stress was imposed after the first open boll stage. Reference [33], in a study carried out in Halfway, Texas, observed that the cutout, late bloom, and boll opening growth stages were less sensitive to water stress. Through a review, reference [31] demonstrated that cotton bolls were less sensitive to water stress than the leaves because the bolls are highly resistant to water loss.
As the negative value of λ2 indicates, water stress during the emergence-pinhead square phase would positively affect cotton yields. This implication is consistent with the findings of various previous studies. Unless the soil water deficit is extremely severe, irrigation at this time contributes little to cotton yield because the water demand during this phase is low [34]. Moreover, a mild water deficit early in the season can stimulate root production in cotton, especially deeper root systems, by slowing vegetative growth [13]. A mild water stress prior to square initiation also stimulates flower production later [35]. In a study conducted in Phoenix, Arizona, reference [36] observed that a moderate water stress early in the season could be beneficial to cotton plants. Reference [12] also found that the water supply from planting to square initiation was negatively correlated with cotton yield. In a study conducted in Punjab, India, reference [37] observed that water stress during the pre-flowering period of some cotton cultivars increased the number of flowers and bolls per plant and boll size, thus increasing yields. In summary, the physiologically reasonable values of the drought sensitivity coefficients (Table 3) indicate that the cotton yield model (Equation 8) expresses the ARID-cotton yield relationship for the Southern Plains region well.
3.2. The Yield Model Performance
Table 4 shows the values of the goodness-of-fit measures used to evaluate the performance of the cotton yield model for the Southern Plains region of the US. The RMSE was 0.09 (dryland yield per unit of irrigated yield). The overall percentage error of the yield model, computed as the ratio of RMSE to the mean observed relative yield, was about 22%. The Nash-Sutcliffe Index was 0.45, and the Willmott Index was 0.83.
These measures, as well as Figure 1, indicate that the cotton yield model developed for the Southern Plains region performed reasonably well at predicting the relative yield of cotton and thus the fraction of yield loss caused by drought. The overall percentage error and the mean absolute error of the ARID-based yield model were relatively small. The Nash-Sutcliffe Index, which compares the residual variance of the model-predicted values with the variance of the measured data, suggested that the agreement between the observed data and the model predictions was satisfactory and thus that the predictive power of the model was relatively good. Moreover, the positive Nash-Sutcliffe Index showed that the model predictions were more accurate than the mean of the observed data. The Willmott Index, which measures the degree to which the model-estimated values approach the observed data, indicated that the relative yields predicted by the model agreed fairly closely with those calculated from the observed data. The mean predicted and observed relative yields were about the same (0.41). If the percentage error of the model were instead computed as the absolute difference between the predicted and observed values relative to the observed value, the mean error would be about 20%, also a relatively small error. The predicted yields ranged from 0.13 to 0.60 and the observed yields from 0.14 to 0.61; the width of the predicted range (0.469) relative to that of the observed range (0.473) was about 0.99, indicating that the modeling error based on this statistic was about 1%.
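The goodness-of-fit measures in Table 4 can be computed from paired observed and predicted relative yields as sketched below. The function name is hypothetical; the Nash-Sutcliffe and Willmott formulas are the standard forms (d = 1 − Σ(P − O)² / Σ(|P − Ō| + |O − Ō|)²), and the percentage error follows the definition used here (RMSE relative to the mean observed value).

```python
import math


def fit_metrics(observed, predicted):
    """MAE, RMSE, Nash-Sutcliffe efficiency, Willmott's d, and RMSE as % of the observed mean."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("need equal-length, non-empty sequences")
    mean_obs = sum(observed) / n
    errors = [p - o for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sse / n)
    # Nash-Sutcliffe: 1 - residual sum of squares / variance of observations (as sum of squares)
    nse = 1.0 - sse / sum((o - mean_obs) ** 2 for o in observed)
    # Willmott index of agreement d
    potential = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
                    for o, p in zip(observed, predicted))
    willmott = 1.0 - sse / potential
    return {"MAE": mae, "RMSE": rmse, "NSE": nse, "Willmott": willmott,
            "PercentError": 100.0 * rmse / mean_obs}


# Hypothetical observed vs. predicted relative yields.
print(fit_metrics([0.30, 0.45, 0.52, 0.20], [0.34, 0.40, 0.55, 0.25]))
```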
Figure 1. The model-predicted vs. observed values of the relative yield (dryland yield per unit of irrigated yield) of cotton in the Southern Plains region of the USA (1999-2024).
A total of 26 location-years were involved in this study. As depicted in Figure 1, the absolute departure of the predicted relative yield from the observed one ranged from 0.00 to 0.15, implying a prediction error of up to 15 percentage points of relative yield. There was no association between this departure (prediction error) and location or year.
It is important to note that the impacts of drought on the cotton yields in these data were influenced by several factors, such as the specific cotton variety, management, environmental conditions, the duration and intensity of the drought events, and the growth stages at which the stress was imposed. Despite large uncertainties in the data on crop management, cultivars, soil, and weather, the ARID-based cotton yield model estimated the overall effect of drought on the relative yield of cotton in the Southern Plains region reasonably well, accurately reflecting how water stress decreases cotton yields in this region.
Table 4. Values of the various metrics used to evaluate the performance of the cotton yield model developed for the Southern Plains region of the United States.
| Metric | Value |
|---|---|
| Mean observed relative yield | 0.411 |
| Mean predicted relative yield | 0.405 |
| Willmott Index | 0.83 |
| Nash-Sutcliffe Index | 0.45 |
| Mean absolute error | 0.08 |
| Root mean square error | 0.09 |
| Percentage error (%) | 22 |
4. Conclusions
This study developed a yield model for predicting the drought-induced yield loss of cotton in the Southern Plains region of the United States using an agricultural drought index, the Agricultural Reference Index for Drought (ARID). The model accounts for the phenological phase-specific sensitivity of cotton yields to drought stress. The results showed that, of all the phenological phases of cotton studied, the pinhead square-first bloom phase was the most sensitive to drought, whereas water stress during the emergence-pinhead square phase positively affected cotton yields. The reasonable values of the model's sensitivity parameters indicated that the model could express the relationship between ARID and cotton yields accurately. The model reflected how water stress decreases cotton yields in this region and estimated the yield losses due to drought reasonably well. It can help minimize the effects of drought on cotton yields by supporting the adoption of necessary mitigation measures and the scheduling of irrigation allocation toward the phenological phases that are more sensitive to drought stress.
In the southern United States, including the Southern Plains region, the inter-annual variability of climate has been linked to an ocean-atmosphere phenomenon called the El Niño-Southern Oscillation (ENSO) [38] [39], which has significantly affected crop production in this region [40]-[43]. ENSO is strongly connected with weather patterns in this region, and an ENSO phase (El Niño, La Niña, or Neutral) may be successfully forecast up to a year in advance [44]. Therefore, by using in the yield model (Equation 8) phenological phase-specific ARID values computed from long-term historical weather data for years of a given ENSO phase, stakeholders in this region, including cotton growers and extension agents, can estimate in advance the drought-induced yield loss of cotton for an upcoming growing season in which that ENSO phase is anticipated.
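The ENSO-based outlook described above can be sketched as filtering historical seasons by ENSO phase, applying Equation (2) to each season's phasic "1 – ARID" values, and averaging. Averaging the per-season relative yields (rather than averaging ARID first) is an assumption of this sketch, as are the function names; the history and coefficient below are hypothetical, not the Table 3 values.

```python
import math


def relative_yield(one_minus_arid, lambdas):
    """Equation (2) written in terms of phasic 1 - ARID values."""
    return math.prod(x ** lam for x, lam in zip(one_minus_arid, lambdas))


def enso_outlook(history, forecast_phase, lambdas):
    """Mean relative yield and yield-loss fraction over past seasons of one ENSO phase.

    history: list of (enso_phase, phasic 1 - ARID values) for historical seasons,
             e.g., computed from long-term weather records.
    """
    yields = [relative_yield(x, lambdas) for phase, x in history if phase == forecast_phase]
    if not yields:
        raise ValueError(f"no historical seasons labelled {forecast_phase!r}")
    mean_y = sum(yields) / len(yields)
    return mean_y, 1.0 - mean_y


# Hypothetical single-phase history and coefficient, for illustration only.
history = [("La Niña", [0.25]), ("La Niña", [0.64]), ("El Niño", [0.81])]
print(enso_outlook(history, "La Niña", [0.5]))
```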