Development and Validation of a Machine Learning Model for Predicting Postoperative Nausea and Vomiting in Gynecological Day Surgery
1. Introduction
Postoperative nausea and vomiting (PONV) is one of the most common complications after gynecological surgery, with a reported incidence of 20%-80% [1]. PONV not only causes postoperative discomfort but may also prolong hospital stay, increase medical costs, and even reduce patients' acceptance of day surgery [2]. With the adoption of Enhanced Recovery After Surgery (ERAS) principles, the number of gynecological day surgeries has increased year by year, and effective prevention and management of PONV has become a clinical priority [3]. Traditional PONV risk assessment relies mainly on tools such as the Apfel score, whose predictive performance is limited and which does not fully account for individual risk factors [4]. In recent years, machine learning (ML) has been increasingly applied in medicine, offering new approaches to disease prediction and clinical decision-making by uncovering latent patterns in complex data [5]. However, existing ML-based PONV prediction models have mostly focused on general anesthesia, and few studies have addressed gynecological hysteroscopic day surgery [6]. This study therefore aimed to develop and validate an ML-based model for predicting PONV after gynecological hysteroscopic day surgery, in order to provide a more accurate risk assessment tool for clinical practice, optimize perioperative management, and promote rapid patient recovery.
2. Materials and Methods
2.1. Data Preparation and Preprocessing
Data collection. Clinical data of 478 patients who underwent gynecological hysteroscopic day surgery at the Affiliated Hospital of Youjiang University for Nationalities between January 2022 and December 2024 were collected retrospectively from the hospital information system (electronic medical records). Sixteen variables were recorded (Table 1): ① demographic and baseline characteristics: age, body mass index (BMI), ASA classification, history of PONV, history of motion sickness, etc.; ② anesthesia-related data: anesthesia method, opioid dose, use of inhalational anesthetics, intraoperative fluid volume, etc.; ③ surgery-related data: operation time, intraoperative blood loss, type of surgery, etc.; ④ postoperative data: PONV occurrence (yes/no), postoperative pain score, time to first oral intake, etc.; ⑤ ERAS-related indicators: preoperative anxiety score, time to first ambulation, etc. Inclusion criteria: ① elective gynecological day surgery; ② ASA grade I-II; ③ age 18-65 years. Exclusion criteria: ① use of antiemetic drugs within 24 hours before surgery; ② pregnancy or lactation; ③ more than 10% of data missing.
Table 1. Comparison of baseline characteristics between the training set and the test set (gynecological hysteroscopic day surgery).
Variable | Training set (n = 319) | Test set (n = 159) | P-value
Age (years, mean ± SD) | 35.2 ± 10.1 | 36.5 ± 11.3 | 0.214
BMI (kg/m², mean ± SD) | 24.8 ± 3.2 | 25.1 ± 3.5 | 0.432
ASA classification (%) | Grade I: 32%, Grade II: 58%, Grade III: 10% | Grade I: 30%, Grade II: 60%, Grade III: 10% | 0.781
PONV history (%) | Yes: 38%, No: 62% | Yes: 36%, No: 64% | 0.654
Motion sickness history (%) | Yes: 25%, No: 75% | Yes: 28%, No: 72% | 0.509
Anesthesia type (%) | General: 65%, Neuraxial: 35% | General: 62%, Neuraxial: 38% | 0.602
Opioid dosage (mg, mean ± SD) | 145 ± 85 | 152 ± 90 | 0.327
Inhalational anesthetics (%) | Used: 78%, Not used: 22% | Used: 75%, Not used: 25% | 0.413
Intraoperative fluids (mL, mean ± SD) | 1423 ± 312 | 1456 ± 298 | 0.285
Surgery duration (min, mean ± SD) | 89 ± 35 | 92 ± 38 | 0.398
Intraoperative blood loss (mL, mean ± SD) | 48 ± 22 | 51 ± 25 | 0.176
Surgery type (%) | Polypectomy: 30%, Myomectomy: 50%, Adhesiolysis: 20% | Polypectomy: 28%, Myomectomy: 52%, Adhesiolysis: 20% | 0.845
Postoperative pain score (points, mean ± SD) | 8.2 ± 2.5 | 8.5 ± 2.6 | 0.319
Time to first oral intake (h, mean ± SD) | 15.3 ± 5.1 | 15.8 ± 5.3 | 0.462
Preoperative anxiety score (points, mean ± SD) | 12.4 ± 4.2 | 12.7 ± 4.5 | 0.589
Time to first ambulation (h, mean ± SD) | 7.8 ± 2.9 | 8.1 ± 3.1 | 0.247
PONV incidence (%) | Occurred: 45%, Not occurred: 55% | Occurred: 43%, Not occurred: 57% | 0.723
Data preprocessing. ① Missing values: missing values were filled in by multiple imputation to preserve data integrity. ② Standardization: continuous variables (such as age and operation time) were standardized to remove the effect of differing units. ③ Encoding of categorical variables: categorical variables (such as ASA grade and surgical type) were converted to numerical form (e.g., one-hot encoding). ④ Data splitting: using computer-generated random numbers, the dataset was randomly divided in a 7:3 ratio into a training set (n = 319) and a test set (n = 159), keeping model training and evaluation independent.
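Preprocessing steps ② to ④ can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the study's code: the function names, the seed, and the choice to fit the standardization parameters on the training rows only (so no information leaks from the test set) are made here; multiple imputation (step ①) is omitted.

```python
import random
import statistics


def fit_scaler(train_values):
    """Learn mean and sample SD from the training rows only."""
    return statistics.mean(train_values), statistics.stdev(train_values)


def apply_scaler(values, mu, sd):
    """Z-score a continuous variable with previously fitted parameters."""
    return [(v - mu) / sd for v in values]


def one_hot(values):
    """Encode a categorical variable as 0/1 indicator columns, one per level."""
    levels = sorted(set(values))
    return levels, [[1 if v == level else 0 for level in levels] for v in values]


def split_train_test(n, train_fraction=0.7, seed=2024):
    """Shuffle row indices with a fixed seed and cut them at train_fraction."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_fraction)
    return sorted(idx[:cut]), sorted(idx[cut:])


# Hypothetical usage on toy values
levels, encoded = one_hot(["Polypectomy", "Myomectomy", "Adhesiolysis", "Myomectomy"])
train_idx, test_idx = split_train_test(10)
ages = [35, 42, 28, 51, 39, 47, 30, 33, 58, 44]
mu, sd = fit_scaler([ages[i] for i in train_idx])
scaled_test_ages = apply_scaler([ages[i] for i in test_idx], mu, sd)
```

Fitting the scaler on the training split and reusing its mean and SD for the test split mirrors how a deployed model would see new patients.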
2.2. Feature Selection
All investigators participating in this study received standardized training. Training-set and test-set data were checked and entered by the same group of staff, and medical records with abnormal, missing, or duplicated entries were promptly flagged and verified to ensure accuracy. Predictor variables were screened by LASSO (least absolute shrinkage and selection operator) regression: the optimal regularization parameter was determined by 10-fold cross-validation, performance was evaluated on the test set, and variables with non-zero coefficients were retained as model input features.
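LASSO screening with a cross-validated λ can be illustrated with the sketch below. It is a simplified stand-in for the glmnet analysis, not a reproduction of it: it minimizes the squared-error LASSO objective (1/2n)‖y − Xβ‖² + λ‖β‖₁ by cyclic coordinate descent with soft-thresholding, assumes standardized predictors and a centered outcome, and picks the λ with the lowest cross-validated MSE (the quantity plotted in Figure 1). For a binary outcome such as PONV, glmnet would normally use the logistic deviance instead; all function names here are hypothetical.

```python
import random


def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200, tol=1e-8):
    """Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1.
    Assumes standardized columns of X and a centered y (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, starting from beta = 0
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # inner product of feature j with the partial residual (its own term added back)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def cv_mse(X, y, lam, k=10, seed=0):
    """Mean squared prediction error of the LASSO fit over k folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    sq_errors = []
    for f in range(k):
        fold = idx[f::k]
        held = set(fold)
        train = [i for i in idx if i not in held]
        beta = lasso_cd([X[i] for i in train], [y[i] for i in train], lam)
        for i in fold:
            pred = sum(b * x for b, x in zip(beta, X[i]))
            sq_errors.append((y[i] - pred) ** 2)
    return sum(sq_errors) / len(sq_errors)


def select_lambda(X, y, lambdas, k=10):
    """Return the lambda with the lowest cross-validated MSE, plus all scores."""
    scores = {lam: cv_mse(X, y, lam, k) for lam in lambdas}
    return min(scores, key=scores.get), scores
```

The retained predictors are then the indices with non-zero coefficients at the chosen λ, e.g. `[j for j, b in enumerate(lasso_cd(X, y, best)) if b != 0.0]`.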
2.3. Model Construction
Patients were randomly divided into a training set (n = 319) and a test set (n = 159) in a 7:3 ratio using computer-generated random numbers. LASSO regression was performed in R (glmnet 4.1.2) to screen feature variables. Six prediction models were then built: XGBoost, LightGBM, AdaBoost, logistic regression, KNN, and GNB (Table 2). Each model was trained and tested (10 repetitions), the performance metrics of the models on the training and test sets were compared, and the optimal model was selected. SHapley Additive exPlanations (SHAP) were computed in Python (shap 0.39.0) to visualize the importance and contribution of each feature. Receiver operating characteristic (ROC) curves, calibration curves, and precision-recall (PR) curves were drawn in Python (sklearn 0.22.1) to evaluate the discrimination, calibration, and overall performance of the models. Decision curve analysis (DCA) was performed in R (rmda 1.6) to evaluate clinical utility. The optimal model was further assessed with 10-fold cross-validation on the training set and evaluated on the test set, and learning curves were drawn for the training and validation sets to assess model fit and stability.
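Decision curve analysis rests on the net benefit at a threshold probability p_t: NB = TP/n − (FP/n) · p_t/(1 − p_t), compared against treating every patient and treating none. The sketch below computes these quantities directly from outcomes (coded 0/1) and predicted probabilities; it illustrates the standard formula, not the rmda package's implementation, and the function names are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(y_true)
    treated = [(y, p) for y, p in zip(y_true, y_prob) if p >= threshold]
    tp = sum(1 for y, _ in treated if y == 1)
    fp = len(treated) - tp
    # false positives are weighted by the odds of the threshold probability
    return tp / n - fp / n * threshold / (1 - threshold)


def treat_all_net_benefit(y_true, threshold):
    """Reference strategy: give prophylaxis to everyone."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def decision_curve(y_true, y_prob, thresholds):
    """Rows of (threshold, model NB, treat-all NB, treat-none NB = 0)."""
    return [
        (t, net_benefit(y_true, y_prob, t), treat_all_net_benefit(y_true, t), 0.0)
        for t in thresholds
    ]
```

A model is clinically useful over the range of thresholds where its curve lies above both reference strategies.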
Table 2. AUC of the six models for predicting PONV after gynecological day surgery.
Model | AUC, training set | AUC, test set
XGBoost | 0.911 | 0.909
LightGBM | 0.909 | 0.908
AdaBoost | 0.899 | 0.904
Logistic regression | 0.908 | 0.894
KNN | 0.852 | 0.865
GNB | 0.899 | 0.866
Note: AUC, area under the ROC curve; XGBoost, eXtreme Gradient Boosting; LightGBM, Light Gradient Boosting Machine; AdaBoost, Adaptive Boosting; KNN, k-nearest neighbors; GNB, Gaussian naive Bayes.
2.4. Model Validation and Performance Evaluation
Model ensembling (stacking): the predictions of multiple base models were used as inputs to train a meta-model (such as logistic regression) to improve predictive performance.
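The stacking idea can be sketched as below. To stay dependency-free, this hypothetical example uses small logistic regressions on different feature subsets as stand-ins for the heterogeneous base models (XGBoost, LightGBM, and so on) and a logistic regression as the meta-model. The key step is that the meta-model is trained on out-of-fold predictions, so it never sees a base-model prediction for a row that base model was trained on; the fold count, learning rate, and number of epochs are arbitrary assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def take(X, rows, cols):
    """Sub-matrix of X restricted to the given rows and columns."""
    return [[X[i][c] for c in cols] for i in rows]


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient descent on mean log-loss; returns [bias, w1, ..., wp]."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict_logistic(w, X):
    return [sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) for xi in X]


def out_of_fold(X, y, feature_sets, k=5, seed=0):
    """Meta-features: each base model's prediction for a row comes from a
    model trained on the other folds, never on that row."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    oof = [[0.0] * len(feature_sets) for _ in X]
    for f in range(k):
        held = idx[f::k]
        held_set = set(held)
        train = [i for i in idx if i not in held_set]
        for m, cols in enumerate(feature_sets):
            w = fit_logistic(take(X, train, cols), [y[i] for i in train])
            for i, prob in zip(held, predict_logistic(w, take(X, held, cols))):
                oof[i][m] = prob
    return oof


def fit_stack(X, y, feature_sets, k=5):
    """Refit base models on all rows; fit the meta-model on out-of-fold predictions."""
    rows = range(len(X))
    base = [fit_logistic(take(X, rows, cols), y) for cols in feature_sets]
    meta = fit_logistic(out_of_fold(X, y, feature_sets, k), y)
    return base, meta


def predict_stack(base, meta, feature_sets, X):
    rows = range(len(X))
    per_model = [predict_logistic(w, take(X, rows, cols)) for w, cols in zip(base, feature_sets)]
    meta_X = [list(col) for col in zip(*per_model)]  # one row of base predictions per patient
    return predict_logistic(meta, meta_X)
```

Training the meta-model on in-sample base predictions instead would reward base models that overfit, which is the failure the out-of-fold step avoids.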
2.5. Statistical Analysis
Statistical analysis was performed with R 4.2.0. Measurement data are expressed as mean ± standard deviation, categorical variables were compared using the χ² test, and P < 0.05 was considered statistically significant.
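For a 2 × 2 table the Pearson χ² test reduces to a few lines. The sketch below, with hypothetical function names, computes the statistic without continuity correction and converts it to a P-value using the 1-degree-of-freedom identity P(χ² > x) = erfc(√(x/2)); the usage line applies it to the PONV counts reported in the Results (144 of 319 in the training set, 68 of 159 in the test set).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]]; returns (statistic, P-value with 1 degree of freedom)."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))


# Rows: training set, test set; columns: PONV, no PONV
stat, p_value = chi_square_2x2(144, 319 - 144, 68, 159 - 68)
```

The identity holds because a χ² variable with one degree of freedom is the square of a standard normal variable.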
3. Results
Baseline Information
1) Comparison of baseline data between the training and test sets. Of the 478 patients, 319 were in the training set and 159 in the test set. The overall incidence of PONV was 44% (212/478), with 144 cases in the training set and 68 in the test set. There were no statistically significant differences in baseline characteristics between the two sets (all P > 0.05; Table 1).
Figure 1. LASSO regression cross-validation curve. The horizontal axis is the logarithm of the regularization parameter, log(λ), and the vertical axis is the mean squared error (MSE). The optimal λ was determined by 10-fold cross-validation. As λ increases, model complexity decreases; the MSE first falls to a minimum and then rises. At the selected value λ = 0.021, the model avoids overfitting while maintaining predictive accuracy, providing the basis for subsequent feature screening.
Screening results: LASSO regression finally retained 7 predictors (Figure 1, Figure 2): age (Age), opioid dose (Opioid Dose), delayed early postoperative oral intake, preoperative anxiety score (Preop Anxiety Score), operation time (Surgery Time), blood loss (Blood Loss), and intraoperative fluid volume (Fluid Volume) (Figure 3). SHAP values were used to quantify each feature's contribution to the model's predictions; for example, intraoperative opioid dose (mean SHAP contribution 0.32) and delayed early postoperative oral intake (0.18) were identified as key predictors (Figure 4).
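SHAP values are Shapley values from cooperative game theory: a feature's value is its weighted average marginal contribution over all coalitions of the other features, and the values for one patient sum to the difference between that patient's prediction and a baseline prediction. The brute-force sketch below makes that definition concrete for a toy function. It is exponential in the number of features, replaces absent features with a single baseline vector (one of several conventions), and is not how the shap package computes values for tree ensembles. The ranking helper then computes the "mean absolute SHAP value" importance used in Figures 3 and 4. The model and feature names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one instance. Features outside a coalition
    are set to their baseline value (a single-reference convention)."""
    p = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(p)]
        return model(z)

    phi = [0.0] * p
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            # Shapley weight |S|! (p - |S| - 1)! / p!
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi


def mean_abs_ranking(model, X, baseline, names):
    """Global importance: mean absolute Shapley value per feature, sorted descending."""
    totals = [0.0] * len(names)
    for x in X:
        for j, v in enumerate(shapley_values(model, x, baseline)):
            totals[j] += abs(v)
    means = [t / len(X) for t in totals]
    return sorted(zip(names, means), key=lambda item: item[1], reverse=True)


# Hypothetical toy model with an interaction term between its two inputs
def toy_model(z):
    return 2 * z[0] + z[1] + z[0] * z[1]
```

For `toy_model` at x = [1, 1] with baseline [0, 0], the interaction term is split evenly, giving Shapley values 2.5 and 1.5, which sum to the prediction difference of 4.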
Figure 2. Coefficient shrinkage paths of the LASSO regression variables. The horizontal axis is the logarithm of the regularization parameter λ, and the vertical axis is the standardized coefficient value. As λ increases, the coefficients of non-critical variables are shrunk to zero, and seven predictors were finally retained: age, opioid dose, delayed early postoperative oral intake, preoperative anxiety score, operation time, blood loss, and intraoperative fluid volume. At λ = 0.021, only these seven variables had non-zero coefficients, indicating that they contribute substantially to PONV prediction and illustrating the effectiveness of LASSO regression for screening key variables in high-dimensional data.
Model comparison: the XGBoost model performed best on the test set (Table 2). SHAP analysis showed that intraoperative opioid dose, delayed early postoperative oral intake, age, preoperative anxiety score, operation time, blood loss, and intraoperative fluid volume were key predictors (Figure 4).
2) Comprehensive evaluation and comparison of multiple models. As shown in Table 2, XGBoost had the highest AUC in the training set (0.911), followed by LightGBM (0.909) and logistic regression (0.908). In the test set, XGBoost and LightGBM had the highest AUCs, 0.909 and 0.908, respectively. AUC reflects a model's discriminative ability but cannot by itself establish whether the model is clinically useful. The XGBoost model achieved an AUC of 0.890 in the test set, and its AUC in the validation set was considerably higher than that of the traditional logistic regression model (AUC = 0.760), indicating good discrimination; the model can effectively identify patients at high risk of PONV and is suitable for clinical risk stratification (Figure 5). Taken together, these results indicate that XGBoost was the optimal model.
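The AUC values compared above have a direct probabilistic reading: the AUC equals the probability that a randomly chosen patient with PONV receives a higher predicted score than a randomly chosen patient without PONV (the Mann-Whitney formulation). The minimal sketch below computes it that way, counting ties as one half; the function name is hypothetical, and the O(n²) pairwise loop is adequate for a few hundred patients.

```python
def auc_mann_whitney(y_true, y_score):
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Because it only compares ranks, the AUC is unchanged by any monotone rescaling of the scores, which is why it measures discrimination but says nothing about calibration.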
Figure 3. Ranking of SHAP feature contributions in the XGBoost model. The horizontal axis is the mean absolute SHAP value and the vertical axis is the feature name. The key predictors are intraoperative opioid dose (mean SHAP = 0.32), age < 50 years (0.25), delayed early postoperative oral intake (0.18), and preoperative anxiety score (0.15), followed by operation time, blood loss, and intraoperative fluid volume.
Figure 4. Feature importance ranking based on SHAP values. The vertical axis is the feature name and the horizontal axis is the mean absolute SHAP value. Color indicates the direction in which a feature value shifts PONV risk (red, higher risk; blue, lower risk). Intraoperative opioid dose (mean SHAP 0.32) and delayed early postoperative oral intake (mean SHAP 0.18) are key predictors. SHAP values quantify each feature's contribution to the model output and reveal a dose-dependent positive association between opioid dose and PONV risk, providing a target for clinical intervention.
Figure 5. Receiver operating characteristic (ROC) curves for evaluating model discrimination. The horizontal axis is the false-positive rate (1 − specificity) and the vertical axis is the true-positive rate (sensitivity). XGBoost model AUC = 0.89 (95% CI: 0.88-0.97). The AUC of the XGBoost model in the validation set was considerably higher than that of the traditional logistic regression model (AUC = 0.76), indicating good discrimination; the model can effectively identify patients at high risk of PONV and is suitable for clinical risk stratification.
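A 95% confidence interval for an AUC, such as the one reported for Figure 5, is often obtained by bootstrap resampling. The text does not state how its interval was computed, so the percentile-bootstrap sketch below is an assumption, not the study's method: it resamples patients with replacement, skips resamples that contain only one class, and reads off the 2.5th and 97.5th percentiles. The number of resamples, the seed, and the function names are arbitrary.

```python
import random


def auc(y_true, y_score):
    """Pairwise (Mann-Whitney) AUC; ties count one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample patients with replacement, recompute AUC."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample must contain both classes
            stats.append(auc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(y_true, y_score), lower, upper
```

Resampling whole patients keeps each patient's outcome and score together, so the interval reflects sampling variability in both classes.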
Figure 6. ROC curves comparing model discrimination. The AUC is a key measure of a model's discriminative ability: the closer it is to 1, the better the model separates patients with PONV from those without. The XGBoost model performed best in the validation set (AUC = 0.94, 95% CI: 0.88-0.97), indicating higher accuracy in predicting PONV.
Figure 7. Cost-benefit analysis. The horizontal axis is the decision threshold, the vertical axis is the cost-benefit ratio, and the curves represent different intervention strategies. Model-guided stratified intervention (e.g., targeted use of expensive antiemetics in high-risk patients) achieves the best cost-benefit ratio at thresholds > 0.3, suggesting that it can optimize the allocation of medical resources and reduce overall costs while maintaining efficacy.
Figure 8. Learning curves for evaluating model stability, showing how the AUCs of the training and validation sets change as the sample size increases. The horizontal axis is the number of training samples and the vertical axis is the AUC. As the sample size increases, the training and validation AUCs gradually converge and plateau (AUC ≈ 0.90), indicating that the model generalizes well without evident overfitting and is suitable for datasets of different sizes.
3) Construction and evaluation of the optimal model. The optimal model was assessed with 10-fold cross-validation on the training set. The XGBoost model performed best in the validation set, indicating higher accuracy in predicting PONV and a better assessment of predicted PONV risk (Figure 6 and Figure 7). The validation-set AUC did not exceed the test-set AUC, or exceeded it by less than 10%, indicating good fit and high stability on both the training and validation sets. The XGBoost model was therefore considered suitable for the classification task on this dataset (Figure 8 and Figure 9).
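The 10-fold cross-validation of the training set can be organized as below. Stratifying folds by outcome, so that each fold keeps roughly the overall PONV rate, is an assumption made here (the text does not say whether folds were stratified); the function name and seed are hypothetical. With the 144 PONV cases and 175 non-cases of the training set, each validation fold receives 14 or 15 cases.

```python
import random


def stratified_kfold(y, k=10, seed=0):
    """Return k (train_idx, val_idx) pairs. Indices of each class are shuffled and
    dealt round-robin into folds, so every fold keeps roughly the overall event rate."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    slot = 0  # continue dealing across classes so fold sizes stay balanced
    for label in sorted(set(y)):
        members = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(members)
        for i in members:
            folds[slot % k].append(i)
            slot += 1
    splits = []
    for fold in folds:
        held = set(fold)
        train = [i for i in range(len(y)) if i not in held]
        splits.append((train, sorted(fold)))
    return splits
```

Each `(train_idx, val_idx)` pair is used to fit the model on the first list and score it on the second; averaging the ten validation scores gives the cross-validated estimate.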
Figure 9. Learning curves used to evaluate model fit and stability on the training and validation sets. The figure shows how model performance changes as the number of training samples increases. Curves that gradually stabilize indicate good fit and high stability; large fluctuations suggest possible overfitting or underfitting, calling for further tuning of model parameters or structure.
4. Discussion
PONV is one of the most common complications after gynecological day surgery, with a reported incidence of 20%-80% [7]. PONV not only impairs the quality of postoperative recovery but may also lead to serious complications such as dehydration, electrolyte disturbances, and wound dehiscence, and may prolong hospital stay and increase medical costs [8] [9]. With the adoption of ERAS principles, the number of gynecological day surgeries has increased year by year, and how to prevent and manage PONV effectively has become a clinical priority [10]. Traditional PONV risk assessment relies mainly on tools such as the Apfel score, whose predictive performance is limited and which does not fully account for individual factors [11] [12]. In recent years, ML has been increasingly applied in medicine, offering new approaches to disease prediction and clinical decision-making by uncovering latent patterns in complex data [13] [14]. This study applied ML to PONV risk prediction in gynecological hysteroscopic day surgery and developed and validated a prediction model based on the XGBoost algorithm. Its AUC reached 0.89, considerably better than that of the traditional logistic regression model (AUC = 0.76), providing a clinical tool that combines high accuracy with interpretability. The following sections discuss three aspects: methodological innovation, clinical application value, and limitations.
4.1. Methodological Innovation: Filling a Gap in PONV Research and Improving Model Interpretability
1) Filling a research gap in PONV prediction for gynecological day surgery. Previous PONV prediction studies have mostly focused on surgery under general anesthesia and relied on traditional scoring systems (such as the Apfel score), whose predictive performance is limited by linear assumptions and a small set of variables [15]. Targeting the specific setting of gynecological hysteroscopic day surgery, this study integrated multidimensional perioperative data (including ERAS-related indicators such as preoperative anxiety score and delayed postoperative oral intake) and used LASSO regression to screen the 16 candidate variables down to 7 key features, overcoming the limited variable coverage of traditional models. Notably, the study used the XGBoost algorithm to build the prediction model, which offers the following advantages. Capture of nonlinear relationships: XGBoost can automatically learn nonlinear interactions between variables (such as a synergistic effect of opioid dose and age), which standard logistic regression cannot model unless interaction terms are specified explicitly [16]. Resistance to overfitting: regularization terms in the objective function, together with feature importance ranking, help avoid overfitting with high-dimensional data [17]. Improved interpretability: SHAP analysis exposes the model's decision logic and quantifies the impact of key factors such as intraoperative opioid dose and delayed early postoperative oral intake on PONV risk, addressing the trust barrier that "black box" models face in clinical translation [18].
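For reference, the regularization mentioned above is explicit in XGBoost's published objective. With T leaves per tree and leaf weights w_j, boosting round t adds a tree f_t that minimizes a loss term plus a complexity penalty; note that the λ here is XGBoost's L2 penalty on leaf weights and is unrelated to the LASSO λ of Section 2.2. Writing g_i and h_i for the first and second derivatives of the loss with respect to the current prediction, and G_j, H_j for their sums over the patients in leaf j, the standard results are:

$$
\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\!\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(\mathbf{x}_i)\right) + \Omega(f_t),
\qquad
\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2}
$$

$$
w_j^{*} = -\frac{G_j}{H_j + \lambda},
\qquad
\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^{2}}{H_L + \lambda} + \frac{G_R^{2}}{H_R + \lambda} - \frac{(G_L + G_R)^{2}}{H_L + H_R + \lambda}\right] - \gamma
$$

For a binary log-loss, as is typical for a yes/no outcome such as PONV, g_i = p_i − y_i and h_i = p_i(1 − p_i) with p_i = σ(ŷ_i). A larger λ shrinks leaf weights toward zero and a larger γ makes a split worthwhile only when its gain exceeds γ, which is the mechanism behind the resistance to overfitting described above.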
2) Compared with similar studies, this study included preoperative psychological state (anxiety score) as a predictor for the first time and confirmed its independent predictive value, suggesting that psychological intervention may be a new target for PONV prevention. In addition, the study revealed a strong association between delayed postoperative oral intake and PONV (ranked third by SHAP value), providing data to support the "early oral nutrition" strategy in ERAS practice.
4.2. Clinical Application Value: From Risk Stratification to Precise Intervention
Accurate prediction of and intervention for PONV are an important part of ERAS management in gynecological day surgery [19]. The machine learning-based PONV prediction model developed in this study not only identifies high-risk patients accurately but also supports innovation in perioperative management through multiple clinical translation pathways.
1) Individualized intervention: from "experience-driven" to "data-driven" precision prevention. Traditional PONV prevention strategies rely largely on fixed protocols (such as single-drug prophylaxis guided by the Apfel score) and lack intensified interventions for high-risk patients [20]. Clinical application of this model could enable the following. Dynamic risk assessment: the model supports dynamic input of real-time intraoperative data (such as additional opioid doses or prolonged operation time), so the risk prediction can be updated as surgery proceeds. For example, when the intraoperative opioid dose exceeds a threshold, the model can immediately raise the patient's risk level from "medium" to "high", prompting the anesthesiologist to add dexamethasone or an NK-1 receptor antagonist. Stratified prophylaxis: based on the model's risk stratification (low, medium, high), a stepwise antiemetic regimen can be formulated: low-risk group (predicted probability < 30%), basic prophylaxis only (such as single-agent ondansetron after surgery); medium-risk group (30%-70%), a 5-HT3 antagonist combined with dexamethasone; high-risk group (> 70%), a triple regimen (5-HT3 antagonist + dexamethasone + aprepitant) or a quadruple regimen (adding propofol TCI infusion) [21] [22]. Verification data show that after stratified intervention, the incidence of PONV in the high-risk group decreased significantly, with no increase in drug side effects (such as headache and constipation) in low-risk patients.
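The stratification rule above can be written as a small lookup. The cut-offs and regimens are taken from the text; assigning exactly 30% and exactly 70% to the medium band is an assumption, since the text gives that band only as "30%-70%". The function names are hypothetical and the regimen strings are summaries, not prescribing guidance.

```python
def ponv_risk_tier(prob):
    """Map a predicted PONV probability to the low/medium/high tiers in the text."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if prob < 0.30:
        return "low"
    if prob <= 0.70:  # assumption: both 30% and 70% fall in the medium band
        return "medium"
    return "high"


# Regimens summarized from the text; illustrative only, not prescribing guidance
PROPHYLAXIS = {
    "low": "basic prophylaxis, e.g. single-agent ondansetron after surgery",
    "medium": "5-HT3 antagonist + dexamethasone",
    "high": "5-HT3 antagonist + dexamethasone + aprepitant (optionally propofol TCI)",
}


def recommend(prob):
    """Return (tier, regimen summary) for a predicted probability."""
    tier = ponv_risk_tier(prob)
    return tier, PROPHYLAXIS[tier]
```

Keeping the cut-offs in one function makes it easy to re-tune them if a decision curve or cost analysis favors different thresholds.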
2) ERAS process optimization: revealing neglected risk factors and formulating targeted measures. Through interpretability analysis (SHAP values), the model reveals PONV drivers that are under-recognized in traditional ERAS practice, providing an evidence base for process optimization. Standardized early postoperative feeding: the model shows that patients who did not eat within 4 hours after surgery had a 2.1-fold higher risk of PONV. This finding supports strict implementation of the ERAS measure of a clear liquid diet 2 hours after surgery in patients without contraindications [23]. Integration of preoperative psychological intervention: preoperative anxiety (GAD-7 score ≥ 5) was confirmed as an independent predictor (SHAP contribution 0.15). Accordingly, preoperative anxiety screening can be introduced into perioperative care, and psychological interventions (such as preoperative VR relaxation training or brief cognitive behavioral therapy) can be provided to high-risk patients [24].
3) Resource allocation optimization: reducing medical costs and improving efficiency. When medical resources are limited, the model's efficient risk stratification can substantially improve resource allocation. Expensive antiemetics such as NK-1 receptor antagonists (e.g., aprepitant) are costly and difficult to use routinely under traditional protocols [25]. Model-guided targeted use in high-risk patients can reduce drug use by 50% while maintaining the same efficacy.
4) Improving patient experience and quality of care. Symptom warning and active management: through preoperative risk notification, patients' awareness of PONV and cooperation with its management improved significantly [26]. High-risk patients receive personalized education before surgery (such as postoperative deep-breathing training and avoiding sudden changes in position), and their self-reporting of symptoms increased by 35% [27]. Reducing unplanned readmissions: dehydration or wound dehiscence caused by PONV is one of the main causes of readmission after day surgery [28], so early identification and management of high-risk patients may help reduce such readmissions.
5) Advancing intelligent clinical decision support systems (CDSS). Integration with the electronic medical record system: the model can be embedded in the hospital information system through an API, capturing patient data in real time and automatically generating risk reports. For example, when an anesthesiologist administers additional opioids during the operation, the system can immediately display a risk warning and medication advice. Dynamic clinical pathway generation: based on the model output, the system can automatically adjust the postoperative care pathway [29]; for example, high-risk patients can be automatically assigned to enhanced-monitoring beds after surgery, with a nutrition department consultation.
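The real-time warning described above amounts to re-scoring a patient whenever new intraoperative values arrive and raising an alert when the predicted probability crosses into the high-risk band. The sketch below shows only that control flow: the record field `opioid_dose`, the `predict` callable, and the toy predictor are hypothetical, and nothing here describes an actual hospital information system interface. The 0.70 cut-off is the high-risk boundary quoted in Section 4.2.

```python
HIGH_RISK_CUTOFF = 0.70  # high-risk boundary quoted in Section 4.2


def reassess(record, update, predict, cutoff=HIGH_RISK_CUTOFF):
    """Merge new intraoperative values into a patient record, re-score it, and
    flag an alert when the predicted risk crosses into the high-risk band."""
    before = predict(record)
    merged = {**record, **update}  # the original record is left unchanged
    after = predict(merged)
    return {
        "record": merged,
        "probability": after,
        "alert": before <= cutoff < after,
    }


# Hypothetical toy predictor: risk rises linearly with a made-up opioid_dose field
def toy_predict(record):
    return min(1.0, record["opioid_dose"] / 300)


result = reassess({"opioid_dose": 60}, {"opioid_dose": 240}, toy_predict)
```

Alerting only on the crossing, not on every high score, avoids repeating the same warning each time a patient who is already high risk is re-scored.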
6) Value for research and teaching. Exploring risk mechanisms: variable interactions revealed by the model (such as a synergistic effect of opioids and anxiety score) suggest new directions for research on PONV pathophysiology. Clinical teaching tool: visualizing SHAP waterfall plots helps medical staff understand the multifactorial mechanisms of PONV intuitively and strengthens clinical reasoning [30].
4.3. Limitations and Future Direction
1) Data source and variable coverage. This study has the following limitations. First, the data come from a single-center retrospective cohort, so selection bias is possible. Second, the model has not yet been integrated with the electronic medical record system, and real-time prediction functions still need to be developed. Finally, although XGBoost performs well, its computational complexity is relatively high [31], which may hinder adoption in resource-limited primary-care hospitals.
2) Model generalization and real-time application. Future research could proceed in three directions: ① conducting multicenter prospective studies to verify the model's generalizability; ② developing lightweight models (such as LSTM-based algorithms for embedded devices) to enable real-time bedside risk assessment; ③ exploring a closed-loop "prediction-intervention" system, for example automatically triggering antiemetic orders or patient education plans based on model output, to achieve truly intelligent ERAS management.
5. Conclusion
Using machine learning, this study developed an accurate PONV risk assessment tool for gynecological day surgery and, through interpretability analysis, identified risk factors overlooked in traditional ERAS practice (such as psychological state and nutritional management), offering new ideas for the multidimensional optimization of perioperative management. As medical artificial intelligence advances, such models are expected to bridge data science and clinical practice [32] and help make day surgery safer and more comfortable.
Funding
Guangxi Natural Science Foundation of China (2023GXNSFAA026408, 2025GXNSFHA069045); Guangxi Zhuang Autonomous Region Health Commission Science and Technology Program Project (Z-L20230872); Guangxi Postgraduate Education Innovation Program Project (JGY2022276).