Developing Preoperative Nomograms to Predict Any-Stage and Stage III-IV Endometriosis in Infertile Women

Study objective: To generate and validate nomograms to predict any-stage and stage III-IV endometriosis before surgery in infertile women. Design: A single-center retrospective cohort study. Setting: University-affiliated hospital. Patients: Infertile patients (n = 1,016) who underwent reproductive surgery between July 1, 2016 and June 30, 2019. Interventions: None. Main outcome measurements: We randomly selected 2/3 of the included patients (677 patients, training sample) to analyze and generate predictive models and validated the models on the remaining patients (339 patients, validation sample). A multivariable logistic regression model was fitted to the training sample, with variables selected by a backward stepwise procedure. Nomograms to predict any-stage and stage III-IV endometriosis were constructed separately. The discrimination and calibration of both nomograms were tested in the overall population and in a subgroup without endometrioma diagnosed on transvaginal sonography (TVS), in both the training and validation samples. The impact of different variables in these models was evaluated. Results: Endometriosis was diagnosed in 377 women (55.7%) in the training sample and 196 women (57.8%) in the validation sample. The nomogram predicting any-stage endometriosis had an area under the curve (AUC) of 0.760 for the training sample and 0.744 for the validation sample, with favorable calibration in the overall population. However, its performance was significantly decreased in patients without endometrioma on TVS, with an AUC of 0.726 in the training sample and 0.694 in the validation sample. Similarly, the nomogram predicting stage III-IV endometriosis had an AUC of 0.833 and 0.793 for the training and validation samples, respectively, as well as favorable calibration. However, the performance of this nomogram in patients without endometrioma on TVS was poor. Endometrioma on TVS strongly predicted both any-stage and stage III-IV endometriosis in both samples.
Conclusion: We developed nomograms to predict any-stage and stage III-IV endometriosis, but their performance was significantly decreased in patients without endometrioma on TVS. Endometrioma on TVS strongly predicted any-stage and stage III-IV endometriosis in both samples. Our findings therefore encourage wider use of advanced imaging for endometriosis to improve clinical prognosis.


INTRODUCTION
Endometriosis is a common gynecological disorder characterized by the presence of endometrial tissue outside of the uterine cavity (1). Women with endometriosis typically present with infertility and pelvic pain, but can be asymptomatic (2). Among infertile women, the prevalence of endometriosis is 25-50% (3). A definitive diagnosis of endometriosis requires laparoscopic inspection of the pelvis (4). However, because in vitro fertilization (IVF) is an alternative for infertile women, it would be helpful to identify endometriosis from preoperative clinical data (5).
Many studies have associated preoperative clinical data with endometriosis (6). However, no consensus has been reached on the best predictive models for endometriosis because of diverse study populations, different diagnostic criteria, and varying predictive factors (6,7). The most common models, which use pain to predict endometriosis, are less accurate in diagnosing asymptomatic endometriosis, which is often found in infertile women (8). Several reports have predicted endometriosis in infertile women, but with limitations. Some studies used history and symptoms as predictive factors, generating areas under the curve (AUC) of 0.71 and 0.752 (5,9). Others are of limited practical use because they used patients with a normal pelvis as controls (10,11).
We aimed to build simplified nomograms to preoperatively predict any-stage and stage III-IV endometriosis in infertile women. Hysterosalpingography (HSG) findings specific to infertility were incorporated into the models. A subgroup of patients without endometrioma was used to illustrate the performance of the two nomograms. In addition, we altered the variables in the full models to better understand their effect on the prediction of any-stage and stage III-IV endometriosis.

MATERIALS AND METHODS
This is a single-center retrospective cohort study, which was approved by the Ethics Committee of Peking Union Medical College Hospital.

Patients
We included infertile patients who underwent reproductive surgery by laparoscopy and hysteroscopy for infertility (defined as attempting pregnancy for ≥1 year without success) in the Gynecological Endocrine and Reproduction Center of Peking Union Medical College Hospital (Beijing, China) between July 1, 2016 and June 30, 2019. We excluded patients with a previous surgical diagnosis of endometriosis or with previous laparoscopic or hysteroscopic investigations. Figure 1 shows the patient selection flowchart.
Endometriosis was diagnosed using laparoscopy on visual evidence alone according to the European Society of Human Reproduction and Embryology guidelines (4). Endometriosis was scored using the revised American Fertility Society (r-AFS) classification system and classified into four stages: I (minimal), II (mild), III (moderate), or IV (severe) (12). Patients who were not diagnosed with endometriosis (with or without other diagnoses) at laparoscopy were used as the controls for this study.

Data Collection and Variable Definitions
The data were gathered by trained doctors who reviewed the patients' medical records. The infertility investigations performed in our center include clear medical records and standard reproductive surgeries. The standard infertility investigation includes a female medical history, symptoms, bimanual pelvic examination, ultrasound findings, and blood analysis of reproductive hormones and thyroid function, as well as a male medical history and sperm analysis. Most patients who do not have a clear indication for surgery undergo HSG. Once all exams and procedures have been performed, patients are counseled regarding IVF or a hysteroscopic and laparoscopic investigation. Surgery is recommended to investigate possible endometriosis in women with normal ovulation and tubal patency whose partners have normal semen analysis results. In this study, all surgeries were performed and recorded by gynecological surgeons. Procedures were performed under general anesthesia, and endometriosis was scored and staged by visual inspection of the abdomen and pelvis. All recognizable endometriotic lesions were radically excised to reconstruct the pelvic anatomy whenever this was achievable without affecting fertility.
Only variables for which a result, positive or negative, was recorded in all patients were included in the study. Duration of infertility was defined as the period between the time the couple had started trying to conceive and the time of surgery. Pain was defined as dysmenorrhea requiring analgesic medication most of the time, and/or intermenstrual pelvic pain, and/or dyspareunia. Palpable nodularity was defined as nodularities in the pouch of Douglas found on bimanual pelvic examination (13). Endometrioma was defined as the presence of a cyst or multiple cysts containing diffuse low-level echoes in the ovaries on transvaginal sonography (TVS) (14). Tubal pathologies diagnosed by HSG were classified as no tubal occlusion, distal tubal occlusion, or proximal tubal occlusion (defined as contrast not shown beyond the isthmic portion of the tube) (15). For patients without HSG results, tubal testing findings at laparoscopy were used as alternatives.
The endpoints of the study were any-stage and stage III-IV endometriosis. Multivariable logistic regression analysis was used to select the combination of variables that was independently associated with each diagnosis. Variables were selected by a backward stepwise procedure. The P-values in the multivariable analysis were based on Wald tests. A P-value < 0.05 was considered significant. The final model equations were organized as nomograms designed to calculate patient-specific probabilities of any-stage or stage III-IV endometriosis. The models were applied to the validation sample and to a subgroup of patients without endometrioma diagnosed on TVS. The discrimination of the models was assessed by the AUC, for which a 95% confidence interval (95% CI) was calculated. Differences in AUC between our models were compared using DeLong's test (18). The calibration of the models was assessed by calibration curves (19). We evaluated the P-value of the unreliability statistic and the average and maximal errors between predictions and observations obtained from the calibration curve (19).
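To make the discrimination measure concrete, the following Python sketch computes an AUC as the proportion of case-control pairs in which the case receives the higher predicted probability, and attaches an approximate 95% CI. The Hanley-McNeil standard error used for the interval is an assumption made for illustration, since the text does not state how the CIs were obtained; the function names and toy data are hypothetical.

```python
import math


def auc(labels, scores):
    """AUC as the share of (case, control) pairs ranked correctly; ties count 0.5."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def hanley_mcneil_ci(a, n_cases, n_controls, z=1.96):
    """Approximate CI for an AUC `a` (Hanley-McNeil standard error; assumed method)."""
    q1 = a / (2.0 - a)
    q2 = 2.0 * a * a / (1.0 + a)
    variance = (a * (1.0 - a)
                + (n_cases - 1) * (q1 - a * a)
                + (n_controls - 1) * (q2 - a * a)) / (n_cases * n_controls)
    half_width = z * math.sqrt(variance)
    # Clip to the valid AUC range; small samples can push the bounds outside [0, 1].
    return max(0.0, a - half_width), min(1.0, a + half_width)


# Toy data, not patient data: 1 = endometriosis at laparoscopy, 0 = control.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = auc(toy_labels, toy_scores)
low, high = hanley_mcneil_ci(toy_auc, n_cases=2, n_controls=2)
print(f"AUC = {toy_auc:.3f}, approximate 95% CI {low:.3f} to {high:.3f}")
```

With a fitted model, `scores` would be the predicted probabilities from the logistic regression and `labels` the laparoscopic diagnoses.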
To better understand the difference between the two models, we built a full model that included all of the variables (patient history and symptoms, palpable nodularity, endometrioma diagnosed on TVS, and tubal pathology) and investigated how the predictive performance changed when each variable was removed. Additionally, a model with endometrioma diagnosed on TVS as the only variable was built. The predictive performance of each model was determined by its AUC, and AUCs were compared using DeLong's test. Differences in discrimination and reclassification performance between the models were evaluated by calculating the integrated discrimination improvement (IDI) (20).
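The IDI can be written as the change in discrimination slope between two models, where the discrimination slope is the mean predicted probability in patients with the outcome minus the mean predicted probability in patients without it. A minimal Python sketch under that standard definition follows; the function name and the toy probabilities are illustrative and are not taken from the study data.

```python
def integrated_discrimination_improvement(labels, p_reference, p_comparison):
    """IDI = discrimination slope of the comparison model minus that of the reference model."""

    def discrimination_slope(probabilities):
        events = [p for y, p in zip(labels, probabilities) if y == 1]
        non_events = [p for y, p in zip(labels, probabilities) if y == 0]
        if not events or not non_events:
            raise ValueError("need at least one event and one non-event")
        return sum(events) / len(events) - sum(non_events) / len(non_events)

    return discrimination_slope(p_comparison) - discrimination_slope(p_reference)


# Toy example: 1 = endometriosis, 0 = no endometriosis (not patient data).
toy_labels = [1, 1, 0, 0]
reduced_model = [0.6, 0.5, 0.5, 0.4]  # e.g. a model with one variable removed
full_model = [0.8, 0.7, 0.3, 0.2]     # e.g. the model with all variables
idi = integrated_discrimination_improvement(toy_labels, reduced_model, full_model)
print(f"IDI = {idi:.2f}")
```

A positive value means the comparison model separates patients with and without the outcome more clearly than the reference model; a negative value means removing or swapping variables made the separation worse.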

RESULTS
Overall, 1,111 patients met the inclusion criteria; however, 49 patients were excluded because of a previous surgical diagnosis of endometriosis, and 45 patients were excluded because of previous laparoscopic or hysteroscopic investigations. Of the remaining 1,016 patients, 443 patients (43.6%) did not have endometriosis, and 573 patients (56.4%) had visually diagnosed endometriosis.
The training sample included 677 randomly selected patients: 300 patients without endometriosis, 245 patients with stage I-II endometriosis, and 132 patients with stage III-IV endometriosis. The validation sample included 339 patients: 143 patients without endometriosis, 132 patients with stage I-II endometriosis, and 64 patients with stage III-IV endometriosis. Patient characteristics are summarized in Table 1. There were no significant differences in patient characteristics between the two samples.
Six variables were included in the model for any-stage endometriosis after a backward stepwise selection procedure: BMI, parity, cycle length, palpable nodularity, endometrioma diagnosed on TVS, and tubal pathology. In the model for stage III-IV endometriosis, three variables were selected: pain, palpable nodularity, and endometrioma diagnosed on TVS. The nomograms for both models are shown in Figure 2, and the results of each model tested are shown in Table 3.
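A nomogram is a graphical form of the fitted logistic model: each variable contributes points in proportion to its regression coefficient, and the total maps to a probability through the logistic function. The Python sketch below illustrates this for the three variables of the stage III-IV model, assuming each is coded 0 or 1. The intercept and coefficients are hypothetical placeholders, not the fitted values behind Figure 2, and the 0 to 100 point scaling is one common convention assumed here.

```python
import math

# Hypothetical log-odds coefficients for illustration only (NOT the fitted values).
INTERCEPT = -2.5
COEFFICIENTS = {
    "pain": 0.8,
    "palpable_nodularity": 1.2,
    "endometrioma_on_tvs": 3.0,
}


def predicted_probability(patient):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(beta * x))))."""
    linear_predictor = INTERCEPT + sum(
        beta * patient.get(name, 0) for name, beta in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def nomogram_points(patient):
    """Points per variable, scaled so the largest coefficient is worth 100 points."""
    scale = 100.0 / max(abs(beta) for beta in COEFFICIENTS.values())
    return {
        name: beta * patient.get(name, 0) * scale
        for name, beta in COEFFICIENTS.items()
    }


# Hypothetical patient: endometrioma on TVS, no pain, no palpable nodularity.
patient = {"pain": 0, "palpable_nodularity": 0, "endometrioma_on_tvs": 1}
points = nomogram_points(patient)
print("points per variable:", points)
print("total points:", round(sum(points.values()), 1))
print("predicted probability:", round(predicted_probability(patient), 3))
```

Reading a printed nomogram performs the same two steps by hand: sum the points for each variable, then read the probability opposite the total on the bottom axis.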
In the training sample, the nomogram for any-stage endometriosis had an AUC of 0.780 (95% confidence interval [CI], 0.746-0.814) in the overall population and 0.726 (95% CI, 0.686-0.766) in the subgroup that was negative for endometrioma diagnosed on TVS. The calibration was good, with no significant maximal or average differences between the predicted probabilities and the observed frequencies. In the validation sample, the nomogram for any-stage endometriosis had an AUC of 0.750 (95% CI, 0.699-0.801) in the overall population and 0.694 (95% CI, 0.635-0.753) in the subgroup. The calibration was acceptable, with average and maximal errors of 8.3% and 8.6%, respectively, in the overall population, and 6.0% and 6.4%, respectively, in the subgroup. The ROC curve and calibration plot are given in Figure 2.
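Average and maximal calibration errors summarize the gap between predicted probabilities and observed frequencies. As a simplified illustration, the Python sketch below groups patients by predicted probability and compares the mean prediction with the observed event rate in each group. Equal-size grouping is an assumption made for brevity; errors read from a smoothed calibration curve can differ from this binned approximation, and the function name and toy data are hypothetical.

```python
def calibration_errors(labels, probabilities, n_bins=10):
    """Return (average, maximal) absolute gap between mean prediction and observed rate per bin."""
    if len(labels) != len(probabilities) or not labels:
        raise ValueError("labels and probabilities must be non-empty and equal in length")
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    # Sort patients by predicted probability, then cut into near-equal groups.
    pairs = sorted(zip(probabilities, labels))
    n = len(pairs)
    gaps = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # more bins than patients
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)
        observed_rate = sum(y for _, y in chunk) / len(chunk)
        gaps.append(abs(mean_predicted - observed_rate))
    return sum(gaps) / len(gaps), max(gaps)


# Toy data, not patient data: 1 = endometriosis, 0 = no endometriosis.
toy_labels = [0, 0, 1, 1]
toy_probabilities = [0.1, 0.3, 0.7, 0.9]
average_error, maximal_error = calibration_errors(toy_labels, toy_probabilities, n_bins=2)
print(f"average error = {average_error:.3f}, maximal error = {maximal_error:.3f}")
```

Errors of a few percentage points indicate that predicted risks track observed frequencies closely, whereas a large maximal error points to a probability range in which the model is poorly calibrated.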
In the overall population, the AUCs of the nomogram for stage III-IV endometriosis were 0.833 (95% CI, 0.789-0.877) in the training sample and 0.793 (95% CI, 0.725-0.860) in the validation sample. The nomogram was well-calibrated. However, the AUCs and calibration of the nomogram for stage III-IV endometriosis in the subgroup without endometrioma on TVS were unsatisfactory. The ROC curve and calibration plots are given in Figure 2.
The differences in the predictive performance of each model when the variables were changed are shown in Table 4.

DISCUSSION
We developed nomograms to predict any-stage and stage III-IV endometriosis in infertile women and validated the performances of nomograms in all participants and in a subgroup without endometrioma on TVS. Additionally, the effects of variables on the predictive ability of these models were evaluated.
The prevalence of endometriosis in our study population and the characteristics of patients with endometriosis were mostly consistent with previous reports. Endometriosis was present in 56.4% of our patients, which is slightly higher than previously reported (3). This may be because the patients undergoing laparoscopy were not randomly selected from an infertile population (5) and because the diagnosis of endometriosis in this study was based on visual evidence alone. As previously reported, endometriosis is inversely associated with BMI (3,16,21) and cycle length (22), and its incidence is higher in women with primary infertility and in nulliparous women (5). However, patients with stage III-IV endometriosis had longer menstruation on average in our study, which is not consistent with previous reports (17). Moreover, we did not detect an increased risk of endometriosis in patients with a later menarche or a short duration of infertility (17). Pain, although defined differently than in a previous study (23), was found to be an important predictor in our study. One study reported endometrioma on TVS in 3 of 73 infertile women without endometriosis and in 1 of 44 women with stage I-II endometriosis (9), which is consistent with our results. Tubal pathology was originally included as a predictor in our study. Tubal abnormality is one of the important reasons for infertility and can be caused by infections, previous surgery, or endometriosis. Approximately 50% of patients in the non-endometriosis group had tubal abnormalities, which is much higher than in the endometriosis group, although endometriosis has also been reported as a risk factor for tuboperitoneal pathology (24). This is probably because HSG is useful for ruling out tubal blockage but has limited diagnostic value for the peritubal adhesions that are often found in endometriosis (25,26).
The any-stage and stage III-IV models differed in their predictive factors and in their predictive performance across populations. Six variables were included in the model for any-stage endometriosis, and three variables were included in the model for stage III-IV endometriosis (Table 3). Our model for predicting any-stage endometriosis has good discrimination, with an AUC of 0.780 in the training sample and 0.750 in the validation sample. The model developed to predict stage III-IV endometriosis also has good discrimination, with an AUC of 0.833 in the training sample and 0.793 in the validation sample. These models were well-calibrated, with no significant differences between the predicted and the observed probabilities. To further evaluate the predictive performance of the nomograms, a subgroup of patients without endometrioma diagnosed on TVS, which comprised approximately 90% of patients, was specifically analyzed. This subgroup was chosen because these patients were less likely to be suspected of having endometriosis before surgery. However, the predictive performance of both models decreased profoundly in the subgroup. The nomogram predicting any-stage endometriosis in the subgroup had an AUC of 0.726 in the training sample and 0.694 in the validation sample, with good calibration, so its performance was only fair. The performance of the nomogram predicting stage III-IV endometriosis was poor in the subgroup.
To better understand the impact of different variables and the different performance of the predictive models in different populations, variables were removed from each model one at a time and the resulting models were analyzed. The models built with endometrioma diagnosed on TVS as the only variable were analyzed separately (Table 4). When patients' history and symptoms were removed, the discrimination of the full models for any-stage and stage III-IV endometriosis was not significantly affected in the validation sample. However, we noticed that the calibration was profoundly affected, which indicates the potential contribution of these variables to the accuracy of the models. Numerous predictive models aim to predict endometriosis based on patients' history and symptoms. However, either their performance was only fair (5,9), or the model included a history of benign ovarian cysts and surgery/consultation for ovarian cyst as important variables, which need to be diagnosed by imaging (27). In comparison, removing palpable nodularity as a variable did not affect the models significantly. This is likely because palpable nodularity on per vaginam (PV) examination is not sensitive for predicting endometriosis, and because detection of pouch of Douglas (POD) obliteration and deep infiltrating endometriosis (DIE) of the rectum is not standard on our ultrasound reports. It has been reported that the sensitivity of PV examination for POD obliteration is 70%, but can be improved to 87% when combined with TVS (13). POD obliteration and DIE of the rectum were also present in 1/4 and 1/10, respectively, of the cases of endometriosis without endometrioma (28). Thus, detection of POD obliteration and DIE of the rectum with TVS could substantially improve the performance of predictive models in the subgroup without endometrioma. Eliminating endometrioma profoundly worsened the discrimination and calibration of the models developed to predict any-stage and stage III-IV endometriosis.
Endometrioma diagnosed on TVS is reported to have good sensitivity and excellent specificity (13,29), and stage III-IV endometriosis is reliably predicted using endometrioma on TVS alone. Tubal pathology diagnosed on HSG is a variable specific to infertility. This variable strongly affected the discrimination and calibration of the predictive model for any-stage endometriosis. However, HSG had little effect on predicting stage III-IV endometriosis. This is likely because the difference was neutralized by including patients with stage I-II endometriosis among the controls and because stage III-IV endometriosis typically involves more tubal abnormalities. This study demonstrated the importance of endometrioma on TVS as a variable in predicting endometriosis and suggests that detection of POD obliteration and DIE of the rectum with TVS is a promising way to improve the accuracy of predictive models.
Clinical diagnosis of endometriosis is vital as it may reduce the delay in time to diagnosis. However, it is inconsistent, and there is currently no common standard diagnostic protocol (6). Patients' history and symptoms, physical examinations, and imaging are used in clinics to diagnose endometriosis preoperatively (6), but the role of these variables in predictive models is seldom analyzed. We demonstrated the impact of these variables on models predicting any-stage and stage III-IV endometriosis and stressed the importance of imaging in predicting endometriosis. The importance of imaging does not diminish the value of other variables, which could modify models and improve predictive performance. As POD obliteration and DIE of the rectum identified by TVS were not included as variables in our study, our findings encourage wider use of advanced imaging for better clinical prognosis. Sonographers who are familiar with the general use of TVS have been shown to readily become proficient in diagnosing POD obliteration and DIE of the rectum (30).
This study is not without limitations. First, the retrospective nature of this study means that not all potential biases can be excluded. Only variables that were recorded in a consistent manner in all patients were included, to reduce bias caused by unrecorded information; as a result, some important variables might have been missed. Second, infertile patients undergoing laparoscopy are a unique patient population, which limits the generalizability of these results. The patients included may have had a propensity to undergo surgery, increasing their pre-test probability of endometriosis, and the absence of patients who underwent IVF instead of surgery might distort the results. Third, POD obliteration and DIE of the rectum identified by TVS were not included as variables, which, if included, could have improved the performance of the models. Fourth, in patients without HSG results, tubal testing findings at laparoscopy were used instead. Fifth, an external validation of our predictive models is required.
In conclusion, we developed two nomograms that can predict any-stage and stage III-IV endometriosis in infertile women. These nomograms performed well in all participants, but their performance was significantly decreased in a subgroup without endometrioma on TVS. Imaging proved important for predicting endometriosis. We therefore encourage wider use of advanced imaging to better diagnose and predict endometriosis.

DATA AVAILABILITY STATEMENT
The datasets presented in this article are not readily available because all of the clinical information is generated from the hospital information system (HIS) of our hospital. This information is private to the patients and must be kept confidential. Requests to access the datasets should be directed to yuqi2008001@sina.com.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of Peking Union Medical College Hospital. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
ZG and QY designed the study. ZG, XC, and RT acquired the data. ZG and PF performed the statistical analyses. ZG wrote and submitted the manuscript. QY revised the paper. All authors approved the final version.