Machine-Learning Prediction of Postoperative Pituitary Hormonal Outcomes in Nonfunctioning Pituitary Adenomas: A Multicenter Study

Objective: No accurate predictive models have been identified for hormonal prognosis in non-functioning pituitary adenoma (NFPA). This study aimed to develop machine learning (ML) models to facilitate the prognostic assessment of pituitary hormonal outcomes after surgery. Methods: A total of 215 male patients with NFPA, who underwent surgery in four medical centers from 2015 to 2021, were retrospectively reviewed. The data were pooled after heterogeneity assessment and randomly divided into training and testing sets (172:43). Six ML models and logistic regression models were developed using six anterior pituitary hormones. Results: Thyroid-stimulating hormone (p < 0.001), follicle-stimulating hormone (p < 0.001), and prolactin (PRL; p < 0.001) decreased significantly following surgery, whereas growth hormone (GH) (p < 0.001) increased significantly. Postoperative GH levels were slightly higher in patients with gross total resection than in those with subtotal resection (p = 0.07), whereas PRL levels were significantly lower (p = 0.03). The optimal model achieved area-under-the-receiver-operating-characteristic-curve values of 0.82, 0.74, and 0.85 in predicting hormonal hypofunction, new deficiency, and hormonal recovery following surgery, respectively. According to feature importance analyses, the preoperative levels of both the same hormone and the other hormones were important in predicting postoperative hypofunction of an individual hormone. Conclusion: Anterior pituitary hormones respond differently to transsphenoidal surgery, with some increasing and others decreasing. The ML models could accurately predict postoperative pituitary outcomes based on preoperative anterior pituitary hormones in NFPA.

Adenohypophysis hypofunction could cause severe hormonal disorders, which are responsible for decreasing the quality of life in patients. Two recent meta-analyses have revealed excess mortality in patients suffering from hypopituitarism (8,9). Perioperative management of hormones is an essential part of treatment for patients undergoing pituitary surgery. Therefore, anticipating and monitoring hypopituitarism and adjusting therapeutic strategies are necessary to minimize the effect of hypopituitarism on prognosis.
Multicenter studies of perioperative anterior pituitary hormones in NFPAs are limited. In the present study, the clinical characteristics of NFPAs in four medical centers were retrospectively reviewed. The interrelations between preoperative anterior pituitary hormones and postoperative secretion of adenohypophysis were analyzed. Machine learning (ML) models were developed to predict the secretory capacity of the anterior pituitary. Feature importance analysis was also performed for model explanation. This study aimed to investigate the perioperative hormonal characteristics of the anterior pituitary gland and provide a framework for ML models to predict postoperative hormone deficiency in the adenohypophysis.

MATERIALS AND METHODS
A collected database (CPDRN) was used as a screening tool in the cohort. Clinical data were obtained from Fuzhou General Hospital (FGH), Peking Union Medical College Hospital (PUMCH), the First Affiliated Hospital of Fujian Medical University (FAHFMU), and Beijing Tiantan Hospital (BTH). This study was approved by the Institutional Review Boards of all the medical centers. As it was a retrospective study, the patients did not sign an informed consent.
The criteria for admission were as follows:

Evaluation of the Secretory Capacity of Each Anterior Pituitary Hormone
All six anterior pituitary hormones (GH, TSH, FSH, LH, PRL, and ACTH) were monitored based on serum hormone levels to reflect the secretory capacity of the adenohypophysis. The patients received a neuroendocrine evaluation within 7 days before surgery and a re-evaluation within 7 days after surgery. However, the lower limit of the random GH level for adults was 0 mg/L, and confirmatory stimulation tests were not routinely conducted to diagnose GH deficiency in all centers. Thus, only the deficiencies of the remaining five hormones were evaluated. Pituitary hormonal deficiency (PHD) was defined as one or more inappropriately low anterior pituitary hormones. New deficiency was defined as a postoperative deficiency in at least one hormone that had been secreted normally before surgery. Recovered deficiency was defined as the return of at least one preoperative deficit to normal function following surgery (6). Although details of the hormonal tests were not standardized, the hormonal reference ranges of the four medical centers were similar after unit conversion. Furthermore, the heterogeneity across hormonal findings was limited (Supplementary Material Figures 1 and 2). Consequently, the multicenter hormonal outcomes were pooled to facilitate the training of the predictive models. The principal investigator from each center assessed the hormonal outcomes separately. The adjudicated results were forwarded to the coordinating site for further analysis.
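
The three outcome definitions above can be illustrated with a minimal sketch. It assumes that "inappropriately low" simply means a serum level below the lower reference limit, which is a simplification. The function names, the reference limits, and the patient values are hypothetical placeholders and are not data from this study.

```python
# Five axes evaluated for deficiency; GH is excluded, as described in the text.
AXES = ["TSH", "FSH", "LH", "PRL", "ACTH"]


def deficient_axes(levels, lower_limits):
    """Return the set of axes whose level is below the lower reference limit."""
    return {axis for axis in AXES if levels[axis] < lower_limits[axis]}


def classify_outcome(pre, post, lower_limits):
    """Label one patient with the three dichotomous outcomes."""
    pre_def = deficient_axes(pre, lower_limits)
    post_def = deficient_axes(post, lower_limits)
    return {
        # PHD after surgery: at least one deficient axis.
        "postoperative_phd": bool(post_def),
        # New deficiency: an axis not deficient before surgery, deficient after.
        "new_deficiency": bool(post_def - pre_def),
        # Recovered deficiency: an axis deficient before surgery, normal after.
        "recovered_deficiency": bool(pre_def - post_def),
    }


# Hypothetical lower reference limits and hormone levels (arbitrary units).
lower_limits = {"TSH": 0.5, "FSH": 1.5, "LH": 1.5, "PRL": 4.0, "ACTH": 7.0}
pre = {"TSH": 1.2, "FSH": 1.0, "LH": 2.0, "PRL": 10.0, "ACTH": 20.0}
post = {"TSH": 0.3, "FSH": 2.0, "LH": 2.5, "PRL": 6.0, "ACTH": 15.0}

outcome = classify_outcome(pre, post, lower_limits)
print(outcome)
```

In this made-up case the FSH axis recovers while the TSH axis becomes newly deficient, so a single patient can count toward both new and recovered deficiency.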

Model Training and Explanation
Six common supervised ML models were developed to predict the postoperative secretory capacity of the adenohypophysis: decision tree (DT), random forest (RF), K-nearest neighbor (KNN), AdaBoost algorithm, support vector machine (SVM), and neural network (NN). In addition, the logistic regression (LR) model was trained as a reference for comparison with the ML models. The dataset of 215 male patients with NFPA was randomly divided into five folds, so that each split comprised 172 training cases and 43 testing cases. These dichotomous models for postoperative endocrine evaluation were trained using only the six anterior pituitary hormones representing the secretory capacity of the adenohypophysis. Hyperparameters are parameters that are not directly learned within estimators. They are passed as arguments to the constructor of the estimator classes. GridSearchCV, provided by the Scikit-Learn library, enabled an efficient parameter search strategy. It searches and evaluates all possible combinations from a grid of parameter values (10). The parameter search used the area under the receiver operating characteristic curve (AUC-ROC) of the estimators to evaluate each parameter setting, following the convention that higher return values are better than lower return values. Finally, the parameter combination that yielded the maximum AUC-ROC value was obtained. The parameters were then further optimized by manual fine-tuning.
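
As an illustration of the grid search described above, the following is a minimal sketch rather than the configuration used in this study. The data are synthetic, the random forest is only one of the six model types, and the parameter grid is a hypothetical example because the searched values are not reported. Running the search inside the 172 training cases is also an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the cohort: 215 cases, six features (one per hormone).
X, y = make_classification(
    n_samples=215, n_features=6, n_informative=4, n_redundant=0, random_state=0
)

# Hold out 43 cases, leaving 172 for training (the 172:43 split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=43, stratify=y, random_state=0
)

# Hypothetical grid; the values searched in the study are not reported.
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 4]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",  # AUC-ROC as the selection criterion; higher is better
    cv=5,               # fivefold cross-validation for every grid point
)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("best mean cross-validated AUC-ROC:", round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` holds the model refitted with the selected parameters, which could then serve as the starting point for manual fine-tuning.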
The abovementioned models were trained and tested using fivefold cross-validation. In each round, the model built on 80% of the cases was evaluated on the remaining 20%. Sensitivity, specificity, and AUC-ROC were reported. The algorithm with the highest AUC-ROC was considered the optimal model.
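
The evaluation step can be sketched as follows, under stated assumptions. The data are synthetic, logistic regression stands in for any of the models, the 0.5 probability threshold is arbitrary, and the metrics are computed on pooled out-of-fold predictions. The text does not say whether the reported metrics were pooled in this way or averaged per fold. The helper name `cross_validated_metrics` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def cross_validated_metrics(model, X, y, n_splits=5, threshold=0.5):
    """Sensitivity, specificity, and AUC-ROC from out-of-fold predictions."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    # Each case is scored by a model trained on the other four folds.
    proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
    predicted = (proba >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y, predicted, labels=[0, 1]).ravel()
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "auc_roc": roc_auc_score(y, proba),
    }


# Synthetic stand-in for the cohort: 215 cases, six hormone features.
X, y = make_classification(
    n_samples=215, n_features=6, n_informative=4, n_redundant=0, random_state=0
)
metrics = cross_validated_metrics(LogisticRegression(max_iter=1000), X, y)
print(metrics)
```

Calling the same helper for each candidate model and comparing the `auc_roc` entries would mirror the selection rule stated above.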
The evaluation of feature importance on ML classification models facilitated the improvement of model interpretability.
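
The text does not state which importance measure was used. As one common option, the sketch below reads the impurity-based importances of a random forest fitted on synthetic data. The assignment of hormone names to the synthetic columns is purely illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Labels for the six input features; mapping them to synthetic columns is
# only for illustration.
HORMONES = ["GH", "TSH", "FSH", "LH", "PRL", "ACTH"]

X, y = make_classification(
    n_samples=215, n_features=6, n_informative=4, n_redundant=0, random_state=0
)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances are non-negative and sum to 1 across features.
ranked = sorted(
    zip(HORMONES, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Model-agnostic alternatives such as `sklearn.inspection.permutation_importance` could be applied to models like SVM or KNN that expose no built-in importance attribute.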

Statistical Analyses and Software Application
Stata (version 19.0) was used to analyze the heterogeneity across medical centers with the fixed-effect model. The Q test (p < 0.10, indicating heterogeneity) and the I² test (I² > 50%, indicating moderate-to-high heterogeneity) were performed. Jupyter Notebook, an open-source project, served as the code execution tool (compatible with Python and R environments) (11,12). Python (version 3.9) was used for model building and training. Various open-source ML libraries, including Scikit-Learn (version 0.24.2), were used (13). All data analyses and data visualization were performed in R (version 4.0.4) and Python (version 3.9). Categorical variables were analyzed using the chi-square test. An independent sample t-test was used to compare the differences in hormonal levels. A two-sided p-value of less than 0.05 was considered significant.
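
For readers unfamiliar with the two heterogeneity statistics, the sketch below computes Cochran's Q and I² with inverse-variance (fixed-effect) weights. The study ran this analysis in Stata, so this Python version is only an illustration. The function name `heterogeneity` and the per-center estimates are made up.

```python
from scipy.stats import chi2


def heterogeneity(effects, variances):
    """Cochran's Q, its p-value, and I² (%) under a fixed-effect model."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # Q follows a chi-square distribution with k - 1 degrees of freedom
    # when the centers share one true effect.
    p_value = float(chi2.sf(q, df))
    # I² is the share of variation beyond chance, truncated at zero.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, p_value, i2


# Made-up effect estimates and variances for four centers.
center_effects = [0.42, 0.55, 0.38, 0.61]
center_variances = [0.04, 0.05, 0.03, 0.06]

q, p_value, i2 = heterogeneity(center_effects, center_variances)
print(f"Q = {q:.3f}, p = {p_value:.3f}, I2 = {i2:.1f}%")
```

With the thresholds given above, a p-value below 0.10 or an I² above 50% would flag heterogeneity that argues against pooling.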

RESULTS
Further details of the perioperative results are shown in Tables 2 and 3. The postoperative GH levels were significantly higher than the preoperative levels (p < 0.001), whereas the TSH (p < 0.001), FSH (p < 0.001), and PRL (p < 0.001) levels significantly decreased after surgery. Perioperative hormonal level fluctuations were not significant for LH (p = 0.96) and ACTH (p = 0.77). All patients underwent transsphenoidal surgical resection. A total of 151/200 (75.5%) patients underwent gross total resection (GTR). Subtotal resection (STR) was observed in 24.5% (49/200) of patients. The GH levels were slightly higher in the GTR group than in the STR group after surgery (p = 0.07). However, the PRL secretion levels were significantly lower in patients with GTR (p = 0.03). Postoperative LH (p = 0.25) and FSH (p = 0.41) tended to increase following GTR, with no significant differences.

Predictive Models for Postoperative Hypofunction
The six preoperative anterior pituitary hormones were considered as input features for the dichotomous models of postoperative hypopituitarism. The predictive capabilities were compared after training and fivefold cross-validation of the seven models (six ML models and the LR model). The performance details of each model are shown in Figure 1.

Predictive Models for New Postoperative Hypofunction and Hormonal Recovery
The ML models were also used to predict new postoperative hypofunction and hormonal recovery. GridSearchCV was rerun for these predictive models. The optimal model for the prediction of surgery-induced PHD achieved an AUC-ROC of 0.74 and an area under the precision-recall curve (AUC-PR) of 0.69. Hormonal recovery was also assessed using these models. The optimal model for postoperative recovery prediction had an AUC-ROC of 0.85 and an AUC-PR of 0.54. The LR models achieved AUC-ROC values of 0.59 and 0.68 in predicting new postoperative hypofunction and hormonal recovery, respectively. The ML models thus performed more reliably than the LR model in predicting hormonal outcomes.

Predictive Models for Individual Postoperative Hypopituitarism and Explanation
Predictive models for secondary PHD following surgery were reported in previous studies. However, these models did not provide details on individual hormonal outcomes. Thus, predictive models for postoperative PHD of each individual hormonal axis were established and adjusted on the basis of the parameters recommended by GridSearchCV.
The ML models were constructed to analyze and compare the performance of preoperative hormones in predicting postoperative individual hypopituitarism (Figure 2 and Supplementary Figure 3). The results of model construction suggested that ML models could be adjusted to predict the different types of PHD more accurately than the LR model.
The feature importance of anterior pituitary hormones on the optimal model was evaluated to explain the predictive models. Feature importance indicated that the individual hormonal deficiency was related to the same hormonal axis and the other anterior pituitary hormones prior to surgery (Figure 2). Postoperative anterior pituitary hormonal deficiency was associated with multiple hormone secretion.

DISCUSSION
In previous studies, patients with large or invasive PAs, or PAs with suprasellar extension, were likely to have postoperative pituitary hypofunction (1,6,14). Surgery is the first-line treatment to decompress NFPA, but it is also regarded as a trigger of hormonal hypofunction because of intraoperative traction and trauma to normal pituitary tissue (3,7,15,16). These extrinsic factors affect anterior pituitary hormone secretion and induce hormonal fluctuation. However, few studies have directly assessed the outcomes of the adenohypophysis on the basis of the changes in perioperative hormonal levels in NFPAs. In these studies, male and female hormones were generally pooled for assessment (3,17,18). However, hormonal levels differ significantly between men and women. Furthermore, female hormonal levels differ significantly across physiological periods. The evaluation of the menstrual period in retrospective studies depended on patient-reported personal history and therefore lacked reliable and objective evidence. Errors might thus be introduced in the assessment of female hormones. High-quality data are necessary for an accurate and reliable clinical model. Consequently, only male patients with NFPA from four medical centers were included in this study to reflect the perioperative fluctuation of anterior pituitary hormones accurately and objectively.

Effects of Surgery on Hormonal Levels
Although transsphenoidal surgery is generally recommended to decompress the pituitary gland in NFPA, surgery could not treat tumor-induced hypopituitarism immediately. On the contrary, transsphenoidal surgery carried a significant risk of sacrificing the remaining adenohypophysis function (14).  limitation of this study is that GHRP2 was performed at 2 weeks and 1-2 years after surgery, and the short-term GH dynamics were not analyzed. Increased GH might be a reaction to the surgical stress of major surgery, and it could return to normal within seven days (19,20). The effects of resection goals on perioperative hormones were also evaluated in this study. The postoperative GH levels in GTR were slightly higher, but the PRL levels were significantly lower than those in STR. GTR involves a higher degree of intraoperative destruction than STR. Therefore, in response to surgical stress, the GH levels in GTR cases were higher than those in STR cases. Increased PRL in NFPAs frequently occurs because the transport of inhibitory dopamine through the hypophyseal portal system is impaired (21,22). GTR could relieve the compression of the pituitary stalk more effectively than STR. Thus, the postoperative PRL levels in GTR were significantly lower than those in STR.

ML Predictive Models for Postoperative Hypopituitarism
The causes of the differences in fluctuation among anterior pituitary hormones remain uncertain. Hypertension, tumor diameter, invasion, surgical trauma, and residual tumor were considered predictors of pituitary hormonal prognosis. In clinical practice, partial hormonal deficiencies occur more frequently than panhypopituitarism in patients with NFPA. In the current study, 80.9% and 89.9% of patients with hormonal deficits had hypofunction in one or two hormonal axes before and after surgery, respectively. In addition to PRL, other anterior pituitary hormones could also be higher than the normal reference ranges in NFPAs. The extrinsic factors listed above could not fully explain the complicated hormonal fluctuation or provide more detailed information. Hormonal levels could show the dynamic response of adenohypophysis function during the perioperative period in detail. Therefore, the postoperative hormonal results were predicted from preoperative anterior pituitary hormones and explained using ML models. ML is a modern predictive statistical method widely used to construct prediction models in neurological studies, such as those for surgical remission of acromegaly, GTR of PAs, and cerebrospinal fluid leakage. This study was the first to train and explain ML models of postoperative hypofunction of the adenohypophysis and to predict the postoperative outcomes of anterior pituitary secretion. In addition, the ML models proved more reliable than the LR model in predicting hormonal outcomes in NFPAs.
In this multicenter study, several measures were taken to improve data quality and obtain stable and reliable ML models: including only male patients to avoid hormonal heterogeneity between sexes, performing heterogeneity analysis before combining data from the four centers, and using cross-validation. Furthermore, prediction models were constructed by only using the six hormones that the adenohypophysis directly secretes, and they did not include extrinsic factors (tumor diameter, invasion, and suprasellar extension). After these noise variables were minimized, the preoperative secretory capacity of the adenohypophysis could effectively predict the prognosis following transsphenoidal surgery. The optimal predictive models for postoperative hormonal deficits, new hormonal hypofunction, and hormonal recovery had AUC-ROC values of 0.82, 0.74, and 0.85, respectively, which were better than those of conventional statistical methods, such as the LR model.
Individual hormonal hypofunction was also analyzed and explained using ML models. In the prediction of individual hormonal deficits, the optimal ML models achieved AUC-ROC values ranging from 0.81 to 0.91. The ML models could aid in predicting the secretory capacity of the adenohypophysis and the prognosis of individual hormonal axes. The results of feature importance analysis implied that multiple relations exist among the various anterior pituitary hormones. Individual   (17).
Several limitations warrant further discussion. In this multicenter study, details of the hormonal tests, including the testing reagents and testing methods, were not standardized. Although we pooled multicenter hormonal outcomes after heterogeneity assessment, bias in measurements at different centers remained. Data on GH deficiency were not available in the cohort; had they been available, they might have increased the prediction accuracy. Peripheral hormones were not analyzed because this study focused on the secretory capacity of the adenohypophysis. In addition, this study could only reflect the characteristics of hormonal changes during the perioperative period. The long-term prognosis of the adenohypophysis requires further study. The current number of cases also remains limited. Building an accurate and reliable model requires high-throughput data collection and the inclusion of female patients with an accurate record of physiological period. These limitations were attributed to the deficiencies of retrospective studies.

CONCLUSIONS
The characteristics of individual anterior pituitary hormone fluctuation were not consistent in the short term after NFPA resection. In addition to mass effects and surgical trauma, preoperative hormonal levels should be a focus of concern. Perioperative evaluation through hormonal levels could reflect and predict the secretory function of the adenohypophysis in NFPA. ML is a subtype of artificial intelligence, and ML models could be integrated into clinical application as a tool for NFPA prognosis prediction. The reliability and practicability of the prediction were also confirmed in this study. The combination of multiple models facilitated the selection of accurate predictive models and increased interpretability. The ML models could be an essential technical tool for future clinical prediction.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Institutional Review Board of Fuzhou General Hospital of Fujian Medical University and Peking Union Medical College Hospital. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
YF, HW, and MF contributed equally to the present study. All authors contributed to the article and approved the submitted version.