Prediction of Long-Term Recovery From Disability Using Hemoglobin-Based Models: Results From a Cohort of 1,392 Patients Undergoing Spine Surgery

Hemoglobin and its associated blood values are important laboratory biomarkers that mirror the strength of constitution of patients undergoing spine surgery. Along with the clinical determinants available during the pre-admission visit, it is important to explore their potential for predicting clinical success from the patient's perspective in order to make the pre-admission visit more patient-centered. We analyzed data from 1,392 patients with spine deformity, disc disease, or spondylolisthesis enrolled between 2016 and 2019 in our institutional Spine Registry. The patient-reported outcome measure at 17 months after surgery was the Oswestry disability index. High preoperative hemoglobin was found to be the strongest biochemical determinant of clinical success along with high red blood cell count, while low baseline disability, prolonged hospitalization, and long surgical times were associated with poor recovery. The neural network model of these predictors showed a fair diagnostic performance, having an area under the curve of 0.726 and a sensitivity of 86.79%. However, the specificity of the model was 15.15%, thus proving to be unreliable in forecasting poor patient-reported outcomes. In conclusion, preoperative hemoglobin may be one of the key biomarkers on which to build appropriate predictive models of long-term recovery after spine surgery, but it is necessary to include multidimensional variables in the models to increase the reliability at the patient's level.


INTRODUCTION
Spinal disorders are common and the prevalence increases in the aging population (1). In view of the recent advancements in spine treatments, surgery is considered clinically effective from a medical perspective. However, healthcare systems are progressively moving toward value-based business models (2), with the definition of clinical success increasingly based on patient-reported outcome measures (PROMs) (3). The Oswestry disability index (ODI) is a widely used, validated, and self-administered questionnaire to assess a patient's functional impairment related to the spinal condition, encompassing questions about personal care, movements, sleeping, and social life (4), and can be used to evaluate the outcomes from the patient's perspective (5). The advent of PROMs calls for more patient-centric healthcare, which underlines the need for early preoperative pathways. Identifying the determinants that affect the individual's daily activities in the long term can help maintain high-quality standards (6). One of the parameters known to mirror the functional reserves required for proper recovery is hemoglobin (7-9). An abnormal circulating level before surgery is considered a risk factor for poor medical and surgical outcomes in spinal patients (10), also being a predictor of mortality in the most severe conditions (11). Hemoglobin is an assembly of four globular polypeptide chains that fills red blood cells, carrying up to four oxygen molecules attached to iron atoms. Together with the erythrocytes and their volume over total blood (i.e., hematocrit), hemoglobin reflects oxygen carrying capacity, functional iron levels, and correct erythropoiesis (12). Hemoglobin concentration is one of the benchmarks for planning transfusion therapy strategy (13) along with other laboratory parameters such as hematocrit (14). Therefore, preoperative iron optimization is considered a key aspect of patient blood management (15) and enhanced recovery after surgery (16).
To the authors' knowledge, there are no studies in spine surgery that have investigated the potential of preoperative hemoglobin in predicting clinical success from a patient's point of view. We studied a large cohort of patients undergoing spine surgery for deformities, disc disease, and other back conditions to identify predictors of long-term functional status using hemoglobin-based models.

Study Population
The research included patients enrolled in the institutional Spine Registry (SpineReg; ClinicalTrials.gov number: NCT03644407), which is a prospective observational registry recruiting patients undergoing spine surgery, established in 2015 by our hospital IRCCS Orthopedic Institute Galeazzi of Milan (Italy). To answer the research question, we extracted from the registry the patients who met the following eligibility criteria: ≥18 years of age, enrolment between 2016 and 2019, and the presence of at least one ODI assessment at 6-, 12-, or 24-month follow-up. The extraction excluded data of patients with a diagnosis of tumors, admission for complications, and surgical procedures involving the cervical spine. A 30% reduction from baseline was considered as the minimum clinically important difference (MCID) in the ODI score in order to categorize the outcomes (17). A secondary analysis was planned using the raw reduction of 12.7 points as classification threshold, which is more restrictive in categorizing successful surgeries (18). Patients who had a preoperative ODI < 12.7 were excluded from the extraction query. After data extraction and integration with routine parameters, the study sample comprised 1,392 patients with seventeen variables: gender, age, red blood cells (RBCs), hematocrit (Ht), serum concentration of hemoglobin (Hb), mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), C-reactive protein (CRP), type of diagnosis, American Society of Anesthesiologists physical status classification system (ASA), duration of surgery (DS), length of hospital stay (LOS), preoperative ODI (PREOP ODI), 6-month ODI, 12-month ODI, and 24-month ODI. Missing values numbered 85 for CRP, 14 for DS, 555 for 6-month ODI, 422 for 12-month ODI, and 731 for 24-month ODI.
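The two classification thresholds described above can be illustrated with a minimal sketch. The function name `clinical_success` and the example scores are hypothetical and are not taken from the registry; the sketch only assumes that both criteria are applied as "at least" thresholds on the reduction from baseline ODI.

```python
def clinical_success(baseline_odi, last_odi, relative_mcid=0.30, absolute_mcid=12.7):
    """Return (success_30pct, success_12_7) as 1/0 flags.

    Hypothetical helper: a positive change means less disability at follow-up.
    """
    change = baseline_odi - last_odi
    success_relative = int(change >= relative_mcid * baseline_odi)  # 30% of baseline
    success_absolute = int(change >= absolute_mcid)                 # raw 12.7 points
    return success_relative, success_absolute


# Hypothetical (baseline ODI, last ODI) pairs, for illustration only
examples = [(40.0, 26.0), (40.0, 30.0), (60.0, 45.0), (20.0, 22.0)]
for baseline, last in examples:
    print(baseline, last, clinical_success(baseline, last))
```

For a baseline ODI below about 42 points, 12.7 points is more than 30% of baseline, so the raw threshold is the stricter of the two; for higher baselines the 30% rule becomes the stricter one, as the third example shows.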

Data Handling
The simple imputation technique of the last observation carried forward (LOCF) was used to include patients with incomplete 24-month ODI, with the last reported scores being used in place of the missing values. The months of follow-up were then weighted by the number of scores available at each time point, thus obtaining the ODI scores at a mean follow-up of 17 months (661 24-month ODI + 576 12-month ODI + 155 6-month ODI). The structure of the dataset is reported in Table 1. Subsequently, three new variables were derived. The first variable was the difference between the baseline ODI and the last ODI at 17 months (ODI PRE vs. last 17). The second variable was the presence or absence (1,0) of clinical improvement at 17 months (17-month progress 30%), calculated as a 30% decrease from baseline ODI to the last ODI at 17 months. The third variable was the presence or absence (1,0) of clinical improvement at 17 months (17-month progress 12.7), calculated as a raw reduction in the ODI score ≥ 12.7 from baseline ODI to the last ODI at 17 months. Two new sets classified the sample by gender code (males = 0, females = 1) and diagnosis (spine deformities = 4, disc diseases = 5, back surgeries = 6). Specifically, the group of spine deformities included kyphosis and scoliosis, and the group of back surgeries included spondylosis, spondylolisthesis, stenosis, and elective treatment of fractures. The dataset was explored for outliers among the primary variables, and a new dataset for regression analysis was created after the elimination of outliers. The following numbers of outliers were excluded based on the interquartile range rule of three: 4 RBCs, 27 MCV, 36 MCH, 2 MCHC, 96 CRP, 2 DS, and 21 LOS.
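As a minimal sketch of the imputation step, the following shows how LOCF selects the last available score and how the follow-up counts reported above lead to a mean follow-up of about 17 months. The function name `locf` is hypothetical; the counts (661, 576, 155) are those given in the text.

```python
def locf(odi_6m, odi_12m, odi_24m):
    """Return (last_score, month) by last observation carried forward.

    Hypothetical helper: missing assessments are passed as None.
    """
    for month, score in ((24, odi_24m), (12, odi_12m), (6, odi_6m)):
        if score is not None:
            return score, month
    raise ValueError("at least one follow-up ODI is required")


# Number of patients whose last available score came from each follow-up
counts = {24: 661, 12: 576, 6: 155}
total = sum(counts.values())
mean_follow_up = sum(month * n for month, n in counts.items()) / total

print(total)                     # 1392
print(round(mean_follow_up, 2))  # 17.03
```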

Statistics
The IBM SPSS 22 statistics package was used for all statistical analyses. The descriptive variables were summarized as means, standard deviations, and minimum and maximum values in Tables 2, 3 (baseline examination), regardless of the type of distribution. In Table 4 (outcome exploration), it was planned to report the most significant biochemical descriptors against the outcomes. Based on the results from the Shapiro-Wilk test on the dataset without outliers, Ht (p = 0.989), Hb (p = 0.281), and MCHC (p = 0.078) were assumed to have normal distribution. Age (p < 0.05), RBCs (p < 0.05), MCV (p < 0.05), MCH (p < 0.05), CRP (p < 0.05), DS (p < 0.05), LOS (p < 0.05), and baseline ODI (p < 0.05) were assumed to be skewed. The newly created dataset without outliers was used for running descriptive statistics, whereas the raw dataset was planned to be used for inferential statistics. The existence of a difference in the biochemical parameters between males and females was planned to be investigated through the independent t-test or the Mann-Whitney U-test for normally or non-normally distributed values, respectively, controlling for the homogeneity of variance by Levene's test for equality of variances (adjusted degrees of freedom). The existence, strength, and direction of the association between the biochemical parameters and the demographic variable of age were examined using the Pearson product-moment correlation for continuous normally distributed variables or the Spearman rank-order correlation coefficient for skewed data. Regardless of data distribution, blood values and years of age were planned to be reported in scatter plots against the baseline ODI together with the Pearson's correlation and linear regression coefficients (unstandardized B). The difference between males and females in baseline ODI was likewise investigated through the Mann-Whitney U-test.
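The distinction between the two correlation coefficients can be sketched in plain Python. This is not the SPSS procedure used in the study: the function names and the age and Hb values are hypothetical, and the Shapiro-Wilk step that decides which coefficient to use is not reproduced. The sketch only shows that the Spearman coefficient is the Pearson coefficient computed on ranks, with tied values sharing their average rank.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ages (years) and Hb values (g/dl), for illustration only
age = [30, 45, 60, 75, 80]
hb = [15.0, 14.6, 14.0, 12.0, 13.0]
print(round(spearman(age, hb), 3))  # -0.9
```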
Concerning outcome exploration, the Chi-Square test was used to investigate the differences in gender between outcome groups at 17 months using both the 30% reduction and the raw 12.7-point reduction. Similarly, the Mann-Whitney U-test was planned for the years of age and the biochemical parameters, taking into account the result of Levene's test for equality of variances based on median (adjusted degrees of freedom). The Chi-Square test was also used to investigate the different outcomes according to ASA, diagnosis, DS, and LOS. Subsequently, two prediction regression models were planned: the first model (PRE BIO) would be based on the seven preoperative biochemical markers, while the second model (PERI BIODEMCLI) would include the biochemical, the demographic (gender and age), and the clinical (diagnosis, ASA, baseline ODI, DS, and LOS) variables. The prediction potential of each of the two models on the last ODI was explored through multiple regression analysis on the dataset without outliers. The models were planned to be based on the stepwise method, which serially adds the next strongest predictor and removes any previously entered predictor that is no longer significant. The most significant predictors would be chosen to report the predictive equations. The Wilcoxon signed-rank test was run to compare the values predicted by the estimated regression equation with the real values of the dataset. The main predictors of each model were tested for the clinically significant outcomes (1,0) by using binary logistic regression (enter method). Neural Networks (NN) analysis was planned to observe the forecasting outcome as a function of each variable-specific model: PRE BIO (NN) and PERI BIODEMCLI (NN).
Given the non-linear nature of this tool, the associations between inputs and outputs were modeled on the raw dataset through the supervised learning technique of the Multilayer Perceptron (MLP) procedure to produce a predictive model for clinical success based on the values of the demographic, biochemical, and clinical predictors. After standardization (the distributions were rescaled so that the mean was zero and the standard deviation was one), the training sample was set at 70%, the testing sample to track prediction at 20%, and the holdout sample to assess the final NN at 10%. The NN architecture was planned to be based on two hidden layers with Sigmoid activation function (real-valued arguments are transformed to the range 0, 1) and on Softmax activation for the output layer (the vector of real-valued arguments is transformed to a vector whose elements fall in the range 0, 1 and sum to 1). The Receiver Operating Characteristic (ROC) curve was used to evaluate the Area Under the Curve (AUC), the sensitivity (true positive rate), specificity (true negative rate), and 1-specificity (false positive rate) of model-specific predictors in three successive runs of the NN, with the normalized importance of the independent predictors being reported for the most significant multivariate model.
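The building blocks named above (rescaling to zero mean and unit standard deviation, the Sigmoid and Softmax functions, and the 70/20/10 partition) can be sketched as follows. This is an illustration, not the SPSS MLP implementation: the function names, the use of the sample standard deviation, the fixed-size random split, and the seed are all assumptions made for the example.

```python
import math
import random


def standardize(values):
    """Rescale to mean 0 and standard deviation 1 (sample SD, an assumption)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def sigmoid(z):
    """Map a real-valued argument to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    """Map a real-valued vector to values in (0, 1) that sum to 1."""
    largest = max(zs)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - largest) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def split_indices(n, seed=0, train=0.7, test=0.2):
    """Shuffle case indices and cut them into training, testing, and holdout."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train)
    n_test = round(n * test)
    return (indices[:n_train],
            indices[n_train:n_train + n_test],
            indices[n_train + n_test:])


train_idx, test_idx, holdout_idx = split_indices(1392)
print(len(train_idx), len(test_idx), len(holdout_idx))  # 974 278 140
print(softmax([0.5, 1.5]))  # two class probabilities that sum to 1
```

With 1,392 cases, a holdout partition of this size contains about 140 patients, which is worth keeping in mind when reading the holdout error rates of the three runs.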

Preoperative Examination
The demographic and preoperative biochemical variables are reported in Table 3, indicating data for each of the three clusters of spine diagnosis.
The association of sex, the years of age, and the biochemical markers with the ODI score at baseline are reported in Figure 1.

Exploration of the Outcome
In the whole cohort, 1,022 patients recovered at least 30% from baseline ODI at 17 months, while the reduction of 12.7 points yielded 1,019 successful outcomes. The recovery from disability considering 30% reduction showed an association with gender [χ (1) Table 5 are reported the percentages of clinical success based on ranges of RBCs and Hb. An RBC cutoff of 4 × 10⁶/µl was expected to yield 96.5% of true positives and 92.7% of false positives. Similarly, a Hb cutoff of 12 g/dl correctly classified about 93.8% of the positive outcomes as positive, but incorrectly classified 88.9% of the negative outcomes as positive.
Considering RBCs, Hb, the two demographic variables, and the five clinical variables, the PERI BIODEMCLI (NN) with 30% reduction showed 29.46, 22.63, and 25.71% of incorrect predictions in the holdout phase. The model proved to be a fair diagnostic instrument (first run AUC = 0.628; second run AUC = 0.632; third run AUC = 0.626), with normalized importance of 100.00% given by age in the first and third runs and LOS in the second run. Using the 17-month progress of −12.7 from baseline ODI score, the odds were confirmed for age (p = 0.002), ASA (p = 0.154), and LOS (p = 0.001), but Hb was no longer a significant predictor (p = 0.088), whereas baseline ODI significantly predicted the outcome (p < 0.00001). The logistic regression model correctly classified 74.91% of cases and was able to explain 10.81% of the variance in the clinical outcome. The PERI BIODEMCLI (NN) with 12.7-point reduction showed 34.69, 30.22, and 29.17% of incorrect predictions in the holdout phase. The model proved to be a fair diagnostic instrument (first run AUC = 0.720; second run AUC = 0.726; third run AUC = 0.723), with normalized importance of 100.00% given by baseline ODI.
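To make the reported metrics concrete, the following sketch computes sensitivity, specificity, and the AUC from a small set of outcomes and predicted probabilities. The data and function names are hypothetical and unrelated to the registry; the AUC is computed here as the proportion of success/failure pairs in which the successful case receives the higher predicted probability, which is equivalent to the area under the ROC curve.

```python
def confusion_rates(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 1/0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the share of (success, failure) pairs ranked correctly; ties count 0.5."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = clinical success) and predicted probabilities
y_true = [1, 1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sensitivity, specificity = confusion_rates(y_true, y_pred)
print(sensitivity, specificity)  # 0.75 0.5
print(auc(y_true, scores))       # 0.875
```

Because most patients in the cohort improved, a model that predicts success for almost everyone can reach high sensitivity and a moderate AUC while still misclassifying most of the patients who did not recover.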

DISCUSSION
It is foreseeable that the workload of spine surgery centers will intensify in the coming years as the population is getting older and debilitating polymorbid conditions are no longer uncommon (19,20). Technological advances in spine treatments help keep short-term patient satisfaction high (21). However, it is necessary to plan patient-centered care pathways to achieve long-term results, thus revising the determinants of clinical success that simultaneously capture the perspective of the surgeon, the anesthesiologist, and the patient. In the present study, we analyzed the predictive potential of the preoperative biochemical markers on the ODI score at 17 months after surgery in patients enrolled in the institutional SpineReg of IRCCS Orthopedic Institute Galeazzi. The study cohort involved 1,392 patients undergoing surgery for deformity, disc disease, or other back spine disorders, and consisted of a majority of older female adults (Table 2). In absolute terms, an improvement in disability at the last follow-up was observed in over 88% of patients. Considering the more restrictive MCID of the ODI, about 73% of patients reported a successful recovery. Similar rates have already been observed in spine patients (5). There were no differences in recovery between males and females, but individuals who did not experience a clinical improvement were older at the time of surgery, had higher ASA, and had lower baseline ODI. Equally, these trends based on clinical determinants are in line with previous studies (22). Patients with spinal deformities experienced lower recovery rates than the other clusters of diagnosis, conceivably due to the greater surgical complexity that requires longer operative times and prolonged hospitalization (Table 3).
Analyses of laboratory values confirmed that males generally have higher levels of RBCs and Hb than females and that there is a significant depletion with increasing age, plausibly mirroring iron supply discrepancies common in older adults. Similarly, the positive association of MCV and MCH with age would suggest an etiology from cobalamin or folate deficiency, which are known to cause macrocytic anemia in older individuals with poor strength of constitution (23,24). This consideration was corroborated by increased disability and inflammation found in older patients (Figure 1). Based on available laboratory parameters, predictive modeling demonstrated that RBCs and Hb levels prior to surgery were the strongest determinants of clinical success at 17 months in all types of spine surgery. In particular, the univariate linear model explains 0.5 of the postoperative change in disability, with each unit increase in RBCs being associated with up to 1.539 times the probability of clinical success. However, the corresponding neural network models showed poor diagnostic performance, having an erroneous prediction rate of up to 34.75% and an AUC of 0.565, making them unreliable in terms of sensitivity and specificity. Furthermore, only slight reductions in disability scores (−17 to −31 for RBCs) could be predicted. The prediction accuracy for poor outcomes did not improve after setting lower blood values, failing to identify both highly successful outcomes and the worsening observed in 140 patients at 17 months. Therefore, it can be reasonably argued that stratification of patients based on univariate cutoffs may not be recommended and that studying laboratory biomarkers as continuous variables might be preferable (25,26). In fact, the results in Table 5 showing a comparable trend between blood parameters and success rates give both RBCs and Hb a strong connotation that is also relevant for the patient.
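As a worked illustration of the figure of 1.539 per unit increase in RBCs, the sketch below assumes that this value is an odds ratio from the binary logistic regression (the text describes it as a multiple of the probability). The baseline used is the cohort-wide success rate for the 30% criterion (1,022 of 1,392 patients); applying it to a single unit increase in RBCs is purely illustrative, and the variable names are hypothetical.

```python
import math

# Assumption: 1.539 is the odds ratio (exp(B)) per unit increase in RBCs
odds_ratio = 1.539
coefficient_b = math.log(odds_ratio)  # corresponding logistic coefficient B

# Illustrative baseline: overall success rate for the 30% criterion
baseline_probability = 1022 / 1392
baseline_odds = baseline_probability / (1 - baseline_probability)

# Odds are multiplied by the odds ratio; the probability is not
new_odds = baseline_odds * odds_ratio
new_probability = new_odds / (1 + new_odds)

print(round(coefficient_b, 3))
print(round(baseline_probability, 3), round(new_probability, 3))
```

Under these assumptions the probability of success rises from about 0.73 to about 0.81, which is far less than a 1.539-fold increase in probability; an odds ratio and a ratio of probabilities diverge most when the baseline success rate is high, as it is in this cohort.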
With the inclusion of clinical parameters, the variables in the multivariate linear models were able to explain ∼25.1% of the postoperative change in disability at 17 months. The crude contribution thus accounts for both worsening and notable improvements with respect to the previous univariate model. Although the equation was still not adequate in the prediction of postoperative recovery, the neural network model at the last follow-ups showed the highest diagnostic performance even for the more restrictive MCID (AUC between 0.720 and 0.726 with up to 15.15% of correct prediction of negative outcomes), providing a decreasing order of importance of the preoperative determinants: ODI, LOS, DS, Hb, RBCs, age, ASA, diagnosis, gender (Figure 2). Thus, Hb seems to have a high predictive potential, even greater than variables of demographic or clinical nature. The importance of preoperative Hb has already been studied in relation to complications in children, adults, and older adults undergoing spinal surgery (27-29), but it is unclear how it affects patients' long-term daily activities. It is plausible to think that the blood concentration reflects not only the strength of the patient's constitution (e.g., nutritional status) (8,9,30), but also the disease-specific weaknesses whose complications might consequently affect the daily activities of the patients (31). For example, an inverse association was found between Hb levels and the number of patients reporting fatigue and shortness of breath (32). Whatever the connection, the predictive potential of Hb in many surgical fields is widely recognized (33,34).
Table note: training sample set at 70%, testing sample to track prediction at 20%, and holdout sample to assess the final model at 10%. The final percent correct shows a high performance in predicting the clinically successful outcomes, but poor reliability to forecast negative outcomes. ODI, Oswestry Disability Index, ranging from 0 (no disability) to 100 (maximum disability); Hb, hemoglobin; RBCs, red blood cell count; ASA, American Society of Anesthesiologists physical status classification system (1, healthy; 2, mild; 3, severe; 4, life threatening; 5, moribund; 6, brain-dead); DS, duration of surgery; LOS, length of hospital stay.
This study has limitations. Although patients admitted for complications were excluded from this research, information on intraoperative (e.g., transfusion-associated complications) or postoperative events that did not require access to our hospital was not available, thus possibly explaining the inability of the models to predict worsening. Furthermore, the study cohort might not represent the population of patients undergoing spinal surgery in our hospital, being indicative only of those who have agreed to participate in the registry over the years. While the completeness of the registry was satisfactory, missing data at predefined follow-ups reached 40% and may have introduced some bias into the results. However, the models at 17 months were built on the scores at the last controls, thus making the results consistent (35). Lastly, although they can be estimated on the basis of the surgical plan, both operative times and days of hospitalization are available only after the intervention, which could undermine the preoperative nature of the models.
In conclusion, our study sheds light on the role of preoperative Hb and RBCs in predicting long-term recovery reported by patients. Based on this research, values of RBCs < 4 × 10⁶/µl and of Hb < 12 g/dl in both genders may be associated with excessive rates of long-term failure after spine surgery from a patient's perspective. However, the model is not reliable in its current form and should be integrated with multidimensional variables of demographic, laboratory, and clinical nature to investigate further recovery determinants, such as body weight (36), psychological distress (37), or the propensity for postoperative movement (38) and social participation (39). The ideal predictive model should have both high sensitivity and low false-positive rates. This is especially relevant when the consequence of not identifying patients at risk for negative outcomes could affect long-term daily activities. The performance of predictive models also varies according to the extent of recovery considered clinically relevant (17,18), a concept that highlights the need to involve the patient in planning the treatment path, thus making the pre-admission visit more patient-centered. In the future, the correct stratification of individuals at risk will ensure opportunities to optimize the patient's health in time for surgery, more affordable clinical care, and greater patient satisfaction (16,40,41).

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author/s.

ETHICS STATEMENT
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
MB, PP, FL, TC, and ED conceived and designed the research. MB, FL, TC, and ED collected the data and managed the database. MB analyzed the data and wrote the first draft of the manuscript. PP, FL, TC, ED, PR, MP, LS, RB, MB-B, GB, and PB revised the first draft and contributed to the manuscript sections. GB and PB supervised the study. All authors contributed to the manuscript revision and approved the submitted version.