Mapping study of papillary thyroid carcinoma in China: Predicting EQ-5D-5L utility values from FACT-H&N

Objective To develop a mapping algorithm that can be used to predict EQ-5D-5L health utility scores from FACT-H&N and obtain health utility parameters for Chinese patients with papillary thyroid carcinoma (PTC), which can be used for cost-utility analysis in health economic. Methods A total of 1,050 patients with PTC from a tertiary hospital in China were included, and they completed FACT-H&N and EQ-5D-5L. Four mapping algorithms of direct mapping functions were used to derive the models: Ordinary least squares (OLS), Tobit model (Tobit), Two-part model (TPM), and Beta mixture regression model (Beta). The goodness-of-fit of models was assessed by the mean absolute error (MAE), root mean square error (RMSE), Akaike information criteria (AIC), Bayesian information criteria (BIC), and absolute error (AE). A fivefold cross-validation method was used to test the stability of the models. Results The mean utility value of the EQ-5D-5L was 0.870 ± 0.094. The mean EQ-VAS score was 76.5 ± 13.0. The Beta mixture regression model mapping FACT-H&N to EQ-5D-5L achieved the best performance [fivefold cross-validation MAE = 0.04612, RMSE = 0.06829, AIC = −2480.538, BIC = −2381.137, AE > 0.05 (%) = 32.48, AE > 0.1 (%) = 8.95]. The independent variables in this model were Physical Well-Being (PWB), Emotional Well-Being (EWB), Head & Neck Cancer Subscale (HNCS) scores and its square term and interaction term scores. Conclusions This study calculated the health utility score of Chinese patients with PTC. The reported algorithms can be used to map the FACT-H&N into the EQ-5D-5L, which can be applied in the cost-utility related study of patients with PTC.


. Introduction
Thyroid cancer is a most common head and neck cancer, with papillary thyroid carcinoma (PTC) accounting for approximately 90% in thyroid cancer (1). According to the global cancer statistics (2), the thyroid cancer is the ninth most malignant tumor. In China, the incidence and mortality of thyroid cancer continue to rise (3), affecting the patient's health-related quality of life (HRQoL) and the utilization of health resources. Health economic evaluation plays a key role in medical resource allocation and clinical decisionmaking, and its preferred method is cost-utility analysis (4).
In the cost-utility analysis of cancer treatment, the most important and commonly used health outcomes are qualityadjusted life years (QALYs) (5,6). Calculating QALYs requires measuring health utility values. The EuroQol five-dimensional 5 level (EQ-5D-5L) is currently the most widely used preferencebased health utility instrument at home and abroad that can be used to calculate health utility values (7,8). Functional Assessment of Cancer Therapy (FACT) is one of the most widely used instruments to measure health-related quality of life in cancer patients (9) and related mapping studies have covered breast cancer (10-13), ovarian cancer (14), colorectal cancer (15,16), lung cancer (15,17), prostate cancer (18,19) and others. However, FACT-H&N (Functional assessment questionnaire for the treatment of head and neck cancer) is a non-preference-based instrument and cannot be directly used for calculating health utility values. Meanwhile, in clinical experiments or work, the EQ-5D-5L is not universally measured, and non-preference-based instruments are more widely used. In such cases, it is common to use "mapping" to convert the available health status data from a non-preferencebased measure to utility values for a generic preference-based measure (20). The mapping methods include direct mapping and indirect mapping methods (21). In this study, we adopted direct mapping method.
Ordinary least squares (OLS), Tobit model (Tobit), and Twopart model (TPM) are the most common mapping models of direct mapping functions, and Beta mixture regression models have been developed and gradually applied in recent years. By reviewing the literature, we did not find mapping studies of the FACT-H&N in thyroid cancer. The purpose of this study was to develop a mapping algorithm from the FACT-H&N to the EQ-5D-5L based on the Chinese PTC population. . Materials and methods

. . Study subjects
The study was conducted from May to December 2021 at Sichuan Cancer Hospital, a large tertiary-grade oncology hospital that provides medical services to most cancer patients in southwestern China. The inclusion criteria for this study were as follows: (1) patients with PTC diagnosed by pathology; (2) aged between 18 and 80 years old; (3) clear thinking, normal spirit, and a certain ability to understand and express; and (4) willing to participate in this study and sign the "informed consent form". A total of 1,100 patients with PTC participated in the survey, and some patients were excluded due to missing values in their survey. Thus, 1,050 patients who completed the entire questionnaires were included in the data analysis. Three investigators who had received strict training participated in the investigation. The questionnaires included the general demographic data of the patients and instruments (FACT-H&N, EQ-5D-5L). The clinical treatment information of the patients was provided by the electronic medical record system of the hospital. After all questionnaires were completed, three investigators jointly checked whether there were any missing items so that we could contact the respondents.

. . Instruments . . . FACT-H&N
The FACT-H&N was designed at the Rush University Medical Center in Chicago, USA. The Chinese version of the FACT-H&N has good reliability and construct validity and can be used to determine the quality of life of Chinese patients with head and neck cancer (22-24). We contacted the research institution of the FACT-H&N; obtained the Chinese version and the scoring rules; and obtained the authorization of the instrument. The FACT-H&N investigates the situation of patients seven days before the day of investigation, and the specific subscale include Physical Well-being (PWB), Social/Family Well-being (SWB), Emotional Well-being (EWB), Functional Well-being (FWB) and Head & Neck Cancer Subscale (HNCS), with a total of 39 items. The scores of the items contained in each domain were summed to obtain the crude score of this domain, and the scores of these five domains were summed to obtain the total score, which ranged from 0 to 148, where a higher overall score indicates a better corresponding quality of life (25, 26).

. . . EQ-D-L
The EQ-5D-5L was developed by the European Quality of Life Group, which is a commonly used preference-based health utility instrument (27)(28)(29)(30). The instrument consists of a five-dimensional self-assessment and visual analogue instrument (EQ-VAS). Five dimensions include: mobility, self-care, daily activities, pain or discomfort, anxiety and depression, and each dimension is divided into no, slight, moderate, severe, extreme problems. The EQ-VAS is a instrument marked with numbers from zero to one hundred, with one hundred representing the best imaginary health condition and zero representing the worst imaginable health condition. The patients mark the instrument according to their perceived health condition on the day of the survey. We contacted the research and development institution of the EQ-5D-5L and obtained the instrument authorization and scoring rules. We used the Chinese population tariff to calculate the health utility score (31, 32), which can provide a valid reference for this value set. The Chinese tariff of the EQ-5D-5L was developed by the time trade-off (TTO) technique, which has a theoretical range of scores from −0.391 to 1.0 (33).
. . Statistical analysis . . . Descriptive statistical analysis Statistical measures such as percentage, mean, and standard deviation were used to describe the patient characteristics. In our study, EQ-5D-5L health utility data showed a skewed distribution; therefore, univariate analysis was performed using the rank sum test for patient characteristics to obtain factors affecting health utility, of which the Wilcoxon rank sum test was used for two-category data and the Kruskal-Wallis H-test was used for multicategory data. Multivariate analysis of factors influencing health utility values using Tobit regression.

. . . Correlations between instruments
Spearman's correlation coefficient was used to explore the correlation between the EQ-5D-5L and the FACT-H&N, if the correlation is poor, then the mapping function will perform poorly (34). The reference value for the strength of the relationship are as follows: 0-0.19 is regarded as very weak, 0.2-0.39 as weak, 0.40-0.59 as moderate, 0.6-0.79 as strong and 0.8-1 as very strong correlation (35).

. . . Establishment of the mapping model
The following four modeling approaches were used to predict EQ-5D-5L health utility values.OLS is the most widely used method in the mapping literature (36,37) and is a commonly used linear regression model. However, OLS can be affected by the ceiling effect, where a bias of underestimation with high values and overestimation with low values will occur; thus, the model is theoretically not fully applicable to the mapping of health utility values (38).
In some cases, Tobit model can mitigate ceiling effects in health utility measures. Tobit is a censored model that aims to estimate the linear relationship between variables when the dependent variable is left-censored or right-censored (39). Tobit is sensitive to violations of heteroskedasticity or non-normality (40).
The TPM model is considered an alternative estimation ways when analyzing skewed data, which reduces the bias in highly skewed distributions caused by the ceiling effect of the utility score (41). The first part of the TPM model uses logistic regression to estimate the probability that a patient is perfectly healthy, and the second part uses the previous OLS model to estimate a patient's  health utility score when not perfectly healthy and then combines the two models to obtain the final health utility value. Because mapping model's predictions can be outside the bounds, the typical linear regression models do not fit a bounded dependent variable and cannot manage extreme values (zero and one) on the bounds of that interval. Beta mixture regression models provide flexible means to regress outcome distributions with truncation support. The disadvantage of this model is that the irregularity of the distribution leads to inconsistent parameter estimates. The model generalizes to values that allow either or both boundaries by adding a degenerate distribution with a probability mass at the boundaries, which is called the truncated-inflated Beta model. The Beta model is flexible, where the first part analyses an incompletely healthy sample, and the second part uses the full sample and, based on the first part, estimates an inflated truncated mixed beta regression. The Beta model can fit models with and without truncation, as well as truncate models at the bottom or top of an interval range. It can predict health tools at various points, including negative values, observed peaks at full health or death, gap values between boundaries, and a mixture of the numeric components of the beta distribution. The command "truncation" in Stata software is used to determine whether there is a truncation in the model (42). Here we only consider the inflation part of the model at perfect health.
In our study, the mapping models of OLS, Tobit, TPM, and Beta were used to evaluate the following six different sets of independent variables: Mode l: FACT-H&N total score as the primary predictor of health utility score, such as OLS1, Tobit1, etc.
Model 2: Such as OLS2, Tobit2, etc. The independent variables of the models were the score of each subscale of the FACT-H&N, including PWB, SWB, EWB, FWB, HNCS.
Model 3: Such as OLS3, Tobit3, etc. The independent variables were the meaningful subscale scores in each dimension score of Model 2: PWB, EWB, HNCS, where P < 0.01 were considered significant.  Model 6: Such as OLS6, etc. The independent variables were those of Model 5 and age, gender.
The Beta model was explained here (see Table 3 for details): Beta 1a: the first model of Beta was the total score of FACT-H&N, and there was one component.
Beta 1b: the first model of Beta was the total score of FACT-H&N, and there were two components.
Beta 1c: the first model of Beta was the total score of FACT-H&N, and there were three components.

. . . Model validation and evaluation
A fivefold cross-validation method was used to test the predictive performance of the model, and all samples were randomly divided into two groups: 80% of the samples were used to estimate the dataset, 20% of the samples were used to validate the dataset, and the above four mapping algorithms were used to predict health utility values. The prediction procedure was repeated five times, and the mean absolute error (MAE), root mean square error (RMSE), Akaike information criteria (AIC), Bayesian information criteria (BIC), and absolute error (AE) of the nine better models were obtained. The smaller the values were, the better the model performance. Values from these indicators were ranked and summed to generate an average ranking (ARV). The model with the lowest ARV would be chosen as the optimal model (43,44). In addition, the performance of the model can also be visualized by plotting the scatter plot, error histogram, etc. All statistical analyses were performed in Stata version 16.0.

. . Patient characteristics and descriptive analysis
A total of 1,050 patients with PTC were included in this study. Women account for 76% of the sample size. The average age of the overall sample was 40.76 years old. Table 1 showed the patient characteristics. The mean utility score on the EQ-5D-5L was 0.870 (standard deviation 0.094), and the mean on the EQ-VAS was 76.5 (standard deviation 13.0). The mean of the FACT-H&N was 108.152 (standard deviation 15.478). The highest score in each subscale of the FACT-H&N was EWB, and the lowest score was FWB. Among the dimensions of the EQ-5D-5L, the dimension that accounted for the largest proportion was no difficulty in mobility (90%).   Table 2 Spearman rank correlation analysis showed that the total score of EQ-5D-5L was significantly positively correlated with the total score of FACT-H&N (Spearman correlation coefficient was 0.621). The total score of the EQ-5D-5L and the scores of each dimension of the FACT-H&N were correlated, and the correlation coefficient ranged from 0.140 to 0.626. The correlation coefficient between the total score of the FACT-H&N and the scores of each item of the EQ-5D-5L was between −0.497 and −0.243, and the . /fpubh. .
/fpubh. . (Beta1a, Beta1b, Beta1c, Beta2a, Beta3a, Beta4a, Beta5a, Beta6a without/with truncation point). The detailed performance of 32 models was shown in Table 3, and we performed an average ranking (ARV) of the performance indicators of each model. The regression coefficients of each model and covariance matrix of coefficients of preferred Beta model were shown in Supplementary Tables 1-5.
In each model, we selected the two best models (the two lowest ARV models) for final model screening. OLS5 and OLS6, Tobit5 and Tobit6, TPM2, TPM3, and TPM5, Beta 5a without truncation and Beta 2a with truncation models performed best in their respective mapping algorithms. Fivefold cross-validation was performed on the selected nine models, as shown in Table 4. After comprehensive consideration, the EQ-5D-5L utility scores was best predicted by the Beta 5a (without truncation point) consisting of PWB scores, EWB scores, HNCS scores and its square and interaction terms scores (there was no cutoff value, component was 1).
These nine models were used to estimate the predicted value of the EQ-5D-5L, and results were shown in Table 5, and their scatter plots and error histograms were plotted with these nine best models (Figures 2, 3). Based on the above results, OLS5, Tobit5, Tobit6, TPM5, and Beta5a (without truncation) were the best performing models in their respective mapping algorithms. The cumulative distribution function graph of these five models were shown below ( Figure 4). Overall, the predicted value of Beta 5a (without truncation) was closest to the observed value. The excel calculator for the best-fitting model was provided in the Supplementary Table 5, which make it easy for the user to calculate the EQ-5D-5L from FACT-H&N scores.

. Discussion
To knowledge, our study was the first to develop a mapping algorithm in PTC patients using the FACT-H&N and EQ-5D-5L and predict health utility values. We explored four different mapping methods with 32 models to find the most accurate predictive model to develop the mapping function. We mapped the FACT-H&N to the EQ-5D-5L using data from 1050 PTC patients in China.
EQ-5D itself has the nature of ceiling effect, resulting in utility scores that are not normally distributed (45). The EQ-5D-5L had lower rate in the ceiling effect than EQ-5D-3L (46, 47). In our study, a total of 101 (9.62%) respondents reported complete health, the ceiling effect of the EQ-5D-5L was not obvious, and the mean utility value of the EQ-5D-5L was 0.870 ± 0.094. Despite the mean EQ-5D-5L value was high, our ceiling effect was not evident, which may be due to the differences in health utility scores among thyroid patients with different status (surgery, Iodine 131, medicine treatment) in our study. It is worth noting that we also differed from other studies on EQ-5D-5L scores and ceiling effects. Compared with the other two breast cancer studies (48,49), our EQ-5D-5L was higher (the other two: 0.5627, 0.52), and the ceiling effect was higher (the other two: 5.84%, 3.85%). This may be because different countries use different versions of the EQ-5D-5L, leading to differences in health preferences between countries. It could also be due to different sample sizes, diseases, and male to female ratios.
We found that the correlation coefficient between FACT-H&N and EQ-5D-5L was 0.621, indicating that there was a strong correlation between the two instruments. The variables included in the best model Beta 5a (without truncation point) were PWB, EWB, HNCS subscales scores and their squared and interactive terms scores, and the model was relatively robust. According to the five-fold cross-validation of the model, the Beta model had the lowest ARV, indicating that it fit the data best. According to the predicted value obtained by the model, the predicted value of Beta 5a (without truncation point) was nearest the observed value; thus, the Beta model was selected as the optimal model. Compared with several other mapping studies that used the Beta mixture regression model (40,50,51), our MAE, RMSE, AIC, BIC, and AE values were lower, and sample size was larger, which demonstrated that the model in our study was effective and achieved a better prediction performance.
Through extensive literature searches, we found that linear regression was typically used to map a quality of life instrument to the EQ-5D-5L, and the most commonly used model was OLS (37,39,40,43,52). Actually, the Beta model was more flexible because it was suitable for a bounded dependent variable in the interval and can take values at the boundary (40). In the mapping-related literature, the Beta mixture regression model has been used more in recent years (40,50,51). A systematic review noted that the introduction of square terms and interaction terms of independent variables into mapping models can improve model performance (53, 54). In our study, the OLS5, Tobit5, Tobit6, TPM5, and Beta 5a (without truncation) performed best, indicating that the square and interaction terms do improve model performance. Many studies illustrated that sociodemographic variables were independent varibles that affect the quality of life and can improve model performance (43,52,55). In the results of the study, we found gender to be statistically significant (P < 0.05) and did not find age to be statistically significant in predicting EQ-5D-5L scores. The significance of age found in other studies could be related to the type of . /fpubh. . cancer or the age range of the populations (56). According to the ISPOR guidelines (57), for most of mapping functions, the inclusion of age as a covariate was required even if not statistically significant. A recent systemic review of mapping studies showed that age was included as a potential predictor in 51% of the reported algorithms and gender was included in 55% (9). So we addded age and gender to Model 6. Only Tobit6 proved this, indicating that the improvement was small in our study. The advantages of our study were as follows. First, the sample size was relatively large (n = 1,050). For other singledisease and single-center mapping-related studies, the sample size was primarily 200-300. Second, we used four mapping methods of direct mapping functions to develop a robust mapping model to predict the EQ-5D-5L health utility score in PTC patients, which can be used to assess the quality of life of PTC patients for use in health economics. Furthermore, to date, there were no published reports of the measurement of thyroid cancer health utility mapping models and related data on the Chinese population at home and abroad; thus, this study was important. Moreover, to improve the robustness of the model, this study performed five-fold cross-validation on the sample. However, our study had the following limitations. First, due to the high survival rate and good prognosis of PTC patients, this cancer is commonly known as "happy cancer", and the overall quality of life is better than that of other cancers. Respondents had generally high EQ-5D-5L scores, with only 24 (2.3%) having a utility value below 0.5, no negative utility value, and a small percentage of patients with poor quality of life, which may have affected the low-value prediction accuracy. Second, the sample size of the study was relatively large, but the survey was only conducted in one hospital and cannot be widely representative of all of China. Third, when examining the performance of mapping models, the lack of external validation in this study may limited the generalizability and extrapolation of its findings. Fourth, we used Chinese specific tariff, and whether the findings can be applied to other countries requires further research.
Despite these limitations, we demonstrated that the Beta mixture regression model performed much better than other models, and the mapping algorithm in the study could be used to predict the health utility value of the EQ-5D-5L in PTC patients.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement
The studies involving human participants were reviewed and approved by the Ethics Committee of Sichuan Cancer Hospital. The patients/participants provided their written informed consent to participate in this study.

Author contributions
Data collection and analysis were performed by JP, NC, and LJ. The first draft of the manuscript was written by QY and DH. All authors contributed to the article and approved the submitted version. . /fpubh. .