Derivation and Validation of a New Disease Activity Assessment Tool With Higher Accuracy for Takayasu Arteritis

Objective: To develop a new disease activity assessment tool with high accuracy for Takayasu arteritis. Methods: Individual items from the National Institutes of Health (NIH) criteria and the Indian Takayasu Clinical Activity Score (ITAS2010) were tested as candidate variables to develop a new disease activity assessment tool in a derivation cohort (N = 100). Physician global assessment of disease activity was used as the gold standard. Multivariable logistic regression models were constructed and the model with the highest accuracy was identified. A formula assessing disease activity was generated using simplified β coefficients (rounded to the nearest integer). Diagnostic performance was evaluated by estimating the area under the curve (AUC). The new assessment tool was subsequently validated in a validation cohort (N = 46). Results: The multivariable model yielding the highest accuracy consisted of a high erythrocyte sedimentation rate (ESR), NIH criteria 1 and 4, and carotidynia. Using simplified β coefficients, the following disease activity assessment tool was developed: high ESR (3 points), NIH criterion 1 (2 points), NIH criterion 4 (4 points), and carotidynia (3 points) (total score ≥5, active; total score <5, inactive). The new disease activity assessment tool had a higher AUC (89.37) for discriminating active and inactive disease than NIH criteria (AUC 77.96), ITAS2010 (AUC 66.12), ITAS-ESR (AUC 75.58), and ITAS-C-reactive protein (AUC 71.34). The AUC of the new assessment tool was similar in the validation cohort (85.23). Conclusion: A new disease activity assessment tool that consists of high ESR, NIH criteria 1 and 4, and carotidynia had higher accuracy in discriminating active and inactive disease than currently used clinical assessment tools.


INTRODUCTION
Takayasu arteritis (TAK) is a chronic inflammatory disease that causes granulomatous inflammation of the aorta and its major branches (1). Vessel inflammation results in irreversible structural damage, such as stenosis or aneurysm formation (2). For the treatment of TAK, interrupting active vessel inflammation before structural vessel damage occurs is crucial (3,4). For the timely interruption of active vessel inflammation, accurate assessment of disease activity is important. However, assessing disease activity in TAK is challenging as neither clinical symptoms nor laboratory data accurately reflect actual inflammation of the arterial wall (3,5,6).
Several clinical assessment tools for assessing disease activity in TAK have been developed. The National Institutes of Health (NIH) criteria, which consist of systemic symptoms, the erythrocyte sedimentation rate (ESR), vascular symptoms, and angiographic features, were the first tool used to assess disease activity in TAK (3). However, NIH criteria are suboptimal in detecting pathologically proven active disease (7). The Birmingham Vasculitis Activity Score (BVAS), which is a validated tool for assessing disease activity in small and medium vessel vasculitis (8), has also been used to assess disease activity in TAK (9). However, because the BVAS was originally developed for small and medium vessel vasculitis, its value for assessing disease activity in TAK is limited: most of the 11 organ systems included in the BVAS are not affected in TAK (10). The disease extent index for TAK (DEI.Tak) is another assessment tool used for patients with TAK, which was created using the BVAS as a template (11). The DEI.Tak includes rarely used items while not taking into account acute phase reactants or imaging findings (11), and it has not been widely accepted (12). The most recently developed disease activity assessment tool is the Indian Takayasu Clinical Activity Score (ITAS2010), which is derived from the DEI.Tak (13). It scores clinical features that newly developed in the previous three months, with an additional version that includes acute phase reactants (ITAS-A) (13). Although the ITAS scoring system has been found to be discriminatory for activity, imaging findings are not included in this scoring system, and studies have shown unsatisfactory agreement between the ITAS and the physician global assessment (PGA) (12,14).
Given the lack of an accurate disease activity assessment tool that can be widely adopted for use in research or clinical practice, we aimed to develop a new disease activity assessment tool that is highly accurate, using the PGA as the gold standard.

Patients
Patients with TAK who underwent laboratory tests and imaging studies [computed tomography (CT) scans] for the purpose of disease activity assessment between 2012 and 2021 at two referral hospitals were retrospectively included for analysis. All patients fulfilled the 1990 American College of Rheumatology criteria for the classification of TAK (15). Patients were randomly assigned to either a derivation or a validation cohort. Data concerning the following at the time of disease activity assessment were reviewed: age, sex, disease duration, type of vascular involvement according to the Hata classification (16), involvement of the pulmonary artery and renal artery, ESR (measured by Test 1 [Alifax, Padova, Italy]; cut-off value for high ESR was >20 mm/h for females and >15 mm/h for males), C-reactive protein (CRP) level (cut-off value for high CRP was >6 mg/L), total scores of disease activity assessment tools including NIH criteria, ITAS2010, ITAS-ESR, and ITAS-CRP, fulfilment of individual items from NIH criteria (NIH criterion 1, new onset or worsening of systemic features, such as fever or musculoskeletal symptoms; NIH criterion 2, new onset or worsening of elevated ESR; NIH criterion 3, new onset or worsening of features of vascular ischemia or inflammation, such as claudication, diminished or absent pulse, bruit, carotidynia, asymmetric blood pressure in either upper or lower limbs; and NIH criterion 4, new onset or worsening of typical angiographic features) (3) and the ITAS2010, PGA (active disease or inactive disease), and the use of a glucocorticoid (none-to-low dose, ≤7.5 mg of prednisolone or equivalent/day; or medium-to-high dose, >7.5 mg of prednisolone or equivalent/day) (17), methotrexate (yes or no), and azathioprine (yes or no). Fulfilment of NIH criterion 4 was assessed based on a CT scan.
NIH criterion 4 was considered fulfilled if one or more of the following findings were observed in the CT scans: (i) new luminal vascular lesions in previously unaffected arterial territory; (ii) progression of a previous luminal vascular lesion; or (iii) presence of concentric arterial wall thickening with delayed enhancement.
This study was approved by the Institutional Review Board (IRB) of Gangnam Severance Hospital (IRB No: 3-2021-0445). Owing to this study's retrospective design, the requirement for informed consent was waived.

The PGA
The PGA of disease activity (active or inactive disease) was used as the gold standard. The PGA was determined by the treating physician after laboratory and imaging tests had been performed. Therefore, the PGA was comprehensively based on the patient's symptoms, acute phase reactants, and imaging findings. Active disease was defined as the presence of two or more of the following: (i) carotidynia; (ii) ischemic episodes; (iii) new bruit or asymmetry in pulse or blood pressure; (iv) constitutional systemic symptoms such as fever, malaise, weight loss, or musculoskeletal symptoms; and (v) elevated ESR and/or CRP. If new vascular lesions or progression of existing lesions (luminal changes or arterial wall thickening) were detected on CT scan, the presence of one or more of the above was considered active disease. Constitutional systemic symptoms or elevated acute phase reactants in the absence of any clinical feature directly attributable to vasculitis were not considered active disease.
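The definition above can be read as a small decision rule. The following Python sketch is only an illustration of that wording, not the procedure the treating physicians followed; the class and field names are hypothetical. One interpretive assumption is made and marked in the code: the exclusion of constitutional symptoms or elevated acute phase reactants alone is applied only when no new or progressive lesion is seen on CT.

```python
from dataclasses import dataclass


@dataclass
class PgaFindings:
    """Findings at the time of assessment (field names are hypothetical)."""
    carotidynia: bool = False                   # item (i)
    ischemic_episode: bool = False              # item (ii)
    new_bruit_or_asymmetry: bool = False        # item (iii)
    constitutional_symptoms: bool = False       # item (iv)
    elevated_esr_or_crp: bool = False           # item (v)
    ct_new_or_progressive_lesion: bool = False  # new or progressive lesion on CT


def pga_active(f: PgaFindings) -> bool:
    """Return True if the findings meet the stated definition of active disease."""
    vascular = [f.carotidynia, f.ischemic_episode, f.new_bruit_or_asymmetry]
    nonspecific = [f.constitutional_symptoms, f.elevated_esr_or_crp]
    count = sum(vascular) + sum(nonspecific)

    if f.ct_new_or_progressive_lesion:
        # With new or progressive CT lesions, one or more items is enough.
        return count >= 1

    # ASSUMPTION: without CT progression, items (iv) and (v) on their own
    # never qualify; at least one of items (i)-(iii) must also be present.
    return count >= 2 and any(vascular)
```

Under this reading, carotidynia plus an elevated ESR is classified as active, whereas fever plus an elevated ESR without any vascular feature or CT change is not.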

Statistical Analysis
The sample size was calculated based on the area under the curve (AUC) of the new disease activity assessment tool. A difference of 0.15 between an AUC of 0.7, which is generally considered as an acceptable accuracy, and the new disease activity assessment tool with an AUC of 0.85 was selected as the minimum clinically significant value. We estimated that a sample size of 82 patients would be sufficient to evaluate the outcome at a significance level of 0.05 (two-sided) with 80% power. Patients were assigned to the derivation cohort and validation cohort in a 7:3 ratio. Continuous variables following normal or non-normal distribution are expressed as mean [± standard deviation (SD)] or median [interquartile range (IQR)], respectively, and categorical variables are expressed as numbers (%).
To develop a new disease activity assessment tool, we first conducted univariable logistic regression analyses in the derivation cohort using the PGA (active or inactive disease) as the dependent variable. We considered all individual items in NIH criteria and the ITAS2010 as potential components of the new disease activity assessment tool. Therefore, each item was used as an independent variable in the univariable logistic regression analysis. Variables that were statistically significant in the univariable logistic regression analyses were selected for multivariable stepwise logistic regression analysis. Taking multicollinearity among the variables into account, eight different multivariable models were constructed. Among the eight multivariable models, the model that yielded the highest AUC for distinguishing active disease from inactive disease was selected and used to develop the new disease activity assessment tool. The new disease activity assessment tool formula was obtained by multiplying each variable by its simplified β coefficient (β coefficient rounded to the nearest integer) and then summing the results.
Receiver operating characteristic (ROC) curve analysis was performed to determine the cut-off value of the new disease activity assessment tool that best discriminated active disease and inactive disease. The cut-off value was determined at the value where the Youden index was maximum. Diagnostic performance of the new disease activity assessment tool was evaluated by estimating AUC, sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV) with their respective 95% confidence intervals (95% CIs). The diagnostic performance of the new disease activity assessment tool was compared with that of NIH criteria, the ITAS2010, the ITAS-ESR and the ITAS-CRP.
For validation of the new disease activity assessment tool developed in the derivation cohort, AUC, sensitivity, specificity, accuracy, PPV, and NPV with their respective 95% CIs were estimated in the validation cohort.
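The performance measures listed above all follow from the scores and the reference classification. The sketch below shows one way to compute them in Python; it is an illustration only, with hypothetical function names and invented counts, and it omits the 95% CIs. The AUC is computed as the proportion of active/inactive patient pairs in which the active patient has the higher score, counting ties as one half, which is equivalent to the area under the empirical ROC curve.

```python
def auc(scores, labels):
    """Area under the empirical ROC curve via pairwise comparison."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("both active and inactive patients are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, accuracy, PPV, and NPV from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }
```

With invented counts `tp=40, fp=5, fn=10, tn=45`, sensitivity is 0.80, specificity 0.90, and accuracy 0.85. Note that the article reports these quantities on a 0 to 100 scale.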
A P-value of <0.05 was considered statistically significant. All analyses were conducted using SAS software (version 9.4, SAS Institute Inc., Cary, NC, USA).

Patients' Characteristics
A total of 146 patients with TAK were included (100 patients in the derivation cohort and 46 patients in the validation cohort). In the derivation cohort, the mean age of the patients was 43.  Table 1.
Using these variables, eight multivariable models were constructed (Table 3). Among the eight models, model 6 yielded the highest AUC (89.49). Model 6 included high ESR, high CRP, NIH criterion 1, NIH criterion 4, carotidynia, and aortic incompetence as independent variables; high ESR, NIH criterion 1, NIH criterion 4, and carotidynia remained statistically significant in the final model. This model was selected for the development of the new disease activity assessment tool. The β coefficients of high ESR, NIH criterion 1, NIH criterion 4, and carotidynia were 2.77 (simplified β: 3), 1.53 (simplified β: 2), 3.63 (simplified β: 4), and 3.37 (simplified β: 3), respectively (Table 4). Using the simplified β coefficient of each variable, we generated a new disease activity assessment tool as follows: high ESR (3 points), NIH criterion 1 (2 points), NIH criterion 4 (4 points), and carotidynia (3 points) (total score ≥5, active; total score <5, inactive). The terms included in the assessment tool are defined as follows: high ESR, >20 mm/h for females and >15 mm/h for males; NIH criterion 1, new onset or worsening of systemic features, such as fever or musculoskeletal symptoms; NIH criterion 4, one or more of the following findings observed in the CT scans, (i) new luminal vascular lesions in previously unaffected arterial territory, (ii) progression of a previous luminal vascular lesion, or (iii) presence of concentric arterial wall thickening with delayed enhancement; and carotidynia, tenderness or pain during palpation of the carotid arteries.
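The scoring rule above is simple enough to express directly. The following Python sketch is an illustration rather than software used in the study; the dictionary keys and function names are hypothetical. It rounds the reported coefficients to the nearest integer to obtain the points, applies the sex-specific ESR cut-off, and classifies a patient using the cut-off of 5.

```python
# Coefficients reported for the final model; rounding to the nearest
# integer yields the points assigned to each item.
COEFFICIENTS = {
    "high_esr": 2.77,
    "nih_criterion_1": 1.53,
    "nih_criterion_4": 3.63,
    "carotidynia": 3.37,
}
POINTS = {item: round(value) for item, value in COEFFICIENTS.items()}
CUTOFF = 5  # total score >= 5 is classified as active


def is_high_esr(esr_mm_per_h: float, sex: str) -> bool:
    """High ESR: >20 mm/h for females, >15 mm/h for males."""
    if sex not in ("female", "male"):
        raise ValueError("sex must be 'female' or 'male'")
    threshold = 20 if sex == "female" else 15
    return esr_mm_per_h > threshold


def activity_score(findings: dict) -> int:
    """Sum the points of every item that is present (missing keys count as absent)."""
    return sum(points for item, points in POINTS.items() if findings.get(item, False))


def classify(score: int) -> str:
    return "active" if score >= CUTOFF else "inactive"
```

For instance, a male patient with an ESR of 18 mm/h (high ESR, 3 points) and new systemic features (NIH criterion 1, 2 points) scores 5 and is classified as active, whereas a patient who fulfils only NIH criterion 4 scores 4 and is classified as inactive.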

Diagnostic Performance of the New Disease Activity Assessment Tool
In the ROC curve analysis (Figure 1), the cut-off value in the new disease activity assessment tool that best discriminated active disease and inactive disease was 5 (≥5, active; <5, inactive). The AUC, sensitivity, specificity, accuracy, PPV, and NPV of the new disease activity assessment tool were 89.37 (95% CI 83. 18 Table 5). The new disease activity assessment tool had significantly higher AUC and accuracy for differentiating active disease and inactive disease than NIH criteria and the ITAS2010, ITAS-ESR, and ITAS-CRP.

Validation of the New Disease Activity Assessment Tool
In the validation cohort, the mean patient age was 41.7 (± 15.5) years, and 82.6% of patients were female. The median ESR, CRP,  Table 1.   The diagnostic performance of the new disease activity assessment tool in the validation cohort was similar to that in the derivation cohort, with AUC, sensitivity, specificity, accuracy, PPV, and NPV of 85.23

DISCUSSION
In this study, we developed a new disease activity assessment tool derived from items obtained from NIH criteria and the ITAS2010. This new assessment tool had a higher accuracy in differentiating active disease and inactive disease than NIH criteria and the ITAS2010, ITAS-ESR, and ITAS-CRP. Given that accurate assessment of disease activity is important in therapeutic decision-making (18), this newly generated disease activity assessment tool with high accuracy has important clinical implications.
The ITAS2010 is a clinical assessment tool that comprehensively captures clinical manifestations for assessment of disease activity of TAK (13). The ITAS-ESR and ITAS-CRP additionally incorporate acute phase reactants (13). However, imaging findings are not included as a scoring item in the ITAS (13). As patients with active TAK commonly have non-specific disease manifestations and unreliable laboratory parameters, imaging findings are of importance in monitoring disease activity in patients with TAK (19,20). On this basis, it is reasonable to combine clinical manifestations and acute phase reactants with imaging findings to yield a highly accurate disease activity assessment tool. Indeed, the new disease activity assessment tool consists of all the above-mentioned components (NIH criterion 1 and carotidynia for the clinical manifestation domain, high ESR for the acute phase reactant domain, and NIH criterion 4 for the imaging domain). Its diagnostic performance in discriminating active disease and inactive disease was significantly better than that of the ITAS2010, ITAS-ESR, and ITAS-CRP, which do not include imaging findings. These results reflect the importance of imaging findings in assessing disease activity of TAK.
NIH criteria include clinical manifestations, an acute phase reactant (ESR), and imaging findings (3). Since an imaging finding (NIH criterion 4) is included as a criterion, NIH criteria (AUC, 77.96) had a higher AUC in discriminating active disease and inactive disease than the ITAS2010 (AUC, 66.12), ITAS-ESR (AUC, 75.58), and ITAS-CRP (AUC, 71.34), as could be expected. However, the AUC of NIH criteria for distinguishing active and inactive disease was lower than that of the new disease activity assessment tool (AUC, 89.37). This significantly higher AUC of the new assessment tool was striking, given that both disease activity assessment tools include items from all domains for assessing disease activity (clinical manifestations, acute phase reactants, and imaging findings). The difference in accuracy appears to stem from using a high ESR and carotidynia instead of NIH criterion 2 and NIH criterion 3, respectively, in the new assessment tool. The fulfilment of NIH criterion 2 is defined as new onset or worsening of elevated ESR (3). An important limitation of this definition is that it does not fully reflect the current absolute state of the acute phase reactant. For instance, if the ESR was 90 mm/h previously and is 72 mm/h currently, NIH criterion 2 will be considered as not being met, even though the current ESR of 72 mm/h is still high (i.e., a false negative). On the other hand, the item "high ESR" included in the new disease activity assessment tool captures elevated ESR regardless of the previous ESR, and therefore reflects the current absolute state better than NIH criterion 2. Hence, using a high ESR instead of NIH criterion 2 could have contributed to a higher sensitivity (a lower chance of a false negative) in the new disease activity assessment tool (sensitivity, 76.09) than in NIH criteria (sensitivity, 54.35).
Another difference between the new disease activity assessment tool and NIH criteria is the use of carotidynia instead of NIH criterion 3. The fulfilment of NIH criterion 3 is defined as new onset or worsening of ischemia [claudication, diminished or absent pulse, bruit, vascular pain (carotidynia), and asymmetric blood pressure] (3). Of symptoms included in   (3). Therefore, inclusion of vascular symptoms other than carotidynia could result in a false positive detection of active disease. In the new disease activity assessment tool, carotidynia is used as the only item reflecting vascular symptoms, which may have contributed to a higher specificity (a lower chance of a false positive) in the new disease activity assessment tool (specificity, 92.59) than in NIH criteria (specificity, 83.33).
The new disease activity assessment tool weights each item based on simplified β coefficients. The weights varied among items, with NIH criterion 4 (simplified β coefficient: 4) having the highest weight and NIH criterion 1 (simplified β coefficient: 2) having the lowest weight. It should be noted that fulfilment of one item only is not sufficient to classify a patient as having active disease, because the highest single-item weight (4) is below the cut-off value for active disease (≥5). On the other hand, any combination of two items results in a score of ≥5 (the lowest two-item total is 2 + 3 = 5), and patients fulfilling two items will be classified as having active disease. Therefore, the new disease activity assessment tool can be simplified, similar to NIH criteria, as follows: fulfilment of two or more of the following features: (i) high ESR, (ii) NIH criterion 1, (iii) NIH criterion 4, and (iv) carotidynia.
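The equivalence between the weighted score and the "two or more items" rule can be checked exhaustively, since four items give only 16 possible combinations. The Python sketch below is an illustrative check with hypothetical names, not part of the original analysis.

```python
from itertools import combinations

POINTS = {
    "high_esr": 3,
    "nih_criterion_1": 2,
    "nih_criterion_4": 4,
    "carotidynia": 3,
}
CUTOFF = 5


def two_item_rule_matches_score(points: dict, cutoff: int) -> bool:
    """True if 'score >= cutoff' and 'at least two items present' agree
    for every possible combination of items."""
    items = list(points)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            score = sum(points[item] for item in combo)
            if (score >= cutoff) != (k >= 2):
                return False
    return True


print(two_item_rule_matches_score(POINTS, CUTOFF))  # prints True
```

The check passes because the largest single weight is 4 and the smallest pair of weights sums to 5, so the two formulations classify every patient identically.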
This study had several limitations. First, histopathology of the artery, which is the true gold standard for assessing disease activity (18), was not available. We instead used the PGA as the gold standard. As the treating physician was aware of all clinical information including clinical manifestations, laboratory test parameters, and imaging findings, we assumed that the PGA was most suitable for use as a gold standard for validating the disease activity assessment tool. Indeed, the PGA is widely used as a disease activity comparator (11,13,14,21). Second, we lacked data on 18F-fluorodeoxyglucose positron emission tomography/CT (18F-FDG PET/CT) scan results, which are useful in assessing disease activity of TAK (22-24). If these data had been available and used in combination with the items from current disease activity assessment tools, they might have yielded a new disease activity assessment tool with even higher accuracy. However, 18F-FDG PET/CT scans are expensive and have limited availability; therefore, their value in routine clinical practice is limited. CT scans, on the other hand, are less expensive and more easily accessible than 18F-FDG PET/CT scans. Therefore, our results could be more applicable to routine clinical practice settings. Moreover, the accuracy of the new disease activity assessment tool we developed here in the absence of these data was still good (AUC >0.8) (25), and is therefore clinically meaningful. Third, we lacked longitudinal data and were unable to assess the responsiveness of the disease activity assessment tool to changes in disease activity following therapy.
In conclusion, we developed a new disease activity assessment tool that consists of high ESR (3 points), NIH criterion 1 (2 points), NIH criterion 4 (4 points), and carotidynia (3 points) (total score of ≥5 indicates active disease, or simply, fulfilment of ≥2 components indicates active disease), which has a higher accuracy in discriminating active disease and inactive disease than the currently used clinical assessment tools. This new disease activity assessment tool is easy to perform and could be useful for more accurately classifying patients with TAK into active TAK and inactive TAK.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Institutional Review Board (IRB) of Gangnam Severance Hospital (IRB No: 3-2021-0445). The ethics committee waived the requirement of written informed consent for participation.

AUTHOR CONTRIBUTIONS
OCK and M-CP contributed to the conception and design of the study. OCK and M-CP participated in acquisition of data, data analyses and data interpretation. OCK and M-CP wrote the manuscript. All authors contributed to the article and approved the submitted version.