Evaluation of a Fully Automated Antinuclear Antibody Indirect Immunofluorescence Assay in Routine Use

Indirect immunofluorescence assay (IFA) using HEp-2 cells as a substrate is the gold standard for detecting antinuclear antibodies (ANA) in patient serum. However, the ANA IFA is labor-intensive and lacks adequate standardization. To overcome these drawbacks, automated systems have been developed and implemented in clinical laboratories. The purposes of this study were to evaluate the analytical performance of a fully automated Helios ANA IFA analyzer in a real-life laboratory setting, and to compare the time and the cost of ANA IFA testing before and after adopting the Helios system. A total of 3,276 consecutive serum samples were analyzed for ANA using the Helios system from May to August 2019. The positive/negative results, staining patterns, and endpoint titers were compared between Helios and visual readings. Furthermore, the turnaround time and the number of wells used were compared before and after the introduction of the Helios system. Of the 3,276 samples tested, 748 were positive and 2,528 were negative based on visual readings. Using visual reading as the reference standard, the overall relative sensitivity, relative specificity, and concordance of Helios reading were 73.3, 99.4, and 93.4% (κ = 0.80), respectively. For pattern recognition, the overall agreement was 70.1% (298/425) for single patterns and 72.4% (89/123) for mixed patterns. For titration, the agreement between automated and classical endpoint titers was 75.9% (211/278) when a difference of within ± one titer was regarded as acceptable. Helios significantly shortened the median turnaround time from 100.6 to 55.7 h (P < 0.0001). Furthermore, routine use of the system reduced the average number of wells used per test from 4 to 1.5. Helios shows good agreement with visual reading in distinguishing between positive and negative results. However, it still has limitations in positive/negative discrimination, pattern recognition, and endpoint titer prediction, requiring additional validation of results by human observers. Helios provides significant advantages in routine laboratory ANA IFA work in terms of labor, time, and cost savings. We hope that upgraded and newly developed software with more reliable capabilities will allow automated ANA IFA analyzers to be fully integrated into the routine operations of the clinical laboratory.

INTRODUCTION
Antinuclear antibodies (ANA) are one of the most important serological markers used for the diagnosis of systemic autoimmune rheumatic diseases (SARD) such as systemic lupus erythematosus (SLE), systemic sclerosis (SSc), Sjögren's syndrome (SjS), mixed connective tissue disease (MCTD), and idiopathic inflammatory myopathy (IIM). Steady increases in the prevalence of SARD have been reported in recent years and have been attributed to a variety of causes, including exposure to environmental chemicals and toxins, an aging population and its associated chronic diseases, and the use of particular drug regimens (1). Along with this increase in disease prevalence, ANA test requests from non-rheumatological clinicians seeking to exclude SARD have also increased, owing to the high negative predictive value of ANA measurement (2,3).
Indirect immunofluorescence assay (IFA) using human epithelial cell tumor (HEp-2) cells is the most established method for ANA screening (4). The main benefits of the ANA IFA are the detection of wide-ranging autoantibodies, high sensitivity, and the possibility of concurrently determining staining patterns and titers (5). Nevertheless, the ANA IFA has several drawbacks, including the labor-intensive nature of the procedure and a lack of adequate standardization (5-7). Notably, pattern recognition, which depends on the individual abilities of investigators, can result in significant inter- and intra-laboratory variability (8,9). To overcome those challenges, several alternative techniques have been developed as potential replacements for IFA (i.e., single and multiplex immunometric assays, such as enzyme-linked immunosorbent assays, line immunoassays, and multiplex bead assays), promising improvements in standardization, throughput, and objectivity of results (10,11). However, contrary to expectations, these alternative methods can vary significantly in sensitivity and diagnostic accuracy due to differences in the source, purity, concentration, and binding capacity of the antigens, and due to the limited number of antigens included (10-13). Based on concerns regarding the newer assays and their associated limitations, the American College of Rheumatology (ACR) recommended IFA as the gold standard for ANA testing (14). To standardize ANA IFA testing and reporting, the International Consensus on ANA Patterns (ICAP) was established with the aim of reaching a consensus on the nomenclature and definition of HEp-2 cell IFA patterns. The ICAP provides a standardized categorization and nomenclature distinguishing different fluorescence patterns from AC (anti-cellular)-1 to AC-29, in addition to AC-0 (negative), as well as interpretation guidelines for the 29 distinct patterns (15-18).
Alongside this increased demand for ANA testing and these standardization efforts, automation of slide preparation, image acquisition, titration, and interpretation has been developed and evaluated for implementation in the clinical laboratory (8,19-29).
Among the commercial automated systems, Helios (Aesku Diagnostics, Wendelsheim, Germany) is the only fully automated IFA processor in which the automated digital image acquisition and ANA reading systems are integrated with slide processing in one instrument (2,8). No intervention is needed during the full process, offering users true hands-off time. The system employs barcode readers for complete traceability, a unique three-needle system for fast pipetting operations enabling non-stop performance, a motorized autofocus fluorescence microscope, and specially designed software that uses mathematical algorithms to discriminate positive from negative results and to identify ANA patterns and titers.
By integrating the fully automatic ANA IFA analyzer in our laboratory, we aimed to establish a fast and efficient workflow for ANA testing. Here, we evaluated the performance of the Helios system in our real-life laboratory setting, where patient groups are less clearly defined, and test orders are not based on predefined criteria. Additionally, we compared the time and the cost of ANA IFA testing before and after adopting the Helios system.

Sample Collection
Between May and August 2019, a total of 3,276 consecutive serum samples obtained from 3,164 patients were referred for routine ANA testing to the Diagnostic Immunology Laboratory at Chonnam National University Hospital, Gwangju, South Korea. The study design and sample flowchart are described in Figure 1. This study was approved by the Institutional Review Board of Chonnam National University Hospital (IRB CNUH-2019-304). Due to the nature of this study, the Institutional Review Board of Chonnam National University Hospital waived the requirement for informed consent.

Automated ANA IFA
ANA tests were performed on a Helios automated analyzer using the ANA HEp-2 standard kit and Helios software version 3.1 (Aesku Diagnostics) according to the manufacturer's instructions. Briefly, serum samples were loaded into the Helios system, and the tests were automatically conducted at a 1:80 dilution. Digital images were taken by a camera and stored on the computer system. The positive/negative classification module uses image features such as the structure of the objects, the fluorescence signal intensity (FI), and the background/cell ratio (8). The cut-off value of FI was 70. Three images were taken for each sample, and samples with two or more images classified as positive were defined as 'positive'. For samples pre-classified as positive, the Helios software recognizes the pattern of the captured image using a support vector machine (SVM) algorithm. The system also provides automatically predicted endpoint titers based on the measured FI. Since the Helios software has not yet accommodated the ICAP classification, it reports staining patterns as follows: homogeneous, speckled, centromere, nuclear dots, nucleolar, nuclear envelope, and cytoplasmic (22,23).
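The two-of-three image rule described above can be sketched as follows. This is only an illustration, not the Helios software's implementation: the function name is hypothetical, and the sketch assumes that each image is called positive purely from its FI exceeding the cut-off of 70, whereas the actual module also uses object structure and the background/cell ratio.

```python
FI_CUTOFF = 70  # FI cut-off value reported in the text


def classify_sample(image_fis, cutoff=FI_CUTOFF, min_positive_images=2):
    """Hypothetical helper: call a sample from the FI values of its three images."""
    if len(image_fis) != 3:
        raise ValueError("expected FI values for exactly three images")
    # Assumption: an image counts as positive when its FI exceeds the cut-off.
    positive_images = sum(fi > cutoff for fi in image_fis)
    return "positive" if positive_images >= min_positive_images else "negative"


print(classify_sample([85, 72, 40]))  # two images above the cut-off
print(classify_sample([85, 60, 40]))  # only one image above the cut-off
```

Under this simplified rule, a sample with a single image above the cut-off is still reported as negative, which is why checking the FI of each individual image can be informative.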
After all automated procedures, two experienced observers independently interpreted the stored digital images without knowledge of the interpretation suggested by Helios; if the two experts disagreed, a consensus was reached by discussion. As recommended by ICAP, we endeavored to report all 29 HEp-2 cell IFA patterns in standardized nomenclature. To compare the patterns by visual reading with Helios reading, we assigned AC-1 as homogeneous; AC-2, AC-4, AC-5, AC-29 as speckled; AC-3 as centromere; AC-6, AC-7 as nuclear dots; AC-8, AC-9, AC-10 as nucleolar; AC-11, AC-12 as nuclear envelope; AC-15 to AC-23 as cytoplasmic; and AC-13, AC-14, AC-24 to AC-28 as others.
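The assignment of ICAP codes to the Helios reporting categories can be written as a lookup table. The sketch below simply encodes the mapping stated above; the names `GROUPS`, `AC_TO_CATEGORY`, and `helios_category` are hypothetical and chosen for illustration.

```python
# Comparison categories and the ICAP AC numbers assigned to each, as stated in the text.
GROUPS = {
    "homogeneous": [1],
    "speckled": [2, 4, 5, 29],
    "centromere": [3],
    "nuclear dots": [6, 7],
    "nucleolar": [8, 9, 10],
    "nuclear envelope": [11, 12],
    "cytoplasmic": list(range(15, 24)),        # AC-15 to AC-23
    "others": [13, 14] + list(range(24, 29)),  # AC-13, AC-14, AC-24 to AC-28
}

# Invert the grouping into a code -> category lookup.
AC_TO_CATEGORY = {
    f"AC-{number}": category
    for category, numbers in GROUPS.items()
    for number in numbers
}


def helios_category(ac_code):
    """Return the comparison category for an ICAP code such as 'AC-4'."""
    return AC_TO_CATEGORY[ac_code]


print(helios_category("AC-2"))
print(len(AC_TO_CATEGORY))
```

The table covers AC-1 to AC-29 exactly once each; AC-0 (negative) is deliberately left out because it is handled by the positive/negative step rather than by pattern comparison.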
For samples referred for screening tests only, positive samples were not diluted further and were reported as positive with patterns. For samples referred for titration tests, positive samples identified at the standard 1:80 dilution in the screening mode were further diluted. The classical endpoint titers based on visual reading of the images from serial dilutions were reported with patterns. For quality control, two standards (one positive and one negative) provided in the test kit and two patient serum samples [one positive with a homogeneous (AC-1) pattern at a titer of 1:320 and one negative (AC-0)] were tested in parallel.
Cohen's κ values were interpreted as follows: ≤ 0.20 as poor, 0.21-0.40 as fair, 0.41-0.60 as moderate, 0.61-0.80 as good, and 0.81-1.00 as very good agreement (30). Fisher's exact test was used for comparison of proportions. The turnaround times (TATs) were defined as follows: TAT[1], the time from blood sampling to sample receipt; TAT[2], the time from sample receipt to results reporting; and TAT[Total], the time from blood sampling to results reporting. Normality of the distributions of age and TATs was assessed using the D'Agostino-Pearson test. The Mann-Whitney U test was used to compare TATs before and after the use of the Helios system. All statistical analyses were performed using R software version 3.6.1, and graphics were prepared using GraphPad Prism software version 6.0. P values < 0.05 were considered statistically significant.
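The agreement bands used for Cohen's kappa can be expressed as a small lookup function. This is an illustrative sketch only: the function name is hypothetical, and it assumes that each upper bound is inclusive so that values falling between the printed bands (for example 0.205) are assigned to the next band up.

```python
def interpret_kappa(kappa):
    """Hypothetical helper: label a Cohen's kappa value using the bands in the text."""
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1")
    # Upper bounds are treated as inclusive (assumption).
    bands = [
        (0.20, "poor"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "good"),
        (1.00, "very good"),
    ]
    for upper_bound, label in bands:
        if kappa <= upper_bound:
            return label


print(interpret_kappa(0.80))
print(interpret_kappa(0.97))
```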

Positive/Negative Discrimination
The analytical performance of Helios automated reading for discriminating between positive and negative ANA results is summarized in Table 2. Among a total of 3,276 samples, visual reading yielded 748 (22.8%) positive and 2,528 (77.2%) negative results. Of the 748 positive samples by visual reading, 548 (73.3%) were positive and 200 (26.7%) negative by Helios reading. Of the 2,528 negative samples by visual reading, 16 (0.6%) were positive and 2,512 (99.4%) negative by Helios reading. Using visual reading as the reference standard, the overall relative sensitivity, relative specificity, and concordance of Helios reading were 73.3, 99.4, and 93.4% (κ = 0.80), respectively.
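The reported figures can be reproduced from the 2 × 2 counts given above. The sketch below uses the standard definitions of sensitivity, specificity, overall concordance, and Cohen's kappa with visual reading as the reference; the function name is hypothetical, and the study itself reports using R rather than Python.

```python
def agreement_metrics(tp, fn, fp, tn):
    """Relative sensitivity, relative specificity, concordance, and Cohen's kappa."""
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    observed = (tp + tn) / n  # overall concordance
    # Agreement expected by chance, from the marginal totals of both readers.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, observed, kappa


# Counts from the text: visual positive/Helios positive, visual positive/Helios negative,
# visual negative/Helios positive, visual negative/Helios negative.
sens, spec, conc, kappa = agreement_metrics(tp=548, fn=200, fp=16, tn=2512)
print(f"{sens:.1%} {spec:.1%} {conc:.1%} kappa={kappa:.2f}")
# prints: 73.3% 99.4% 93.4% kappa=0.80
```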
Of the total samples requested for ANA testing, 1,575 were assigned for screening and 1,701 for titration (Figure 1). The relative sensitivity of Helios reading was significantly higher in samples requested for titration than in those requested for screening (77.3 vs. 66.6%, P < 0.005; Table 2). To investigate the impact of weakly positive samples on the analytical performance of Helios reading, we compared performance with and without weakly positive samples among the samples requested for titration. The relative sensitivity, relative specificity, and concordance of Helios reading were significantly higher for samples with titers ≥ 1:160 than for samples with titers ≥ 1:80 (95.8 vs. 77.3%, P < 0.0001; 99.9 vs. 99.4%, P < 0.05; 99.2% (κ = 0.97) vs. 93.3% (κ = 0.82), P < 0.0001, respectively; Table 2).

Discrepancy Analysis
Discrepancies between Helios and visual readings for positive/negative discrimination are summarized in Table 3. Among a total of 200 false negative samples, only 106 were referred for titration tests. The titration data revealed that 105 (99.1%) had a titer of ≤ 1:160, and the remaining one (0.9%) had a titer of 1:320.
P values for comparison of proportions of request departments between screening and titration were calculated using Fisher's exact test. Values with the same superscript lowercase letters were compared with each other: c P < 0.0001; d P < 0.0001; e P < 0.0001; f P < 0.0001; g P < 0.0001; h P < 0.0001; and i P = 0.0021. ANA, antinuclear antibody; IFA, indirect immunofluorescence assay; n, number; and IQR, interquartile range.

Pattern Recognition
For samples showing single patterns by both Helios and visual readings, the overall agreement between Helios and visual readings was 70.1% (298/425). For samples showing mixed patterns, the overall agreement between Helios and visual readings was 72.4% (89/123) (Table 5). Because the Helios software can suggest only one pattern, a suggested pattern was considered concordant if it was one of the mixed patterns identified by visual reading.
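The concordance rule for mixed patterns amounts to a set-membership check. The following sketch illustrates it; the function names and the three example readings are hypothetical and are not study data.

```python
def is_concordant(helios_pattern, visual_patterns):
    """True if the single Helios suggestion is among the visually read patterns."""
    return helios_pattern in visual_patterns


def agreement_rate(pairs):
    """pairs: iterable of (helios_pattern, visual_patterns) tuples."""
    pairs = list(pairs)
    hits = sum(is_concordant(helios, visual) for helios, visual in pairs)
    return hits / len(pairs)


# Hypothetical readings, for illustration only.
examples = [
    ("speckled", {"speckled", "nucleolar"}),       # concordant: one of the mixed patterns
    ("homogeneous", {"speckled", "cytoplasmic"}),  # discordant
    ("centromere", {"centromere"}),                # single pattern, concordant
]
print(f"{agreement_rate(examples):.1%}")
```

The same check covers single patterns, since a single visual pattern is just a one-element set.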

Endpoint Titer Estimation
For samples showing single patterns by both Helios and visual readings, by regarding within ± one titer difference as acceptable, the overall agreement between automated and classical endpoint titers was 75.9% (211/278) (Table 6). Concordance and error rates of the automated endpoint titer were analyzed according to ANA pattern and the degree of titer difference in a total of 200 samples showing the same pattern by both Helios and visual readings (Table 7 and Figures 2B-F). Of these samples, 60 (30.0%) had the same titer, 148 (74.0%) were within ± one titer difference, and 52 (26.0%) had more than ± one titer difference. Among the error results with more than ± one titer difference, automated endpoint titers of homogeneous patterns were significantly higher than classical endpoint titers (P < 0.0001), whereas those of speckled patterns were significantly lower (P < 0.01). The titer agreement for individual patterns, in descending order, was as follows: cytoplasmic (91.7%) > homogeneous (86.3%) > nucleolar (75.0%) > speckled (71.7%) > centromere (47.1%). Cross-tabulated data on automated and classical endpoint titers for individual patterns are presented in Supplementary Tables 1-5.
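The "within ± one titer" criterion can be made concrete by counting dilution steps between the two titers. The sketch below assumes a two-fold serial dilution series (1:80, 1:160, 1:320, and so on), which is an assumption about the dilution scheme; the function names are hypothetical, and titers are passed as reciprocal dilutions (320 for 1:320).

```python
import math


def titer_step_difference(automated, classical):
    """Signed number of two-fold dilution steps between two reciprocal titers."""
    steps = math.log2(automated / classical)
    if abs(steps - round(steps)) > 1e-9:
        raise ValueError("titers are not a whole number of two-fold steps apart")
    return round(steps)


def titer_agreement(automated, classical):
    """Classify a titer pair using the within-one-step acceptance criterion."""
    steps = titer_step_difference(automated, classical)
    if steps == 0:
        return "same titer"
    if abs(steps) == 1:
        return "within one titer"
    return "error (automated higher)" if steps > 0 else "error (automated lower)"


print(titer_agreement(160, 160))
print(titer_agreement(160, 320))
print(titer_agreement(1280, 320))
```

Pairs labelled "same titer" or "within one titer" both count as acceptable, so the acceptable fraction includes the exact matches.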

Time and Cost Analysis
TAT and reagent consumption before and after the adoption of the Helios system in routine clinical practice were compared (Table 8). Our data showed that the median total TAT was significantly shortened from 100.6 h to 55.7 h after the introduction of Helios (P < 0.0001). Moreover, routine use of the Helios system reduced the consumption of slide wells per test from 4 to 1.5.
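To put these changes on a common scale, the relative reductions can be computed directly from the reported values. This is simple arithmetic on the summary figures above, not a reanalysis of the underlying data; the function name is hypothetical.

```python
def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


tat_reduction = relative_reduction(100.6, 55.7)  # median total TAT in hours
well_reduction = relative_reduction(4, 1.5)      # slide wells used per test
print(f"TAT: {tat_reduction:.1%}, wells: {well_reduction:.1%}")
# prints: TAT: 44.6%, wells: 62.5%
```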

DISCUSSION
To the best of our knowledge, this is the most extensive single-center investigation assessing the performance, titration capability, TAT, and cost-effectiveness of Helios, a fully automated analyzer used for daily ANA IFA testing, in a large set of consecutive patients with suspected SARD in a real-life setting. In this study, the overall relative sensitivity, relative specificity, and concordance of Helios reading were 73.3, 99.4, and 93.4% (κ = 0.80), respectively, which differed considerably from values obtained in several previous studies using various automated analyzers (23-26). The analytical performance of automated systems is significantly affected by factors such as sample selection bias, prevalence, the inclusion rate of weakly positive samples, and the individual device being tested (8,22,31). Our subgroup analysis showed that the relative sensitivity and concordance with visual assessments were superior in titration samples compared with screening samples (Table 2). This observation is consistent with a previous study comparing samples processed at university and private laboratories (27). In screening samples, a low prevalence of SARD is usually expected (32). Consistent with this expectation, samples requested by the department of rheumatology accounted for a significantly higher proportion of titration samples than of screening samples (64.7 versus 23.5%, P < 0.0001; Table 1). Additional analysis of weakly positive samples demonstrated that the analytical performance was better in cohorts with a low proportion of weakly positive samples than in those with a high proportion, consistent with previous results (28). This is supported by our observation that excluding weakly positive samples improved the concordance (Cohen's κ) from 0.82 to 0.97.
In the present study, among a total of 3,276 samples, Helios produced 200 (6.1%) false negatives and 16 (0.5%) false positives, indicating that Helios missed a considerable number of visually positive cases. The main reason for this higher proportion of false negatives may be the inclusion of more samples with borderline FI from consecutive patients with suspected SARD than would be found in well-defined patient groups. Previous studies also reported that automated ANA IFA systems have difficulties in differentiating negative and weakly positive samples (8,20,22,33). This notion is consistent with our data showing that almost all false negatives had low titers (1:80 or 1:160). Recently, the ACR and the European League Against Rheumatism (EULAR) released new SLE criteria based on a scoring system that includes a positive ANA at a titer ≥ 1:80 by IFA, occurring at least once, as an entry criterion to ensure high sensitivity (34,35). Our data showed [...] Table 6). This implies that the performance observed during routine laboratory use of automated systems is not yet satisfactory. We further investigated the system's positive/negative discrimination parameters for these false negative samples. Interestingly, 34% (68/200) had at least one image over the FI cut-off of 70, including 5 of 8 SLE samples. To avoid missing such cases, it would be helpful to check each image's FI in the user interpretation module. In addition, adjusting the cut-off values could further increase the sensitivity of automated systems (26).
Our study showed that the overall concordance rates of pattern recognition between Helios and visual readings were 70.1% for single patterns and 72.4% for mixed patterns, which were similar to the data of Daves et al. (23). However, these values were lower than those in other previous studies (83.7-92.3%) (24,26,29). Such variation among studies may be due to differences in the automated systems and reagents used. Furthermore, our data showed that concordance rates varied from 0 to 100% according to individual patterns. The Helios system correctly recognized over 70% of homogeneous, speckled, centromere, nucleolar, and nuclear dots patterns, but less than 50% of nuclear envelope and cytoplasmic patterns, which, except for cytoplasmic patterns, is in line with previous studies (23-25,29). Our data revealed that Helios incorrectly identified 22.9% (47/205) of speckled patterns as homogeneous. On further investigation, 35 of these 47 cases (74.5%) showed the nuclear dense fine speckled (AC-2) pattern. An international internet-based survey reported that the AC-2 pattern was recognized with significantly lower accuracy and was most often confused with homogeneous or other speckled patterns (36). Therefore, homogeneous patterns suggested by the Helios system should be reviewed carefully. Collectively, these findings indicate that the analytical performance of Helios for pattern recognition is not fully satisfactory from the perspective of routine laboratory practice and still requires expert intervention for a considerable number of assigned patterns.
Considering results within ± one titer difference as acceptable, the overall agreement between automated and classical endpoint titers was 75.9% (Table 6), consistent with previous studies (23,24,29). For samples with the same single patterns by both Helios and visual readings (Table 7), the overall error rate was 26.0%, similar to previous reports (25,29,37,38). The high error rates may be due to the single-well titration method that most automated systems currently use. Won suggested a multi-well (2 or 3 wells) based line slope titration method to improve the accuracy (37). Moreover, our analysis of the pattern dependency of automated endpoint titer prediction demonstrated that Helios predicted higher titers for homogeneous patterns but lower titers for speckled patterns, consistent with a previous study (37). The centromere pattern (AC-3), which possesses a lower overall fluorescence than the other common patterns, had a lower concordance rate of 47.1%, and most of the error results had lower titers (41.2%), in line with Zeng et al. (25). We speculate that these phenomena are due to the different total amounts of fluorescent signal measured according to pattern. Taken together, these findings suggest that pattern-specific cut-off values or a multi-well titration method could increase the accuracy of automated endpoint titer results.
In addition to the analytical performance of the automated system, hands-on time and material cost are essential concerns in a routine clinical laboratory. Helios shortened the median TAT by nearly half (from 100.6 to 55.7 h) and decreased the number of slide wells used per test by more than 60% (from 4 to 1.5) by adopting automated endpoint titer predictions as a guide before performing titer evaluation. Before implementing the Helios system in our laboratory, the workflow from sample preparation to results required approximately two working days, limiting our ability to perform ANA testing to two or three times per week. After the introduction of the Helios system, the ANA IFA test could be performed every working day. Regarding well savings, before the introduction of the fully automated system, all titration samples were serially diluted from 1:40 to 1:320, screened, and reported with intensity. After introducing the Helios system, the titration samples were screened at a 1:80 dilution and, if positive, were further diluted based on the automatically predicted endpoint titer, enabling us to reduce the number of wells used.
There were some limitations to our research. First, our study included a small number of specific patterns, such as nuclear dots and nuclear envelope patterns, limiting our ability to reliably assess the accuracy of the Helios system for these patterns. Second, we did not include the ENA or the patients' disease status when confirming patterns by visual reading, as the goal of this study was to assess the level of concordance between automated results and human assessments under real-life working conditions. Evaluation of the Helios system in the context of these additional factors will be investigated in a future study. Finally, the possibility of interobserver reading bias cannot be ruled out in a single-center study. This is supported by our analysis of the two experts' reading results, which showed overall interobserver agreements of 86.7% (κ = 0.69) for positive/negative discrimination and 85.4% for pattern classification. Therefore, a multicenter study will be required to overcome the readers' subjectivity in a single-center study (9).
In conclusion, Helios, the fully automated ANA IFA analyzer, showed good agreement with visual reading in distinguishing between positive and negative results. However, it still has limitations in positive/negative discrimination, pattern recognition, and endpoint titer prediction, requiring additional validation of results by human observers. Helios provides significant advantages in routine laboratory ANA IFA work in terms of labor, time, and cost savings. We hope that upgraded and newly developed software with more reliable capabilities will allow automated ANA IFA analyzers to be fully integrated into the routine operations of the clinical laboratory.
a Only samples with the same single pattern by both Helios and visual readings were included. To compare the patterns between Helios and visual readings, we assigned AC-1 as homogeneous; AC-2, AC-4, AC-5, AC-29 as speckled; AC-3 as centromere; AC-6, AC-7 as nuclear dots; AC-8, AC-9, AC-10 as nucleolar; AC-11, AC-12 as nuclear envelope; AC-15 to AC-23 as cytoplasmic; and AC-13, AC-14, AC-24 to AC-28 as others. b P values for comparison of proportions between higher and lower predicted automated endpoint titers were calculated using Fisher's exact test. Values with the same superscript lowercase letters were compared with each other: b P < 0.0001; c P < 0.01; d P > 0.05; e P > 0.05; f P > 0.05; and g P > 0.05. ANA, antinuclear antibody; n, number.
The TAT[1] was defined as the time from blood sampling to sample receipt. c The TAT[2] was defined as the time from sample receipt to results reporting. d The TAT[Total] was defined as the time from blood sampling to results reporting. ANA, antinuclear antibody; h, hour; IQR, interquartile range; n, number; and TAT, turnaround time.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Institutional Review Board of Chonnam National University Hospital (IRB CNUH-2019-304). Written informed consent for participation was not obtained because the Institutional Review Board waived the requirement for informed consent due to the nature of this study.

FUNDING
This study was supported by the National Research Foundation of Korea (2020R1C1C1007297) and the Chonnam National University Hospital Biomedical Research Institute (BCRI19024). The funding organizations played no role in the design of study, choice of enrolled patients/specimens, review and interpretation of data, preparation of manuscript, or final approval of manuscript.