Artificial intelligence in colposcopic examination: A promising tool to assist junior colposcopists

Introduction: Well-trained colposcopists are in severe shortage worldwide, especially in low-resource areas. Here, we aimed to evaluate the Colposcopic Artificial Intelligence Auxiliary Diagnostic System (CAIADS) in detecting abnormalities on digital colposcopy images, focusing on its role in assisting junior colposcopists to correctly identify the lesion areas where biopsy should be performed.
Materials and methods: This hospital-based retrospective study recruited women who visited colposcopy clinics between September 2021 and January 2022. A total of 366 of 1,146 women with complete medical information recorded by a senior colposcopist and valid histology results were included. Anonymized colposcopy images were reviewed by CAIADS and a junior colposcopist separately, and the junior colposcopist then reviewed the colposcopy images with the CAIADS results (named CAIADS-Junior). The diagnostic accuracy and biopsy efficiency of CAIADS and CAIADS-Junior were assessed in detecting cervical intraepithelial neoplasia grade 2 or worse (CIN2+), CIN3+, and cancer in comparison with the senior and junior colposcopists. The factors influencing the accuracy of CAIADS were explored.
Results: For CIN2+ and CIN3+ detection, CAIADS showed a sensitivity of ~80%, which was not significantly lower than the sensitivity achieved by the senior colposcopist (for CIN2+: 80.6 vs. 91.3%, p = 0.061 and for CIN3+: 80.0 vs. 90.0%, p = 0.189). The sensitivity of the junior colposcopist increased significantly with the assistance of CAIADS (for CIN2+: 95.1 vs. 79.6%, p = 0.002 and for CIN3+: 97.1 vs. 85.7%, p = 0.039) and was comparable to that of the senior colposcopist (for CIN2+: 95.1 vs. 91.3%, p = 0.388 and for CIN3+: 97.1 vs. 90.0%, p = 0.125). In detecting cervical cancer, CAIADS achieved the highest sensitivity at 100%. For all endpoints, CAIADS showed the highest specificity (55-64%) and positive predictive values compared to both senior and junior colposcopists. As the CIN grade increased, the average number of biopsies decreased for the subspecialists, and CAIADS required the fewest biopsies per detected case (2.2-2.6 cut-points). Meanwhile, the biopsy sensitivity of the junior colposcopist was the lowest, but the CAIADS-assisted junior colposcopist achieved a higher biopsy sensitivity.
Conclusion: The CAIADS could help junior colposcopists improve diagnostic accuracy and biopsy efficiency, which might be a promising solution to improve the quality of cervical cancer screening in low-resource settings.

Introduction
Cervical cancer remains the fourth most common malignant cancer among women, with an estimated 600,000 new cases and 340,000 deaths in 2020 (1). China has a large population and contributes nearly 18% (106,000) of new cervical cancer cases and 14% (48,000) of deaths (2), and the morbidity and mortality of cervical cancer tended to increase from 2000 to 2016 in China (3). In 2018, the World Health Organization (WHO) called for global action to eliminate cervical cancer (4), but there is a considerable gap between the WHO goals and the real situation regarding cervical cancer prevention and control in China. Although different human papillomavirus (HPV) vaccines have been approved since 2016 in China, screening is still an indispensable prevention strategy in this post-vaccination era. HPV testing offers high sensitivity and reproducibility, provides long-term (at least 5 years) reassurance after a negative result, and has been shown to be feasible on self-collected samples (5)(6)(7). Thus, HPV testing has been widely used in primary cervical cancer screening in many countries and is recommended as the main screening method in the latest WHO guidelines (8).
The application of such a highly sensitive screening method, if not appropriately triaged by another test, will inevitably lead to a much higher colposcopy referral rate. The colposcopic examination is the crucial step linking the primary screening and the histological diagnosis that determines the clinical decision about the optimal management of abnormal lesions (9). Colposcopy plays an irreplaceable role in the precise localization of the biopsy sites and in the early diagnosis of precancerous lesions to reduce the incidence of cervical cancer (10,11). The accuracy of colposcopy is highly operator-dependent, resulting in low reproducibility and varied diagnostic performance between different resource settings (12). Many low- and middle-income countries lack experienced colposcopists, regular colposcopy training courses, a uniform diagnostic standard, and a strict quality control process, making colposcopy a bottleneck that restricts the benefits of cervical cancer screening programs (13).
In recent years, artificial intelligence (AI) has been rapidly developed and applied in different fields (14-16). In healthcare, AI has shown promising application value in enhancing diagnosis and personalizing treatment (17)(18)(19)(20). There is an increasing interest in the use of deep learning-based AI technologies for the automatic assessment of medical images, which contributes to improving diagnostic accuracy and objectivity and reduces the workload of healthcare workers (21). Such advances also offer the opportunity to tackle the aforementioned challenges in colposcopic diagnosis in cervical cancer screening (22). Xue et al. developed a Colposcopic Artificial Intelligence Auxiliary Diagnostic System (CAIADS) that was trained, tuned, and validated using a large number of colposcopic images and clinical information from 19,435 patients, revealing its potential in improving the diagnostic quality of colposcopy and biopsy in the detection of cervical precancer/cancer (23). In 2022, Zhao et al. (24) concluded that the CAIADS had a higher sensitivity and similar specificity compared with colposcopists. However, the usefulness of the CAIADS in assisting less-experienced colposcopists in clinical practice is unclear.
In this study, we used hospital-based data to further evaluate the diagnostic performance of the CAIADS and its role in assisting junior colposcopists to identify the lesion areas and guide targeted biopsies.

Study population
This was a hospital-based retrospective study. A total of 1,146 women visited the colposcopy clinics at the Affiliated Cancer Hospital of Xinjiang Medical University in Xinjiang, China, due to abnormal HPV or cytological results or other gynecological symptoms between September 2021 and January 2022. The study cohort comprised women who had standard colposcopic images consecutively taken at 0, 30, 60, 90, and 120 s during the colposcopic examination and had a valid histologic diagnosis. The exclusion criteria were radiotherapy or chemotherapy, lack of definitive pathology results, invalid colposcopic images, unknown HPV status, or unknown cytological information (Figure 1). The digital records of patients, including HPV and cytological information, colposcopic images, type of transformation zone, colposcopic diagnosis by a senior colposcopist, biopsy information (number and site), and histopathological diagnosis were collected from the hospital registry system. General information (age, smoking status, reproductive history, and HPV vaccination status) was collected from the electronic outpatient records. The study was approved by the Ethics Committee of the Affiliated Cancer Hospital of Xinjiang Medical University (approval number: K-2021055). The need for informed consent was waived because the study used anonymized data that were collected retrospectively.
Experienced cytologists from the Affiliated Cancer Hospital of Xinjiang Medical University performed liquid-based cytology (SurePath, BD Oncolarity, United States) and interpreted the results using the Bethesda 2001 classification system (25). Cytological results were classified as negative for intraepithelial lesion or malignancy (NILM), atypical squamous cells of undetermined significance

Colposcopic procedure and histological confirmation
A senior colposcopist with over 20 years of specialized experience in the colposcopy clinic used a high-resolution electronic colposcope (EDAN, China) to perform the colposcopic examination in accordance with standard clinical guidelines (26). In brief, 5% acetic acid was applied to the cervix, and the visibility of the squamocolumnar junction, presence of aceto-whitening, and colposcopic lesions were documented for each woman. The final colposcopic diagnosis was recorded as benign/normal, low-grade lesion, or high-grade lesion. The colposcopic images were saved in JPEG format (640 × 480 pixels). For each woman, the colposcopic images consisted of five sequential images, namely a pre-acid image (at 0 s) and four post-acid images with an approximate time interval of 30 s (i.e., at 60, 90, 120, and 150 s) (23). Direct biopsy was performed when the colposcopic impression was satisfactory and suspicious lesions were seen; if the colposcopy was unsatisfactory or the result was HPV 16/18-positive or cytology showed high-grade abnormalities, four random biopsies were taken at the 3, 6, 9, and 12 o'clock positions.
Senior pathologists in the Affiliated Cancer Hospital of Xinjiang Medical University performed the histologic diagnosis using hematoxylin and eosin-stained slides. When the lesions were equivocal, p16 and Ki67 immunohistochemical staining of the tissue specimens was performed and the final diagnosis was made after a conjunctive analysis of the slides stained with hematoxylin and eosin and p16/Ki67. All histopathological findings were categorized by the cervical intraepithelial neoplasia (CIN) classification system as benign, CIN grades 1, 2, and 3, and cancer, with the worst finding used as the final diagnosis.
In addition to the examinations described earlier, a junior colposcopist with 1 year of experience working in the gynecological department reviewed all colposcopic images. The junior colposcopist was aware of the HPV status and cytological findings but was blinded to the colposcopic diagnosis by the senior colposcopist and the histological diagnosis. The junior colposcopist categorized the colposcopic findings using the 2014 WHO Classification of Female Reproductive System Neoplasms (27) as normal/benign, LSIL, HSIL, and cancer.

Diagnosis by the CAIADS
The CAIADS that was developed and initially validated by Xue et al. (23) was used to diagnose the cervical lesions. In brief, both the colposcopic images and the non-imaging information (cytology and HPV status) were inputted into the CAIADS to enable it to make a diagnostic judgment. The CAIADS algorithm mapped the input features (colposcopic images and non-imaging information) to the corresponding two target tasks (grading of the colposcopic impressions and guidance of biopsies) based on four deep learning networks, namely cervix detection, feature encoding, graph convolutional network-based feature fusion, and lesion area segmentation networks (23,28).
The findings of the CAIADS were categorized into three groups: benign, LSIL, and HSIL or worse (HSIL+), and the biopsy number and specific sites were indicated by the system with blue circles (Supplementary Figure 1). The CAIADS and the junior colposcopist received the same anonymized colposcopic images and non-imaging data (cytology and HPV status) to make an independent diagnosis while blinded to the senior colposcopist's findings and the histological results. To evaluate the role of the CAIADS in assisting the junior colposcopist, the order of the colposcopic records was changed and the junior colposcopist performed a second review with the knowledge of the CAIADS results; these findings were defined as the CAIADS-assisted junior colposcopist (abbreviated as CAIADS-Junior in the subsequent text). The junior colposcopist also indicated the biopsy sites and number of biopsies on the original colposcopic images with and without the knowledge of the CAIADS.

Statistical analysis
The demographic and clinical characteristics were summarized using descriptive statistics. Taking the histological diagnosis as the gold standard, the diagnostic performances of the different subspecialists (CAIADS, senior colposcopist, junior colposcopist, and CAIADS-Junior) were evaluated separately for the different histology endpoints (CIN2+, CIN3+, and cancer). The sensitivity, specificity, positive predictive value, and negative predictive value were calculated, with 95% confidence intervals (95% CIs) obtained using the Wilson score approach. The sensitivity and specificity of the subspecialists were compared using McNemar's test. The areas under the curves (AUCs) were compared using the DeLong test (29). To evaluate the biopsy efficacy, the number of captured biopsies required per case (BNR) was calculated for each histology endpoint, and the biopsy sensitivity was calculated as the number of biopsies indicated by the subspecialists divided by the total number of diagnosed biopsies for the specific endpoint. Binary logistic regression was used to estimate odds ratios and 95% CIs for the demographic and clinical characteristics potentially influencing accurate diagnosis and underdiagnosis by the CAIADS: age, ethnicity, BMI, educational level, parity, stage of menopause, cytological result, HPV status, and biopsy type. An accurate diagnosis was defined as a case in which the CAIADS indicated an abnormality (LSIL+) and histology confirmed CIN2+, or in which the CAIADS judged a lesion as normal without the need for biopsy and histology confirmed the lesion as <CIN2; all other conditions were regarded as an inaccurate diagnosis.
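As a rough illustration of the interval and the paired test named in this paragraph, the following Python sketch computes a Wilson score interval for a proportion such as sensitivity and a two-sided exact McNemar p-value from discordant pair counts. It is not the authors' analysis, which was run in SPSS and MedCalc; the function names, the choice of the exact (binomial) form of McNemar's test, and the counts in the usage lines are assumptions made for illustration only.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion (e.g., sensitivity)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b and c are the discordant pair counts: cases detected by reader A
    but missed by reader B, and vice versa. Assumption: the exact binomial
    form is used here; the paper does not state which variant was applied.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    low, high = wilson_interval(successes=8, n=10)
    print(f"sensitivity 95% CI: {low:.3f} to {high:.3f}")
    print(f"McNemar exact p: {mcnemar_exact(b=1, c=9):.4f}")
```

The Wilson interval is often preferred to the simple normal approximation for proportions near 0 or 1, which is the situation for sensitivities close to 100%.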
Among women diagnosed as normal by the CAIADS, histological confirmation of CIN2+ was defined as underdiagnosis, while a histological confirmation of normal was defined as no underdiagnosis.
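The two outcome definitions used for the logistic regression can be written down as a small labeling routine. The sketch below is only one reading of the definitions in the text: the label strings, the function names, and the treatment of CIN1 as "no underdiagnosis" when the CAIADS reports a normal finding are assumptions, since the text speaks only of CIN2+ versus normal in that case.

```python
# Label vocabularies are assumptions chosen to mirror the categories in the text.
CAIADS_GRADES = {"benign", "LSIL", "HSIL+"}
HISTOLOGY_GRADES = {"benign", "CIN1", "CIN2", "CIN3", "cancer"}

CAIADS_ABNORMAL = {"LSIL", "HSIL+"}      # LSIL+ in the paper's notation
HISTOLOGY_CIN2_PLUS = {"CIN2", "CIN3", "cancer"}


def _check(caiads, histology):
    if caiads not in CAIADS_GRADES:
        raise ValueError(f"unknown CAIADS grade: {caiads!r}")
    if histology not in HISTOLOGY_GRADES:
        raise ValueError(f"unknown histology grade: {histology!r}")


def is_accurate(caiads, histology):
    """True when CAIADS abnormal (LSIL+) matches CIN2+, or normal matches <CIN2."""
    _check(caiads, histology)
    return (caiads in CAIADS_ABNORMAL) == (histology in HISTOLOGY_CIN2_PLUS)


def is_underdiagnosis(caiads, histology):
    """Underdiagnosis is defined only among women judged normal by the CAIADS.

    Returns None when the CAIADS finding is abnormal (outside the subgroup).
    Assumption: any histology below CIN2 counts as no underdiagnosis.
    """
    _check(caiads, histology)
    if caiads in CAIADS_ABNORMAL:
        return None
    return histology in HISTOLOGY_CIN2_PLUS


if __name__ == "__main__":
    print(is_accurate("LSIL", "CIN3"), is_underdiagnosis("benign", "CIN2"))
```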
Statistical significance was defined as p < 0.05 (two-sided). All analyses were performed using IBM SPSS version 28 (IBM, New York, NY, United States) and MedCalc Statistical Version 20 (MedCalc Software Ltd., Ostend, Belgium).

Results
Figure 1 shows the flowchart of the selection of the study population. The medical records of 1,146 women with 7,646 colposcopic images were reviewed. Among them, 660 women with complete colposcopic images (five images per woman with an approximately 30-s interval between images) were identified, resulting in a total of 3,300 colposcopic images. Two hundred and ninety-four women were excluded due to incomplete clinical information. A total of 366 women with a median age of 44 years (range 22-85 years, interquartile range 36-52 years) with 1,830 colposcopic images were included in the final analysis (Figure 1). The detailed demographic information of the cohort is presented in Table 1.
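The cohort numbers in the selection flow are internally consistent, which a few lines of Python can confirm. The function name and dictionary keys below are hypothetical; the inputs are the figures reported in the text, and the count of women without a complete image set is derived here rather than reported.

```python
IMAGES_PER_WOMAN = 5  # one pre-acid image and four post-acid images


def cohort_flow(reviewed=1146, complete_images=660, excluded_incomplete_info=294):
    """Recompute the selection flow from the counts reported in the text."""
    included = complete_images - excluded_incomplete_info
    return {
        # Derived value, not stated in the paper.
        "without_complete_images": reviewed - complete_images,
        "included": included,
        "images_complete_set": complete_images * IMAGES_PER_WOMAN,
        "images_analyzed": included * IMAGES_PER_WOMAN,
    }


if __name__ == "__main__":
    for key, value in cohort_flow().items():
        print(f"{key}: {value}")
```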

Colposcopic findings of the CAIADS and the junior colposcopist
The CAIADS indicated 131 LSIL cases (35.8%) and 48 HSIL+ cases (13.1%), whereas the junior colposcopist indicated 140 LSIL cases (38.3%) and 77 HSIL+ cases (21.0%). When assisted by the CAIADS, the detection rate of the junior colposcopist increased to 40.2% (n = 147) for LSIL and 23.8% (n = 87) for HSIL+.
Table 3 and Supplementary Figure 2 show the diagnostic performance of the CAIADS in comparison with the junior and senior colposcopists, and its value in assisting the junior colposcopist. Concerning CIN2+ and CIN3+ detection, the CAIADS showed a sensitivity of approximately 80%, which was not significantly lower than that of the senior colposcopist (CIN2+: 80.6 vs. 91.3%, p = 0.061; CIN3+: 80.0 vs. 90.0%, p = 0.189). For all endpoints, the CAIADS showed the highest specificity (55-64%) and the highest positive predictive values compared with the senior and junior colposcopists. Furthermore, the CAIADS had the highest overall accuracy for all endpoints. As shown in Figure 2 and Supplementary Figure 2E, there were significant differences between the AUC of the CAIADS and the junior colposcopist in detecting CIN2+ and cancer, although this difference was not significant for detecting CIN3+. The AUC of the CAIADS was significantly higher than that of the senior colposcopist in detecting cancer (0.773 vs. 0.648; p < 0.001).
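The detection rates reported at the start of this subsection follow directly from the case counts and the cohort size of 366. A minimal check, with a hypothetical helper name and dictionary layout, is:

```python
COHORT_SIZE = 366  # women included in the final analysis

# Case counts as reported in the text, keyed by (reader, finding).
REPORTED_COUNTS = {
    ("CAIADS", "LSIL"): 131,
    ("CAIADS", "HSIL+"): 48,
    ("Junior", "LSIL"): 140,
    ("Junior", "HSIL+"): 77,
    ("CAIADS-Junior", "LSIL"): 147,
    ("CAIADS-Junior", "HSIL+"): 87,
}


def detection_rate(count, total=COHORT_SIZE):
    """Percentage of the cohort given a finding, rounded to one decimal place."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    for (reader, finding), count in REPORTED_COUNTS.items():
        print(f"{reader:14s} {finding:6s} {count:4d} {detection_rate(count):5.1f}%")
```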

Biopsy efficacy and sensitivity of the CAIADS and CAIADS-junior
A total of 1,415 biopsies were taken from the 366 women by the senior colposcopist. To reflect the biopsy efficacy of the subspecialists, the BNRs (Figure 3A) and biopsy sensitivity (Figure 3B)

Discussion
Colposcopy is the cornerstone of the cervical cancer screening program and is used in combination with pathology to determine the best management strategy. However, the accuracy of colposcopy is a worldwide concern because the examination is subjective and highly operator-dependent; this issue is compounded in low- to middle-income countries with a limited number of well-trained colposcopists. The inaccuracy of colposcopy is reflected by the large variation in the consistency rate between colposcopic findings and pathology, ranging from 37 to 66% (30)(31)(32)(33)(34). With the worldwide trend of using HPV testing as the primary screening method, which inevitably leads to a significant increase in colposcopy referrals, there is an increasing demand for high-quality colposcopic examination to precisely identify the cervical lesions and locate the biopsy sites to obtain the final pathological diagnosis. If the accuracy of colposcopy-directed biopsy cannot be guaranteed, the efficacy of the screening program will be limited. The great advances in AI technology have brought the opportunity to improve medical practice in recent years. AI-based or deep learning-based colposcopic methods have shown promise in several studies (35)(36)(37)(38). In these studies, AI-based colposcopy or deep learning-based colposcopy systems were trained and validated using more than 10,000 colposcopic images, and the performances of these systems were compared with colposcopists with different levels of experience. In diagnosing histologically confirmed HSIL+ cases, the reported sensitivities of AI-colposcopy, colposcopists, and AI-assisted colposcopists are 74.1-82.8%, 19.5-100%, and 66.7-84.5%, respectively. Overall, the diagnostic performance of colposcopists varies greatly, whereas the sensitivity of AI colposcopy tends to be stable between studies. In our study, the CAIADS and CAIADS-Junior findings had a sensitivity of more than 80% for high-grade lesions. These findings further reflect the fact that, as an objective tool that is trained, set up, and validated using thousands of images, AI colposcopy has great potential to ensure the quality of colposcopic examination, which is of particular importance in areas that lack well-trained colposcopists.
The major aim of the colposcopic examination is to precisely obtain biopsies to confirm a histological diagnosis of HSIL or cervical cancer. Most studies have only explored the diagnostic performance of AI-colposcopy (35,36,39,40), and there is a lack of evidence regarding the role of AI in the last critical step (guiding biopsy), which makes AI colposcopy less practical in areas lacking well-trained colposcopists. The CAIADS used in our study showed its advantages in colposcopic diagnosis, demonstrated its superiority in colposcopy-targeted biopsy, and revealed its potential in assisting junior colposcopists to improve their targeted biopsy performance, achieving a higher efficacy and biopsy sensitivity than the junior colposcopist alone.
The accuracy of colposcopic diagnosis and targeted biopsy might be influenced by various factors, such as age, menopause status, cytological abnormalities, HPV infection status, and type of transformation zone (41). We performed univariate and multivariate logistic regression analyses to identify the factors associated with the accuracy of the CAIADS and CAIADS-related underdiagnosis. Our [...] (42), making it easier for the CAIADS to identify the lesion areas. Overall, the role of the CAIADS is to assist colposcopists rather than supersede colposcopists in clinical practice and decision-making. External validation of the CAIADS has provided powerful evidence for its accuracy in the colposcopic examination (43). The present study used an independent real-world dataset (neither the training nor the tuning dataset) to evaluate the feasibility and effectiveness of the CAIADS, providing evidence for its clinical application in colposcopy clinics. The CAIADS was applied in Xinjiang for the first time and in ethnically diverse populations, supporting its geographical and ethnic generalizability. The external validation of the CAIADS supports human-machine cooperation rather than human-machine confrontation. Previous studies have shown that humans and AI achieve similar outcomes and have suggested that humans will be replaced by AI (23,44,45). However, the present study revealed that the AI-assisted colposcopist achieved the best results, which is more in line with ethical, moral, and legal requirements than the use of AI alone.
The implementation of the CAIADS still faces the following problems in less-developed areas (13,46). First, the quality of available cervical information (screening data, colposcopy images, etc.) may affect the colposcopic interpretation, and descriptive terms, including those for cervical labeling, annotation, classification, and quality supervision, are not standardized in colposcopy practice because different types of colposcopic equipment are used (26,31,47). Thus, we aim to apply the CAIADS in various scenarios. Second, a wide area network may be difficult to achieve in less-developed areas due to the requirement for high-definition images and large running memory. Therefore, we aim to develop a software version of the CAIADS that is feasible using a local service network. Finally, colposcopists in low-resource areas may have incorrect notions about AI. For colposcopists to effectively use the CAIADS, it is important to understand that AI is a tool that assists the physician and does not take the place of a physician in making decisions.
The main strength of this study is that we externally validated AI-based colposcopy (using the CAIADS) in diagnosing cervical lesions and targeting biopsy sites in a hospital-based retrospective study in Xinjiang, China, providing important evidence on the performance and feasibility of the CAIADS in resource-limited areas. The major limitation is that, as a retrospective study, the CAIADS and the junior colposcopist made decisions by reviewing high-resolution colposcopic images. Therefore, some potentially malignant cases that were detected by either the CAIADS or the junior colposcopist might have been missed and thus not biopsied by the senior colposcopist. However, the senior colposcopist who performed the colposcopic examinations had more than 20 years of working experience in a colposcopy clinic, which may have reduced the risk of missed cases. Furthermore, only one junior colposcopist with 1 year of experience reviewed the colposcopic images, inevitably leading to observer bias. The current study is one of the very few evaluating the role of AI-colposcopy in assisting a junior colposcopist in diagnosing and guiding the biopsy during cervical cancer screening, which might be the most practical way to use AI in the screening setting. Its promising findings provide the necessary evidence for future population-based, multicenter studies to further evaluate the use of AI in real-world settings.

Conclusion
The CAIADS may enhance the diagnostic and biopsy accuracy of junior colposcopists. Therefore, the CAIADS might be a promising solution to improve the colposcopy practice in low-resource areas with limited numbers of well-trained colposcopists.

Data availability statement
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Ethics statement
The studies involving human participants were reviewed and approved by the Research Ethics Committee of Affiliated Tumor Hospital of Xinjiang Medical University. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

Author contributions
RR and YQ designed the study. AW, PX, GA, and DT were involved in the administration of fieldwork, data collection, and assembly. AW, PX, and RR participated in manuscript writing, data analysis, and interpretation. YQ and GA provided constructive comments and revisions to the manuscript. All authors contributed to the article and approved the submitted version.