Colorectal Cancer Screening Modalities in Chinese Population: Practice and Lessons in Pudong New Area of Shanghai, China

Background: A parallel test combining risk stratification and two-sample qualitative fecal immunochemical tests (FITs) is used to screen for colorectal cancer (CRC) in Shanghai, China. This study was designed to identify an optimal initial screening modality based on available data. Methods: A total of 538,278 eligible residents participated in the program from January 2013 to June 2017. Incident CRC cases were collected through the program reporting system and by record linkage with the Shanghai Cancer Registry up to December 2017. A logistic regression model was applied to identify significant factors for calculating a CRC risk score. Cutoff points of the risk score were determined based on the Youden index and a defined specificity. Sensitivity, specificity, and positive predictive values (PPVs) were computed to evaluate the validity of assumed screening modalities. Results: A total of 446 CRCs were screen-detected, and 777 interval or missed cases were identified through record linkage. The risk score system had an optimal cutoff point of 19 and performed better in detecting CRC and predicting long-term CRC risk than did the risk stratification. With a cutoff point of 24, the parallel test of risk score and FIT was expected to avoid 56 interval CRCs with a minimal decrease in PPV and an increase in colonoscopy. However, the observed detection rates were much lower than those expected because of low compliance with colonoscopy. Conclusions: The risk score is superior to the risk stratification used in the program, particularly when combined with FIT. Compliance with colonoscopy should be improved to guarantee the effectiveness of CRC screening in the population.


INTRODUCTION
Colorectal cancer (CRC) is one of the most common cancers globally, leading to over 1.8 million new cases and 881,000 deaths in 2018 (1). In China, CRC ranks second in incidence and fourth in mortality among all cancers (http://gco.iarc.fr/, access date: April 4, 2019). The rapidly increasing incidence and mortality of the disease (2) and the proven effectiveness of screening in CRC prevention and control (3) have motivated the Chinese government to perform and scale up population-based CRC screening around the country.
Population-based CRC screening has been implemented in many countries as a National Cancer Screening Program (4). Multiple methods have been used in these programs, mainly stool-based tests such as the guaiac-based fecal occult blood test, fecal immunochemical test (FIT), and stool DNA testing, and direct visualization tests such as flexible sigmoidoscopy, colonoscopy, double-contrast barium enema, CT colonography, and video capsule colonoscopy (5). In resource-limited settings, serial use of risk assessment and FIT has been adopted to improve the cost-effectiveness of screening (6). In Jiashan County of Zhejiang Province of China, however, parallel use of a questionnaire-based risk assessment and two-sample qualitative FITs was adopted as the initial screening method to increase the sensitivity of screening. It was reported that in the Chinese population, the sensitivity and specificity of one positive qualitative FIT were 90.4 and 53.8%, respectively, for CRC, and those of two positive qualitative FITs were 80.8 and 75.1%, respectively (7). The pilot study in Jiashan County showed that the parallel test modality performed well in detecting early colorectal neoplasms, and the positive predictive value (PPV) reached 2.7% (8,9).
Based on this evidence, the Shanghai government launched a pilot community-based CRC screening project in 2008. Three years of practice using a screening protocol similar to that of Jiashan County showed a great improvement in the detection of early-stage CRC (10). In 2013, a large-scale screening program was launched as a major public health service project, making Shanghai one of the earliest cities in China to undertake mass screening for CRC. So far, three rounds of screening have been performed, and the results of the first round validated the effectiveness of parallel use of risk stratification and FIT (11). The highly sensitive screening modality, however, has led to a high false positive rate and thus low compliance with further colonoscopy examination (12,13), limiting the effectiveness of screening.
In this study, we took advantage of the database developed in screening practice in Pudong New Area of Shanghai, China, to optimize the risk assessment tool and seek an optimal initial screening protocol for CRC in this population.

Study Participants
Almost all guidelines recommend CRC screening for asymptomatic individuals between the ages of 50 and 75 years (5,14,15), as the mortality benefit is greatest for patients aged 50-70 years. However, in Shanghai, one of the cities with the oldest populations in China, the service was also provided to residents aged 76-79 years to achieve equity in health care (10). Therefore, the inclusion criteria were defined as follows: (1) permanent residents of Shanghai, (2) living in Pudong New Area of Shanghai, (3) aged 50-79 years, and (4) beneficiaries of the basic medical insurance of Shanghai.
Abbreviations: AUC, area under receiver operating characteristic curve; CI, confidence interval; CRC, colorectal cancer; FIT, fecal immunochemical test; PPV, positive predictive value; ROC, receiver operating characteristic curve.
The first round of screening was conducted in 2013, the second round covered 3 years from January 2014 to December 2016, and the third round was planned from January 2017 to December 2019. Through community mobilization, a total of 538,278 eligible volunteers attended initial screening of CRC during the period of January 1, 2013 to June 30, 2017 and were included in this analysis.
This study was approved by the Medical Ethics Committee of the Center for Disease Control and Prevention in Pudong New Area of Shanghai, China, and oral consent was obtained from each participant of the screening program.

Screening Procedure
A two-stage sequential screening was designed and conducted in all 15 districts of Shanghai in 2013. A questionnaire-based risk assessment and two-sample qualitative FIT were used as initial screening.

Risk Stratification
The participants were regarded as positive in risk assessment if they had one of the following events: (1) a history of any cancer; (2) a history of polyps; (3) a family history of CRC in a first-degree relative and/or at least two of the following events: (a) chronic constipation, (b) chronic diarrhea, (c) mucous bloody stool, (d) serious adverse life events such as death among first-degree relatives, (e) chronic appendicitis or appendectomy, and (f) chronic cholecystitis or cholecystectomy.
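The rule above can be sketched in code. This is a minimal illustration, not the program's actual software: the field names are hypothetical, and it assumes one reading of the "and/or" wording, namely that any one of the three major items, or at least two of the minor items (a) to (f), makes a participant positive.

```python
from dataclasses import dataclass


@dataclass
class Questionnaire:
    # Hypothetical field names mirroring the questionnaire items in the text.
    history_of_cancer: bool = False
    history_of_polyps: bool = False
    family_history_crc: bool = False  # first-degree relative
    chronic_constipation: bool = False
    chronic_diarrhea: bool = False
    mucous_bloody_stool: bool = False
    serious_life_event: bool = False
    appendicitis_or_appendectomy: bool = False
    cholecystitis_or_cholecystectomy: bool = False


def is_high_risk(q: Questionnaire) -> bool:
    """Assumed reading: any major item, or at least two minor items."""
    major = q.history_of_cancer or q.history_of_polyps or q.family_history_crc
    minor_count = sum([
        q.chronic_constipation,
        q.chronic_diarrhea,
        q.mucous_bloody_stool,
        q.serious_life_event,
        q.appendicitis_or_appendectomy,
        q.cholecystitis_or_cholecystectomy,
    ])
    return major or minor_count >= 2
```

Under this reading, a participant reporting only chronic diarrhea is negative, while one reporting both chronic diarrhea and chronic constipation is positive.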

Fecal Immunochemical Test
Two stool samples were collected 1 week apart by community healthcare staff and tested in a local hospital by contracted experienced technicians. Three different parts were taken from each stool sample and then mixed and washed with a special buffer solution. Each sample was collected in a tube containing about 5 ml of moist stool. A qualitative FIT was conducted in 5 min after collection using a colloidal gold assay (monoclonal antibody), with a positivity threshold of 100 ng/ml of sample solution. FIT kits were purchased from Shanghai Lijun Medical Co. Ltd., China.

Colonoscopy
Individuals with a positive FIT or a positive risk assessment were regarded as positive in the first stage and were invited to undergo a colonoscopy as the second stage of screening. Colonoscopies were required to be performed in one of the 13 designated hospitals, where polyps and adenomas were removed once diagnosed. The risk assessment and FITs were provided free of charge to participants, whereas colonoscopy was paid for by the basic medical insurance of Shanghai.
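The first-stage decision can be written as a simple logical rule. The sketch below uses hypothetical function names; the serial variant is included only for contrast and assumes that a serial scheme requires both a positive risk assessment and a positive FIT.

```python
def parallel_positive(fit_results, risk_positive):
    """Parallel test: positive if the risk assessment OR any FIT sample is positive."""
    return bool(risk_positive) or any(fit_results)


def serial_positive(fit_results, risk_positive):
    """Serial test (assumed form, for contrast): both stages must be positive."""
    return bool(risk_positive) and any(fit_results)
```

A participant with one positive FIT sample and a negative risk assessment is referred for colonoscopy under the parallel rule but not under the serial rule, which is why the parallel design trades specificity for sensitivity.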

Data Collection
To evaluate the effectiveness of the CRC screening program, we treated all subjects as members of a prospective cohort. A 12-digit barcode was assigned to each participant at recruitment to track screening results. Baseline demographic information and risk factors were collected through in-person interviews using a structured questionnaire. The barcode appeared on the fecal collection tube, and when participants returned the tube, the FIT results were entered into the reporting system by scanning the barcode. The results of colonoscopic and histopathologic examinations were entered using the same barcode in designated hospitals and submitted monthly by the local community healthcare staff to the Center for Disease Control and Prevention in Pudong New Area of Shanghai through an internet-based reporting system.
Newly diagnosed CRCs were obtained from the program reporting system as screen-detected cancers and supplemented by record linkage with the Shanghai Cancer Registry up to December 31, 2017 using unique ID numbers (Figure 1). Interval CRCs were defined as those detected within 2 years after a negative initial screening test, while missed cases were those detected within 2 years after a positive initial screening test.
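The case definitions can be expressed as a small classification function. This is a sketch under stated assumptions: the function and label names are hypothetical, "within 2 years" is approximated as 365.25 days per year, and cases diagnosed after the window are simply labeled as outside it.

```python
from datetime import date


def classify_case(screen_date, diagnosis_date, initial_positive,
                  screen_detected=False, window_years=2):
    """Label a CRC case as screen-detected, interval, missed, or outside the window."""
    if screen_detected:
        return "screen-detected"
    days = (diagnosis_date - screen_date).days
    if days > 365.25 * window_years:
        return "outside window"
    # Within the window: the initial screening result decides the label.
    return "missed" if initial_positive else "interval"
```

Setting `window_years=3` reproduces the alternative definition used in the sensitivity analysis.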

Quality Control
The process of the screening program was supervised by the staff in the Center for Disease Control and Prevention in Pudong New Area of Shanghai who organized annual training for physicians, planned progress of the screening program, monitored screening tests, and supervised data collection and data entry. The final database was double-checked and verified to improve quality. Field quality control was conducted by community health care staff who were motivated by subsidies according to workload and quality assessment.

Statistical Analysis
Positive rate was calculated as the number of subjects positive in the respective screening test divided by the number of all participants of the test. Observed detection rates were calculated as the number of screen-detected CRC divided by the number of all participants, while expected detection rates were calculated as the number of prevalent CRC (screen-detected, interval, and missed CRC) divided by the number of all participants.
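As a worked example of these definitions, the sketch below computes the overall observed and expected detection rates from the totals reported in this study (446 screen-detected CRCs, 777 interval or missed CRCs, 538,278 participants). The function name is hypothetical, and the per-1,000 scale is chosen for readability.

```python
def rate_per_1000(events, participants):
    """Rate expressed per 1,000 participants."""
    return 1000.0 * events / participants


participants = 538_278
screen_detected = 446
interval_or_missed = 777

# Observed: screen-detected CRC only.
observed = rate_per_1000(screen_detected, participants)
# Expected: all prevalent CRC (screen-detected + interval + missed).
expected = rate_per_1000(screen_detected + interval_or_missed, participants)

print(f"observed: {observed:.2f} per 1,000")
print(f"expected: {expected:.2f} per 1,000")
```

The gap between the two rates reflects cancers that the program did not detect, either because the initial test was negative or because positive subjects did not undergo colonoscopy.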
Fisher's exact test was used to test the differences in positive rates and detection rates. Kappa coefficients were used to evaluate the consistency of stratified risk with FIT results. A logistic regression model for prevalent CRC cases was fitted by backward selection with age, sex, education, and the risk factors listed in the questionnaire to identify significant factors for constructing the CRC risk score. The risk score was calculated by multiplying the β-coefficients of the significant variables by 10 and rounding to the nearest integer (16). The receiver operating characteristic (ROC) curve was obtained by plotting sensitivity against 1 - specificity to evaluate the performance of the risk score and of the risk stratification used in the program. The optimal cutoff point of the risk score was identified based on the Youden index, that is, the point at which sensitivity + specificity - 1 reached its maximum (16). The cutoff point giving the same specificity as the risk stratification was also used to compare the PPVs of the two risk assessment methods.
To test the stability of the present model, we developed a model in a randomly selected 90% of the overall sample according to the above-mentioned analysis method and validated it in the remaining 10% of the sample. This process was repeated 10 times. The significant risk factors in the 10 subgroups were identical to those in the whole sample, and the areas under the ROC curve (AUC) ranged from 0.644 to 0.664 for the risk score. Sensitivity, specificity, and PPV were computed to evaluate the validity of assumed screening modalities.
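The scoring and cutoff steps described above can be illustrated with a short sketch. The β-coefficients and the toy scores below are hypothetical, not the study's estimates, and the function names are assumptions; a score at or above the cutoff is treated as positive.

```python
def points_from_betas(betas):
    """Convert logistic regression coefficients to integer points (beta x 10, rounded)."""
    return {name: round(10 * b) for name, b in betas.items()}


def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1."""
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    best_cutoff, best_j = None, float("-inf")
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < c)
        j = tp / n_cases + tn / n_controls - 1
        if j > best_j:  # ties keep the lowest cutoff
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


# Hypothetical coefficients, for illustration only.
points = points_from_betas({"male": 0.31, "family_history_crc": 0.69})

# Toy data: 1 = CRC, 0 = no CRC.
toy_scores = [5, 10, 15, 20, 25, 30]
toy_labels = [0, 0, 0, 1, 0, 1]
cutoff, j = youden_cutoff(toy_scores, toy_labels)
```

In the toy data, a cutoff of 20 captures both cases while misclassifying one of four controls, giving J = 0.75. Raising the cutoff above the Youden optimum, as done in this study to match the specificity of the risk stratification, gains specificity at the expense of sensitivity.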
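The two building blocks of this stability check, a random 90%/10% split and the AUC of a score in the held-out part, can be sketched as follows. This is an illustration with hypothetical function names; model refitting within each split is omitted, and the AUC is computed as the probability that a case scores higher than a non-case (ties counted as one half), which equals the area under the ROC curve.

```python
import random


def auc(scores, labels):
    """AUC as the fraction of case/non-case pairs ranked correctly (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def split_90_10(n, seed):
    """Return (development indices, validation indices) for one random 90/10 split."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_val = n // 10
    return idx[n_val:], idx[:n_val]


# Ten repetitions, each with a different (arbitrary) seed.
splits = [split_90_10(100, seed) for seed in range(10)]
```

The pairwise AUC is quadratic in sample size, so for a cohort of this scale a rank-based implementation would be preferable; the version here is kept simple for clarity.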
Person-years of observation was used to calculate overall incidence [95% confidence intervals (CIs)] of CRC by subgroups. The period of observation was further split into two intervals (within 2 years and ≥2 years of screening) to calculate incidence (95% CI) of CRC during each period. Sensitivity analysis was performed by defining interval and missed CRCs as those detected within 3 years after an initial screening test.
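A minimal sketch of an incidence calculation from person-years follows. The text does not state how the 95% CIs were obtained, so the log-scale normal approximation for a Poisson count used here is an assumption, and the function name and input numbers are hypothetical.

```python
import math


def incidence_per_100k(cases, person_years, z=1.96):
    """Incidence per 100,000 person-years with an approximate 95% CI.

    Assumes a Poisson count and a normal approximation on the log scale:
    rate / exp(z / sqrt(cases)) to rate * exp(z / sqrt(cases)).
    Requires cases > 0.
    """
    rate = cases / person_years
    factor = math.exp(z / math.sqrt(cases))
    scale = 100_000
    return rate * scale, rate / factor * scale, rate * factor * scale


# Hypothetical example: 50 cases over 100,000 person-years.
rate, lower, upper = incidence_per_100k(50, 100_000)
print(f"{rate:.1f} per 100,000 person-years (95% CI {lower:.1f}-{upper:.1f})")
```

Splitting follow-up into the two intervals described above amounts to applying the same calculation separately to the cases and person-years accrued within 2 years and at 2 years or later.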
All statistical analysis was performed in the Statistics Analysis System version 9.4 (SAS 9.4).

Demographic Characteristics of the Participants
A total of 538,278 residents participated in the screening program, accounting for 39.7% of all eligible residents (Table 1). More women and individuals aged 60-69 years participated in the program. Among all subjects, 55,264 (10.0%) were stratified as high-risk individuals, and 70,273 (13.1%) were positive in at least one FIT. As a result, a total of 115,247 (21.0%) participants positive in risk assessment or in FIT were considered positive in the initial screening test and were advised to have a further colonoscopy examination. The positive rate increased with age and was higher in men and in residents with college education or higher (p < 0.0001). Of all subjects positive in the initial screening tests, only 27,097 (23.5%) had a colonoscopy examination, whereas 588 negative subjects had a colonoscopy for unknown reasons.
The risk score ranged from 0 to 49, with an optimal cutoff point of 19. The cutoff point increased to 24 at a specificity similar to that of the risk stratification used in the program (89.7%). The risk score performed better in detecting CRC than the risk stratification, with an AUC of 0.655 vs. 0.526 (Figure 2).
The factors used for risk assessment were poorly consistent with FIT results, with agreement ranging from 80.9 to 86.0% and Kappa coefficients from 0.01 to 0.03 (p < 0.0001). Low agreement with FIT was also observed for the overall risk assessment, with an agreement of 80.5% and a Kappa coefficient of 0.06 for risk stratification (p < 0.001), and an agreement of 54.7% and a Kappa coefficient of 0.04 for risk score (p < 0.001) (Table 2).
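High raw agreement together with a Kappa close to zero is expected when both tests are negative in most subjects, because Kappa corrects for agreement that would occur by chance. The sketch below shows the calculation on a hypothetical 2 x 2 table; the counts are illustrative and are not taken from Table 2.

```python
def cohen_kappa(both_pos, only_a_pos, only_b_pos, both_neg):
    """Return (observed agreement, Cohen's kappa) for two binary tests."""
    n = both_pos + only_a_pos + only_b_pos + both_neg
    observed = (both_pos + both_neg) / n
    a_pos = both_pos + only_a_pos
    b_pos = both_pos + only_b_pos
    # Agreement expected by chance if the two tests were independent.
    chance = (a_pos * b_pos + (n - a_pos) * (n - b_pos)) / (n * n)
    return observed, (observed - chance) / (1 - chance)


# Hypothetical counts: 10% positive on test A, 12% positive on test B.
agreement, kappa = cohen_kappa(both_pos=2, only_a_pos=8, only_b_pos=10, both_neg=80)
print(f"agreement = {agreement:.1%}, kappa = {kappa:.3f}")
```

Here the two tests agree in 82% of subjects, yet chance alone would produce 80.4% agreement, so Kappa is only about 0.08.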

Detection Rates of Colorectal Lesions by Subgroups
A total of 446 CRC cases were screen-detected and reported, and as many as 777 missed or interval cases were identified through record linkage with the Shanghai Cancer Registry, possibly because of low compliance with colonoscopy. Detection rates, both observed and expected, were significantly higher in high-risk individuals defined by risk stratification, risk score, and FIT and were the highest (20.8/1,000 and 38.7/1,000, respectively) among subjects with a high-risk score and positive double FIT. Detection rates of precancerous lesions (advanced adenoma, small tubular adenoma, serrated adenoma, villous adenoma, hamartoma, high- and low-grade dysplasia, tubular villous adenoma, etc.) were also higher in high-risk subjects defined by risk stratification, risk score, and FIT (Table 3). As shown in Figure 3, CRC incidence was 81.5/100,000 among subjects with a high-risk score only, significantly higher than the 34.2/100,000 among those with a low-risk score and negative double FIT. Detection rates and incidence of CRC among subjects with a high-risk score and any positive FIT were double those among subjects with any positive FIT only.

Incidence of Colorectal Cancer Along Follow-Up Time
As shown in Table 4, risk stratification, risk score, and FIT performed well in predicting CRC risk, with significantly higher incidence of CRC after 2 or 3 years of initial screening in positive subjects. With the smallest number of interval CRC cases, parallel use of FIT and risk score performed better than the modality used in the program in identifying individuals at high risk of CRC. Figure 4 presents the incidence of CRC by years of follow-up until December 2017 according to the results of risk score and FIT. A peak in incidence was observed within 6 months of screening, and the incidence then decreased within 2-3 years of screening. Thereafter, the incidence increased with follow-up time in each group.
We further evaluated the validity of assumed risk score-based screening modalities under the assumption that all positive subjects received further colonoscopy and diagnostic examinations (Table 5). Parallel test of FIT with risk score using the optimal cutoff point of 19 detected more CRC cases than parallel test of FIT with risk stratification, but at the cost of a decreased PPV (0.39%) and doubled colonoscopy examinations for each detected CRC. When using 24 as the cutoff point of risk score, parallel test of FIT with risk score was expected to avoid 56 interval CRCs with a minimal decrease in PPV and an increase in colonoscopy per detected CRC.
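The validity measures and the link between PPV and colonoscopy workload can be sketched as follows. The function names and the 2 x 2 counts are hypothetical; the only figure taken from the text is the PPV of 0.39%. The calculation assumes that every subject positive in initial screening undergoes colonoscopy, in which case the number of colonoscopies per detected CRC is the reciprocal of the PPV.

```python
def validity(tp, fp, fn, tn):
    """Return (sensitivity, specificity, PPV) from 2 x 2 counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    return sensitivity, specificity, ppv


def colonoscopies_per_crc(ppv):
    """Colonoscopies needed per detected CRC, assuming full compliance."""
    return 1.0 / ppv


# Hypothetical counts for illustration.
sens, spec, ppv = validity(tp=8, fp=192, fn=2, tn=798)

# PPV of 0.39% reported for the cutoff of 19.
workload = colonoscopies_per_crc(0.0039)
print(f"about {workload:.0f} colonoscopies per detected CRC")
```

At a PPV of 0.39%, roughly 256 colonoscopies would be needed for each detected CRC, which illustrates why a small absolute change in PPV translates into a large change in clinical workload.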

DISCUSSION
In this CRC mass screening program provided by the Chinese government as a major public health service (17), the main findings include the following: (1) risk assessment was complementary to FIT in identifying CRC cases, supporting parallel test of the two methods in the population; (2) the compliance rate was as low as 23.5% in positive subjects, indicating the urgency of optimizing the initial screening modality in the population; (3) the risk score system developed in this study performed better in detecting CRC than the risk stratification used in the program, indicating potential benefits of using the risk score; and (4) parallel use of FIT and risk assessment performed well in predicting long-term risk of CRC, suggesting that subjects positive in initial screening should be followed up extensively even if they are negative in colonoscopy examinations.
Selection of a CRC screening modality depends not only on the validity of the modality in the target population but also on feasibility, affordability, compliance, and clinical capacity of screening, particularly in resource-limited settings (5). In the Shanghai CRC screening program, FIT, the most widely used qualitative CRC screening method, was used to identify high-risk individuals using a cutoff value of fecal hemoglobin (Hb) ≥ 100 ng/ml (20 µg Hb/g feces) based on evidence from Chinese (18) and other populations (4,19,20). In a meta-analysis including 17 studies, the median fecal Hb positivity cutoff was found to be 20 µg Hb/g feces, with a range of 10-200 µg Hb/g feces (21). The detection threshold resulted in high specificity but low sensitivity in our population and thus a large number of interval CRCs, which are usually considered a failure of detection due to the lack of diagnostic tools with perfect sensitivity and specificity (22). Combined use of risk stratification and FIT has been performed to achieve higher accuracy than FIT only (23).
The importance of risk assessment in initial screening was also supported by Steele et al. (24), who found that interval CRCs were less likely to bleed. Considering that FIT can detect bleeding lesions while questionnaire-based risk assessment helps to identify individuals with non-bleeding lesions (25), parallel test of the two methods was developed in 2006 in China as an initial screening modality to improve the sensitivity of CRC screening (9) and was recommended to the whole country (8). The observed low consistency of risk factors with FIT, as well as the greatly improved sensitivity, strongly supports parallel test of risk assessment and FIT in the population.
In this study, we developed a risk score system based on long-standing risk factors such as age, sex, history of any cancer, and family history of CRC, which perform well in long-term risk prediction (26), and on specific intestinal symptoms such as diarrhea, constipation, mucous bloody stool, and intestinal polyps, which had better short-term predictive values for CRC (27,28). The risk score system was superior to the currently used risk stratification in detecting malignant and precancerous lesions and in predicting long-term risk of CRC, but at the cost of almost doubled colonoscopy per detected CRC. It is of note that the sensitivity of qualitative FIT was much lower in this study than in a previous report (7). Therefore, the parallel test screening modality should be optimized to balance validity, compliance with colonoscopy, and clinical capacity of screening by adjusting the cutoff point for the risk score and by improving the stool-based test.
In this study, only 23.5% of positive subjects had a colonoscopy, lower than the 39.8% in the whole population of Shanghai (11). In addition to subpopulation disparity, compliance with colonoscopy in this study may have been underestimated because of the lack of information beyond the 13 designated hospitals. Nevertheless, low compliance with colonoscopy is common around the world, regardless of age, sex, and ethnicity (29), making a large number of missed cases a bigger challenge than interval cases. The validity of a screening modality, particularly its specificity, has been associated with compliance with colonoscopy (30). The lower specificity of the risk score-based screening modality may further decrease compliance. Given the low compliance with colonoscopy, the numbers of detected neoplasms in each category of the new risk score strategy may be greatly underestimated. In this study, compliance with colonoscopy was 16.9% among high-risk individuals defined by risk stratification, three times the 5.6% in subjects with a high-risk score, indicating potential benefits of using the risk score even at the current level of compliance. When we raised the specificity of the risk score to the level of the risk stratification by increasing its cutoff point to 24, we found that the risk score-based screening modality may detect an additional 56 CRCs at the cost of an additional 9,400 colonoscopy examinations, supporting the utility of the risk score system. Moreover, medical insurance, lower educational attainment, discomfort during colonoscopy, fear of complications, and lack of information on colonoscopy procedures were also barriers to colonoscopy screening (31-33) and should be overcome to increase compliance with colonoscopy.
There are several strengths of this study. First, the large sample size makes it possible to evaluate the performance of multiple assumed screening modalities. Second, the risk score system was developed with a comprehensive range of risk variables such as age, sex, history of cancers, and intestinal symptoms. All of this information is easy to collect (26), ensuring the feasibility of the system in the "real world." Moreover, the record linkage with the Cancer Registry and the Vital Statistics enabled us to collect all CRC cases and to calculate person-years of observation accurately, through which we found that the incidence of CRC decreased sharply after an incidence peak and began to increase between 2 and 3 years after screening, supporting the use of this period to define interval CRC and missed CRC (20,24,34). Finally, sensitivity analysis was conducted by defining interval or missed cases as linked CRCs diagnosed within 3 years after initial screening. The similar results provide further evidence for our conclusions.
Several limitations should be considered. First, we did not collect information on lifestyle factors such as smoking, alcohol use, red meat intake, and physical activity, which have been included in multiple risk score systems (26,35). It is possible that these unmeasured confounders biased the associations of the collected risk factors with the risk of CRC and thus the weighting of each factor in the system. We could not compare the risk score system developed in this study with others because we lacked the lifestyle information needed to calculate risk scores within other systems. Second, we may have underestimated the incidence of CRC in this population because of the reporting lag in the cancer registry. Furthermore, the screening value of the risk score system developed in this study was validated only internally. An external validation study is needed to verify the extrapolation and generalization of the system. Finally, the follow-up time was not long enough to observe the long-term predictive value of the risk score system; a longer follow-up is warranted.

CONCLUSIONS
In conclusion, a quantitative risk score-based modality may help to improve the effectiveness of CRC screening and has the potential to be scaled up in the population. Cutoff points of the risk score should be optimized and the stool-based test should be improved for large-scale use in the Chinese population. The effect of the parallel screening modality on improving compliance with colonoscopy and early detection of CRC, as well as its cost-effectiveness from a societal perspective, warrants further evaluation.

DATA AVAILABILITY
The datasets generated for this study are available on request to the corresponding author.

ETHICS STATEMENT
This study was approved by the Medical Ethics Committee of the Center for Disease Control and Prevention in Pudong New Area of Shanghai, and oral consent was obtained from each participant of the screening program.

AUTHOR CONTRIBUTIONS
WW and YW drafted the manuscript. TL and WX conceived and designed the study. CY and BY made substantial contributions to the study design. CY and YZ were responsible for study coordination. YW and BY contributed to data quality control. HJ and XL contributed to data analysis. All authors contributed to the revision of the manuscript and approved the final manuscript.