Cohort Profile: Chinese Cervical Cancer Clinical Study

Cervical cancer is the fourth most common cancer worldwide, but its incidence varies greatly between countries. In terms of both incidence and mortality, China accounts for approximately 18% of the global burden of cervical cancer. The Chinese Cervical Cancer Clinical Study is a hospital-based multicenter open cohort. The major aims of this study are (i) to explore the associations of therapeutic strategies with complications as well as mid- and long-term clinical outcomes; (ii) to widely assess the factors that may influence the prognosis of cervical cancer in order to guide treatment options, and to estimate prognosis using a prediction model for precise post-treatment care and follow-up; (iii) to develop a knowledge base for cervical cancer clinical auxiliary diagnosis and prognosis prediction using artificial intelligence and machine learning approaches; and (iv) to roughly map the burden of cervical cancer in different districts and monitor the trend in incidence of cervical cancer to potentially inform prevention and control strategies. Patients eligible for inclusion were those diagnosed with cervical cancer, whether during an outpatient visit or hospital admission, at 47 medical institutions of different types in 19 cities of 11 provinces across mainland China between 2004 and 2018. In total, 63 926 patients with cervical cancer were enrolled in the cohort. Since the project's inception, a large number of standardized variables have been collected, including epidemiological characteristics, cervical cancer-related symptoms, physical examination results, laboratory testing results, imaging reports, tumor biomarkers, tumor staging, tumor characteristics, comorbidities, co-infections, treatment and short-term complications. Follow-up was performed at least once every 6 months within the first 5 years after receiving treatment and then annually thereafter.
At present, we are developing a cervical cancer imaging database containing DICOM files from computed tomography and magnetic resonance imaging examinations. We are also collecting original pathological specimens from patients with cervical cancer. Potential collaborators are welcome to contact the corresponding authors, and anyone can submit at least one specific study proposal describing the background, objectives and methods of the study.


WHY WAS THE COHORT SET UP?
Cervical cancer is a major public health problem. It is the fourth most common cancer type in terms of incidence and mortality in women worldwide. In 2018, an estimated 570 000 cases of cervical cancer and 311 000 deaths caused by cervical cancer were recorded (1)(2)(3)(4)(5). The incidence and mortality rates of cervical cancer vary widely among countries (1). With the highest number of cases (106 430) and the second-highest estimated number of deaths (47 739), China accounts for approximately 18% of the global cervical cancer burden (6).
The elimination of cervical cancer as a public health issue is considered a priority under the 13th General Programme of Work of the World Health Organization (7)(8)(9). Despite the wide availability of screening and improved therapeutic practices, the 5-year overall survival (OS) of cervical cancer remains at only 60-70% in high-income countries and is much lower in middle- and low-income countries (10). Therefore, more reasonable prevention and control strategies must be developed, and more effective diagnosis and treatment strategies must be applied in clinical practice. Such strategies are associated with fewer potential years of life lost and fewer disability-adjusted life years lost due to cervical cancer (11).
The effect of different management strategies on patients with specific clinical stages of cervical cancer remains controversial (12)(13)(14)(15). Prognostic factors, including tumor characteristics, medical condition, sociodemographic characteristics, lifestyle behaviors, biomarkers, diagnosis, treatment, and care, have been suggested for further detailed exploration (16)(17)(18)(19)(20)(21)(22). This may provide a way to reduce residual disease after treatment, minimize recurrence, decrease complications, and improve survival. Furthermore, recent advances in medical information technology and computer technology play an important role in the realization of auxiliary diagnosis and digital healthcare for cervical cancer (23,24). These developments provide an optimistic outlook for progress in the diagnosis, treatment, and prognostic prediction of cervical cancer.
The Chinese Cervical Cancer Clinical (Four-C) Study was created in 2014 with the aim of collecting clinical and prognostic information on patients diagnosed with cervical cancer in mainland China since 2004. Its research objectives currently focus on four main themes: (i) to explore the associations of therapeutic strategies with complications as well as mid- and long-term clinical outcomes, including comparative effectiveness research based on marginal structural models or propensity scores (25)(26)(27); (ii) to widely evaluate the prognostic factors of cervical cancer (such as late access to care and the influence of nutritional status) in order to guide treatment and care options, and to precisely predict the prognosis of patients so as to develop more effective programs of personalized follow-up and intervention (21,22); (iii) to utilize artificial intelligence (AI) and machine learning (ML) approaches for multimodal data aggregation and multifactorial examination in order to develop a knowledge base for cervical cancer clinical auxiliary diagnosis and prognostic prediction (28)(29)(30); and (iv) to map the burden of cervical cancer in different districts and monitor trends in its incidence, which could potentially inform prevention and control strategies (31). The fourth theme is feasible because the Four-C Study is reasonably representative of the occurrence of cervical cancer across mainland China in terms of age, geographical origin, year of diagnosis, clinical stage, gross type, and histological type.

WHO IS IN THE COHORT?
The Four-C Study was set up in two phases. The inclusion criteria for the participants were as follows: subjects who were outpatients or inpatients of participating centers of the Four-C Study; and subjects who had a pathology report of cervical biopsy, which is the gold standard for cervical cancer diagnosis, issued by at least two experienced doctors in a Grade III Level A hospital. In the first phase, initiated in 2014, the hospital-based open cohort recruited 46 205 patients diagnosed with cervical cancer between 2004 and 2016 at 37 Figure 2.
The documentation of patient information included epidemiological characteristics, clinical testing results, examinations, diagnoses, treatments, care and prognosis. Further details on the project are available at the International Clinical Trials Registry Platform (http://apps.who.int/trialsearch/, ChiCTR1800017778). Approval was obtained from the institutional ethics committee of Nanfang Hospital (NFEC-2017-135) (32,33). Only information on the medical practice of each patient was collected, so that individual patients could not be identified. The ethics committee waived the requirement for informed consent. Of note, every patient is assigned a unique number to match the clinical/epidemiological information obtained from medical records with original pathology specimens or image data, which can be used to validate a diagnosis or provide additional information for subsequent specific research projects.

WHAT HAS BEEN MEASURED AND COLLECTED?
At baseline, the Four-C Study collected 315 epidemiological and clinical variables from the patients' medical records. These covered almost all information on the clinical diagnosis, treatment, care and short-term outcome of cervical cancer, including sociodemographic characteristics, menstrual and reproductive history, type of medical institution, cervical cancer-related symptoms, physical examination, laboratory testing, imaging reports, tumor biomarkers, tumor staging, tumor characteristics, comorbidities, co-infections, management of patients and short-term complications after treatment. Table 1 shows a broad overview of the information collected on each included participant. Further detailed information on the management of patients, such as surgical approaches, surgical procedures, other surgery-related information, and the complete protocols of radiotherapy or chemotherapy, was also documented and is summarized in

HOW OFTEN HAVE THEY BEEN FOLLOWED UP?
Follow-up was conducted by well-trained nurses, research assistants and postgraduates in Gynecology and Obstetrics through telephone conversations with the patient or one of her family members, or via the assessment of patient electronic medical notes (at each outpatient visit or hospital admission) at time intervals defined by the follow-up strategy. The follow-up interval is every 3 to 4 months within the first 2 years, every 6 months for the next 3 years, and then annually thereafter or until year 10 at the discretion of the treating physician. In addition, detailed follow-up information is required to be collected immediately if a notable clinical manifestation is identified or a significant change in tumor-related biological markers is noted during an outpatient review after treatment. Standardized variables were collected to evaluate residual disease, residual tumor, surveillance, survival, and other information (Table 1). These included the time of follow-up, mid- and long-term complications, treatment strategies for complications, recurrence of cervical cancer and the time of recurrence, site of recurrence and treatment after recurrence, drug use, and outpatient, hospitalization and death information.
We take several measures to minimize loss to follow-up. First, all hospitals have promised to strictly abide by the follow-up strategy. Upon discharge or departure from the clinic, each participant was informed in detail of the time interval for re-examination and/or follow-up. The doctors in charge are expected to maintain long-term communication with each patient to improve compliance. Additionally, patients who moved or sought care elsewhere were also traced, and information on those who died was collected through telephone conversations with a close family member.
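The follow-up intervals described above can be turned into a list of due dates. The following is a minimal sketch, not the study's own software: the function names, the choice of a fixed 3- or 4-month early interval, and the treatment of year 10 as the default end point are all assumptions made for illustration.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    total = start.month - 1 + months
    year = start.year + total // 12
    month = total % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def follow_up_months(early_interval: int = 3, last_year: int = 10) -> list:
    """Months after treatment at which a follow-up contact falls due.

    early_interval (hypothetical parameter): 3 or 4 months, used in years 1-2.
    last_year (hypothetical parameter): final year of annual follow-up.
    """
    if early_interval not in (3, 4):
        raise ValueError("the text gives an interval of 3 to 4 months for years 1-2")
    months = list(range(early_interval, 25, early_interval))  # first 2 years
    months += list(range(30, 61, 6))                           # years 3 to 5, every 6 months
    months += list(range(72, last_year * 12 + 1, 12))          # annually up to last_year
    return months


def follow_up_dates(treatment_date: date, early_interval: int = 3, last_year: int = 10) -> list:
    """Calendar dates of the scheduled follow-up contacts for one patient."""
    return [add_months(treatment_date, m) for m in follow_up_months(early_interval, last_year)]


if __name__ == "__main__":
    # Illustrative treatment date only; not taken from the cohort.
    for due in follow_up_dates(date(2016, 1, 31))[:4]:
        print(due.isoformat())
```

Under these assumptions a patient followed for the full 10 years has 19 scheduled contacts with a 3-month early interval and 17 with a 4-month interval; unscheduled contacts triggered by clinical findings are not modeled.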

QUALITY ASSURANCE AND CONTROL
First, after full discussion, pre-extraction and revision, a concise and standardized case report form (CRF) and a brief guideline for extracting data from the medical records were developed. Baseline information was abstracted from two systems: an electronic medical record (EMR) system and the conventional paper medical records kept in the hospital medical documents room, which helped to ensure the integrity of the data. Data were read and entered into the electronic CRF independently by two trained gynecologists, nurses or postgraduates in obstetrics and gynecology. Subsequently, we checked the consistency of the data entered for the same patient by different researchers and made every effort to correct any questionable data. Meanwhile, the information in the database was randomly examined by researchers designated by the study group. In this way, the accuracy and completeness of the input information were ensured, although this required significant labor and time. Finally, the database was locked for editing to prevent the modification or destruction of the finalized data.
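The consistency check between two independent data entries can be sketched as a field-by-field comparison keyed on the patient's unique number. This is an illustrative sketch only; the function name, the record layout and the field names in the usage example are hypothetical and do not describe the study's actual CRF or software.

```python
def compare_double_entry(first_pass: dict, second_pass: dict) -> dict:
    """Map each patient ID to the fields on which the two entries disagree.

    Both arguments map patient ID -> {field name: value}. A patient present in
    only one pass is reported under the pseudo-field "record_present".
    """
    discrepancies = {}
    for patient_id in sorted(set(first_pass) | set(second_pass)):
        a = first_pass.get(patient_id)
        b = second_pass.get(patient_id)
        if a is None or b is None:
            # The record was entered by only one of the two researchers.
            discrepancies[patient_id] = {"record_present": (a is not None, b is not None)}
            continue
        diffs = {
            field: (a.get(field), b.get(field))
            for field in sorted(set(a) | set(b))
            if a.get(field) != b.get(field)
        }
        if diffs:
            discrepancies[patient_id] = diffs
    return discrepancies


if __name__ == "__main__":
    # Hypothetical records; IDs, fields and values are invented for illustration.
    entry_1 = {"P0001": {"figo_stage": "IB1", "age_at_diagnosis": 46},
               "P0002": {"figo_stage": "IIA1", "age_at_diagnosis": 52}}
    entry_2 = {"P0001": {"figo_stage": "IB1", "age_at_diagnosis": 64},
               "P0003": {"figo_stage": "IA2", "age_at_diagnosis": 39}}
    for patient_id, fields in compare_double_entry(entry_1, entry_2).items():
        print(patient_id, fields)
```

Each flagged field would then be resolved against the source medical record rather than by choosing one entry automatically.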

Components: Measurements
Baseline data
  Sociodemographic characteristics: date of birth, age at first diagnosis, ethnicity, region, province, city, city scale, education level, occupation, residence, marital status, age of marriage, sexual life and family history of cervical cancer
  Menstrual history and reproductive history: age of menarche, pregnancy history, age of childbearing, parity, delivery way and hormone replacement therapy history
  Type of medical institution: general hospital, cancer center, women and children's center
  Related symptoms: anemia, leukorrhagia, irregular vaginal bleeding, contact bleeding, a foul-smelling watery or sometimes bloody vaginal discharge, lower extremity edema, fever, oliguria or osphyalgia
  Physical examination: bimanual pelvic examination, colposcopy, biopsy, height, weight, resting blood pressure, temperature, pulse, heart rate, the administration of 12-lead electrocardiography (ECG), auscultation of heart and lung, hearing acuity, regular examinations of otorhinolaryngology, heart and blood vessel examination, respiratory system examination, nervous system examination and abdominal viscera examination, limb and joint movements, liver function
  Laboratory testing: ThinPrep cytologic test (TCT), human papillomavirus testing (HPV testing), pathology report

Components: Measurements
Surgical treatment
  Surgical approaches: abdominal, vaginal, laparoscopic or robot-assisted
  Surgical procedures: cone/loop resection, resection margins, presence or absence of positive resection margins, trachelectomy, type of hysterectomy, nerve-sparing radical surgery, presence or absence of ovaries and fallopian tubes, lymphadenectomy, LN dissection, presence or absence of vaginal cuff, and presence or absence of parametria
  Other information: preoperative care, preoperative workup, pretreatment surgical, anesthesia, surgical margins, surgical interventions, blood transfusion, intraoperative blood loss, operation period, postoperative care, postoperative complications, postoperative recurrence, the time of postoperative first exhaust and defecation, the time of postoperative catheter removal, postoperative residual urine, residual disease, residual tumor or reoperation
Radiotherapy: type of radiation source, radiation approach, area exposed, site exposed, unit dose, total dose, total number of segments and total treatment time as well as the effectiveness of radiotherapy
Chemotherapy: drug, dose, course, interval and route
LN, lymph node.

Notably, all data were backed up at every stage of database development to prevent accidental loss of recorded data.

WHAT HAS IT FOUND? KEY FINDINGS AND PUBLICATIONS
Although the use of laparoscopic surgery and robot-assisted hysterectomy for cervical cancer has increased significantly since 2004, little is known about their real effect in women in China with FIGO stage IA-IIB cervical cancer. In addition, the incidence of specific complications associated with radical hysterectomy and the influence of prognostic factors (such as uterine corpus invasion and urologic complications) on tumor outcomes remain unclear. This large hospital-based cohort has contributed several studies that address these questions using evidence from the database. Another study (37) included 13 413 participants with FIGO stage IA1 with lymphovascular space invasion (LVSI) to IIA2 cervical cancer. It showed that the 5-year disease-free survival (DFS) rate was lower in the laparoscopic radical hysterectomy (LRH) group than in the abdominal radical hysterectomy (ARH) group (HR = 1.25, 95% CI 1.11 to 1.40). Compared with ARH, LRH was not an appropriate approach for patients whose FIGO stage was IB1 or IIA1 and whose tumor size was ≥2 cm. In other words, LRH or robot-assisted radical hysterectomy (RRH) could lead to worse oncological outcomes than ARH for patients with cervical cancer, depending on the specific FIGO stage. These results demonstrate the need for greater caution in the choice of surgical approach.

Uterine Corpus Invasion Should be Taken Seriously
Diagnosis of uterine corpus invasion is frequently missed. According to a retrospective review of original pathology specimens (38), among 1414 patients with FIGO stage IA2-IIB cervical cancer from 11 medical institutions in mainland China, uterine corpus invasion was missed in 38 patients and misdiagnosed in 20 patients. We found that myometrial invasion ≥50% appeared to be an independent prognostic factor for decreased 5-year OS (HR = 2.74, 95% CI 1.81 to 4.13) and 5-year DFS (HR = 2.31, 95% CI 1.59 to 3.35), whereas myometrial invasion <50% or endometrial invasion was not associated with outcomes of cervical cancer.
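Readers who want to check the internal consistency of a reported hazard ratio can recover the standard error of the log hazard ratio from its confidence interval. The sketch below assumes that the interval is a symmetric Wald interval on the log scale, which is typical of Cox model output but is not stated in the text; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def summarize_hazard_ratio(hr: float, lower: float, upper: float, level: float = 0.95) -> dict:
    """Back-calculate log-scale quantities from a hazard ratio and its confidence interval.

    Assumes a Wald interval: log(HR) +/- z_crit * SE, symmetric on the log scale.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)        # about 1.96 for a 95% interval
    se_log_hr = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z_stat = math.log(hr) / se_log_hr
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))      # two-sided normal approximation
    implied_hr = math.sqrt(lower * upper)                  # geometric midpoint of the interval
    return {
        "se_log_hr": se_log_hr,
        "z_stat": z_stat,
        "p_value": p_value,
        "implied_hr": implied_hr,
    }


if __name__ == "__main__":
    # Values reported above for 5-year OS with myometrial invasion >= 50%.
    summary = summarize_hazard_ratio(2.74, 1.81, 4.13)
    for name, value in summary.items():
        print(f"{name}: {value:.4f}")
```

If the geometric midpoint of the interval is close to the reported point estimate, as it is here after rounding, the interval is consistent with the Wald assumption; a large gap would suggest a transcription error or a different interval method.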
Between 2015 and 2020, the Four-C Study gave rise to 47 articles published in peer-reviewed journals. A list of publications will be available on the website currently under construction.

FUTURE PLANS?
We will further assess the outcomes of different management strategies for cervical cancer at specific clinical stages. We will also continue to evaluate the influence of various prognostic factors on oncological outcomes to guide treatment options, care, and follow-up.
In addition, we will assess the natural history of cervical cancer as fully as possible, including biologic onset, subclinical stage, clinical stage, and outcome, through a comprehensive analysis of the detailed information on patients who have not received any treatment or intervention. This will enable us to clarify the yield of different treatments compared with no treatment. Exploring the natural history of cervical cancer will also contribute to promoting early diagnosis and prevention.
We are building a cervical cancer imaging database (33,39) to collect the DICOM files of computed tomography (CT)/magnetic resonance imaging (MRI) examinations performed before treatment of cervical cancer. As of September 10, 2020, CT data of 3042 patients and MRI data of 2843 patients with cervical cancer have been collected, among whom 670 patients have both CT and MRI data. We have manually labeled the tumor boundaries and drawn the contours of the abdominal and pelvic lymph nodes on the collected original images. Moreover, we are collecting original pathological samples from patients with cervical cancer. Based on these resources, we may conduct extensive research on imaging omics and pathology omics of cervical cancer using AI, ML and advanced biological technology.
Using digital medicine techniques in obstetrics and gynecology, our team has reconstructed a three-dimensional model of the female abdominal and pelvic structures based on the computed tomography angiography (CTA) and MRI data sets of participants. We have also constructed a digital diagnosis and treatment platform for gynecological and obstetrical diseases. More importantly, we have applied digital medicine technology to the preoperative diagnosis, intraoperative guidance, surgical navigation, digital delivery and postoperative evaluation of obstetrical and gynecological diseases.
Finally, our team has committed resources to diagnosing difficult and complex diseases using three-dimensional modeling, with a focus on the source of the arterial blood supply. In clinical practice, this approach has allowed pelvic masses of unknown origin to be identified, which has improved their diagnosis rate.

WHAT ARE THE MAIN STRENGTHS AND WEAKNESSES?
This hospital-based cohort has several notable strengths. Foremost, it is one of the largest cervical cancer clinical databases to have been established and is likely to be the largest cervical cancer imaging and pathology database worldwide. The large sample size allows us to carry out studies restricted to special-interest patient groups (e.g., patients with FIGO stage IB1 to IIA2 cervical cancer) with sufficient statistical power. This resource may also open the door to digital medical research on cervical cancer. Second, the cohort included not only cervical cancer patients who were treated with different measures, but also more than 1400 cervical cancer patients who did not receive any treatment. This provides a unique opportunity to clarify the progression of cervical cancer and explore the real benefit of different therapeutic methods. Third, we have recruited participants in 26 cities of 14 provinces across mainland China and created a network of 47 sites that include different types and levels of hospitals. All patients (both outpatients and inpatients) attending the 47 sites were enrolled into the cohort. This makes it possible to roughly describe the burden of cervical cancer at different times and in different regions across mainland China, which could potentially inform prevention strategies. Fourth, detailed information is available on socioeconomic characteristics, imaging, pathology, comorbidities, patient management, complications, and other variables, enabling us to conduct comprehensive statistical analyses. Despite difficult working conditions, we have maintained inclusion and follow-up in this large cohort for over 6 years.
One weakness of this project is that some information on clinical outcomes and prognoses relies on inpatient medical records, readmission records or outpatient records. As a result, some complications and other related events may be underreported and underestimated. In addition, although all laboratory staff strictly followed the procedure for each test, the results of laboratory tests may be affected by the equipment used in different hospitals, and it is not realistic to have all 47 hospitals use the same equipment. Finally, several variables, such as those indicating economic status, lifestyle behaviors (smoking status, alcohol consumption, physical activity, etc.), quality of life and utilization of medical resources, are not currently collected in the study because they may not be included in routine clinical practice or in the medical records from which we obtained the information. We will endeavor to supplement this information during future follow-up, and we plan to collect these variables for every newly enrolled patient with cervical cancer from July 2021.

WHERE CAN I FIND OUT MORE? CAN I GET HOLD OF THE DATA?
The databases remain the property of all participating centers and will continue to be managed by the Four-C Study group. Although the databases are not yet freely available in the public domain, potential collaborators are welcome to contact the corresponding authors, Chun-Lin Chen (e-mail: ccl1@smu.edu.cn) or Chen Mao (e-mail: maochen9@smu.edu.cn). Anyone can submit at least one specific study proposal describing the background, objectives and methods of the study. The databases can be partially transmitted to successful applicants with adequate statistical expertise. Otherwise, the members of the Four-C Study group will analyze the data in cooperation with the applicants. We are also building a public website to show the names of all variables and their detailed descriptions. Once it is available, anyone will be able to browse the website to learn about our project, and users interested in obtaining and using the data will be able to fill out a Data Use Agreement form downloaded from the website to apply for access.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding authors.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the institutional ethics committee of Nanfang Hospital. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.