Nomogram Model for Prediction of SARS-CoV-2 Breakthrough Infection in Fujian: A Case–Control Real-World Study

SARS-CoV-2 breakthrough infections have been reported worldwide as vaccine efficacy has declined against emerging variants. However, an accurate model to predict SARS-CoV-2 breakthrough infection is still lacking. This retrospective study included 6,189 vaccinated individuals from Xiamen and Putian cities, Fujian province of China, during the Delta variant outbreak in September 2021. They comprised SARS-CoV-2 test-positive cases (n = 219) and test-negative controls (n = 5,970). The vaccinated individuals were randomly split into a training (70%) cohort and a validation (30%) cohort. In the training cohort, a visualized nomogram was built based on stepwise multivariate logistic regression. The area under the curve (AUC) of the nomogram was 0.819 (95% CI, 0.780–0.858) in the training cohort and 0.838 (95% CI, 0.778–0.897) in the validation cohort. The calibration curves for the probability of SARS-CoV-2 breakthrough infection showed good agreement between nomogram predictions and actual observations. Decision curves indicated that the nomogram conferred a high clinical net benefit. In conclusion, a nomogram model for predicting SARS-CoV-2 breakthrough infection in a real-world setting was successfully constructed, which will be helpful in the management of SARS-CoV-2 breakthrough infection.


INTRODUCTION
Two inactivated vaccines (Sinovac and Sinopharm), widely used in mainland China, are highly effective in preventing COVID-19, hospitalization, intensive care unit admission, and COVID-19-related death (Jara et al., 2021). However, as SARS-CoV-2 variants emerge globally, vaccine effectiveness may decline (Brosh-Nissimov et al., 2021). In a real-world study in Guangzhou City, China, in May 2021, the effectiveness of two-dose vaccination against the Delta variant was 59.0% (Li et al., 2021). A growing number of SARS-CoV-2 breakthrough infections have been reported around the world (Rana et al., 2021; Brown et al., 2021; He et al., 2021; Karim and Karim, 2021). This highlights the need to identify the main risk factors for breakthrough infection in vaccinated individuals.
Evidence from randomized controlled trials of mRNA-1273 (Gilbert et al., 2021) and ChAdOx1 (Feng et al., 2021) vaccination indicated a higher risk of SARS-CoV-2 breakthrough infection among persons with lower neutralizing, spike, or receptor-binding domain (RBD) antibody titers in the early breakthrough infection period. Real-world data provide two further examples:
- Among fully vaccinated healthcare workers, breakthrough infection was highly correlated with neutralizing antibody titers during the peri-infection period (Bergwerk et al., 2021).
- Among vaccinated patients receiving dialysis, low levels of circulating RBD antibody were associated with a higher risk of breakthrough infection (Anand et al., 2021).
Meanwhile, partially vaccinated individuals without obesity (body mass index [BMI] < 30 kg/m²) had lower odds of SARS-CoV-2 breakthrough infection (Antonelli et al., 2021). In fully vaccinated individuals, a history of contact with a confirmed case and the presence of symptoms were major risk factors (Alishaq et al., 2021). These studies suggest that risk factors for SARS-CoV-2 breakthrough infection may vary with vaccine brand, vaccine type, demographics, and underlying conditions. More studies are therefore needed to identify risk factors for SARS-CoV-2 breakthrough infection, and models built on those risk factors could help predict it.
During the first SARS-CoV-2 Delta variant outbreak in Fujian province, mainland China, 10,961 close contacts were identified as linked to the first patient, whose infection was possibly transmitted from an imported case. Of these contacts, 471 (4.30%) were COVID-19 cases. In total, 8,345 (76.13%) of the close contacts had been vaccinated, mostly with the Sinovac or Sinopharm vaccine:
- 891 (8.12%) had received only the first dose (partially vaccinated).
- 7,454 (68.00%) had received the second dose (fully vaccinated).
Of these, 6,189 close contacts (partially or fully vaccinated) had complete SARS-CoV-2 RT-PCR results and complete vaccination information for their index cases. This real-world setting offers an opportunity to identify risk factors and establish a reliable model for predicting breakthrough infection with the Delta variant. Such a model would help identify vaccinated individuals at high risk of breakthrough infection. It could also guide measures to strengthen their protection, either through continued social distancing or through additional active or passive immunization.

Study Design and Population
Experts at the Fujian Provincial Center for Disease Control and Prevention (FJCDC) collected the following data on 10,963 close contacts and index cases:
- demographic characteristics (age, sex, occupation, etc.);
- SARS-CoV-2 vaccination (date of vaccination and manufacturer);
- epidemiological characteristics (exposure date, relationship with other cases and contacts);
- clinical characteristics (symptoms, severity classification, date of onset, and SARS-CoV-2 RT-PCR assay results).
A SARS-CoV-2 breakthrough infection case was confirmed by positive results on repeated RT-PCR assays of nasal and pharyngeal swab specimens.
In this study, the vaccination time was defined as the time since the first or the second dose. For vaccinated individuals with SARS-CoV-2 breakthrough infection, it was calculated from the vaccination date to the date of the positive RT-PCR result. For vaccinated individuals without breakthrough infection, it was calculated from the vaccination date to September 15, 2021, when the Delta variant outbreak in Fujian peaked. Unvaccinated individuals were assigned the vaccine brand "unvaccinated" and a vaccination time of zero.
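The vaccination-time rule above can be sketched in a few lines of Python. The authors worked in R, so this is our illustrative re-expression, not their code. The function name `vaccination_time` and the use of `None` for a missing dose or a negative test are our assumptions. The reference date of September 15, 2021 comes from the text.

```python
from datetime import date
from typing import Optional

# Reference end date for individuals without breakthrough infection:
# the reported peak of the Delta outbreak in Fujian.
OUTBREAK_PEAK = date(2021, 9, 15)


def vaccination_time(dose_date: Optional[date],
                     positive_test_date: Optional[date]) -> int:
    """Days elapsed since a dose (hypothetical helper, for illustration).

    - No dose recorded -> 0 (treated as unvaccinated for that dose).
    - Breakthrough case -> days from the dose to the positive RT-PCR date.
    - No breakthrough   -> days from the dose to the outbreak peak.
    """
    if dose_date is None:
        return 0
    end = positive_test_date if positive_test_date is not None else OUTBREAK_PEAK
    return (end - dose_date).days


print(vaccination_time(date(2021, 7, 1), date(2021, 9, 12)))  # 73
print(vaccination_time(date(2021, 7, 1), None))               # 76
```

Measuring every non-case up to a single fixed date keeps the definition simple. The trade-off is that non-cases whose exposure ended earlier are credited with a slightly longer interval.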
Protection against SARS-CoV-2 infection takes about 2 weeks to develop after a dose. A dose therefore counted toward partial vaccination (first dose) or full vaccination (second dose) only if more than 14 days had elapsed since it was given. If not, the participant was deemed unvaccinated despite having received the first dose. Likewise, a participant whose second dose was given 14 days ago or less was deemed to have received only the first dose.
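A minimal sketch of the 14-day rule follows, assuming a date is available for each dose received. The names `effective_status` and `index_date` and the status strings are hypothetical. `index_date` stands for the end date used for vaccination time: the positive-test date for cases and September 15, 2021 otherwise.

```python
from datetime import date
from typing import Optional

PROTECTION_LAG_DAYS = 14  # a dose counts only after MORE than 14 days


def effective_status(first_dose: Optional[date],
                     second_dose: Optional[date],
                     index_date: date) -> str:
    """Vaccination status on index_date under the 14-day rule (illustrative)."""
    def counts(dose: Optional[date]) -> bool:
        return dose is not None and (index_date - dose).days > PROTECTION_LAG_DAYS

    if counts(second_dose):
        return "fully vaccinated"
    if counts(first_dose):       # second dose missing or too recent
        return "partially vaccinated"
    return "unvaccinated"        # first dose missing or too recent


peak = date(2021, 9, 15)
print(effective_status(date(2021, 7, 1), date(2021, 9, 5), peak))  # partially vaccinated
```

The rule demotes a recent dose by one level rather than discarding the person. That matches the two "otherwise" cases described in the paragraph above.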
In total, 6,189 vaccinated individuals were recruited, all with complete SARS-CoV-2 RT-PCR results and complete vaccination information for their index cases. Of these, 219 (3.54%) had SARS-CoV-2 breakthrough infection and 5,970 (96.46%) did not. The recruited individuals were randomly assigned to a training (70%) cohort and a validation (30%) cohort to construct and evaluate the logistic regression model (Figure 1).
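The random 70/30 split can be sketched as a simple unstratified shuffle, matching the text's description of random grouping. The function name and the seed (2021) are arbitrary choices of ours. The split in the study itself was done in R, with a seed and method that are not reported.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(items: Sequence[T], train_frac: float = 0.7,
                 seed: int = 2021) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of items and cut it into training/validation lists."""
    rng = random.Random(seed)  # fixed seed -> reproducible split
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# 6,189 participant IDs (the first 219 standing in for the cases).
ids = list(range(6189))
train, valid = random_split(ids)
print(len(train), len(valid))  # 4332 1857
```

Because the split is unstratified, the number of breakthrough cases in each cohort depends on the seed. Stratifying on case status is a common alternative when events are rare, but the text does not say whether it was used.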
The study was approved by the Joint Prevention and Control Mechanism of the State Council.

Nomogram Construction
In the training cohort, univariate logistic regression models and a multivariate logistic regression model with stepwise variable selection were constructed. The multivariate model was then converted into a visualized nomogram using the rms package in R. The nomogram was evaluated with three tools in both the training and validation cohorts:
- Receiver operating characteristic (ROC) curves and the area under the curve (AUC) assessed discriminative ability. The optimal cutoff value was determined by maximizing Youden's index, and performance at this cutoff was assessed by sensitivity and specificity.
- Calibration curves compared predicted probabilities with observed outcomes.
- Decision curve analysis (DCA) calculated the net benefit across a range of threshold probabilities to evaluate the clinical utility of the nomogram (Vickers and Elkin, 2006; Pan et al., 2020).
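For readers unfamiliar with these metrics, the sketch below computes the AUC through its Mann–Whitney interpretation. It then picks the cutoff that maximizes Youden's index J = sensitivity + specificity − 1. This is a plain-Python illustration on made-up toy scores, not the R workflow used in the study. Treating a score at or above the cutoff as positive is our assumption.

```python
from typing import Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random case > score of a random control).

    Mann-Whitney formulation; tied case/control pairs count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores: Sequence[float],
                  labels: Sequence[int]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A score >= cutoff is classified as predicted breakthrough infection.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best, best_j = (float("nan"), 0.0, 0.0), -1.0
    for c in sorted(set(scores)):  # every observed score is a candidate
        tp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < c and y == 0)
        sens, spec = tp / n_pos, tn / n_neg
        if sens + spec - 1 > best_j:
            best, best_j = (c, sens, spec), sens + spec - 1
    return best


# Toy predicted probabilities (hypothetical, not study data).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1, 0.4, 0.05]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
print(round(auc(scores, labels), 3))  # 0.867
print(youden_cutoff(scores, labels))
```

The pairwise loop is quadratic, which is fine for illustration. Production code would sort once or call an established ROC routine instead.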

Statistical Analysis
Baseline continuous variables are expressed as means (SDs), and categorical variables as frequencies (%). Differences between the breakthrough infection group and the non-breakthrough infection group were tested with the Wilcoxon rank-sum test for continuous variables and the chi-squared test for categorical variables. Correlations between variables were measured using Spearman's correlation coefficient.
All p-values were two-sided, and p < 0.05 was considered statistically significant. All statistical analyses, modeling, and plotting were performed with R (version 4.1.0; http://www.r-project.org).
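As a worked check of the chi-squared test, the sketch below applies Pearson's test with Yates' continuity correction to the sex comparison in Table 1. Yates' correction is the default for 2×2 tables in R's `chisq.test`. The counts are our assumption: roughly 130 of 219 cases and 3,039 of 5,970 controls were women, back-calculated from the reported 59.4% and 50.9%. The text gives only the percentages, and we cannot confirm the exact test options the authors used.

```python
import math


def chi2_yates_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Yates-corrected Pearson chi-squared test for the 2x2 table
    [[a, b], [c, d]]; returns (statistic, two-sided p-value)."""
    n = a + b + c + d
    # Continuity correction: shrink |ad - bc| by n/2 (floored at zero).
    num = n * max(0.0, abs(a * d - b * c) - n / 2) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    stat = num / den
    # Survival function of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Rows: breakthrough cases, non-cases; columns: women, men.
# ASSUMED counts, back-calculated from 59.4% of 219 and 50.9% of 5,970.
stat, p = chi2_yates_2x2(130, 89, 3039, 2931)
print(round(stat, 2), round(p, 3))  # 5.71 0.017
```

With these assumed counts, the p-value comes out at about 0.017, consistent with the value reported for sex.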

RESULTS
Patient Characteristics
The demographic characteristics of the 6,189 partially or fully vaccinated individuals with or without SARS-CoV-2 breakthrough infection are shown in Table 1. The 219 vaccinated individuals with breakthrough infection were older than those without (41.52 ± 13.03 vs. 36.43 ± 14.14 years, p < 0.001). The breakthrough infection group also had a higher proportion of women (59.4% vs. 50.9%, p = 0.017). Four vaccination variables also differed significantly between individuals with and without breakthrough infection: first-dose brand, first-dose time, second-dose brand, and second-dose time. These differences held both for the vaccinated individuals themselves and for their index cases, suggesting that the effectiveness of different vaccination regimens may vary.
To construct and evaluate the logistic regression model, the recruited vaccinated individuals were randomly split into a training (70%) cohort and a validation (30%) cohort. As shown in Supplementary Table 1, no variable differed significantly between the training and validation cohorts, indicating that the random split produced balanced cohorts. As shown in Supplementary Figure 1, ORF1ab gene Ct values were highly correlated with N gene Ct values in index cases (R = 0.958, p < 0.001). To avoid collinearity and overfitting in the logistic regression model, only ORF1ab gene Ct values were included in further analyses. … (Figure 3).
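The collinearity screen above relies on Spearman's correlation, as listed under Statistical Analysis. The plain-Python sketch below uses six hypothetical index cases (not study data). It assigns ranks, averaging them across ties, and then takes the Pearson correlation of those ranks. The 0.9 cutoff for dropping a predictor is our illustrative assumption; the text reports only that R = 0.958 was judged too high.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the (average) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical Ct values for six index cases.
orf1ab = [18.2, 22.5, 25.1, 30.4, 33.0, 27.8]
n_gene = [17.9, 21.0, 26.3, 29.8, 32.5, 26.1]
rho = spearman(orf1ab, n_gene)
if abs(rho) > 0.9:  # hypothetical collinearity threshold
    print(f"rho = {rho:.3f}: keep only ORF1ab Ct")  # rho = 0.943: keep only ORF1ab Ct
```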
To evaluate the clinical applicability of the prediction nomogram, calibration curves and DCA were used. The calibration plots for the probability of SARS-CoV-2 breakthrough infection (Figures 4A, B) showed good agreement between nomogram predictions and actual observations in the training and validation cohorts, respectively, indicating good calibration. DCA is a novel method for evaluating diagnostic and prognostic prediction models that has some advantages over the AUC (Vickers and Elkin, 2006). The decision curves of the nomogram showed a superior overall net benefit across a wide and practical range of threshold probabilities. This indicates good clinical utility for predicting SARS-CoV-2 breakthrough infection in vaccinated individuals (Figures 4C, D).
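Both checks can be sketched in plain Python. Net benefit follows the standard decision-curve definition of Vickers and Elkin: NB = TP/n − FP/n × pt/(1 − pt) at threshold probability pt. It is compared with a treat-all strategy (treat-none is always 0). The calibration helper is a simple grouped version (mean predicted risk vs. observed rate per risk group). The paper does not say how its calibration curves were smoothed or corrected, so this is not a reconstruction of them. All data are toy values.

```python
from typing import List, Sequence, Tuple


def net_benefit(probs: Sequence[float], labels: Sequence[int],
                threshold: float) -> float:
    """Net benefit of acting on everyone with predicted prob >= threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


def treat_all_benefit(labels: Sequence[int], threshold: float) -> float:
    """Reference curve for treating everyone (treat-none is always 0)."""
    prev = sum(labels) / len(labels)
    return prev - (1 - prev) * threshold / (1 - threshold)


def grouped_calibration(probs: Sequence[float], labels: Sequence[int],
                        n_groups: int = 2) -> List[Tuple[float, float]]:
    """Sort by predicted risk, split into equal-size groups (the last group
    takes any remainder), and return (mean predicted, observed rate)."""
    pairs = sorted(zip(probs, labels))
    size = len(pairs) // n_groups
    out = []
    for g in range(n_groups):
        end = (g + 1) * size if g < n_groups - 1 else len(pairs)
        chunk = pairs[g * size:end]
        out.append((sum(p for p, _ in chunk) / len(chunk),
                    sum(y for _, y in chunk) / len(chunk)))
    return out


# Toy predictions and outcomes (hypothetical, not study data).
probs = [0.02, 0.03, 0.05, 0.10, 0.30, 0.60, 0.04, 0.08, 0.25, 0.50]
labels = [0, 0, 0, 0, 1, 1, 0, 0, 0, 1]
for t in (0.1, 0.2):
    print(t, round(net_benefit(probs, labels, t), 3),
          round(treat_all_benefit(labels, t), 3))
print(grouped_calibration(probs, labels))
```

A model is useful at a given threshold when its curve lies above both the treat-all and treat-none reference curves.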

DISCUSSION
The ongoing COVID-19 pandemic has infected hundreds of millions of people worldwide. SARS-CoV-2 vaccines, which have proved relatively safe in clinical trials, are currently the best defense against COVID-19 (Baden et al., 2021; Liu et al., 2021).
In this retrospective real-world study, we identified risk factors associated with SARS-CoV-2 breakthrough infection in vaccinated individuals exposed to the Delta variant. We then constructed a nomogram model based on multivariate logistic regression to predict such breakthrough infection.
In the nomogram model, vaccinated individuals were more likely to experience breakthrough infection if they were female, partially vaccinated, or fully vaccinated with a second-dose time of 60–120 days. The probability of breakthrough infection was also high when vaccinated individuals had close contact with index cases in any of the following groups:
- unvaccinated (first-dose brand: unvaccinated);
- partially vaccinated, with a first-dose time of 60–120 days;
- fully vaccinated with Sinovac as the second-dose brand and a second-dose time of <60 days.
The ORF1ab Ct value of the index case contributed greatly to the nomogram: the lower the ORF1ab Ct value (i.e., the higher the viral load), the higher the probability of breakthrough infection. Interestingly, the odds ratio was much higher when the index case was 30–40 years old (OR 3.170, p = 0.001). This may be because index cases mainly came from local shoe and clothing factories, where most workers were young. These findings suggest that booster vaccination is needed for vaccinated individuals. They also suggest that unvaccinated index cases with lower ORF1ab Ct values are likely to cause more secondary infections.
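The toy sketch below shows how a nomogram turns logistic-regression coefficients into points and probabilities. Only the odds ratio of 3.170 for index cases aged 30–40 years comes from the text. The intercept, the other two coefficients, and the 0–100 point scaling are hypothetical. The scaling follows the usual convention that the strongest predictor spans 100 points, which we believe is the rms::nomogram default but have not verified against the authors' model. The final check shows that switching that one predictor on adds its full 100 points and multiplies the odds by exp(β) = 3.170.

```python
import math
from typing import Dict, Tuple

# HYPOTHETICAL model: only the index-case age 30-40 odds ratio (3.170) is
# from the text; the intercept and the other coefficients are invented.
# All predictors are coded 0/1.
INTERCEPT = -4.0
BETAS: Dict[str, float] = {
    "index_age_30_40": math.log(3.170),  # log of the reported odds ratio
    "female": 0.4,                       # hypothetical
    "partially_vaccinated": 0.8,         # hypothetical
}

# Strongest predictor spans 0-100 points (assumes all coefficients > 0).
POINTS_PER_LOGIT = 100 / max(BETAS.values())


def score(profile: Dict[str, int]) -> Tuple[float, float]:
    """Return (total points, predicted probability) for a 0/1 profile."""
    lp = INTERCEPT + sum(BETAS[k] * x for k, x in profile.items())
    points = sum(BETAS[k] * POINTS_PER_LOGIT * x for k, x in profile.items())
    prob = 1 / (1 + math.exp(-lp))  # inverse logit of the linear predictor
    return points, prob


base = {"index_age_30_40": 0, "female": 1, "partially_vaccinated": 0}
exposed = dict(base, index_age_30_40=1)
(p0, r0), (p1, r1) = score(base), score(exposed)
odds_ratio = (r1 / (1 - r1)) / (r0 / (1 - r0))
print(round(p1 - p0), round(odds_ratio, 3))  # 100 3.17
```

Points are linear in the log-odds, so the total-points axis of a nomogram maps one-to-one onto predicted probability through the inverse logit.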
The nomogram performed well in both the training and validation cohorts (training vs. validation):
- AUC: 0.819 vs. 0.838
- sensitivity: 0.801 vs. 0.794
- specificity: 0.712 vs. 0.792
- negative predictive value: 0.989 vs. 0.990
- accuracy: 0.715 vs. 0.792
- recall: 0.801 vs. 0.794
All of these were reasonably high. However, the positive predictive value (0.094 vs. 0.118) was poor, which may be due to the low rate of breakthrough infection in this cohort (3.54%, 219/6,189). Despite this, the high AUC, negative predictive value, and recall indicate that the nomogram captures most vaccinated individuals who develop breakthrough infection. They also indicate that it reliably rules out those at low risk.
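The link between low prevalence and low positive predictive value can be checked with Bayes' theorem. The sketch below plugs the reported training-cohort sensitivity and specificity into the standard formulas. It uses the overall breakthrough rate (219/6,189) to approximate training-cohort prevalence, which the text does not report separately. The result is a PPV of about 0.093 and an accuracy of about 0.715, close to the reported 0.094 and 0.715.

```python
def ppv(sens: float, spec: float, prev: float) -> float:
    """Positive predictive value via Bayes' theorem."""
    tp = sens * prev              # expected true-positive fraction
    fp = (1 - spec) * (1 - prev)  # expected false-positive fraction
    return tp / (tp + fp)


def npv(sens: float, spec: float, prev: float) -> float:
    """Negative predictive value via Bayes' theorem."""
    tn = spec * (1 - prev)
    fn = (1 - sens) * prev
    return tn / (tn + fn)


def accuracy(sens: float, spec: float, prev: float) -> float:
    """Overall accuracy: prevalence-weighted mix of sens and spec."""
    return sens * prev + spec * (1 - prev)


prev = 219 / 6189          # overall breakthrough rate, about 3.54%
sens, spec = 0.801, 0.712  # training-cohort values reported in the text
print(round(ppv(sens, spec, prev), 3), round(accuracy(sens, spec, prev), 3))  # 0.093 0.715
# Same sensitivity and specificity at a hypothetical 20% prevalence:
print(round(ppv(sens, spec, 0.20), 3))  # 0.41
```

At 3.54% prevalence, false positives from the large control group outnumber true positives roughly ten to one, whatever the model's discrimination.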
Unlike predictors such as neutralizing antibody titers in the peri-infection period (Bergwerk et al., 2021), the risk factors in our nomogram can be retrieved readily at any time. This makes the model more applicable in real-world settings. The nomogram is also easy to understand and interpret. Overall, it performed well in terms of discrimination, calibration, and clinical utility in identifying vaccinated individuals at high risk of SARS-CoV-2 breakthrough infection. To our knowledge, this is the first report of a quantitative nomogram model for predicting SARS-CoV-2 … (Wang and Powell, 2021; Chen et al., 2022), our nomogram is based on inactivated vaccines against the Delta variant in a real-world setting and may provide an alternative strategy for predicting Omicron breakthrough infection.
FIGURE 3 | ROC curves of the nomogram for predicting SARS-CoV-2 breakthrough infection among vaccinated individuals. Training cohort: AUC, 0.819; 95% CI, 0.780–0.858; sensitivity, 0.801; specificity, 0.712 at the optimal cutoff of 0.027. Validation cohort: AUC, 0.838; 95% CI, 0.778–0.897; sensitivity, 0.794; specificity, 0.792 at the optimal cutoff of 0.041. AUC, area under the receiver operating characteristic (ROC) curve.
In conclusion, we successfully constructed a nomogram model for predicting SARS-CoV-2 breakthrough infection based on the real-world setting, which will be helpful in the management of SARS-CoV-2 breakthrough infection.

LIMITATIONS
Our study has several limitations. First, although 6,189 vaccinated individuals were recruited, the number of breakthrough infections (219) was relatively small. Second, most participants were young, and the vaccinated individuals with breakthrough infection were mainly asymptomatic or had mild disease. Symptoms such as fever, cough, sputum, nasal congestion, or diarrhea may indicate COVID-19 or accelerate the spread of SARS-CoV-2. However, for this reason, symptoms in vaccinated individuals and index cases were not considered as risk factors in this study. Third, some reported risk factors, such as neutralizing antibody titers in the peri-infection period and BMI, were also excluded because data were missing for most participants. Future work should include more risk factors and more participants to improve the performance of the nomogram model, and multicenter data are needed to validate the model.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be available on reasonable request, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Joint Prevention and Control Mechanism of the State Council. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.