How to Synchronize Longitudinal Patient Data With the Underlying Disease Progression: A Pilot Study Using the Biomarker CRP for Timing COVID-19

The continued digitalization of medicine has led to an increased availability of longitudinal patient data that allows the investigation of novel and known diseases in unprecedented detail. However, to accurately describe any underlying pathophysiology and allow interpatient comparisons, individual patient trajectories have to be synchronized based on temporal markers. In this pilot study, we use longitudinal data from critically ill COVID-19 patients in the ICU to compare the commonly used alignment markers “onset of symptoms,” “hospital admission,” and “ICU admission” with a novel objective method based on the peak value of the inflammatory marker C-reactive protein (CRP). By applying our CRP-based method to align the progression of neutrophils and lymphocytes, we were able to define a pathophysiological window that improved mortality risk stratification in our COVID-19 patient cohort. Our data highlights that proper synchronization of longitudinal patient data is crucial for accurate interpatient comparisons and the definition of relevant subgroups. The use of objective temporal disease markers will facilitate both translational research efforts and multicenter trials.

INTRODUCTION
The rapid spread of coronavirus disease 2019 (COVID-19), caused by the SARS-CoV-2 virus, imposes a heavy burden on public health systems around the world. A substantial number of patients develop severe disease, possibly driven by endotheliitis, impaired gas diffusion, and organ ischemia (1,2). Current research focuses on identifying predictive indicators that allow closer monitoring and targeted intervention in high-risk patients. As a hyperactivated immune response might drive severe COVID-19, the neutrophil-to-lymphocyte ratio (NLR) (3), lymphocyte counts alone (4), and elevated levels of specific cytokines, among other laboratory values (3,5–7), have been proposed as markers for initial patient risk assessment and stratification. Most studies compare only measurements taken at hospital or intensive care unit (ICU) admission, neglecting the considerable potential of longitudinal data collected throughout hospitalization (3–8). This is especially limiting for the most severely ill patients, as this high-mortality group could benefit most from a finer separation into disease progression subgroups. However, pooling longitudinal data requires a temporal marker to align individual patient trajectories.
During the first wave of the coronavirus pandemic, patients were often compared based on clinical time points such as "onset of symptoms," "hospital admission," or "ICU admission" (3–8). Group comparisons based on these clinical markers might, however, describe spurious differences, for example by comparing patients in late disease stages with patients in early disease stages (8), or blur actual differences through temporal misalignment (Figures 1A,B). An ideal disease timer would be an objective biomarker that provides an early indication of disease progression and is measured routinely in most hospital settings. In this pilot study, we compare different alignment methods and show that C-reactive protein (CRP) can be used to synchronize individual patient trajectories with the underlying pathophysiology of COVID-19.

MATERIALS AND METHODS
Inclusion Criteria, Ethics Approval, and Consent to Participate
We included all COVID-19 patients aged over 18 years who were admitted to the University Hospital Zurich between March and July 2020 and required ICU treatment. We excluded patients who had objected to the further use of their medical data for research, as well as patients who were transferred from other hospitals. The data used in this manuscript was routinely collected during hospitalization. Whenever possible, we obtained written informed consent from the patients (or their relatives) for the further use of their medical data for research. This study was approved by the Cantonal Ethics Committee of Zurich.

Time Series Analysis
We used MATLAB (The MathWorks, Inc., USA) to analyze and visualize longitudinal patient data. The MATLAB function findpeaks.m was used to identify the first local CRP maximum (CRPmax). We developed custom scripts to synchronize the time series with the temporal markers.
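The peak detection and re-anchoring step can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' MATLAB scripts: it assumes one CRP value per day with no gaps, and the function names (`first_local_max`, `align_to_anchor`, `align_to_crp_max`) are hypothetical. The peak rule loosely mimics the default behaviour of MATLAB's `findpeaks` (a sample must be larger than its neighbours, flat peaks report their first sample, endpoints are never peaks) but omits options such as minimum peak prominence.

```python
from typing import List, Optional, Sequence, Tuple


def first_local_max(values: Sequence[float]) -> Optional[int]:
    """Return the index of the first local maximum, or None if there is none.

    A sample is a peak if it is strictly larger than its left neighbour and
    the next *different* value to its right is smaller. For a flat peak the
    first (lowest-index) sample is returned; endpoints are never peaks.
    """
    n = len(values)
    for i in range(1, n - 1):
        if values[i] <= values[i - 1]:
            continue  # not a rise into a peak
        j = i + 1
        while j < n and values[j] == values[i]:
            j += 1  # walk across a plateau
        if j < n and values[j] < values[i]:
            return i
    return None


def align_to_anchor(days: Sequence[int], values: Sequence[float],
                    anchor_day: int) -> List[Tuple[int, float]]:
    """Shift day stamps so that anchor_day becomes relative day 0."""
    return [(d - anchor_day, v) for d, v in zip(days, values)]


def align_to_crp_max(days: Sequence[int],
                     crp: Sequence[float]) -> Optional[List[Tuple[int, float]]]:
    """Align one patient's daily CRP series on its first local maximum."""
    idx = first_local_max(crp)
    if idx is None:
        return None  # no interior peak: cannot be aligned on CRPmax
    return align_to_anchor(days, crp, days[idx])


# Example: hospital days 0..5 with a CRP peak (mg/L) on day 2.
aligned = align_to_crp_max([0, 1, 2, 3, 4, 5], [40, 120, 310, 280, 150, 90])
print(aligned)  # [(-2, 40), (-1, 120), (0, 310), (1, 280), (2, 150), (3, 90)]
```

The same `align_to_anchor` helper covers the other markers and anchors: neutrophil or lymphocyte series can be shifted with the patient's CRPmax day, and passing the day of ICU admission as `anchor_day` yields ICU admission-aligned data. Patients without an interior CRP peak return `None` and would need separate handling.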

Predictive Modeling
For each patient, two feature vectors were generated, containing the mean values of CRP and of relative neutrophil and lymphocyte counts within a time window anchored on either ICU admission or CRPmax. We followed a stratified 5-fold cross-validation scheme, in which each fold defined a distinct 80/20% train-test split. Within each fold, hyperparameters were selected on the training set with a stratified 4-fold cross-validation.
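The feature construction and the outer stratified split can be illustrated with a short, dependency-free Python sketch. Everything in it is an illustrative assumption rather than the study's code: the marker keys (`crp`, `neutrophils_rel`, `lymphocytes_rel`), the symmetric default window bounds, the function names, and the round-robin dealing used to stratify the folds.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence, Tuple

MARKERS = ("crp", "neutrophils_rel", "lymphocytes_rel")  # hypothetical keys


def window_features(aligned: Dict[str, List[Tuple[int, float]]],
                    start: int = -1, end: int = 1) -> List[float]:
    """Mean of each marker over relative days start..end (inclusive).

    `aligned` maps a marker name to (relative_day, value) pairs that are
    already shifted to the chosen anchor (ICU admission or CRPmax).
    Markers without values in the window yield NaN.
    """
    features = []
    for marker in MARKERS:
        in_window = [v for d, v in aligned[marker] if start <= d <= end]
        features.append(mean(in_window) if in_window else float("nan"))
    return features


def stratified_folds(labels: Sequence[str], k: int = 5,
                     seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into k (train, test) pairs, stratified by label.

    Each class is shuffled and dealt round-robin over the folds, continuing
    where the previous class stopped, so each test fold holds roughly 1/k
    of every class and fold sizes differ by at most one.
    """
    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    position = 0
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for i in members:
            fold_of[i] = position % k
            position += 1
    splits = []
    for f in range(k):
        test = [i for i in range(len(labels)) if fold_of[i] == f]
        train = [i for i in range(len(labels)) if fold_of[i] != f]
        splits.append((train, test))
    return splits
```

Calling `stratified_folds` with `k=4` on the training indices of each outer fold would give an inner split for hyperparameter selection. Because the members of a class occupy consecutive dealing positions, a class with at least five members, such as the six deceased patients, appears in every one of the five test splits.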
For each fold, multiple logistic regression models were trained with varying hyperparameters: regularization type (ℓ1, ℓ2) (9), regularization strength in the interval [10⁻⁴, 10⁴], optimizer [LBFGS (10), SAGA (11)], and with or without class weighting. The best models, as determined by the F1-macro score in the 4-fold cross-validation, were then evaluated on the test split. Because our retrospective patient classification was done at ICU discharge, only values recorded before outcome classification were considered when constructing the feature vectors. This led to the exclusion of 2 patients from the spontaneously breathing subgroup.
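For reference, the F1-macro score used for model selection is the unweighted mean of the per-class F1 scores, so each severity subgroup counts equally regardless of its size. A minimal stdlib sketch follows; the function name `f1_macro` is ours, and libraries such as scikit-learn offer an equivalent through `f1_score(..., average="macro")`.

```python
from collections import Counter
from typing import Sequence


def f1_macro(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores over all observed labels.

    Precision or recall with a zero denominator is treated as 0, so a class
    that is never predicted correctly contributes an F1 of 0.
    """
    classes = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n_pred = Counter(y_pred)
    n_true = Counter(y_true)
    per_class = []
    for c in classes:
        precision = true_pos[c] / n_pred[c] if n_pred[c] else 0.0
        recall = true_pos[c] / n_true[c] if n_true[c] else 0.0
        denom = precision + recall
        per_class.append(2 * precision * recall / denom if denom else 0.0)
    return sum(per_class) / len(per_class)
```

Macro averaging matters with imbalanced subgroups: for the full cohort's subgroup sizes (6, 13, and 9 patients), a classifier that always predicts the largest subgroup reaches an accuracy of about 0.46 but an F1-macro score of only about 0.21.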

Statistical Analysis
Statistical testing was performed with R version 3.6.3. Single-time-point comparisons were performed with Tukey's post-hoc test for multiple comparisons. Patient characteristics were tested by ANOVA for normally distributed data, the Kruskal-Wallis test for non-normally distributed data, and the χ² test for binary data. A linear mixed-effects model analysis was performed for CRPmax-aligned and ICU admission-aligned data, using a likelihood ratio test for the overall analysis and the Satterthwaite approximation for the subgroup analysis.

RESULTS
A comparison of individual patient trajectories in our cohort of 28 critically ill COVID-19 patients admitted to the ICU of the University Hospital Zurich (Table 1) revealed considerable interpatient variability (Figure 1A): of the 28 patients, 8 were transferred to the ICU directly upon hospital admission, and 5 more were transferred only one day after hospital admission. Based on these data alone, it is evident that interpatient comparisons at hospital or ICU admission were biased in our ICU COVID-19 patient cohort. Likewise, the onset of symptoms varied widely (7.65 ± 8.49 days) and was occasionally missing.
To find an alternative disease timer, we compared the testing frequency of routine laboratory parameters previously correlated with COVID-19 severity (6): the acute-phase inflammatory marker CRP, the inflammatory cytokine interleukin 6 (IL-6), myoglobin, and cardiac troponin. IL-6, myoglobin, and cardiac troponin were not measured daily around ICU admission, either in our cohort (38.2, 59.7, and 62.7%, respectively) or in the international RISC-19-ICU registry cohort of critically ill COVID-19 patients (14.6, 9.6, and 30.0% in Switzerland and 15.3, 6.7, and 28.2% internationally), making them poor candidates for longitudinal data alignment (Figure 1C). In contrast, CRP was measured routinely around ICU admission both in our cohort (98.2%) and in the RISC-19-ICU registry cohort (86.9% in Switzerland, 74.8% internationally). Unlike other frequently measured laboratory values such as hematological cell counts or creatinine, CRP showed a distinct maximum around ICU admission in most patients of our cohort, indicating a correlation with COVID-19 severity and progression (Figure 1D). Some patients showed further CRP maxima during their ICU stay, probably resulting from coinfections or secondary damage (12). Aligning longitudinal data on the first local CRP maximum (CRPmax) decreased interpatient variability both in the CRP curve (Figure 1D) and in other laboratory values such as total leukocyte and relative neutrophil and lymphocyte counts, to a similar extent as the clinically based ICU admission alignment (Supplementary Figure 1).
To test whether CRPmax-based synchronization improves patient stratification in our ICU patient cohort, we retrospectively defined three severity subgroups: (1) deceased ICU patients (n = 6), (2) discharged ICU patients who had been mechanically ventilated (n = 13), and (3) discharged ICU patients who had been breathing spontaneously while in the ICU (n = 9).
CRP peak values were more than three-fold higher in the mechanically ventilated subgroup (mean ± SD, 346 ± 147 mg/L) than in the spontaneously breathing subgroup (99 ± 74 mg/L) but did not differ from those of the deceased subgroup (338 ± 106 mg/L) (Figure 1E). This lack of distinction was seen with all alignment methods. In accordance with the current literature, we further assessed the longitudinal progression of relative neutrophil and lymphocyte counts (Supplementary Figure 2) and their ratio (NLR, Figure 1E) in the three severity subgroups (3,4). While both ICU admission-based and CRPmax-based alignment improved subgroup separation, only CRPmax-based synchronization revealed a distinct NLR turning point, coinciding with CRPmax and thereby providing a window for maximal subgroup distinction. A linear mixed-effects model (13) with subgroup and time as fixed effects and per-patient random slopes as random effects confirmed differences between the subgroups and between the measured time points in a window of ±4 days around CRPmax (p < 0.01, Supplementary Table 1), whereas the time-wise difference was not detected in the data aligned on ICU admission (Supplementary Table 2). Similarly, when the subgroups were compared at single time points for each alignment method, only CRPmax-based synchronization resulted in a significant difference between the two most severe patient subgroups (Figure 1F).
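The descriptive side of this analysis, pooling NLR values per subgroup and aligned day within the ±4-day window, can be sketched in plain Python. This only computes group means for inspection or plotting; it is not the linear mixed-effects model used for the statistics, and the input layout and function names are hypothetical. It relies on a simple identity: relative neutrophil and lymphocyte counts share the total leukocyte count as denominator, so their ratio equals the ratio of absolute counts.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical layout: (subgroup, [(relative_day, neutrophils_%, lymphocytes_%), ...])
Patient = Tuple[str, List[Tuple[int, float, float]]]


def nlr(neutrophils_rel: float, lymphocytes_rel: float) -> float:
    """Neutrophil-to-lymphocyte ratio from relative (%) counts.

    Both percentages share the total leukocyte count as denominator,
    so their ratio equals the ratio of absolute counts.
    """
    return neutrophils_rel / lymphocytes_rel


def subgroup_daily_nlr(patients: List[Patient],
                       window: int = 4) -> Dict[Tuple[str, int], float]:
    """Mean NLR per (subgroup, relative day) within [-window, window]."""
    pooled = defaultdict(list)
    for group, series in patients:
        for day, neut, lymph in series:
            if -window <= day <= window and lymph > 0:  # skip zero counts
                pooled[(group, day)].append(nlr(neut, lymph))
    return {key: mean(vals) for key, vals in sorted(pooled.items())}
```

Because the days are already relative to the anchor, running the function once on CRPmax-aligned and once on ICU admission-aligned series yields per-day subgroup curves that can be compared directly.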
In a last step, we explored whether the choice of synchronization method affects outcome prediction with machine learning (Figure 2). We generated two feature vectors for each patient, containing the mean values of CRP and of relative neutrophil and lymphocyte counts within a time window anchored on either ICU admission or CRPmax (Figure 2A, upper panel). Using logistic regression with stratified 5-fold cross-validation, we found that CRPmax anchoring increased the overall prediction accuracy by 9.6% and the F1-macro score by 51.8% (accuracy 0.68 ± 0.22, F1-macro 0.668 ± 0.23) compared with ICU admission anchoring (accuracy 0.62 ± 0.10, F1-macro 0.44 ± 0.13) (Figures 2B,C, window size 1). Similarly, the corresponding confusion matrices indicated a higher accuracy in distinguishing between the most severe subgroups of mechanically ventilated and deceased ICU patients (Figures 2D,E).
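As a worked check of the relative improvement reported for the F1-macro score, using the mean scores given above:

```latex
\frac{F_{1,\mathrm{CRP_{max}}} - F_{1,\mathrm{ICU}}}{F_{1,\mathrm{ICU}}}
  = \frac{0.668 - 0.44}{0.44}
  = \frac{0.228}{0.44}
  \approx 0.518 = 51.8\%
```

The same calculation for accuracy with the rounded means, (0.68 − 0.62)/0.62 ≈ 0.097, differs slightly from the reported 9.6%, presumably because the reported means are rounded.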

DISCUSSION
We demonstrated that longitudinal data synchronization based on the inflammatory marker CRP reduces interpatient variability at least as much as ICU admission-based alignment. This pilot study is limited by its monocentric design and the small number of COVID-19 patients (n = 28) we were able to include during the first wave, which had a comparatively mild impact on north-eastern Switzerland.
Nevertheless, our study revealed that both "onset of symptoms" and "hospital admission" appear to be poor temporal markers, leading to increased variability, blurring of subgroup differences and, in the case of "onset of symptoms," patient exclusions due to unclear data. Because noteworthy symptoms must be interpreted by patients and communicated to clinicians, "onset of symptoms" is a highly subjective value for patient synchronization, as reflected in our data and in early reports of exaggerated incubation periods until disease onset (5,14). While ICU admission is a consistent clinical marker in our monocentric study, this might not hold when comparing patients from different hospitals with less stringent or deviating ICU admission criteria, resources, or ICU capacity. Furthermore, COVID-19-associated symptoms might not be the primary reason for ICU admission in some patients, and this temporal marker obviously cannot be applied to non-ICU patients. These problems are encountered by most medical centers and researchers alike and highlight the need for an objective temporal marker that synchronizes individual patient trajectories with the underlying pathophysiology (8). Our findings suggest that, in the case of COVID-19, CRP can serve as such a marker, allowing alignment of disease trajectories independent of hospital-specific policies. We therefore encourage multicentric studies that aim to reproduce the results of this pilot study, with a special focus on non-ICU ward patients. Further studies incorporating data from subsequent COVID-19 waves should address whether changes in patient dispositions and/or treatments, such as the early administration of anti-inflammatory therapy, may limit the informative value of CRP.
In line with previous literature, our subgroup analysis of both CRPmax-aligned and ICU admission-aligned data reproduced the COVID-19 severity markers neutrophilia, lymphocytopenia, and their ratio (3–7). However, only CRPmax-based longitudinal alignment improved the distinction between the most severe subgroups of mechanically ventilated and deceased patients. Although this pilot study relies on a small cohort, our data suggests a central role for CRP in the timing of COVID-19 immunopathology, marking the turning point of the longitudinal NLR dynamics and thereby providing a window for maximal subgroup distinction. CRP is under direct transcriptional control of IL-6 but shows slower dynamics, making it more likely that its maximum is captured by daily measurements while the patient is hospitalized (12). Interestingly, CRP itself has immunomodulatory functions such as complement activation, regulation of apoptosis, and modulation of cellular processes in both neutrophils and monocyte-derived cells (12). Although CRP elevation is generally associated with bacterial rather than viral infections (12,15,16), elevated CRP levels have been observed in COVID-19 patients as well as in severe courses of other respiratory viral diseases such as influenza (3,4,6,17,18). It is tempting to speculate that CRP elevation in severe respiratory viral infections marks a shift from a more localized inflammation of the lungs to a systemic, multi-organ immune response.
Digitalization of modern medicine has led to increased availability of continuous patient data that should be used to describe and define longitudinal disease progression and pathophysiology of novel and known diseases alike. Our data highlights that proper synchronization of longitudinal patient data has the potential to improve mortality-risk stratification and subgroup distinction both in a clinical setting and for research purposes.

DATA AVAILABILITY STATEMENT
The data analyzed in this study is subject to the following licenses/restrictions: data protection laws. Requests to access these datasets should be directed to jan.bartussek@usz.ch.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Cantonal Ethics Committee Zurich. The Ethics Committee waived the requirement of written informed consent for participation.

AUTHOR CONTRIBUTIONS
MM and JB conceptualized and performed the research and wrote the manuscript. AA performed the 5-fold cross-validation modeling. MH performed the mixed linear regression modeling and RISC-19-ICU registry data analysis. The CoViD-19 ICU-Research Group Zurich and the RISC-19-ICU Investigators contributed to the data collection. AA, SB, PB, CG, NP, MH, MK, RS, and PW provided valuable input and critically assessed the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
This research was supported by the Swiss National Science Foundation (Grant 320030_201184 to MK) and by unrestricted grants to RS.

ACKNOWLEDGMENTS
We thank Catharina Wolfensberger, Patrick Hirschi, the RDSC and the PDMS group of the University Hospital Zurich for their continued support. We further thank all public health and essential workers as well as researchers for their efforts in the battle against SARS-CoV-2. This manuscript has been released as a preprint on medRxiv (19).