T cells, more than antibodies, may prevent symptoms developing from respiratory syncytial virus infections in older adults

Introduction The immune mechanisms supporting partial protection from reinfection and disease by the respiratory syncytial virus (RSV) have not been fully characterized. In older adults, symptoms are typically mild but can be serious in patients with comorbidities when the infection extends to the lower respiratory tract. Methods This study formed part of the RESCEU older-adults prospective-cohort study in Northern Europe (2017–2019; NCT03621930), in which a thousand participants were followed over an RSV season. Peripheral-blood samples (taken pre-season, post-season, during illness and during convalescence) were analyzed from participants who had (i) a symptomatic acute respiratory tract infection by RSV (RSV-ARTI; N=35) or (ii) an asymptomatic RSV infection (RSV-Asymptomatic; N=16). These analyses included evaluations of antibody (Fc-mediated) functional features and cell-mediated immunity, in which univariate and machine-learning (ML) models were used to explore differences between groups. Results Pre–RSV-season peripheral-blood biomarkers were predictive of symptomatic RSV infection. T-cell data were more predictive than functional antibody data (areas under the receiver operating characteristic curve [AUROC] for the models were 99% and 76%, respectively). The pre–RSV-season T-cell phenotypes that were selected by the ML modelling and that were more frequent in the RSV-Asymptomatic group than in the RSV-ARTI group coincided with prominent phenotypes identified during convalescence from RSV-ARTI (e.g., IFN-γ+, TNF-α+ and CD40L+ for CD4+, and IFN-γ+ and 4-1BB+ for CD8+). Conclusion The evaluation and statistical modelling of numerous immunological parameters over the RSV season suggest a primary role of cellular immunity in preventing symptomatic RSV infections in older adults.


Supplementary Methods

Serology data preprocessing
Flow-cytometer-based assays for (i) antibody-dependent phagocytosis (ADP) by dendritic cells (DCs) leading to IL-10 signaling and (ii) ADP by DCs leading to TNF-α signaling returned values marked as invalid response or quantity not sufficient from 32/32 and 31/31 subjects, respectively, and therefore no background values were subtracted. All other measurements with values below the limits of background (LoBs) were replaced by zero. After inspection of the distributions of each assay's measurements, a log10(x+1) transformation was applied to all measurements expressed in mean fluorescence intensity (MFI) or in concentrations (pg/ml). Subsequently, hierarchical clustering (with the distance computed using Pearson correlation) was applied in order to remove features with strong linear relationships (which may negatively affect feature selection and machine-learning [ML] performance). Upon manual inspection, the hierarchical clustering tree was cut at height = 0.35 to group correlated features. Eigenvalues were computed for each feature group. Within each group, a proxy feature was defined as the feature with the highest Pearson correlation with the respective group's eigenvalue. Thus, the selected proxy features were the most representative of all features in their respective feature groups. Only proxy features were considered in downstream analyses.
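The transformation and correlation-based grouping described above can be illustrated with a minimal sketch. It is not the authors' code: the assay names, readings and LoB value are invented, the distance is assumed to be 1 minus the Pearson correlation, and complete linkage is assumed because the serology text does not name a linkage method. Proxy-feature selection is left out.

```python
import math

def log_transform(value, lob):
    """Replace a value below the limit of background (LoB) by zero, then apply log10(x + 1)."""
    if value < lob:
        value = 0.0
    return math.log10(value + 1.0)

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined for constant input)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cut_tree(columns, height):
    """Complete-linkage agglomerative clustering on distance = 1 - Pearson r (assumed).

    Merging stops once the closest pair of clusters is farther apart than
    `height`, which gives the same groups as cutting the dendrogram there.
    """
    names = list(columns)
    dist = {(a, b): 1.0 - pearson(columns[a], columns[b])
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                d = max(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > height:
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Invented MFI readings for three hypothetical assays across four samples.
raw = {
    "assay_a": [10, 100, 1000, 10000],
    "assay_b": [20, 200, 2000, 20000],
    "assay_c": [5000, 3, 800, 40],
}
LOB = 5.0  # hypothetical limit of background
transformed = {name: [log_transform(v, LOB) for v in vals] for name, vals in raw.items()}
clusters = cut_tree(transformed, height=0.35)
print(clusters)
```

In this toy input the first two assays are almost perfectly correlated after transformation and therefore fall into one group, from which a single proxy feature would then be chosen.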

Cell-mediated immunity data preprocessing
For each subject and within each group (RSV-ARTI-4X or RSV-Asymptomatic), background measurements of cell numbers (i.e., T cells not exposed to any antigen) were subtracted from all other cell counts. All resulting negative cell counts were replaced by zero. The data from 3/29 subjects were entirely excluded because both CD4+ and CD8+ T-cell counts were negative. For the data from the remaining subjects, missing values were imputed using the median of the log-transformed abundance values (i.e., cells-per-million + 1) of data from samples stimulated with the same antigen from subjects in the same group. Additional variables were computed for both CD4+ T cells and CD8+ T cells based on phenotype (i.e., combinations of positive and negative detection of the immune markers CD40L, IFN-γ, IL-2, and TNF-α). CD4/CD8 ratios were also computed. Overall, 21871 features were defined for T-cell data. Next, cell populations defined by a given feature were removed from analysis if, across all samples, (i) values were zero; (ii) sparsity was >70%; or (iii) variance was <1. Multicollinearity between the remaining features was assessed through hierarchical clustering (complete linkage) wherein the distance metric was computed using Pearson correlation. Upon manual inspection, the resulting dendrogram was cut at height = 0.3, such that each cluster contained various features defining the same cell population. Within each cluster, the most precise feature was retained as the defining (proxy) feature for subsequent analysis, resulting in a final set of 458 features for downstream analysis.
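The background subtraction and feature filtering steps can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: sparsity is taken to mean the fraction of zero values, the sample variance is used (the text does not say which estimator or scale was applied), and the feature names and counts are invented.

```python
from statistics import variance

def subtract_background(stimulated, background):
    """Subtract the unstimulated (background) count and clamp negatives to zero."""
    return max(stimulated - background, 0.0)

def keep_feature(values, max_sparsity=0.70, min_variance=1.0):
    """Return False if a feature is all zero, too sparse, or nearly constant.

    Sparsity is assumed to be the fraction of zero values across samples.
    """
    n_zero = sum(1 for v in values if v == 0)
    if n_zero == len(values):
        return False                      # (i) all values are zero
    if n_zero / len(values) > max_sparsity:
        return False                      # (ii) sparsity above 70%
    if variance(values) < min_variance:
        return False                      # (iii) variance below 1
    return True

# Invented cells-per-million values for four hypothetical phenotype features.
counts = {
    "CD4_IFNg+": [120, 0, 340, 95, 210],
    "CD4_IL2+": [0, 0, 0, 0, 0],
    "CD8_TNFa+": [0, 0, 0, 0, 50],
    "CD8_IFNg+": [2.0, 2.5, 2.0, 2.5, 2.0],
}
kept = [name for name, values in counts.items() if keep_feature(values)]
print(kept)
```

Only the first feature survives here: the second is all zero, the third is 80% zeros, and the fourth has a sample variance well below 1.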

Machine-learning (ML) optimization
Data (stratified by group; RSV-ARTI-4X or RSV-Asymptomatic) were randomly split into one set (90% of the data) for optimization and one hold-out set (10% of the data) for assessment of the feature selection and hyperparameter tuning steps. To avoid overfitting, a nested cross-validation (CV) approach was used, wherein the outer CV (20-times repeated 5-fold CV) allowed performance evaluation of a set of selected features and hyperparameters on unseen data, whereas the inner CV (5-fold CV) helped avoid overfitting of the feature selection and hyperparameter tuning procedure. All training data sets were scaled prior to model training.
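The splitting scheme can be sketched in plain Python to show how the hold-out set, the outer folds and the inner folds nest. The group sizes, the seed and all function names below are illustrative assumptions, and the round-robin fold assignment is a simple stand-in for whatever stratification routine was actually used.

```python
import random

def stratified_folds(indices, labels, k, rng):
    """Split `indices` into k folds, dealing each group's members out in turn."""
    by_group = {}
    for idx in indices:
        by_group.setdefault(labels[idx], []).append(idx)
    folds = [[] for _ in range(k)]
    for members in by_group.values():
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds

def complement(indices, held_out):
    """Return the members of `indices` that are not in `held_out`."""
    held = set(held_out)
    return [i for i in indices if i not in held]

def nested_cv(indices, labels, rng, repeats=20, outer_k=5, inner_k=5):
    """Yield (outer_train, outer_test, inner_splits) for repeated nested CV.

    The inner splits are drawn from outer_train only, so the outer test fold
    never influences feature selection or hyperparameter tuning.
    """
    for _ in range(repeats):
        for outer_test in stratified_folds(indices, labels, outer_k, rng):
            outer_train = complement(indices, outer_test)
            inner = [(complement(outer_train, inner_test), inner_test)
                     for inner_test in stratified_folds(outer_train, labels, inner_k, rng)]
            yield outer_train, outer_test, inner

rng = random.Random(0)
labels = ["RSV-ARTI-4X"] * 20 + ["RSV-Asymptomatic"] * 10   # invented group sizes
everyone = list(range(len(labels)))
holdout = stratified_folds(everyone, labels, 10, rng)[0]     # one of ten folds, about 10%
optimization = complement(everyone, holdout)
splits = list(nested_cv(optimization, labels, rng))
print(len(holdout), len(optimization), len(splits))
```

Scaling would be fitted on each training portion only and then applied to the matching test portion, which keeps the test folds unseen.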
Feature selection was performed using either lasso logistic regression, random forests, or Boruta (1). Features and hyperparameters with the highest area under the receiver operating characteristic curve (AUROC) on inner CV test sets were selected and used for prediction on the outer CV test sets. Features and hyperparameters that were selected most frequently across all outer CV folds were considered optimal and retained for the assessment part. The three feature selection methods were used independently, and each of the three resulting feature sets was used for modelling with each of the five ML models. Hence there were 15 independent ML strategies. Only when examining the final results of each analysis did we evaluate which combination of feature selection method and ML model had the best performance.
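Three pieces of this procedure lend themselves to a short sketch: the AUROC used as the selection criterion, the tally of how often each feature is chosen across outer folds, and the enumeration of the 15 strategies. The scores, feature names and per-fold selections are invented, the selector labels are shorthand for the three methods named above, and the cut-off for "most frequently selected" is not stated in the text, so the tally is only ranked here.

```python
from collections import Counter
from itertools import product

def auroc(scores, labels):
    """AUROC as the share of positive/negative pairs ranked correctly (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented predicted probabilities and true labels (1 = symptomatic) for one test fold.
fold_auroc = auroc([0.9, 0.8, 0.4, 0.35, 0.1], [1, 1, 0, 1, 0])

# Invented feature selections from four outer CV folds, tallied per feature.
selected_per_fold = [
    {"CD4_IFNg+", "CD8_41BB+"},
    {"CD4_IFNg+"},
    {"CD4_IFNg+", "CD4_TNFa+"},
    {"CD8_41BB+", "CD4_IFNg+"},
]
ranked = Counter(f for fold in selected_per_fold for f in fold).most_common()

# Three feature selection methods crossed with five classifiers.
selectors = ["lasso", "random_forest", "boruta"]
models = ["LR", "RF", "SVM", "KNN", "GBC"]
strategies = list(product(selectors, models))

print(round(fold_auroc, 3), ranked[0], len(strategies))
```

The pairwise definition of AUROC is used because it needs no threshold sweep and handles ties explicitly, which matters in small test folds.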

ML assessment
First, the initial data hold-out set (which had not been used for the optimization step) was used to assess the performance of ML models with the selected features and hyperparameters. Next, the selected features and hyperparameters were used to fit ML models using the entire data set. Specifically, 100-times repeated 10-fold CV (stratified by group; RSV-ARTI-4X or RSV-Asymptomatic) was applied to the entire data set. Training data were used to fit the ML models with the selected features and hyperparameters.
The CMI and serology data sets were analyzed with a stacked ML strategy wherein the output predictions of both CMI and serology models were used as input data. Data were split randomly 90%:10%, in which the 10% portion represented the hold-out test data set. The features were limited to those that were selected in the ML optimization steps on CMI data only or serology data only. A nested CV approach was used (20-times repeated 5-fold CV, stratified by group; RSV-ARTI-4X or RSV-Asymptomatic) such that only subjects represented in both CMI and serology data were divided into the 5 CV folds. Next, models with hyperparameters optimized for either CMI data only or serology data only were trained on both the respective CV training data sets and the data of subjects that were not represented in both data sets. Subsequently, predictions made based on the CMI data and predictions made based on the serology data were used as inputs to stacked ML models. Again, LR, RF, SVM, KNN, and GBC classifier models were optimized. No further feature selection was considered for the stacked ML models. ML assessment for the stacked ML strategy consisted of 200-times repeated 5-fold CV.
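The stacking step can be illustrated with a minimal sketch in which the two base-model predictions per subject become the only inputs of a second-level classifier. A hand-written gradient-descent logistic regression stands in for the stacked model, all numbers are invented, and the learning rate and epoch count are arbitrary choices rather than values from the study. In practice the base predictions would come from CV test folds so that the stacked model never sees predictions made on a base model's own training data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n_feat = len(rows[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            for j, xj in enumerate(x):
                grad_w[j] += (p - y) * xj
            grad_b += p - y
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b

def predict(model, x):
    """Predicted probability of the positive class for one input row."""
    w, b = model
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))

# Invented base-model outputs for six subjects present in both data sets.
cmi_pred = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]    # CMI model: P(symptomatic)
sero_pred = [0.6, 0.7, 0.4, 0.5, 0.3, 0.6]   # serology model: P(symptomatic)
labels = [1, 1, 1, 0, 0, 0]                  # 1 = symptomatic, 0 = asymptomatic

# Each stacked-model input row holds one prediction per base model.
stacked_inputs = list(zip(cmi_pred, sero_pred))
meta_model = fit_logistic(stacked_inputs, labels)

p_high = predict(meta_model, (0.85, 0.6))
p_low = predict(meta_model, (0.15, 0.5))
print(meta_model, p_high, p_low)
```

Because the stacked model sees only two inputs, its fitted weights give a rough indication of how much each data type contributes to the combined prediction.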