Vaccines Meet Big Data: State-of-the-Art and Future Prospects. From the Classical 3Is (“Isolate–Inactivate–Inject”) Vaccinology 1.0 to Vaccinology 3.0, Vaccinomics, and Beyond: A Historical Overview

Vaccines are public health interventions aimed at preventing infection-related mortality, morbidity, and disability. While vaccines have been successfully designed for those infectious diseases preventable by preexisting neutralizing specific antibodies, for other communicable diseases additional immunological mechanisms must be elicited to achieve full protection. “New vaccines” are particularly urgent in today's society, in which economic growth, globalization, and immigration are leading to the emergence/reemergence of old and new infectious agents at the animal–human interface. Conventional vaccinology (the so-called “vaccinology 1.0”) was officially born in 1796 thanks to the contribution of Edward Jenner. Entering the twenty-first century, vaccinology has shifted from a classical discipline in which serendipity and the Pasteurian principle of the three Is (isolate, inactivate, and inject) played a major role to a science characterized by rational design and planning (“vaccinology 3.0”). This shift has been possible thanks to Big Data, characterized by different dimensions, such as high volume, velocity, and variety of data. Big Data sources include new cutting-edge, high-throughput technologies, electronic registries, social media, and social networks, among others. The current mini-review aims at exploring the potential roles, as well as the pitfalls and challenges, of Big Data in shaping the future of vaccinology, moving toward tailored and personalized vaccine design and administration.

Cytomics: Systematic, cytome-wide investigation of biochemical/biophysical events at a single cell level
Immunogenomics: Systematic, immunogenome-wide investigation of immunologically relevant genes
Immunoproteomics: Systematic, immunoproteome-wide investigation of immunologically relevant proteins
Immunometabolomics: Systematic, immunometabolome-wide investigation of immunologically relevant metabolites
Interactomics: Systematic, interactome-wide investigation of interactions among proteins and/or other cellular molecules/components
Secretomics: Systematic, secretome-wide investigation of all secreted proteins of a given cell/tissue/organism
Exoproteomics: Systematic, exoproteome-wide investigation of proteins in the extra-cellular proximity of a biological system
Surfomics: Systematic, surfome-wide investigation of surface proteins and other components, such as surface-exposed moieties
Immunomics: Systematic, immunome-wide investigation of immune system dynamics, regulation and response to a given pathogen
Protectomics: Systematic, protectome-wide investigation of the structural/functional protein motifs that confer immunological protection
Adversomics: Systematic, adversome-wide investigation of potential vaccine-related adverse events
Vaccinomics: Systematic, comprehensive integration of previously described omics disciplines for advancing vaccine discovery and development, as well as personalized vaccinology

developed and developing countries, significantly reducing the burden generated by infectious diseases (2). They have contributed to the eradication of smallpox and to the control of other infectious diseases, such as polio. According to the estimates of the Global Alliance for Vaccines and Immunization (Gavi), they have helped avert up to 23.3 million projected deaths from 2011 to 2020, especially in Africa, Southeast Asia, and the Eastern Mediterranean (3). Furthermore, they positively affect perceived quality of life (3) and reduce inequity worldwide (1,4).
While vaccines have been successfully designed for those infectious diseases preventable by preexisting neutralizing specific antibodies, for other communicable diseases additional immunological mechanisms must be elicited to achieve full protection. These additional mechanisms include the stimulation of effector and memory T lymphocytes, in addition to the release of antibodies by B cells induced by helper T cells (5). A better understanding of immune networks, their sophisticated tuning, and their interactions is therefore fundamental for vaccines against HIV/AIDS, malaria, or tuberculosis, which have eluded classical vaccine development and require new strategies and approaches (6).
"New vaccines" are particularly urgent in today's society, in which economic growth, globalization, and immigration are leading to the emergence/reemergence of old and new infectious agents at the animal-human interface (7,8).
Conventional vaccinology (the so-called "vaccinology 1.0") was officially born in 1796 thanks to the contribution of Edward Jenner (1749-1823) and the pioneering discoveries of the New England Puritan minister Cotton Mather (1663-1728) and Lady Mary Wortley Montagu (1689-1762), partially anticipated by Chinese and Indian practices several centuries earlier. The rabies vaccine, the first human vaccine manufactured in the laboratory (in 1885), typifies vaccinology 1.0 (9). Other "first generation" vaccines are the bacillus Calmette-Guérin (BCG), plague, pertussis, polio, and smallpox vaccines (9).
Entering the twenty-first century, vaccinology has shifted from a discipline in which serendipity and the Pasteurian principle of the three Is (isolate, inactivate, and inject) played a major role to a science characterized by rational design and planning (10).
If vaccinology 1.0 mainly consisted of isolating infectious agents, cultivating and inactivating them (as a whole or partially), and injecting the obtained product, vaccinology 2.0 utilizes purified microbial cell components. Examples of "second generation" vaccines include vaccines against tetanus, diphtheria, anthrax, pneumonia, influenza, hepatitis B, and Lyme disease (9). The transition from vaccinology 1.0 to vaccinology 2.0 has been made possible by several technological advancements, including genetic and protein engineering, recombinant DNA (11), polysaccharide and carbohydrate chemistry, and combinatorial chemistry (12), among others.
Vaccinology 3.0 starts from the microbial genomic sequences (reverse vaccinology 1.0) or from the repertoire of protective human antibodies (reverse vaccinology 2.0) (13,14). This shift has been possible thanks to omics data, which represent one type of Big Data, characterized by different aspects, such as enormous volume, velocity, and high variety of data (15).
Omics disciplines enabled by high-throughput technologies [such as genomics and post-genomics specialties (16,17), including transcriptomics, proteomics, metabolomics, cytomics, immunomics, secretomics, surfomics, or interactomics], briefly overviewed in Table 1, are able to produce a wealth of data and information on a large scale. Recently, these approaches have converged in what is termed vaccinomics, that is to say, the performance of large-scale, hypothesis-free, data-driven, and holistic investigations. Poland and collaborators have defined vaccinomics as the "integration of immunogenetics and immunogenomics with systems biology and immune profiling" (18).
New cutting-edge technologies include next-generation sequencing (NGS) techniques [RNASeq (19) and large-scale B- and T-cell receptor sequencing (20,21)], mass cytometry (CyTOF) (22), and peptide/protein arrays (23). Data produced by molecular biology and NGS, as well as by bioinformatics (24), can be used to perform mechanistic reductionist studies but can also be exploited to comprehensively capture immune dynamics and interactions (25), carrying out, for instance, network analysis or systems biology (the so-called "systems vaccinology"). Novel bioinformatics tools and new approaches are needed to better integrate the enormous wealth of data originating from omics experiments, making the shift from single-omics to multi-omics possible. Furthermore, the current era is characterized by the widespread diffusion of new information and communication technologies (26): electronic health or eHealth refers to their exploitation as "a means to expand, to assist, or to enhance human activities, rather than as a substitute for them" (27). Like omics experiments, eHealth also generates an enormous wealth of data. Researchers have found that digital activities usually correlate with offline behaviors and other variables, such as vaccination knowledge and perception of one's own risk: for example, Betsch and Wicker (28), investigating a sample of 310 medical students, found that explicitly surfing the Internet for websites on vaccination risks led to fewer public health websites than generically searching for immunization practices.
Vaccinology has now entered a new phase, characterized by new challenges: within this new framework, Big Data hold promises and opportunities, which will be overviewed in the following paragraphs (Table 2; Figure 1).

VACCINE DISCOVERY AND DESIGN: THE ROLE OF BIG DATA
Computational vaccinology (29,30) and immunoinformatics (31), utilizing algorithms, enable experimental immunology to save time by focusing only on prescreened vaccine candidate antigens, thus avoiding costly, time-consuming, and labor-intensive steps.
A successful example of a rationally designed, genome-based vaccine is the vaccine against Neisseria meningitidis, commercially available under the trade name Bexsero. For the selection of surface antigens, Masignani and collaborators (47) performed genome mining, using computational tools and algorithms such as PSORT (48), PSI-BLAST (49), and FindPatterns to predict proteins with transmembrane domains, leader peptides, lipo-boxes, and outer membrane anchoring motifs. In the end, 570 proteins were selected, and GNA1870, a new surface-exposed lipoprotein inducing high levels of bactericidal antibodies, was discovered.
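The motif-based part of such genome mining can be illustrated with a minimal sketch. It is not the pipeline used by Masignani and collaborators: the simplified lipo-box consensus `[LVI][ASTVI][GAS]C`, the 40-residue N-terminal window, the function names, and the toy sequences are all assumptions made for illustration.

```python
import re

# Simplified lipo-box consensus (an assumption for illustration): a cysteine
# preceded by three residues drawn from restricted sets, near the N-terminus.
LIPOBOX = re.compile(r"[LVI][ASTVI][GAS]C")


def has_lipobox(sequence: str, window: int = 40) -> bool:
    """Return True if a lipo-box-like motif occurs in the first `window` residues."""
    return LIPOBOX.search(sequence[:window].upper()) is not None


def screen(proteins: dict) -> list:
    """Return the identifiers of proteins carrying a lipo-box-like motif."""
    return [name for name, seq in proteins.items() if has_lipobox(seq)]


# Hypothetical toy sequences, not real N. meningitidis proteins.
toy_proteome = {
    "toy_lipoprotein": "MKKTLLAIALSACSSGGGGVAADIGAGLADALTAPLDHKDK",
    "toy_cytoplasmic": "MSEQNNTEMTFQIQRIYTKDISFEAPNAPHVFQKDW",
}
candidates = screen(toy_proteome)
print(candidates)
```

A real screen would combine several such predictors (localization, transmembrane domains, leader peptides) rather than rely on a single regular expression.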
These computational approaches, using massive data mining techniques, rely on brute force (the so-called "test-all-to-lose-nothing" approach). Altindis and collaborators (75) have recently attempted to refine this framework, based on the idea that protective antigens share specific structural/functional features, termed "protective signatures" or "immunosignatures," which distinguish them from other pathogen components in terms of immunological properties. Instead of focusing on protein localization, as in previous investigations, Altindis and coworkers concentrated their computational analyses on the biological role and function of proteins. In this sense, their approach, termed "protectome," is unbiased with respect to protein localization, in that it leads to the identification of surface-exposed and secreted or cytoplasmic protective antigens.

BIG DATA AND VACCINE PRODUCTION AND DELIVERY
To solve these issues, Merck and Microsoft have, for example, established a collaboration in which Merck exploits Microsoft R Server for Hadoop for analyzing, monitoring, and predicting variables that could affect the cold chain, including origin, destination, and delivery route, as well as external weather and logistics providers, utilizing special thermal-protection containers equipped with temperature-recording sensors and temperature-sensitive vaccine vial monitors.
Nexleaf has produced ColdTrace (currently, ColdTrace version 5), which has already been implemented in more than 7,000 health-care facilities worldwide, and has recently established a new partnership with www.Google.org and Gavi.
These technologies are low-cost and particularly useful in developing countries, which often rely on stem thermometers or 30-day temperature loggers.
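How a stream of sensor readings might be turned into cold chain alerts can be sketched as follows. This is only an illustration and does not describe ColdTrace or the Merck/Microsoft system: the 2 to 8 °C storage range, the `Reading` type, and the sample log are assumptions.

```python
from dataclasses import dataclass

# Assumed storage range in degrees Celsius; the text does not state one.
LOW_C, HIGH_C = 2.0, 8.0


@dataclass
class Reading:
    minute: int    # minutes since the start of logging
    temp_c: float  # recorded temperature in degrees Celsius


def excursions(readings, low=LOW_C, high=HIGH_C):
    """Group consecutive out-of-range readings into (start_minute, end_minute) spans."""
    spans, start, last = [], None, None
    for r in readings:
        out_of_range = r.temp_c < low or r.temp_c > high
        if out_of_range and start is None:
            start = r.minute               # a new excursion begins
        if not out_of_range and start is not None:
            spans.append((start, last))    # the excursion ended at the previous reading
            start = None
        last = r.minute
    if start is not None:                  # the log ends during an excursion
        spans.append((start, last))
    return spans


# Hypothetical log: one heat excursion and one freeze excursion.
log = [Reading(0, 4.5), Reading(10, 8.6), Reading(20, 9.1),
       Reading(30, 5.0), Reading(40, 1.2)]
print(excursions(log))
```

Grouping readings into spans, rather than flagging single points, makes it easier to report how long a shipment or refrigerator stayed outside the assumed range.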

BIG DATA AND VACCINE CAMPAIGNS
Other major sources of Big Data are immunization registries and surveillance systems such as SmiNet-2 (79) or SurvNet@RKI (80). These large databases can be mined to capture data concerning the vaccination coverage rate and its determinants.
Non-conventional data sources or novel data streams, such as Internet search data and tools monitoring web queries, like Google Trends (GT) (81), social media (YouTube, Facebook, Google Plus, Twitter, Pinterest, Instagram, and so on), or news source scraping like HealthMap (82), provide researchers and public health workers with real-time information concerning public reaction to epidemic outbreaks. Novel data streams can track different vaccine-preventable infectious diseases, such as influenza (83–85), pertussis (86,87), or measles (88), among others. As such, they can be exploited to predict epidemiological figures as well as to monitor the effect of vaccine campaigns.

BIG DATA AND VACCINE EFFICACY/EFFECTIVENESS
Big Data also make it possible to identify molecular signatures and predictors of vaccination outcomes, which serve as correlates of vaccine efficacy/effectiveness in different populations (89). Haks and collaborators (89), for instance, utilized transcriptomics to quantitatively assess the immunogenetic signature of the immunization response. Dunachie and coworkers (90) explored the differentially expressed genes induced by a malaria candidate vaccine and found that most genes conferring immunological protection belonged to the interferon-gamma and proteasome/antigen presentation pathways, unlike genes associated with the hemopoietic stem cell, regulatory monocyte, and myeloid lineage modules.
Novel data streams, such as mobile/smartphone applications, can be utilized in the monitoring and management of vaccine-related data (91).

BIG DATA AND VACCINE SIDE EFFECTS
Vaccine adverse events and reactions are very rare. As such, most studies are statistically underpowered to capture the rate of rare/very rare side effects. Meta-analytical approaches and data mining have emerged as useful strategies in this regard. As claimed by Chandler (92), the classical paradigm of the current pharmacovigilance/vaccine vigilance system, based on a three-stage approach (namely, signal detection, development of a causality hypothesis, and testing of the causality hypothesis), is plagued by some limitations, in that "routine vaccine pharmacovigilance practice is not sufficient to understand suspected harms that are poorly defined and whose pathophysiology are not completely understood. Furthermore, estimations of risk at the population level fail to acknowledge that vaccines may cause harm in subgroups with individual-level risk factors" for adverse events following immunization. As such, new approaches are needed to capture new side effects and, in this case too, Big Data could play a major role.
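Why individual studies are underpowered for rare events can be made concrete with a short calculation. The sketch assumes independent vaccinees with an identical event probability, and the rate of 1 in 100,000 and the trial size of 10,000 are hypothetical values chosen for illustration, not figures from the cited literature.

```python
import math


def prob_at_least_one(rate: float, n: int) -> float:
    """Probability of observing at least one event among n independent vaccinees."""
    return 1.0 - (1.0 - rate) ** n


def sample_size_for(rate: float, prob: float = 0.95) -> int:
    """Smallest n giving at least `prob` chance of observing one or more events."""
    return math.ceil(math.log(1.0 - prob) / math.log(1.0 - rate))


rate = 1 / 100_000                          # hypothetical adverse event rate
p_trial = prob_at_least_one(rate, 10_000)   # chance a 10,000-subject study sees any event
n_needed = sample_size_for(rate)            # subjects needed for a 95% chance

print(f"P(at least one event in 10,000 subjects) = {p_trial:.3f}")
print(f"Subjects needed for a 95% chance = {n_needed}")
```

Under these assumptions, a study of 10,000 subjects has less than a 10% chance of observing even a single event, whereas roughly 300,000 subjects are needed for a 95% chance, which is the scale at which pooled registries and data mining become relevant.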
"Adversomics" is a term coined by Poland in 2009 and is an emerging discipline defined as "the study of vaccine adverse reactions using immunogenomics and systems biology approaches" (93,94).
Berendsen and coworkers (95) exploited Big Data to explore BCG-related "non-specific effects," that is to say, effects induced by the vaccination on health beyond its target disease. In particular, they evaluated the effect of the timing of BCG on stunting in Sub-Saharan African children under 5 years, analyzing cross-sectional data for 368,450 subjects from 33 countries. The authors found that BCG vaccination overall did not affect stunting, although the timing of BCG vaccination was statistically significant. Similar patterns could be detected for diphtheria-tetanus-pertussis and measles vaccinations.
Vaccine ontology (96,97), a class of biomedical ontologies, that is to say, a consensus-based, computer- and human-interpretable set of terms and relations indicating specific biomedical entities, is another valuable approach. It supports the integrative collection and analysis of adverse event-related data, utilizing a normalization strategy more effective than other controlled terminologies. These include the Medical Dictionary for Regulatory Activities, the Common Terminology Criteria for Adverse Events, and the WHO Adverse Reactions Terminology, among others. Using Ontology-Based Vaccine Adverse Event representation, Xie and He (96) explored the adverse events related to Flublok, a recombinant hemagglutinin influenza vaccine.
Novel data streams can be used to see how often people search Google for vaccination and for vaccination-related adverse events. Bragazzi and collaborators (98) utilized GT for monitoring the interest toward preventable infections and related vaccines. The authors found that, generally speaking, vaccines were not a popular topic, with the notable exception of the vaccine against Human Papillomavirus: vaccine-related queries amounted to approximately one third of the volume of queries regarding preventable infections. Users tended to search for information about possible vaccine-related side effects.

BIG DATA AND VACCINE LITERACY/VACCINE HESITANCY
Big Data make it possible to track and monitor interest toward vaccination practices (99). The increasing phenomenon of vaccine hesitancy (an umbrella term that includes indecision, uncertainty, delay, and reluctance) is multifactorial and closely linked to social contexts, with different determinants, ranging from geographical area to political situation, complacency, convenience, and confidence in vaccines. Novel data streams, providing a snapshot of perceptions of vaccination in a given place and at a specific time, could be used to assess lay people's perceptions of vaccination, enabling health-care workers to actively engage citizens and to plan ad hoc communication strategies to contain vaccine hesitancy and to promote vaccine literacy (100).
Shah and colleagues (101) compared time series of rotavirus-related Internet searches as captured by GT with rotavirus laboratory reports from the United States and United Kingdom and with hospitalizations for acute gastroenteritis in the United States and Mexico, before and after national vaccine introductions. The authors found a strong positive correlation between web queries and laboratory reports in the United States (R² = 0.79) and United Kingdom (R² = 0.60) and between Internet searches and acute gastroenteritis hospitalizations in the United States (R² = 0.87) and Mexico (R² = 0.69). Correlations were stronger in the prevaccine period. After vaccine introduction, the mean number of Internet queries decreased by 40-70% in the United States and Mexico, with a loss of seasonal variation in the United Kingdom.
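The kind of comparison reported here can be sketched by computing a coefficient of determination between two aligned time series. The sketch assumes that R² is the squared Pearson correlation, which the text does not specify, and the monthly values are invented for illustration; they are not data from Shah and colleagues.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equally long numeric series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical monthly values, invented for illustration only.
search_index = [20, 35, 60, 80, 55, 30]        # relative search volume
lab_reports = [110, 190, 310, 400, 300, 150]   # laboratory-confirmed cases
r2 = r_squared(search_index, lab_reports)
print(f"R^2 = {r2:.2f}")
```

In practice the two series would first need to be aligned to the same time resolution, and seasonality shared by both series can inflate the correlation.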
Bakker and coworkers (102) exploited GT to monitor the interest toward chickenpox over an 11-year period in 36 countries. The authors found seasonal peaks with striking latitudinal variation in information-seeking behavior. They concluded that novel data streams are able to track the global burden of childhood disease as well as to investigate the effects of immunization at the population level.
Goldlust and collaborators (103) investigated the use of large-scale medical claims data for local surveillance of under-immunization for childhood infections in the United States, developing a statistical framework for integrating disparate data sources on surveillance of vaccination behavior. In this way, the authors were able to identify the determinants of vaccine hesitancy behavior. Within the "Vaccine Confidence Project," Larson and colleagues (104) extensively analyzed data from 10,380 reports (from 144 countries) and found that 7,171 (69%) contained positive or neutral content whereas 3,209 (31%) contained negative content (related to vaccine programs and disease outbreaks; vaccine-related beliefs, awareness, and perceptions; vaccine safety; and vaccine delivery programs).
Within the ambitious "Project Tycho" (freely accessible at www.tycho.pitt.edu) launched by the University of Pittsburgh, United States (105,106), the authors have digitized all weekly surveillance reports of notifiable diseases for United States cities and states published between 1888 and 2011. This data set consists of 87,950,807 reported individual cases and has been used to derive a quantitative history of disease dynamics and transmission in the United States. Pattern analysis has documented, in a statistically robust way, a significant reduction of the burden generated by infections, underlining the positive effect of vaccination programs (105). This use of Big Data emphasizes the dimension of "veracity," which makes it possible to counter vaccine-related "fake news" and "post-modern, post-factual truths" disseminated by the anti-vaccination movements (107).

CONCLUSION: STATE-OF-THE-ART, CURRENT CHALLENGES, AND FUTURE PROSPECTS
Big Data have contributed, and are expected to continue contributing, toward facilitating the discovery, development, production, and delivery of rationally designed vaccines. Further, by making it possible to identify predictive biomolecular signatures of the response to vaccination, Big Data will help vaccination shift from the classical "one-size-fits-all" paradigm to a personalized approach. Moreover, Big Data can be used to track the success of vaccination campaigns, in terms of vaccination coverage rate, as well as the rare/very rare vaccine-related adverse events, for which "classical epidemiological studies" would be statistically underpowered.
However, a number of pitfalls and challenges should be properly recognized and addressed by future research: Big Data and Big Data sources, as previously overviewed, are highly heterogeneous and should be effectively integrated and harmonized. Moreover, some algorithms underlying novel data streams need to be refined in that they sometimes fail to predict epidemic outbreaks accurately (108), even though some scholars have shown that, in principle, it is possible to correct them to achieve higher predictive power (109). Further, efforts should be made to preserve and protect privacy, confidentiality, and identity. The emerging field of "Big Data Ethics" is trying to address all these issues (110,111). Currently, we are only witnessing the very beginning of the ongoing "Big Data revolution."

AUTHOR CONTRIBUTIONS
NB, MM, and MB conceived the study. NB, VG, MV, RR, AN, AH, MM, and MB drafted and revised the manuscript, and read and approved the final version.