Abstract
Background: The blood transcriptome is expected to provide a detailed picture of an organism's physiological state, with potential applications in medical diagnostics and in molecular and epidemiological research. We here present the analysis of blood specimens of 3,388 adult individuals, together with phenotype characteristics such as disease history, medication status, lifestyle factors, and body mass index (BMI). The size and heterogeneity of these data challenge analytics in terms of dimension reduction, knowledge mining, feature extraction, and data integration.
Methods: Self-organizing maps (SOM)-machine learning was applied to study transcriptional states on a population-wide scale. This method permits a detailed description and visualization of the molecular heterogeneity of transcriptomes and of their association with different phenotypic features.
Results: The diversity of transcriptomes is described by personalized SOM-portraits, which specify the samples in terms of modules of co-expressed genes of different functional context. We identified two major blood transcriptome types. Type 1 was found more often in men, the elderly, and overweight people; it showed upregulation of genes associated with inflammation and increased heme metabolism. Type 2 was predominantly found in women and in younger and normal-weight participants; it was associated with activated immune responses and with transcriptional, ribosomal, mitochondrial, and telomere-maintenance functions. We found a striking overlap of signatures shared by multiple diseases, aging, and obesity, driven by an underlying common pattern that was associated with the immune response and with increased inflammatory processes.
Conclusions: Machine learning applications for large and heterogeneous omics data provide a holistic view on the diversity of the human blood transcriptome. They provide a tool for comparative analyses of transcriptional signatures and of associated phenotypes in population studies and medical applications.
Introduction
Blood is the pipeline of the human organism's physiology. Its accessibility and the minimal invasiveness of sampling have made it a feasible resource in scientific research and clinical diagnostics, as blood tests could replace more invasive and risky tests (Sohn, ). Because of their utility and simplicity, genome-wide blood transcriptome investigations have gained in popularity over the past few years. They were applied in a medical context for characterizing diseases such as ischemic stroke (Baird et al., ), Alzheimer's disease (Rembach et al., ), epilepsy (Karsten et al., ), and sepsis (Davenport et al., ; Burnham et al., ; Scicluna et al., ; Hopp et al., ); in pharmacogenomics (Burczynski and Dorner, ) and marker search (Hanash et al., ); and also in epidemiological investigations on aging (Peters et al., ), obesity status (Johannsen et al., ; Homuth et al., ), lifestyle factors such as smoking and alcohol consumption (Dumeaux et al., ), special nutrition (Burton et al., ), and in immune system characterization (Chaussabel et al., ) (see Chaussabel, and references cited therein for a broad literature survey). Most of these studies comprise relatively small sample sizes of dozens to a few hundred individuals and focus on selected diseases, thus enabling only limited views on the variability of transcriptomic states and their mutual associations with health phenotypes in a broader context.
We here present the systematic analysis of the transcriptomes obtained from whole peripheral blood specimens of more than 3,000 adult individuals collected as part of the LIFE (-adult) study at the Leipzig Research Center for Civilization Diseases. This project conducted one of the largest cross-sectional population studies in Germany focusing on extensive phenotyping of urban individuals from Leipzig city in order to discover the interplay between molecular, environmental, and lifestyle factors and their impact on the health status of the population (Loeffler et al., ). The large number of phenotype characteristics collected in LIFE in parallel to blood samples from the same participants such as disease history, medication status, lifestyle factors, and body mass index (BMI) offers the option to study their mutual associations for women and men over an age range from about 40 to 80 years (Loeffler et al., ) (Table 1).
Table 1
| Features | Men | Women | Comment |
|---|---|---|---|
| Number of participants (a) | 1,618 | 1,510 | |
| Age (mean ± SD) | 58.1 ± 12.4 | 59 ± 13 | Years |
| Smoker/Ex-smoker | 1,000 e) | 701 | |
| <30 g alcohol per day | 633 | 218 | |

| Features | Symbol | # men | Mean age (±SD) | # women | Mean age (±SD) | Description (BMI in units of kg/m²) |
|---|---|---|---|---|---|---|
| BMI status | uwt | 14 | 39 ± 9 | 42 | 46 ± 10 | Underweight BMI <18.5 |
| | nwt | 375 | 53 ± 15 | 492 | 54 ± 12 | Normal weight 18.5 < BMI <25 |
| | Pre obese | 590 | 60 ± 12 | 443 | 60 ± 12 | 25 < BMI <30 |
| | Obese | 411 | 63 ± 11 | 311 | 61 ± 11 | 30 < BMI |

| Features | Items |
|---|---|
| Blood count (b) | Basophils; eosinophils; erythrocytes; hematocrit; hemoglobin; leucocytes; lymphocytes; mean corpuscular hemoglobin; mean platelet volume; monocytes; neutrophils; reticulocytes; platelets |
| Blood serum markers | Human serum C-reactive protein; ferritin; transferrin; cystatin C |
| Medication (c) | Alimentary tract and metabolism; blood and blood forming organs; cardiovascular system; dermatologicals; genitourinary system and sex hormones; systemic hormonal preparations, excl. sex hormones and insulins; anti-infective for systemic use; antineoplastic and immuno-modulating agents; muscular-skeletal system; nervous system; antiparasitic products, insecticides, and repellents; respiratory system; sensory organs; various |
| Disease history (d) | Angina pectoris; arthrosis; asthma; cancer; cataract; depression; diabetes; glaucoma; gout; heart attack; hepatitis; hyperlipidemia; hypertension; hzoster; rheuma; sepsis; thyroid |
Participant's characteristics of the LIFE-adult study used in this publication for association with the blood transcriptome (see also Supplementary Table 1 in Supplementary File 1 for further details).
(a) For a detailed description of the LIFE-adult study, see (Loeffler et al., ).
(b) Analyses performed by the clinical laboratory (Loeffler et al., ).
(c) Medications taken within the last 5 days before the LIFE-core program visit. Medication was classified according to Anatomical Therapeutic Chemicals (ATCs) indexing (https://www.whocc.no/atc_ddd_index/).
(d) Disease history of the participants was assessed in questionnaires (Loeffler et al., ).
Our study aims at characterizing the diversity of transcriptional states of the blood transcriptome and their impact in terms of cellular functions and at studying associations with age and health-related features, so-called phenotypes, such as obesity, smoking, disease history, and medication status. From a methodical point of view, integrative analysis of molecular “omics” features and of phenotypes challenges the computational analysis framework (de Meulder et al., ). We have previously developed an omics “portrayal” methodology based on self-organizing maps (SOM) machine learning which takes into account the multidimensional nature of gene regulation and pursues a modular view on co-expression, reduces dimensionality, and supports visual perception by delivering “personalized,” case-specific transcriptome portraits (Wirth et al., ; Binder and Wirth, ). This method has been applied to a series of data types and diseases (Hopp et al., ; Kunz et al., ; Bilz et al., ; Loeffler-Wirth et al., ; Nikoghosyan et al., ), among them a study on the blood transcriptomes of sepsis patients framed with pneumonia (Hopp et al., ). In this publication we extend this approach to a much larger data set comprising the blood transcriptomes of thousands of nominally healthy individuals and of associated phenotype data. Figure 1 provides a schematic overview: SOM-portrayal permits a detailed description and visualization of the molecular heterogeneity of transcriptional states and of their association with different phenotypes. Our approach is expected to provide a detailed view of the blood transcriptome of a healthy population as a function of age, sex, and obesity status. It provides a methodical framework applicable to large data sets in the context of personalized medicine with potential impact for applications in medical diagnostics and molecular and epidemiological research.
Figure 1
Materials and Methods
LIFE-Adult Study and Phenotype Characteristics
The LIFE (-adult) study performed extensive phenotyping of more than 10,000 urban individuals from Leipzig city (Loeffler et al., ). The study was approved by the ethics board of the Medical Faculty of the University of Leipzig. In this publication we analyzed transcriptomic data of whole peripheral blood (WPB) samples, which were obtained from 3,388 adult participants of the study. They roughly divide equally into women and men covering an age range between about 20 and 80 years with a strong bias toward elderly persons (Table 1). The LIFE-adult study overall collected a broad survey of more than 20,000 lifestyle and health items (see Loeffler et al., for details). We made use of selected lifestyle characteristics of the participants such as smoking behavior and alcohol consumption, medication according to ATCs (Anatomical Therapeutic Chemicals) indexing and disease history of the participants collected via questionnaires, blood count data from clinical laboratory including selected serum markers, and body mass index (BMI) (Table 1 and Supplementary Table 1 for details). A list of items and abbreviations used is provided as Glossary in Supplementary File 1.
Blood Transcriptome Sampling, Microarray Measurements, and Data Preprocessing
We made use of pre-processed gene expression data extracted from WPB samples of individuals as provided by the LIFE database. Participant recruitment, blood collection, storage and mRNA preparation, microarray measurements, and primary data pre-processing were carried out by different groups of the LIFE center (Loeffler et al., ). WPB was collected in Tempus Blood RNA Tubes (ThermoFisher, Waltham, MA, USA) and stored at −80°C until further processing. RNA was isolated, hybridized to Illumina HT-12 v4 Expression BeadChips (Illumina, San Diego, CA, USA), and measured on an Illumina HiScan device. Raw probe-level data were extracted using Illumina GenomeStudio. Further pre-processing comprised batch correction, outlier and missing value removal, log-transformation, quantile normalization, and centralization of the expression values of each gene; it was performed with an in-house pipeline described in detail in Supplementary Methods (Supplementary File 1). The final transcriptome data consist of more than 48,000 probe IDs including the expression values of 19,049 genes for each of the individuals.
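The last three pre-processing steps named above (log-transformation, quantile normalization, and gene-wise centralization) can be illustrated with a minimal sketch. It is not the in-house LIFE pipeline, which is described only in the Supplementary Methods; the function names, the log2 base, the toy intensity matrix, and the simple handling of ties are assumptions made for illustration, and batch correction and outlier removal are omitted.

```python
import math

def log2_transform(matrix):
    """Log2-transform a genes x samples matrix of positive intensities."""
    return [[math.log2(v) for v in row] for row in matrix]

def quantile_normalize(matrix):
    """Map every sample (column) onto the same reference distribution."""
    n_genes, n_samples = len(matrix), len(matrix[0])
    columns = [[matrix[g][s] for g in range(n_genes)] for s in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    # Reference: mean of the k-th smallest value taken over all samples.
    reference = [sum(col[k] for col in sorted_cols) / n_samples
                 for k in range(n_genes)]
    result = [[0.0] * n_samples for _ in range(n_genes)]
    for s, col in enumerate(columns):
        # Rank the genes within this sample (ties keep their input order).
        order = sorted(range(n_genes), key=lambda g: col[g])
        for rank, g in enumerate(order):
            result[g][s] = reference[rank]
    return result

def center_genes(matrix):
    """Subtract each gene's mean over all samples (centralization)."""
    centered = []
    for row in matrix:
        mean = sum(row) / len(row)
        centered.append([v - mean for v in row])
    return centered

# Hypothetical raw intensities: 4 genes (rows) x 3 samples (columns).
raw = [
    [120.0, 150.0, 90.0],
    [800.0, 640.0, 1000.0],
    [35.0, 50.0, 400.0],
    [300.0, 280.0, 310.0],
]
normalized = quantile_normalize(log2_transform(raw))
expr = center_genes(normalized)
```

After these steps every sample shares the same value distribution, and each gene's profile fluctuates around zero, which is the form of input expected by the SOM training described in the next subsection.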
Self-Organizing Maps (SOM) Transcriptome Portrayal
Pre-processed expression values were analyzed using the oposSOM pipeline, available as the R-package “oposSOM” (Löffler-Wirth et al., ). It uses SOM neural network machine learning to translate the high-dimensional expression data of N = 19,049 gene transcripts into K = 10,000 metagene expression data per individual (Wirth et al., , ). Each metagene represents a “micro”-cluster of co-expressed genes showing mutually similar expression profiles across the samples. Metagenes were arranged in a 100 × 100 two-dimensional grid coordinate system and colored according to their expression level for each sample, thus providing a “personalized” image of the blood transcriptome of each individual studied (Supplementary Figure 1A; for Supplementary Figures see Supplementary File 1). The size of the SOM was chosen such that downstream analysis results, e.g., the number of spots detected, are virtually insensitive to it, based on previous systematic adjustments of the method (Binder and Wirth, ). Mean portraits of transcriptome classes (see below) were calculated by averaging metagene expression values over all portraits of the respective group. The default color scale (red to blue for maximum to minimum expression, respectively) of the portraits uses log-expression values of the metagenes (Wirth et al., ). The diversity of the sample portraits was visualized using a graph representation called a “correlation network” as implemented in “oposSOM” (Löffler-Wirth et al., ). Downstream analysis of the SOM-portraits then provides quantitative features such as modules and lists of co-regulated genes and information about gene functions using enrichment techniques (see next subsections) together with statistical evaluation as described previously (Wirth et al., , ) and implemented in the oposSOM software (Löffler-Wirth et al., ).
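The principle of SOM portrayal can be sketched in a few lines. The following toy implementation is not the oposSOM code; it is a generic online SOM with a Gaussian neighborhood, and the grid size (4 × 4 instead of 100 × 100), learning-rate schedule, random initialization, and gene profiles are all illustrative assumptions. Each grid node holds one metagene profile across the samples, and the portrait of a sample is the grid of metagene values for that sample.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_node(weights, profile):
    """Grid node whose metagene profile is closest to the given profile."""
    return min(weights, key=lambda node: sq_dist(weights[node], profile))

def train_som(profiles, grid=4, epochs=30, seed=0):
    """Train a small grid x grid SOM on gene profiles (genes x samples)."""
    rng = random.Random(seed)
    dim = len(profiles[0])
    # One metagene profile per node, randomly initialized (assumption).
    weights = {(x, y): [rng.uniform(-1.0, 1.0) for _ in range(dim)]
               for x in range(grid) for y in range(grid)}
    total_steps = epochs * len(profiles)
    step = 0
    for _ in range(epochs):
        order = list(range(len(profiles)))
        rng.shuffle(order)
        for i in order:
            progress = step / total_steps
            rate = 0.5 * (1.0 - progress) + 0.01          # decaying learning rate
            radius = max(grid / 2.0 * (1.0 - progress), 0.5)  # shrinking neighborhood
            profile = profiles[i]
            bmu = best_node(weights, profile)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                influence = math.exp(-grid_d2 / (2.0 * radius ** 2))
                for k in range(dim):
                    w[k] += rate * influence * (profile[k] - w[k])
            step += 1
    return weights

def portrait(weights, sample, grid):
    """Metagene values of one sample arranged on the SOM grid."""
    return [[weights[(x, y)][sample] for x in range(grid)] for y in range(grid)]

# Hypothetical centered profiles: 6 genes x 4 samples, two co-expression groups.
genes = [
    [1.0, 0.9, -1.0, -0.8],
    [0.8, 1.1, -0.9, -1.0],
    [1.1, 1.0, -0.8, -1.1],
    [-1.0, -0.9, 1.0, 0.9],
    [-0.9, -1.1, 0.8, 1.0],
    [-1.1, -0.8, 1.1, 0.8],
]
som = train_som(genes)
nodes = [best_node(som, g) for g in genes]
first_sample_portrait = portrait(som, sample=0, grid=4)
```

Genes of the two opposing groups are assigned to different grid nodes, so the portrait of the first sample shows high metagene values where the first group resides and low values where the second group resides. Coloring such a grid from red to blue gives the kind of image referred to as a portrait in the text.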
“Spot” Clustering of Co-expressed Genes and Stratification of Samples
Metagenes of similar profiles clustered together, forming “spot-like” red and blue areas of over- and under-expression in the portraits due to the self-organizing properties of the SOM. Each of the spots represents a cluster of mutually correlated genes (Supplementary Figure 1B). The spots were detected using a distance criterion based on the Euclidean distance between neighboring metagenes, where metagenes of maximum mutual distance form closed, halo-like lines around the “spots” (Vesanto, ) (see D-map in Supplementary Figure 1B for illustration). The spot expression patterns obtained represent a characteristic fingerprint of each particular sample. Lists of genes included in each of the spot modules and lists of enriched gene sets are provided as Supplementary Excel tables together with statistical information (Supplementary Tables 2–4 in Supplementary File 1 and Supplementary Files 3, 4, respectively). The overall collection of spot-modules detected constitutes the major nodes of the co-expression network derived from the sample series (see the spot correlation and implication networks in Supplementary Figures 1B,C, respectively). Spot selection criteria were developed and described previously (Wirth et al., , ; Binder and Wirth, ) and have been applied in numerous publications, where they provided reasonable results (Binder et al., , ; Cakir et al., , ; Hopp et al., ,; Hopp et al., ,; Gerber et al., ; Kunz et al., ; Arakelyan et al., ; Loeffler-Wirth et al., ; Nikoghosyan et al., ). Based on the spots detected in the transcriptome portraits we stratified the samples into appropriate groups. First, the portraits were divided into 33 so-called combinatorial pattern-types (cPATs), each defined by a certain unique combination of over-expressed spots as described recently (Loeffler-Wirth et al., ) (Supplementary Figure 2A).
Using the cPATs we estimated the tentative number of groups (Supplementary Figures 2A–C) and used them subsequently in a K-means clustering run, which stratifies the portraits into three major transcriptome types and nine subtypes (STs, Supplementary Figures 2A–C). The transcriptome strata were further characterized by detailed statistics about spot appearance (Supplementary Figures 2C,D) and verified by random splits of the cohort into training and verification subsets, resampling, and subsequent classification using support vector machine (Supplementary Figure 3).
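The idea behind combinatorial pattern types can be sketched as follows: each sample is labeled by the combination of spots that are over-expressed in its portrait, and the frequencies of these labels indicate how many groups are worth considering. The criterion used here for calling a spot over-expressed (a fixed threshold of 0.5 on the mean spot expression) is a hypothetical placeholder, as are the sample names and expression values; the actual criterion is described in the cited publication.

```python
from collections import Counter

def cpat_label(spot_expression, threshold):
    """Label a sample by the sorted combination of its over-expressed spots."""
    active = sorted(spot for spot, value in spot_expression.items()
                    if value > threshold)
    return "".join(active) if active else "none"

# Hypothetical mean spot expression values for four samples.
samples = {
    "s1": {"A": 1.2, "C": 0.9, "I": -0.8, "J": -0.5},
    "s2": {"A": 1.0, "C": 1.1, "I": -0.6, "J": -0.9},
    "s3": {"A": -0.7, "C": -0.4, "I": 1.3, "J": 0.8},
    "s4": {"A": 0.1, "C": 0.2, "I": 0.1, "J": 0.9},
}
labels = {name: cpat_label(expr, threshold=0.5) for name, expr in samples.items()}
frequency = Counter(labels.values())
```

In this toy example two samples share the pattern "AC", while "IJ" and "J" occur once each. Pattern labels of this kind reduce each portrait to a short categorical code, which can then seed a clustering step such as the K-means run described above.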
Function Mining
We applied a gene set analysis to the lists of genes located in each of the spot modules to discover their functional context using a right-tailed Fisher's exact test as described previously (Wirth et al., ). In addition, the gene set enrichment z-score (GSZ) was used to evaluate the impact of the gene sets in the different transcriptomic strata (Wirth et al., ). The GSZ metric considers the mean expression of the gene set normalized by its variance, i.e., it provides high values for homogeneous gene sets reflecting the activation of biological functions with high relevance for the respective transcriptional states. Gene set maps complement this analysis by visualizing the positions of the genes of a set within the SOM grid. According to their degree of accumulation in or near the spots, one can deduce their potential functional context (Wirth et al., ).
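For illustration, the right-tailed Fisher's exact test for the over-representation of a gene set in a spot is equivalent to the upper tail of a hypergeometric distribution. The sketch below computes it with exact integer arithmetic; the function name and the example counts (a spot of 200 genes, a gene set of 300 genes, 25 shared genes) are hypothetical, and only the total of 19,049 genes is taken from the text.

```python
from math import comb

def fisher_right_tailed(overlap, spot_size, set_size, n_genes):
    """P(X >= overlap) when spot_size genes are drawn from n_genes,
    of which set_size belong to the gene set (hypergeometric upper tail)."""
    upper = min(spot_size, set_size)
    total = comb(n_genes, spot_size)
    tail = sum(comb(set_size, i) * comb(n_genes - set_size, spot_size - i)
               for i in range(overlap, upper + 1))
    return tail / total

# Small case that can be checked by hand: 10 genes, set of 4, spot of 3, overlap 2.
p_small = fisher_right_tailed(2, spot_size=3, set_size=4, n_genes=10)

# Hypothetical spot and gene set sizes on the scale of this study.
p_spot = fisher_right_tailed(25, spot_size=200, set_size=300, n_genes=19049)
```

In the small case the tail is (6 · 6 + 4 · 1) / 120 = 1/3. In the second case about three shared genes would be expected by chance, so an overlap of 25 yields a very small p-value.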
Phenotype Portrayal
Phenotype information of the participants comprises their blood cell and marker counts, BMI, information about their lifestyle (smoking and alcohol consumption), and medication and disease history (Table 1 and Supplementary Table 1). The enrichment of categorical phenotypic characteristics in each of the transcriptomic classes (types and subtypes) was estimated using a one-tailed Fisher's exact test and visualized as enrichment heatmaps. Phenotype-to-metagene correlation maps were generated by correlating each of the phenotype parameter-profiles over all participants with each of the metagene expression profiles. For categorical phenotypes, correlation maps were obtained by calculating the point-biserial correlation between the expression profile of each metagene and the respective phenotype profile. The point-biserial correlation in effect reflects the difference between the portraits of blood transcriptomes showing the respective phenotype and those of all others. The matrix of correlation coefficients obtained was visualized in the SOM-grid as “phenotype” portraits using a red-to-blue (maximum-to-minimum correlation) color-code. The metagene of maximum correlation coefficient was marked in the SOM-grid of a phenotype overview map. Expression of each of the spots was fitted using multiple regression with the phenotype values of the participants of each of the categories as variables. Standardized regression coefficients and their p-values were then visualized as heatmaps (Supplementary File).
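The relation between the point-biserial correlation and the difference of group means can be made explicit with a short sketch. The function names and the toy metagene profile are assumptions for illustration; the code shows that the coefficient equals the scaled difference between the mean expression of participants with and without the phenotype, and that it coincides with the Pearson correlation when the phenotype is coded as 0/1.

```python
import math

def point_biserial(values, flags):
    """Point-biserial correlation of a continuous profile with a 0/1 phenotype."""
    n = len(values)
    with_pheno = [v for v, f in zip(values, flags) if f]
    without = [v for v, f in zip(values, flags) if not f]
    mean1 = sum(with_pheno) / len(with_pheno)
    mean0 = sum(without) / len(without)
    mean_all = sum(values) / n
    # Population standard deviation over all participants.
    sd = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    scale = math.sqrt(len(with_pheno) * len(without) / n ** 2)
    return (mean1 - mean0) / sd * scale

def pearson(x, y):
    """Plain Pearson correlation, used here only as a cross-check."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical metagene expression of six participants and a binary phenotype.
metagene = [0.9, 1.1, 0.8, -0.2, 0.1, -0.3]
phenotype = [1, 1, 1, 0, 0, 0]
r_pb = point_biserial(metagene, phenotype)
r_pearson = pearson(metagene, phenotype)
```

Applying `point_biserial` to every metagene profile in turn yields one coefficient per grid node, i.e., the values that are color-coded in a phenotype portrait.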
Availability of Data and Software
Processed transcriptomic data of this study are available as “SOM-data” via the Leipzig Health Atlas under the links https://www.health-atlas.de/data_files/76?version=1 and https://www.health-atlas.de/som_browser/201611_LIFE_Transcriptome/Summary.html (pdf and html reports). Data can be explored interactively using the oposSOM browser functionality available under https://www.izbi.uni-leipzig.de/opossom-browser/ and https://apps.health-atlas.de/opossom-browser/?dataset=6. Raw expression data and participant information can be requested from the LIFE Consortium (www.life.uni-leipzig.de/en/). The oposSOM program (Löffler-Wirth et al., ) is available under https://rdrr.io/github/hloefflerwirth/oposSOM/.
Results
The Blood Transcriptome Splits Into Three Types
SOM analysis provided one portrait for each of the WPB transcriptomes of the 3,388 LIFE-adult participants (Supplementary File 2 and Supplementary Figure 1A). For the stratification of samples we made use of the so-called combinatorial spot patterns approach (cPATs, see also next subsection), which largely reduces the dimension of the data, and of subsequent clustering as described in detail previously (Loeffler-Wirth et al., ), in the Methods section, and in Supplementary Figure 2. The associated cluster tree is shown in Supplementary Figure 2A and Figure 2A. Overall, we identified three major strata of transcriptomes called type 1, type 2, and type M (q = 0.003, ANOVA; classification error: 10% of samples after resampling and SVM-based re-classification, see Supplementary Figure 3B). The pairwise correlation map illustrates the similarities between the types in terms of Pearson's correlation coefficients between the expression portraits (Figure 2A). Type 1 and type 2 show pronounced anti-correlated expression portraits while type M forms an intermediate group. The network presentation reveals that WPB transcriptomes of type 1 and type 2 split into separate clusters while type M samples overlap between them (Figure 2B). The functional context of activated genes was estimated using gene set analysis (Figure 2A, lower part). Type 1 was associated with functional categories related to oxygen transport, heme metabolism, neutrophil accumulation, and repressed chromatin states of T cells while the type 2 group was related to immune response, transcriptional activity, T cell accumulation, and active chromatin states (see below). A higher percentage of men than of women was found in type 1 (29% vs. 19%), while this relation reverses for type 2 (37% of men vs. 51% of women; Figure 2C). Type 1 was more populated with elderly persons compared with type 2, while the age distribution differed between women and men (Figure 2D).
The composition of types for women changed virtually monotonically, with a steadily increasing percentage of type 1, in contrast to men, who showed a maximum of type composition in the age range of 50–55 years. Note also that the age dependence of type M resembled that of type 1 more than that of type 2, which suggests a functional correspondence between types M and 1 (see below). The type-composition of men and women was virtually independent of BMI (body mass index) except for very obese persons (BMI > 35 kg/m²), who appeared to be more frequent among type 1 transcriptomes (Figure 2D).
Figure 2
Taken together, we identified two major blood transcriptome types and an intermediate type partly resembling type 1. Type 1 included more men and elderly participants and showed upregulation of genes associated with inflammation and increased heme metabolism, while type 2 included more women and younger participants and was associated with activated immune responses and transcriptional activity. The composition of types changes in a gender- and age-specific fashion.
A Modular Map of Gene Activation
Clusters of genes with correlated expression profiles appear as red spot-like areas in the transcriptomic portraits, which indicate their overexpression in the respective samples (Supplementary Figure 1A). Overall we identified 13 such major overexpression spots and labeled them with capital letters A–M (Figure 3A, for spot lists of genes see Supplementary Table 3 and Supplementary File 3 and for enriched gene sets Supplementary File 4). The map roughly divides into two major areas containing spots predominantly upregulated either in type 1 (and partly also type M) or type 2 samples, respectively, and a third area with mixed spot assignment as illustrated by mean portraits of the transcriptomic types (Figure 3B), the spot profiles (Figure 3C and Supplementary Figure 4), and their correlation network (Figure 3D). Gene maps indicate the positions of genes taken from selected functional gene sets within the SOM grid of metagenes (Figure 3A). For example, genes upregulated in erythrocytes and platelets accumulate in spots C and N (up in type 1), respectively, while genes associated with mitochondrial function and RNA processing are found in spots E and G. Signature genes of T cells and of ribosomal function accumulate in and near spots I and J (up in type 2). Spot H accumulates the signature of CD4 cytotoxic T lymphocytes (CTLs) including the marker genes GZMA and PRF1, which were recently found to be associated with extreme longevity (Hashimoto et al., ). Genes with functions in interferon (IFN) response accumulate in spot L without preferential upregulation in one of the three types. Differential gene expression analysis between the types revealed a considerably larger number of genes upregulated in type 1 compared with type 2 (Supplementary Figure 5).
Figure 3
Typically, each of the individual sample portraits shows more than one spot, which reflects the parallel activation of different transcriptional programs and/or their mutual couplings. We subsume frequently observed combinations of expressed spots as so-called combinatorial pattern types (cPATs) using a method described previously (Loeffler-Wirth et al.,
Footprints of Functions: Cellular Programs, Infections, Telomeres, and Epigenetics
Next, we performed functional analysis of the transcriptome strata using gene sets taken from the functional categories “biological process” (Subramanian et al.,
Figure 4

Functional characteristics and previous signatures of the blood transcriptome: (A) Signatures of the GO-term biological process (BP) roughly group into processes upregulated in type 1 (green), type 2 (apricot color), and in samples showing high expression in all types (blue) (see also Supplementary Figure 7). (B) Gene signatures associated with bacterial (Néemeth et al.,
Profiling the function signatures splits them into two major clusters either upregulated in type 1 (marked with green color in the figures) or type 2 (apricot color), respectively. Gene signatures taken from the gene ontology category “biological process” reveal that type 2 associates with the activation of cell cycling, MYC-target genes, and oxidative phosphorylation (oxphos), while inflammation, hypoxia, coagulation, reactive-oxygen species, and the signaling pathways of TNFalpha, TGFbeta, PI3K-Akt-MTOR, and IL6-JAK-Stat3 are activated in type 1. A third cluster (blue color) accumulates signatures related to interferon (IFN) response, which possibly suggests an association with viral infections (Figure 4A). We analyzed expression signatures derived recently to differentiate between bacterial and viral infections (Néemeth et al.,
Next we analyzed the expression of gene sets assigned to distinct chromatin states in blood cells under healthy conditions, among them T-, B-, and T-regulatory cells (Figure 4E and Supplementary Figure 10). States involving genes with an active promoter (TssA) and a completed transcription (Tx) were expected to show high expression, while repressed promoter states were expected to show low expression levels. This relation was indeed observed in type 2 transcriptomes; however, it was reversed in type 1. This reversal suggests de-repression of nominally repressed states and repression of active states in type 1 transcriptomes by epigenetic chromatin re-modeling. We recently demonstrated that differentiation and adjustment of cellular programs are governed by subtle cooperation of transcription factor (TF-) networks and epigenetics, e.g., via regulation of the polycomb repressive complex 2 (PRC2) and its targets (Thalheim et al.,
Previous Gene Expression Signatures of the Blood Transcriptome
Next, we analyzed a series of expression signatures taken from previous, independent studies of blood transcriptomes (Chaussabel et al.,
Next, we made use of a repertoire of 382 functionally annotated expression modules extracted from a recent meta-analysis of the blood transcriptomes of 16 disease and physiological states (Altman et al.,
Correlation analysis of different previous blood signature sets (Chaussabel et al.,
In summary, the comparison of previous blood signatures with our data shows that our spot-modules represent a sort of minimum set describing co-expression of the blood transcriptome. This set expands into a rich collection of functional annotations including molecular mechanisms, cellular programs, and cell types but also lifestyle factors, diseases, and aging effects. Finally, the comparison verifies our blood types using independent data.
Blood Cell Signatures and Seasonal Effects
Gene sets implemented in blood cell deconvolution algorithms such as Cibersort (Newman et al.,
Recent studies report seasonal changes of gene expression of the blood transcriptome and of blood cell counts (de Jong et al.,
Phenotype Portrayal: Blood Cell Counts, Lifestyle, Medication, and Disease History
Previous blood transcriptome studies also extracted gene signatures which were associated with health-related features such as BMI (body mass index) and smoking status and also with the development of different diseases such as heart failure (Tan et al.,
Figure 5

Association of selected features (phenotypes) with the transcriptome landscape of blood: (A) Phenotype (correlation) portraits visualize the correlation between metagene expression profiles and the profiles of selected phenotypes in a red-to-blue color scale. The correlation overview maps for each of the categories mark the metagene of maximum correlation coefficient for each of the phenotypes studied [see the legend in (C) and also Supplementary Table 1 for assignment of phenotypes]. (B) Distribution of samples of each of the phenotypes among the transcriptome types. Obese men and men consuming alcohol (>30 g/day) accumulate (red circles) in type 1 transcriptomes. Participants with different disease histories are also enriched in type 1 in a gender-specific fashion, while medication is most prevalent in women of type 1. Type 2 and type M refer to underweight and normal weight women and to smoking men, respectively. Enrichment of blood count data is provided in Supplementary Figure 20. Correlation maps and further details are presented for each of the phenotype categories in Supplementary Figures 21–25.
We found that most blood count data correlate either with type 1 (e.g., erythrocytes, reticulocytes, platelets, neutrophils) or type 2 (lymphocytes) transcriptomes in agreement with the blood cell transcriptomes analyzed above. Smokers, alcohol consumers (>30 g/day), obese and elderly people, men, and participants taking different categories of medication according to the ATC (Anatomical Therapeutic Chemicals) classification and also participants with different self-reported lifetime diseases show preferences for type 1 (and partly type M) transcriptomes while younger, under- and normal-weight participants, women, and non-consumers of medication associated preferentially with type 2. The degree of correlation with metagene expression was markedly higher for blood counts compared with the other phenotypes (Figure 5C).
Some of the blood count portraits indicated fingerprint-like correlation patterns specific for the different blood components (Figures 5A,B, Supplementary Figures 20, 21, and Figure 4H). The portraits of the phenotypes of the other categories partly resembled those of blood counts, thus reflecting a close association between them. For example, the “aging” portrait (visualizing the correlation between age and transcriptome) can be understood as the superposition of the red blood cell (RBC)- and neutrophil (NE)-phenotype portraits indicating the increased levels of RBC and NE in elderly people (see next subsection). The “alcohol consumption” portrait also resembled the RBC-portrait while smoking revealed an eosinophil (EO)-like pattern. Increased eosinophil counts in smokers associated with lung function were reported for humans (Jensen et al.,
Some of the medication and disease history portraits can be interpreted similarly. They reflect the fact that medication usage and disease incidence are more prevalent in elderly people (see the mean age data of each of the phenotypes listed in Supplementary Table 1) and consequently are associated with increased RBC- and NE-levels and decreased lymphocyte (LY) counts (Supplementary Figures 24, 25).
Other phenotype portraits, e.g., those of different age ranges (see next subsection) and of different medications, cannot be simply interpreted as composites of the blood count portraits. For a more detailed view we performed correlation and multiple regression analysis to estimate the particular effect of phenotypes on spot expression (Supplementary Figures 21C–F, 25C–F). We found a close relationship between high correlation coefficients and significant contributions of phenotype-coefficients (p < 10⁻⁶), especially for spots located in the lower left and upper right corners of the map. These refer, first of all, to age, obesity, gender, RBC, white blood cell (WBC), and LY counts, to medications of the groups C (cardiovascular system) and B (blood forming organs), and to the previous diseases HL (hyperlipidemia), DIA (diabetes), HT (hypertension), and CAN (cancer).
In summary, phenotype portrayal visualizes fine structures of the effect of health and lifestyle factors on the blood transcriptome. They reflect alterations of blood cell composition and presumably also the specifics of the transcriptional programs activated in the different cells. The transcriptome types (and subtypes) resolve the heterogeneity of blood transcriptomes while the spot modules provide a metric for its quantification. Overall, the phenotype portraits enable an intuitive, perception-based interpretation in terms of function and mutual associations between the different features.
Portrayal of Aging
Aging and alterations of the BMI are accompanied by changes of the composition of transcriptome types in a gender-specific fashion (Figure 2D). Functional analysis shows that expression of type 1_up transcriptomes gains with age while the expression of type 2_up decays on average (see the plots of age-ranked samples in Supplementary Figures 7–13, all showing enrichment of type 2 transcriptomes at younger ages and of type 1 transcriptomes at higher ages). Plots of spot expression as a function of age and BMI reveal further details (Figure 6A). Spot expressions related to red blood cell (spot C) and platelet (spot N) characteristics increase as a function of age and BMI, with differences between the mean LOESS-curves for men and women (compare the red and blue curves) in correspondence with the blood count data (Supplementary Figure 20). In turn, the expression curves of spots related to immunity (I and J) decay with age and BMI in a nearly sex-independent fashion. On the other hand, the curves show similar courses at different levels for the transcriptomic types, which suggests type-independent aging tendencies. The aging curves are partly non-linear: the slopes get steeper for ages above 55–60 years (e.g., for spots A and I, indicative of inflammation and immune response, respectively) or above 65–70 years (spot L, IFN response), which suggests altered mechanisms in elderly people above certain age thresholds. Importantly, individual expression values of the spots show high variance about the LOESS-curves, largely exceeding the mean changes observed over the age range studied between 40 and 80 years. This result suggests that the inter-individual variability of the activity of underlying molecular programs exceeds the intra-individual changes upon aging.
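The comparison between the mean aging trend and the scatter of individual values about it can be sketched with a simple LOESS-type smoother. The code below is a minimal local linear regression with tricube weights, written for illustration; the function name `loess`, the span `frac`, and the synthetic data are assumptions and do not reproduce the fitting routine or parameters used for Figure 6A.

```python
import numpy as np

def loess(x, y, x_eval, frac=0.5):
    """Local linear regression with tricube weights (LOESS-type smoother)."""
    n = len(x)
    k = max(int(np.ceil(frac * n)), 3)        # neighbours per local fit
    fitted = np.empty(len(x_eval))
    for i, x0 in enumerate(x_eval):
        d = np.abs(x - x0)
        h = np.sort(d)[k - 1]                  # bandwidth: k-th nearest distance
        w = np.clip(1 - (d / h) ** 3, 0, None) ** 3
        X = np.column_stack([np.ones(n), x - x0])
        WX = X * w[:, None]
        coef = np.linalg.solve(X.T @ WX, WX.T @ y)
        fitted[i] = coef[0]                    # local intercept = fit at x0
    return fitted

# Synthetic example: weak mean trend, strong inter-individual scatter (assumed)
rng = np.random.default_rng(2)
age = rng.uniform(40, 80, 2000)
expr = 0.005 * (age - 40) + rng.normal(scale=0.5, size=age.size)

grid = np.linspace(40, 80, 41)
fitted = loess(age, expr, grid)

trend_range = fitted.max() - fitted.min()              # mean change over age
resid_sd = np.std(expr - np.interp(age, grid, fitted)) # scatter about the curve
```

When `resid_sd` exceeds `trend_range`, as in this toy data set, the spread between individuals of the same age is larger than the mean change over the whole age range, which is the situation described above.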
Recent longitudinal follow-up studies on different molecular markers indeed show that inter-individual age-dependencies strongly scatter about the mean aging curve and presumably better describe aging trends than the overall curve (Alpert et al.,
Figure 6

Aging and BMI characteristics of the blood transcriptome: (A) Expression of selected spots as a function of age and BMI. Separate LOESS (local weighted scatterplot smoothing) fits for women and men (red and blue curves) and for types 1, M, and 2 visualize mean spot expressions as a function of age and BMI. The course is mostly non-linear and changes slope at different turning points (see arrows). (B) Genes from previous aging sets (Peters et al.,
Note also that the scattering of individual values about the LOESS-curve is larger for spots showing increasing expression with age (e.g., spots A, C, and N) than for spots of decaying mean expression (e.g., spots I, J) which is in parallel with the larger heterogeneity of associated processes (see below). Gene maps of previous aging signatures (Peters et al.,
The mean aging portrait (“all ages” in Figure 6C) corresponds to the distribution of aging_up and aging_dn genes of the aging signature (Peters et al.,
Obesity and Serum Markers
The mean BMI-portrait (“all BMI” in Figure 6D) shows characteristics of type 1 transcriptomes without the NE-like patterns and the elevated expression of spot L (IFN-response) observed in the respective aging portrait. Interestingly, the BMI-stratified portraits “switch” from type 2 into type 1 for obese women and men (BMI > 30 kg/m²), due to gained (positive) correlations between BMI and inflammatory (spot A), RBC- (spot C), and platelet (spot N) characteristics, on the one hand, and decaying immune response (spots I, J) expression signatures on the other. This behavior is possibly associated with the so-called obesity-paradox claiming that an intermediate BMI of about 25 kg/m² is associated with minimum health risk (Wild and Byrne,
For further comparison, we generated phenotype (correlation) portraits of four selected serum protein markers (Figure 6E). The portraits of hsCRP (human serum C-reactive protein) and of cystatin C reflect footprints of inflammation (spot O) and IFN-response (spot L) in the blood transcriptome, which were associated with NE-like patterns of the blood counts. The portrait of ferritin closely resembled that of RBC, reflecting correspondence between the level of stored iron and erythrocyte expression (spot C). The transferrin portrait revealed a different pattern associated with the diminished spots O (inflammation) and especially L (IFN-response) and the enhanced spot N (thrombocytes), possibly due to the role of platelets in iron transport (Brieland et al.,
Discussion
We “portrayed” the diversity of the blood transcriptome of a cohort of more than 3,000 nominally healthy adult individuals included in the Leipzig Health “LIFE-adult” Study in terms of intuitive SOM-images and classified them into three major transcriptome types. The expression patterns decomposed into a minimum set of modules of co-regulated genes. Their functional impact can be interpreted based on the results of previous blood transcriptome studies. Finally, we associated the blood transcriptomes with a series of phenotype-features collected in the study for the same participants such as age, obesity-status, blood cell count, disease history, and medication by means of phenotype portraits. Overall, machine learning provided a comprehensive characterization of the diversity of the blood transcriptome taking into account the whole spectrum of transcriptional states on a population-wide scale in the context of health and lifestyle factors.
Overall, the strength of the study lies in the large and novel set of molecular and associated phenotype data and in the comprehensive description of the blood transcriptome in terms of a holistic approach, which extracts, describes, and visualizes the multidimensional relationships between intrinsic modes of variation and their associations with health and lifestyle factors. Its limitations, on the other hand, can be seen in the fact that the visualization capabilities partly mask the evaluation of rigor and stringency in comparing different conditions, which requires separate ways of presentation. Another limitation is the solely cross-sectional design, which impedes full disentanglement of relations between individual and population-averaged trends.
SOM-Portrayal Reduces Dimensions of the Blood Transcriptome
Dimension-reduction and feature extraction are important issues in high-throughput data analysis (Binder et al.,
The tree in Figure 7A illustrates the similarities between the subtype portraits, which are virtually linearly arranged along a common backbone. The portraits at the left and right margins (type 1-vs.-type 2) differ mainly in the antagonistic expression of genes located in opposite corners of their portraits. Our analysis thus uncovered a striking simplicity of the transcriptome at the coarsest level of approximation. It reflects characteristic alterations of transcriptional programs referring to different cell components, namely a decrease in signatures of myeloid-lineage cells and an increase of signatures of lymphocytes from the left to the right. The transcriptional (spot-) modules diversify these basic patterns in a subtype-specific fashion. Namely, they indicate a continuous expression change along the subtypes related to immune response (spots I, J) and cytotoxic cells (H) with potential impact on longevity and, in addition, subtype-specific expression related to erythrocytes and platelets (C, N) giving rise to gender-specific differences. A third category shows activation patterns spread over all subtypes related to IFN-response, partly reflecting viral infections. It increases, on average, in elderly people, especially above 65 years.
Figure 7

Portrayal of the blood transcriptome: (A) The similarity tree reflects a virtually linear arrangement of subtypes due to a continuum of transcriptional states ranging from type 1 to type 2. Selected phenotype portraits illustrate correlation of the respective features with the transcriptomes of different types. (B) Main functions and phenotypes associating with the transcriptome types.
Footprints of Aging, Telomere Maintenance, and Epigenetics
On a cross-sectional population scale our data provide information about aging between mid-life (30–50 years) and elderly (70–80 years) women and men. In addition to the systematic changes of inflammation characteristics and immune response, aging relates to epigenetic factors and to telomere length dynamics (Figure 7B). Telomeres, serving as protective nucleoprotein structures that cap the ends of chromosomes, shorten systematically with age as a result of repeated cell divisions (Mather et al.,
Our analysis also emphasizes the importance of epigenetic mechanisms, particularly of chromatin (re-) organization for changes of the blood transcriptome. We found a pronounced mutual switching between type 1 and type 2 transcriptomes using gene expression of nominally repressed and activated chromatin states in blood cells as an indicator of gene activity. This result suggests that part of active states in type 2 become repressed in type 1 and vice versa, that part of repressed states in type 2 become activated in type 1. Hence, part of the expression changes observed were associated with changed chromatin organization leading to altered cell function as discussed in the context of aging (Ciccarone et al.,
Transcriptome typing and modularization describe the effect of age and BMI on the blood transcriptome, and in a wider context, on a human's physiology via association with lifestyle characteristics. The percentage of type 1 transcriptomes in the population relating to inflammation gains with age and, to a lesser degree, with BMI in a non-linear, gender-specific fashion. It is known that obesity is associated with leukocytosis representing a state of chronic low-grade inflammation (Herishanu et al.,
Conclusions
Machine learning offers a promising option to analyze omics data sets in the epidemiological context. We characterized the human blood in terms of transcriptome types and functional gene modules and their association with health-, lifestyle- and age-related phenotypes. This characterization has implications for future applications in diagnosis and prognosis via the refinement of existing and the development of novel predictors for age, lifestyle, and disease outcomes. The individual portrayal of transcriptomes and of their associations with phenotype features in terms of easy-to-interpret images offers perspectives for visual perception-based personalized diagnostics. Large scale longitudinal studies and paired transcriptome-epigenome investigations are needed to better understand lifetime courses, causal relationships, and mechanisms of (epi-)genomic regulation.
Statements
Data availability statement
The data that support the findings of this study are available from the LIFE center but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of LIFE. Secondary data are available as SOM-data via the Leipzig Health Atlas under the link https://www.health-atlas.de/data_files/76?version=1 and https://www.health-atlas.de/som_browser/201611_LIFE_Transcriptome/Summary.html (pdf and html reports). Data can be interactively discovered using the oposSOM browser functionality available under https://www.izbi.uni-leipzig.de/opossom-browser/ and https://apps.health-atlas.de/opossom-browser/?dataset=5.
Ethics statement
The studies involving human participants were reviewed and approved by the ethics board of the Medical Faculty of the University of Leipzig. The patients/participants provided their written informed consent to participate in this study.
Author contributions
HB and HL-W: conceived the study. MS and HB: wrote this paper. MS, LH, HB, and HL-W: performed analysis. HL-W, MS, and AA: downstream analysis methods development. HK: preprocessing of transcriptomics data. KW, CE, RB, and KK: collection and curation of phenotype data. ML and JT: coordinators of LIFE research center. All authors read and approved the final manuscript.
Funding
This publication was supported by LIFE—Leipzig Research Center for Civilization Diseases, Leipzig University funded by means of the European Social Fund, the Free State of Saxony. This work was further supported by the Federal Ministry of Education and Research (BMBF) i:DSem project Leipzig Health Atlas (www.health-atlas.de, to HB, HL-W, and ML), the collaborative projects with Armenia PathwayMaps (WTZ ARM II-010 and 01ZX1304A to HB and AA) and oBIG (FFE-0034 to HL-W), and the Systems Biology programme project CapSys (to HB, LH, and ML). The author(s) acknowledge support from the German Research Foundation (DFG) and Universität Leipzig within the program of Open Access Publishing.
Acknowledgments
This manuscript has been released as a pre-print at ResearchSquare (Schmidt et al.,
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fdata.2020.548873/full#supplementary-material
- A
Alimentary tract and metabolism
- AP
Angina pectoris
- ALC > 30
Participants consuming more than 30g alcohol per day
- ALC ≤ 30
Participants consuming less than 30g alcohol per day
- ALC
Alcohol consumption
- ART
Arthrosis
- AST
Asthma
- ATC
Anatomical Therapeutic Chemical classification system of medication
- B
Blood and blood forming organs
- BA
Basophils absolute (10⁹/l)
- BAP
Basophils (%)
- BD
Sepsis type “blood disturbant”
- BMI
Body mass index
- BP
GO-term “biological process”
- C
Cardiovascular system
- CAN
Cancer
- CAP
Community acquired pneumonia
- CAT
Cataract
- cPAT
Combinatorial pattern type
- CTL
Cytotoxic T lymphocytes
- D
Dermatologics
- DEP
Depression
- DIA
Diabetes
- DNMT1
DNA-methylation maintenance methyltransferase
- EO
Eosinophils absolute (10⁹/l)
- EOP
Eosinophils (%)
- EXSMO
Ex-smoker
- G
Genitourinary system and sex hormones
- GLA
Glaucoma
- GO
Gene ontology
- GOU
Gout
- GSZ
Gene set enrichment z-score
- H
Systemic hormonal preparations, excl. sex hormones and insulins
- HA
Heart attack
- HCT
Hematocrit (l/l)
- HEP
Hepatitis
- HGB
Hemoglobin (SI units, mmol/l)
- HGBK
Hemoglobin (conv. units, g/dl)
- HL
Hyper-lipidemia
- HS
Sepsis type “high severity”
- hsCRP
Human serum C-reactive protein
- HT
Hypertension
- HZO
Herpes zoster
- IFN
Interferon
- J
Anti-infective for systemic use
- L
Antineoplastic and immunomodulating agents
- LIFE(-adult)
Leipzig Research Center for Civilization Diseases
- LOESS
Locally estimated scatterplot smoothing
- LS
Sepsis type “healthy and low severity”
- LY
Lymphocytes absolute (10⁹/l)
- LYP
Lymphocytes (%)
- M
Muscular-skeletal system
- MCH
Mean corpuscular hemoglobin (SI units, fmol)
- MCHC
Mean corpuscular hemoglobin concentration (SI units, mmol/l)
- MCHCK
Mean corpuscular hemoglobin concentration (conv. units, g/dl)
- MCHK
Mean corpuscular hemoglobin (conv. units, pg)
- MCV
Mean corpuscular volume (fl)
- MO
Monocytes absolute (10⁹/l)
- MOP
Monocytes (%)
- MPV
Mean platelet volume (fl)
- MS
Sepsis type “medium severity”
- N
Nervous system
- NE
Neutrophils absolute (10⁹/l)
- NEP
Neutrophils (%)
- NONSMO
Non-smoker
- nwt
Normal weight
- ob
Obese
- P
Antiparasitic products, insecticides, and repellents
- PLT
Platelets (10⁹/l)
- PRC2
Polycomb repressive complex 2
- pre-ob
Pre-obese
- R
Respiratory system
- RBC
Erythrocytes (10¹²/l)
- RETI
Reticulocytes (/1000)
- RHE
Rheuma
- S
Sensory organs
- SEP
Sepsis
- SMO
Smoker
- SMO
Smoking
- SOM
Self-organizing maps
- ST
Subtype
- TF
Transcription factor
- THY
Thyroid
- TM
Telomere length maintenance
- TssA
Genes with active promoter
- Tx
Genes with completed transcription
- uwt
Underweight
- V
Various
- WBC
Leucocytes (10⁹/l)
- WPB
Whole peripheral blood.
Glossary
References
1
AhadiS.ZhouW.Schüssler-Fiorenza RoseS. M.SailaniM. R.ContrepoisK.AvinaM.et al. (2020). Personal aging markers and ageotypes revealed by deep longitudinal profiling. Nat. Med.26, 83–90. 10.1038/s41591-019-0719-5
2
AlpertA.PickmanY.LeipoldM.Rosenberg-HassonY.JiX.GaujouxR.et al. (2019). A clinically meaningful metric of immune age derived from high-dimensional longitudinal monitoring. Nat. Med.25, 487–495. 10.1038/s41591-019-0381-y
3
AltmanM. C.RinchaiD.BaldwinN.WhalenE.GarandM.KabeerB. A.et al. (2019). A novel repertoire of blood transcriptome modules based on co-expression patterns across sixteen disease and physiological states. BioRxiv [Preprint]525709. 10.1101/525709
4
Andres-TerreM.McGuire HelenM.PouliotY.BongenE.Sweeney TimothyE.Tato CristinaM.et al. (2015). Integrated, multi-cohort analysis identifies conserved transcriptional signatures across multiple respiratory viruses. Immunity43, 1199–1211. 10.1016/j.immuni.2015.11.003
5
ArakelyanA.NersisyanL.NikoghosyanM.HakobyanS.SimonyanA.HoppL.et al. (2019). Transcriptome-guided drug repositioning. Pharmaceutics11:677. 10.3390/pharmaceutics11120677
6
BairdA. E.SoperS. A.PullagurlaS. R.AdamskiM. G. (2015). Recent and near-future advances in nucleic acid-based diagnosis of stroke. Expert Rev. Mol. Diagn.15, 665–679. 10.1586/14737159.2015.1024660
7
BalliuB.DurrantM.GoedeO. D.AbellN.LiX.LiuB.et al. (2019). Genetic regulation of gene expression and splicing during a 10-year period of human aging. Genome Biol.20:230. 10.1186/s13059-019-1840-y
8
BarthelF. P.WeiW.TangM.Martinez-LedesmaE.HuX.AminS. B.et al. (2017). Systematic analysis of telomere length and somatic alterations in 31 cancer types. Nat. Genet.49, 349–357. 10.1038/ng.3781
9
BellC. G.LoweR.AdamsP. D.BaccarelliA. A.BeckS.BellJ. T.et al. (2019). DNA methylation aging clocks: challenges and recommendations. Genome Biol.20:249. 10.1186/s13059-019-1824-y
10
BilzN. C.WillscherE.BinderH.BöhnkeJ.StaniferM. L.HübnerD.et al. (2019). Teratogenic rubella virus alters the endodermal differentiation capacity of human induced pluripotent stem cells. Cells8:870. 10.3390/cells8080870
11
BinderH.HoppL.LembckeK.WirthH. (2015). Personalized disease phenotypes from massive OMICs data, in Big Data Analytics in Bioinformatics and Healthcare, eds BaoyingW.RuowangL.WilliamP. (Hershey, PA: IGI Global), 359–378.
12
BinderH.HoppL.SchweigerM. R.HoffmannS.JühlingF.KerickM.et al. (2017). Genomic and transcriptomic heterogeneity of colorectal tumours arising in Lynch syndrome. J. Pathol.243, 242–254. 10.1002/path.4948
13
BinderH.WillscherE.Loeffler-WirthH.HoppL.JonesD. T. W.PfisterS. M.et al. (2019). DNA methylation, transcriptome and genetic copy number signatures of diffuse cerebral WHO grade II/III gliomas resolve cancer heterogeneity and development. Acta Neuropathol. Commun.7:59. 10.1186/s40478-019-0704-8
14
BinderH.WirthH.ArakelyanA.LembckeK.TiysE. S.IvanishenkoV.et al. (2014). Time-course human urine proteomics in space-flight simulation experiments. BMC Genomics15:S2. 10.1186/1471-2164-15-S12-S2
15
BinderH.WirthH. (2014). Analysis of large-scale OMIC data using self organizing Maps, in Encyclopedia of Information Science and Technology, 3rd Edn, ed Khosrow-PourM. (Hershey, PA: IGI Global), 1642–1654.
16
BotelhoF. M.Llop-GuevaraA.TrimbleN. J.NikotaJ. K.BauerC. M. T.LambertK. N.et al. (2011). Cigarette smoke differentially affects Eosinophilia and remodeling in a model of house dust mite Asthma. Am. J. Respir. Cell Mol. Biol.45, 753–760. 10.1165/rcmb.2010-0404OC
17
BrielandJ. K.VissersM. C. M.PhanS. H.FantoneJ. C. (1989). Human platelets mediate iron release from transferrin by adenine nucleotide-dependent and -independent mechanisms. Biochim. Biophys. Acta Biomembr.978, 191–196. 10.1016/0005-2736(89)90114-4
18
BurczynskiM. E.DornerA. J. (2006). Transcriptional profiling of peripheral blood cells in clinical pharmacogenomic studies. Pharmacogenomics7, 187–202. 10.2217/14622416.7.2.187
19
BurnhamK. L.DavenportE. E.RadhakrishnanJ.HumburgP.GordonA. C.HuttonP.et al. (2017). Shared and distinct aspects of the sepsis transcriptomic response to fecal peritonitis and pneumonia. Am. J. Respir. Crit. Care Med.196, 328–339. 10.1164/rccm.201608-1685OC
20
BurtonK. J.PimentelG.ZanggerN.VionnetN.DraiJ.McTernanP. G.et al. (2018). Modulation of the peripheral blood transcriptome by the ingestion of probiotic yoghurt and acidified milk in healthy, young men. PLoS ONE13:e0192947. 10.1371/journal.pone.0192947
21
BusslingerM.TarakhovskyA. (2014). Epigenetic control of immunity. Cold Spring Harb. Perspect. Biol.6:a019307. 10.1101/cshperspect.a019307
22
CakirM. V.BinderH.WirthH. (2014). Profiling of genetic switches using Boolean implications in expression data. J. Integr. Bioinform11, 246. 10.1515/jib-2014-246
23
CakirM. V.Wirth-LoefflerH.ArakelyanA.BinderH. (2017). Dysregulated signal propagation in a MYC-associated gene network in B-cell lymphoma. Biol. Eng. Med. 2, 1–11. 10.15761/BEM.1000115
24
ChaussabelD. (2015). Assessment of immune status using blood transcriptomics and potential implications for global health. Semin. Immunol.27, 58–66. 10.1016/j.smim.2015.03.002
25
ChaussabelD.PascualV.BanchereauJ. (2010). Assessing the human immune system through blood transcriptomics. BMC Biol.8:84. 10.1186/1741-7007-8-84
26
ChaussabelD.QuinnC.ShenJ.PatelP.GlaserC.BaldwinN.et al. (2008). A modular analysis framework for blood genomics studies: application to systemic lupus erythematosus. Immunity29, 150–164. 10.1016/j.immuni.2008.05.012
27
CiccaroneF.TagliatestaS.CaiafaP.ZampieriM. (2018). DNA methylation dynamics in aging: how far are we from understanding the mechanisms? Mech. Ageing Dev.174, 3–17. 10.1016/j.mad.2017.12.002
28
CoatesP. J.RundleJ. K.LorimoreS. A.WrightE. G. (2008). Indirect macrophage responses to ionizing radiation: implications for genotype-dependent bystander signaling. Cancer Res.68, 450–456. 10.1158/0008-5472.CAN-07-3050
29
CoddV.NelsonC. P.AlbrechtE.ManginoM.DeelenJ.BuxtonJ. L.et al. (2013). Identification of seven loci affecting mean telomere length and their association with disease. Nat. Genet.45, 422–427. 10.1038/ng.2528
30
DanielS.NylanderV.IngerslevL. R.ZhongL.FabreO.CliffordB.et al. (2018). T cell epigenetic remodeling and accelerated epigenetic aging are linked to long-term immune alterations in childhood cancer survivors. Clin. Epigenet.10:138. 10.1186/s13148-018-0561-5
31
DavenportE. E.BurnhamK. L.RadhakrishnanJ.HumburgP.HuttonP.MillsT. C.et al. (2016). Genomic landscape of the individual host response and outcomes in sepsis: a prospective cohort study. Lancet Respir. Med.4, 259–271. 10.1016/S2213-2600(16)00046-1
32
de JongS.NeelemanM.LuykxJ. J.ten BergM. J.StrengmanE.Den BreeijenH. H.et al. (2014). Seasonal changes in gene expression represent cell-type composition in whole blood. Hum. Mol. Genet.23, 2721–2728. 10.1093/hmg/ddt665
33
de MeulderB.LefaudeuxD.BansalA. T.MazeinA.ChaiboonchoeA.AhmedH.et al. (2018). A computational framework for complex disease stratification from multiple large-scale datasets. BMC Syst. Biol.12:60. 10.1186/s12918-018-0556-z
34
DumeauxV.OlsenK. S.NuelG.PaulssenR. H.Børresen-DaleA.-L.LundE. (2010). Deciphering normal blood gene expression variation—the NOWAC postgenome study. PLoS Genet.6:e1000873. 10.1371/journal.pgen.1000873
35
EpelE. S.PratherA. A. (2018). Stress, telomeres, and psychopathology: toward a deeper understanding of a triad of early aging. Annu. Rev. Clin. Psychol.14, 371–397. 10.1146/annurev-clinpsy-032816-045054
36
FosterS. L.HargreavesD. C.MedzhitovR. (2007). Gene-specific control of inflammation by TLR-induced chromatin modifications. Nature447, 972–978. 10.1038/nature05836
37
GardnerM.BannD.WileyL.CooperR.HardyR.NitschD.et al. (2014). Gender and telomere length: systematic review and meta-analysis. Exp. Gerontol.51, 15–27. 10.1016/j.exger.2013.12.004
38
GerberT.WillscherE.Loeffler-WirthH.HoppL.SchadendorfD.SchartlM.et al. (2017). Mapping heterogeneity in patient-derived melanoma cultures by single-cell RNA-seq. Oncotarget8, 846–862. 10.18632/oncotarget.13666
39
GielenM.HagemanG. J.AntoniouE. E.NordfjallK.ManginoM.BalasubramanyamM.et al. (2018). Body mass index is negatively associated with telomere length: a collaborative cross-sectional meta-analysis of 87 observational studies. Am. J. Clin. Nutr.108, 453–475. 10.1093/ajcn/nqy107
40
GoldingerA.ShakhbazovK.HendersA. K.McRaeA. F.MontgomeryG. W.PowellJ. E. (2015). Seasonal effects on gene expression. PLoS ONE10:e0126995. 10.1371/journal.pone.0126995
41
HanashS. M.BaikC. S.KallioniemiO. (2011). Emerging molecular biomarkers—blood-based strategies to detect and monitor cancer. Nat. Rev. Clin. Oncol.8, 142–150. 10.1038/nrclinonc.2010.220
42
HashimotoK.KounoT.IkawaT.HayatsuN.MiyajimaY.YabukamiH.et al. (2019). Single-cell transcriptomics reveals expansion of cytotoxic CD4 T cells in supercentenarians. Proc. Natl. Acad. Sci. U.S.A.116, 24242–24251. 10.1073/pnas.1907883116
43
HaycockP. C.HeydonE. E.KaptogeS.ButterworthA. S.ThompsonA.WilleitP. (2014). Leucocyte telomere length and risk of cardiovascular disease: systematic review and meta-analysis. BMJ349:g4227. 10.1136/bmj.g4227
44
HerishanuY.RogowskiO.PolliackA.MarilusR. (2006). Leukocytosis in obese individuals: possible link in patients with unexplained persistent neutrophilia. Eur. J. Haematol.76, 516–520. 10.1111/j.1600-0609.2006.00658.x
45
HiguchiT.OmataF.TsuchihashiK.HigashiokaK.KoyamadaR.OkadaS. (2016). Current cigarette smoking is a reversible cause of elevated white blood cell count: cross-sectional and longitudinal studies. Prev. Med. Rep.4, 417–422. 10.1016/j.pmedr.2016.08.009
46
HomuthG.WahlS.MüllerC.SchurmannC.MäderU.BlankenbergS.et al. (2015). Extensive alterations of the whole-blood transcriptome are associated with body mass index: results of an mRNA profiling study involving two large population-based cohorts. BMC Med. Genomics8:65. 10.1186/s12920-015-0141-x
47
HoppL.Loeffler-WirthH.NersisyanL.ArakelyanA.BinderH. (2018b). Footprints of sepsis framed within community acquired pneumonia in the blood transcriptome. Front. Immunol.9:1620. 10.3389/fimmu.2018.01620
48
HoppL.Löffler-WirthH.GalleJ.BinderH. (2018a). Combined SOM-portrayal of gene expression and DNA methylation landscapes disentangles modes of epigenetic regulation in glioblastoma. Epigenomics10, 745–764. 10.2217/epi-2017-0140
49
HoppL.NersisyanL.Löffler-WirthH.ArakelyanA.BinderH. (2015a). Epigenetic heterogeneity of B-cell lymphoma: chromatin modifiers. Genes6:1076. 10.3390/genes6041076
50
HoppL.WillscherE.Wirth-LoefflerH.BinderH. (2015b). Function shapes content: DNA-methylation marker genes and their impact for molecular mechanisms of glioma. J. Can. Res. Updates4, 127–148. 10.6000/1929-2279.2015.04.04.1
51
HoppL.WirthH.FasoldM.BinderH. (2013). Portraying the expression landscapes of cancer subtypes: a glioblastoma multiforme and prostate cancer case study. Syst. Biomed.1, 99–121. 10.4161/sysb.25897
52
HoppL.Wirth-LoefflerH.BinderH. (2015c). Epigenetic heterogeneity of B-cell lymphoma: DNA-methylation, gene expression and chromatin states. Genes6, 812–840. 10.3390/genes6030812
53
HorvathS. (2013). DNA methylation age of human tissues and cell types. Genome Biol.14:R115. 10.1186/gb-2013-14-10-r115
54
JensenE. J.PedersenB.NarvestadtE.DahlR. (1998). Blood eosinophil and monocyte counts are related to smoking and lung function. Respir. Med.92, 63–69. 10.1016/S0954-6111(98)90034-8
55
JohannsenN. M.PriestE. L.DixitV. D.EarnestC. P.BlairS. N.ChurchT. S. (2010). Association of white blood cell subfraction concentration with fitness and fatness. Br. J. Sports Med.44, 588–593. 10.1136/bjsm.2008.050682
56
KarstenS. L.KudoL. C.BraginA. J. (2011). Use of peripheral blood transcriptome biomarkers for epilepsy prediction. Neurosci. Lett.497, 213–217. 10.1016/j.neulet.2011.03.019
57
KunzM.Löffler-WirthH.DannemannM.WillscherE.DooseG.KelsoJ.et al. (2018). RNA-seq analysis identifies different transcriptomic types and developmental trajectories of primary melanomas. Oncogene37, 6136–6151. 10.1038/s41388-018-0385-y
58
LaphamK.KvaleM. N.LinJ.ConnellS.CroenL. A.DispensaB. P.et al. (2015). Automated assay of telomere length measurement and informatics for 100,000 subjects in the genetic epidemiology research on adult health and aging (GERA) cohort. Genetics200, 1061–1072. 10.1534/genetics.115.178624
59
LeungC. W.FungT. T.McEvoyC. T.LinJ.EpelE. S. (2018). Diet quality indices and leukocyte telomere length among healthy US adults: data from the National health and nutrition examination survey, 1999–2002. Am. J. Epidemiol.187, 2192–2201. 10.1093/aje/kwy124
60
LiberzonA.BirgerC.ThorvaldsdóttirH.GhandiM.Mesirov JillP.TamayoP. (2015). The molecular signatures database hallmark gene set collection. Cell Syst.1, 417–425. 10.1016/j.cels.2015.12.004
61
LoefflerM.EngelC.AhnertP.AlfermannD.ArelinK.BaberR.et al. (2015). The LIFE-Adult-Study: objectives and design of a population-based cohort study with 10,000 deeply phenotyped adults in Germany. BMC Public Health15:691. 10.1186/s12889-015-1983-z
62
Loeffler-WirthH.KreuzM.HoppL.ArakelyanA.HaakeA.CogliattiS. B.et al. (2019). A modular transcriptome map of mature B cell lymphomas. Genome Med. 11:27. 10.1186/s13073-019-0637-7
63
Löffler-WirthH.KalcherM.BinderH. (2015). oposSOM: R-package for high-dimensional portraying of genome-wide expression landscapes on bioconductor. Bioinformatics31, 3225–3227. 10.1093/bioinformatics/btv342
64
Lorente-SorollaC.Garcia-GomezA.Català-MollF.ToledanoV.CiudadL.Avendaño-OrtizJ.et al. (2019). Inflammatory cytokines and organ dysfunction associate with the aberrant DNA methylome of monocytes in sepsis. Genome Med.11:66. 10.1186/s13073-019-0674-2
65
MatherK. A.JormA. F.ParslowR. A.ChristensenH. (2010). Is telomere length a biomarker of aging? A review. J. Gerontol. Ser. A66A, 202–213. 10.1093/gerona/glq180
66
McLachlanJ. L.SmithA. J.BujalskaI. J.CooperP. R. (2005). Gene expression profiling of pulpal tissue reveals the molecular complexity of dental caries. Biochim. Biophys. Acta Mol. Basis Dis.1741, 271–281. 10.1016/j.bbadis.2005.03.007
67
NémethZ. H.LeibovichS. J.DeitchE. A.ViziE. S.SzabóC.HaskóG. (2003). cDNA microarray analysis reveals a nuclear factor-κB-independent regulation of macrophage function by adenosine. J. Pharmacol. Exp. Ther.306, 1042–1049. 10.1124/jpet.103.052944
68
NersisyanL.HoppL.Loeffler-WirthH.GalleJ.LoefflerM.ArakelyanA.et al. (2019). Telomere length maintenance and its transcriptional regulation in lynch syndrome and sporadic colorectal carcinoma. Front. Oncol.9:1172. 10.3389/fonc.2019.01172
69
NewmanA. M.LiuC. L.GreenM. R.GentlesA. J.FengW.XuY.et al. (2015). Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods12:453. 10.1038/nmeth.3337
70
NikoghosyanM.HakobyanS.HovhannisyanA.Loeffler-WirthH.BinderH.ArakelyanA. (2019). Population levels assessment of the distribution of disease-associated variants with emphasis on Armenians – a machine learning approach. Front. Genet.10:394. 10.3389/fgene.2019.00394
71
OeseburgH.de BoerR. A.van GilstW. H.van der HarstP. (2010). Telomere biology in healthy aging and disease. Pflugers Arch.459, 259–268. 10.1007/s00424-009-0728-1
72
PenaO. M.HancockD. G.LyleN. H.LinderA.RussellJ. A.XiaJ.et al. (2014). An endotoxin tolerance signature predicts sepsis and organ dysfunction at initial clinical presentation. EBioMed.1, 64–71. 10.1016/j.ebiom.2014.10.003
73
PetersM. J.JoehanesR.PillingL. C.SchurmannC.ConneelyK. N.PowellJ.et al. (2015). The transcriptional landscape of age in human peripheral blood. Nat. Commun.6:8570. 10.1038/ncomms9570
74
PolonisK.SompalliS.BecariC.XieJ.CovassinN.SchulteJ. P.et al. (2019). Telomere length and risk of major adverse cardiac events and cancer in obstructive sleep apnea patients. Cells8:381. 10.3390/cells8050381
75
RayD.YungR. (2018). Immune senescence, epigenetics and autoimmunity. Clin Immunol.196, 59–63. 10.1016/j.clim.2018.04.002
76
RembachA.RyanT. M.RobertsB. R.DoeckeJ. D.WilsonW. J.WattA. D.et al. (2013). Progress towards a consensus on biomarkers for Alzheimer's disease: a review of peripheral analytes. Biomark. Med.7, 641–662. 10.2217/bmm.13.59
77
Roadmap Epigenomics ConsortiumKundajeA.MeulemanW.ErnstJ.BilenkyM.YenA.et al. (2015). Integrative analysis of 111 reference human epigenomes. Nature518, 317–330. 10.1038/nature14248
78
RodeL.NordestgaardB. G.BojesenS. E. (2015). Peripheral blood leukocyte telomere length and mortality among 64,637 individuals from the general population. J Natl Cancer Inst.107:djv074. 10.1093/jnci/djv074
79
SchmidtM.Loeffler-WirthH.HoppL.ArakelyanA.ScholzM.KirstenH.et al. (2019). Portrayal of the human blood transcriptome of 3,388 adults and its relation to ageing and health. Res. Square.1–50. 10.21203/rs.2.19387/v1
80
SciclunaB. P.van VughtL. A.ZwindermanA. H.WiewelM. A.DavenportE. E.BurnhamK. L.et al. (2017). Classification of patients with sepsis according to blood genomic endotype: a prospective cohort study. Lancet Respir. Med.5, 816–826. 10.1016/S2213-2600(17)30294-1
81
ShawiM.AutexierC. (2008). Telomerase, senescence and ageing. Mech. Ageing Dev.129, 3–10. 10.1016/j.mad.2007.11.007
82
SohnE. (2017). Diagnosis: frontiers in blood testing. Nature549:S16. 10.1038/549S16a
83
SubramanianA.TamayoP.MoothaV. K.MukherjeeS.EbertB. L.GilletteM. A.et al. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U.S.A.102, 15545–15550. 10.1073/pnas.0506580102
84
SweeneyT. E.WongH. R.KhatriP. (2016). Robust classification of bacterial and viral infections via integrated host gene expression diagnostics. Sci. Transl. Med.8:346ra91. 10.1126/scitranslmed.aaf7165
85
TanF.-L.MoravecC. S.LiJ.Apperson-HansenC.McCarthyP. M.YoungJ. B.et al. (2002). The gene expression fingerprint of human heart failure. Proc. Natl. Acad. Sci. U.S.A.99, 11387–11392. 10.1073/pnas.162370099
86
ThalheimT.HoppL.BinderH.AustG.GalleJ. (2018). On the cooperation between epigenetics and transcription factor networks in the specification of tissue stem cells. Epigenomes2:20. 10.3390/epigenomes2040020
87
TownsendM. K.AschardH.de VivoI.MichelsK. B.KraftP. (2016). Genomics, telomere length, epigenetics, and metabolomics in the nurses' health studies. Am. J. Public Health106, 1663–1668. 10.2105/AJPH.2016.303344
88
VesantoJ. (1999). SOM-based data visualization methods. Intellig. Data Anal.3, 111–126. 10.3233/IDA-1999-3203
89
WangL.OhW. K.ZhuJ. (2016). Disease-specific classification using deconvoluted whole blood gene expression. Sci. Rep.6:32976. 10.1038/srep32976
90
WildS. H.ByrneC. D. (2016). Body mass index and mortality: understanding the patterns and paradoxes. BMJ353:i2433. 10.1136/bmj.i2433
91
WirthH.LöfflerM.von BergenM.BinderH. (2011). Expression cartography of human tissues using self organizing maps. BMC Bioinformatics12:306. 10.1186/1471-2105-12-306
92
WirthH.von BergenM.BinderH. (2012). Mining SOM expression portraits: Feature selection and integrating concepts of molecular function. BioData Mining5:18. 10.1186/1756-0381-5-18
93
WuX.HakimiM.WortmannM.ZhangJ.BöcklerD.DihlmannS. (2015). Gene expression of inflammasome components in peripheral blood mononuclear cells (PBMC) of vascular patients increases with age. Immun. Ageing12:15. 10.1186/s12979-015-0043-y
Summary
Keywords: self-organizing maps, omics and phenotype integration, age, lifestyle and obesity, gene expression, immune response, subtypes
Citation: Schmidt M, Hopp L, Arakelyan A, Kirsten H, Engel C, Wirkner K, Krohn K, Burkhardt R, Thiery J, Loeffler M, Loeffler-Wirth H and Binder H (2020) The Human Blood Transcriptome in a Large Population Cohort and Its Relation to Aging and Health. Front. Big Data 3:548873. doi: 10.3389/fdata.2020.548873
Received: 07 April 2020
Accepted: 02 September 2020
Published: 30 October 2020
Volume: 3 - 2020
Edited by: Vasilis Vasiliou, Yale University, United States
Reviewed by: Paolo Montuschi, Catholic University of the Sacred Heart, Italy; Stefano Monti, Boston University, United States
Copyright
© 2020 Schmidt, Hopp, Arakelyan, Kirsten, Engel, Wirkner, Krohn, Burkhardt, Thiery, Loeffler, Loeffler-Wirth and Binder.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Hans Binder, binder@izbi.uni-leipzig.de
This article was submitted to Medicine and Public Health, a section of the journal Frontiers in Big Data
†These authors share senior authorship