Over-Generalizing About GC (Hypoxia): Pitfalls of Limiting Breadth of Experimental Systems and Analyses in Framing Informatics Conclusions

Accumulating evidence suggests that many immune responses are influenced by local nutrient concentrations in addition to the programming of intermediary metabolism within immune cells. Humoral immunity and germinal centers (GC) are settings in which these factors are under active investigation. Hypoxia is an example of how a particular nutrient is distributed in lymphoid follicles during an antibody response, and how oxygen sensors may impact the qualities of antibody output after immunization. Using exclusively a bio-informatic analysis of mRNA levels in GC and other B cells, recent work challenged the concept that there is any hypoxia or that it has any influence. To explore this proposition, we performed new analyses of published genomics data, explored potential sources of disparity, and elucidated aspects of the apparently conflicting conclusions. Specifically, replicability and variance among data sets derived from different naïve as well as GC B cells were considered. The results highlight broader issues that merit consideration, especially at a time of heightened focus on scientific reports in the realm of immunity and antibody responses. Based on these analyses, a standard is proposed under which the relationship of new data sets should be compared to prior “fingerprints” of cell types and reported transparently to referees and readers. In light of independent evidence of diversity within and among GC elicited by protein immunization, avoidance of overly broad conclusions about germinal centers in general when experimental systems are subject to substantial constraints imposed by technical features also is warranted.


INTRODUCTION
In the March 2020 issue of Nature Immunology, Weisel, Shlomchik, and co-workers presented interesting data pioneering the use of flow-purified B cells from BCR knock-in mice to explore substrate utilization and metabolic features of B lymphocytes (1). These included naïve B cells -in some but not all comparisons -and a population of germinal center (GC)phenotype B cells recovered from recipients with a monomorphic B cell population designed to avoid inclusion of other B cells into GC (1)(2)(3). Comparisons also involved B cells after T-independent activation in vivo (1). In light of the limits to using bio-informatic data to reach conclusions about biological systems, the new evidence about fatty acid oxidation (1) advances insights beyond gene expression profiles comparing naïve and GC B cells (4). However, the paper evoked a need to evaluate the conclusive statement that the "GCBC transcriptome is not commensurate with [….] hypoxia" and similar broad conclusions of the text. This claim seems connected to a view of the authors that RNA-Seq data with GCBC do not contain evidence of enrichment for genes encoding glycolytic enzymes, or that such increases relative to naïve B cells would necessarily show up in a metabolomics analysis with 13 C-labeled glucose. These issues prompted examination of these and other data sets in GEO. The results of the analyses point to limits to the conclusions as stated in (1); they also raise a broader question about the system used for this work.
Several papers (4-6) have documented results from intravital labeling with imidazole compounds that covalently modify cellular constituents when the mitochondria of viable cells operate under reductive conditions due to intracellular hypoxia (7)(8)(9). Work under controlled conditions has shown that a meaningful signal above background is obtained only when the ambient pO 2 is below about 1-1.5%, levels sufficient to yield HIF stabilization (7)(8)(9). Indeed, direct evidence of increased HIF-1a has been presented for both GC B cells (4,5) and their Tfh counterparts (10), suggesting that many GC light zones are hypoxic. Of note, any issue of HIF function requires understanding that BCR engagement and TLR stimulation cause sustained HIF-1a and HIF-2a stabilization, which presents a drawback to comparing activated versus GC B cells. It might formally be possible that duration of the hypoxia, BCR signaling, and HIF stabilization failed to yield changes in mRNA concentrations large enough for enough gene products to yield a "statistically significant" result in a gene set enrichment analysis (GSEA) algorithm. Indeed, using a gene signature derived with the human breast cancer-like cell line MCF7 (11), the authors' analysis of their purified GC B cells suggested that neither hypoxia-related nor HIF-1 target genes were enriched under their conditions of experimentation (1). Our previous published work had used GSEA with a gene signature indicative of biologically significant hypoxia (12) and scored the result of 2fold normalized enrichment as statistically significant (4). In contrast, application of this gene signature to the data sets of (1) yielded a balanced mix of increased versus decreased mRNA and was not "statistically significant". This difference along with other disparities of the data prompted us to compare the informatic findings while also using additional benchmarks that could test each report for independent replication.

METHODS AND TECHNICAL LOG
Datasets comparing RNA expression data from "naïve" (IgD+ or Follicular) versus immunization-induced GC B cells were mined from the GEO depositions of raw sequencing data, all of which were generated with Illumina Hi-Seq 2000 or 2500 instruments. For a bespoke pipeline, sequences were trimmed using the fastp FASTQ preprocessor for overrepresented sequences and to remove any sequence with a quality score <10 (13). Trimmed sequences were aligned to the mm10 mouse genome using the STAR sequence aligner of Dobin et al. (14). Aligned sequences were then quality-tested using Qualimap (15) software, and counted using featurecounts in Rsubread (16). To cross-check results and to use a commercial platform readily accessible to anyone seeking to re-analyze the data herein, the RNA-Seq platform within the suites of Basepair Technologies were used. Processing of the gene expression and PCA were cross-checked by using the commercial pipeline of Basepair Technologies as applied to the primary data for naïve and GC B cells of the papers cited as (1,4,17,18).
For heatmap generation (Figures 2A, B) via the Basepair Technology platform, first the raw read counts generated using STAR aligner and featurecounts were normalized using the DESeq2 package. DESeq2 (19) performs an internal normalization where geometric mean is calculated for each gene across all samples. The counts for a gene in each sample are then divided by this mean. The median of these ratios in a sample is the size factor for that sample. After this, a Z-score normalization is performed on the normalized read counts across samples for each gene, so that Z-scores are computed on a gene-by-gene (row-by-row) basis by subtracting the mean and then dividing by the standard deviation. Computed Z scores were then used to plot heatmaps in which each row represents one gene. There were no substantive differences between different data sets in the quality [for naïve and GC B cell RNA respectively, 92% and 91-92%, 95% and 94-95%, 94% and 93%, and 87% and 91% for the papers referenced in the main text as (1, 4, 13, and 14, respectively) via the in-house pipeline uniformly applied to all samples. Comparable values, all above 90%, were generated by the Basepair Technologies pipeline and STAR alignment algorithms. Length scoring also was indistinguishable among the different datasets, two of which [the papers cited as (1) and (18) in the main text] were generated by paired-end sequencing. These steps were followed by multifactor differential expression analysis in which deposited datasets as well as "naïve" vs. "GCB" expression data were compared using DESeq2 (20). Effect size shrinkage of differential expression data was normalized using the apeglm method published by Zhu et al. (21). Euclidean distances and Spearman correlation coefficients were derived by applying a standard "dist" function in R to the data described above. Gene set enrichment analyses were carried out using the GSEA program of the Broad Institute (22,23) and gene sets derived from published literature, the Rat Genome database ("RGD"), and the Broad KEGG database. As a technical and data note, the informatics pipeline reported in (1) used voom instead of DESeq2, followed by rankSumTestWithCorrelation instead of GSEA. As expected from the technical literature (19,(24)(25)(26)(27), the results from each pipeline did not materially differ when applied to the GEOdeposited data of (1). For Figure 2D display of count data in a manner that reduces dependence on the variance of the mean for low count data, blinded dispersion estimation using the variance stabilizing transform, VST (28), was conducted on the DESeq2 count data followed by generation of a heat map in which the count data are displayed rather than relative expression.

RESULTS
GC hypoxia was observed with several types of immunization, including with NP-carrier (ovalbumin) (6), but only one paper (4), by Cho, Boothby, et al., had RNA-Seq data for comparisons. However, contemporaneous (2016, 2017) papers with data deposited in GEO had replicate data on naïve and bulk GC B cells (i.e., unfractionated mixtures of all of the diverse types of GC B cell, Figure 1A) in a similar time frame (7-10 d) after immunization with the same immunogen (SRBC) (17,18). We analyzed the sequencer output data for all four papers (1,4,17,18) using the same two parallel pipelines (one assembled in-house, detailed in a technical log appended to this Perspective; a second via Basepair Technologies, Inc for an independent framework).
Several salient observations emerged from these comparisons. Unsupervised clustering with Spearman correlation analyses A B C FIGURE 1 | Quantitative comparisons of the overall RNA-Seq data for naïve and GC B cells in the transgenic and non-transgenic systems. Raw RNA-Seq data for naïve and GC B cells were downloaded from the GEO deposits for the papers cited as (1), "W-S" (17);, "Bu" (4);, "Bo" (18); "Mel", and put through each of two separate analysis pipelines applied uniformly to all data (detailed in the Methods and technical log). (A) A tabulation of the surface markers used to purify naïve (or naïve follicular) and total germinal center B cells in the papers analyzed. (B) Self-organizing map from unsupervised clustering based on Spearman correlation coefficients across the data sets of the papers cited as (1,4,17,18). Darker blue represents strong positive correlation (1.0 = identical across the RNA-Seq data); lightest blues are anti-correlated. (C) Shown here, a PCA plot depicting results across the datasets for the indicated conditions (triangles, naïve; circles, GC B cells) and datasets (color-coded as to the paper linked to each GEO data set according to the Legend) generated using the bespoke analysis pipeline detailed in the Methods and Technical Log (below). X-and Y-axes are defined by PC1 and PC2, accounting for 59% and 27% of the variance across the datasets, respectively. The results match those derived by the fully independent analysis using the default settings in Basepair Technologies' pipeline (not shown).
( Figure 1B), Principal Components analyses ( Figure 1C), and the Euclidean distances among the different types of samples, revealed that the results from (1) differ substantially from the data of (3), (17), and (18). First, the mRNA expression pattern of GC B cells generated after transferring large numbers of B cells [apparently, 10 6 -(2) as cited in (1)] biased toward a single specificity BCR (B1-8 i Vk -/-) into recipient mice with a monomorphic B cell population specific for Ig (AM14-Tg × V k 8R-KI BALB/c mice, i.e., specific for allotype-disparate IgG2a b ) differed substantially from the other samples. A measure of overall differences between naïve and GC B cells was less in (1) (mean Euclidean distances of 126.3 ± 0.69) than the difference between the GC B cells of the transferred B1-8i, Vk-/-cells (1) and the polyclonal GCBC (4, 17, 18) (mean Euclidean distances of 170.3 ± 10.3; p<0.01) as well as those of non-transgenic naïve versus GC B cells (147.6 ± SEM of 3.3; p<0.05). Second, the GC B cells from mice with a normal preimmune repertoire (4,17,18) were substantially more similar to one another in comparing among three independent analyses (correlation coefficients in the range of +0.41 to +0.92) as opposed to anti-correlation of the B1-8i-derived GCBC, (correlation coefficients of -0.05 to -0.32). Although each of the independent data sets with the non-transgenic C57Bl/6 (B6) system differed somewhatinternally and from each other -the RNA-Seq "fingerprints" of naïve B cells in the polyclonal system were quite similar. Thus, the data were replicable when comparing independent SRBC immunizations of mice with no restrictions of the repertoire or BCR. In contrast, even the naïve B1-8 i Vk-/-B cells were quite distinct from the clusters of polyclonal naïve B cells (Figures 1B, C). Relative to the data from (18) as well as (4), the changes were more modest in (1). The similarities and differences among independent data sets can also be parsed by heat maps of differentially expressed genes among the data sets, analyzing either the total set thereof or highlighting specific genes that are known to encode major determinants of B cell positioning, signal initiation during activation, of differentiation into GC B or plasma cells (Figures  2A, B). In practice, the collective data ( Figures 1B, C; 2B) indicate that the GCBC generated after adoptive transfers of B cells with a restricted repertoire into mice whose endogenous B cells will contribute little to the GC are qualitatively distinct from a polyclonal response that evolved from a polyclonal repertoire.
Turning to the question, are there hypoxia-related gene signatures in GC B cells when taking into account other work with a polyclonal repertoire, GSEA algorithms were applied using several different gene sets for hypoxia ( Figure 2C). A further facet of what the analyses revealed is that with some hypoxia modules, even the RNA-Seq data of (1) show significant enrichment in GSEA ( Figure 2C). These increases can also be appreciated in data with the actual counts when displaying all naïve and GC B cell data ( Figure 2D). Moreover, RNA-Seq data with GC from NP-CGG-immunized mice with a normal preimmune landscape (29) also show enrichment for a functional hypoxia signature ( Figure 2C, data from the Pernis lab), and divergence from the B1-8i data. Metabolism rather than hypoxia was the central point of (1), a part of which was indirect evidence about glycolytic rates -perhaps on the view that if HIF were stabilized, expression of glycolytic genes should be increased. Prior work had provided evidence that the "glycolysis", "mitochondrial respiration/oxidation" and "FAO" transcriptomes were increased in GCBC when compared to naïve B cells (4). Although this analysis was not evident in the Figure presented in (1) using the B1-8i system, a GSEA comparing naïve and GCBC with the RNA-Seq data in (1) found significant increases in glycolysis-related gene expression ( Figure 2C), which also applied to each of the other data sets. Indeed, expression of genes encoding enzymes along the glycolysis pathway increased in GC B cell data sets of (1) as well as the polyclonal B6 GC B cells when compared to naïve B cells ( Figure 2D).

DISCUSSION
The data presented here underscore that what is measured for an overall population of GC B cells is influenced by the structure of the experiment and the inputs analyzed. Key findings are that the cells analyzed in (1) were distinct from and changed less from naïve to GC phenotype than what is found in reproducible data with B cells that derived from a normal polyclonal repertoire (1,17,18). B1-8i, Vk-/-B cells started out with substantial differences in activation markers such as Nr4a1 and differentiation-related genes, i.e., Prdm1 and Irf4. Of note, the differences do not relate to the particular informatic tools but instead to the selection of inputs. Thus, the pipelines used here yield similar results to those of (1) if, and only if, the analyses are restricted to those data sets and only use the selections in that work. Several factors may be involved in the differences between GC B cells in (1) from those in analyses published several years previously (4,17,18). Akin to findings with T cells (30), the experiments in (1) may involve the intra-clonal competition shown to result from using hundreds of thousands of transferred antigen receptor-transgenic cells (31, 32) -as already documented for B1-8i (31). In addition, the metabolic environments of highly diverse GC may differ from the secondary follicles that ensued with the transfer system used in (1). These factors warrant investigation, but the differences emphasize why it is vital that stated conclusions be restricted by limitations of the approachnot least when the approach is quite artificial.
GC are quite heterogeneous (33,34). This point suggests that caution is warranted from the outset, militating against framing conclusions as blanket generalizations to apply across all GC. In terms of hypoxia or HIF gene signatures, prior work reported (4) that as many as 20% of splenic GC had no signal of intravital hypoxia, a variegation observed in parallel by others (J. Jellusova, personal communication). An integrative possibility is that hypoxia and/or its influence are reduced in GC designed to minimize bystander B cell involvement and dominated by a single specificity, as in (1)(2)(3). One technical issue is that there is no validated universal 'hypoxia' or 'HIF' module for activated lymphocytes, let alone a unitary module that includes GC B cells. Hypoxia and HIF responses are protean, but three decades of A B C D FIGURE 2 | Comparisons of condition-dependent differential gene expression. As in Figure 1, GEO data deposits were downloaded for the papers cited as (1), "W-S" (17);, "Bu" (4);, "Bo" (18); "Mel" and processed by trimming, alignment, generation of normalized counts (FPKM), and quantification of differential expression using each of two independent pipelines (i.e., both bespoke and Basepair Technologies'). (A, B) Heat maps generated in the Basepair Technologies platform and derived from differential expression analyses comparing the groups "SRBC-immunized B6 mice" versus the BALB/c B1-8i, Vk -/-system are shown for (A) naïve and (B) GC B cells. In each panel, all differentially expressed genes (16% and 14% of totals, respectively, i.e., 3214/22,843 and 3461/21,327 genes differentially expressed >2fold with p-adjusted < 0.05) are displayed. To the right, a subset of B lineage-specific or other genes functionally relevant in GC biology are shown. In these heat maps, entries that are dark red are upregulated and those that are blue are downregulated. Since the rows (genes) are Z-Score scaled, the maps report differences in expression of single genes across the samples. (C) Gene Set Enrichment Analyses (GSEA) were performed using the Broad algorithm as detailed in the Methods log. Analyses were performed both with data derived by the bespoke pipeline (shown here) and with the expression data exported from the Basepair pipeline, which yielded congruent results to those shown here. Inset values show the normalized enrichment score (NES) and p value after compensation for multiple comparisons (FWER), as indicated. Each panel displays results for a separate analysis in which ranked differential expression data were processed using the Broad Institute's GSEA software. The three columns represent outputs obtained using the indicated gene signatures: Hypoxia Signature as established by Nanostring Technologies ® , the hypoxia gene signature of Eustace et al. [as in references (4,12)], and the Broad Institute's Hallmark Glycolysis gene set. The rows of panels identify the RNA-Seq data sets that were analyzed in comparisons of GC to naïve B cells: (i) data from like samples in (4,17,18); (ii) data from (1); and (iii) data from (29), all of which were processed through the same pipeline with same parameters. (D) A heatmap of the blinded variance-stabilizing transformed (VST) count data for selected genes encoding glycolytic enzymes or hypoxia-related genes in all samples [naïve and GC B cells of (1,4,17,18)], benchmarked against two genes (Aicda; S1pr2) known to be highly expressed in GC B cells. Color coding from lowest (darker purple) counts to highest (deepest peach) is as indicated. Columns position (placement and order) was the computational result of self-organizing mapping, with expected separationi of naïve from GC B cells. Citations below the heat map indicate the publication sourcing of each column's data. papers address cell type-specific functionality of a transcription factor. Thus, some restraint may be warranted before concluding that a GC transcriptome is not commensurate with hypoxia or conflating HIF with a particular magnitude of increased expression of genes encoding the enzymes of glycolysis or glycolytic flux. The issues connected to these questions take on added currency in considering antibody diversification during the persistent hypoxemia of many patients with severe CoVID-2019 infection. Thus, a great challenge will be to assess if human GC (in which questions about hypoxia, HIF, and in situ metabolism) exhibit effects such as those uncovered in mice (4-6) -either in normoxemic people or during concurrent hypoxemia.
Of course, such studies may rely heavily on scRNA-Seq and informatics analysesconflicting or notcannot settle secondary issues such as glucose uptake and utilization. Translation efficiency, post-translational modifications, and complex but unknown aspects of nutrient supplies, substrate concentrations, and substrate competition along webs of interconnected pathways all are downstream from the mRNA in question. That simply means that direct rigorous biochemical assays of glucose uptake per cell [e.g., 3 H-2-deoxyglucose (DG), since 2-NBDG is known not necessarily to correlate with glucose entry rates (35)] and of glucose oxidation are needed before stating or accepting broad conclusions about germinal center B cells based on a single or special case as in (1). In this respect, normalizing a measure such as uptake to cell sizewhich is distinct from suitable compensation for autofluorescence in flow cytometry -lacks rigor. Normalization to size is analogous to concluding that a 160 kg person eating 4400 kCal/d has the same energy intake as one who is 80 kg and eats 2200 kCal/d, or claiming that a truck does not consume more fuel than a compact sedan if the sizenormalized fuel consumption is the same. With respect to conclusions based solely on informatics, it is proposed that a reasonable standard is to (a) compare new RNA-Seq data sets to independent entries in GEO, and (b) quantify the correlation toor difference from -those that are in GEO and readily comparable (i.e., not micro-array to RNA-Seq). Of essential importance, some restraint in statement of conclusions, along with openness about limitations, is vital at a time when the societal landscape is roiled by consequences of how scientists frame their work or evidence.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
MB wrote the text and guided analyses along with getting presubmission input from three independent faculty/PIs. AR led the primary informatic analyses, prepared figures, drafted portions of text, and edited the overall text. SC, KS, and JL performed analyses and provided insights into the RNA-Seq data and gene sets in the papers meta-analyzed. VHH provided input and guidance to SHC and MB; SHC provided input and guidance to KS and MB. All authors contributed to the article and approved the submitted version.