Test of Arabidopsis Space Transcriptome: A Discovery Environment to Explore Multiple Plant Biology Spaceflight Experiments

Barker, Richard; Lombardino, Jonathan; Rasmussen, Kai; Gilroy, Simon

doi:10.3389/fpls.2020.00147

METHODS article

Front. Plant Sci., 04 March 2020

Sec. Plant Abiotic Stress

Volume 11 - 2020 | https://doi.org/10.3389/fpls.2020.00147

This article is part of the Research Topic: Higher Plants, Algae and Cyanobacteria in Space Environments

Richard Barker¹

Jonathan Lombardino¹,²

Kai Rasmussen¹

Simon Gilroy¹*

¹Department of Botany, University of Wisconsin, Madison, WI, United States
²Microbiology Doctoral Training Program, University of Wisconsin, Madison, WI, United States

Recent advances in the routine access to space along with increasing opportunities to perform plant growth experiments on board the International Space Station have led to an ever-increasing body of transcriptomic, proteomic, and epigenomic data from plants experiencing spaceflight. These datasets hold great promise to help understand how plant biology reacts to this unique environment. However, analyses that mine across such expanses of data are often complex to implement, being impeded by the sheer number of potential comparisons that are possible. Complexities in how the output of these multiple parallel analyses can be presented to the researcher in an accessible and intuitive form provide further barriers to such research. Recent developments in computational systems biology have led to rapid advances in interactive data visualization environments designed to perform just such tasks. However, to date none of these tools have been tailored to the analysis of the broad-ranging plant biology spaceflight data. We have therefore developed the Test Of Arabidopsis Space Transcriptome (TOAST) database (https://astrobiology.botany.wisc.edu/astrobotany-toast) to address this gap in our capabilities. TOAST is a relational database that uses the Qlik database management software to link plant biology, spaceflight-related omics datasets, and their associated metadata. This environment helps visualize relationships across multiple levels of experiments in an easy-to-use, gene-centric platform. TOAST draws on data from the US National Aeronautics and Space Administration’s (NASA’s) GeneLab and other data repositories and also connects results to a suite of web-based analytical tools to facilitate further investigation of responses to spaceflight and related stresses.
The TOAST graphical user interface allows for quick comparisons between plant spaceflight experiments using real-time, gene-specific queries, or by using functional gene ontology, Kyoto Encyclopedia of Genes and Genomes pathway, or other filtering systems to explore genetic networks of interest. Testing of the database shows that TOAST confirms patterns of gene expression already highlighted in the literature, such as revealing the modulation of oxidative stress-related responses across multiple plant spaceflight experiments. However, this data exploration environment can also drive new insights into patterns of spaceflight responsive gene expression. For example, TOAST analyses highlight changes to mitochondrial function as likely shared responses in many plant spaceflight experiments.

Introduction

As a possible integral feature of life support systems, plants offer the potential to provide food, replenish the air, filter water, and improve the mental health of the crew during long-duration missions in space. Therefore, at a practical level, plants are being intensively studied to assess their ability to adequately fulfill these roles in the spaceflight environment [reviewed in (Wheeler, 2017)]. In addition, a growing number of plant spaceflight studies are addressing the quest for fundamental knowledge about how plant biology operates. Thus, the spaceflight environment provides conditions that are inaccessible on Earth, such as growth in microgravity and exposure to cosmic radiation, providing a unique opportunity to dissect responses under conditions that plant biology has not encountered during its evolutionary history [reviewed in (Vandenbrink and Kiss, 2016; Paul et al., 2013a)].

These studies are now generating extensive characterizations of the responses of diverse plant species to spaceflight. As part of the output from this research, there is an ever-increasing set of genome-scale analyses that range from transcriptomics [e.g., (Kwon et al., 2015; Johnson et al., 2017; Paul et al., 2017; Choi et al., 2019; Herranz et al., 2019; Vandenbrink et al., 2019)] and proteomics [e.g., (Mazars et al., 2014; Ferl et al., 2015; Basu et al., 2017)] to epigenomics [e.g., (Zhou et al., 2019)]. These datasets help catalog the plant response to growing in space. For example, the omics database maintained by The National Aeronautics and Space Administration’s (NASA’s) GeneLab program (GeneLab, 2019) contains, at the time of writing, data from over 200 spaceflight-related experiments, with about 20 plant-focused studies, mainly from research conducted using the Space Shuttle and International Space Station. In addition, similar spaceflight and related data from e.g., the Japanese, Chinese, and European space agencies have been deposited in a range of other publicly accessible data repositories such as NCBI GEO (Barrett et al., 2013) and the European CATdb (Gagnot et al., 2008). Each experiment has multiple spaceflight samples and often compares the responses of wild-type and mutants in spaceflight to parallel ground-based controls performed on the Earth. Research groups have then mined, e.g., the patterns of transcriptional change seen in individual experiments to reveal potential underlying plant responses to spaceflight. 
Thus, changes in the expression of heat shock proteins [e.g., (Zupanska et al., 2013; Johnson et al., 2017; Li et al., 2017; Choi et al., 2019)], cell wall peroxidases [e.g., (Correll et al., 2013; Kwon et al., 2015; Zhang et al., 2015; Johnson et al., 2017; Choi et al., 2019)], and a general response to oxidative stress [e.g., (Sugimoto et al., 2014; Choi et al., 2019)] have all emerged as response signatures identified in some, but not all, plant spaceflight transcriptomes. However, the scale of the available data now poses challenges when making such comparisons between diverse experiments. Thus, (1) the datasets are distributed across multiple repositories, posing potential issues with accessibility and interoperability, (2) the bioinformatics-based analytical approaches used between published studies are often very different, making robust comparisons of differences drawn from the literature challenging, (3) the sheer scale of the data makes it hard to perform more than a few comparisons between experiments before its volume becomes limiting, and (4) it is often difficult to present these kinds of broad-scale comparative analyses in a visually accessible, intuitive manner for use by a broad scientific audience.

To address these challenges, we have developed the Test Of Arabidopsis Space Transcriptome (TOAST; a compilation of the abbreviations and terms used throughout, along with a brief definition of each is presented as a glossary in the Appendix) database. TOAST uses a database management software called Qlik (Qlik Technologies Inc., King of Prussia, PA, USA) to aggregate and visualize plant spaceflight omics-level data from multiple repositories. It applies a uniform set of analytical steps to the data and makes visualization of massive datasets accessible, allowing for interactive comparisons between experiments. The database also provides links to experiment metadata and a suite of online tools to enhance the scope of potential further analysis. In this publication we present an overview of the TOAST database and provide examples of how it can both validate previously published inferences as to likely spaceflight-imposed stress responses and mine across the plant spaceflight transcriptomics data to facilitate the generation of new hypotheses.

A Broad Set of Available Data Underlies the TOAST Data Exploration Environment

As a first step toward designing a comprehensive tool for the analysis of plant spaceflight omics-level data, we categorized the breadth of data available to support such an exploration environment. As most studies have generated transcriptomics data, we have focused on these datasets, although the TOAST database also includes the few currently available proteomic (Mazars et al., 2014; Ferl et al., 2015; Basu et al., 2017) and epigenomic (Zhou et al., 2019) plant spaceflight datasets. NASA's GeneLab program maintains a publicly accessible data repository that brings together a large amount of such genome-scale spaceflight data (GeneLab, 2019). Although the GeneLab site has the highest density of these kinds of spaceflight-related datasets, the global spaceflight research community has deposited a large amount of data generated by similar genome-scale experiments in other data repositories such as NCBI-GEO (Barrett et al., 2013) and CATdb (Gagnot et al., 2008). Figure 1 presents an analysis of the spectrum of plant species and experimentation available for incorporation into a plant-focused data exploration environment (see Supplementary Table 1 for the source list of the plant biology data repositories). The most highly researched plant is Arabidopsis thaliana, being the predominant plant model for molecular analysis, with the Col-0 ecotype most frequently chosen for spaceflight experimentation. Rice [Oryza sativa; (Jin et al., 2015)], mizuna [Brassica rapa; (Sugimoto et al., 2014)], and the fern Ceratopteris richardii (Salmi and Roux, 2008) have also been the focus of similar molecular analysis. Figure 1A breaks down the available data into species and genotype versus analytical approach (e.g., microarray or RNAseq technologies), showing that the majority of the available data has been generated using Affymetrix microarrays or Illumina-based RNAseq to monitor patterns of gene expression.
Figure 1B further shows that although the predominant sample analyzed in multiple plant spaceflight experiments is the whole seedling, data is available from several experiments using cell cultures and from individual organs dissected after the plants had been grown in space.

FIGURE 1

Figure 1 Publicly available spaceflight transcriptomics datasets. (A) Relationships between species, ecotype, genotype (i.e., mutant or wild type), growth environment, and assay technique for datasets from plants experiencing spaceflight. (B) Relationships between species, ecotype, and genotype versus the tissue or organ type that was sampled to generate the spaceflight-related transcriptomics dataset. Col-0, Columbia ecotype of Arabidopsis thaliana; Ws, Wassilewskija ecotype; Ler, Landsberg erecta ecotype; Cvi, Cape Verde Islands ecotype; mutants of Arabidopsis: phyD, Phytochrome D; arg, Altered Response to Gravity; act2, Actin 2; hsfa2, heat shock factor a2; Wt, wild type; BRIC, Biological Research in Canister; ABRS, Advanced Biological Research System; EMCS, European Modular Cultivation System.

This survey of the available plant biology spaceflight-related data suggested to us that there is a strong base of publicly accessible, genome-level datasets with which to populate a database designed to help visualize and compare between plant spaceflight experiments. For example, there are multiple experiments using similar species and analyzing similar tissues; transcriptomics data for Arabidopsis is particularly extensive. We set a minimum criterion for inclusion in the initial iteration of TOAST to be studies where statistically rigorous analyses can be applied. This approach means datasets are required to contain three or more biological replicates and at present only spaceflight experiments on Arabidopsis and rice fulfill this requirement. We have therefore imported all of the available, replicated Arabidopsis and rice plant spaceflight datasets into the TOAST database. In addition, we have added a series of ground-based datasets addressing spaceflight-related factors, such as effects of increased radiation or exposure to oxidative stress on Earth as the foundation with which to build the TOAST exploration environment (these datasets are summarized in Supplementary Table 1).

TOAST Design Philosophy and Data Structure

As noted above, the underlying software engine behind TOAST is the Qlik Associative Engine (Qlik Technologies Inc., King of Prussia, PA, USA). We chose to use Qlik as it not only provides the tools to develop and administer the underlying relational database but also allows the user to readily see what other information in the database is associated with their current query via a software feature built into Qlik named the Qlik Associative Data Engine. In addition, Qlik integrates graphic visualization packages that allow intuitive, interactive exploration and analysis of the data. Such tools help ensure the data will be more readily accessible not only to plant space biology researchers and bioinformaticians but also to a much broader community, including non-specialists and students.

Data was therefore imported into a Qlik-managed database (Supplementary Table 1) to generate the associative database outlined in Figure 2 that forms the foundation of TOAST functionality. However, the various data sources use a variety of indices for gene identification that range from Affymetrix microarray probe name (i.e., the Affymetrix microarray technology's specific technical name for the DNA probe used to identify a particular gene) to Arabidopsis Genome Initiative (AGI) locus codes [i.e., unique gene identifiers assigned by the consortium of researchers forming the Arabidopsis Genome Initiative; (Kaul et al., 2000)]. We therefore first re-indexed all the datasets to use Entrez gene identifiers (Maglott et al., 2011). Entrez is the National Center for Biotechnology Information (NCBI)'s database for gene-specific information and it assigns gene identifiers, or codes, that uniquely identify a particular gene. The advantage of these identifiers in tracing a gene from one dataset to another is that they form a uniform, well-curated indexing system specifically developed to be applied across all organisms. Entrez gene names uniquely identify individual genes and importantly, the system has been developed to expand as new genes are identified. Thus, re-indexing the gene identifiers in TOAST from the varied standards used in the imported datasets to their Entrez identifiers served several purposes: (1) it allows for comparisons within the TOAST database via a uniform labeling system, (2) it facilitates data exchange with other databases and analytical tools, anchoring the data to the global Entrez standard, and (3) it builds scalability into the database architecture as Entrez identifiers are designed to provide a standard for indexing all current and future documented genes.
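The re-indexing step described above can be pictured as a lookup against a translation table. The following is a minimal sketch under stated assumptions, not the TOAST implementation (which is built within Qlik): the identifiers, the table contents, and the function name `reindex_to_entrez` are hypothetical placeholders chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical translation table: source identifier -> Entrez gene ID.
# All identifiers below are made-up placeholders, not real probe names,
# AGI locus codes, or Entrez IDs.
ID_TO_ENTREZ: Dict[str, str] = {
    "probe_0001": "100001",  # placeholder microarray probe name
    "AT0G00010": "100001",   # placeholder AGI-style code for the same gene
    "AT0G00020": "100002",
}


def reindex_to_entrez(
    rows: List[Tuple[str, float]], table: Dict[str, str]
) -> Tuple[List[Tuple[str, float]], List[str]]:
    """Swap each row's source identifier for its Entrez ID.

    Returns the re-indexed rows plus the identifiers that had no entry in
    the table, so that unmapped records are reported rather than dropped
    silently.
    """
    mapped: List[Tuple[str, float]] = []
    unmapped: List[str] = []
    for identifier, fold_change in rows:
        entrez_id = table.get(identifier)
        if entrez_id is None:
            unmapped.append(identifier)
        else:
            mapped.append((entrez_id, fold_change))
    return mapped, unmapped


rows = [("probe_0001", 1.5), ("AT0G00020", -0.7), ("unknown_id", 0.1)]
mapped, unmapped = reindex_to_entrez(rows, ID_TO_ENTREZ)
print(mapped)
print(unmapped)
```

Once every dataset carries the same identifier type, a gene can be followed from a microarray experiment to an RNAseq experiment with a plain key match, which is the property the uniform index is meant to provide.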

FIGURE 2

Figure 2 Database structure underlying TOAST 4.5. Each dataset within TOAST includes a series of pre-computed factors for each gene: minimally including fold-change, P-value, Q-value, and a yes/no value for whether the fold-change for each gene is significant at P < 0.05. These pre-computed values greatly speed the real-time processing of interactive visualizations within the TOAST user interface. The identifiers in the raw data, such as Transcript ID from RNAseq, Probe ID for Microarray, or TAIR ID, are translated to their unique Entrez and Ensembl IDs to allow for uniform indexing within TOAST itself and to facilitate passing of analyzed data produced by TOAST analyses to exterior sites and tools. Within TOAST, the strings of molecular IDs from a dataset are both directly transferred to a series of data visualization and exploration tools and are imported into a series of analytical packages accessing a range of databases that have been imported into the TOAST environment. These databases include: the Gene Ontology (GO) Consortium databases that allow analysis of the relationships between gene lists of interest and known biological processes, the SUBA4 database which catalogs predicted subcellular locales for each gene, the Kyoto Encyclopedia of Genes and Genomes (KEGG) database that analyzes relationships to known cellular pathways, and Ensembl's Orthologous Matrix database, allowing TOAST to make comparisons between species. The outputs of these analytical modules are then passed to TOAST's interactive data visualization tools to help explore each dataset. Results from the visualizations are in turn returned as lists of Gene IDs to allow for reiterative analyses.
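As an illustration of the pre-computed per-gene fields listed in the Figure 2 legend, the sketch below builds one record per gene holding fold-change, P-value, Q-value, and the yes/no significance flag. It is a simplified example, not the TOAST code: the record name `GeneStats` and the function `precompute` are hypothetical, and the Q-value is computed here as a plain Bonferroni adjustment (P multiplied by the number of tests, capped at 1), which is the correction this article names for its Q-values.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class GeneStats:
    """One pre-computed row per gene (hypothetical layout)."""
    entrez_id: str
    fold_change: float
    p_value: float
    q_value: float
    significant: bool  # yes/no flag for P < alpha


def precompute(
    fold_changes: Dict[str, float],
    p_values: Dict[str, float],
    alpha: float = 0.05,
) -> List[GeneStats]:
    """Compute the per-gene fields once so later queries only read them."""
    n_tests = len(p_values)
    records: List[GeneStats] = []
    for gene_id in sorted(p_values):
        p = p_values[gene_id]
        # Bonferroni adjustment: scale by the number of tests, cap at 1.
        q = min(1.0, p * n_tests)
        records.append(
            GeneStats(gene_id, fold_changes[gene_id], p, q, p < alpha)
        )
    return records


records = precompute({"100001": 2.0, "100002": -1.0},
                     {"100001": 0.01, "100002": 0.6})
for record in records:
    print(record)
```

Storing the flag alongside the raw values means an interactive filter only has to read a boolean per gene rather than recompute a test, which is the speed benefit the legend describes.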

If the original authors' data structure matched that of our database model (minimally, fold-change, P-value, Bonferroni corrected Q-value), we imported their analysis that incorporated their statistical models for calculating P- and Q-values. However, some of the publicly available microarray dataset analyses lacked some of these minimum requirements (most typically missing a statistical analysis of significance of reported changes) and so had to be reprocessed for incorporation into TOAST. For this reanalysis (exclusively Affymetrix ATH1 microarray data), we used R-studio codes provided courtesy of NASA's GeneLab. R is a programming language widely used in the statistical analysis of scientific data (https://www.r-project.org/about.html) and R-studio is commercially produced software that aids with the development of programs using R (R-Studio Inc. Boston, MA, USA). These R-studio codes were customized for each experimental design to provide the required fields of fold-change, P-, and Bonferroni corrected Q-values using Robust Multichip Average (RMA) quantile normalization (Irizarry et al., 2003), a technique that accounts for variation across multiple microarray chips used in these analyses. These codes can be found at https://github.com/dr-richard-barker/NASA-GeneLab-MicroArray-Codes. Further analysis was then performed on the imported data from Arabidopsis Affymetrix and CATMA microarrays and rice Affymetrix microarrays, converting probeIDs to Entrez gene identifiers. RNAseq data was reprocessed by importing the raw FASTQ files (i.e., the files containing the nucleotide sequences identified by the sequencing machine) into CyVerse [the cloud computing infrastructure supported through the National Science Foundation's Directorate of Biological Sciences; (Merchant et al., 2016)] and then analyzed using a series of software steps (analysis pipeline) of: HiSAT to first generate BAM files from the FASTQ files. 
BAM, or Binary compressed sequence Alignment Map files contain information on the alignment of each read from the sequencing machine to the genome. BAM files were then processed by the BAMtoCounts software package to create a counts matrix that holds the number of reads that have mapped to a particular transcript. Finally differential expression analysis for each transcript was calculated using the DESeq/EdgeR approach (Love et al., 2014) as part of the iDEP R-Shiny application. iDEP is a software package for the R programming language designed to process genetic data. iDEP uses R-Shiny, a further R software package that allows for easy development of interactive web-based applications (Ge et al., 2018). Fragments per kilobase of transcript per million mapped reads (FPKM) and counts per million reads mapped (CPM) were calculated as described in Choi et al. (2019).
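For readers unfamiliar with the two normalized expression units mentioned above, the following sketch computes CPM and FPKM from a counts matrix reduced to a single sample. It uses the standard textbook definitions only; the article defers to Choi et al. (2019) for its exact procedure, which may differ in detail. The transcript names, counts, and lengths are invented for illustration.

```python
from typing import Dict


def counts_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """CPM: reads for a transcript per million mapped reads in the sample."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {tx: c * 1e6 / total for tx, c in counts.items()}


def fpkm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """FPKM: fragments per kilobase of transcript per million mapped fragments.

    The counts are treated as fragment counts. Dividing by length in
    kilobases and by total fragments in millions gives the 1e9 factor.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped fragments")
    return {tx: c * 1e9 / (lengths_bp[tx] * total) for tx, c in counts.items()}


# Invented example values for one sample.
counts = {"tx_a": 300, "tx_b": 700}
lengths_bp = {"tx_a": 1500, "tx_b": 2000}
print(counts_per_million(counts))
print(fpkm(counts, lengths_bp))
```

CPM corrects only for sequencing depth, whereas FPKM also corrects for transcript length, so FPKM is the more appropriate of the two when comparing transcripts of different lengths within a sample.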

We used the TAIR10 annotation that describes genetic loci within the Arabidopsis thaliana genome sequence (Lamesch et al., 2012) and the associated genes were linked to Gene Ontology (GO) molecular function and biological processes databases that catalog the annotated functions and processes linked with each genetic locus. These GO descriptions allow testing of whether genes annotated as being associated with specific molecular functions or biological processes are over-represented in a particular dataset (Ashburner et al., 2000; Carbon et al., 2019). Consensus subcellular location predictions were imported from the SUBA4 subcellular locale database (Hooper et al., 2017). SUBA4 uses multiple weighted lines of empirical evidence for protein localization in addition to aggregating subcellular targeting predictions from >20 programs, providing a broad-scale survey of likely subcellular association for the protein product of each transcript. As these databases use a variety of gene identifiers for their indexing, a table was developed within the Qlik database to translate between these various identifiers and the Entrez indexing within TOAST. This matrix linked the TAIR AGI with the associated Affymetrix microarray Probe IDs and RNAseq transcript IDs, along with the associated Ensembl ID [imported from the Ensembl BioMart plant database; (Zerbino et al., 2018)], Entrez ID and, if available, the Gene Symbols (i.e., the commonly used gene name). Rice cell culture microarray results from the Shenzhou 8 mission (kindly provided by Dr. Peipei Xu, Shanghai Institute of Plant Physiology and Ecology and Dr. Weiming Cai, Chinese Academy of Sciences) were also integrated into TOAST. To allow comparison to the Arabidopsis data, we adopted an orthologous matrix (OM) database-driven approach.
Thus, the Ensembl genome database project (Kersey et al., 2018) has developed software to analyze the structure of the genomes of different organisms to identify genes between species that originated from a common ancestral gene prior to speciation [i.e., orthologous genes; (Altenhoff et al., 2018)], allowing researchers to ask if, e.g., transcriptome responses between species reflect similar patterns of classes of gene expression. To allow such comparisons within the TOAST database, we needed to be able to translate rice microarray probe IDs to orthologous Arabidopsis gene identifiers. We therefore imported the OM table from Ensembl, i.e., the table that links the rice and Arabidopsis orthologs through their Ensembl gene IDs. We then linked the rice microarray probe IDs provided in the imported rice datasets within this table to their corresponding Ensembl IDs, allowing mapping between the Arabidopsis and rice orthologs.
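The two-step mapping described above (rice probe ID to rice Ensembl gene ID, then rice gene to its Arabidopsis orthologs) can be sketched as a pair of table lookups. This is an illustrative example only: the table contents are invented placeholders, the function name `map_probe_to_arabidopsis` is hypothetical, and the assumption that one rice gene may have several Arabidopsis orthologs is ours rather than a statement about how TOAST resolves such cases.

```python
from typing import Dict, List

# Hypothetical lookup tables with placeholder identifiers.
PROBE_TO_RICE_GENE: Dict[str, str] = {
    "rice_probe_1": "RICE_GENE_A",
    "rice_probe_2": "RICE_GENE_B",
}
# Orthology table: one rice gene may map to zero, one, or several
# Arabidopsis genes.
RICE_TO_ARABIDOPSIS: Dict[str, List[str]] = {
    "RICE_GENE_A": ["ARA_GENE_1", "ARA_GENE_2"],
}


def map_probe_to_arabidopsis(
    probe_id: str,
    probe_to_rice: Dict[str, str],
    ortholog_table: Dict[str, List[str]],
) -> List[str]:
    """Return the Arabidopsis orthologs reachable from a rice probe ID.

    An empty list means either the probe is unknown or its rice gene has
    no ortholog recorded in the table.
    """
    rice_gene = probe_to_rice.get(probe_id)
    if rice_gene is None:
        return []
    return list(ortholog_table.get(rice_gene, []))


for probe in ("rice_probe_1", "rice_probe_2", "rice_probe_9"):
    print(probe, map_probe_to_arabidopsis(probe, PROBE_TO_RICE_GENE,
                                          RICE_TO_ARABIDOPSIS))
```

Returning a list rather than a single identifier keeps one-to-many orthology visible to the caller, who can then decide how to treat genes with multiple matches.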

The TOAST User Interface

Figure 3 shows the web interface for TOAST, which launches as an overview menu of dashboard icons. Clicking on the first few dashboards links to introductory materials about space or Arabidopsis research providing an entry point for the non-specialist. Most of the remaining icons are links to datasets from individual experiments. The design of each icon gives quick visual information on the nature of the experiment (spacecraft, plant type, hardware, assay type) and clicking on the icon opens the particular dataset. The final set of icons represent links to online tools that can be used to further analyze the results emerging from using the TOAST database. The linked tools are summarized in Supplementary Table 2.

FIGURE 3

Figure 3 The TOAST 4.5 user interface. (A) The web interface for TOAST launches an overview menu of dashboard icons allowing the user to directly access the introductory materials, omics data, and related analysis tools. (B) Each icon provides a visual summary of the data or tools that it links to, including elements such as spaceflight vehicle (e.g., Shuttle, ISS, Shenzhou vs. ground-based experimentation), the growth hardware used, plant/seedling vs. cell culture experiment, RNAseq vs. microarray vs. proteomics, species and ecotype, and dataset identifier (e.g., GLDS number).

Within TOAST, each dataset is presented to the user as an interactive dashboard with Log10-fold change and measure of statistical significance (P-value) provided for each locus as shown in Figure 4. We have used P- rather than Q-value as our initial metric of significance to provide as broad an overview of significantly differentially expressed genes as possible. Q-values are corrected P-values that take into account the cumulative errors that occur when making multiple tests of significance within a large dataset. The Q-value is available in the downloadable data tables (see below), allowing users to apply this more stringent statistical metric as needed. Volcano plots (plots of fold-change versus statistical significance of that change for each gene ID) were chosen as the main way to visualize both the statistical analysis and the degree of gene induction or repression. A side table displays the gene identifier, the gene symbol, the fold-change and P-value and Q-value for each locus. The user can toggle on and off the P-value statistical significance filters on the volcano plot to rapidly assess the strength of the inferences to be drawn from the results that they are visualizing. All of these data can also be downloaded and used with a range of other databases that are linked in the TOAST overview menu (Figure 3). TOAST 4.5 includes a GO database (Ashburner et al., 2000; Carbon et al., 2019) that provides real time feedback on the ontology of the subsets of genes selected. Tabs above the interactive bar charts allow access to four main types of annotation: GO Molecular function (16,504 categories), GO Cellular component (15,383 categories), GO Biological process (15,644 categories), and Kyoto Encyclopedia of Genes and Genomes pathways [KEGG; (Kanehisa et al., 2017)]. KEGG is a widely used database that categorizes genes into the cellular pathways in which they are involved. 
In addition, gene selections can be interactively compared against the AGRIS transcription factor database [1,851 loci; (Palaniswamy et al., 2006)], the TAIR10 microRNA database (Lamesch et al., 2012), or be filtered using a selection of over 60 manually curated gene families. A further selection allows comparison to known sites of spaceflight-induced epigenetic modification (Zhou et al., 2019). These filters are applied using a drop-down menu. NCBI PubMed links to any associated publication are also embedded alongside these data analysis tools to provide the critical context of the original published experimental descriptions and analyses (a summary of literature linked within TOAST is shown in Supplementary Table 1).
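The volcano plot and its P-value filter described above amount to a simple per-gene transformation, sketched below. This is a minimal illustration, not the TOAST plotting code: the function name `volcano_point` and the three labels are hypothetical, the fold-change is assumed to be supplied already log-transformed, and the 0.05 threshold is taken from the Figure 2 legend.

```python
import math
from typing import Tuple


def volcano_point(
    log_fold_change: float, p_value: float, alpha: float = 0.05
) -> Tuple[float, float, str]:
    """Return (x, y, label) for one gene on a volcano plot.

    x is the log-transformed fold-change as supplied; y is -log10(P), so
    more significant genes sit higher on the plot.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in the interval (0, 1]")
    y = -math.log10(p_value)
    if p_value >= alpha or log_fold_change == 0:
        label = "not significant"
    elif log_fold_change > 0:
        label = "induced"
    else:
        label = "repressed"
    return log_fold_change, y, label


for lfc, p in [(1.2, 0.001), (-0.8, 0.01), (2.0, 0.2)]:
    print(volcano_point(lfc, p))
```

Toggling the significance filter then corresponds to showing or hiding the points labeled "not significant", while the Q-value from the downloadable tables could be substituted for `p_value` when a more stringent cut is wanted.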

FIGURE 4

Figure 4 Graphical user interface for a typical dataset. Clicking on a volcano plot also activates an interactive graphical tool for manual selection of groups of genes of interest. *Defaults to showing 33.43K, i.e., all Entrez identifiers, until a filter or gene selection is applied. Inset, a lasso tool allows user selection of data points from the volcano plot in addition to activation of filters such as on significance of change, KEGG Pathway, or GO annotation.

These data exploration features were implemented using D3 JavaScript software libraries executed within the Qlik environment, connecting the spaceflight data, its pre-computed statistics (Figure 2) and information on functional ontology. This system architecture facilitates user interaction with massive amounts of data in real time. Thus, as shown in Figure 5, the user selects a dashboard containing their initial dataset of interest. The software then allows them to interactively select genes or groups of genes either manually from the volcano plots, by filtering using gene ontology terms, or via a text-based interface as described above. As the user explores the data, they can apply further rounds of filtering and/or manual selection of groups of genes. These stacked filters propagate to all other datasets such that opening another dashboard of information on another experiment will show the equivalently filtered results. Further filtering of these newly opened data will, in turn, filter back on the original and all other datasets. This reiterative filtering approach allows the user to focus on an ever smaller number of genes selected by comparisons across multiple experiments. These results can be exported as a spreadsheet and/or passed to other web-based analytical sites linked within the TOAST interface.
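The behavior of stacked filters that carry over between dashboards can be modeled as a single shared gene selection that every dataset view reads from. The sketch below is a conceptual illustration only and does not reflect how Qlik implements its associative selections; the class name `SharedSelection`, the dataset names, and the values are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


class SharedSelection:
    """One gene selection shared by every dataset view (conceptual model)."""

    def __init__(self, datasets: Dict[str, Dict[str, float]]) -> None:
        self.datasets = datasets
        # None means no filter has been applied yet: all genes are visible.
        self.selected: Optional[Set[str]] = None

    def apply_filter(self, gene_ids: Iterable[str]) -> None:
        """Stack a filter: keep only genes that pass every filter so far."""
        ids = set(gene_ids)
        self.selected = ids if self.selected is None else self.selected & ids

    def view(self, name: str) -> Dict[str, float]:
        """Return the named dataset restricted to the current selection."""
        data = self.datasets[name]
        if self.selected is None:
            return dict(data)
        return {g: v for g, v in data.items() if g in self.selected}


# Invented fold-change values for two experiments.
session = SharedSelection({
    "exp_a": {"g1": 1.0, "g2": -2.0, "g3": 0.5},
    "exp_b": {"g2": 0.3, "g3": -1.1, "g4": 2.2},
})
session.apply_filter(["g1", "g2"])   # chosen while viewing exp_a
print(session.view("exp_b"))         # exp_b already shows the filtered set
session.apply_filter(["g2", "g4"])   # chosen while viewing exp_b
print(session.view("exp_a"))         # the new filter feeds back to exp_a
```

Because each new filter intersects with the existing selection, the visible gene set can only shrink or stay the same, which matches the narrowing described in the text.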

FIGURE 5

Figure 5 Overview of use of the TOAST 4.5 database. (1a) The user selects an initial study of interest and then can review the summary of its metadata to ensure it is the correct focus for study (1b). The dataset is then opened and (2) when the study is selected an interactive dashboard launches and the user has a direct link to any associated manuscript. Gene filtering: statistical (3a), gene ontology (3b), and other related functional filters can be applied to focus the number of loci being visualized in the volcano plot (3c) to genes of interest. In addition, the volcano plot itself can be filtered manually and interactively using a graphical selection tool. All filters can be toggled on and off using selectable tabs at the top of the interface (3d). If an interesting subset of loci is selected, the user can activate the download option (4a) and save the related data in Word or XML format (4b). (5) The user can also perform further bioinformatic and statistical analysis with other online tools linked from the main user interface.

TOAST Metadata App

A custom metadata app is also incorporated as a tool for use with TOAST (https://astrobiology.botany.wisc.edu/astrobotany-toast/tutorial-metadata). This additional relational database provides data visualization tools that use the metadata associated with each dataset to find associations in factors such as experimental design parameters, hardware, and features of the spaceflight mission between different experiments in TOAST 4.5. GeneLab provides a rich array of metadata associated with its datasets. Most of the non-GeneLab datasets incorporated into TOAST do not provide these kinds of metadata summaries and so we manually curated both the GeneLab and non-GeneLab datasets within TOAST 4.5 to provide equivalent metadata for all. These experiment-related factors are presented in Supplementary Table 3 and drive the visualizations presented in the metadata app. Figure 6A shows the main dashboard for the metadata app. Clicking on an icon launches the associated dashboard where interactive visualizations can, in turn, filter on the range of factors that are presented (Figure 6B). This architecture allows the user to explore commonalities in the available plant biology data in TOAST 4.5 ranging from lighting conditions, hardware, or plant age at time of assay to analytical approach and even PI of the group performing the experiment (see Supplementary Table 3 for a comprehensive list of factors). Within the app there are several places where “ROS meta-analysis variable” appears on the visualization. This description is used to denote that the data comes from a published meta-analysis of many publicly available microarray experiments related to responses to reactive oxygen species (ROS) called “The ROS-wheel” (Willems et al., 2016). Thus, for this particular comparative dataset there is not a single value for, e.g., light level or plant age (as it is an aggregation of many individual experiments).

FIGURE 6

Figure 6 Analysis of metadata within TOAST 4.5. (A) Initial dashboards allow access to comparisons between a range of experiment-related factors such as lighting conditions, growth environment, and plant genotypes. (B) A typical dashboard for metadata exploration, in this case for light conditions and age of seedling. Preset filters for, e.g., lab group performing the research and growth and radiation environments are available to the user and the identity of the filtered datasets is shown in the bottom left window.

Overview of the Plant RNAseq and Microarray Data Within TOAST 4.5

For the RNAseq data in TOAST, 42,220 transcript IDs are assigned to one of 37,019 distinct TAIR10 gene models. However, only 33,550 transcripts were detected within the data imported into TOAST 4.5 as being expressed either on Earth or during spaceflight. For microarrays, TOAST 4.5 links data gathered from 22,810 Arabidopsis Affymetrix probe IDs, 7,370 CATMA probe IDs, and 75,070 rice probe IDs. For Arabidopsis, the 42,220 Entrez loci IDs are associated with 13,750 detected proteins and, if it has been assigned, to one of the 25,270 Gene Symbols [drawn from the TAIR and ATTED II (Obayashi et al., 2018) gene databases combined]. For rice, 75,000 Affymetrix probe IDs are linked to the Arabidopsis Ensembl ID as described above. Note, in some microarrays a subset of the probes used have the potential for cross-hybridization and so may report on multiple gene responses. Similarly, many microarrays have redundant probes for individual genes (e.g., in addition to gene-unique probes, the ATH1 microarray also has 309 probes that redundantly monitor 148 genes). Where we have imported the original authors' analyses, we have used their approach to identifying and filtering these effects. When we had to reanalyze a dataset to conform to our requirements of presenting fold-change and P- and Q-values, then where a probe was identified as showing potential cross-hybridization effects, we have assigned it a gene ID containing both gene identifiers. Thus, e.g., a data point derived from a potentially cross-hybridizing probe represented on a volcano plot of fold-change versus significance would simultaneously show both gene IDs. For redundant probes, it is known that these often do not agree on expression levels (Cui and Loraine, 2009). This is likely due to the fact that each probe hybridizes to a different point on the gene and so effects such as differential splicing of that gene will cause probes to behave differently in the gene expression analysis.
To be as inclusive as possible, the maximum value amongst each redundant probeset was therefore used.
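
The handling of redundant probes described above can be illustrated with a minimal sketch. It assumes that "maximum value" means the largest signed value reported by any probe mapped to the same gene; the function name, probe names, and gene IDs are hypothetical placeholders and do not represent the TOAST implementation.

```python
from collections import defaultdict


def collapse_redundant_probes(probe_values, probe_to_gene):
    """Return {gene_id: largest value among all probes mapped to that gene}.

    Assumption: "maximum" is taken over the signed values as given.
    Probes without a gene mapping are skipped.
    """
    per_gene = defaultdict(list)
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            per_gene[gene].append(value)
    return {gene: max(values) for gene, values in per_gene.items()}


# Placeholder probes and gene IDs, for illustration only.
probe_to_gene = {
    "probe_a": "GENE_1",
    "probe_b": "GENE_1",  # redundant probe for the same gene
    "probe_c": "GENE_2",
}
probe_values = {"probe_a": 1.2, "probe_b": 2.5, "probe_c": -0.7}

collapsed = collapse_redundant_probes(probe_values, probe_to_gene)
print(collapsed)
```

A cross-hybridizing probe could be handled in the same structure by mapping it to a tuple of gene IDs and reporting the tuple as its label, which mirrors the dual labeling described for the volcano plots.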

Most of the microarray data within the TOAST database is associated with the Affymetrix ATH1 chip, with Illumina-based RNAseq being the second most regularly used approach. For these experiments, Arabidopsis seedlings were grown under a range of growth hardware and lighting conditions. Experiments in the Biological Research in Canisters (BRICs) produced dark-grown samples in cassettes (Petri dish fixation units, PDFUs) that are sealed prior to launch [e.g., (Kwon et al., 2015; Basu et al., 2017; Johnson et al., 2017; Zupanska et al., 2017; Choi et al., 2019)]. Light-grown material was produced in the European Modular Cultivation System [EMCS, with variable RGB lighting and atmospheric and temperature control, e.g., (Correll et al., 2013; Herranz et al., 2019; Vandenbrink et al., 2019)], in SIMBOX [Science in Microgravity Box, LED lighting, e.g., (Fengler et al., 2015)], and in Petri dishes under 24 h LED light in the Veggie hardware [e.g., (Beisel et al., 2019)] or in the Advanced Biological Research System [ABRS, LED lighting; (Paul et al., 2013b)]. Both the EMCS and SIMBOX have a centrifuge, providing the capability for an extremely informative on-orbit 1 x g control [and for investigating other fractional g environments, e.g., (Correll et al., 2013; Fengler et al., 2015)]. The WS ecotype has been grown in the BRIC, the ABRS, Veggie, and Petri dishes attached to the ISS cabin wall (in both dim diffuse light and total darkness). Thus, as data from a wide range of experiments has been imported into TOAST, it is important to assess the likely impact of features such as hardware, tissue samples, and seedling age when making comparisons between datasets. For example, differences in atmospheric control and lighting may have important influences on plant responses. Thus, plants grown in the BRIC (darkness, sealed system) might show an altered hypoxic response when compared to those in the EMCS (lighting and atmospheric control). 
Careful attention to the parallel ground controls and, if available, on-orbit centrifuge data is critical to helping understand the extent of such effects. In addition to hardware and growth environment, some specific data features may also impact user analyses. For example, the Ler-0 ecotype was grown in both the EMCS and BRIC, but the fact that different microarray technologies [Agilent vs Affymetrix; (Correll et al., 2013; Johnson et al., 2017)] were used in each study needs to be taken into account when making comparisons. This is because results from these different measurement approaches, even when applied to replicate samples, have been reported to differ in some cases [e.g., (Del Vescovo et al., 2013)]. Similarly, during the ABRS APEX01 study, Col-0 and WS samples were combined and then separated into roots, stems, and leaves for transcriptional analysis (Paul et al., 2013b). Therefore, when using these datasets, allowance must be made for the mixed ecotypes in the sample.

In addition to seedlings, cell cultures have been subjected to spaceflight. Thus, Zupanska et al. (2013) compared Arabidopsis seedlings and wild-type cell cultures grown in the dark within the BRIC. Subsequent spaceflight experiments saw comparisons between wild-type Arabidopsis cell cultures and those with mutations in the genes for ALTERED RESPONSE TO GRAVITY 1 (ARG1; a well-studied Arabidopsis gene related to gravity sensing) and HEAT SHOCK FACTOR 2a [HSF2a; a key heat shock response-related transcriptional regulator; (Zupanska et al., 2017; Zupanska et al., 2019)]. Fengler et al. (2015) also flew Arabidopsis and rice cell cultures in the SIMBOX hardware on the Shenzhou-8 spacecraft. Interestingly, despite a large number of differences in the methodologies used in the preparation of the Arabidopsis cell cultures between these various experiments (notably culture age and hardware), TOAST analysis identifies three genes that are significantly differentially expressed in all sets of experiments (AT5G48560, CRY2-INTERACTING BHLH 2; AT1G73260, KUNITZ TRYPSIN INHIBITOR 1; and AT2G15220, a basic secretory protein family member). The sharing of such responses across multiple cell culture spaceflight experiments implies that these changes in transcription may be linked to a common element of the spaceflight environment that acts at the cellular level. Facilitating such rapid, comparative analyses is a major focus of the TOAST 4.5 architecture.

Non-Spaceflight Datasets Within TOAST

Many ground-based analyses are relevant to specific aspects of the spaceflight environment. Therefore, several non-spaceflight datasets have been added to the TOAST database to aid with these comparative analyses. For example, the radiation levels experienced by biological systems increase with distance from the protection of the Earth's magnetic field. ATH1 microarray studies of radiation effects on plants are therefore also included within TOAST. In these ground-based experiments, wild-type WS seedlings and mutants compromised in DNA repair (atm-1, atr-1) were treated with both gamma photons and high-charge, high-energy (HZE) radiation and their transcriptional response monitored (Culligan et al., 2006; Missirian et al., 2014). These studies provide fingerprints of transcriptional response to both increased radiation and increasing levels of DNA damage for comparison to the changes seen in spaceflight datasets.

Likewise, data from Arabidopsis cell cultures grown either under magnetic levitation or on random positioning machines (Manzano et al., 2012) are also included in TOAST 4.5. These two techniques have been used to mimic elements of the spaceflight environment such as reduced contact with the substrate and disruption of directional cues normally derived from 1 x g on Earth, providing further useful comparisons to spaceflight responses. These gene expression datasets were obtained using the CATMA microarray technology and so some care should be taken when making comparisons to data from experiments using the ATH1 Affymetrix microarray, as these two technologies are not identical. For example, the data from the CATMA arrays was analyzed using the slightly older TAIR9 genome annotation to assign gene IDs (Lamesch et al., 2012).

The BRIC hardware is one of the most widely used plant growth systems for spaceflight and so TOAST also contains a dataset related to growth of plants in the BRIC hardware on Earth to help provide context for analyses in that particular piece of equipment (Basu et al., 2017). Many spaceflight samples are also preserved on orbit in the chemical fixative RNAlater and so TOAST includes a dataset on the effects of RNAlater on Arabidopsis seedlings (GLDS-38). In addition, as there are several spaceflight studies that present data on root responses to spaceflight, a root tip transcriptome (Krishnamurthy et al., 2018) and root tissue gene expression mapping (Birnbaum et al., 2003) are also included for comparative analyses.

It is important to note that the ground-based studies incorporated into TOAST are not an exhaustive survey of the publicly available datasets but are intended as an entry point for such comparative analysis. A summary of the non-spaceflight datasets incorporated into TOAST 4.5 is presented in Supplementary Table 1.

TOAST Confirms and Extends Previous Transcriptome Analyses

Oxidative stress has been highlighted as a likely spaceflight-related response in multiple experiments. Therefore, TOAST 4.5 also includes datasets/dashboards for comparative “ROS wheel” analyses. The ROS wheel (Willems et al., 2016) is a meta-analysis of 79 Affymetrix ATH1 microarray studies related to Arabidopsis redox homeostasis experiments. It provides a comprehensive overview of ROS and oxidative stress-related transcriptional signatures, allowing TOAST to filter for ROS-related events within spaceflight datasets. For example, Choi et al. (2019) noted the “high light early” oxidative stress signature from the ROS wheel as a common feature of the responses of Arabidopsis in the BRIC-19 spaceflight experiment. “High light early” is one of the groupings (clades) of response defined in the ROS wheel analysis and refers to the common ROS-related transcriptional signature seen in a set of experiments all exposing plants to a high light intensity stress for between 30 min and 2 h. Figure 7 shows that reanalysis with TOAST confirms this pattern, with ~⅓ of the genes significantly altered in spaceflight in Col-0 in the BRIC-19 experiment also being significantly modulated in the ROS wheel “high light early” response clade. The power of these comparative approaches is shown by using TOAST to perform similar analyses on other spaceflight transcriptomes. Thus, in the APEX3-2 experiment [(Beisel et al., 2019); GLDS-218; using the Veggie hardware and Arabidopsis Col-0 ecotype], TOAST analysis reveals that at 4 days of growth on orbit, 533 of the significantly differentially expressed genes in the root in response to spaceflight were also seen in the “high light early” clade of the ROS wheel. This pattern is reiterated through the time-course of the experiment (at day 8, 295 transcripts; at day 11, 29 in the root and 265 in the shoot tissues). 
Analyses across other spaceflight experiments (Supplementary Table 4) show that such regulation of “high light early” genes is seen in many flight experiments using whole seedlings. Interrogation with the metadata app shows these experiments mostly include plants grown in the dark, suggesting that while the triggering of a “high light early” oxidative stress pathway may be a common response of plant biology to some feature of the spaceflight environment, this is unlikely to be due to high light levels.
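
The clade comparison described above is, at its core, a set intersection between a list of significantly altered genes and the members of a reference clade. The following is a minimal sketch of that calculation only; the function name and gene labels are hypothetical placeholders, and the sketch is not the query that TOAST itself runs.

```python
def clade_overlap(de_genes, clade_genes):
    """Return the genes shared with the clade and the fraction of
    differentially expressed (DE) genes that fall within it."""
    de = set(de_genes)
    clade = set(clade_genes)
    shared = de & clade
    fraction = len(shared) / len(de) if de else 0.0
    return shared, fraction


# Placeholder gene labels, for illustration only.
spaceflight_de = ["g1", "g2", "g3", "g4", "g5", "g6"]
high_light_early = ["g2", "g4", "g9"]

shared, fraction = clade_overlap(spaceflight_de, high_light_early)
print(sorted(shared), round(fraction, 3))
```

Note that the fraction here is expressed relative to the spaceflight gene list; expressing it relative to the clade size instead would answer a different question.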

Figure 7 TOAST confirms the “high light early” ROS response from spaceflight data. The “high light early” clade in the ROS wheel analysis represents 8.87K transcripts from a total of 21.33K transcripts detected, or 41.5% of all transcripts.

Cross-Species Analyses Using TOAST

TOAST 4.5 also allows for seamless cross-species comparisons that offer the possibility to reveal fundamental elements of plant biology response to spaceflight. For example, when we used TOAST to compare the significantly differentially expressed genes in rice cell cultures grown on the Shenzhou 8 spacecraft with the Arabidopsis cell cultures from the same flight, 483 orthologous loci were identified (filtering on P-value < 0.05, Q-value < 0.05 and for genes mapping to unique Ensembl gene identifiers; Supplementary Table 5). For example, the expression of genes encoding receptor-like kinases thought to be involved in response to pathogens was altered in both species, indicating that spaceflight-induced changes in the response system to biotic stress might be a conserved plant spaceflight response. Importantly, these samples were grown under sterile conditions on orbit, suggesting these responses were triggered without pathogen stimulus. In addition, both cell cultures showed changes in the expression of genes related to cell wall structure, a theme already highlighted in several reports on Arabidopsis seedlings grown in spaceflight [e.g., (Choi et al., 2014; Kwon et al., 2015; Johnson et al., 2017)] and readily discernible as a transcriptional pattern from TOAST analyses of these same spaceflight samples. Comparison between the datasets from these cell culture samples grown under microgravity with those in the 1 x g on-orbit centrifuge control module within the SIMBOX hardware of this experiment showed that 111 of the genes that were significantly differentially expressed in spaceflight vs ground controls in both Arabidopsis and rice cultures were also differentially expressed in the flight vs 1 x g centrifuge comparison (Supplementary Table 5). That is, these genes were most likely not responding to the microgravity component of the spaceflight environment (which is nullified by centrifugation). 
Thus, some other feature(s) of spaceflight, such as increased background radiation or the development of microgravity-induced hypoxia [e.g., (Choi et al., 2019)] may be affecting this particular response.
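
The cross-species filter described above (P < 0.05, Q < 0.05, and a unique Ensembl gene identifier) followed by an intersection between species can be sketched as below. This assumes each result row already carries the Arabidopsis Ensembl identifier(s) to which it maps; the row layout, function name, and identifiers are hypothetical placeholders rather than the TOAST schema.

```python
def significant_unique_ids(rows, p_max=0.05, q_max=0.05):
    """Return identifiers of rows that pass both thresholds and map to
    exactly one Ensembl gene identifier."""
    ids = set()
    for row in rows:
        passes = row["p"] < p_max and row["q"] < q_max
        if passes and len(row["ensembl_ids"]) == 1:
            ids.add(row["ensembl_ids"][0])
    return ids


# Placeholder rows, for illustration only.
arabidopsis_rows = [
    {"ensembl_ids": ("GENE_A",), "p": 0.01, "q": 0.03},
    {"ensembl_ids": ("GENE_B",), "p": 0.02, "q": 0.04},
    {"ensembl_ids": ("GENE_C",), "p": 0.20, "q": 0.40},  # not significant
]
rice_rows = [
    {"ensembl_ids": ("GENE_A",), "p": 0.001, "q": 0.01},
    {"ensembl_ids": ("GENE_B", "GENE_D"), "p": 0.01, "q": 0.02},  # ambiguous mapping
    {"ensembl_ids": ("GENE_C",), "p": 0.01, "q": 0.02},
]

shared = significant_unique_ids(arabidopsis_rows) & significant_unique_ids(rice_rows)
print(sorted(shared))
```

The same function could be applied to a third comparison (for example flight vs on-orbit centrifuge) and intersected again to reproduce the style of the follow-up filter described in the text.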

TOAST: Survey of Spaceflight Responsive Genes Implies Alterations in Mitochondrial Function

Manual inspection of the subcellular locations presented in the TOAST interface suggested to us a potentially common element: mitochondria-related transcripts appeared to often be significantly altered in spaceflight samples. Therefore, to more closely identify possibly conserved spaceflight-related changes to mitochondrial function, 2,290 genes annotated as belonging to the “mitochondrion” were selected using the “GO subcellular location” tool embedded in TOAST's graphical user interface as shown in Figure 8A. Using this filter, significantly differentially expressed genes (P < 0.05) were acquired from the analyses of Arabidopsis Col-0 plants grown in space in either light or dark conditions. In total, 1,233 unique differentially expressed mitochondrial genes were identified from the following light-grown experiments: root tips in CARA (GLDS-120), roots from both 4- and 8-day-old seedlings cultivated in APEX-03's Veggie growth system (GLDS-218), the elongation zones of seedlings (GLDS-208), and undifferentiated cell cultures flown in Shenzhou 8's SIMBOX plant growth hardware (Fengler et al., 2015). Figure 8B shows that of these 1,233 differentially expressed transcripts, 382 were identified as being shared between at least two datasets, with eight genes being shared across all four experiments (Supplementary Table 5). When further comparisons were made using different sample times or assay types as a further distinction within these data (Figure 8C), only one gene, ALTERNATIVE OXIDASE 1A (AOX1A), was found to be common amongst the significantly differentially expressed genes in all conditions of the four selected experiments. These results suggest analysis of plant mitochondrial functioning during spaceflight may be a fruitful area of research. Indeed, Sugimoto et al. (2014) previously identified an alternative oxidase in mizuna grown on the ISS in the Lada growth chamber (i.e., in the light) as showing 9.2-fold induction during spaceflight. 
While the majority of the selected experiments report induction of AOX1A, instances of repression were also identified in roots extracted from 4- and 8-day-old seedlings grown in APEX-03's Veggie growth system.
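
The counts reported above (unique genes, genes shared by at least two datasets, genes shared by all datasets) follow from tallying how many datasets each gene appears in. A minimal sketch of that tally is given here, assuming each dataset has already been reduced to a set of significantly differentially expressed mitochondrial genes; the function name, dataset labels, and gene labels are hypothetical placeholders.

```python
from collections import Counter


def sharing_summary(datasets):
    """Summarize how gene sets overlap across several datasets.

    `datasets` maps a dataset label to an iterable of gene IDs.
    """
    # Count the number of datasets in which each gene occurs.
    counts = Counter(gene for genes in datasets.values() for gene in set(genes))
    n_datasets = len(datasets)
    return {
        "unique": set(counts),
        "shared_by_two_or_more": {g for g, c in counts.items() if c >= 2},
        "shared_by_all": {g for g, c in counts.items() if c == n_datasets},
    }


# Placeholder dataset and gene labels, for illustration only.
datasets = {
    "experiment_1": {"gene_1", "gene_2", "gene_3"},
    "experiment_2": {"gene_1", "gene_2"},
    "experiment_3": {"gene_1", "gene_4"},
    "experiment_4": {"gene_1"},
}

summary = sharing_summary(datasets)
print({key: sorted(value) for key, value in summary.items()})
```

Splitting one experiment into several entries (for example by time point or assay type) makes the "shared by all" criterion stricter, which is consistent with the smaller common set described for the more finely divided comparison.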

Figure 8 Analysis of mitochondrion-related genes altered by spaceflight. (A) Screenshot depicting an example of a user's interaction with the TOAST graphical user interface to define mitochondrion-related transcripts. (B) Using TOAST for iterative filtering of differentially expressed genes across multiple spaceflight studies where plants were light grown. (C) More extensive analysis of the studies in (B) using differentiation within the individual datasets for different analytical approaches (microarray vs RNAseq) and for different analysis periods (4 days vs 8 days). (D) Similar analysis but for dark grown plant samples. (E) The effects of spaceflight on the alternative oxidase gene family in dark grown samples. Maximum likelihood tree of AOX gene family generated using ClustalW alignment with Mega-X software (www.megasoftware.net). Venn diagrams plotted using jvenn (Bardou et al., 2014).

Furthermore, AOX1A is significantly induced in both the SIMBOX “Flight Static” vs “Ground Static” analyses (i.e., samples grown in microgravity compared to ground controls), and in the “Flight Centrifuge” vs “Ground Static” comparisons (i.e., plants grown at 1 x g on orbit vs ground controls not in a centrifuge). However, no significant difference in expression is observed when comparing the “Flight Static” vs “Flight Centrifuge” environments. These results highlight the power of being able to make comparisons to an on-board 1 x g control. The data suggest that the induction of AOX1A in light-grown undifferentiated cells is likely not a microgravity-driven event but reflects some other aspect of the spaceflight environment, such as increased radiation exposure, possible development of hypoxia or altered fluid dynamics.
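
The reasoning applied to the three SIMBOX comparisons can be written out as a small decision rule. The sketch below is a deliberate simplification that treats each comparison as a yes/no significance call; the function name and the wording of the returned labels are our own illustrative choices and are not part of TOAST.

```python
def interpret_centrifuge_control(static_vs_ground,
                                 centrifuge_vs_ground,
                                 static_vs_centrifuge):
    """Interpret three significance calls (True = significantly different).

    static_vs_ground:      "Flight Static" vs "Ground Static"
    centrifuge_vs_ground:  "Flight Centrifuge" vs "Ground Static"
    static_vs_centrifuge:  "Flight Static" vs "Flight Centrifuge"
    """
    if not static_vs_ground:
        return "no flight vs ground response"
    if centrifuge_vs_ground and not static_vs_centrifuge:
        # The response persists at 1 x g on orbit, so microgravity is an
        # unlikely driver.
        return "likely not microgravity-driven"
    if static_vs_centrifuge and not centrifuge_vs_ground:
        # The response disappears when 1 x g is restored on orbit.
        return "consistent with a microgravity effect"
    return "mixed or ambiguous"


# Pattern described in the text for AOX1A in the SIMBOX cell cultures.
print(interpret_centrifuge_control(True, True, False))
```

In practice, effect sizes and directions would also need to be considered, since a non-significant comparison is not by itself evidence of no difference.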

The datasets chosen for the TOAST analysis above that highlight AOX1A originate from experiments with samples grown under light. To explore whether the light environment might be playing a role in this suite of responses, several “dark-grown” spaceflight studies of the Col-0 ecotype were also selected using the TOAST metadata app: etiolated seedlings and undifferentiated cell cultures grown aboard BRIC19 (GLDS-37), BRIC20 (GLDS-38), BRIC16 (GLDS-17, GLDS-44), and etiolated root tips extracted from the CARA experiment (GLDS-120). Comparisons between these datasets revealed no commonly regulated genes (Figure 8D) and AOX1A was only significantly differentially expressed in the BRIC19 study in this analysis. Therefore, we examined the spaceflight-related transcriptional responses in the other members of the AOX gene family (Figure 8E). Indeed, other alternative oxidases are differentially expressed in these other “dark-grown” experiments, with each AOX gene being differentially expressed in at least one selected “dark-grown” study (Figure 8E). Given these altered expression patterns of members of the AOX family in multiple experiments, these results suggest that the regulation of alternative oxidases in response to spaceflight-associated stressors would be a strong candidate for future research studies.

Thus, this analysis in TOAST suggests a potentially widespread alteration in mitochondrial function in plants experiencing spaceflight, but many questions arise from these observations. Is an alternative oxidase pathway being triggered by spaceflight stress? Could mitochondrial dysfunction be a significant element in the oxidative stress responses seen in plant spaceflight data, as suggested e.g., for mammalian ocular tissues (Mao et al., 2013) or osteoblasts experiencing microgravity (Michaletti et al., 2017)? This kind of comparative data mining highlights the possibilities for hypothesis generation supported by the TOAST environment.

However, here it is important to note some of the limitations inherent in these kinds of analyses. For example, hypoxia is thought to be imposed during spaceflight by local oxygen consumption and associated depletion around metabolically active tissues. The reduced convective gas mixing inherent in microgravity (Porterfield, 2002) then lowers oxygen resupply, leading to development of a depletion zone around these tissues. Such hypoxic stress would be an obvious potential modulator of mitochondrial function. Yet hypoxic signatures do not readily emerge from GO analysis of the transcript profiles of the plant spaceflight datasets, even though hypoxia is a term that GO enrichment analyses can highlight. Thus, one possibility is that another, yet to be defined, physical element(s) of the spaceflight environment may act to drive these changes in mitochondrial function. However, the formation of hypoxic environments due to microgravity is likely to be very different from how hypoxia either develops naturally on Earth, or can be experimentally imposed in ground-based experimentation. For example, the steep local oxygen depletion zones that form in microgravity are more likely to be disrupted by convective gas mixing on Earth. This observation highlights one of the important caveats of relying strongly on GO analyses to understand spaceflight data. Gene ontology analyses match patterns of gene expression to those seen under particular conditions on Earth. Therefore, how well treatments on Earth mimic conditions developing during spaceflight may affect the sensitivity of such GO analyses for defining these spaceflight responses.

Similarly, it is important to ask how much batch effects might be superimposed on any particular analysis (Leek et al., 2010). Batch effects arise where measurements are impacted by a factor that is unrelated to the biological treatment but that systematically changes the measurement. For example, for RNAseq, a batch effect might be differences in patterns of gene expression related to the day a particular set of samples was processed for sequencing rather than the biological treatment of the samples. For microarray analyses it could be differences imposed by different batches of microarrays being used for different sets of samples. Batch effects can be complex to resolve but statistical approaches such as surrogate variable analysis (Leek and Storey, 2007; Leek et al., 2010) can be used on a case-by-case basis by the researcher to estimate the sensitivity of a particular dataset's analysis to these kinds of effects and so help build a case for the robustness of the analysis.
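
To make the idea of a batch effect concrete, the sketch below computes how far the mean of each processing batch sits from the overall mean for a single gene measured in samples that all received the same biological treatment. This is a crude illustrative check only and is explicitly not surrogate variable analysis; the function name, batch labels, and expression values are invented for the example.

```python
from statistics import mean


def batch_offsets(values, batches):
    """Return {batch: mean of that batch minus the overall mean}.

    Intended for samples sharing one biological treatment, so that any
    systematic offset between batches is not explained by the treatment.
    """
    overall = mean(values)
    by_batch = {}
    for value, batch in zip(values, batches):
        by_batch.setdefault(batch, []).append(value)
    return {batch: mean(vals) - overall for batch, vals in by_batch.items()}


# Invented expression values for one gene in six control samples,
# processed on two different days.
values = [5.0, 5.5, 4.5, 7.0, 7.5, 6.5]
batches = ["day_1", "day_1", "day_1", "day_2", "day_2", "day_2"]

offsets = batch_offsets(values, batches)
print(offsets)
```

A consistent offset of this kind across many genes would be a reason to apply a formal method such as surrogate variable analysis before interpreting differences between datasets.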

Conclusions

As the volume of spaceflight omics-level data increases, its power will lie in researchers' ability to mine both within and across multiple datasets. Such comparisons will provide an important source of hypotheses to then be experimentally tested. TOAST provides a data-rich environment with which to explore the commonalities and differences in the responses of plants to spaceflight and spaceflight-related environments in an accessible and intuitive format. The TOAST database has been released as a publicly available, web-based environment (https://astrobiology.botany.wisc.edu/astrobotany-toast) along with an online tutorial at https://astrobiology.botany.wisc.edu/astrobotany-toast/tutorial-metadata. At present, TOAST provides a tool to aid the plant biology community. However, the underlying TOAST architecture is biological kingdom agnostic; through use of orthologous matrix mapping, we are working to extend TOAST to facilitate similar data exploration across the wealth of biological systems that are being analyzed in spaceflight.

Data Availability Statement

The datasets analyzed for this study can be found in the GeneLab data repository (https://genelab-data.ndc.nasa.gov/genelab/projects/) and the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/).

Author Contributions

RB developed the database. RB and KR developed the user interface and tutorial. RB, JL, and SG analyzed data and wrote the manuscript.

Funding

This research was funded by NASA grants NNX13AM50G, NNX14AT25G, NNX17AD52G, 80NSSC18K0126, 80NSSC18K0132. The Qlik software used in this work is provided under a free-to-use educational license from QlikTech International.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

The authors are indebted to Dr. Sarah Swanson for critical reading of the manuscript and to our many colleagues who have generously shared their unpublished data. We also thank the many European Bioinformatics Institute researchers and numerous beta-testers from around the world whose feedback has been critical to improving the TOAST environment. GeneLab datasets were obtained from https://genelab-data.ndc.nasa.gov/genelab/projects/, maintained by NASA GeneLab, NASA Ames Research Center, Moffett Field, CA 94035.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2020.00147/full#supplementary-material

References

Altenhoff, A. M., Glover, N. M., Train, C. M., Kaleb, K., Warwick Vesztrocy, A., Dylus, D., et al. (2018). The OMA orthology database in 2018: retrieving evolutionary relationships among all domains of life through richer web and programmatic interfaces. Nucleic Acids Res. 46, D477–D485. doi: 10.1093/nar/gkx1019

Anders, S., Huber, W. (2010). Differential expression analysis for sequence count data. Genome Biol. 11, R106. doi: 10.1186/gb-2010-11-10-r106

Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., et al. (2000). Gene ontology: tool for the unification of biology. Nat. Genet. 25, 25–29. doi: 10.1038/75556

Austin, R. S., Hiu, S., Waese, J., Ierullo, M., Pasha, A., Wang, T. T., et al. (2016). New BAR tools for mining expression data and exploring Cis-elements in Arabidopsis thaliana. Plant J. 88, 490–504. doi: 10.1111/tpj.13261

Bardou, P., Mariette, J., Escudié, F., Djemiel, C., Klopp, C. (2014). Jvenn: an interactive Venn diagram viewer. BMC Bioinf. 15, 293. doi: 10.1186/1471-2105-15-293