Mild temperatures differentiate while extreme temperatures unify gene expression profiles among populations of Dicosmoecus gilvipes in California

Accurately predicting the effects of future warming on aquatic ectotherms requires an understanding how thermal history, including average temperature and variation, affects populations of the same species. However, many laboratory studies simplify the thermal environment to focus on specific organismal responses and sacrifice environmental realism. Here, we paired laboratory-based transcriptomic RNA-seq analysis to identify thermally responsive genes with NanoString analysis of a subset of those genes to characterize natural field-based variation in thermal physiology among populations. We tested gene expression responses of three populations of field-acclimatized larval caddisflies (Dicosmoecus gilvipes) from streams in different eco-regions (mountain, valley, and coast) following exposure to current and future summertime temperatures. We hypothesized that distinct thermal histories across eco-regions could differentiate populations at baseline “control” levels of gene expression, as well as gene expression changes in response to daily warming and heat shock. Population-specific patterns of gene expression were apparent under the control and daily warming conditions suggesting that local acclimatization or local adaptation may differentiate populations, while responses to extreme temperatures were similar across populations, indicating that response to thermal stress is canalized. Underlying gene co-expression patterns in the daily warming and heat shock treatments were different, demonstrating the distinct physiological mechanisms involved with thermal acclimatization and response to thermal stress. These results highlight the importance and limitations of studies of the thermal biology of wild-caught organisms in their natural environment, and provide an important resource for researchers of caddisflies and aquatic insects in general.


RNA-seq Methods: Library Construction, Sequencing and Bioinformatics
Animals were flash-frozen in liquid nitrogen and the thorax was dissected on dry ice and placed frozen into Tri Reagent (MRC). The thorax was used to maximize muscle tissue and minimize digestive tract symbiont contamination. Thorax tissue was homogenized under liquid N2 using a TissueLyser II (Qiagen 85300) for 5-10 sec and RNA was extracted using the manufacturer's polysaccharide/proteoglycan removal modified protocol, using BCP and High Salt Precipitation Solution (MRC). Purified RNA was assessed for quantity and quality using a Bioanalyzer (Agilent) and only samples with little to no degradation and adequate concentration were used in subsequent steps. For each exposure temperature, equal amounts of total RNA from each individual were used to make n=5 independent pooled RNA samples. Those n=5 pooled RNA samples were used to make RNAseq libraries following the Stillman laboratory's modifications of the Illumina Tru-Seq RNA v2 kit (Stillman et al 2020). The pooled RNA allowed best sampling of mean levels of gene expression from a greater number of individuals, and to remove sources of technical variation from the ability to detect changes in gene expression. In what follows we briefly summarize our bioinformatics pipeline. All scripts can be found on the GitHub page for Dr. Scott Fay (https://github.com/safay/RNA_seq/tree/master/blacklight_pipeline). RNAseq data were analyzed on the Pittsburg Supercomputer Center Blacklight. The mean number of processed reads in each library was 11.8±2.3 X 10 6 (mean ± SD). To prepare libraries for analysis, sequences were trimmed to remove Illumina adaptors (stringency = 1) and bases with a Phred quality score under 20 using Trim_Galore! (V 0.3.0). Trimmed reads were 32.0 ± 1.2 % of the total, 11.0 ± 1.8 % of which were quality trimmed, and 0.51 ± 0.02 % of the bases were trimmed.
FLASH (V 1.2.8) was used to join overlapping paired end reads in order to make longer reads for de novo assembly and to eliminate artificial double counting of mapped read overlap regions.
The number of reads processed by FLASH was 10.4 ± 2.0 X 10 6 , with 37.8 ± 5.5% of the reads combined, and 11.6 ±1.0% of sequence pairs removed. De novo transcriptome assembly was performed using Trinity (V r2013-11-10) (http://trinityrnaseq.sourceforge.net/) (Grabherr et al., 2011). Trinity was used to conduct the assembly using a minimum kmer coverage = 2. The assembled transcriptome contained n=135858 contigs. This Transcriptome Shotgun Assembly project has been deposited at DDBJ/EMBL/GenBank under the accession GJZL00000000. The version described in this paper is the first version, GJZL01000000. The transcriptome was annotated against the SwissProt database using Trinotate (V 1.1) , with BLASTx of nucleotide sequences (resulting in n=37464 annotated contigs), and BLASTp (n=31743 annotated) and PFAM (n=30225 annotated) analyses of TransDecoder (V 2.0) produced protein sequences.
Gene ontology annotation was found for n=35024 contigs.
Libraries were mapped to the de novo transcriptome for that species using Bowtie2 (V 2.0.6) and counted using eXpress (V 1.5.1) (http://bio.math.berkeley.edu/eXpress/), a read mapper that probabilistically assigns reads that ambiguously map to multiple loci, thus minimizing issues arising from redundant putative transcripts in de novo assembled transcriptomes (Roberts & Pachter, 2013). The unmodified eXpress output data are available at the NCBI Gene Expression Omnibus (NCBI GEO) under accession GSE206349 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE206349). Library size (# of mapped reads) was compared visually using boxplots to ensure that there was no library size variation.
On average 93.8 ± 1.0 % of reads in each library mapped to the transcriptome.
Statistical analysis of differential gene expression was performed in R using the Bioconductor package EdgeR (V 1.2.4) (http://www.bioconductor.org/packages/release/bioc/html/edgeR.html) (Robinson et al, 2010) to identify genes with expression that varied across temperature. We used likelihood ratio tests with false-discovery rate correction to identify genes that had differential expression between any two acclimation treatments within a given species. Transcripts with expression of less than 2 reads/million mapped in less than 4 samples were removed from the analysis, which resulted in the retention of n=30092 transcripts in the analysis. Transcripts with a false-discovery rate corrected P-value of less than 0.05 and a fold-change ≥ 4 (for up-regulation) or ≤ 0.25 (for downregulation) were identified as statistically differentially expressed genes (DEGs) in each LRT, which yielded a set of n=2586 DEGs. Differentially expressed transcripts were log2base transformed and median-centered based on the global within-species median. That set of transformed data is available at the NCBI GEO under accession GSE206349. Expression data were log2base transformed and median-centered based on the global within-species median.
Expression data were filtered again (MaxVal-MinVal ≥ 1.0), resulting in a final set of n=934 DEGs, and k-mean clustered (using default parameters) in Cluster (V 3.0; http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm) and visualized using TreeView (http://jtreeview.sourceforge.net). Those clusters were visualized using TreeView (http://jtreeview.sourceforge.net) to determine the minimum number of k-means clusters that fully described the data. All subsequent data analysis of differentially expressed genes was performed in R.

Gene Expression Clusters:
Expression patterns for D. gilvipes are presented in the 8 k-means clusters named alphabetically "A" through "H" (Fig. S1; Table S1). Three of these clusters (A, B, and C) represented genes with a general pattern of down-regulation with increasing temperature, with cluster B showing the greatest downregulation at 30°C (approximately -2 log2-fold change on average; Fig. S1). All three of these clusters had a high representation of genes related to protein synthesis/degradation (Table S1). In addition, Cluster A had many genes related to transcription and RNA/DNA binding, Cluster B contained several genes related to immune/stress responses, oxidative metabolism, and storage, and Cluster C included many genes related to extracellular/cell-cell interaction/cuticular function and lipid modification (Table S1).
Three of the D. gilvipes clusters (D, E, and F) were associated with general increases in gene expression with rising temperature, particularly at 30°C. Clusters E demonstrated the strongest induction at 30°C, with a mean four-fold log2 increase in expression (Fig. S1). Cluster D was dominated by immune/stress response genes, including heat-shock proteins, as well as cellular chaperones typically unassociated with the stress response (Table S1). Cluster D also contained a high proportion of genes related to cell cycle/development and transcription and DNA/RNA binding. Cluster E had a high representation of genes related to the cell cycle/development as well as the immune/stress response (Table S1). Cluster F had low and relatively even representation of genes across functional categories, but had the functions with the highest representation were related to amino acid metabolism, immune/stress response, and transcription and DNA/RNA binding (Table S1).
Two of the D. gilvipes clusters (G and H) had no strong pattern of change with temperature ( Fig. S1). Cluster G contained only features in the "other" functional category, and cluster H had low and relatively even representation across functional categories, with no more than three features occurring for any given function (Table S1).    . RNA-Seq data only for the genes used in the current study, organized by hierarchical clustering. The text colors represent log2 fold difference in expression, and indicate if we expected an increase (red) or decrease (blue) in transcript abundance with temperature in the current study. Two samples were removed from the 20°C group because of divergent expression patterns (see Fig. S1). Figure S3. Heatmap of all genes and treatments, without any induction calculations. Genes are arranged in rows and grouped by similarity of expression value (dendrogram). Each column is an individual caddisfly, labeled by its warming treatment. White bars represent individuals in the cool control treatment. Black and grey bars correspond to the daily warming treatment and heat shock, respectively. The colors of the heat map cells represent the magnitude and direction of the change in expression, scaled and centered by row. Asterisk represents the individual left out of the PCA analysis.    Table S3 . Tukey multiple comparison tests. Significance codes for adjusted p-values: 0=***, 0.001=**, 0.01=*, 0.05=., 0.1=NS A) Population differentiation PC1 B) Population differentiation PC2 C) Angelo dates PC1 A)

B)
P adj