A thorough RNA-seq characterization of the porcine sperm transcriptome and its seasonal changes

Understanding the molecular basis of cell function and ultimate phenotypes is crucial for the development of biological markers. With this aim, several RNA-seq studies have been devoted to characterize the transcriptome of ejaculated spermatozoa in relation to sperm quality and fertility. Semen quality follows a seasonal pattern and decays in the summer months in several animal species. The aim of this study was to deeply profile the transcriptome of the boar sperm and to evaluate its seasonal changes. We sequenced the total and the short fractions of the sperm RNA from 10 Pietrain boars, 5 collected in summer and 5 five sampled in winter, and identified a complex and rich transcriptome with 4,436 coding genes of moderate to high abundance. Transcript fragmentation was high but less obvious in genes related to spermatogenesis, chromatin compaction and fertility. Short non-coding RNAs mostly included piwi-interacting RNAs, transfer RNAs and micro-RNAs. We also compared the transcriptome of the summer and the winter ejaculates and identified 34 coding genes and 7 micro-RNAs with a significantly distinct distribution. These genes were mostly related to oxidative stress, DNA damage and autophagy. This is the deepest characterization of the boar sperm transcriptome and the first study linking the transcriptome and the seasonal variability of semen quality in animals. The annotation described here can be used as a reference for the identification of markers of sperm quality in pigs.


Introduction
clear drop on semen quality and male fertility has been observed in the warm summer 151 months, possibly due to heat stress (Trudeau and Sanford, 1986;Zasiadczyk et al., 152 2015). This seasonal effect has been linked to altered levels of some transcripts (Yang et 153 al., 2010). 154 155 The first step towards the efficient identification of RNA markers of sperm quality 156 requires obtaining a profound picture of the boar sperm transcriptome. Our group has 157 recently optimized a pipeline to extract RNA from swine mature spermatozoa and 158 obtain a high quality and complete transcriptome profile (Gòdia et al., 2018a). In this 159 study, we have profiled the sperm transcriptome from 10 boars, including both coding 160 and non-coding RNAs and we have evaluated the relationship between transcript 161 abundance and the season of collection (summer versus winter) in the northern 162 temperate climate zone. 163 164 165 Materials and Methods 166 167 Sample collection 168 Specialized professionals obtained fresh ejaculates from 10 Pietrain boars from a 169 commercial farm, with ages ranging from 9 to 28 months old, between July 2015 and 170 January 2017 as previously described (Gòdia et al., 2018a). Of the 10 ejaculates, 5 were 171 collected between December to February, and the other between May and July. No 172 animal experiment has been performed in the scope of this research. 173 174 RNA extraction, qPCR validation, library prep and sequencing 175 RNA extraction was performed and the abundances of the sperm specific PRM1 and the 176 somatic-cell specific PTPRC transcripts as well as the presence of genomic DNA 177 (gDNA) were measured by qPCR to determine the quality of the obtained RNAs as 178 previously described by our group (Gòdia et al., 2018a). Extracted RNA was quantified 179 with QubitTM RNA HS Assay kit (Invitrogen; Carlsbad, USA) and its integrity 180 validated with Bioanalyzer Agilent RNA 6000 Pico kit (Agilent Technologies; Santa 181 Clara, USA). Total RNA was subjected to ribosomal RNA depletion with the Ribo-Zero 182 Gold rRNA Removal Kit (Illumina) and RNA-seq libraries were constructed with the 183 SMARTer Low Input Library prep kit v2 (Clontech) and sequenced to generate 75 bp 184 paired-end reads in an Illumina's HiSeq2500 sequencing system. Short RNA-seq 185 libraries were prepared from the same RNA aliquots (prior to rRNA depletion) with the 186 NEBNext Small RNA (New England Biolabs) and sequenced in an Illumina Hiseq2000 187 to produce 50 bp single reads.

189
Total RNA-seq mapping and analysis of the Sperm RNA Elements 190 The quality of the paired end reads were evaluated with FastQC v.0.11.1 191 (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc), and filtered to remove low 192 quality reads and adaptors with Trimmomatic v.0.36 (Bolger et al., 2014). Filtered reads 193 were then mapped to the S.scrofa genome (Sscrofa11.1) with HISAT2 v.2.1.0 (Kim et 194 al., 2015) with default parameters except "--max seeds 30" and "-k 2". Duplicate 195 mapped reads were removed using Picard Tools (http://picard.sourceforge.net) 196 MarkDuplicates. The uniquely mapped reads were used for the detection and 197 quantification of Sperm RNA Elements (SREs). SREs are short-size sequences 198 characterized by a number of RNA-seq reads clustering to a given genomic location 199 Gòdia et al., 2018b). This approach enables an accurate exon-200 quantification (or short-size sequence quantification) instead of a whole transcript mean, 201 which makes it useful for highly degraded tissues such as sperm. After mapping, SREs 202 are classified as exonic (mapping to annotated exons), intronic, upstream/downstream 203 10 kb (if located 10 kb upstream or downstream of annotated genes) and orphan 204 (mapping elsewhere in the genome) (Gòdia et al., 2018b Multi-adjusted read counts were then normalized by sequencing depth. We only 249 considered the miRNAs that were detected in all the samples processed. To determine if 250 piRNAs were located in RE, the overlap between REs and the piRNA clusters that were 251 shared in at least 3 samples was checked with Bedtools (Quinlan and Hall, 2010) 252 multicov using the RepeatMasker database . The short RNA-seq reads 253 that did not align to any of the datasets provided were used for the de novo piRNA 254 annotation using ProTRAC v.2.4.0 (Rosenkranz and Zischler, 2012) and forcing a 255 piRNA length between 26 and 33 bp and a default minimum cluster length of 5 kb. We 256 then kept only these putative novel clusters that were shared in at least 3 of the sperm 257 samples.

259
Analysis of the seasonal variation of the boar sperm transcriptome 260 We studied the potential seasonal effect of the sperm trancriptome by comparing the 261 summer (N=5) and the winter (N=5) ejaculates. Total RNA-seq analysis was performed 262 for the transcripts annotated in the pig genome. We quantified RNA abundance with the 263 software StringTie (Pertea et al., 2015). Transcript counts were then used for the 264 differential analysis after correcting for the sequencing run using the R package DESeq2 265 ( the GTEx project and concluded that the human genome annotation incorporates most 300 of the Homo sapiens genes but still lacks a large proportion of the splice isoforms. 301 While this study increased the list of coding genes by only 5%, the catalogue of splice 302 isoforms grew by 30%. Our data is in line with these recent results and does not only 303 indicate that the novel annotation of the pig genome annotation incorporates most of the 304 genes found in sperm but also reveals that there is still a large amount of splice isoforms 305 to be discovered in this species. Since it is well known that the spermatozoon harbors a 306 very specific transcriptome, a large proportion of these unannotated isoforms are likely 307 to be sperm-specific (Sendler et al., 2013;Ma et al., 2014). 308 309 In order to dig further into the porcine sperm transcriptome, we investigated whether the to fertilization (e.g. HSPA1L and PRSS37) or spermatogenesis (ODF2 and SPATA18). 324 The top 30 most abundant annotated protein coding SREs mapped to 27 genes ( Table  325 1), 12 from mitochondrial origin (e.g. Total RNA-seq analysis: variance on the SRE abundance 336 We evaluated the transcripts that contained the 10% most abundant SREs across all 337 samples and classified them as uniform (coefficient of variation or cv < 25%) or 338 variable (cv > 75%). This identified 481 genes for which all their SREs were uniformly 339 represented (cv < 25%) and 276 genes where each SRE was highly variable (cv > 75%). 340 The These contigs were then contrasted by sequence homology against several databases 400 and after filtering (see Methods), resulted in a list of 1,060 proteins from human, cattle, 401 mouse, pig and other animal species with moderate to high RNA abundance 402 (Supplementary File 8). Some of the proteins were detected in more than one species 403 and accounted for a total of 768 unique genes (Supplementary File 9). The majority of 404 these genes (739) were already present in the porcine annotation whilst 29 were 405 classified as novel genes. From the annotated genes, 699 were also detected with our 406 initial pipeline mapping the SREs to the porcine genome but 40 were only detected by 407 this de novo assembly (Supplementary File 9). The reads that did not map to the genome 408 but found a gene counterpart in the de novo analysis could have remained unmapped 409 due to three main reasons. They could have either harbor more mismatches than the 410 maximum allowed for the mapping algorithm, mapped within a repetitive element or, 411 simply correspond to segments not assembled in the current version of the porcine 412 genome. These three scenarios could involve full genes or just gene segments. The 40 413 known genes detected only by the de novo assembly together with the 29 potential 414 novel genes did not cluster into any GO biological process. However, some of these 415 genes have been associated to spermatogenesis or implicated in the sperm structure such Although the number of novel protein-coding genes represents a modest increase (29 420 genes), our de novo analysis yielded a much higher number (699)  and their RNA levels were clearly below their coding SREs counterparts 454 (Supplementary File 10). The predicted cis-regulated target genes included ZNF217, 455 which is a transcriptional repressor, DYNLRB2 which encodes for a protein belonging to 456 the dynein family of axoneme components related to sperm motility and YIPF5, which 457 caused infertility in a knock-out fruitfly model (Yu et al., 2015). The annotation of 458 lncRNAs in the swine genome remains remarkably poor and here we provide an initial 459 catalogue that is still incomplete. File 12). The reduced number of miRNAs described in our work is somewhat not 481 surprising as we only considered those miRNAs that were present in the 10 samples. 482 The 25% of the piRNA clusters co-localized with REs, most of which were SINEs (Figure 2  514 (B)). As piRNAs are tissue-specific and we queried a testes database (Rosenkranz,515 2016), we also carried a de novo prediction of piRNA clusters with proTRAC using the 516 remaining unaligned reads (average of 1.1 M reads) (Supplementary File 1). We 517 identified 17 novel potential clusters of average abundance and length of 11.3 -585 518 CPMs and 2,357 -56,029 bp, respectively and as a whole, they covered 159.7 kb of the 519 Sscrofa11.1 genome. Six of the novel clusters were present in the 10 samples and are 520 thus considered of high confidence (Supplementary File 14). improvement on the epididymal function and boar sperm quality in summer.

548
We compared the transcriptome (mRNA transcripts and miRNA) of the sperm samples 549 collected in the summer months (May to August; N = 5) with those collected in winter 550 (October to February; N = 5) in a temperate climate zone (latitude 42º N). The semen 551 quality of the summer and winter groups was not significantly different although a trend 552 was seen for sperm cell viability (p-val = 0.05), acrosome reaction (p-val = 0.09) and 553 neck (p-val = 0.07) and tail (p-val = 0.08) morphological abnormalities. We detected 36 554 transcripts displaying a significant difference in abundance. Of these, two transcripts 555 corresponded to the same gene and they were not taken into account due to concerns on 556 the transcript allocation carried by the software. From, the 34 remaining transcripts, 557 each from a different gene, 14 were up-regulated and 20 were down-regulated in the 558 summer group (Table 3). 559 560 The largest difference in gene abundance between both seasonal groups (q-val = 3.13 x 561 10 -16 ) corresponded to the minichromosome maintenance 8 homologous recombination 562 repair factor (MCM8) gene (Table 3) which is a helicase related to the initiation of 563 eukaryotic genome replication and may be associated with the length of the 564 reproductive lifespan and menopause. MCM8 plays a role in gametogenesis due to its 565 essential functions in DNA damage repair via homologous recombination of DNA 566 double strand breaks (Lutzmann et al., 2012). Another gene, the RUN Domain 567 Containing 3B (RUNDC3B) has unknown functions but it contains a RUN domain that 568 interacts with RAP2, a GTPase that has been linked to heat stress in plants (Figueroa-569 Yañez et al., 2016) and is related to male meiosis in mammals (Manterola et al., 2016). 570 In keeping with RAP2's function, a study in bulls found that spermatogonia undergoing 571 meiosis during spermatogenesis were susceptible to heat stress (Rahman et al., 2018). 572 This suggests that in mammals, spermatogonia exposed to heat stress, up-regulate the 573 expression of RUNDC3B as a protective mechanism to ensure correct spermatogenesis 574 and the production of normal spermatozoa. StAR Related Lipid Transfer Domain 575 Containing 9 (STARD9), which was up-regulated in the summer group, is a lipid 576 binding gene that has been related to asthenospermia in humans (Mao et al., 2011). 577 Moreover, the paralog STARD6 has been linked to spermatogenesis and spermatozoa 578 quality (Mao et al., 2011). This is in keeping with the fact that the spermatozoon is very 579 sensitive to oxidative damage for several reasons including the high amount of the 580 peroxidation-prone unsaturated fatty acids that are present in its plasma membrane 581 (Aitken and De Iuliis, 2010). Another gene that was found up-regulated in the summer 582 group is the Oxidative Stress Induced Growth Inhibitor 1 gene (OSGIN1). OSGIN1 has 583 been related to autophagy and oxidative stress and its encoded protein regulates both 584 cell death and apoptosis in the airway epithelium (Sukkar and Harris, 2017). Its 585 expression is induced by DNA damage, which is one of the key sperm parameters that 586 increase in the warm summer months (Perez-Crespo et al., 2008). Since this gene has 587 also been identified in the sperm lineage, it could respond with a similar anti-oxidative 588 role to heat stress in sperm.

590
The presence of RNA in ejaculated sperm in summer versus winter seasons has been 591 previously interrogated using the microarray technology (Yang et al., 2010). In that 592 study the authors identified 33 dysregulated transcripts, none of which was 593 differentially abundant in our dataset. This lack of concordance between works could be 594 due to both biological and technical reasons and is somewhat expected. We also identified 5 miRNAs up-and 2 miRNAs down-regulated in summer (Table 4). 605 This set included miR-34c, which was one of the most abundant miRNAs in our study, 606 as well as in the sperm of other species, and was down-regulated in the summer 607 samples. The RNA levels of miR-34c were also down-regulated in the sperm of men 608 and mice exposed to severe early life stress events (Dickson et al., 2018), and in the 609 testis of cynomolgus monkeys exposed to testicular hyperthermia (Sakurai et al., 2016), 610 thus suggesting a link between the seasonality of semen quality and miR-34c. miR-611 1249, down-regulated in the summer group, was also found to be altered in the semen of 612 bulls with moderate fertility (Fagerlind et al., 2015). Members of the miR-106 family 613 were recently associated with oxidative stress in several tissues and cell types. For 614 example, miR-106b targets the 12/15-Lipoxygenase enzymes, which are involved in the 615 metabolism of fatty acids and oxidative stress in murine neurons (Wu et al., 2017). miR-616 106b has also been related to autophagy and cellular stress in intestinal epithelial 617 HCT116 cells (Zhai et al., 2013). A study in cattle identified a single nucleotide 618 polymorphisms in a miR-378 target site of the INCENP semen quality associated gene 619 (Liu et al., 2016). In humans, miR-378 was found to also target the autophagy related We have identified a rich and complex sperm transcriptome with known and novel 636 coding RNAs, lncRNAs and sncRNAs that resembles the human, mouse and cattle 637 counterparts. Their roles are mainly related to the regulation of spermatogenesis, 638 fertility and early embryo development. These spermatozoal transcripts are fragmented, 639 likely in a selective manner, consistently affecting some genes more than others across 640 samples. This suggests that their fragmentation has a programmatic basis. Similarly, the 641 variability of the transcript abundance between samples was transcript specific. This in-642 depth transcriptome profile can be used a reference to identify RNA markers for semen 643 quality and male fertility in pigs and in other animal species.

645
Interestingly, the levels of some transcripts changed between the summer and the winter 646 ejaculates, most likely responding to heat stress, which would in turn, cause oxidative 647 stress, sperm membrane and DNA damage and autophagy. Our data supports previous 648 findings suggesting that feed supplementation can correct this seasonal effect and thus, 649 opens the door to explore nutri-genomics research to improve semen quality and male 650 fertility.