Long-read sequencing-based transcriptomic landscape in longissimus dorsi and transcriptome-wide association studies for growth traits of meat rabbits

Rabbits are an attractive meat livestock species that can efficiently convert human-indigestible plant biomass, and have been commonly used in biological and medical researches. Yet, transcriptomic landscape in muscle tissue and association between gene expression level and growth traits have not been specially studied in meat rabbits. In this study Oxford Nanopore Technologies (ONT) long-read sequencing technology was used for comprehensively exploring transcriptomic landscape in Longissimus dorsi for 115 rabbits at 84 days of age, and transcriptome-wide association studies (TWAS) were performed for growth traits, including body weight at 84 days of age and average daily gain during three growth periods. The statistical analysis of TWAS was performed using a mixed linear model, in which polygenic effect was fitted as a random effect according to gene expression level-based relationships. A total of 18,842 genes and 42,010 transcripts were detected, among which 35% of genes and 47% of transcripts were novel in comparison with the reference genome annotation. Furthermore, 45% of genes were widely expressed among more than 90% of individuals. The proportions (±SE) of phenotype variance explained by genome-wide gene expression level ranged from 0.501 ± 0.216 to 0.956 ± 0.209, and the similar results were obtained when explained by transcript expression level. In contrast, neither gene nor transcript was detected by TWAS to be statistically significantly associated with these growth traits. In conclusion, these novel genes and transcripts that have been extensively profiled in a single muscle tissue using long-read sequencing technology will greatly improve our understanding on transcriptional diversity in rabbits. Our results with a relatively small sample size further revealed the important contribution of global gene expression to phenotypic variation on growth performance, but it seemed that no single gene has an outstanding effect; this knowledge is helpful to include intermediate omics data for implementing genetic evaluation of growth traits in meat rabbits.


Introduction
Domestic rabbits (Oryctolagus cuniculus) are a very prolific and small herbivorous livestock with a global population of ~600 million (1).Rabbits are mainly raised in China, North Korea, and some European countries for providing meat, wool, fur, as well as the laboratory animal.Rabbit meat is characterized by excellent nutritional characteristics, such as high protein content, high percentage of unsaturated fatty acids, high content of essential amino acids, low fat content, and low cholesterol and sodium level (2,3).On the other hand, rabbits, as well as other herbivorous livestock, can efficiently utilize plant fiber fractions that are indigestible to human (4).In this context, it is economically necessary to improve growth performance of meat rabbits through genetic selection approaches, especially in the era of genomics (5).Of course, reproductive performance is another important contribution to lifetime productivity in rabbits (6,7).In comparison with other common livestock species, however, we remain less-known about genomic architecture and transcriptomic landscape underlying phenotypic variation on economically important traits in meat rabbits.
The live body weight (BW) at various ages and average daily gain of BW (ADG) post-weaning are two main types of traits that have been commonly used for measuring individual growth performance in rabbits.In Gabali rabbits, the estimates of heritability ranged from 0.06 to 0.26 for individual BW at 28 to 84 days of age, with the highest estimate at 56 days of age (8).Through divergent selection on individual BW, the estimate of heritability was 0.22 for BW at 63 days of age (9).Overall, moderate heritabilities of BW at various ages have been reported in meat rabbits (10).Like BW, the heritability estimates were generally moderate for post-weaning ADG.In two rabbit lines, the estimates of heritability were 0.19 and 0.22 for ADG under ad libitum and restricted feeding systems, respectively (11).Piles and Tusell (12) estimated the genetic correlation between growth and fertility in rabbits, and reported the heritability of 0.15 for ADG.Therefore, the moderate heritabilities of measured traits in relation to growth performance in meat rabbits could facilitate the genetic selection and improvement, for which García and Argente (13) provided a comprehensive review on the advances of genetic improvements achieved in meat rabbits.
In human and livestock, genome-wide association studies (GWAS) have been widely used for identifying quantitative trait loci (QTL) and causal genes/variants significantly affecting complex traits (14).Using 320 K genome-wide single-nucleotide polymorphisms (SNPs), Yang et al. (15) performed GWAS of BW at seven different ages in meat rabbits and suggested the significant candidate QTL and genes.Instead of using independent BW records measured at a specific age, GWAS was alternatively performed in meat rabbits via combining the fitting of growth curve based on multiple BW measurements and estimation of SNP effects into a single-step nonlinear mixed model (16).In rabbits, GWAS have been also applied to other production traits, such as feed efficiency (17), number of teats (18), and coat colour (19).Because of the relatively low SNP density used in these studies, however, it is difficult to identify causal genes and variants within the large genomic regions revealed by significant association signals.
Instead of genetic variant-trait association, transcriptome-wide association studies (TWAS) have been increasingly used during the past years to identify the association between gene expression levels and complex traits, which may effectively improve the power for identifying causal genes (20).TWAS also could fill up the gaps between significant variants and finally manifest phenotype that are mediated by transcriptional regulation.Because of the high cost and technological limitations, gene expression data involved in TWAS have been commonly obtained via computational imputation approaches based on both a small reference set of gene expression data and a large number of genotyped individuals (21).Due to the increasing throughput and decreasing cost, single-molecule long-read sequencing technologies, such as Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio), are becoming increasingly routine approaches for transcriptome profiling (22,23).In rabbits, a transcriptome atlas was successfully revealed using PacBio sequencing technology (24).In the present study, we aimed to: (1) comprehensively profile transcriptomic landscape in the muscle tissue using ONT RNA sequencing technology, (2) estimate the phenotypic variance of growth traits explained by genome-wide gene expression levels, and (3) perform TWAS with these traits in meat rabbits.

Animals and phenotypes
A crossbred population of Zika rabbits and Sichuan White rabbits was used in this study, and all of them were F1 offspring of four males and 14 females.After weaning at 35 days of age, all rabbits were fed a routine commercial pellet diet (labelled as: digestible energy = 10.5 MJ, protein = 15.5%, and crude fiber = 16.5%) and housed in cages of 50 × 40 × 40 cm in size until 84 days of age (two and one rabbits per cage before and after 70 days of age, respectively).The air conditioning control system was used when indoor temperature was higher than 25°C.The individual BW was measured at 35 (BW35), 56 (BW56), 70 (BW70), and 84 (BW84) days of age, respectively; and 119 rabbits were successfully collected for BW records at the four time points.These BW records were quality controlled by removing the outliers that reside outside the median ± 3 × median absolute deviation (MAD) at each time point (25), after which a total of 115 rabbits were finally retained.Based on these BW records, individual ADG for three growth periods were derived as between BW70 and BW84 (ADG70), between BW56 and BW84 (ADG56), and between BW35 and BW84 (ADG35), respectively.

Samples and transcriptome sequencing
All rabbits at 84 days of age were slaughtered by electrical shock after fasting for 24 h, and Longissimus dorsi tissue was collected and snap-frozen in liquid nitrogen for total RNA extraction and ONT transcriptome sequencing for each individual.Total RNA was extracted using RNASimple Total RNA Kit (Tiangen Biotech, Beijing, China) following the manufacturer's instruction.RNA concentration and RNA integrity number (RIN) were analyzed using Nanodrop 2000C (Thermo Fisher Scientific, Waltham, United States) and Agilent 2,100 Bioanalyzer (Agilent Technologies, Santa Clara, United States), respectively (Supplementary Table S1).The sequencing libraries were prepared using ~1 μg of quantified RNA sample and cDNA-PCR Sequencing Kit (SQK-PCS109, Oxford Nanopore Technologies).In brief, the full-length cDNAs were enriched using template switching activity of reverse transcriptase.PCR adapters were directly added to both ends of the first-strand cDNAs.After 14 rounds of PCR amplification using LongAmp Tag (NEB), ONT sequencing adaptors were ligated to PCR products using T4 DNA ligase (NEB).DNA purification was performed using Agencourt AMPure XP beads (Beckman, CA, United States).The final cDNA libraries were added to FLO-MIN109 flowcells and run on PromethION platform at Biomarker Technology Company (Beijing, China).

Assembly and quantification of transcripts
The raw sequencing reads were first subjected to quality controls (QC) for identifying, orienting, and reusing full-length Nanopore cDNA reads using Pychopper software with default parameters. 1uring this QC process, we discarded the short (<50 bp) or low quality (mean base quality <7.0) reads that accounted for 2.3% of raw sequencing reads on average.These qualified reads were aligned to rabbit reference genome sequences (UM_NZW_1.0,with only the autosome sequences) using minimap2 software with parameters of "-ax splice-p 0.9-N 1" (26).Herein, both reconstruction of transcripts and quantitation of gene/isoform expression levels were simultaneously performed for all individuals using IsoQuant software with the default parameters (23), which employs the intron graphs for reconstructing transcripts with reference genome annotation.The novel mono-exonic transcripts were not used.
Regarding the novel transcripts that have not been annotated yet, protein-coding potential was predicted using CPC2 software with the default parameters (27).The gene expression was quantified by directly counting the uniquely aligned reads but not requiring the necessary consistency with its isoform(s), while the preset parameters specifically regarding ONT long reads was used for matching read-toisoform relationship (23).After the raw read counts were normalized using TMM method (28), both gene and transcript expression levels were finally measured as counts per million reads (CPM) using edgeR R package (29).

Transcriptome-wide association studies
To avoid the bias resulting from lowly expressed genes in association studies, we only retained genes that had been effectively expressed (raw read count ≥2) within more than 30% of individuals.The associations between gene/transcript expression level and the trait of interest were analyzed using OSCA software (30) and mixed linear model (MLM) as: where y is an n ×1 vector of each trait (i.e., BW84, ADG70, ADG56, and ADG35) with n being the sample size; b i is the estimated effect of gene i on the trait with its expression level vector w i ; X is an n × 2 incidence matrix for the two covariates of sex (two levels) and birth season (three levels) with the effect vector b; Z is an n m × matrix containing the normalized expression levels of m genes; u is the an m ×1 vector of joint effect of all genes (also termed the polygenic effect) on the trait with u N ,A ~0 , in which A is the expression level-based relationship matrix (ORM) and σ o 2 is the proportion of phenotype variance explained by all genes; e is an n ×1 vector of residuals with e N ,I ~0 2 V e . The element of A matrix between individual j and k was computed as (30): where x ij and x ik are the normalized expression level of gene i in the individual j and k , respectively; µ i and σ i 2 are the mean and variance of expression level for gene i across all individuals, respectively.To avoid the double fitting problem of one target gene simultaneously considered as both fixed and random effects in the MLM, the MOMENT (multi-component MLM-based omic association excluding the target) module implemented in OSCA software (30) was used with the default parameters.The variance components of σ o 2 and σ e 2 were estimated using Restricted Maximum Likelihood (REML) algorithm in OSCA software (30).The multiple comparison adjustment was performed using Bonferroni approach (31), therefore, the P threshold of 0.05 divided by total number of genes/transcripts was used for defining the genome-wide significant gene/transcript.If no genome-wide significant gene/transcript was found, we alternatively listed the top 20 protein-coding genes that have the lowest p values as the suggestive candidates.The genomic inflation factor (λ) and 95% confidence interval were further computed for checking if there was potential population stratification problem (32).

Functional analyses of candidate genes
For the candidate genes proposed, functional enrichment analyses were conducted using the g:GOSt function of the g:Profiler web server (33), including the target data sets of the GO terms (34) and KEGG pathways (35).The default parameters and methods for adjusting for multiple hypotheses testing (i.e., the build-in g:SCS method) were used, targeting an adjusted 5% level of significance.

Transcriptome profiling in longissimus dorsi
An average of 6.6 million raw ONT long reads per individual were initially obtained with the mean length of 1,138 bp (Supplementary Table S1).During our QC process, the autotuned parameter of q value that determines the stringency of primer alignment was 0.1724 (Supplementary Figure S1), by which up to 97.7% of raw reads passed the QC steps.These qualified reads were aligned against all 21 autosomes with the average mapping ratio of 83.55%, ranging from 75.72 to 88.20% (Supplementary Table S1).
A total of 18,842 genes and 42,010 transcripts were detected among all the studied individuals, and 6,531 genes (35%) and 19,949 transcripts (47%) of them were novel in comparison with the reference genome annotation.On average there were 2.85 and 1.07 transcripts per gene for the known and novel genes identified, respectively (Figure 1A); therefore, most of these novel genes (97%) were single-transcript genes.The average number of exons was 9.17 and 4.42 for the known and novel transcripts, and the median sequence length was 2,546 bp and 1,229 bp, respectively (Figure 1B).Among the novel transcripts, 7,535 transcripts (38%) were predicted to be protein-coding.The mean and median lengths of the first exons of transcripts were 889.9 bp and 359.0 bp, respectively (Figure 1C).Based on the raw counts of mapped reads, 45% of genes were widely expressed among more than 90% of individuals studied, while 21% of genes were restrictively expressed among less than 10% of individuals (Figure 1D).

Transcriptome-wide association studies
The descriptive statistics of all four traits are shown in Table 1.The average BW84 was 1982 g and had the higher variability among individuals than other three traits.The average ADG decreased from 25.59 g/day between 35 and 84 days of age (ADG35) to 19.84 g/day between 70 and 84 days of age (ADG70), and the decreased variability was also observed.The phenotypic correlations of BW84 with ADG35, ADG56, and ADG70 were 0.780, 0.644, and 0.447, respectively.Among the three ADG traits, the phenotypic correlations ranged from 0.578 between ADG35 and ADG70 to 0.743 between ADG35 and ADG56 (Supplementary Figure S2).
After removing the lowly expressed genes, 10,290 genes were remained for the association analyses; and there was no obvious population stratification according to the global gene expression level (Figure 2).The estimates of variance components of MLM are shown in Table 2. Beside ADG70 that was not converged successfully, the phenotype variances explained by gene expression level (± standard error, SE) were 0.659 ± 0.198 for BW84, 0.956 ± 0.209 for ADG35, and 0.501 ± 0.216 for ADG56, respectively.The association analyses are shown in Figure 3, for which the genomic inflation factors (95% CI) were 0.947 (1.133-0.761)for BW84, 0.942 (1.123-0.760)for ADG35, and 1.005 (1.190-0.821)for ADG56.As no genome-wide significant gene was found by the association analyses, the top 20 protein-coding genes with the lowest p values are shown in Table 3.Among them, three genes were overlapped among three traits (TAR RNA binding protein 1, TARBP1) or between two traits (Kelch repeat and BTB domain containing 12, KBTBD12 and Leukotriene A4 hydrolase, LTA4H).Regarding these candidate genes, functional enrichment analysis revealed a significant GO term of "non-membrane spanning protein tyrosine kinase activity" (the adjusted p value = 0.0071), in which three genes of LYN (LYN proto-oncogene), PKDCC (protein kinase domain containing cytoplasmic), JAK1 (Janus kinase 1) were involved.No significant KEGG was found.The transcript expression level-based association studies were further conducted for a total of 16,601 transcripts.As shown in Table 2, the phenotype variances explained by transcript expression levels (±SE) were 0.510 ± 0.142 for BW84 and 0.658 ± 0.223 for ADG56, respectively (MLM failed to converge for the other two traits).Like the gene-based association results, there was no transcript that showed the statistically significant association with these growth traits (Supplementary Figure S3).Based on the top 20 suggestive genes and transcripts with the lowest p values, no gene was overlapped between them.

Discussion
The phenotypic variation of complex traits in human and livestock has been determined significantly by gene transcriptional regulation (36,37).Transcriptional diversity may be referred to the varied expression level and spatio-temporal transcription.However, both genomic architecture and transcriptomic landscape underlying economically important traits have not been extensively explored in rabbits in comparison with other livestock species, such as cattle, pigs, and chicken (5).In this study, therefore, we established an experimental cross population between Zika rabbit and Sichuan White rabbit, both of them are raised for producing meat.To investigate the transcriptional diversity extensively and accurately, we focused on a single muscle tissue among 115 individuals and also employed the more robust long-read sequencing technology.We further investigated the proportion of phenotypic variance of growth traits explained by global gene expression variation, and performed association analyses between gene expression levels and growth traits using MLM approach (38).However, we acknowledge that the sample size involved in this study is relatively small in the context of TWAS (20).
In the past decade, the comprehensive profiling of transcriptome has been largely facilitated by the enormous advances of short-read high-throughput sequencing of RNA molecules (39).The accuracies of transcript assembly and expression quantification have been further improved due to the later single-molecule long-read RNA sequencing technologies (40).Therefore, long-read sequencing technologies are expecting to be increasingly used in transcriptome studies (41), such as in cattle (42), pigs (43), and chicken (44).In rabbit, Chen et al. (24) first used PacBio long-read sequencing technology for exploring transcriptomic landscape and revealed a large proportion of novel genes and transcripts using a pooling sample of multiple organ tissues sampled at different ages.In the present study, ONT long-read sequencing technology was similarly used for investigating transcriptomic landscape in meat rabbits for a single muscle tissue and in a large set of individuals, by which considerable numbers of genes and transcripts were revealed to be novel.Hence, these findings indicate that current reference genome of rabbit has not been well annotated yet, whereas a comprehensive annotation is required for biomedical researches when using rabbit as animal model (45).
As a monogastric herbivore, rabbit can efficiently utilize plant fiber fractions that are indigestible to human, which means that raising rabbits for meat, fur and wool can be considered as an effective contribution to achieving global food security (46).In context of meat rabbits, the improved growth appearance is economically significant and could be achieved through genetic selection approaches.Recently, García and Argente (13) provided a good review on the estimated genetic parameters for various growth traits that have traditionally been used as selection criteria in meat rabbits, and the moderate to high heritabilities were reported regarding these traits.In livestock, the genome-wide SNPs, as an alternative to pedigree records, have been increasingly used for estimating heritabilities, genetic correlations, and individual breeding values for various production traits of interest (5).However, there are obvious gaps between significant genetic   variants identified and the finally manifested phenotype, mainly caused by the transcriptional and post-transcriptional variability (47).Therefore, genome-wide gene expression profile has been alternatively used for explaining the complex diseases in human and production traits in livestock (36,48).In this context, we obtained transcriptome profile using long-read sequencing technology in a single population, by which individual relationships were measured and hence used in MLM for estimating different variance components of growth traits in meat rabbits (30).We first found that more than half of phenotypic variances could be explained by the genome-wide gene expression variation, which are higher than the traditional pedigree-based estimation of heritability (13).These results suggest that there is considerable contribution of gene expression level to interindividual variation on growth performance.However, our estimates had some large SE mainly because of the relatively smaller sample size than that of pedigree-or SNP-based estimation.Using GTEx data, more recently, the proportion of phenotypic variance explained by gene expression level was estimated to be 0.68 (SE = 0.06) for body mass index in human, but the varied transcriptomic variances were observed across different tissues (49).To our best knowledge, few studies have been reported for partitioning phenotypic variance by the transcriptomic expression data regarding these economically important traits in livestock.
In addition to partitioning phenotypic variance by genome-wide gene expression level, it is possible and necessary to identify the potential causal genes that significantly affect individual phenotype using TWAS approach (20).Because of difficulty in sampling of appropriate biological issues and high cost of RNA sequencing for a large set of samples, the gene expression data used in TWAS are always imputed indirectly from a small reference data set of gene expression data (21,48).Unfortunately, high-quality reference transcriptome data sets are not routinely available in livestock until the recent releases of FarmGTEx project for cattle and pig. 2 In Huaxi cattle, the gene expression levels in Longissimus dorsi were imputed using a reference transcriptome data set containing 120 individual RNA sequencing data, and the TWAS successfully revealed some genes significantly 2 www.farmgtex.orgManhattan plots for the gene expression level-based association analyses.BW84, body weight at 84 days of age; ADG35, average daily gain between 35 and 84 days of age; ADG56, average daily gain between 56 and 84 days of age.associated with productive ability (50).In the present study, we did not detect gene or transcript that is significantly associated with growth traits in meat rabbits by TWAS approach, despite it was observed that the large proportions of phenotypic variance could be explained by genome-wide gene expression level.There are two likely explanations for the observed negative results in our TWAS.First, the growth of meat rabbits is controlled by an extremely polygenic architecture and, more importantly, no gene has an outstanding effect on the phenotype in the sense of gene expression level.Second, the relatively small sample size involved in the present study, which is expected to be increased in future studies due to a significant decrease in long-read sequencing cost, might compromise the statistical detection power of TWAS.Alternatively, we analyzed some top candidate genes with the lowest p values and found that they were significantly enriched into the health-related pathway.Among them, TARBP1 was suggested to be involved in multiple cancers in human by regulating immune function (51); the significantly differential expression was observed in human Colorectal adenoma for the gene of KBTBD12 (52); it was reported that LTA4H can modulate the susceptibility to Mycobacterial infection in zebrafish and humans (53).Furthermore, the significantly enriched GO term of "non-membrane spanning protein tyrosine kinase activity" in this study was previously reported to be involved in regulation of tumor immune microenvironment in glioma (54).These findings may be reasonably explained as a healthy rabbit will have the greater growth performance.There are three practical implications of results obtained in this study.First, the comprehensively explored transcripts and their expression levels in the muscle tissue have enhanced our understanding on transcriptomic landscape associated with growth performance in meat rabbits.Second, our observation that the high proportion of phenotypic variance could be explained by global gene expression variation suggests the possibility to predict the complex and hard-to-measured production traits, such as meat quality, using transcriptomic expression data.Third, the genetic effect of single gene expression level on complex traits may be smaller than we have expected, which suggest that a large enough sample size is required for successfully identifying the significant genes.On the other hand, we acknowledged some limitations of this study.The most obvious limitation is the relatively small sample size used in TWAS, which limited the detection power for identifying the significantly associated genes.Another limitation is the absence of genome-wide SNPs that could be used for analyzing QTL affecting gene expression level.

Conclusion
In this study, many novel genes and transcripts have been comprehensively explored in longissimus dorsi of meat rabbits using long-read RNA sequencing technology, which hence contributed to the improved annotation of rabbit genome.We also revealed that the large proportions of phenotypic variance on growth performance in meat rabbits could be explained by variation of genome-wide gene expression levels, whereas the transcriptome-wide association studies did not find gene or transcript that is statistically significantly associated with the growth traits studied.

FIGURE 1
FIGURE 1Transcript assembly and gene expression quantification.The numbers of transcripts per gene and numbers of exons per transcript are shown in (A) and (B), respectively.Length distribution of the first exon of transcripts (C) and the cumulative proportions of genes expressed among individuals (D) are further shown.

FIGURE 2
FIGURE 2 Sample clustering based on the principal component (PC) analyses of genome-wide gene expression level.

TABLE 1
Descriptive statistics of the growth traits.
BW84, body weight at 84 days of age; ADG35, average daily gain between 35 and 84 days of age; ADG56, average daily gain between 56 and 84 days of age; ADG70, average daily gain between 70 and 84 days of age; SD, standard deviation; Min, minimum value; Max, maximum value; CV, coefficient of variation.

TABLE 2
Estimates of variance components (±standard error)., body weight at 84 days of age; ADG35, average daily gain between 35 and 84 days of age; ADG56, average daily gain between 56 and 84 days of age; ADG70, average daily gain between 70 and 84 days of age. BW84

TABLE 3
The suggestive candidate genes by transcriptome-wide association studies.

TABLE 3 (
Continued), body weight at 84 days of age; ADG35, average daily gain between 35 and 84 days of age; ADG56, average daily gain between 56 and 84 days of age; Chr, chromosome. BW84