Expression Quantitative Trait Loci in Equine Skeletal Muscle Reveals Heritable Variation in Metabolism and the Training Responsive Transcriptome

While over ten thousand genetic loci have been associated with phenotypic traits and inherited diseases in genome-wide association studies, in most cases only a relatively small proportion of the trait heritability is explained and biological mechanisms underpinning these traits have not been clearly identified. Expression quantitative trait loci (eQTL) are subsets of genomic loci shown experimentally to influence gene expression. Since gene expression is one of the primary determinants of phenotype, the identification of eQTL may reveal biologically relevant loci and provide functional links between genomic variants, gene expression and ultimately phenotype. Skeletal muscle (gluteus medius) gene expression was quantified by RNA-seq for 111 Thoroughbreds (47 male, 64 female) in race training at a single training establishment sampled at two time-points: at rest (n = 92) and four hours after high-intensity exercise (n = 77); n = 60 were sampled at both time points. Genotypes were generated from the Illumina Equine SNP70 BeadChip. Applying a False Discovery Rate (FDR) corrected P-value threshold (P FDR < 0.05), association tests identified 3,583 cis-eQTL associated with expression of 1,456 genes at rest; 4,992 cis-eQTL associated with the expression of 1,922 genes post-exercise; 1,703 trans-eQTL associated with 563 genes at rest; and 1,219 trans-eQTL associated with 425 genes post-exercise. The gene with the highest cis-eQTL association at both time-points was the endosome-associated-trafficking regulator 1 gene (ENTR1; Rest: P FDR = 3.81 × 10-27, Post-exercise: P FDR = 1.66 × 10-24), which has a potential role in the transcriptional regulation of the solute carrier family 2 member 1 glucose transporter protein (SLC2A1). Functional analysis of genes with significant eQTL revealed significant enrichment for cofactor metabolic processes. These results suggest heritable variation in genomic elements such as regulatory sequences (e.g. gene promoters, enhancers, silencers), microRNA and transcription factor genes, which are associated with metabolic function and may have roles in determining end-point muscle and athletic performance phenotypes in Thoroughbred horses. The incorporation of the eQTL identified with genome and transcriptome-wide association may reveal useful biological links between genetic variants and their impact on traits of interest, such as elite racing performance and adaptation to training.

While over ten thousand genetic loci have been associated with phenotypic traits and inherited diseases in genome-wide association studies, in most cases only a relatively small proportion of the trait heritability is explained and biological mechanisms underpinning these traits have not been clearly identified. Expression quantitative trait loci (eQTL) are subsets of genomic loci shown experimentally to influence gene expression. Since gene expression is one of the primary determinants of phenotype, the identification of eQTL may reveal biologically relevant loci and provide functional links between genomic variants, gene expression and ultimately phenotype. Skeletal muscle (gluteus medius) gene expression was quantified by RNA-seq for 111 Thoroughbreds (47 male,64 female) in race training at a single training establishment sampled at two time-points: at rest (n = 92) and four hours after high-intensity exercise (n = 77); n = 60 were sampled at both time points. Genotypes were generated from the Illumina Equine SNP70 BeadChip. Applying a False Discovery Rate (FDR) corrected P-value threshold (P FDR < 0.05), association tests identified 3,583 cis-eQTL associated with expression of 1,456 genes at rest; 4,992 cis-eQTL associated with the expression of 1,922 genes post-exercise; 1,703 trans-eQTL associated with 563 genes at rest; and 1,219 trans-eQTL associated with 425 genes post-exercise. The gene with the highest cis-eQTL association at both time-points was the endosome-associated-trafficking regulator 1 gene (ENTR1; Rest: P FDR = 3.81 × 10 -27 , Post-exercise: P FDR = 1.66 × 10 -24 ), which has a potential role in the transcriptional regulation of the solute carrier family 2 member 1 glucose transporter protein (SLC2A1). Functional analysis of genes with significant eQTL revealed significant enrichment for cofactor metabolic processes. These results suggest heritable variation in genomic elements such as regulatory sequences (e.g. gene promoters, enhancers, silencers), microRNA and transcription factor genes, which are associated with metabolic function and may have roles in determining end-point muscle and athletic performance phenotypes in Thoroughbred horses. The incorporation of the eQTL identified with inTRODUcTiOn In the 6,000 years since horses were first domesticated on the Eurasian steppe, there has been strong artificial selection for various athletic traits (Levine, 1999). Selection for athleticism is perhaps most clearly manifested in the Thoroughbred, which has undergone over 300 years of intense selection for speed and racing performance (Willett, 1975;Todd et al., 2018). As a result the Thoroughbred has a highly developed musculature, with a skeletal muscle mass ~10% greater than other horse breeds (~55% compared to ~42%) (Gunn, 1987), accompanied by decreased body fat (Kearns et al., 2002), superior glycogen storage capacity (Votion et al., 2012), increased mitochondrial volume (compared to other mammals) (Kayar et al., 1989) and a high degree of plasticity in skeletal muscle fibre composition (Rivero, 2004).
The response of equine skeletal muscle to training has been well studied (Snow et al., 1985;Snow, 1994). These responses in general increase the oxidative capacity of the muscle, such as fibre type switching from fast-twitch glycolytic fibres to slow-twitch, highoxidative fibres (Snow, 1994;Serrano et al., 2000), an increase in oxidative phosphorylation (Snow et al., 1985;Votion et al., 2012) and increased mitochondrial volume (Tyler et al., 1998). Training also elicits an increase in skeletal muscle mass (Rivero et al., 1996), mediated through hyperplastic growth as opposed to marked hypertrophy (Rivero et al., 1996;Rivero et al., 2002).
The transcriptional response to exercise and training in skeletal muscle has been studied in the Thoroughbred (McGivney et al., 2009;Eivers et al., 2010;McGivney et al., 2010;Eivers et al., 2012;Bryan et al., 2017). Initially, reverse transcription quantitative realtime polymerase chain reaction (RT-qPCR) was used to quantify expression of 18 candidate genes in response to a standardised exercise test on a high-speed treadmill (Eivers et al., 2010). Significant differential expression of creatine kinase M-type (CKM), cytochrome c oxidase subunit 4I1 (COX4I1), cytochrome c subunit 4I2 (COX4I2), pyruvate dehydrogenase kinase 4 (PDK4), PPARG coactivator 1 alpha (PPARG1A) and solute carrier family 24 member 4 (SLC2A4) four hours post-exercise was detected. PPARG1A is a transcription factor downstream of hypoxia-inducible factor (HIF), activation of PPARG1A via HIF in response to exercise induces downstream adaptations in oxidative phosphorylation (Arany, 2008). The differentially expressed genes were downstream targets of HIF or related to oxidative phosphorylation or muscle substrate use (Kraniou et al., 2006) The availability of a dedicated equine microarray allowed gene expression to be measured across 9,333 expressed sequence tags (ESTs). This technology was then used to examine the changes in gene expression induced by exercise, without a priori knowledge of the genes involved (McGivney et al., 2009). Analysis of the differentially expressed genes showed a functional enrichment of genes involved in insulin signalling, focal adhesion, hypertrophic and apoptotic pathways. Digital gene expression was used to investigate the transcriptional response to a ten-month training protocol , identifying functional enrichment of genes relevant to aerobic metabolism. More recently, RNA sequencing (RNA-seq) was used to investigate the response to both exercise and training, and a network biology approach was employed to identify relevant functional modules that highlighted the role of autophagy (Bryan et al., 2017) While these studies provide insight to the genes involved in the transcriptional response to exercise, they do not reveal whether there is variation in the transcriptional response among individuals and how this may influence skeletal muscle function.
Expression quantitative trait loci (eQTL) are genomic variants, typically single nucleotide polymorphism (SNPs), that are associated with variation in RNA transcript abundance. Jansen and Nap (2001) introduced the concept of 'genetical genomics' where genomic loci were associated with cellular intermediates, such as transcript abundance, to catalogue functional relevance for non-coding variants. These measurements at a cellular level then act as endophenotypes, which are heritable, intermediate phenotypes This was a particularly important development because the clear majority (>85%) of QTLs detected in genomewide association studies (GWAS) are located in non-coding regions (Hindorff et al., 2009;Brown et al., 2013).
Cis-eQTL are genetic variants that alter gene expression in an allele-specific manner and are typically located in gene regulatory regions (Wittkopp, 2005;Westra and Franke, 2014). Identification of true cis-eQTL requires aligning reads to their chromosome of origin; consequently, many studies have by convention defined any eQTLs within 1 Mb of the transcription start site (TSS) of the gene they act on as cis. Conversely, trans-eQTL act in a less direct manner, altering the expression of a secondary genome product-for example, a transcription factor or a microRNA-that regulates expression of a distant gene elsewhere in the genome (Wittkopp, 2005).
The study of eQTL in skeletal muscle to-date has been largely to investigate functional variants in the pathogenesis of type II diabetes (T2D) in humans (Mason et al., 2011;Keildson et al., 2014b;Sajuthi et al., 2016). While GWAS for T2D have identified loci associated with disease risk, these studies have not provided information on the function of these variants or the mechanism by which they contribute to disease. Keildson et al. (2014b) performed an eQTL investigation using skeletal muscle biopsies from 104 human subjects and identified an association between the rs4547172 SNP and muscle phosphofructokinase gene (PFKM) expression. Furthermore, the study found that increased expression of PFKM was associated with increased resting plasma insulin (an endophenotype) and T2D (an endpoint phenotype). This example shows that an eQTL approach can identify functional links between genomic variants, gene expression, endophenotypes, and ultimately, disease.
Variation in human gene expression has been found to be highly heritable (Monks et al., 2004;Stranger et al., 2007;Wright et al., 2014). Given the influence of gene expression on phenotype, detection of heritable variation in skeletal muscle gene expression may provide insight into genomic loci contributing to variation in exercise and performance related phenotypes.
In this study, we hypothesised that there is heritable variation in the Thoroughbred skeletal muscle transcriptional response to exercise and training, and that this variation may have implications for athletic performance.

Ethics Statement
University College Dublin Animal Research Ethics Committee approval (AREC-P-12-55-Hill), a licence from the Department of Health (B100/3525), and informed owner consent were obtained. cohort Skeletal muscle biopsy samples (gluteus medius) were collected from 111 horses (47 male, 64 female) born between 2011 and 2012. All horses were based at a single training yard, under the supervision of a single trainer and under similar management and feeding regimes. The 111 horses used for the study were produced from 19 different sires and 94 different dams.
Biopsies were collected at two time points: untrained at rest (UR) and untrained four hours post-exercise (UE). Of the 111 horses, 60 were sampled at both time points. In total 92 UR samples and 77 UE samples were collected. The horses were defined as untrained because they had completed ≤ four sprint exercise bouts (e.g., work days) prior to sampling. The number of prior work days and days of submaximal prior training prior to sampling were recorded. Horses were defined as untrained in order to integrate results with those of Bryan et al. (2017), where the untrained cohort had performed only 1−2 work days prior to sampling, and the trained cohort had completed a mean of 15.1 work days prior to sampling (SD = 9.1).

Exercise Test
The exercise stimulus was an intense sprint bout of exercise (work day) undertaken as part of normal training. The training regime for horses is submaximal training at canter six times per week, with work days being introduced and replacing one to two submaximal bouts per week. On a work day horses were initially walked on an automated horse walker for 30-60 min, followed by 5-10 min of walking in hand. Under saddle there was an initial warm-up period of 300 m walk and 700 m of trot and slow canter down the incline of the track. The work day was performed on a 1,500 m all-weather woodchip gallop track, with the final 800 m straight set on a 2.7% incline. The sprint portion of the exercise bout consisted of the horses galloping at high intensity for 800-1,000 m up the incline of the gallop. In a larger cohort of horses (n = 294) from the same training establishment, the work day was characterised using concurrent global positioning system (GPS) and heart rate monitoring (Farries et al., 2019). From 2,900 GPS recordings the mean peak speed was 16.36 m/s (range: 14.23−17.63 m/s). Of these 2,900 recordings 1,056 had simultaneous heart rate recordings, with a mean peak heart rate of 219 beats per minute (range: 182−237).
For 34/77 UE horses, whole blood was collected at rest and five minutes post-exercise into fluoride oxalate tubes. Samples were centrifuged, and plasma lactate concentrations measured on-site using a YSI2300 STAT PLUS auto analyser (YSI UK Ltd, Hampshire, UK). These measurements were used to validate the intensity of the exercise test performed.

Biopsy Sampling
Percutaneous needle muscle biopsies (approximately 300 mg) were obtained from the ventral compartment of the middle gluteal muscle using the method described by Valette et al. (1999). All UR samples were collected between 7:30 am and 11:30 am. UE samples were taken four hours after completion of the exercise test, as this has previously been shown to be a timepoint where the greatest change in gene expression in response to acute exercise was observed (McGivney et al., 2009;Eivers et al., 2010). Muscle samples were stored in RNAlater (Thermo Fisher, Massachusetts, USA) for 24 hours at 4°C then stored at −20°C prior to RNA extraction.

Rna Extraction and Quality control
Total RNA was extracted from approximately 70 mg tissue using a protocol combining TRIzol reagent (Thermo Fisher), DNase I treatment (Qiagen, Hilden, Germany) and an RNeasy Mini-Kit (Qiagen). RNA was quantified using a Nano Drop ND1000 spectrophotometer V 3.5.27 (Thermo Fisher). RNA quality was assessed using the RNA integrity number (RIN) on an Agilent Bioanalyser with the RNA 6000 Nano LabChip kit6 (Agilent, Cork, Ireland).

Rna Sequencing
Indexed, strand-specific Illumina sequencing libraries were prepared using the TruSeq Stranded mRNA Library Preparation Kit LT (Illumina, San Diego, CA, USA). Libraries were pooled with 18-20 indexed libraries per pool and sequenced on an Illumina HiSeq 2500 using a Rapid Run flow cell and reagents to generate 100 bp paired-end reads. Each pool was sequenced across both lanes of the flow cell (dual lane loading). Demultiplexed sequence data was then converted to FASTQ format. Sequencing was performed by the Research Technology Support Facility, Michigan State University.
After mapping, featureCounts [version: 1.5.0] was used to assign reads to genes (Liao et al., 2014). Data for each sample from each sequencing lane was then merged where concordance was >99% between lanes. Count data was parsed using a custom script, then small non-coding RNA were filtered using BiomaRt (Durinck et al., 2009). Assessment of the count data and multidimensional scaling were performed using edgeR (Robinson et al., 2010). Results of the multidimensional scaling were visualised using ggplot2 (Wickham, 2009). Count data was quantile normalised using preprocessCore [version: 1.40.0] (Bolstad, 2017) within the R environment [version: 3.5.1] (R Core Team, 2017), and the log 2 of quantile-normalised count data calculated.
genotyping Genomic DNA was extracted from whole blood using the Maxwell 16 automated DNA purification system (Promega, Madison, WI, USA). Horses were genotyped on the Illumina Equine SNP70 BeadChip (Illumina). A genetic versus phenotypic sex check was performed. SNPs with a genotyping rate of <95%, and individuals with a genotyping rate <95% were excluded. SNPs with a minor allele frequency (MAF) < 0.10 were removed. Using these quality-controlled SNPs, identity by state (IBS) distances between individuals were calculated using the 'genome' function in PLINK [version 1.09] (Chang et al., 2015). The remaining 43,988 SNPs were then pruned based on pairwise linkage disequilibrium (LD) using a sliding window with an LD threshold of r 2 > 0.7, a window size of 50, and a step of 5 in PLINK. A set of 15,995 SNPs were used for the eQTL analysis. Pruning was undertaken due to the large spanning of LD within the Thoroughbred, with previous work validating the use of <15,000 SNPs to capture the majority of genetic variation (Corbin et al., 2014;Schaefer et al., 2017). eQTL analysis eQTL were determined using a linear model within matrixEQTL [version: 2.1.1] (Shabalin, 2012); including sex and age at sampling (days) as covariates. As samples had been included in two separate sample pools which were sequenced separately, the sequencing batch for each sample was also included as a covariate. Tests of association were corrected using the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995) and eQTL with a corrected P-value (P FDR ) < 0.05 were catalogued for UR and UE samples separately. eQTL located within 1 Mb of the transcription start site (TSS) of the gene they were associated were designated as cis, and those located >1Mb from the TSS were designated trans, in keeping with human eQTL studies (Lonsdale et al., 2013). Significant results were then compared against genes previously identified in the skeletal muscle transcriptional response to acute, high-intensity exercise (a work day; 3,241 genes) and transcriptional response to a sixmonth period of training (3,405 genes) (Bryan et al., 2017).

Functional Enrichment analysis
Genes with significant eQTL were investigated for enrichment of biological processes using gene ontology (GO) categories (Ashburner et al., 2000) with the clusterProfiler package [version: 3.10.1] (Yu et al., 2012) within the R environment. Equine Ensembl IDs were mapped to annotated human orthologs, retrieved from the BioMart database (Kasprzyk, 2011) and GO enrichment performed using the annotation from the human genome annotation package org. Hs.eg.db [version: 2.12.0] (Carlson, 2019). The background gene set was the complement of genes expressed in skeletal muscle identified in this study (13,384 genes; 12,707 mapped to human orthologs). A threshold for significant enrichment was set at <0.05 after adjustment using the Benjamini-Hochberg procedure (P FDR ) (Benjamini and Hochberg, 1995). The number of genes assigned to each Biological Process (Gene count) and proportion of genes associated with that cluster out of all the genes expressed (Gene ratio) were also reported. Results were visualised using the clusterProfiler package (Yu et al., 2012).

RESULTS
cohort UR horses had a mean age of 611.7 days (range: 513-787 days), UE horses had a mean age of 757.5 days (range: 617-1,283 days). Dates of commencing preparatory training were available for 90 of the UR horses; 21 of the UR horses were sampled prior to breaking, 69 were sampled after breaking with a mean of 41.5 days after commencing preparatory training (range: 5-154 days) ( Table 1). UE horses were sampled on average 156.6 days after commencing preparatory training (range: 31-307). UR horses had an average of 41.5 days submaximal training (range: 5-154) and UE had on average 48.6 (range: 19-152) ( Table 1). UR horses had completed a mean of 0.3 work days (range: 0-4), UE horses completed a mean of 0.5 WDs prior to sampling (range: 0-3) ( Table 2). A subset of 34 of the UE horses had a mean peak postexercise plasma lactate concentration of 28.2 mmol/L, and a mean resting plasma concentration of 0.4 mmol/L. All RNA samples used for RNA-seq had a RIN greater than 7.0, the UR cohort had a mean RIN of 8.0 (range: 7.2−9.3) and the UE cohort had a mean RIN of 8.1 (range: 7.0−9.3). Multi-dimensional scaling was used to visually inspect the count data, showing separation of untrained resting and untrained post-exercise samples ( Figure S1).
Analysis of the genetic relatedness of the cohort showed the mean IBS distance between individuals was 0.69 and ranged from 0.64−0.85 (SD = 0.03). Of the 19 sires represented in the cohort, the top six sires in terms of number of progent represented had 39, 23, 14, 10, 6, and 4 progeny. There were two sires with two progeny and the remaining ten sires had one offspring each. There were 12 full siblings in the cohort and 34 half-siblings by dam.

genetic Regulation of Exercise Relevant genes
The total set of genes with expression changes associated with eQTLs (i.e. eGenes) were queried against genes that we have reported from the same dataset to be differentially expressed post-exercise in a sample of 39 Thoroughbreds (Bryan et al., 2017). Of the 3,582 UR cis-eQTL, 913 were associated with genes differentially expressed in response to exercise. The most significant association was between BIEC2-285235 and the CCR4-NOT transcription complex subunit 11 gene (CNOT11; P FDR = 3.00 × 10 -15 ) ( Table 5). Of the 1,703 UR trans-eQTL, 144 were associated with exercise relevant genes. The most significant trans-eQTL was between BIEC2-1061469 and the TAL bHLH transcription factor 2 gene (TAL2; P FDR = 3.03 × 10 -10 ) ( Table 6).
Within the UE cohort 4,992 cis-eQTL were identified, 1,132 of which were associated with genes differentially expressed post-exercise. The strongest association was between BIEC2-240006 and the polyA-specific ribonuclease gene (PARN; P FDR = 2.41 × 10 -21 ) ( Table 5). Of the UR trans-eQTL, 121 eQTL were associated with eGenes in the transcriptional exercise response. The strongest trans-eQTL in the UE cohort was BIEC2-1053404, associated with expression of the peroxiredoxin 2 gene (PRDX2; P FDR = 1.51 × 10 -8 ) ( Table 6).   , 2017), we examined our results based on eQTL associated with genes within this transcriptional response. Within the UR cohort, 609 of the 3,582 cis-eQTL were associated with training response genes. The strongest association was between BIEC2-1061469 and the spindle and expression of the kinetochore associated complex subunit 1 gene (SKA1; P FDR = 9.80 × 10 -18 ) ( Table 7).

DiScUSSiOn
Using a systems genetics approach we have integrated RNA-seq and genome-wide SNP data for a large cohort of Thoroughbred horses in active race training that were maintained in a single environment. This strategy has allowed us to detect significant cis and trans eQTL in equine skeletal muscle that are likely to be relevant to an exercise phenotype, adaptation to training, an important and valuable trait in the racing Thoroughbred. A total of 4,992 cis-eQTL associated with the expression of 1,922 distinct genes were identified in the UR cohort; and 4,886 cis-eQTL associated with the expression of 1,875 genes were identified in   the UE cohort. Fewer trans-eQTL were detected (UR: 1,703; UE: 1,219), which is consistent with previous studies, and likely due to the greater statistical power required to identify trans-eQTL (Westra and Franke, 2014).
The gene with the most significant association with a cis-eQTL in the UR and UE cohorts was ENTR1 ( Table 3, UR: P FDR = 3.81 × 10 -27 , UE: P FDR = 1.66 × 10 -24 ). The ENTR1 protein is involved in cellular transport of cargo proteins from the endosome to the  Golgi apparatus or for degradation in the lysosome (McGough et al., 2014) and has been suggested to play a role in cytokinesis (Hagemann et al., 2013). In a study where ENTR1 protein expression was blocked by RNA interference, there was a decrease in solute carrier family 2 member 1 glucose transporter protein (SLC2A1; previously known as GLUT1) (McGough et al., 2014). When examining whether this was due to increased SLC2A1 degradation, there was no evidence of increased transport of SLC2A1 to the lysosome. It was hypothesised that the decrease in SLC2A1 was mediated through regulation of transcription by ENTR1. SLC2A1 is responsible for approximately 30−40% of the glucose uptake in skeletal muscle, with the remainder transported through GLUT4 (Zisman et al., 2000;Rudich et al., 2003). As opposed to GLUT4 which is primarily expressed in skeletal muscle, SLC2A1 is widely expressed and is highly expressed on erythrocyte membranes (Krook et al., 2004). The control of SLC2A1 by ENTR1 in the context of the equine athlete is intriguing to speculate since SLC2A1 is expressed within equine lamellar tissue, and its expression is increased in hyperinsulinemia, therefore may play a role in the pathophysiology of equine laminitis (Campolo et al., 2016). SLC2A1 is also differentially expressed in response to hypoxia, this has also been shown in equine chondrocytes in vitro after exposure to cobalt chloride (to mimic hypoxia) and in chondrocytes from osteoarthritis cases (Peansukmanee et al., 2009). The most significant trans association in the UR cohort was between BIEC2-526896 and expression of the DEAH-box helicase 16 gene (DHX16) ( Table 4). DHX16 is an RNA helicase and is involved in regulation of translation and pre-mRNA splicing (Gencheva et al., 2010;Putiri and Pelegri, 2011). The gene located closest to BIEC2-526896 is the olfactory receptor family 12 subfamily D member 3 gene (OR12D3) with the TSS located 96.5 kb from the SNP. However, the zinc finger protein 311 gene (ZNF311) also relatively close to BIEC2-526896 (792.5 kb) (Consortium, 2017). ZNF311 has previously been associated with telomere length in heterozygous ataxia-talengiectasia mutated (ATM) gene patients (Renault et al., 2017). As a member of the a krueppel c2h2-type zinc-finger protein family it is likely a transcription factor and has been associated with Biological Processes such as 'regulation of transcription, DNA templated' and 'regulation of transcription by RNA polymerase II' (Consortium, 2017). The trans association between BIEC2-526896 and DHX16 expression may therefore be mediated via the gene regulatory function of ZNF311.
The most significant trans-eQTL in the UE cohort was UKUL2765 and expression of the methylcrotonoyl-CoA carboxylase 2 gene (MCCC2) ( Table 4). MCCC2 encodes a subunit of 3-methylcrotonyl-CoA carboxylase (MCC), an enzyme which catabolises leucine (Stadler et al., 2005). Mutations within MCCC2 have been found to result in MCC deficiency, which has varying implications for patients from no symptoms at all to death in early infancy . To date studies have yet to discern mutations which result in more or less severe disease phenotypes (Gallardo et al., 2001;Stadler et al., 2006). In terms of muscle physiology, MCCC2 has been shown to be highly expressed in skeletal muscle of the red seabream fish (Pagrus major), which is likely due to high levels of protein metabolism within skeletal muscle (Abe et al., 2004). The TSS of the jumonji domain containing 4 gene (JMJD4) is located 71 bp from UKUL2765. The JMJD4 protein catalyses the hydroxylation of translation termination factor eRF1 lysine 63, which in turn enables the correct termination of translation and maintenance of translational fidelity (Feng et al., 2014). It is possible that the variation proximal to JMJD4 is influencing expression of JMJD4, in turn altering expression of MCCC2. However, from the data available only one significant cis-eQTL for JMJD4 was detected in the UR cohort and this was BIEC2-277622 located 257.8 kb downstream of the TSS (P FDR = 6.58 × 10 -5 ). Therefore it is not clear if UKUL2765 is tagging variation influencing JMJD4 expression and mediating its influence on MCCC2 through JMJD4.
Examination of eGenes previously shown to be involved in the skeletal muscle transcriptional response to exercise and training demonstrated that TAL2 exhibited the most significant trans-eQTL in the UR cohort (BIEC2-658237; Table 6) and that this trans-eQTL was also highly significant in the UE cohort (P FDR = 9.80 × 10 -8 ; Table 4). TAL2 encodes a basic-helix-loop-helix transcription factor (Xia et al., 1991;Langlands et al., 1997). Deletion of TAL2 in mice has been shown to cause severe disruption of the development of the central nervous system, with new-born mice dying shortly post-partum (Bucher et al., 2000). TAL2 has been shown to be vital for the development of gamma-aminobutyric acid (GABA, inhibitory neurotransmitter) signalling neurons in the developing midbrain, showing highly regulated and coordinated expression (Achim et al., 2013). When expression of TAL2 was inhibited, neurons more closely resembled an excitatory glutamatergic phenotype (Achim et al., 2013). In terms of application in racing performance, GABA has previously been used as a calming agent in Thoroughbred racehorses, although it was banned from use in 2012. The GABA type A receptor associated protein like 1 gene (GABARAPL1) was also identified as a key regulator in the skeletal muscle transcriptional response to exercise (Bryan et al., 2017). In addition, we have previously reported functional enrichment of pathways related to neurodegenerative disorders in the transcriptional response to exercise (Bryan et al., 2017). Given the role of TAL2 in GABAergic neuronal fate, this suggests a potential role for TAL2 in the coordination of the response to exercise. These results suggest that the role of genes associated with neuronal differentiation and disease in the context of muscle and exercise warrants further investigation.
To identify common biological functions within genes identified under cis or trans regulation, enrichment analysis of Biological Processes among the gene sets was performed. Among the cis eGenes detected in both the UR and UE cohort, as well as trans eGenes in the UR cohort there was significant enrichment of cofactor metabolic processes (GO:0051186, Tables S1, S3, and S4). Cofactor metabolic process is defined as chemical reactions and pathways requiring the activity of an inorganic cofactor, such as an ion, or an organic coenzyme for the activity of an enzyme or other functional protein. Genes within this cluster were related to metabolism and substrate utilisation, including vitamin and mineral binding and synthesis such as: selenium (selenium binding protein 1 gene, SELENBP1; and selenoprotein T gene, SELENOT), molybdenum (molybdenum cofactor sulfurase gene, MOCOS) and thiamine (thiamine triphosphatase gene, THTPA). Consequently, variation in the expression of genes associated with nutrient binding may lead to variation in the ability of horses to utilise such nutrients. In this regard, abundance of selenoprotein gene transcripts has been used to identify dietary requirements for selenium in rats and turkeys (Barnes et al., 2009;Taylor and Sunde, 2017). Given the inter-animal variation in expression observed for genes relevant to substrate binding, it may be possible to use this information to evaluate nutrient requirements for individual horses, or whether expression of these genes can be modulated through diet.
Many of these genes have also been shown to have functions relevant to exercise, and variation within the expression of these genes may underpin variation in athletic performance. For example, the selenium binding protein 1 gene (SELENBP1) is significantly downregulated in response to exercise (log 2 FC = −0.56; P FDR = 3.71 × 10 -11 ) (Bryan et al., 2017). In both normal and cancerous human cells SELENBP1 has been shown to be highly variable in expression (Yang and Sytkowski, 1998). Functionally, the SELENBP1 protein has been shown to be involved in many cellular processes including detoxification (Ishii et al., 1996), cytoskeletal outgrowth (Miyaguchi, 2004) and regulation of reduction and oxidation within the cell (Jamba et al., 1997). SELENBP1 was found be differentially expressed in blood in response to administration of human recombinant erythropoietin in human endurance athletes (Durussel et al., 2016;Wang et al., 2017), suggesting a potential role in haematopoiesis and its regulation. In the UE cohort; the DDB1 and CUL4 associated factor 12 (DCAF12) and guanosine monophosphate reductase (GMPR) genes both exhibited significant cis-eQTL (DCAF12: P FDR = 0.02; GMPR: P FDR = 4.17 × 10 -3 ). These genes, in addition to SELENBP1, were also shown by Wang et al. (2017) to be differentially expressed in blood in response to human recombinant erythropoietin. Variation in the expression of these genes may therefore potentially underpin variation in haematological phenotypes in horses, which may in turn influence traits relevant to aerobic capacity. It is also noteworthy that selenium deficiency has been associated with significant myopathy (White muscle disease) (Lofstedt, 1997;Delesalle et al., 2017) and reduced exercise tolerance in horses (Brady et al., 1978;Avellini et al., 1999). In addition, selenoproteins have been shown to be involved in several metabolic pathways and the response to oxidative stress in muscle (Rederstorff et al., 2006). These findings suggest an important role for selenium, and its associated biochemical machinery, in the correct functioning of skeletal muscle and muscle metabolism. This highlights the importance of selenium in the context of exercise and provides a potential role for variation in expression of genes relevant to selenium metabolism in determining metabolic function within the muscle.
The cofactor metabolic process cluster also contained genes relevant to mitochondrial function and oxidative phosphorylation. These included genes within the coenzyme Q synthesis pathway: coenzyme Q3 hydroxylase (COQ3), coenzyme Q7 (COQ7) and coenzyme Q8A (COQ8A). The coenzyme Q complex is a critical component of the electron transport chain during oxidative phosphorylation, moving electrons from complexes I and II to complex III (Lenaz, 1985;Turunen et al., 2004;Stefely and Pagliarini, 2017). COQ7 and COQ8A are required for coenzyme Q biosynthesis (Mollet et al., 2008;Stefely et al., 2016). Human patients with COQ8A mutations suffered seizures and other neurological symptoms and showed reduced coenzyme Q within skeletal muscle (Jacobsen et al.;Mollet et al., 2008). An eQTL for COQ8A in skeletal muscle has already been identified in horses, with a 227 bp SINE insertion in the promotor region of MSTN (g.66495326_66495327ins227) on ECA18 associated with increased expression of COQ8A (previously known as ADCK3) in Thoroughbreds (Rooney et al., 2017). However it should be noted that this increase in COQ8A expression did not appear to accompany an COQ8A protein abundance, with no difference in COQ8A protein abundance across genotypes (Rooney et al., 2017). This may be due to COQ8A having a regulatory role in coenzyme Q biosynthesis (Acosta et al., 2016). Electron transport chain complex activity assays, as well as assays using the exogenous application of ubiquinone, suggested a difference in the abundance of coenzyme Q across genotypes at this locus (Rooney et al., 2017). Suggesting variation at this SINE insertion is associated with COQ8A expression as well as coenzyme Q abundance. Therefore eQTL in the current study associated with COQ8A (Figure S4), and indeed other genes within the coenzyme Q biosynthetic pathway, may result in variation in synthesis of the coenzyme Q complex and have downstream implications for mitochondrial function.
We have for the first time systematically catalogued eQTL in equine skeletal muscle, both at rest and post-exercise. Previous investigations of eQTL in skeletal muscle have focussed primarily on human T2D (Sharma et al., 2011;Keildson et al., 2014a;Sajuthi et al., 2016;Langefeld et al., 2018) and meat quality traits in production animals (Ponsuksili et al., 2015;Gonzalez-Prendes et al., 2017;Pampouille et al., 2018;Gonzalez-Prendes et al., 2019;Velez-Irizarry et al., 2019). Our investigation of eQTL in the context of skeletal muscle and exercise present some of the only work to-date in this area (Kelly et al., 2014). Our work utilised linear models to detect associations between SNPs and gene expression as quantified by RNA-seq. It should be noted at this point that there are potential biases introduced in terms of the high number of related individuals within the cohort, future work could utilise more sophisticated techniques such as allele specific expression, where transcripts are mapped back to the maternal and paternal chromosomes and the expression of the maternal and paternal transcripts can be compared (Chamberlain et al., 2015). This would be particularly useful in our cohort given the high number of offspring by a small number of sires. Within the cohort there is also some variation in the amount of training prior to sampling, this was kept to a minimum by our sampling criteria. However, extending the study to incorporate trained horses and utilise variation in prior training by modelling the transcriptional training response could provide information on regulation of the training responsive transcriptome.
Our current results provide novel information concerning the regulation of gene expression in horses and can provide a framework for interpreting future GWAS of athletic and performance traits in Thoroughbreds. In terms of future applications of these results, the identification of quantitative trait transcripts (QTT) for athletic traits characterised in our cohort could be used to detect associations between a SNP, variation in expression of a QTT and a trait of interest. Thus giving a fuller picture of genetic variation contributing to traits of interest. An example may be detecting loci and QTT involved in the response to exercise and training, which has previously been shown to be highly heritable in humans (Timmons et al., 2010;Bouchard et al., 2011). The use of systems genetics approaches that integrate differential gene expression with genome variation represent an excellent strategy for dissecting the genetic architecture of complex anatomical and physiological traits.

DaTa aVaiLaBiLiTY STaTEMEnT
The RNA-seq datasets analyzed for this study can be found in EBI ArrayExpress https://www.ebi.ac.uk/arrayexpress/experiments/ E-MTAB-5447/. SNP datasets will not be made publicly available as the data were generated from privately owned horses, with a legal commitment to confidentiality. Researchers may request access to the data and consideration will be given to individuals following the conclusion of a confidentiality agreement. Requests should be made to the UCD Technology Transfer Office (https:// www.ucd.ie/innovation/knowledge-transfer/).

ETHicS STaTEMEnT
University College Dublin Animal Research Ethics Committee approval (AREC-P-12-55-Hill), a licence from the Department of Health (B100/3525) and informed owner consent were obtained.

aUTHOR cOnTRiBUTiOnS
GF performed computations and functional analyses. KB and PM assisted in analysis and pipeline development. CM, KG and GF performed biopsy sample collections. JB and GF prepared RNA. GF wrote the manuscript in close consultation with EH, LK and DM. All authors were involved in study design, implementation of the research and preparation of the manuscript.

acKnOWLEDgMEnTS
We would like to thank J.S. Bolger for access to his horses, and staff at Glebe House stables for their assistance, particularly B. O'Connor and P. O'Donovan. This research was conducted with the financial support of Science Foundation Ireland (grant no. SFI/11/PI/1166 and 18/TIDA/6019).