RNA-Seq Transcriptome Analysis of Peripheral Blood From Cattle Infected With Mycobacterium bovis Across an Experimental Time Course

Bovine tuberculosis, caused by infection with members of the Mycobacterium tuberculosis complex, particularly Mycobacterium bovis, is a major endemic disease affecting cattle populations worldwide, despite the implementation of stringent surveillance and control programs in many countries. The development of high-throughput functional genomics technologies, including RNA sequencing, has enabled detailed analysis of the host transcriptome to M. bovis infection, particularly at the macrophage and peripheral blood level. In the present study, we have analysed the transcriptome of bovine whole peripheral blood samples collected at −1 week pre-infection and +1, +2, +6, +10, and +12 weeks post-infection time points. Differentially expressed genes were catalogued and evaluated at each post-infection time point relative to the −1 week pre-infection time point and used for the identification of putative candidate host transcriptional biomarkers for M. bovis infection. Differentially expressed gene sets were also used for examination of cellular pathways associated with the host response to M. bovis infection, construction of de novo gene interaction networks enriched for host differentially expressed genes, and time-series analyses to identify functionally important groups of genes displaying similar patterns of expression across the infection time course. A notable outcome of these analyses was identification of a 19-gene transcriptional biosignature of infection consisting of genes increased in expression across the time course from +1 week to +12 weeks post-infection.


INTRODUCTION
Bovine tuberculosis (BTB) is caused by Mycobacterium bovis and other intracellular bacterial pathogens of the Mycobacterium tuberculosis complex (MTBC), which display 99.9% DNA sequence identity at the genome level (1)(2)(3). Each member of the MTBC has a distinctive host spectrum, such that tuberculosis (TB) affects a wide range of mammals including humans (4). In addition, BTB has been classified as the fourth most important disease of livestock in terms of zoonotic and economic impact globally (5,6). It has also been conservatively estimated that BTB costs $3 billion annually and imposes a large financial burden on farmers with infected herds (7,8). Furthermore, as a zoonosis, M. bovis infection has important implications for human health; transmission from cattle to humans does occur and is responsible for a small but significant number of human TB cases, particularly in developing countries (9)(10)(11).
Tuberculous mycobacteria-primarily M. bovis and M. tuberculosis, the main cause of human TB-are generally inhaled from the environment within aerosol droplets and are phagocytosed by host alveolar macrophages (AMs); therefore, infection is normally initiated within, and restricted to, lung tissues (12)(13)(14)(15). Tuberculous mycobacteria have evolved a wide range of mechanisms to modulate, suppress, and manipulate specific host immune mechanisms, including inhibition of phagosomal maturation, detoxification of reactive oxygen and nitrogen species (ROS and RNS), repair of ROS-and RNS-induced cellular damage, resistance to antimicrobial and cytokine defences, modulation of antigen presentation, and induction of cellular necrosis and inhibition of apoptosis (16)(17)(18)(19). Tuberculous disease is characterised by lesions located at the site of infection, which are formed when AMs and other immune cells engage and eliminate most of the bacilli. The remaining intact mycobacterial cells are confined in granulomas that act to contain the infection, but may, under certain conditions, actually facilitate expansion and dissemination of mycobacteria to spread infection (20)(21)(22).
In Ireland, a test and slaughter policy for BTB was introduced in the early 1950s as part of the national BTB eradication scheme (23,24). This policy includes compulsory screening of all animals using the single intradermal comparative tuberculin test alone or in conjunction with an in vitro enzyme-linked immunosorbent assay-based interferon γ (IFNγ) release assay (IGRA) that increases the sensitivity of diagnosis (25). However, limitations of current diagnostic methods prevent early and accurate detection and subsequent removal of all infected animals from a herd, thereby contributing to the ongoing persistence of BTB, which continues to impact cattle production in Ireland, the United Kingdom, and other countries (24,26). Therefore, the most important objective of an effective BTB control strategy-to identify and remove all infected cattle from a herd regardless of the stage of infection-is substantially hindered by current diagnostic technologies. Novel methods of BTB diagnosis are urgently required to augment current test procedures in conjunction with appropriate wildlife reservoir control measures (27).
In recent years, the availability of a well-annotated bovine genome sequence combined with high-throughput functional genomics technologies has provided an unprecedented opportunity to gain a deeper understanding of host-pathogen interaction, identify blood-derived RNA-based biomarkers, and develop new diagnostic methods for BTB caused by infection with M. bovis (28)(29)(30)(31)(32)(33).
Previous transcriptomics studies of the host immune response to M. bovis have been performed using blood-derived RNA obtained from both naturally and experimentally infected animals, as it has been shown that host immune responses occurring in peripheral blood reflect those at the primary site of disease in BTB (34). In this regard, the dynamic transcriptome of circulating blood, which contains a large pool of "biosensors" in the form of RNA transcripts, can reflect physiological and pathological events occurring elsewhere in different tissues and organs, thereby providing a comprehensive overview of the status of the immune system (35,36). In addition, peripheral blood has provided information on the pathobiology of many diseases; it is accessible and easily collected, making it ideally suited for the development of diagnostic biomarker tests based on transcriptional profiling (37)(38)(39).
For the experimental work described here, RNA sequencing (RNA-seq) was used to study the bovine whole peripheral blood transcriptome in response to infection with M. bovis across a large-scale 14-week animal infection time course. The main objectives of the study were to examine the host peripheral blood transcriptional responses across the early stages of M. bovis infection and identify differentially expressed (DE) genes across the infection time course that represent promising candidate biomarkers for BTB. In addition, we aimed to identify host canonical pathways and interaction networks enriched for DE genes, which may shed light on the immunobiology of M. bovis infection in cattle. We also used time-series analysis and Gene Ontology (GO) information to identify functionally important groups of DE genes across the infection time course.

Overview of Animal Infection Time Course Experiment
Animal resources for the present study were obtained from a 26-week vaccination and challenge experiment of age-and sexmatched cattle infected with M. bovis (40)(41)(42)(43)(44). Ten male agematched Holstein-Friesian calves (4-6 months old) were sourced from farms known to be free of BTB disease. The animals used for the experimental work described here were the naive control group (non-vaccinated) for a vaccine efficacy study (40). Figure 1 shows the experimental schedule used by Dean et al. (40) and details the sampling time points for the 10 non-vaccinated control cattle used for the research work described here.

Inoculation With M. bovis Strain AF2122/97
The challenge strain, M. bovis AF2122/97 (2,45), was delivered endobronchially at 2 × 10 3 colony-forming units per animal using the following procedure described by Whelan et al. (46). Prior to endobronchial inoculation animals were sedated with Rompun R (Bayer Animal Health, Newbury, UK) according to the manufacturer's instructions. Following this, an LSVP 22 VGS89x14 endoscope (Veterinary Endoscopy Services, Welshpool, UK) lubricated with Vet-Lubigel (Millpledge Veterinary, Clarborough, UK) was inserted through a nostril into the trachea and placed above the bronchial opening to the cardiac lobe and the main bifurcation between left and right lobes. A cannula of 1.8-mm internal diameter (Veterinary Endoscopy Services) was inserted through the endoscope and used to deliver the M. bovis AF2122/97 inoculum in 2 mL of phosphate-buffered saline (PBS). Following this, an additional 2 mL of PBS was then used to wash the cannula, and the cannula and endoscope were withdrawn. For each individual animal, a new sterile cannula was used, the internal channel of the endoscope, through which the cannula was inserted, was rinsed with 20 mL of PBS, and the outside of the endoscope was cleaned thoroughly with sterilising tissue wipes (Medichem International, Sevenoaks, UK).
Individual responses to infection across the time course and disease pathology for the animals used in this study have been described in detail previously and include whole-blood IFN-γ assay, evaluation of peripheral blood mononuclear cell (PBMC) cytokine responses by intracellular cytokine staining, gross (visible) pathology and histopathology, and evaluation of bacterial load in lymph nodes (40,41).

Peripheral Blood Collection and Total RNA Extraction
Approximately 3 mL of ex vivo peripheral blood was sampled from all 10 naive control animals at −1 week pre-infection and then at +1, +2, +6, +10, and +12 weeks postinfection (Figure 1). All blood samples were obtained during the morning (between 7:00 and 10:00 A.M.) of each collection day and directly collected into Tempus TM blood RNA tubes (Applied Biosystems R /Thermo Fisher Scientific, Warrington, UK). Immediately after blood collection at each time point, Tempus TM tube samples for each animal were vortexed for ∼10 s to ensure complete red blood cell lysis. Tempus TM tube blood lysate samples for animals at each of the nine time points were then stored at −80 • C until they were used for total RNA extraction and purification.
The Tempus TM Spin RNA Isolation Kit (Applied Biosystems R /Thermo Fisher Scientific) was used for total RNA extraction and purification using the following protocol provided by the manufacturer. Tempus TM tube blood lysate samples were thawed at room temperature prior to RNA extraction and purification. Once thawed, for each sample, ∼3 mL of blood lysate was transferred to a 50-mL plastic centrifuge tube, and PBS was added to a final volume of 12 mL. Each sample was then mixed by vortexing for 30 s and then centrifuged at 3,000 × g for 30 min at 4 • C. The supernatant was then removed, and the remaining RNA-containing pellet was resuspended with a brief vortex in 400 µL of the proprietary RNA Purification Resuspension Solution. Following this, the resuspended RNA sample was pipetted into the RNA purification filter inserted into a 1.5-mL microcentrifuge tube for waste collection. The RNA purification filter/microcentrifuge tube was then centrifuged at 16,000 × g for 30 s, and the liquid waste and microcentrifuge tube discarded. The RNA purification filter was then placed in a clean microcentrifuge tube, 500 µL of proprietary RNA Purification Wash Solution 1 was added, followed by another centrifugation step at 16,000 × g for 30 s and disposal of the liquid waste and microcentrifuge tube. This step was then repeated using 500 µL of proprietary RNA Purification Wash Solution 2 with a centrifugation step at 16,000 × g for 30 s. A final wash step was then performed with 500 µL of RNA Purification Wash Solution 2 and centrifugation at 16,000 × g for 30 s followed by disposal of the liquid waste and microcentrifuge tube. The RNA purification filter was then placed in a clean microcentrifuge tube and centrifuged at 16,000 × g for 30 s to dry the membrane. The RNA purification filter was then inserted into a clean RNase-free collection microcentrifuge tube and 100 µL of Nucleic Acid Purification Elution Solution was added and incubated for 2 min followed by centrifugation at 16,000 × g for 30 s; the RNA eluate was then pipetted back onto the filter membrane, and the centrifugation step was repeated. Approximately 90 µL of the final RNA eluate was then pipetted (avoiding particulate material) into a new labelled RNase-free collection microcentrifuge for long-term storage at −80 • C.

RNA Quality Checking and Quantification
RNA quantity and quality checking were performed using a NanoDrop TM 1000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) and an Agilent 2100 Bioanalyzer using an RNA 6000 Nano LabChip kit (Agilent Technologies, Cork, Ireland). The majority of samples displayed a 260/280 ratio >1.8 and RNA integrity numbers >8.0 (Supplementary Table 1 in Supplementary Material 1). RNA quality and quantity checking revealed that three samples did not have measurable quantities of RNA, and these were excluded from downstream RNA-seq library preparation (15, ID 6520, +2 weeks; 21, ID 6522, +2 weeks; and 27, ID 6526, + 2 weeks).

Strand-Specific RNA-Seq Library Preparation and Sequencing
For RNA-seq library preparation, 1 µg of total RNA from each sample was used to prepare individually barcoded strand-specific RNA-seq libraries. Two rounds of poly(A) + RNA purification were performed for all RNA samples using the Dynabeads R mRNA DIRECT TM Micro Kit (Ambion R /Thermo Fisher Scientific, Austin, TX, USA) according to the manufacturer's instructions. The purified poly(A) + RNA was then used to generate strand-specific RNA-seq libraries using the ScriptSeq TM v2 RNA-Seq Library Preparation Kit, the ScriptSeq TM Index PCR Primers (sets 1-4), and the FailSafe TM PCR enzyme system (all sourced from Epicentre R /Illumina R Inc., Madison, WI, USA) according to the manufacturer's instructions. RNA-seq libraries were purified using the Agencourt R AMPure R XP system (Beckman Coulter Genomics, Danvers, MA, USA) according to the manufacturer's instructions for double size selection (0.75× followed by 1.0× ratio). RNA-seq libraries were quantified using a Qubit R fluorometer and Qubit R dsDNA HS Assay Kit (Invitrogen TM /Thermo Fisher Scientific, Carlsbad, CA, USA), whereas library quality cheques were performed using an Agilent 2100 Bioanalyzer and High Sensitivity DNA Kit (Agilent Technologies Ltd.). Individually barcoded RNA-seq libraries were pooled in equimolar quantities, and the quantity and quality of the final pooled libraries (three pools in total) were assessed as described previously. RNA-seq library sample barcode index sequences are detailed in Supplementary Table 1 (Supplementary Material 1).
Prior to high-throughput sequencing, the content of several RNA-seq libraries was validated using conventional Sanger dideoxy sequencing. Library inserts from 16 libraries were cloned using the Zero Blunt R TOPO R PCR Cloning Kit according to the manufacturer's instructions (Invitrogen TM /Thermo Fisher Scientific). Sanger sequencing of 36 plasmid inserts from these selected libraries confirmed that the RNA-seq libraries contained inserts derived from bovine mRNA. Plasmid sequencing was outsourced (Source Bioscience Ltd., Dublin, Ireland), and sequences generated were validated using BLAST searching of the DNA sequence database (47). Cluster generation and high-throughput sequencing of the pooled RNA-seq libraries were performed using an Illumina R HiSeq TM 2000 Sequencing System at the MSU Research Technology Support Facility (RTSF) Genomics Core (https://rtsf.natsci.msu.edu/genomics; Michigan State University, MI, USA). Each of the three pooled libraries was sequenced independently on five lanes split across multiple Illumina R flow cells. The pooled libraries were sequenced as paired-end 2 × 100 nucleotide reads using Illumina R version 5.0 sequencing kits. Additionally, after exploratory data analysis (Supplementary Figures 1, 2), it was decided to remove animal ID 6522 completely from the analysis and proceed with 52 RNA-seq sample data sets (Supplementary Table 1 in Supplementary Material 1). All RNA-seq data generated for this study have been deposited in the European Nucleotide database with experiment series accession numbers (PRJEB27764 and PRJEB44470).

Bioinformatics Analyses of RNA-Seq Data
Except where indicated, bioinformatics procedures and analyses were performed on a 32-core Compute Server running Linux Ubuntu (version 12.04.2) hosted at the UCD Research IT Data Centre (stampede.ucd.ie) and administered by the UCD Animal Genomics Group. All of the bioinformatics workflow/pipeline components including Linux Bash, Perl, and R scripts used were deposited in a GitHub repository (https://github.com/ kmcloughlin1/RNA-sequencing) and were modified from published methods described by our group (48). Supplementary Figure 3 shows a schematic of the complete RNA-seq bioinformatics workflow and the downstream tools used for time-series analysis and various systems biology methods.
Deconvolution (filtering and segregation of sequence reads based on the unique RNA-seq library barcode index sequences; Supplementary Table 1 in Supplementary Material 1) was performed by the MSU RTSF Genomics Core using a pipeline that simultaneously demultiplexed and converted pooled sequence reads to discrete FASTQ files for each RNA-seq sample with no barcode index mismatches permitted. The RNA-seq FASTQ sequence read data were then downloaded from the MSU RTSF Genomics Core FTP server, and a custom Perl script was used to filter out paired-end reads containing adapter sequence contamination (with up to three mismatches allowed) and to remove poor quality paired-end reads (i.e., one or both reads containing 25% of bases with a Phred quality score <20). The quality of individual RNA-seq sample library files was then reassessed postfiltering using the FastQC software package (version 0.10.1) (49).
Paired-end reads, from each filtered individual library, were aligned to the Bos taurus reference genome (UMD3.1.73) (50) using the STAR aligner software package (version 2.3.0) (51). For each library, raw counts for each gene based on the sense strand data were obtained using the featureCounts software from the Subread package (version 1.3.5-p4) (52). The featureCounts parameters were set to unambiguously assign uniquely aligned paired-end reads in a stranded manner to the exons of genes within the UMD3.1.73 B. taurus reference genome annotation. The gene count outputs were then used to perform differential gene expression analysis using the edgeR Bioconductor package (version 3.2.4) (53) within an R-based pipeline that was customised to perform the following functions: 1. Use biomaRt (54) to generate a detailed bovine gene annotation for downstream analyses, then filter out all bovine rRNA genes. 2. Filter out genes displaying expression levels below a minimal detection threshold of one count per million in at least n = 9 individual libraries (where n = smallest group of biological replicates). 3. Calculate normalisation factors for each library using the trimmed mean of M values method (55). 4. Identify DE genes between the pre-infection animal group (−1 week) and each of the post-infection animal groups (+1, +2, +6, +10, and +12 weeks) using a paired-sample approach with the edgeR package. Differential expression was evaluated by fitting a negative binomial generalised linear model for each gene. 5. Correct for multiple testing using the Benjamini-Hochberg method (56) with a false discovery rate (FDR) threshold of ≤0.05.

Systems Analyses of RNA-Seq Gene Expression Data
The Ingenuity R Pathway Analysis (IPA) software package (57)

Time-Series Analysis of RNA-Seq Gene Expression Data
Time-series analysis of gene expression data from the animal infection time-course experiment was performed using the Short Time-series Expression Miner (STEM) software package (59). The computational procedure for selecting model profiles that are representative and distinct is described by Ernst et al. (60). The software package implements a method for clustering short time-series expression data that can differentiate between real and random patterns of temporal gene expression changes and assigns each gene to the model profile that most closely matches the temporal gene expression profile for that gene as determined by the correlation coefficient. A permutation test is then used to determine which model profiles have a statistically significant number of genes assigned compared to random expectations from the mean number assigned to each profile based on the permuted data (59). STEM also incorporates GO enrichment functionality for biological interpretation of time-series gene expression data.

RNA-Seq Summary Statistics
Deconvolution and filtering of sequence reads to remove adaptor-dimer contamination yielded a mean of 20.6 ± 2.0 million reads per individual barcoded RNA-seq sample library (n = 52 libraries and ± SD); this corresponded to a mean of 82% reads that passed this filtering step. These filtered reads were then aligned to the B. taurus UMD3.  Figure 4 show the RNA-seq summary statistics. Filtering of the RNA-seq data using 52 samples (60 minus the three technical dropouts and the animal ID 6522 samples at −1, +1, +6, +10, and +12 weeks) produced 12,406 genes suitable for downstream differential expression analysis.
Multidimensional scaling plots (Supplementary Figure 5) demonstrated that it was not possible to differentiate the infected animals from non-infected animals during the early stages of the animal infection time course (+1 week post-infection vs. −1 week pre-infection, and +2 weeks vs. −1 week pre-infection). Conversely, discrimination of infected and non-infected animals was partially observed at +6 weeks post-infection and was clearly evident at +10 weeks post-infection, but this pattern of discrimination effectively disappeared by +12 weeks postinfection. Previous work by our group has shown that microarray and RNA-seq gene expression data sets from peripheral blood leukocytes (PBLs) can be used to unambiguously discriminate M. bovis-infected from non-infected control cattle (28,31). However, it is important to note that the M. bovis-infected animals used for these earlier studies were heavily infected animals, which were maintained for ongoing disease surveillance and potency testing of diagnostic reagents.

Differentiation of M. bovis-Infected and Non-infected Control Groups: Toward the Development of Transcriptomics-Based Biomarkers
The concept and the implications regarding biomarker identification and biosignature development for infectious disease have been explored thoroughly by Chaussabel et al. (36,61,62). In particular, these researchers emphasise that leukocytes present in peripheral blood convey valuable information about the status of the immune system that can be translated to biomarkers and onward to sensitive and specific biosignatures of infection. In addition, peripheral blood is easily accessible and can be stabilised and processed for high-throughput transcriptomics analyses using RNA-seq and other massively parallel gene expression technologies such as microarrays. It is also notable that development of transcriptional biomarkers in cattle is relatively straightforward because very low levels of globin gene transcripts (HBA and HBB) are observed in bovine peripheral blood compared to other mammalian species (63,64).
Statistical analysis of the RNA-seq gene expression data with a B-H FDR adjusted P ≤ 0.05 demonstrated that differential gene expression was evident between each of the five post-infection time points (+1, +2, +6, +10, and +12 weeks) and the −1 week pre-infection time point (Figure 2A). Relatively small numbers of DE genes were detected at +1 week (37 exhibited increased and 20 exhibited  As shown in Figure 3, there is a striking concordance between the patterns of expression for these 19 genes across the infection time course and earlier PBL microarray and RNA-seq studies published by our group (28,31). These results provide good support for the hypothesis that a biosignature of M. bovis infection can be generated using transcriptomics data from cattle with early-and later-stage BTB. It also provides evidence that putative transcriptional biomarkers, identified using an experimental challenge with a relatively high M. bovis infectious dose, can be translated as diagnostic tools for use in naturally infected animals. Diagnostic biosignature development focusing on smaller panels of transcriptomics-based biomarkers has been used with notable success for human TB. In this case, research work has focused on specificity and differentiating active TB from latent TB and also TB disease from non-infected controls and diseases with similar pathology but distinct aetiology such as sarcoidosis, pneumonia, and lung cancer (35,39,(65)(66)(67)(68)(69)(70)(71).
Previous work on M. tuberculosis and human immunodeficiency virus infection in humans has shown that CXCR4 is upregulated in blood monocytes and bronchoalveolar lavage cells from human patients with pulmonary TB (72)(73)(74). In addition, a loss-of-function mutation in the murine Thbd gene that impairs activated protein C production results in uncontrolled lung inflammation in mice infected with M. tuberculosis, highlighting the importance of the THBD gene in mammalian TB disease (75). Galietti et al. have also shown that M. tuberculosis-and M. bovis-infected, but not M. aviuminfected, human monocytes showed increased expression of the CDKN1A protein encoded by CDKN1A (76). A range of studies have shown that levels of the protein product of the PLAUR gene are elevated in serum from human patients infected with M. tuberculosis (77)(78)(79)(80)(81). Increased expression of the OSM gene and induction of matrix metalloproteinases, which contribute to tissue damage characteristic of TB, have been demonstrated in M. tuberculosis-infected human monocytes (82). High expression of OSM was also observed in the blood transcriptome of patients presenting with high mycobacterial load sputum (83). In addition, the OSM gene is located within a candidate QTL region for TB susceptibility identified using admixture mapping in humans (84).
Ten of the 19 genes that showed consistently increased expression across all post-infection time points were observed to overlap with results from RNA-seq of an in vitro infection timecourse experiment using bovine AMs stimulated with M. bovis (48). FOSB and NR4A1 were upregulated in AMs at 2 hpi; CXCL8, NR4A1, PLAUR, and RGS16 were upregulated at 6 hpi; EVI2A, CXCL8, FOSB, HBEGF, OSM, PLAUR, RGS16, and THBD were upregulated at 24 hpi; the eight genes observed at 24 hpi plus the CDKN1A gene were also upregulated at 48 hpi.
One of the most notable putative transcriptional biomarkers represented in the panel of 19 genes and in the independent PBL studies is CXCL8 (previously known as IL8). CXCL8 is a chemokine encoded by the CXCL8 gene, which is a strong neutrophil chemoattractant and also chemotactic for monocytes and T cells (85,86); it has been observed to exhibit increased expression for different mycobacterial infections in a range of mammalian systems (87)(88)(89)(90)(91)(92)(93)(94)(95). CXCL8 enhances killing of mycobacteria by neutrophils and macrophages (96,97), and these immune cells also secrete CXCL8 when stimulated by M. tuberculosis (98). In this regard, Godaly and Young showed that M. bovis bacillus Calmette-Guérin (BCG) induces CXCL8 secretion by human neutrophils via MyD88-dependent signalling through TLR2 and TLR4 (99). Also, stimulation of human lung fibroblasts in vitro using conditioned medium from M. tuberculosis-infected monocytes caused prolonged expression of CXCL8 mRNA and >10-fold increase in CXCL8 secretion (88).
With regard to the CXCL8 mRNA transcript as a biomarker of infection, Alessandri et al. were able to detect significantly elevated levels of the CXCL8 cytokine in plasma from patients with pulmonary TB (87). More recently, based on reversion of IGRA test results in a Chinese cohort, it has been proposed that decreased serum levels of CXCL8 are associated with clearance of M. tuberculosis infection (100). In addition, using microarray and reverse transcriptase-qualitative polymerase chain reaction technologies, Widdison et al. have shown that M. tuberculosisand M. bovis-infected bovine AMs express high levels of CXCL8 transcripts compared to non-infected control cells (90). In support of this, using RNA-seq, we have shown that CXCL8 increases in expression in bovine AMs infected with either M. tuberculosis or M. bovis across a 48-h time course (101). CXCL8 has also been shown to be significantly increased in expression after in vitro PPD-b stimulation of PBMCs from cattle infected with M. bovis (102) and bovine monocyte-derived macrophages (103). CXCL8 also exhibited increased expression in PBL (28) (94). Interestingly, the potential specificity of increased CXCL8 gene expression as a biomarker for M. bovis infection in cattle is illustrated by recent results obtained by Alonso-Hearn et al. (105). Using similar RNA-seq methodology, they observed significantly decreased expression of CXCL8 in peripheral blood from cattle infected with M. avium subsp. paratuberculosis, the causative agent of Johne disease. Finally, it is important to note that several primary studies and meta-analyses  have provided evidence that a single-nucleotide polymorphism (rs4073) at the human CXCL8 gene locus is associated with resistance/susceptibility to M. tuberculosis infection (106)(107)(108)(109).

Functional Biology of Peripheral Blood Gene Expression Across the Infection Time Course
Of the 12,406 genes (50.40% of total B. taurus reference genes) that were suitable for differential expression analysis, 10,703 genes (86.27%) were mapped to molecules in the IPA Knowledge Base. IPA was used to identify overrepresented canonical pathways and construct biological interaction networks for sets of DE genes at each post-infection time point (+1, +2, +6, +10, and +12 weeks) compared to the pre-infection control time point (−1 week). Only DE genes that were significant after a multiple testing correction was applied (Benjamini-Hochberg method, FDR threshold ≤0.05) were used. The gene expression data for the panel of 19 genes ( Table 1) were also analysed using IPA and to identify enriched canonical pathways and biological interaction networks. However, these analyses did not reveal any notable functionally relevant pathways or networks (results not shown).

+1 Week Post-infection Time Point
Forty-eight of the 57 DE genes detected between sample groups at +1 week post-infection and −1 week pre-infection were mapped to the IPA Knowledge Base (84.21%); however, no statistically significant canonical pathways were detected for this gene expression contrast. Four biological interaction networks were generated from this 57-DE-gene set using the IPA Knowledge Base. Supplementary Figure 6 shows the highest-ranked network, which is associated with embryonic development, organismal development, and reproductive system development and function, and has the ubiquitin C protein encoded by the UBC gene as a central hub.

+2 Weeks Post-infection Time Point
Eighty-one of the 93 DE genes detected between sample groups at +2 weeks post-infection and −1 week preinfection were mapped to the IPA Knowledge Base (87.10%). Material 3) details the overrepresented IPA canonical pathways for this DE gene set. The top-ranked canonical pathway at +2 weeks post-infection was the Glucocorticoid Receptor Signalling pathway with eight genes displaying increased expression (CDKN1A, CXCL8, DUSP1, FOS, IL10, PLAUR, PTGS2, and SGK1) out of a total of 275 members of this pathway (P = 1.05 × 10 −5 ). The main effects of glucocorticoid steroid hormones signalling through the cytosolic nuclear receptor subfamily 3, group C, member 1 (glucocorticoid receptor) (NR3C1) protein on the immune system are to upregulate expression of anti-inflammatory genes and downregulate expression of proinflammatory genes (110,111). Therefore, glucocorticoid receptor signalling activity evident in the peripheral blood transcriptome during the early stages of M. bovis infection may reflect perturbation of homeostasis (112) and possible modulation of host cellular mechanisms at the site of infection in the lungs. Nine biological interaction networks were generated from this 81-DE-gene set using the IPA Knowledge Base. Figure 4 shows the highest-ranked network, which was centred on increased expression of the IL10, CXCL2, CXCR4, and CXCR2 genes with Cellular Movement, Haematological System Development and Function, and Immune Cell Trafficking as the top IPA disease and function categories. In this regard, IL-10, an inhibitory and anti-inflammatory pleiotropic cytokine with a major role in suppression of macrophage and dendritic cell functions, has been hypothesised as a target for modulation and manipulation by mycobacterial pathogens (113,114). Also, IL-10 is linked to chronic mycobacterial infection in the mouse model (115)(116)(117). It has been shown that mycobacterial RNA induces IL-10 production in infected cells through TLR3-mediated activation of the PI3K/AKT signalling pathway (118) and that M. tuberculosis infection of THP-1 cells induces IL10 expression through perturbation of the histone deacetylases HDAC6 and HDAC11 (119). In addition, it has been observed that increased levels of IL-10 cytokine in TB patients lead to impaired T-cell function, thereby contributing to an inefficient host immune response (120).

+6 Weeks Post-infection Time Point
Six hundred twenty-six of the 687 DE genes detected between sample groups at +6 weeks post-infection and −1 week pre-infection were mapped to the IPA Knowledge Base (91.12%). Supplementary Table 9 (Supplementary Material 3) details the overrepresented IPA canonical pathways for this DE gene set. Twenty-five biological interaction networks were generated from this 626-DE-gene set using the IPA Knowledge Base, and Supplementary Figure 7 shows the highest-ranked network, which contains mostly down-regulated focus molecules associated with DNA replication, recombination and repair, and control of gene expression and the cell cycle. In this regard, it is noteworthy that M. tuberculosis has recently been demonstrated to suppress innate immunity by exploiting the host ubiquitination system (121)(122)(123)(124).
Twenty-five biological interaction networks were generated from this 2,247-DE-gene set using the IPA Knowledge Base. Figure 5 shows the highest-ranked network, which was centred on the amyloid β (A4) precursor protein encoded by the APP gene (previously known as ABPP) as a central hub, and the top IPA disease and function categories represented were Antigen Presentation, Carbohydrate Metabolism, and Cardiovascular Disease. Currently, there are few published research works demonstrating differential expression of the APP gene during TB in vertebrates (125,126); however, the presence of this network at +10 weeks post-infection suggests that the represented genes and gene products may have roles in BTB development and hostpathogen interactions. In addition, using RNA-seq, the APP gene was also significantly increased in expression in PBL from M. bovis-infected cattle compared to non-infected controls (31).

+12 Weeks Post-infection Time Point
There was a marked decrease in the number of DE genes at +12 weeks compared to +10 weeks post-infection, which may reflect control of the infection by the immune system at this stage of the time course. Two hundred ninety-three of the 338 DE genes detected between sample groups at +12 weeks post-infection and  (12,127,128).
Nineteen biological interaction networks were generated from this 293-DE-gene set using the IPA Knowledge Base, and Figure 6 shows the highest-ranked network, which was centred on increased expression of the CXCR4, PTGS2, and KLF4 proteins, and the top IPA disease and function categories represented were Cellular Development, Haematological System Development and Function, and Cell-mediated Immune Response. As described above, the CXCR4 gene is known to be upregulated in blood monocytes and bronchoalveolar lavage cells from human patients with pulmonary TB (74). In addition, the PTGS2 gene (previously known as COX-2) encodes prostaglandinendoperoxide synthase 2, a key enzyme in prostaglandin biosynthesis, which is known to be triggered in macrophagesvia a TLR2-dependent mechanism-by ESAT-6 proteins secreted by virulent M. tuberculosis and M. bovis (129). In this regard, it has been hypothesised that induction of PTGS2 may facilitate intracellular mycobacterial survival through inhibition of p53dependent apoptosis (130). Conversely, it has also been shown that PTGS2 enhances bactericidal activity in M. tuberculosisinfected macrophages through promotion of autophagy (131).
The KLF4 gene encodes a zinc finger-containing transcription factor that regulates macrophage polarisation, displaying increased expression in M2 macrophages and strongly decreased expression in M1 macrophages (132). Integrative network analyses of transcriptome, protein-protein interaction, and transcription factor-binding site data have shown that KLF4 is an important regulator of lung cell gene expression during the early events of M. tuberculosis infection in mice (133). It has FIGURE 4 | The top-ranked biological interaction network generated using IPA for +2 weeks post-infection. Differential gene expression is represented with a red-green colour scale. This network consisted of 13 focus molecules (IPA Network Score = 24), and the top IPA Disease and Function categories represented were Cellular Movement, Haematological System Development and Function, and Immune Cell Trafficking. A detailed legend for IPA biological interaction networks including a key for node shapes and edge classifications is available at the following link: https://qiagen.secure.force.com/KnowledgeBase/articles/Basic_Technical_ Q_A/Legend. also been shown that nitric oxide (NO) and KLF4 epigenetically modify class II transactivator protein causing repression of major histocompatibility complex (MHC) class II expression during M. bovis BCG infection of murine macrophages (134). Furthermore, downregulation of microRNA-26a during M. tuberculosis infection of murine macrophages upregulates KLF4, in turn promoting increased arginase and decreased activity of inducible NO synthase, as well as preventing trafficking of M. tuberculosis to lysosomes (135). Taken together, these results support the hypothesis that increased expression of KLF4 facilitates mycobacterial evasion of host immune surveillance.

Time-Series Analysis
The STEM tool was designed specifically for analyses of short time-series data sets (3-8 time points) (59, 60) similar to the RNA-gene expression data set obtained from the M. bovis animal infection time-course experiment described here. Time-series analysis can be a powerful technique for uncovering networks of coregulated genes in longitudinal time-course experiments (136)(137)(138), particularly for gene expression data associated with host immunobiological responses to infection (139)(140)(141)(142).  The top-ranked time-series profiles (by P-value) obtained using the two different STEM analyses were very similar. For example, as shown in Supplementary Table 12 ( Supplementary Material 4), the first-ranked STEM model profile for STEM analysis 1 (profile 40; Figure 7) was enriched for the signal transduction, single organism signalling, cell communication, regulation of multicellular organismal process, and cellular response to stimulus GO terms. This profile was also similar to the second-ranked model profile for STEM analysis 2 (profile 40; Supplementary Figure 9), which was enriched for many of the same GO terms (signal transduction, single organism signalling, cell communication; see Supplementary Table 13, Supplementary Material 4). The third-ranked model profile for STEM analysis 1 (profile 23; Supplementary Figure 9) was highly similar to the first-ranked model profile for STEM analysis  It is interesting to note that STEM profile 40 in each analysis is characterised by a cluster of ∼300 genes associated with cell signalling and cellular response to stimuli, which exhibited increasing expression across the four time points with a peak at +10 weeks, followed by a substantial decrease at +12 weeks (Figure 7). Conversely, STEM profile 23 (516 genes for STEM analysis 2) is characterised by genes associated with mitochondrial components, particularly the mitochondrial membrane, which displayed an oscillating pattern of expression with a marked decrease at +1 week 10 post-infection (Figure 7).
These time-dependent patterns of gene expression in peripheral blood may reflect pathogenesis of early BTB disease during the infection time course with concomitant host cellular responses to M. bovis infection, disruption of homeostasis, and changing cellular, tissue, and organismal energy requirements (143)(144)(145). In addition, it is important to note that although these longitudinal patterns of gene expression may be due to coregulation of genes in the same cluster, they are likely to also reflect fluctuations in peripheral blood cell type populations comparable to those previously observed for comparisons of M. bovis-infected and control non-infected cattle (28,104).

CONCLUSIONS
The results presented here provide good support for the hypothesis that the peripheral blood transcriptome constitutes a source of gene expression biomarkers for BTB caused by M. bovis infection in cattle. This is particularly apparent for the panel of 19 genes exhibiting consistently, statistically significantly increased expression across the infection time course, the majority of which (16 genes) were also significantly increased in PBL harvested from an independent cohort of field-infected cattle. However, the sensitivity and specificity of putative transcriptional biosignatures of M. bovis infection will need to be verified and validated using larger panels of cattle naturally infected with M. bovis and also populations of animals infected with a range of viral and bacterial pathogens.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found at: https://www.ebi.ac.uk/ ena, PRJEB27764 and PRJEB44470.

ETHICS STATEMENT
The animal study was reviewed and approved by Animal