SNPs in Mammary Gland Epithelial Cells Unraveling Potential Difference in Milk Production Between Jersey and Kashmiri Cattle Using RNA Sequencing

Deep RNA sequencing experiment was employed to detect putative single nucleotide polymorphisms (SNP) in mammary epithelial cells between two diverse cattle breeds (Jersey and Kashmiri) to understand the variations in the coding regions that reflect differences in milk production traits. The low milk-producing Kashmiri cattle are being replaced by crossbreeding practices with Jersey cattle with the aim of improving milk production. However, crossbred animals are prone to infections and various other diseases resulting in unsustainable milk production. In this study, we tend to identify high-impact SNPs from Jersey and Kashmiri cows (utilizing RNA-Seq data) to delineate key pathways mediating milk production traits in both breeds. A total of 607 (442 SNPs and 169 INDELs) and 684 (464 SNPs and 220 INDELs) high-impact variants were found specific to Jersey and Kashmir cattle, respectively. Based on our results, we conclude that in Jersey cattle, genes with high-impact SNPs were enriched in nucleotide excision repair pathway, ABC transporter, and metabolic pathways like glycerolipid metabolism, pyrimidine metabolism, and amino acid synthesis (glycine, serine, and threonine). Whereas, in Kashmiri cattle, the most enriched pathways include endocytosis pathway, innate immunity pathway, antigen processing pathway, insulin resistance pathway, and signaling pathways like TGF beta and AMPK which could be a possible defense mechanism against mammary gland infections. A varied set of SNPs in both breeds, suggests a clear differentiation at the genomic level; further analysis of high-impact SNPs are required to delineate their effect on these pathways.


INTRODUCTION
Cow milk is an essential natural product which provides a medium for nutrients including growth and immune factors to offspring and a valued raw material for human food (Séverin and Wenshui, 2005;Reinhardt and Lippolis, 2006). It plays an important role in supporting a healthy immune system and provides protection against infections (Goldman, 2000;Séverin and Wenshui, 2005). Milk is produced in the gland by mammary epithelial cells (MEC), which are gradually exfoliated from the epithelium during lactation (Boutinaud et al., 2015). Bovine milk comprises a diverse population of somatic cells including epithelial cells, macrophages, neutrophils, and lymphocytes. The process of lactation in animals consists of several physiological and metabolic changes (Arora et al., 2019). The length of lactation and yield greatly varied among the breeds (Auldist et al., 2007;Mech et al., 2008). Augmenting milk production in cattle is an essential step toward improving the profitability of dairy farms, and the success of dairy forms plays a crucial role in ensuring economic sustainability.
Single nucleotide polymorphisms markers are being rapidly used in selective breeding programmers for improving phenotypic selections of the animals through genomic selection (GS), gene-assisted selection (GAS), and marker-assisted selection (MAS) methods (Georges et al., 1995;Dekkers, 2004;Hayes and Goddard, 2010). It is remarkable that the discovery of the SNP for economic traits has great potential in the genetic improvement of cattle (Pareek et al., 2017). One of the great interests is the ability to improve lactation performance in poor performing breeds. To enhance the lactation performance of dairy animals, the knowledge of gene expression along with SNP profiling and biological pathways and mechanisms that promotes the mammary gland development and lactation is important.
High-throughput RNA sequencing technologies have provided unprecedented prospects in functional and comparative genomic research, including gene expression, genome annotation and pathway analysis, non-coding RNA discovery, and SNP detection and profiling (Bentley, 2006;Morozova and Marra, 2008). It has been widely used to study SNP in cattle breeds like Polish Holstein-Friesian (Pareek et al., 2016), Brangus, Brahman, Nellore, Angus, and Holstein (Dias et al., 2017) and indigenous cattle breeds like Simmental bull, Xuanhan bull, and Shuxuan bull (Wang et al., 2018).
Kashmiri and Jersey cattle are two important milk animals of the Indian northern state, Kashmir which contributes significantly to the total milk production in the state. Kashmiri indigenous cattle are small, hardy, and well adapted to the hilly areas of this region and differ greatly from Jersey cattle in dairy production traits. Whereas Jersey is a well-established exotic dairy breed which has been utilized in crossbreeding programs to enhance the milk production capability of Kashmiri cattle (Bhat et al., 2019).
In the present study, we used the transcriptome data of Jersey and Kashmiri cattle with the objective to identify the single nucleotide polymorphism in coding regions that were associated with milk production traits. Additionally, the study is aimed to benefit the marker-assisted selection programs by cataloging the differential SNP profile for the improvement of milk production in Kashmiri indigenous cattle.

MATERIALS AND METHODS
Transcriptome data of the Jersey and Kashmiri cattle were downloaded from NCBI SRA database with accession number SRR6324372. The samples were obtained at three different lactation stages from three Jersey and three Kashmiri cattle's breed in similar conditions at Sher-e-Kashmir University of Agricultural Sciences and Technology dairy farm, Mountain Livestock Research Institute (MLRI), Kashmir, India. The sample collection from the healthy animals including sample preparation and RNA sequencing were explained in the study by Bhat et al. (2019). Read quality control was assessed using FASTQC program v0.11.1 (Andrews, 2010). After preprocessing, filtering of low-quality sequences and adaptor trimming were performed using Cutadapt v3.40 (Martin, 2011), high-quality sequencing reads that passed thresholds (PhredScore > 30) were assembled for SNP discovery analysis. A collection of over 40 million high-quality clean reads were obtained for each sample. The cleaned reads were mapped to reference genome assembly ARS-UCD1.2.99 using HISAT2 (Kim et al., 2019). The data preprocessing steps recommended in the GenomeAnalysisToolkit (GATK) best practices workflow was performed before variant identification (Poplin et al., 2018). PCR duplicates were marked with the MarkDuplicates from Picard tools (Picard toolkit, 2019). We also performed local realignment around InDels, checked intron-exon junctions, and recalibrated the base quality scores with GATK. Two different variant callers were used to perform SNP and INDELs discovery across eight Jersey and nine Kashmiri transcriptome samples separately: (i) GATK using the HaplotypeCaller tool in multisample calling mode (modality "GATK"); (ii) mpileup from SAMtools v1.4 (Li et al., 2009) in multisample calling mode using default parameters. Final set for analysis contains SNPs and InDelscommon in both datasets.
Bovine genetic variants from dbSNP 2.0 build 153 dated: Aug 8 2019 were incorporated in SNP calling to populate the RS_ID column of the known SNPs. Filtering [base quality score (Q-Score) > 30, mapping quality > 30, and minimum depth > 10] of generated variants and annotation were performed using VCFtools version 0.1.8 and SnpEff program v4.1. To further evaluate the biological significance of the genes with highimpact variations, the KEGG pathway enrichment analysis was performed using nKOBAS server version 3 (Xie et al., 2011;Kanehisa et al., 2021).

RESULTS
In our previous study after analyzing the inter-and intrastage gene expression profiling between Jersey and Kashmiri cattle, we observed a vast diversity in terms of gene expression while huge similarity between the breeds was also witnessed. Differentially expressed genes in both Kashmiri and Jersey were enriched for multicellular organismal process, receptor activity, catalytic activity, signal transducer activity, macromolecular complex, and developmental processes. Most of the identified pathways responsible for milk production in Jersey include JAK-STAT, p38 MAPK, and PI3 kinase whereas antioxidant genes like RPLPO and RPS28 are highly expressed in Kashmiri cattle (Bhat et al., 2019). The present study based on SNP profiling in coding regions showed interbreed potential difference between Jersey and Kashmiri cattle. The difference in SNPs can help to understand its potential role in controlling the milk production between the two breeds.

Quality Control, Mapping, and Posttreatment
The trimming process removed 0.21% of reads and only 4.2% of duplicated reads were rejected. At the mapping step, around 99.7% of the reads were mapped against the reference genome (Bostaurus assembly ARS-UCD1.2.99) (Bhat et al., 2021)

Analysis of Genes With SNPs and INDELs
Functional annotation suggests the enrichment (p-value < 0.05) of signaling pathways (like AMP activated protein kinase (AMPK) and tumor growth factor (TGF) beta), insulin resistance, and high antigen processing in Kashmiri cows. Jersey cattle enrichment analysis shows genes with high-impact SNPs were involved mainly in nucleic acid (specifically pyrimidine metabolism) and nucleotide (specifically glycine, serine, and threonine) metabolism pathways. Interestingly, the activity of primary immunodeficiency pathway, nucleotide excision repair pathway, glycerolipid metabolism, and phosphatidylinositol signaling pathway were significantly enriched in Jersey cattle ( Table 1). Gene ontology (GO) analysis strongly suggests (FDR corrected p-value < 0.05) genes with SNPs in Kashmiri cows were mainly involved in the binding process (enzyme and ribonucleotide binding) and antigen processing (Figure 2). In Jersey cattle, mutated genes were involved mainly in ion binding and ATPase activity (Figure 2).

DISCUSSION
Kashmiri cattle are poor performing and differ greatly from Jersey in dairy production characteristics. A comprehensive phenotypic analysis with respect to milk production traits was carried out in our previous study (Bhat et al., 2019) which showed higher milk yield and protein content for Jersey cattle as compared with the Kashmiri indigenous cattle. The average milk production of Jersey and Kashmiri cattle varies between 6 and 10 kg/day and 3 and 5 kg/day, respectively. The protein content in Jersey cattle ranged from 2.91 to 3.36%, and the corresponding values for Kashmiri cattle were 2.81-3.21%. The Jersey is among the top milk producer cattle breeds and is routinely used to upgrade the milk-producing capacity of the Kashmiri cattle by crossbreeding practices. It becomes imperative to understand the difference at multiple levels such as gene expression and SNPs in the coding and regulatory region of these breeds. For this purpose, we have analyzed RNAseq data and performed the comparative study between both the breeds. A SNP detection analysis was performed using sequencing reads from nine Jersey and eight Kashmiri cows to determine putative polymorphisms in genes involved in the milk production. A total 442 and 464 high-impact SNPs were identified from Jersey and Kashmir cattle, respectively. Out of all the highly confident SNPs identified, a significant portion of them (34.37 and 37.701%) were missense mutations distributed in 29 chromosomes. Furthermore, the functional analysis of the genes showed that they were involved in several crucial biological and physiological processes in lactation. We observed TGFbeta, AMPK, and endocytosis show higher activity in Kashmiri cattle compared with Jersey breed. It is reported that TGF-β1 expression increases in MECs during of mammary gland involution in mouse (Atwood et al., 1995), goat (Wareski et al., 2001), sow , and cow (Zarzyńska et al., 2007). TGF-β1 plays a key role in the involution of the bovine mammary gland by prompting apoptosis and autophagy in bovine mammary luminal epithelial cells (Kolek et al., 2003;Gajewska et al., 2005). Apoptosis is the foremost kind of cell death responsible for the involution of the bovine mammary gland, but type II programmed cell death (PCD II) is also observed: autophagic cell death. The MEC apoptosis is a marked indication of ending milking at the beginning of the dry period (Zarzyńska and Motyl, 2008). We presume the shorter lactation length in Kashmiri cattle could be one of the reasons for the increased expression of apoptotic signaling pathway toward the beginning of dry period. Moreover, it is known that some milk trait genes (e.g., genes in apoptosis pathway) are not solely expressed in MECs but also by other cell types like leukocytes (Boutinaud et al., 2015).
The levels of TGF-β1 and TGF-β2 decrease with advancing chronological age, and colostrum contains the highest levels for both (Frost et al., 2014). The pronounced role of AMPK is to regulate catabolic and anabolic processes. Negative energy balance (NEB) is generally witnessed in milking cows particularly during lactation. Increase in ADP or AMP is usually related with a NEB. It has been found that during maximum lactation, the energy intake in dairy cows is not able to fulfill the requirement for milk production. The AMPK signaling has been found greatly active throughout this period, which results in significant reduction in the milk of cows (Eastham et al., 1988). AMPK regulates lipid metabolism by altering the transcription of the Frontiers in Genetics | www.frontiersin.org 4 August 2021 | Volume 12 | Article 666015 lipogenic gene and the posttranslational modification of the major enzymes involved in lipid synthesis. In addition, AMPK activation was found to repress global protein synthesis via inhibition of mTOR signals in bovine mammary epithelial cells (Appuhamy et al., 2014). These changes along with the effect of lower plane of nutrition during formative phase of the animals and consequent poor body condition may contribute to the lower concentrations of total milk proteins in Kashmiri cattle. This  (Cai et al., 2020). An endocytosis pathway was found to be enriched in the milk of Kashmiri cattle. The mechanism for internalization (endocytosis) and release (exocytosis) are governed by the physiological phase of the mammary gland (McShane and Zerial, 2008). Furthermore, Truchet and Ollivier-Bousquet (2009) reported that the mammary epithelium enrichment organizes the absorption of milk precursors and transport of milk components to produce milk of reasonably constant composition, provided the mammary gland is healthy.
Several important metabolic pathways that were found to be enriched exclusively in Jersey cattle include glycolipid, pyrimidine, and amino acid metabolism (specifically glycine, serine, and threonine) and other important pathways like phosphatidylinositol signaling, Antifolate resistance, ABC transporters pathway, and nucleotide excision repair. Amino acid (AA) metabolism plays a critical role in milk production by regulating maternal endocrine status, blood flow through the lactating mammary gland, and activation of mechanistic (mammalian) target rapamycin (mTOR) signaling (Wu et al., 2010). The higher activity of glycine, serine, and threonine metabolism pathways in Jersey cattle may be particularly important in lactation initiation and higher milk production (Li and Jiang, 2019). TCA cycle and glycine, serine, and threonine metabolism pathways are the most important and work together in the mammary gland for lactation initiation (Sun et al., 2015(Sun et al., , 2017. It is a known fact that AAs significantly contribute to liver gluconeogenesis in early lactation. Glycine, serine, and alanine, comprised more than 65% of the liver uptake of glucogenic AA during the periparturient period Kristensen, 2009, 2012). In Jersey cattle, the phosphatidylinositol 3-kinase (PI3K)/protein kinase B (Akt) signaling pathway emerged as the most enriched pathway. PI3K-Akt pathway mediates glucose absorption into cells and is important for normal insulin-mediated glucose metabolism (Taniguchi et al., 2006). Since Jersey cattle produce milk in higher quantities than Kashmiri cattle, it requires a high-energy source. The enrichment of glycerolipid metabolic pathway in Jersey via glycerolipid pathway converts glycerol into glucose thus providing sufficient energy for maintaining high milk production (Sun et al., 2017).
In Kashmiri cattle, we found variations in the major histocompatibility complex (MHC) region. In cattle, polymorphism in the MHC class II genes influences both the magnitude and specificity of antigen-specific T cells to various diseases like mastitis (Hameed et al., 2006). In our previous study, we found that a wide range of proteins like apelin, acid glycoprotein, CD14 antigens, and lactoferrin, involved in immune response and host defense were highly expressed in mammary gland of Kashmiri cattle (Bhat et al., 2020). The role of the MHC in immune response makes a MHC an attractive candidate gene to study associations with disease resistance or susceptibility in cattle. High variability in many MHC genes plays a role in the recognition of bacteria (Apanius et al., 1997). Hedrick and Kim (1999) demonstrate the relation between MHC variation and resistance to bacterial infections. There are several well-documented cases in which specific MHC haplotypes or genotypes provide resistance to bacteria (Hill et al., 1991;Carrington et al., 1999).
We also found variations in genes regulating TGF-β (SMAD4, BMP7) and mTOR signaling in Kashmiri cattle ( Supplementary  Figures 1, 2) (Bhat et al., 2017). SMAD4 proteins serve as crucial components of TGF-β signaling, which negatively regulates cell growth and promotes apoptosis of epithelial cells. The rate of decline in milk yield with stage of lactation is strongly influenced by the rate of cell death by apoptosis in the lactating gland (Stefanon et al., 2002). Such changes depict the early backdrop for dryness in Kashmiri cattle. SMAD4 has been found to play important roles in various biological processes. In humans, genetic variants in the SMAD4 have been found to play a protective role in various types of cancers (Wu et al., 2010). A high-impact stop-gained variation on SMAD4 (c.1165C) and frameshift variation on BMP7 (c.693delC) could possibly dysregulate TGF-β pathway in Kashmiri cattle. Further study is required to study the effect of these variations on the TGF-β signaling pathway.
It is widely accepted that mTOR is a key regulator of milk protein synthesis, and most reports were concerned with the role of AAs in the regulation of P-mTOR on Ser2448 in milk protein synthesis (Appuhamy et al., 2012;Apelo et al., 2014). mTORC1 signaling mediates the cell surface level of Wnt receptor frizzled FZD and plays an essential role in embryogenesis and homeostasis Chen et al. (2020). In cattle, the milk protein synthesis was upregulated through activation of the mTOR pathway (Gao et al., 2015). We found double frameshift variations on FZD9 (c.484delG and c.1563delT) in Kashmiri cattle and could be the probable reason for its low milk production.

CONCLUSION
The SNP analysis based on RNA sequencing demonstrated a clear distinction between Jersey and Kashmiri cattle in milk production traits. In Kashmiri cattle, the high-impact SNP variants were involved in adaptive immunity and resistance against mammary gland infectious diseases. Whereas, in Jersey cattle, enriched pathways were mainly involved in maintenance of lactation and production. These findings provide insights in genetic variation of the breeds that can be used for genomic selection of the animals.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: https://www.ncbi.nlm.nih.gov/sra/ SRR6324372.

AUTHOR CONTRIBUTIONS
SA designed the study and wrote the manuscript. BB and MY performed the data analysis. SB and SM helped in the writing of the manuscript. MR and MI helped in data interpretation and analysis. RS and NG did proofreading of the manuscript. All authors revised and approved the final version of the manuscript.

FUNDING
The Department of Biotechnology, Ministry of Science and Technology, Government of India is duly acknowledged for supporting the study under grant BT/PR114 08/AAQ/1/591/2014.