Screening and Identification of Muscle-Specific Candidate Genes via Mouse Microarray Data Analysis

Muscle tissue is involved with every stage of life activities and has roles in biological processes. For example, the blood circulation system needs the heart muscle to transport blood to all parts, and the movement cannot be separated from the participation of skeletal muscle. However, the process of muscle development and the regulatory mechanisms of muscle development are not clear at present. In this study, we used bioinformatics techniques to identify differentially expressed genes specifically expressed in multiple muscle tissues of mice as potential candidate genes for studying the regulatory mechanisms of muscle development. Mouse tissue microarray data from 18 tissue samples was selected from the GEO database for analysis. Muscle tissue as the treatment group, and the other 17 tissues as the control group. Genes expressed in the muscle tissue were different to those in the other 17 tissues and identified 272 differential genes with highly specific expression in muscle tissue, including 260 up-regulated genes and 12 down regulated genes. is the genes were associated with the myofibril, contractile fibers, and sarcomere, cytoskeletal protein binding, and actin binding. KEGG pathway analysis showed that the differentially expressed genes in muscle tissue were mainly concentrated in pathways for AMPK signaling, cGMP PKG signaling calcium signaling, glycolysis, and, arginine and proline metabolism. A PPI protein interaction network was constructed for the selected differential genes, and the MCODE module used for modular analysis. Five modules with Score > 3.0 are selected. Then the Cytoscape software was used to analyze the tissue specificity of differential genes, and the genes with high degree scores collected, and some common genes selected for quantitative PCR verification. The conclusion is that we have screened the differentially expressed gene set specific to mouse muscle to provide potential candidate genes for the study of the important mechanisms of muscle development.

Muscle tissue is involved with every stage of life activities and has roles in biological processes. For example, the blood circulation system needs the heart muscle to transport blood to all parts, and the movement cannot be separated from the participation of skeletal muscle. However, the process of muscle development and the regulatory mechanisms of muscle development are not clear at present. In this study, we used bioinformatics techniques to identify differentially expressed genes specifically expressed in multiple muscle tissues of mice as potential candidate genes for studying the regulatory mechanisms of muscle development. Mouse tissue microarray data from 18 tissue samples was selected from the GEO database for analysis. Muscle tissue as the treatment group, and the other 17 tissues as the control group. Genes expressed in the muscle tissue were different to those in the other 17 tissues and identified 272 differential genes with highly specific expression in muscle tissue, including 260 up-regulated genes and 12 down regulated genes. is the genes were associated with the myofibril, contractile fibers, and sarcomere, cytoskeletal protein binding, and actin binding. KEGG pathway analysis showed that the differentially expressed genes in muscle tissue were mainly concentrated in pathways for AMPK signaling, cGMP PKG signaling calcium signaling, glycolysis, and, arginine and proline metabolism. A PPI protein interaction network was constructed for the selected differential genes, and the MCODE module used for modular analysis. Five modules with Score > 3.0 are selected. Then the Cytoscape software was used to analyze the tissue specificity of differential genes, and the genes with high degree scores collected, and some common genes selected for quantitative PCR verification. The conclusion is that we have screened the differentially expressed gene set specific to mouse muscle to provide potential candidate genes for the study of the important mechanisms of muscle development.
Keywords: muscle development, microarray analysis, differential genes, bioinformatics, biotechnology INTRODUCTION Muscles represent a crucial group of soft tissues derived from the mesoderm that are primarily responsible for locomotion, and movement in all animals butthe World Health Organization estimates musculoskeletal disorders cause the highest proportion of disabilities worldwide, affecting approximately 1.7 billion people (1). Therefore, there is significant interest in characterizing the genetics that underpin muscular development, and any associated pathophysiology.
There are three major group of muscles i.e., skeletal, myocardium and smooth muscles. Skeletal muscles weigh about 40% of adult weight in humans, and represent the main subgroup of muscles that allow for locomotion in conjunction with the skeletal system. Apart from locomotion, skeletal muscles also have other important functions e.g., heat production, support and protection of other soft tissues, and participation in metabolic homeostasis (2,3). Diseases that affect primary skeletal muscles or the neuromuscular junction frequently manifest in the form of pathological muscle weakness or reduced skeletal muscle mass, which weakens the body's ability to respond to stress and chronic diseases (4). Moreover, amino acids released from muscles help maintain blood sugar levels during starvation. Therefore, diseases affecting skeletal muscles can result in wide ranging pathologies, and represent a key cause of morbidity and disability in human populations.
Skeletal muscles are multinucleated, and develop via the fusion of myogenic progenitor cells called myoblasts, into muscle fibers called myotubes, via a complex process known as myogenesis (5,6). Several genes are known to play a crucial role either during myogenesis, or subsequently, in ensuring normal muscle physiology (7). Some of the main genes involved in muscular development include transcription factors MYOD1 (myogenic differentiation 1), MYF5 (myogenic factor 5), MYOG (myogenin) and MRF (myogenic regulatory factor), MYF6 (herculin), PAX3 (paired box 3), PAX7 (paired box 7) and MEF2 (myocyte enhancer factor 2) family (8). MYOD1 and MYF5 are involved in the early phases of skeletal muscle development by promoting the proliferation and differentiation of myogenic progenitor cells into myoblasts, while MYOG plays an important role in the latter phases of myogenesis that involve fusion of myoblasts into myotubes. The precise function of MYF6 remains unknown, though it is thought to regulate myogenesis, and is exclusively expressed in skeletal muscles (9). Apart from these widely known genes, several other genes that influence either skeletal muscle development or physiology remain unidentified and/or uncharacterized.
High-throughput gene chip technologies that provide largescale gene expression data by measuring transcript abundance in various tissues or cells (10), can be leveraged in combination with online gene expression databases (e.g., NCBI's GEO database) to identify such genes. Moreover, given that human populations are genetically heterogenous, inbred animals models can be very useful in identifying and characterizing key genes associated with muscle development and disease.
Therefore, the overall aim of the present study, was to identify genes that are differentially expressed in skeletal muscles of [10][11][12] week old C57BL/6 mice, by comparing skeletal muscle expression profiles against 16 non-muscle tissues. Genes identified as differentially expressed in muscles, were subsequently subjected to bioinformatic analyses including process and pathway enrichment analysis, protein-protein interaction (PPI) network construction and molecular compounding. Finally, the genes with partial height difference multiples were selected for validation via qPCR.

Ethics Statement
All procedures were approved by the Experimental Animal Center of Xi'an Jiaotong University. Animal care and use protocols (EACXU 172) were approved by the Institutional Animal Care and Use Committee of Xi'an Jiaotong University and Northwest A&F University, Yangling. All animal experiments were performed in adherence with the NIH Guidelines on the Use of Laboratory Animals.

Microarray Data
The microarray data was downloaded from NCBI's GEO (Gene Expression Omnibus) database (GEO accession number GSE9954). The downloaded dataset contained microarray expression data from 70 samples that collectively represent 22 tissues (including muscles). The microarray dataset was derived from 10-12 week old male C57BL/6 mice using the Affymetrix Mouse Genome 430 2.0 Array platform (GPL1261). After euthanasia, multiple organs and tissues were taken for microarray analysis (11). In this study, microarray data from 18 out of the 22 available tissues were selected. The selected tissues included muscles, adipose tissue, adrenal gland, bone marrow, brain, eye, heart, kidney, liver, lung, pituitary gland, placenta, salivary gland, seminal vesicle, small intestine, spleen, testis, and thymus.

Process and Pathway Enrichment Analyses
Genes identified as differentially expressed in the initial screening, were subjected to Gene Ontology (GO) analysis via the Database for Annotation, Visualization and Integrated Discovery (DAVID) (https://david.ncifcrf.gov/home.jsp) using the Mus musculus genome annotation as background. Three aspects of the GO database were targeted in the GO enrichment analyses i.e., cellular component (CC), molecular function (MF), and biological process (BP). Similarly, KEGG pathway enrichment analysis was performed using DAVID and KOBAS (KEGG Orthology-Based Annotation System-http://kobas.cbi.pku.edu. AGAGCAGGACCTAGAGTCTTGTTG FIGURE 1 | Normalization of microarray expression data, and volcano plot of DEGs (A) Gene expression profile prior to normalization (B) and after normalization was carried out using Limma package in R; (C) Volcano plot of DEGs (P < 0.05 and log2|FC|≥2) identified by contrasting the expression profiles in muscles against the combined expression profiles of 17 other tissues. The y-axis represents the log2|FC|, and the x-axis displays the statistical significance of the differences. Black dots represent genes that were not found to be differentially expressed. Red dots represent genes that were significantly upregulated, and green dots represent genes that were significantly downregulated.

PPI Network and Module Analysis
Protein-protein interaction network and module analysis was performed using online tools and the String database (https:// string-db.org/). Genes identified as differentially expressed in muscles were used to construct a PPI network map (14), and the MCODE (Molecular Complex Detection) plug-in of Cytoscape software (version: 3.6.0; Java version: 1.8.0_201) was subsequently used to identify interconnected clusters within the PPI network using a node cutoff score of >3.0. The top 30 proteins with the highest number of degrees (i.e., edges) were represented in the form of a bar graph (15); and network modules identified via MCODE (score >3.0) were also represented diagrammatically (16).

Animals and Tissues Collection
The animals used in this study were obtained from the Experimental Animal Center of the Medical College of Xi'an Jiaotong University. As per approved animal use protocols 12week-old female C57B/L mice were euthanized with 5% chloral hydrate, and tissue samples (heart, liver, spleen, lung, kidney, muscles and adipose) were subsequently collected surgically. All surgical instruments used in the experiment were put into 0.1% DEPC solution overnight, and then autoclaved and dried for use. Collected tissue samples were rinsed in pre-chilled Phosphate Buffer Saline (PBS), put into RNase-free centrifuge tubes, and immediately snap frozen in liquid nitrogen. Total RNA was extracted from these tissue samples after transportation to the laboratory.

Total RNA Extraction and cDNA
Frozen tissue samples were homogenized prior to RNA extraction using enzyme-free centrifuge tubes containing Trizol (TakaraBio, Dalian, China), as per manufacturer's instructions. The concentration of the extracted total RNA was determined via nanodrop quantification. Finally, extracted RNA samples were reverse transcribed into cDNA using a Prime Script RT Reagent Kit (TakaraBio, Dalian, China) for subsequent quantitative PCR.

Primer Information
Intron spanning primers were designed using Primer Premier ver. 5.0 (PREMIER Biosoft, http://www.premierbiosoft.com/). The primer sequences, as well as annealing temperatures are described in Table 1.

qRT-PCR
The CFX-96 (BIO-RAD, US) was used to carry out real-time fluorescence quantitative polymerase chain reaction (qRT-PCR) using a commercially available kit (TB Green Premix Ex Taq II, Tli RNaseH Plus, TakaraBio, Dalian, China). Three reference genes (18s rRNA, GAPDH and β-actin) were tested as internal controls via homogeneity checks. Subsequently, the geometric mean values of 18s rRNA and β-actin was decided to be used as internal reference for qRT-PCR. The final raw data was analyzed via the delta-delta Ct (2 − CT ) calculation method (17), and graphical analysis of data was performed via GraphPad Prism 6.0 (https://www.graphpad.com/scientific-software/prism/).

Differentially Expressed Genes Analysis
Microarray data was normalized prior to differentially expressed genes (DEGs) analysis. The gene expression profiles prior to normalization, and after normalization are presented in Figures 1A,B respectively. A volcano plot showing differentially expressed genes (DEGs) that were upregulated or downregulated when contrasting muscle gene expression profiles against a combination of the remaining 17 tissues, is presented in Figure 1C.
Genes that were significantly up or downregulated (P < 0.05) in muscles, with a log2(fold change) ≥2, were identified as differentially expressed genes (DEGs). Gene expression profile of muscles was first contrasted against each of the 17 control tissues used in this study. The number of differentially expressed genes (DEGs) identified in each of these individual contrasts are noted in Table 2. Comparing, gene expression in muscle to the expression profiles of all 17 control tissues combined, allowed for the identification of 260 DEGs that were upregulated, and 12 differentially expressed genes (DEGs) that were downregulated in muscle. Some of the DEGs that were found to be upregulated include myocyte differentiation markers myosin light chain 1 (Myl1), myosin heavy chain 4 (Myh4), myosin heavy chain 2 (Myh2), and inositol protein (Myot). Myosin heavy chain 1 (Myh1) was found to have the highest log2 (fold change) of 6.675, which is indicative of a more than 100-fold greater expression in muscles relative to the combination of the 17 control tissues used in this study. Amongst the genes that were downregulated, Cytochrome C Oxidase Subunit 6A1 (Cox6a1) was found to have the lowest log2 (FC) of−2.472, which is equivalent to an approximately 5-fold reduction in gene expression. A complete list of the top 20 upregulated DEGs, and all of the 12 downregulated DEGs, is presented in Table 3.

Process and Pathway Enrichment Analysis
Enrichment analysis targeting GO terms identified a total of 752 GO annotations that were significantly enriched (P < 0.01) in differentially expressed genes DEGs identified within this study. Of the total 752 GO terms, 548 GO terms represented biological processes amongst which, the most significantly enriched processes included muscle system process, muscle structure development, myofibril assembly and muscle cell development. A further 103 GO terms representing cellular components were identified, of which the most significantly enriched GO terms including myofibrils, contractile fibers, sarcomeres and contractile fibers. Finally, a total of 101 GO terms representing molecular functions were identified, of which, the most significantly enriched GO terms included cytoskeletal protein binding, actin binding, and structural molecular functions. A summary of the top 10 GO terms identified in each of the three GO aspect categories is presented in Table 4. Top GO terms identified through enrichment analysis are also presented diagrammatically in Figure 2. Complete enrichment analysis results are presented in Supplementary Table 2.
Functional enrichment analysis performed using DAVID identified 29 KEGG pathways significantly associated with muscle specific differentially expressed genes DEGs (P < 0.01). The top 20 of these KEGG pathways are presented in the form of a bubble chart in Figure 3, which demonstrates that the identified differentially expressed genes DEGs are highly relevant in cardiac function and pathophysiology. Other KEGG pathways identified via DAVID analysis included AMPK, cGMP-PKG and calcium signaling pathways; Glycolysis / Gluconeogenesis, Carbon metabolism, Arginine and Proline metabolism etc. (Figure 3).

PPI Network Analysis and Module Screening
Network analysis focused on protein-protein interactions performed using the online STRING (https://string-db.org/) database, identified a total of 247 Nodes and 2,813 Edges (score >0.4). Results from the PPI network analysis are presented diagrammatically in Figure 4A, which shows upregulated DEGs in red, and downregulated DEGs in blue, with the color intensity corresponding to fold changes (darker colors reflecting higher fold changes). These results clearly indicate the presence of a large highly correlated network of upregulated muscle specific genes. Network analysis was further performed via Cytoscape software to compute the number of connections of each individual node (i.e., node degrees), and these results are presented in Supplementary Table 4

qRT-PCR Validation of Identified DEGs
Differentially expressed genes DEGs that were identified in this study were also annotated for tissue specific expression using the online DAVID (https://david.ncifcrf.gov/) database, and this identified 55 genes with tissue specific expression in skeletal muscles ( Table 5) qRT-PCR was performed to determine mRNA expression levels in seven different murine tissues including the heart, liver, spleen, lung, kidney, muscle and adipose tissue to validate muscle specific expression of selected genes. Results from qRT-PCR (Figure 6), confirmed high levels of expression of TNNT3, PYGM, ENO3, CMYA5 in muscles, reaffirming the validity of the findings in this study. We used 18S rRNA, β-actine and GAPDH as housekeeping genes for the mRNA expression analysis of DEGs in the target tissues. Although GAPDH is not considered a very suitable option for using as a reference gene (18), however, we used triple reference genes for the expression of mRNA levels in all target tissues.

DISCUSSION
The overall aim of the current study was to identify genes specifically expressed in skeletal muscles via bioinformatic analyses of publicly available microarray data, followed by qRT-PCR to validate muscle specific expression of selected genes in an independent set of samples. Bioinformatic analysis of publicly available microarray data resulted in the identification of 272 DEGs with at least a 4-fold expression level relative to expression in 17 other mice tissues. The majority of these genes were upregulated (n = 260), and a very small proportion of genes were downregulated (n = 12). These results suggest that upregulation of key genes is more crucial for muscle physiology and development, relative to downregulation of specific genes. A  20 KEGG pathways. The y-axis represents different KEGG pathways that were found to be enrichment in the identified DEGs. The x-axis represents the rich factor, which in turn represents the ratio of DEGs to the total number of genes in any given KEGG pathway. The size of each bubble represents the number of DEGs in a given KEGG pathway, and the color represents enrichment significance.

FIGURE 4 | PPI network and modules (A)
Overall PPI network constructed using the identified DEGs. Genes represented in red are upregulated, and those represented in blue are downregulated within the network. The color intensity represents fold change of gene expression, with higher color intensity representing a higher fold change in expression levels. (B) Bar graph representing the top 30 genes with the highest degrees (connections) within the PPI network; the y-axis represents each of the top 30 genes, and the x-axis represents the number of connections of each gene within the network (degrees). (C-G) Network modules identified via MCODE, including module 1 (score 36.905), module 2 (score 9.769), module 3 (score 6), module 4 (score 4) and module 5 (score 4). Again genes represented in red indicate upregulation, and genes represented in blue indicate downregulation.
previous study (19) which describes gene expression profiles of different tissues (kidney, liver, lung, heart, muscle, and adipose tissue), also reported that key genes (e.g., Myot, Tnnc2, Tnni2, Tnnt3, Actn3, Mybpc1, Mybpc2, Myoz1) are highly upregulated in both human and murine muscles. In our study, we have also found almost all of the above genes to also be highly upregulated ( Table 3) in muscles, and therefore our results align with these previous findings. A number of these DEGs have also previously been reported to be involved in muscle development. Some of these genes include Tnnt3 (20,21), Myh1, Myh2, Myh4 (22)(23)(24) and Actn3 (25), Pvalb (26) Ckmt2 (29) Cox8b (30). However, several other genes that were identified to be differentially expressed in muscles have not yet been reported to have a role in muscle development or physiology (27,28). Enrichment and pathway analyses performed identified several GO annotations terms (n = 752) to be significantly enriched in the 272 genes identified to be differentially expressed in skeletal muscles. The GO enrichment demonstrated significant involvement of ontologies relevant to muscle development, physiology and function; which in turn accords with findings from DEG analyses. Pathway analysis also identified 29 KEGG pathways, several of which were relevant to muscle development. However, one of the more interesting findings here was that the top four pathways identified were all associated with cardiac muscle physiology and pathology. This could suggest that some genes specifically expressed in skeletal muscles, could be involved in cardiac myopathies.  -57  PRKAG3, PDLIM3, ANKRD2, RTN2, ART3, JSRP1, PVALB, SH3BGR, JPH1, RBFOX1, MYH1, MYH2,  TNMD, LDB3, MYH4, ACTN2, MYH7, FBP2, ACTN3, CACNG1, TACC2, TNNT3, TNNT1, SGCG, CFL2,  ITGB1BP2, RYR1, SMPX, SGCA, SGCB, CAV3, FHL1, PHKA1, SRL, TPM2, KCNA7, ENO3, DDIT4L,  HRC, MYF6, MUSTN1, MYOZ2, TRIM63, TNNI1, SLC16A3, TUBA8 While the skeletal muscle samples used in this study were derived from 10-12 week old mice, these samples include both satellite cells and skeletal muscle-derived stem cells, which together form the pool of cells required for myogenesis (31). When satellite cells are activated (e.g., due to muscle injury), they get induced to undergo myogenic differentiation, which in turn requires highly specific temporal and spatial expression patters of different transcription factors and proteins (32) that is consistent with findings in this study. Similarly, skeletal muscle stem cell proliferation and muscle differentiation can also be triggered in adults under the influence of hormones like IGF1 (33), which in turn activates a number of downstream pathways including MAPK, PI3K-AKt-mTOR-P70S60K and PI3K-AKt-mTOR-GSKβ signaling pathways (34)(35)(36)(37). Therefore, the identification of several genes, ontologies and pathways associated with muscle development is not surprising.
To affirm the findings from differentially expressed genes DEGs, enrichment and pathway analysis, we constructed a PPI protein interaction network map, consisting of 2,813 edges (interactions) between 247 nodes (proteins). The PPI network map identified several structural proteins and enzymes as core nodules (e.g., TMOD4, MYL1, MYBPC2, ATP2A1). When degree scores were computed via Cytoscape network analysis, the top nodes were also mainly comprised of structural genes, and genes involved in muscle physiology and function (e.g., TNN, ACTN2, LDB3, CKMT2). Network module analysis via the MCODE plug-in also identified 7 interaction modules. The largest of these modules was comprised of a total of 43 nodes and 775 edges, and included several structural myosinrelated (Myh7, Myl2, Myl3, Myh2, Myh4, Myh1) and actinrelated (Acta1, Actc1, Actn2, Actn3) proteins. Overall, findings from enrichment, pathway and network analysis were in accord and reaffirmed the involvement of identified DEGs in muscle structure and physiology.
Module analysis of the PPI network identified several genes that have been previously reported to be involved in muscle development (e.g., Module 1, Figure 4C). However, several interesting candidates, whose roles in muscle development are yet to be characterized, were also identified. Examples of such genes include Tripartite motifcontaining 54 (Trim54), Creatine kinase, mitochondrial 2 (Ckmt2), cardiac disease associated 5 (Cmya5) and Leiomodin 2 (Lmod2). Future research aimed at characterizing the function of these genes could offer novel insights into mechanistic aspects of muscle development and associated pathophysiology.
Finally, we used qRT-PCR to validate the expression patterns of selected genes that were identified as specifically expressed in skeletal muscles (via DAVID analysis), and were also identified within the top 30 genes of the PPI network (i.e., those having the highest scores). The obtained results are consistent with the results from microarray DEG analyses, which reaffirms the findings from bioinformatic analyses of the microarray data. FIGURE 6 | Quantitative real-time PCR validation of differential gene expression. Relative expression levels of four genes (TNNT3, PYGM, ENO3 and CMYA5) in seven different tissue samples. Each bar represents the mean ± SD of the three replicates, and **represents statistical significance at P < 0.01.

CONCLUSION
In conclusion, 272 genes with muscle-specific expression profiles were identified in this study, which included several genes widely known to be involved in muscle development and function. Downstream enrichment and pathway analysis identified several muscle specific ontologies and pathways reaffirming findings of differentially expressed genes DEG analysis. Validation of results in an independent set of samples via qRT-PCR also reaffirmed muscle specific expression of selected DEGs. Several of the 272 differentially expressed genes DEGs identified in this study are yet to be functionally characterized in context of muscle development and physiology. Once characterized, these candidate genes could offer new targets for development of mutant mouse models of human muscle associated diseases and disorders. Therefore, future research aimed at investigating the role of these candidate genes in the context of muscle development and physiology is warranted.

ETHICS STATEMENT
All procedures were approved by the Experimental Animal Center of Xi'an Jiaotong University. Animal care and use protocols (EACXU 172) were approved by the Institutional Animal Care and Use Committee of Xi'an Jiaotong University and Northwest A&F University, Yangling. All animal experiments were performed in adherence with the NIH Guidelines on the Use of Laboratory Animals. Written informed consent was obtained from the owners for the participation of their animals in this study.

AUTHOR CONTRIBUTIONS
SR conceived, conceptualization, and designed the experiments. SR and WG performed the experiments and data analysis. CL and WG contributed to data curation. GC and CM contributed to methodology. ZM, AS, and MA contributed to the investigation, methodology, and validation. SP and RK provided constructive suggestions for the discussion and validation and contributed in drafting, editing, and review of manuscript. NS revised manuscript critically for content and grammar. LZ contributed to project administration, supervision of the overall study, and provided the necessary resources. All authors have read and agreed to the published version of the manuscript.