ORIGINAL RESEARCH article

Front. Plant Sci., 12 November 2018

Sec. Plant Systems and Synthetic Biology

Volume 9 - 2018 | https://doi.org/10.3389/fpls.2018.01550

Unified Transcriptomic Signature of Arbuscular Mycorrhiza Colonization in Roots of Medicago truncatula by Integration of Machine Learning, Promoter Analysis, and Direct Merging Meta-Analysis

  • 1. Australian Centre for Antimicrobial Resistance Ecology, School of Animal and Veterinary Sciences, The University of Adelaide, Adelaide, SA, Australia

  • 2. Institute of Biotechnology, Shiraz University, Shiraz, Iran

  • 3. Department of Biology, University of Qom, Qom, Iran

  • 4. Department of Biotechnology, Shahrekord University, Shahrekord, Iran

  • 5. School of Agriculture Food and Wine, Department of Plant Science, The University of Adelaide, Adelaide, SA, Australia

  • 6. Max-Planck-Institute for Informatics, Saarbrucken, Germany

  • 7. Adelaide Medical School, The University of Adelaide, Adelaide, SA, Australia

  • 8. Division of Information Technology, Engineering and the Environment, School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, SA, Australia

  • 9. Faculty of Science and Engineering, School of Biological Sciences, Flinders University, Adelaide, SA, Australia

Article metrics

View details

29

Citations

6,1k

Views

2,8k

Downloads

Abstract

Plant root symbiosis with Arbuscular mycorrhizal (AM) fungi improves uptake of water and mineral nutrients, improving plant development under stressful conditions. Unraveling the unified transcriptomic signature of a successful colonization provides a better understanding of symbiosis. We developed a framework for finding the transcriptomic signature of Arbuscular mycorrhiza colonization and its regulating transcription factors in roots of Medicago truncatula. Expression profiles of roots in response to AM species were collected from four separate studies and were combined by direct merging meta-analysis. Batch effect, the major concern in expression meta-analysis, was reduced by three normalization steps: Robust Multi-array Average algorithm, Z-standardization, and quartiling normalization. Then, expression profile of 33685 genes in 18 root samples of Medicago as numerical features, as well as study ID and Arbuscular mycorrhiza type as categorical features, were mined by seven models: RELIEF, UNCERTAINTY, GINI INDEX, Chi Squared, RULE, INFO GAIN, and INFO GAIN RATIO. In total, 73 genes selected by machine learning models were up-regulated in response to AM (Z-value difference > 0.5). Feature weighting models also documented that this signature is independent from study (batch) effect. The AM inoculation signature obtained was able to differentiate efficiently between AM inoculated and non-inoculated samples. The AP2 domain class transcription factor, GRAS family transcription factors, and cyclin-dependent kinase were among the highly expressed meta-genes identified in the signature. We found high correspondence between the AM colonization signature obtained in this study and independent RNA-seq experiments on AM colonization, validating the repeatability of the colonization signature. Promoter analysis of upregulated genes in the transcriptomic signature led to the key regulators of AM colonization, including the essential transcription factors for endosymbiosis establishment and development such as NF-YA factors. The approach developed in this study offers three distinct novel features: (I) it improves direct merging meta-analysis by integrating supervised machine learning models and normalization steps to reduce study-specific batch effects; (II) seven attribute weighting models assessed the suitability of each gene for the transcriptomic signature which contributes to robustness of the signature (III) the approach is justifiable, easy to apply, and useful in practice. Our integrative framework of meta-analysis, promoter analysis, and machine learning provides a foundation to reveal the transcriptomic signature and regulatory circuits governing Arbuscular mycorrhizal symbiosis and is transferable to the other biological settings.

Introduction

Arbuscular mycorrhiza (AM) fungal symbiosis expands the surface area of plant root, allowing for better absorption of substances such as phosphorus, ammonium, and zinc from soil. This symbiosis supports plant development, particularly under nutrient deficiency and other stressful conditions. Specific genetic programs activated by AM inoculation lead to successful microsymbiont colonization and functional symbiosis. Most studies in AM symbiosis are limited to the investigation of a single gene or a cluster of similar genes. Genes such as DMI1, DMI2, NFP, NSP1 (Oláh et al., 2005), MtBcp1 (Hohnjec et al., 2005), ENOD11 (Genre et al., 2005), MIG1 (Heck et al., 2016), RAM1 (Rich et al., 2017), nfr1, nfr5, lys11 (Rasmussen et al., 2016), and NIN (Guillotin et al., 2016) are reported to play roles in the formation of mycorrhizal symbiosis.

The regulatory mechanisms underpinning AM symbiosis in plants are poorly understood. The GRAS transcription factor family contains the best known regulators of AM symbiosis. The function of ATA/RAM1, a member of this family, in reprogramming AM symbiosis has been established (Rich et al., 2017). It has been suggested that RAM1 controls the expression of many essential AM-related genes such as STR, STR2, RAM2, and PT4 (Rich et al., 2017). Another member of the GRAS transcription factor family, MIG1, interacts with DELLA1 and the root GA signaling pathway to regulate cortical cell expansion in developing AM symbiosis (Heck et al., 2016). The role of small RNAs, such as miR171 in establishment of AM symbiosis has also been investigated recently (Couzigou et al., 2017).

Successful AM colonization is vital to establish symbiosis and improve phosphorous and water uptake. The AM type, as well as many, environmental and genetic factors affect the intensity, timing, and the success of AM colonization. Cross-comparison of successful colonization between different AM types in a range of experiments by meta-analysis provides the opportunity to move toward understanding the genetic basis of endosymbiosis (Tromas et al., 2012), the conserved transcriptomic program that can reflect successful AM colonization and establishment. Those genes can unravel the functional groups that may play key roles in the establishment and functioning of the three AM symbioses. The transcriptomic signature of AM colonization can be further employed for: (1) increasing AM efficiency by application of chemical and environmental treatments, (2) monitoring successful/unsuccessful AM colonization, and (3) finding the upstream regulatory mechanisms and regulators such as transcription factors and microRNAs that control AM colonization and symbiosis.

However, no attempt has been made to identify the unified transcriptomic signature of AM symbiosis. The term of “Unified transcriptomic signature” or “biosignature” refers to robust transcript responses that can monitor the successful AM colonization. Overlaps observed in transcriptional profiles of Medicago truncatula roots inoculated with two different Glomus fungi (Hohnjec et al., 2005) support the possibility of achieving a unified transcriptomic signature of AM colonization to provide an insight into the genetic program activated during AM.

The emerging field of meta-analysis may solve the issue of merging different experiments to identify a unique biosignature of Medicago root response to AM inoculation. Cross-species meta-analysis of transcriptomic data has received increased attention in recent years due to the advances in pattern discovery and meta-analysis models (Tromas et al., 2012; Farhadian et al., 2018b). Meta-analysis enables the combination of expression datasets and is highly advantageous in increasing statistical power to detect biological phenomena from studies with a restricted sample size (Johnson et al., 2007).

The biosignature of AM inoculation obtained may be utilized to further computational systems biology analysis, such as promoter analysis, common regulator discovery, and common target discovery, in order to lead us to the key regulators and targets of the AM symbiosis pathway.

Different statistical methods have been developed for meta-analysis of expression data such as combining effect sizes, combining ranks, combining p-values, vote counting, and direct merging (DM) (Borenstein et al., 2009, 2010; Campain and Yang, 2010; Chang et al., 2013; Sharifi et al., 2018). Within meta-analysis approaches, DM analysis of expression data or genomic variant data of different studies is an attractive meta-analysis method to increase statistical power and lead to a robust transcriptomic or genomic signature (Tseng et al., 2012). DM, as a meta-analysis approach, has been used in web-tools such as INMEX (Xia et al., 2013, 2015), A-MADMAN (Bisognin et al., 2009), WGAS (Dai et al., 2007), and GEOSS (Bisognin et al., 2009) for integrative meta-analysis of expression data. DM-based meta-analysis provides the possibility of data collection from different experiments, even when a treatment or a control is missing in one or more experiments. This contributes to a higher statistical power of meta-analysis.

The major concern about the DM approach is heterogenicity across studies. The success of the DM approach depends on normalization across studies to reduce non-biological experimental variation as well as biological variations unrelated to treatment (also called batch effects or study effects) (Johnson et al., 2007; Tseng et al., 2012). Collection of arrays from similar platforms across all studies (mainly Affymetrix) and pre-processing of the CEL expression files by model-based robust multi-array (RMA) normalization (Irizarry et al., 2003) have been suggested to decrease heterogenicity across all studies (Lee et al., 2008; Sims et al., 2008; Tseng et al., 2012). However, it has been debated that RMA is not strong enough to remove batch effects (Guerra and Goldstein, 2009). To sufficiently reduce batch effects for accurate DM, additional normalization techniques such as empirical Bayes methods (Johnson et al., 2007), cross-platform normalization (Shabalin et al., 2008), weighted distance weighted discrimination (Qiao et al., 2010), enrichment-based meta-analysis, and Ratio adjustment and calibration scheme (Cheng et al., 2009) have been used.

Recent advances in application of supervised machine learning models in transcriptomic studies have opened a new venue to engage data mining models in decreasing batch effects and integration of different studies (Pashaiasl et al., 2016a,b). Supervised machine learning has brought new possibilities to predictive studies (Bakhtiarizadeh et al., 2014a; Ebrahimi et al., 2014; Zinati et al., 2014; Kargarfard et al., 2015; Pashaiasl et al., 2016a,b). The capability to simultaneously analyse both categorical and numerical features, power to analyse large data, and various predictive algorithms with diverse statistical backgrounds are distinguished features of supervised machine learning models (Shekoofa et al., 2014; Ebrahimi et al., 2015; Jamali et al., 2016). The possibility to include the categorical variables in predictive models can outstandingly decrease the heterogenicity across studies as the batch effects (Shekoofa et al., 2014). For example, in this study, the different experiments or types of AM can be added as variables and analyzed in the predictive model of the AM transcriptomic signature. This possibility is highly limited in traditional multivariate or regression models.

Due to the central role of colonization in establishing a microsymbiont, we developed a framework for finding the transcriptomic signature of successful AM colonization on roots of Medicago truncatula by integration of meta-analysis and machine learning (attribute weighting) models. Special attention was paid to reducing the batch effects by utilizing normalization methods and finding reliable gene candidates by machine learning models. The genes discovered in the transcriptomic signature were further used as the input of promoter analysis to identify the transcription factors which regulate the signature.

Methods

A flowchart of the integrative computational systems biological approach employed in this study is presented in Figure 1.

Figure 1

Figure 1

The flowchart of computational systems biological approach, developed in this study.

Data collection for meta-analysis

Studies on the AM transcriptome were identified in repositories of high-throughput expression data such as NCBI GEO (https://www.ncbi.nlm.nih.gov/geo/) and ArrayExpress (https://www.ebi.ac.uk/arrayexpress/). Supplementary Table 1 presents the list of the studies mined and their platforms. The Microarray studies belonged to Medicago truncatula A17.

The microarray experiment (Floss et al., 2017) was originally designed to compare gene expression in roots of Medicago truncatula A17 and Medicago truncatula mutant mtpt4-1 colonized with Gigaspora gigantea. We only used data from three independent biological replicate samples of wild type plants colonized with Gigaspora gigantea for meta-analysis. Samples were harvested 18-day post planting and 11 days post contact with the spores.

In the original experiment (Truong et al., 2015), the impact of P limitation and both P and N limitation on Medicago truncatula A17 root transcriptome in response to Rhizophagus irregularis (previously known as Glomus intraradices) were investigated. In the original experiment, the root transcriptome of both wild type plants and a hypermycorrhizal mutant (B9) grown on limiting or non-limiting phosphate were analyzed to determine which processes were in the hypermycorrhizal mutant. Plants were harvested 4 weeks after inoculation. From this experiment, only data of mycorrhizied wild type plants colonized with Rhizophagus irregularis and grown under P limitation were used for our meta-analysis study.

In the experiment of Hogekamp et al. (2011), gene expression profiles of roots of Medicago truncatula A17 in response to colonization by two different arbuscular mycorrhizal fungi (Rhizophagus irregularis and Glomus mosseae) as well as P treatment with phosphate were studied. From this experiment, data of two groups of samples were used for meta-analysis; data of inoculated plants and non-inoculated plants under P limitation. Non-inoculated plants were used as control.

CEL (expression intensity) files of these studies were downloaded from NCBI GEO databank and their corresponding library (CDF) and annotation (CSV) files from the Affymetrix FTP repository by Affymetrix Expression Console Software (version: 1.3.1.187, https://www.affymetrix.com/).

Reducing the batch effect in direct merging (DM) meta-analysis

Reducing heterogenicity across studies (batch effects) is an essential step for direct combination of expression data in DE meta-analysis. Here, we developed an integrative approach including multi-array (RMA) normalization within studies, Z-standardization of expression values, and between studies quartiling/scaling normalization for reducing batch effects before combining samples for supervised machine learning.

Rma normalization of samples in each study (within study normalization)

CEL files of Affymetrix arrays in each study were normalized by an RMA algorithm (Irizarry et al., 2003) using Affymetrix Expression Console Software (version: 1.3.1.187).

Z-value standardization

Z-standardization has been extensively used in meta-analysis (Lipsey and Wilson, 2001; Kinoshita and Obayashi, 2009). RMA normalized expression values of each samples were converted to Z-value by subtracting the mean and dividing by the standard deviation using Minitab 17 (www.minitab.com/).

Between sample normalization by scaling and quartiling

To unify the RMA-normalized and Z-standardized values, we used an additional normalization step. Here, we evaluated the efficiency of the scaling and quartiling approach (Bolstad et al., 2003) in reducing the batch effects, using CLC Genomics Workbench (QIAGEN, https://www.qiagenbioinformatics.com/products/clc-genomics-workbench/). Supplementary Figure 1 shows the pseudo code for scaling and quartiling approach. In the Scaling approach (Supplementary Figure 1A), the sets of the expression values for the samples were multiplied by a constant so that the sets of normalized values for the samples have the same ‘target' value. The target (normalization) value was defined as Median mean/Median median of all samples. The Mean and Median are the types of normalization value of the samples to ensure that they are equal for the normalized expression values.

In quartiling approach (Supplementary Figure 1B), the empirical distributions of the sets of expression values for the samples were used to calculate a common target distribution, which was used to calculate normalized sets of expression values for the samples. Here, the term of empirical distribution refers to real (empirical) statistical characteristics of samples to be used for calculation of normalization values.

Application of seven supervised machine learning models to find the medicago response genes distinguishing AM colonized from non-colonized symbiosis

At first, a cleaning step was performed and the probsets with no gene annotation, or the ones which matched to multiple genes were removed. Then, the expressions of 33685 probsets/genes in AM colonization and non-colonization conditions, as numerical features, were mined by seven attribute weighting (feature selection) models. Also, study number and type of AM (Gigaspora gigantean, Rhizophagus irregularis, Glomus mosseae, or none) were added to the dataset as categorical features. Consequently, a dataset of 33687 (33685 gene probes + type of AM + Study ID) and 18 records (samples), belonging to two categories of AM-inoculated and non-inoculated (label variable), were used for machine learning. The selected feature selection models were able to analyse both categorical and numerical features simultaneously. This provided the opportunity to assess batch effects.

Feature selection models identify the most important genes whose AM expression differs between colonized and non-colonized symbioses. The resulting weights of each feature selection model were normalized into the interval between 0 and 1 to provide the similar significance across various feature selection models. Weights closer to 1 show a higher relevance (importance) of a particular gene in distinguishing AM inoculated from non-inoculated roots, according the employed feature selection model. The genes determined to be important by most of the feature selection models (intersection of weighting methods with various statistical backgrounds) with cut-off ≥ 0.95 were assumed to be the key distinguishing genes to form the biosignature. The employed feature selection models were: RELIEF, UNCERTAINTY, GINI INDEX, CHI SQUARED, RULE, INFO GAIN RATIO, and INFO GAIN.

RELIEF is a classification attribute weighting model, independent from Heuristic search and is considered to be one of the most successful models for evaluating the quality of features because of its simplicity and efficiency. RELIEF is a robust noise-tolerant model able to feature interactions where it employs the random selection of instances for weight estimation (Kira and Rendell, 1992; Rosario and Thangadurai, 2015). RELIEF estimates the relevance of attributes (genes + study number + AM type) according to how well their values discriminate between the instances of the same and different classes of label (AM colonization/non-colonization) that are near each other (Ebrahimi et al., 2014).

UNCERTAINTY measures the weight of attributes (genes + study number + AM type) against the label attribute (AM colonization/non-colonization) by estimating the symmetrical uncertainty with respect to the class (Liang, 2011).

GINI INDEX attribute weighting algorithm evaluates the weight of attributes (genes + study number + AM type) by computing the Gini index of the class distribution (AM colonization/non-colonization) and is a measure of data impurity (Lerman and Yitzhaki, 1984; Ebrahimi et al., 2011).

CHI SQUARED attribute weighting model evaluates the importance of attributes (genes + study number + AM type) with respect to the label attribute (AM colonization/non-colonization) based on chi squared statistic (Ebrahimi et al., 2014).

INFO GAIN model calculates the relevance of attributes (genes + study number + AM type) by measuring the Information Gain in class distribution (AM colonization/non-colonization) (Guyon and Elisseeff, 2003). INFO GAIN is suitable for datasets such as the expression of genes where attributes cannot take a large number of distinct values.

INFO GAIN RATIO uses information Gain Ratio for feature selection. This model is a modified version of INFO GAIN that biases against considering attributes with a large number of distinct values (Zinati et al., 2014).

RULE attribute weighting model estimates the weights of attributes (genes + study number + AM type) with respect to the label attribute (AM colonization/non-colonization) by constructing a single rule for each attribute and calculating the error (Liu and Motoda, 2012).

Multivariate analysis of the developed AM transcriptomic signature

After developing the AM transcriptomic signature by integration of meta-analysis and machine learning, clustering based on the Average Linkage method and Euclidean distance measure, as well as cross validation based on Discriminant (modeling) analysis, was used to evaluate the power of the emergent AM transcriptomic signature for discrimination of AM-inoculated from non-inoculated samples. For clustering, the expression values of genes which formed the transcriptome biosignatures were standardized. The multivariate analyses mentioned were performed using Minitab 17 (www.minitab.com/).

Based on the paper published by Hogekamp et al. (2011), we also investigated the fitness of some previously-reported markers of the mycorrhizal symbiosis, including MtLec5 (legume lectin family protein, MTR_5g031030), MtGIP1 (germin-like protein 9-2, MTR_4g052770), MtPt4 (high affinity inorganic phosphate transporter, MTR_1g028600), and MtBcp1 (blue copper-like protein, MTR_6g013420).

Gene ontology (GO) analysis

For a better understanding of the biological importance of the identified AM transcriptomic signature, we used a Gene Ontology (GO) approach that classifies genes and proteins based on a controlled functional vocabulary in terms of their Molecular Function, Biological Process, and Cellular Component (Ashburner et al., 2000; Fruzangohar et al., 2013, 2017). Unregulated genes (73 in total) in the AM inoculation transcriptomic signature with a Z-value difference of >0.5, were announced important by most feature selection models that were used as input of Ensembl Biomart and agriGO web applications (Kinsella et al., 2011; Tian et al., 2017). Agrigo employs the Fisher test and FDR correction for identifying the significance of GO terms of input genes compared to whole genome GO distribution (as a control/background).

Upstream regulatory (common TFs) analysis of AM colonization signature through promoter analysis of highly expressed meta-genes in response to AM inoculation

The developed transcriptomic signature of successful AM colonization of roots of Medicago truncatula, obtained by integration of meta-analysis and machine learning (attribute weighting) models, was used for upstream regulatory analysis through common regulator discovery, as previously described (Deihimi et al., 2012; Babgohari et al., 2014; Bakhtiarizadeh et al., 2014b; Shamloo-Dashtpagerdi et al., 2015). To this end, the top 20 highly upregulated meta-genes, which responded to AM inoculation, (Supplementary Table 5) were selected for promoter analysis and common transcription factor discovery. In other words, the genes revealed in the transcriptomic signature were further used as the input of promoter analysis to find the transcription factor matrix families which regulate the transcriptome signature. Matrix families are groups of weight matrices for the same or functionally similar transcription factors (Cartharius et al., 2005).

To mine the binding of transcription factor families to the promoter regions, we used the MatInspector webtool (Quandt et al., 1995; Cartharius, 2005; Cartharius et al., 2005; Hosseinpour et al., 2013) to calculate the following scores: Core similarity, Matrix similarity, Model similarity, Free energy, Match rate, and p-value for the common TFs. Core similarity describes the similarity between core sequence of transcription factor matrix family and the input sequence. Core sequence of a transcription factor matrix family is the consecutive highest conserved positions of the matrix. The maximum core similarity of 1.0 is only reached when the highest conserved bases of a matrix match exactly with the input sequence. Matrix similarity is more important than the core similarity that takes into account all bases over the whole matrix length. Matrix similarity of 1.0 reaches only if the candidate sequence corresponds to the most conserved nucleotide at each position of the matrix. The free energy (in kcal/mol) is a thermodynamic parameter for the stability of secondary structures (hairpins) of matrix family with input sequence. The higher the free energy, the more stable the hairpin is. The match rate is the number of matching base pairs in percent of the total element length. The p-value for the common TFs is the probability to obtain an equal or greater number of sequences with a match in a randomly drawn sample of the same size as the input sequence set using Fisher's exact test. The lower this probability, the higher the importance of the observed common transcription factors.

Based on p < 0.01 of the common TFs and Matrix similarity >0.95, the enriched transcription factors on promoter regions of AM colonization signature were recorded as “common regulators.” In other words, transcription factors with the highest number of possible interactions with the upregulated genes after AM inoculation were assumed as the key regulators, called common regulators of AM inoculation.

Independent validation of meta-genes based biosignature of AM inoculation by RNA-Seq

For independent validation of the AM colonization signature, derived by integration of meta-analysis and supervised attribute weighting models, independent samples of AM-inoculated and non-inoculated from RNA-seq experiment with GEO accession of GSE94266 (Garcia et al., 2017) were selected. The original experiment was designed to determine the effect of K+ on colonization of Medicago truncatula plants (Garcia et al., 2017). Plants were co-cultured with the AM fungus Rhizophagus irregularis under normal and low K+ regimes. We used 3 AM-inoculated samples (GEO accessions: GSM2471944, GSM2471945, and GSM2471946) and 3 AM non-inoculated samples (GSM2471950, GSM2471951, and GSM2471951) of this experiment under normal K+ regime to investigate the transcriptome response of Medicago truncatula to AM inoculation. Raw SRA files of the above-mentioned samples (100 bp, single end, Illumina sequencing technology) were retrieved by the DRAsearch tool (http://trace.ddbj.nig.ac.jp/DRASearch/) of the Research Organization of Information and System National Institute of Genetics (NIG), Japan. SRA files were transformed to fastq files using SRA Toolkit software (NCBI).

The Medicago truncatula reference genome (Mt4. 0v2 Assembly), including fasta (genome sequence) and GFF3 (genome annotation) files, were downloaded from the Medicago truncatula Genome Database (Young et al., 2011; Krishnakumar et al., 2014). Quality control of reads was analyzed using FastQC package (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Low quality reads and adaptor sequences were trimmed by the CLC Genomics Workbench 11.0.1 (QIAGEN). Mapping of short reads to the reference genome was performed using the CLC Genomics Workbench using the following criteria: mismatch cost = 2, insertion cost = 3, deletion cost = 3, length fraction = 80%, and similarity fraction = 80%). Total counts of mapped reads and RPKM (Reads Per Kilobase of transcript per Million mapped reads) were recorded as expression measurements for each gene. The differences related to the depth of sequencing were corrected by per-sample library size normalization using TMM (trimmed mean of M values) normalization method via calculating and adjusting the effective libraries sizes (Robinson and Oshlack, 2010).

To find the differentially expressed genes during AM-colonization vs. non-colonization, Generalized Linear Model (GLM) based on Negative Binomial distribution (Anders and Huber, 2010) was used to fit curves to expression values without assuming that the error on the values is normally distributed. GLM-based p-values for differentially expressed genes were calculated and corrected using FDR statistics and the CLC Genomics Workbench tool. Also, fold changes were calculated from the GLM to correct for differences in library size between the samples and the effects of confounding factors.

To visualize the differentially-expressed genes by heatmap, the following steps were performed: (1) log CPM (Counts per Million) values were calculated for each gene. The CPM calculation uses the effective library size, calculated by the TMM normalization. (2) log CPM values were standardized across samples for each gene by transforming to Z-values.

Results

Selected samples from different studies for de meta-analysis

As the transcriptomic signature of AM may differ in different tissues, we selected the transcriptome files of root samples of Medicago truncatula A17. To reduce the variation between experiments, the Medicago Genome Array of Affymetrix platform with 61278 probset IDs was selected. Some samples in four studies had these criteria (Table 1, Figure 1) (Hogekamp et al., 2011; Bonneau et al., 2013; Truong et al., 2015; Floss et al., 2017). These samples were roots colonized with Gigaspora gigantean, Rhizophagus irregularis, or Glomus mosseaeas well as non-inoculated ones.

Table 1

StudyReferenceSampleGEO accession of experimentStrain of arbuscular mycorrhizaTreatmentPlatformType of platformGEO accession of sample
1PMID: 283921101GSE95545Gigaspora giganteaAM inoculatedAffymetrixMedicago Genome ArrayGSM2516130
1PMID: 283921102GSE95545Gigaspora giganteaAM inoculatedAffymetrixMedicago Genome ArrayGSM2516131
1PMID: 283921103GSE95545Gigaspora giganteaAM inoculatedAffymetrixMedicago Genome ArrayGSM2516132
2PMID: 235066134GSE38847Rhizophagus irregularisAM inoculatedAffymetrixMedicago Genome ArrayGSM950680
2PMID: 235066135GSE38847Rhizophagus irregularisAM inoculatedAffymetrixMedicago Genome ArrayGSM950682
2PMID: 235066136GSE38847Rhizophagus irregularisAM inoculatedAffymetrixMedicago Genome ArrayGSM950688
3PMID: 248153247GSE44102Rhizophagus irregularisAM inoculatedAffymetrixMedicago Genome ArrayGSM1078957
3PMID: 248153248GSE44102Rhizophagus irregularisAM inoculatedAffymetrixMedicago Genome ArrayGSM1078949
3PMID: 248153249GSE44102Rhizophagus irregularisAM inoculatedAffymetrixMedicago Genome ArrayGSM1078951
4PMID: 2203462810GSE32208Rhizophagus irregularisAM inoculatedAffymetrixMedicago Genome ArrayGSM797965
4PMID: 2203462811GSE32208Rhizophagus irregularisAM inoculatedAffymetrixMedicago Genome ArrayGSM797966
4PMID: 2203462812GSE32208Rhizophagus irregularisAM inoculatedAffymetrixMedicago Genome ArrayGSM797967
4PMID: 2203462813GSE32208Glomus mosseaeAM inoculatedAffymetrixMedicago Genome ArrayGSM797968
4PMID: 2203462814GSE32208Glomus mosseaeAM inoculatedAffymetrixMedicago Genome ArrayGSM797969
4PMID: 2203462815GSE32208Glomus mosseaeAM inoculatedAffymetrixMedicago Genome ArrayGSM797970
4PMID: 2203462816GSE32208NoneNon-inoculated Control with low P(20 miM)AffymetrixMedicago Genome ArrayGSM797971
4PMID: 2203462817GSE32208NoneNon-inoculated Control with low P(20 miM)AffymetrixMedicago Genome ArrayGSM797972
4PMID: 2203462818GSE32208NoneNon-inoculated Control with low P(20 miM)AffymetrixMedicago Genome ArrayGSM797973

The studies and samples used in this study for obtaining the unified transcriptomic signature of Arbuscular mycorrhiza in roots of Medicago truncatula.

Reducing heterogenicity between samples of studies: a framework integrating within study rma-normalization, z-value transformation, and within samples scaling/quartiling normalization

Reducing heterogenicity across studies (batch effects) is an essential step for direct combination of expression data in DE meta-analysis. To reduce the batch effects for DM meta-analysis, we developed an integrative approach of within-study RMA-normalization, Z-value transformation, and within samples scaling/quartiling normalization. Figures 2A–D compares the heterogenicity between samples of different studies after different normalization steps. As can be inferred from Figure 2, the proposed framework of normalization and standardization reduced the batch effects and facilitated direct merging of samples from different experiments.

Figure 2

Figure 2

Reducing heterogenicity between samples of studies for direct merging meta-analysis by a three-step normalization process. (A) Box plot of raw expression. (B) Box plot of expression of RMA-normalized samples. (C) Box plot of expression of scaling after Z-standardization and RMA normalized samples. (D) Box plot of expression of quartiling after Z-standardization and RMA normalized samples.

Meta-genes based biosignature of AM inoculation derived by supervised attribute weighting models

To achieve the transcriptomic signature of AM inoculation, the expression of 33685 genes in inoculated and non-inoculated Medicago roots was mined by 7 feature selection models. Also, the effects of Study (batch effect) and AM type were considered by adding 2 additional polynomial attributes to the expression dataset. The weights of genes as well as Study ID and AM type are presented in Supplementary Table 2. The resulting weights of each feature selection model were normalized into the interval between 0 and 1. The genes announced important by most of the feature selection models with the cut-off > 0.95 were assumed as the key distinguishing genes to form the AM inoculation biosignature.

In total, 681 genes received weight equal or higher than 0.95 by most feature selection algorithms (5 out of 6 models), including UNCERTAINTY, GINI INDEX, CHI SQUARED, RULE, INFO GAIN RATIO, and INFO GAIN. RELIEF was not efficient in gene selection and only gave a high weight to 2 genes out of 33685. Within 681 genes, 180 genes had absolute Z-value difference of 0.5 (> 0.5 or < −0.5) between AM inoculated and non-inoculated (Supplementary Table 3). As presented in Table 2, 73 genes selected by feature selection models were up-regulated with a Z-value difference of > 0.5.

Table 2

GeneExpression statistics of selected meta-gene (up-regulated) in AM inoculated and non-inoculated conditionEmployed attribute weighting models to discriminate AM inoculation from non-inoculation conditionCut-off
Mean of expression in AM inoculated roots (Z-value standardized)Mean of expression in non-inoculated roots (Z-value standardized)Z-value difference between inoculated and non-inoculated rootsStandard deviation in inoculated conditionStandard deviation in non-inoculated conditionWeight_ReliefWeight_UncertaintyWeight_GiniIndexWeight_Chi SquaredWeight_RuleWeight_Info GainRatioWeight_Info GainNumber of models confirmed the importance of gene (Cutoff > 0.95)
MTR_8g0051750.03−0.530.560.290.030.020.601.001.001.001.001.005
MTR_3g0650500.85−0.271.120.510.150.010.601.001.001.001.001.005
MTR_1g4710502.151.590.560.170.110.060.751.001.001.001.001.005
MTR_5g0921501.711.090.620.230.070.030.651.001.001.001.001.005
MTR_7g1129631.390.610.780.340.140.010.561.001.001.001.001.005
MTR_4g0878301.640.800.840.200.140.030.641.001.001.001.001.005
MTR_4g1138201.180.350.830.360.170.000.601.001.001.001.001.005
MTR_6g0796301.21−0.091.300.830.060.010.651.001.001.001.001.005
MTR_4g1024001.820.661.150.430.080.010.661.001.001.001.001.005
MTR_7g0926200.93−0.291.220.520.020.010.621.001.001.001.001.005
MTR_3g1159401.500.660.840.180.110.030.651.001.001.001.001.005
MTR_8g0061900.32−0.450.770.400.030.020.671.001.001.001.001.005
MTR_4g1290101.360.630.740.360.110.010.601.001.001.001.001.005
MTR_7g0769600.51−0.641.150.610.170.020.621.001.001.001.001.005
MTR_7g0769201.280.101.190.660.060.000.651.001.001.001.001.005
MTR_2g4811501.06−0.581.640.620.140.010.621.001.001.001.001.005
MTR_8g0759900.24−0.730.960.490.070.020.771.001.001.001.001.005
MTR_1g0983000.68−0.371.050.460.060.010.661.001.001.001.001.005
MTR_3g0579801.380.460.920.340.120.010.661.001.001.001.001.005
MTR_3g0346401.140.310.830.260.090.010.621.001.001.001.001.005
MTR_7g0801801.721.130.590.250.100.040.561.001.001.001.001.005
MTR_8g0749201.200.700.500.250.110.020.561.001.001.001.001.005
MTR_3g0640900.50−0.140.640.210.100.000.731.001.001.001.001.005
MTR_6g0150201.560.970.590.200.060.040.641.001.001.001.001.005
MTR_0088s01000.26−0.700.960.540.090.020.641.001.001.001.001.005
MTR_1g1152300.780.280.500.300.030.010.621.001.001.001.001.005
MTR_5g0186101.850.461.380.590.080.010.631.001.001.001.001.005
MTR_3g0796201.52−0.702.210.880.250.010.691.001.001.001.001.005
MTR_3g0454401.05−0.651.710.490.100.000.651.001.001.001.001.005
MTR_5g0194601.691.170.510.150.030.070.621.001.001.001.001.005
MTR_2g0887001.210.660.560.220.060.020.681.001.001.001.001.005
MTR_1g0697251.110.130.970.380.070.000.601.001.001.001.001.005
MTR_1g103090−0.03−0.780.760.650.060.020.791.001.001.001.001.005
MTR_8g0753300.78−0.221.000.460.080.010.621.001.001.001.001.005
MTR_3g078730−0.21−0.760.560.450.040.020.761.001.001.001.001.005
MTR_5g0942100.07−0.840.910.630.070.030.671.001.001.001.001.005
MTR_3g1124600.34−0.520.860.460.080.020.621.001.001.001.001.005
MTR_5g0454702.570.811.760.360.340.020.701.001.001.001.001.005
MTR_6g0437000.86−0.561.420.820.210.020.601.001.001.001.001.005
MTR_5g0311601.65−0.562.200.940.080.000.731.001.001.001.001.005
MTR_1g1151951.220.260.960.370.280.000.601.001.001.001.001.005
MTR_8g0877101.850.651.200.560.080.010.661.001.001.001.001.005
MTR_4g0756901.530.730.800.490.160.010.651.001.001.001.001.005
MTR_2g4619701.941.120.820.200.070.040.631.001.001.001.001.005
MTR_8g0360501.450.161.290.460.060.010.651.001.001.001.001.005
MTR_2g0105801.520.840.680.250.020.030.651.001.001.001.001.005
MTR_8g0720101.551.000.550.160.220.030.711.001.001.001.001.005
MTR_4g0698101.680.301.380.410.170.010.601.001.001.001.001.005
MTR_8g0442300.660.080.580.360.060.010.591.001.001.001.001.005
MTR_7g1055601.711.020.690.170.050.050.721.001.001.001.001.005
MTR_6g0278401.900.841.060.320.130.020.591.001.001.001.001.005
MTR_6g0069900.73−0.561.290.500.070.010.601.001.001.001.001.005
MTR_8g0682651.791.220.570.160.100.050.661.001.001.001.001.005
MTR_6g0291801.580.820.770.190.080.040.621.001.001.001.001.005
MTR_4g0811901.01−0.501.510.540.100.010.681.001.001.001.001.005
MTR_7g0826600.13−0.520.660.210.190.020.701.001.001.001.001.005
MTR_5g0754001.260.041.220.620.100.000.681.001.001.001.001.005
MTR_3g0836302.041.230.820.270.170.030.591.001.001.001.001.005
MTR_8g0222701.73−0.121.850.670.090.000.681.001.001.001.001.005
MTR_8g0186502.421.331.090.320.110.030.601.001.001.001.001.005
MTR_7g0982300.81−0.331.140.280.030.000.661.001.001.001.001.005
MTR_2g0891000.70−0.100.800.260.150.000.611.001.001.001.001.005
MTR_3g0579701.060.450.610.260.050.010.641.001.001.001.001.005
MTR_3g0580000.40−0.450.850.380.130.010.661.001.001.001.001.005
MTR_1g0505500.57−0.220.790.200.060.000.651.001.001.001.001.005
MTR_7g0771101.26−0.671.930.470.200.000.631.001.001.001.001.005
MTR_7g0770500.96−0.971.930.470.070.000.651.001.001.001.001.005
MTR_2g0689501.070.360.710.260.050.010.631.001.001.001.001.005
MTR_4g0764901.090.300.790.160.060.020.651.001.001.001.001.005
MTR_6g0056300.49−0.470.960.380.070.010.621.001.001.001.001.005
MTR_1g0941301.821.300.520.260.180.030.581.001.001.001.001.005
MTR_8g0680500.990.360.630.250.040.010.591.001.001.001.001.005
MTR_5g0267300.50−0.571.070.420.040.010.661.001.001.001.001.005
Study ID0.030.240.170.180.000.050.260
Type of AM0.000.901.001.001.000.321.004

Transcriptomic biosignature of Arbuscular mycorrhiza (AM) inoculation on Medicago roots derived by integration of supervised attribute weighting models and direct merging meta-analysis.

The role of the following 73 up-regulated meta-genes (derived by direct merging meta-analysis of different experiments) was re-confirmed by feature selection (attribute weighting) models. To this end, seven feature selection models including RELIEF, UNCERTAINTY, GINI INDEX, CHI SQUARED, RULE, INFO GAIN, and INFO GAIN RATIO evaluated the relevance of 33685 genes as well as Study ID and AM type in discriminating AM inoculated roots from non-inoculated ones. The resulting weights of each feature selection model were normalized into the interval between 0 and 1. The upregulated genes in response to AM inoculation with a Z-value difference of >0.5, announced important by most of feature selection models, intersection of weighting methods with various statistical backgrounds (cutoff > 0.95), were selected as the key distinguishing genes to form the AM inoculation biosignature.

The 73 highly upregulated genes responding to AM inoculation, as transcriptomic biosignature (Table 2), contain important classes of genes including AP2 domain class transcription factors (MTR_6g029180), GRAS family transcription factors (MTR_1g069725 and MTR_2g089100), cyclin-dependent kinase (MTR_1g098300), receptors [lectin receptor kinase (MTR_8g068050), LRR receptor-like kinase (MTR_8g044230), cysteine-rich RLK (MTR_3g064090), and LRR receptor-like kinase (MTR_8g044230)], trypsin inhibitor (MTR_5g045470), Nodule Cysteine-Rich secreted peptide (MTR_3g065050), early nodulin 93 (MTR_4g113820), and transporters (MTR_4g081190, MTR_4g081190, MTR_8g022270, MTR_8g087710, and MTR_1g050550) (Table 3).

Table 3

Class of proteinSubclassMember from upregulated transcriptomic signature responding to AM inoculation
Transcription factorGRAS family transcription factorMTR_1g069725, MTR_2g089100
AP2 domain class transcription factorMTR_6g029180
Zinc finger, C3HC4 type (RING finger) proteinMTR_5g026730
Phosphate synthase1-deoxy-D-xylulose-5-phosphate synthaseMTR_8g068265
Geranylgeranyl pyrophosphate synthaseMTR_5g019460
TransportersPhospholipase A1 transporterMTR_4g087830
ABC transporter B family proteinMTR_4g081190, MTR_8g022270
Major intrinsic protein (MIP) family transporterMTR_8g087710
MFS transporterMTR_1g050550
Peptide transporterMTR_3g112460, MTR_7g098230
Cyclin-dependent kinaseCyclin-dependent kinaseMTR_1g098300
ReceptorsCysteine-rich RLK (receptor-like kinase) proteinMTR_3g064090
Lectin receptor kinaseMTR_8g068050
LRR receptor-like kinaseMTR_8g044230
Nodule proteinsNodule Cysteine-Rich (NCR) secreted peptideMTR_3g065050
Early nodulin 93MTR_4g113820
Tyrosine kinaseTyrosine kinase family proteinMTR_4g129010
CytochromeCytochrome P450MTR_3g057970, MTR_3g057980, MTR_3g058000, MTR_5g092150, MTR_7g092620
OxidaseL-ascorbate oxidaseMTR_3g078730, MTR_3g078730
Multi-copper oxidase-like proteinMTR_4g075690
Serine carboxypeptidaseSerine carboxypeptidase-like proteinMTR_3g079620, MTR_7g080180
Biotin carboxyl carrier acetyl-CoA carboxylaseMTR_6g015020
InhibitorInhibitor of trypsin and hageman factor-like proteinMTR_5g045470
legume specific proteinsLegume lectin beta domain proteinMTR_5g031160
Cysteine-rich proteinCAP, cysteine-rich secretory protein, antigen 5MTR_2g010580
Tetrahydrodipicolinate synthase4-hydroxy-tetrahydrodipicolinate synthaseMTR_8g036050
HydrolaseGlycoside hydrolaseMTR_8g075330, MTR_8g075990
Epoxide hydrolaseMTR_7g112963
ChitinaseChitinaseMTR_6g079630
Alginate lyaseAlginate lyaseMTR_6g043700
Oxidoreductase2OG-Fe(II) oxygenase family oxidoreductaseMTR_2g068950
Glucan-protein synthaseAlpha-1,4-glucan-protein synthase proteinMTR_2g461970
ArginaseArginase family proteinMTR_0088s0100
Beta-carotene isomeraseBeta-carotene isomerase D27MTR_1g471050
Carbonic anhydraseCarbonic anhydrase family proteinMTR_6g006990
GlucosidaseGlucan endo-1,3-beta-glucosidaseMTR_4g076490
Glutathione S-transferaseGlutathione S-transferaseMTR_1g115195
Oxygen enhancer proteinOxygen-evolving enhancer proteinMTR_8g005175
PectinacetylesterasePectinacetylesterase family proteinMTR_8g072010
PolygalacturonasePolygalacturonaseMTR_6g005630
Prolyl oligopeptidaseProlyl oligopeptidase family proteinMTR_1g115230
LipoxygenaseSeed linoleate 9S-lipoxygenaseMTR_8g018650
TransmembraneSeven transmembrane MLO family proteinMTR_3g115940
Squalene synthaseSqualene/phytoene synthaseMTR_3g083630
ProteolysisSubtilisin-like serine proteaseMTR_4g102400
SyntaxinSyntaxin of plants 122 proteinMTR_2g088700

Description of highly upregulated genes in transcriptomic signature of Arbuscular mycorrhiza (AM) inoculation on Medicago roots.

The 73 up-regulated genes responding to AM inoculation were derived by meta-analysis and supervised machine learning analysis.

Within the previously-reported AM markers (Hogekamp et al., 2011), MtGIP1 received high weights (values) in 6 out of 7 of the employed attribute weighting models in order to distinguish AM-inoculated from non-inoculated samples. Figure 3 visualizes the high weights assigned to MtGIP1 by UNCERTAINTY, GINI INDEX, CHI SQUARED, RULE, INFO GAIN RATIO, and INFO GAIN models where weighting closer to 1 shows a higher relevance (importance) of gene according to the respective model. Also, Figure 3 presents the normalized expression value of MtGIP1 in AM-inoculated and non-inoculated samples. MtGIP1 can be assumed to be a reliable AM colonization marker as its predictive powers is confirmed by previous individual studies as well as the combined meta-analysis performed here.

Figure 3

Figure 3

MtGIP1 received high weights (values) in 6 out of 7 of the employed attribute weighting models in order to distinguish Arbuscular mycorrhiza (AM)-inoculated samples from non-inoculated ones, according to UNCERTAINTY, GINI INDEX, Chi Squared, RULE, INFO GAIN RATIO, and INFO GAIN models, where weighting closer to 1 shows a higher relevance (importance) of gene according to the respected model. The normalized expression value of MtGIP1 in AM-inoculated and non-inoculated samples are also presented.

Supervised machine learning models showed that the batch effect (heterogenicity between experiments) is remarkably reduced

The results of attribute weighting (feature selection) models presented in Table 2 show that the effect of Study ID (batch effect) is not significant in deriving the signature of AM inoculation. Interestingly, while 180 genes were selected by most of attribute weighting models to discriminate AM-inoculated from non-inoculated roots (Supplementary Table 3, Table 2), none of the models selected Study ID.

In line with this finding, clustering analysis (Figure 4) showed that the developed AM inoculation signature is able to discriminate between AM-inoculated and non-inoculated samples. As presented in Figure 4, while the AM-inoculated samples had more than 50% similarity to transcriptomic signature genes, this similarity decreased to 22% with AM non-inoculated genes.

Figure 4

Figure 4

Evaluation of the developed transcriptomic signature of Arbuscular mycorrhiza (AM) inoculation in discrimination of inoculated from non-inoculated samples. Clustering was performed based on Average Linkage method and Euclidean distance. Clustering based on the identified transcriptomic signature was able to efficiently distinguish between AM inoculated from non-inoculated samples.

The transcriptomic signature of am inoculation identifies the involvement of hydrolase activity, phosphorylation, cell wall organization, and transport, based on computational systems biology analysis

Functional annotation of the transcriptomic signature based on GO analysis showed that a majority of genes in the transcriptomic signature encode membrane proteins and are involved in cell wall organization, membrane transport, proteolysis, and oxidoreductase activities (Supplementary Table 4). GO distribution of the transcriptomic signature is presented in Figure 5.

Figure 5

Figure 5

Gene Ontology (GO) distribution of upregulated genes in transcriptomic signature of successful Arbuscular mycorrhiza (AM) colonization in roots of Medicago truncatula. (A) GO at Biological process level, (B) GO at Cellular component level, (C) GO at Molecular function level.

The isoprenoid biosynthetic/metabolic process and the lipid biosynthetic/metabolic process were statistically significant (enriched) biological processess that can be activated by upregulated genes of transcriptomic signature (p-value FDR < 0.05) (Figure 5A). Response to stimulus was another interesting aspect enriched in the GO. In the cellular component GO category, genes involved in response to AM colonization, including cell wall and external encapsulating structure, showed high enrichment (Figure 5B). In terms of Molecular Function, transferase and hydrolase activities were significantly enriched (Figure 5C) (Supplementary Table 4). In line with this finding, analysis of transcriptome response of Medicago truncatula to Glomus mosseae and Rhizophagus irregularis by Hohnjec et al. (2005), showed that 201 plant genes were significantly co-induced at least 2-fold. These genes were related to functions such as nitrate, ion, and sugar transporter, and enzymes involved in secondary metabolism, proteases, and Kunitz-type protease inhibitors.

Overrepresented transcription factor binding sites on promoter regions of upregulated AM colonization transcriptomic signature enabled discovery of potential master regulators of AM colonization

Transcription factors with enriched binding sites on promoter regions of the transcriptomic signature are candidates for “common master regulators” (Hosseinpour et al., 2013; Mahdi et al., 2014; Alanazi and Ebrahimie, 2016; Alanazi et al., 2018). Transcription factor matrix families with statistically enriched (p < 0.01) binding sites in the highly upregulated meta-gene transcriptomic signature (top 20 genes) of successful AM inoculation is presented in Table 4 and Supplementary Table 6. The common enriched TFs were: P$FLO2, P$SEF3, P$TERE, P$ASRC, P$CARM, P$TOEF, P$SEF4, P$LREM, P$MYBL, P$CAAT, P$GTBX, and P$WOXF (Table 4 and Supplementary Table 6).

Table 4

TF Matrix FamilyTF Family DescriptionExample of TFsBinding domain of TFp-valueNO binding sitesNO gens with TFTop 20 upregulated genes in AM inoculation signature
MTR_3g045440MTR_2g068950MTR_2g481150MTR_8g022270MTR_7g077110MTR_5g018610MTR_7g092620MTR_4g081190MTR_1g069725MTR_5g045470MTR_4g069810MTR_8g036050MTR_8g068050MTR_6g043700MTR_3g079620MTR_7g077050MTR_6g079630MTR_5g031160MTR_6g006990MTR_5g094210
P$FLO2Floral homeotic protein APETALA 2AP2AP2 domain2.05E-057120243731357344222212122
P$SEF3Soybean embryo factor 3SEF3not specified0.000271211442111021100210101201
P$TERETracheary-element-regulating cis-elements, conferring TE-specific expressionTERE0.000345421804452112314311222130
P$ASRCAS1/AS2 repressor complexAS1, AS2not specified0.0004321032051031137934158329341111
P$CARMCA-rich motifCARMnot characterized0.000613421732051026102141112172
P$TOEFTarget of early activation tagged factorsRAP2.7, TOE2AP2 domain0.001698562035152332425221226231
P$SEF4Soybean embryo factor 4SEF4not specified0.002042241314130103112010200310
P$MYBLMYB-like proteinsMYB, AS1, AS2, FIF10.00282529320132065361021281061210152207418266
P$CAATCCAAT binding factorsLEC1, NF-YA1,NF-YB1heterotrimeric transcription factor0.003168921931101733410662423513171
P$GTBXGT-box elementsASIL1, S1FA, GT2, GT10.006813454202260106171626272414261222822711165310
P$WOXFWUS homeobox-containing protein familyWOX13homeodomain0.009397731846192167504432453250

Transcription factors matrix families with frequent binding sites on promoter regions of the top 20 upregulated genes during successful Arbuscular mycorrhiza (AM) colonization as potential master regulators of AM colonization.

Promoter analysis of upregulated genes in the AM colonization signature identified the P$FLO2 matrix family as one of the master regulators due to the highest number of binding sites (71) within promoter regions of all the 20 highly upregulated genes (0.0000204538). The P$FLO2 family contains transcription factors with AP2 domains and ethylene-responsive element (ERE) binding. (Table 4, Figure 6A, and Supplementary Table 6).

Figure 6

Figure 6

P$CAAT (A) and P$FLO2 (B) transcription factor matrix families were master regulators of successful Arbuscular mycorrhiza (AM) colonization in roots of Medicago truncatula with enriched (high number of) binding sites on promoter regions of the top 20 upregulated genes during successful AM colonization. P$FLO2 transcription factor matrix family contains transcription factors with AP2 domain structure and ethylene-responsive element (ERE) binding. P$CAAT matrix family includes CCAAT binding transcription factors, such as NF-YA, NF-YB, and LEC1.

The P$CAAT matrix family that includes CCAAT binding transcription factors, such as NF-YA, NF-YB, and LEC1, was selected as another common TF (master regulator) with a low p-value (p = 0.00316799) in common TF analysis. P$CAAT matrix had binding sites on promoter regions in 19 out of the 20 upregulated genes in the AM inoculation signature with the total number of 92 binding sites (Table 4, Supplementary Table 6, and Figure 6B).

P$SEF3 (soybean embryo factor 3) and P$SEF4 (soybean embryo factor 4), that contain SEF3 and SEF4 transcription factors, had a significantly (p- value < 0.01) high number of interactions with the top 20 upregulated genes in the AM colonization signature and these were tentatively identified as potential key regulators of AM colonization (Table 4, Supplementary Table 6). The P$TERE matrix family that confers transcription factor-specific expression was also enriched in promoter regions of upregulated genes after AM colonization.

Another enriched transcription factor matrix family was P$TOEF that contains the AP2 domain in its structure and is involved in early activation/response (Table 4, Supplementary Table 6). RAP2.7 and TOE2 are well-known members of this matrix family. GO analysis showed that this matrix family is involved in organ morphogenesis.

A matrix family involved in response to fungal colonization, P$ASRC, had 103 binding sites on promoter regions of the top 20 upregulated genes in AM successful colonization with p-value of 0.000432195. AS1, AS2 are members of this transcription factor matrix family.

The transcription factor family of MYB-like proteins, belonging to the P$MYBL matrix family, was also enriched (total of 293 binding sites and p-value of 0.002825). This family includes important transcription factors, including MYB, AS1, AS2, and FIF1.

The AM colonization meta-signature showed high repeatability in an independent rna-seq experiment of AM colonization

In an independent RNA-seq experiment, we observed high correspondence between RNA-seq data of AM colonization and the identified AM colonization signature in this study, derived from integration of meta-analysis with supervised attribute weighting models (Figure 7, Supplementary Table 7). Fifty-one of 73 (70%) of the upregulated genes in the developed transcriptomic biosignature of AM colonization were also upregulated in the independent RNA-seq data of AM colonization with FDR-corrected p < 0.01 (Figure 7A). Noticeably, the identified AM colonization meta-signature was able to discriminate accurately between AM-inoculated samples and non-inoculated ones (Figure 7B). High correspondence between the expression of some of the important genes of the AM-colonization signature in the original microarray experiments (based on standardized Z-value of expression) and the expression of those genes in the RNA-seq experiment [based on RPKM (Reads Per Kilobase of transcript per Million mapped reads) are visualized in Figure 7C, including the AP2 domain class transcription factor (MTR_6g029180), members of GRAS family of transcription factor (MTR_1g069725, MTR_2g089100), Cyclin-dependent kinase (MTR_1g098300), MIP family transporter (MTR_8g087710), ABC transporter B family-like protein (MTR_8g022270), Legume lectin beta domain protein (MTR_5g031160), Sigma factor sigb regulation rsbq-like protein (MTR_3g045440), and Serine carboxypeptidase-like protein (MTR_3g079620).

Figure 7

Figure 7

High correspondence between RNA-seq data of Arbuscular mycorrhiza (AM) colonization and the identified AM colonization meta-signature in this study, derived from integration of meta-analysis with supervised attribute weighting models. (A) 51 out of 73 (70%) of the upregulated genes in the developed transcriptomic biosignature ofcolonization were also upregulated in the RNA-seq data of AM colonization with FDR-corrected p < 0.01. (B) The identified AM colonization meta-signature was able to accurately discriminate AM-inoculated samples from non-inoculated ones. (C) Visualization of the expression of some important genes of AM colonization signature in original experiments (based on standardized Z-value of expression) and RNA-seq experiment [based on RPKM (Reads Per Kilobase of transcript per Million mapped reads)].

In short, after quality control and trimming, 18772504, 19678186, 22009349, 18982223, and 19364133 remained in 3 AM-inoculated and non-inoculated samples. High efficiency of mapping to genes (more than 95%) was observed in all samples. Supplementary Table 8 presents RNA-seq based differential expression analysis of Medicago truncatula response to AM inoculation compared to non-AM inoculation.

Discussion

Finding a biosignature/predictors based on a single transcriptomic experiment is a major challenge due to a large prediction error caused by a large number of independent predictors (genes) and a restricted number of observations (replications) (Baseri et al., 2011). Also of concern is the repeatability of a selected subset of a gene derived from a single experiment/condition. Inter-species analysis of a range of experiments by meta-analysis and machine learning techniques is able to deal with theseshortcomings, leading to the generation of a robust and repeatable biosignature (Farhadian et al., 2018b). Meta-analysis has received increased attention in recent years because of its remarkable potential to increase the statistical power and generalizability of single study analysis (Farhadian et al., 2018a; Sharifi et al., 2018). Meta-analysis not only reinforces the findings of the individual studies, but is also may identify new undetected outcomes/patterns in single studies as meta-analysis considers the direction/trend of variables in each experiment to find the consistent, robust and repeatable patterns in all experiments (Sharifi et al., 2018). The inter-species DE-based meta-analysis employed in this study had more samples and stronger statistical power and was successful in achieving a statistically-reliable transcriptomic biosignature of successful AM inoculation, independent from the study. In addition, the biosignature was repeatable and discriminative when a new and independent RNA-seq experiment was used for its validation. Due to the availability of Medicago truncatula transcriptomic data (as a model plant), the meta-analysis was solely performed on this plant resulting in the identification of a robust and high performance transcriptomic signature of AM colonization. However, in non-model plants with the subsequent generation of new transcriptomic data, it will be necessary the identified Medicago truncatula-derived transcriptomic signature of AM colonization will need further examination.

In DE-based meta-analysis, it is crucial to adjust for batch effects before combining expression datasets. Heterogenicity (batch effects) is the major concern in meta-analysis of expression data (Leek and Storey, 2007; Ramasamy et al., 2008). In this study, we developed a new approach for reducing batch effects and direct merging meta-analysis by combination of meta-analysis, multi-step normalization, and supervised attribute weighting models. We observed that quartiling outperforms the scaling approach in reducing the batch effect. Heterogenicity-reducing based on the quartiling approach has been used extensively for knowledge discovery and pattern recognition in large data analysis, particularly in integrated classification and association-rule mining (CBA) algorithm (Kargarfard et al., 2015, 2016). As an example, CBA analysis of quartiled protein and DNA measurements was able to find a biosignature for increased host range and the emergence of an outbreak in influenza (Kargarfard et al., 2016). Supervised machine learning has brought new possibilities to predictive studies (Bakhtiarizadeh et al., 2014a; Ebrahimi et al., 2014; Zinati et al., 2014; Ebrahimie et al., 2018b). Supervised attribute weighting (feature selection) algorithms are techniques for reducing the variables and identifying a subset of highly relevant ones in order to improve the efficiency of classification algorithms (Rosario and Thangadurai, 2015). The capability to simultaneously analyse both categorical and numerical features, power to analyse large data, and the ability to produce various predictive algorithms with diverse statistical backgrounds are distinguished features of supervised machine learning models (Ebrahimie et al., 2011; Shekoofa et al., 2014). The possibility to include the categorical variables in predictive models can remarkably decrease the heterogenicity across studies as the batch effects and other non-biological experimental variation were incorporated in the models (Shekoofa et al., 2014). In this study, different experiments or types of AM were added as variables and analyse in the predictive model that resulted in remarkable control of batch effect. This possibility is very limited in traditional multivariate or regression models.

The identified meta-genes of successful AM colonization, derived by integration of meta-analysis with supervised attribute weighting models, was able to discriminate efficiently between AM-inoculated and non-inoculated samples. As a validation analysis, the developed signature showed high performance in distinguishing AM-colonized roots from non-inoculated ones in an independent RNA-seq experiment. Recently, integration of supervised machine learning algorithms with meta-analysis has been used to identify a mastitis bio-signature and early prediction of its occurrence (Ebrahimie et al., 2018a; Sharifi et al., 2018). The developed integrative approach in this study, comprising multi-step normalization, direct-merging meta-analysis, and supervised attribute weighting models, is platform-independent approach. By subsequent generation of more RNA-seq data, the developed pipeline may be employed for biosignature discovery in RNA-seq transcriptomic data, integration of microarray and transcriptomic data as is possible using some other NGS platforms, such as ChIP-Seq and SNP to perform meta-analysis on significant peaks in ChIP-Seq experiments and frequency of SNPs in genome-wide experiments.

The core 73 upregulated genes in the developed transcriptomic biosignature contain novel regulators of AM colonization including two transcription factors from the GRAS family (MTR_1g069725, MTR_2g089100), one transcription factor from AP2 domain class (MTR_6g029180), and one Zinc finger protein. It has been documented that the GRAS-type transcription factors, such as NSP1 (Nodulation Signaling Pathway1) and NSP2, play essential signaling functions in promoting both Rhizobium nodulation and mycorrhizal colonization (Kaló et al., 2005; Smit et al., 2005; Liu et al., 2011; Gobbato et al., 2012). Another transcription discovered factor, MTR_6g029180, has an AP2 domain in this structure. Interestingly, it has been reported that ERF transcription factors with a highly conserved AP2 DNA-binding domain are necessary for nodulation and symbiosis (Middleton et al., 2007). Cyclin-dependent kinase (MTR_1g098300) was another highly upregulated gene in the signature of successful AM colonization in this study. Mycorrhizal colonization is classified as postembryonic development of plant organs that need a constant interplay between the cell cycle and developmental programs (Kondorosi and Kondorosi, 2004). Cyclin-dependent kinase controls cell cycle and plays the key role in endoreduplication and activation of the anaphase-promoting complex during symbiotic cell development (Kondorosi and Kondorosi, 2004). The discovery of the essential transcription factors of successful mycorrhizal colonization and symbiosis in the developed biosignature highlights the robustness and applicability of meta-analysis in the AM colonization signature discovery and the importance of the developed transcriptomic signature. The biosignature obtained here provides a platform for increasing the efficiency of AM inoculation in future by finding accelerator AM colonization agents, such as small molecules/chemicals, and manipulating the expression of key genes in the biosignature.

The reasons that some previously-reported AM-associated genes were not identified in the AM meta-signature might be: (1) there are other genes with higher and more repeatable expression in response to AM induction and colonization which are, as a result, selected. These new candidates have higher preference over some of the previously-known biomarkers of AM symbiosis, (2) some AM markers might interact with the type of AM and consequently these will not appear in cross-species meta-analysis, and (3) some AM markers may interact with a specific condition or timing of AM symbiosis. As example, mycorrhiza-specific phosphate transporter seems to be more closely related to P homeostasis rather than colonization as the phosphate transporter mediates early root responses to phosphate status in non-mycorrhizal roots (Volpe et al., 2016).

Reinforcing the importance of the existence of AP2 transcription factors in the upregulated transcriptomic signature of AM colonization, promoter analysis demonstrated that the P$FLO2 transcription factor matrix family, with the AP2 domain structure and ethylene-responsive element-binding, had the highest number of promoter binding sites of all 20 highly upregulated genes in the AM inoculation signature. Floral homeotic protein APETALA 2, a member of P$FLO2 matrix family, has a documented role in the control of flower and seed development (Jofuku et al., 1994). Strong induction of APETALA 2 in developing nodules of Medicago truncatula has been observed and suggested as a potential regulator of the symbiotic program (El Yahyaoui et al., 2004). Another enriched transcription factor matrix family was the P$TOEF matrix family that contains the AP2 domain in its structure and is involved in early activation/response (Table 4, Supplementary Table 6). GO analysis showed that these are involved in organ morphogenesis.

P$CAAT was another potential master regulator of the identified AM colonization signature that contains CCAAT-binding family transcription factors. It has been documented that CCAAT-binding family transcription factors are essential for endosymbiosis establishment and development (Diédhiou and Diouf, 2018). Laser microdissection has documented the expression of CAAT-Box transcription factor in AM, correlated with fungal contact and spread (Hogekamp et al., 2011). Two members of this CCAAT-binding family, NF-YA1a and NF-YA1b, are positive regulators of AM colonization in soybean (Schaarschmidt et al., 2013). Before the present study, most of the known CCAAT-binding family transcription factors had been reported to be involved in nodulation (Marsh et al., 2007; Soyano et al., 2013). Functional genetic studies of symbiotic genes in Medicago truncatula indicate a role for a CCAAT-box transcription factor in rhizobial infection (Cousins, 2016). Analytical approaches based on literature mining have suggested association between a number of potential microRNAs (particularly microRNA169 and microRNA156) and microRNA-regulated transcription factors, which may be involved in the coordinated regulation of nitrogen and phosphorous starvation responses in soybean and NF-YA3 and NF-YA8 are targets of microRNA169 (Dehcheshmeh, 2013; Chiasson et al., 2014).

A MYB transcription factor belonging to P$MYBL matrix family was also enriched on promoter region of the identified signature of AM colonization. It has been demonstrated that a transcriptional program for arbuscule degeneration during AM symbiosis is regulated by MYB1 (Floss et al., 2017).

At the regulatory level, promoter analysis of co-expressed genes has demonstrated high potential in identifying key enriched transcription factors, finding undiscovered roles of genes (Deihimi et al., 2012), developing the functional genomics catalog of activated transcription factors during a phenomenon (Mahdi et al., 2013; Zinati et al., 2014), and discovery of transcriptional regulatory networks (Bakhtiarizadeh et al., 2013, 2014b). It has been also shown that number and diversity of differential cis-regulatory elements on promoter regions are strong predictors of gene function and level of expression under different conditions (Babgohari et al., 2014; Shamloo-Dashtpagerdi et al., 2015). This has resulted in developing new indicators of gene importance not based on the gene sequence but on the promoter region. In our previous study, we developed a novel pairwise comparison method for in silico discovery of statistically significant cis-regulatory elements in eukaryotic promoter regions (Shamloo-Dashtpagerdi et al., 2015).

Transcription factors have interactions with DNA to regulate gene expression in cells (Pomerantz et al., 2015). In future studies, genome-wide mapping of binding sites of the identified transcription factors [GRAS family transcription factor (MTR_1g069725, MTR_2g089100), AP2 domain transcription factor (MTR_6g029180), and CCAAT-binding transcription factors] by CHIP-seq techniques may unravel the cistrome of successful AM colonization in symbiosis establishment.

Conclusion

In this study, we developed a new approach for reducing heterogenicity between experiments (batch effect) and direct merging meta-analysis by combining meta-analysis, multi-step normalization, and supervised attribute weighting models. We employed this approach to obtain a unified transcriptomic signature of successful AM colonization in roots of Medicago truncatula. The genes of identified in the signature, derived by integration of meta-analysis with supervised attribute weighting models, were strongly up-regulated in all AM symbioses and probably correspond to the end targets of the symbiotic programme. The identified meta-genes of successful AM colonization discriminated efficiently between AM inoculated and non-inoculated samples. Furthermore, the developed signature showed high performance in distinguishing AM-colonized roots from non-inoculated ones in an independent RNA-seq experiment. Important protein classes such as the AP2 domain class transcription factor (MTR_6g029180), GRAS family transcription factors (MTR_1g069725 and MTR_2g089100), and cyclin-dependent kinase (MTR_1g098300) were highly upregulated during AM successful colonization. The developed direct merging-based meta-analysis, by combining meta-analysis, multi-step normalization, and supervised attribute weighting models, provides the possibility of data collection from different experiments even when a treatment or a control is missing in one or more of the experiments.

We suggest that the promoters of meta-genes identified in the transcriptomic signature of AM colonization may have the power to unravel key transcription factors as master regulators of AM symbiosis. Analysis of promoter regions of the top upregulated meta-genes in the AM-successful colonization signature in this study identified enriched transcription factor binding sites and led us to possible master regulators that form the transcriptome expression pattern. These included AP2 domain class transcription factors, CCAAT-binding family transcription factors, SEF transcription factors, and response to fungus ASRC transcription factors. Further functional characterization of these transcription factors is needed to understand their precise role in AM symbioses.

This study provides a framework for an improved understanding of the dynamics of successful AM colonization in establishing microsymbionts. It offers a new approach for related investigations into the other symbiosis systems.

Statements

Author contributions

MM-D, EE, and AN designed the research. ME, MT, ZN, RE, MM-D and EE collected and analyzed the data. ME developed the required software. MM-D and EE wrote the paper. All authors have read and approved the manuscript.

Acknowledgments

We greatly thank the Institute of Biotechnology and Iran's National Elites Foundation for funding this research. We also thank Professor Jeremy Timmis, School of Biological Sciences, The University of Adelaide, Dr. Gerard Tarulli, School of BioSciences, University of Melbourne, Dr. Marie Wood, Catholic education, South Australia, and Ms. Faeze Ebrahimi, Department of Biology of The University of Qom for English editing the manuscript. This research was supported by use of the Nectar Research Cloud, a collaborative Australian research platform supported by the National Collaborative Research Infrastructure Strategy (NCRIS).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2018.01550/full#supplementary-material

Supplementary Table 1

Available expression data on Arbuscular mycorrhiza at NCBI GEO (https://www.ncbi.nlm.nih.gov/geo/) database.

Supplementary Table 2

Mining transcriptome profile of Medicago roots to achieve transcriptomic signature of Arbuscular mycorrhiza (AM) inoculation. Seven feature selection models including RELIEF, UNCERTAINTY, GINI INDEX, CHI SQUARED, RULE, INFO GAIN, and INFO GAIN RATIO evaluated the relevance of 33685 genes as well as Study ID and AM type in discriminating AM inoculated roots from non-inoculated ones. The resulting weights of each feature selection model were normalized into the interval between 0 and 1. The genes announced important by most of the feature selection models (intersection of weighting methods with various statistical backgrounds) with cut-off > 0.95 were assumed as the key distinguishing genes to form the AM inoculation biosignature.

Supplementary Table 3

Genes discriminating Arbuscular mycorrhiza (AM) inoculated roots from non-inoculated ones according to UNCERTAINTY, GINI INDEX, CHI SQUARED, RULE, INFO GAIN, and INFO GAIN RATIO feature selection models (weight higher than 0.95)and also Z-value difference of > 0.5 or < −0.5. The resulting weights of each feature selection model were normalized into the interval between 0 and 1.

Supplementary Table 4

Gene ontology classification of transcriptomic signature of Arbuscular mycorrhiza (AM) inoculation. GO approach in terms of Molecular Function, Biological Process, and Cellular Component was used for functional annotation of 73 unregulated genes in AM inoculation that were announced important by most of feature selection models with Z-value difference of >0.5.

Supplementary Table 5

The top 20 highly upregulated meta-genes of successful Arbuscular mycorrhiza (AM) colonization transcriptomic signature on roots of Medicago truncatula, obtained by integration of meta-analysis and machine learning, were used for promoter analysis and common regulator discovery.

Supplementary Table 6

Transcription factors matrix families that have enriched binding sites on promoter regions of the top 20 upregulated genes during successful Arbuscular mycorrhiza (AM) colonization according to common TFs analysis of MatInspector tool. Transcription factor Matrix similarity was set to >0.95.

Supplementary Table 7

High correspondence between independent RNA-seq data of Arbuscular mycorrhiza (AM) colonization in Medicago truncatula and the identified AM colonization signature in this study, derived by integration of meta-analysis and supervised attribute weighting models.

Supplementary Table 8

RNA-seq based differential expression analysis of Medicago truncatula response of Arbuscular mycorrhiza (AM) inoculation compared to non-AM inoculation.

Supplementary Figure 1

Pseudo codes for scaling and quartiling normalization approaches, employed in this study. (A) Scaling approach. (B) Quartiling approach.

References

  • 1

    AlanaziI. O.AlyahyaS. A.EbrahimieE.Mohammadi-DehcheshmehM. (2018). Computational systems biology analysis of biomarkers in lung cancer; unravelling genomic regions which frequently encode biomarkers, enriched pathways, and new candidates. Gene659, 2936. 10.1016/j.gene.2018.03.038

  • 2

    AlanaziI. O.EbrahimieE. (2016). Computational systems biology approach predicts regulators and targets of microRNAs and their genomic hotspots in apoptosis process. Mol. Biotechnol.58, 460479. 10.1007/s12033-016-9938-x

  • 3

    AndersS.HuberW. (2010). Differential expression analysis for sequence count data. Genome Biol.11:R106. 10.1186/gb-2010-11-10-r106

  • 4

    AshburnerM.BallC. A.BlakeJ. A.BotsteinD.ButlerH.CherryJ. M.et al. (2000). Gene Ontology: tool for the unification of biology. Nat. Genet.25:25. 10.1038/75556

  • 5

    BabgohariM. Z.EbrahimieE.NiaziA. (2014). In silico analysis of high affinity potassium transporter (HKT) isoforms in different plants. Aquat. Biosyst.10:9. 10.1186/2046-9063-10-9

  • 6

    BakhtiarizadehM. R.Moradi-ShahrbabakM.EbrahimiM.EbrahimieE. (2014a). Neural network and SVM classifiers accurately predict lipid binding proteins, irrespective of sequence homology. J. Theor. Biol.356, 213222. 10.1016/j.jtbi.2014.04.040

  • 7

    BakhtiarizadehM. R.Moradi-ShahrbabakM.EbrahimieE. (2013). Underlying functional genomics of fat deposition in adipose tissue. Gene521, 122128. 10.1016/j.gene.2013.03.045

  • 8

    BakhtiarizadehM. R.Moradi-ShahrbabakM.EbrahimieE. (2014b). Transcriptional regulatory network analysis of the over-expressed genes in adipose tissue. Genes Genomics36, 105117. 10.1007/s13258-013-0145-x

  • 9

    BaseriS.TowhidiM.EbrahimieE. (2011). A modified efficient empirical bayes regression model for predicting phenomena with a large number of independent variables and fewer observations; examples of its application in human disease, protein bioinformatics, and microarray gene expression profiling. Adv. Stud. Biol.3, 1812014.

  • 10

    BisogninA.CoppeA.FerrariF.RissoD.RomualdiC.BicciatoS.et al. (2009). A-MADMAN: annotation-based microarray data meta-analysis tool. BMC Bioinformatics10:201. 10.1186/1471-2105-10-201

  • 11

    BolstadB. M.IrizarryR. A.ÅstrandM.SpeedT.P. (2003). A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics19, 185193. 10.1093/bioinformatics/19.2.185

  • 12

    BonneauL.HuguetS.WipfD.PaulyN.TruongH. N. (2013). Combined phosphate and nitrogen limitation generates a nutrient stress transcriptome favorable for arbuscular mycorrhizal symbiosis in Medicago truncatula. New Phytol.199, 188202. 10.1111/nph.12234

  • 13

    BorensteinM.HedgesL. V.HigginsJ.RothsteinH. R. (2010). A basic introduction to fixed-effect and random-effects models for meta-analysis. Res. Synth. Methods1, 97111. 10.1002/jrsm.12

  • 14

    BorensteinM.HedgesL. V.HigginsJ. P.RothsteinH. R. (2009). Introduction to Meta-Analysis. Chichester: John Wiley & Sons, Ltd.

  • 15

    CampainA.YangY. H. (2010). Comparison study of microarray meta-analysis methods. BMC Bioinformatics11:408. 10.1186/1471-2105-11-408

  • 16

    CarthariusK. (2005). MatInspector: analysing promoters for transcription factor binding sites in Analytical Tools for DNA, Genes and Genomes: Nuts & Bolts, ed A. Markoff, The nuts & bolts series, Radford, VA: DNA Press.

  • 17

    CarthariusK.FrechK.GroteK.KlockeB.HaltmeierM.KlingenhoffA.et al. (2005). MatInspector and beyond: promoter analysis based on transcription factor binding sites. Bioinformatics21, 29332942. 10.1093/bioinformatics/bti473

  • 18

    ChangL.-C.LinH.-M.SibilleE.TsengG. C. (2013). Meta-analysis methods for combining multiple expression profiles: comparisons, statistical characterization and an application guideline. BMC Bioinformatics14:368. 10.1186/1471-2105-14-368

  • 19

    ChengC.ShenK.SongC.LuoJ.TsengG. C. (2009). Ratio adjustment and calibration scheme for gene-wise normalization to enhance microarray inter-study prediction. Bioinformatics25, 16551661. 10.1093/bioinformatics/btp292

  • 20

    ChiassonD. M.LoughlinP. C.MazurkiewiczD.MohammadidehcheshmehM.FedorovaE. E.OkamotoM.et al. (2014). Soybean SAT1 (Symbiotic Ammonium Transporter 1) encodes a bHLH transcription factor involved in nodule growth and NH transport. Proc. Natl. Acad. Sci. U.S.A.111, 48144819. 10.1073/pnas.1312801111

  • 21

    CousinsD. R. (2016). Functional Genetic Studies of Symbiotic Genes in Medicago truncatula Indicate a Role for a CCAAT-Box Transcription Factor in Rhizobial Infection. University of East Anglia.

  • 22

    CouzigouJ.-M.LauresserguesD.AndréO.GutjahrC.GuillotinB.BécardG.et al. (2017). Positive gene regulation by a natural protective miRNA enables arbuscular mycorrhizal symbiosis. Cell Host Microbe21, 106112. 10.1016/j.chom.2016.12.001

  • 23

    DaiM.WangP.JakupovicE.WatsonS. J.MengF. (2007). Web-based GeneChip analysis system for large-scale collaborative projects. Bioinformatics23, 21852187. 10.1093/bioinformatics/btm297

  • 24

    DehcheshmehM. M. (2013). Regulatory Control of the Symbiotic Enhanced Soybean BHLH Transcription Factor, GmSAT1.University of Adelaide, School of Agriculture, Food and Wine.

  • 25

    DeihimiT.NiaziA.EbrahimiM.KajbafK.FanaeeS.BakhtiarizadehM. R.et al. (2012). Finding the undiscovered roles of genes: an approach using mutual ranking of coexpressed genes and promoter architecture-case study: dual roles of thaumatin like proteins in biotic and abiotic stresses. Springerplus1:30. 10.1186/2193-1801-1-30

  • 26

    DiédhiouI.DioufD. (2018). Transcription factors network in root endosymbiosis establishment and development. World J. Microbiol. Biotechnol.34:37. 10.1007/s11274-018-2418-7

  • 27

    EbrahimiM.AghagolzadehP.ShamabadiN.TahmasebiA.AlsharifiM.AdelsonD. L.et al. (2014). Understanding the underlying mechanism of HA-subtyping in the level of physic-chemical characteristics of protein. PLoS ONE9:e96984. 10.1371/journal.pone.0096984

  • 28

    EbrahimiM.EbrahimieE.BullC. M. (2015). Minimizing the cost of translocation failure with decision-tree models that predict species' behavioral response in translocation sites. Conserv. Biol.29, 12081216. 10.1111/cobi.12479. Epub 2015 Mar 3

  • 29

    EbrahimiM.LakizadehA.Agha-GolzadehP.EbrahimieE.EbrahimiM. (2011). Prediction of thermostability from amino acid attributes by combination of clustering with attribute weighting: a new vista in engineering enzymes. PLoS ONE6:e23146. 10.1371/journal.pone.0023146

  • 30

    EbrahimieE.EbrahimiF.EbrahimiM.TomlinsonS.PetrovskiK. R. (2018a). Hierarchical pattern recognition in milking parameters predicts mastitis prevalence. Comput. Electr. Agric.147, 611. 10.1016/j.compag.2018.02.003

  • 31

    EbrahimieE.EbrahimiF.EbrahimiM.TomlinsonS.PetrovskiK. R. (2018b). A large-scale study of indicators of sub-clinical mastitis in dairy cattle by attribute weighting analysis of milk composition features: highlighting the predictive power of lactose and electrical conductivity. J. Dairy Res.85, 193200. 10.1017/S0022029918000249

  • 32

    EbrahimieE.EbrahimiM.SarvestaniN. R.EbrahimiM. (2011). Protein attributes contribute to halo-stability, bioinformatics approach. Saline Syst.7:1. 10.1186/1746-1448-7-1

  • 33

    El YahyaouiF.KüsterH.Ben AmorB.HohnjecN.PühlerA.BeckerA.et al. (2004). Expression profiling in Medicago truncatula identifies more than 750 genes differentially expressed during nodulation, including many potential regulators of the symbiotic program. Plant Physiol.136, 31593176. 10.1104/pp.104.043612

  • 34

    FarhadianM.RafatS. A.HasanpurK.EbrahimiM.EbrahimieE. (2018a). Cross-species meta-analysis of transcriptomic data in combination with supervised machine learning models identifies the common gene signature of lactation process. Front. Genet.9:235. 10.3389/fgene.2018.00235

  • 35

    FarhadianM.RafatS. A.HasanpurK.EbrahimieE. (2018b). Transcriptome signature of the lactation process, identified by meta-analysis of microarray and RNA-Seq data. BioTechnologia99, 153163. 10.5114/bta.2018.75659

  • 36

    FlossD. S.GomezS. K.ParkH.-J.MacleanA. M.MüllerL. M.BhattaraiK. K.et al. (2017). A transcriptional program for arbuscule degeneration during AM symbiosis is regulated by MYB1. Curr. Biol.27, 12061212. 10.1016/j.cub.2017.03.003

  • 37

    FruzangoharM.EbrahimieE.AdelsonD. L. (2017). A novel hypothesis-unbiased method for Gene Ontology enrichment based on transcriptome data. PLoS ONE12:e0170486. 10.1371/journal.pone.0170486

  • 38

    FruzangoharM.EbrahimieE.OgunniyiA. D.MahdiL. K.PatonJ. C.AdelsonD. L. (2013). Comparative GO: a web application for comparative gene ontology and gene ontology-based gene selection in bacteria. PLoS ONE8:e58759. 10.1371/journal.pone.0058759

  • 39

    GarciaK.ChasmanD.RoyS.AneJ.-M. (2017). Physiological responses and gene co-expression network of mycorrhizal roots under K+ deprivation. Plant Physiol. 173, 18111823. 10.1104/pp.16.01959

  • 40

    GenreA.ChabaudM.TimmersT.BonfanteP.BarkerD. G. (2005). Arbuscular mycorrhizal fungi elicit a novel intracellular apparatus in Medicago truncatula root epidermal cells before infection. Plant Cell17, 34893499. 10.1105/tpc.105.035410

  • 41

    GobbatoE.MarshJ. F.VerniéT.WangE.MailletF.KimJ.et al. (2012). A GRAS-type transcription factor with a specific function in mycorrhizal signaling. Curr. Biol.22, 22362241. 10.1016/j.cub.2012.09.044

  • 42

    GuerraR.GoldsteinD. R. (2009). Meta-Analysis and Combining Information in Genetics and Genomics. Boca Raton, FL: CRC Press.

  • 43

    GuillotinB.CouzigouJ.-M.CombierJ.-P. (2016). NIN is involved in the regulation of arbuscular mycorrhizal symbiosis. Front. Plant Sci.7:1704. 10.3389/fpls.2016.01704

  • 44

    GuyonI.ElisseeffA. (2003). An introduction to variable and feature selection. J. Mach. Learn. Res.3, 11571182.

  • 45

    HeckC.KuhnH.HeidtS.WalterS.RiegerN.RequenaN. (2016). Symbiotic fungi control plant root cortex development through the novel GRAS transcription factor MIG1. Curr. Biol.26, 27702778. 10.1016/j.cub.2016.07.059

  • 46

    HogekampC.ArndtD.PereiraP. A.BeckerJ. D.HohnjecN.KüsterH. (2011). Laser microdissection unravels cell-type-specific transcription in arbuscular mycorrhizal roots, including CAAT-box transcription factor gene expression correlating with fungal contact and spread. Plant Physiol.157, 20232043. 10.1104/pp.111.186635

  • 47

    HohnjecN.ViewegM. F.PühlerA.BeckerA.KüsterH. (2005). Overlaps in the transcriptional profiles of Medicago truncatula roots inoculated with two different glomus fungi provide insights into the genetic program activated during arbuscular mycorrhiza. Plant Physiol.137, 12831301. 10.1104/pp.104.05657

  • 48

    HosseinpourB.BakhtiarizadehM. R.KhosraviP.EbrahimieE. (2013). Predicting distinct organization of transcription factor binding sites on the promoter regions: a new genome-based approach to expand human embryonic stem cell regulatory network. Gene531, 212219. 10.1016/j.gene.2013.09.011

  • 49

    IrizarryR. A.BolstadB. M.CollinF.CopeL. M.HobbsB.SpeedT. P. (2003). Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res.31, e15e15. 10.1093/nar/gng015

  • 50

    JamaliA. A.FerdousiR.RazzaghiS.LiJ.SafdariR.EbrahimieE. (2016). DrugMiner: comparative analysis of machine learning algorithms for prediction of potential druggable proteins. Drug Discov. Today21, 718724. 10.1016/j.drudis.2016.01.007

  • 51

    JofukuK. D.Den BoerB. G.Van MontaguM.OkamuroJ. K. (1994). Control of Arabidopsis flower and seed development by the homeotic gene APETALA2. Plant Cell6, 12111225. 10.1105/tpc.6.9.1211

  • 52

    JohnsonW. E.LiC.RabinovicA. (2007). Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics8, 118127. 10.1093/biostatistics/kxj037

  • 53

    KalóP.GleasonC.EdwardsA.MarshJ.MitraR. M.HirschS.et al. (2005). Nodulation signaling in legumes requires NSP2, a member of the GRAS family of transcriptional regulators. Science308, 17861789. 10.1126/science.1110951

  • 54

    KargarfardF.SamiA.EbrahimieE. (2015). Knowledge discovery and sequence-based prediction of pandemic influenza using an integrated classification and association rule mining (CBA) algorithm. J. Biomed. Inform.57, 181188. 10.1016/j.jbi.2015.07.018

  • 55

    KargarfardF.SamiA.Mohammadi-DehcheshmehM.EbrahimieE. (2016). Novel approach for identification of influenza virus host range and zoonotic transmissible sequences by determination of host-related associative positions in viral genome segments. BMC Genomics17:925. 10.1186/s12864-016-3250-9

  • 56

    KinoshitaK.ObayashiT. (2009). Multi-dimensional correlations for gene coexpression and application to the large-scale data of Arabidopsis. Bioinformatics25, 26772684. 10.1093/bioinformatics/btp442

  • 57

    KinsellaR. J.KähäriA.HaiderS.ZamoraJ.ProctorG.SpudichG.et al. (2011). Ensembl BioMarts: a hub for data retrieval across taxonomic space. Database 2011. 10.1093/database/bar030

  • 58

    KiraK.RendellL. A. (1992). The feature selection problem: traditional methods and a new algorithm in AAAI'92 Proceedings of the Tenth National Conference on Artificial (San Jose, CA), 129134.

  • 59

    KondorosiE.KondorosiA. (2004). Endoreduplication and activation of the anaphase-promoting complex during symbiotic cell development. FEBS Lett.567, 152157. 10.1016/j.febslet.2004.04.075

  • 60

    KrishnakumarV.KimM.RosenB. D.KaramychevaS.BidwellS. L.TangH.et al. (2014). MTGD: The Medicago truncatula genome database. Plant Cell Physiol.56, e1e1. 10.1093/pcp/pcu179

  • 61

    LeeY.ScheckA. C.CloughesyT. F.LaiA.DongJ.FarooqiH. K.et al. (2008). Gene expression analysis of glioblastomas identifies the major molecular basis for the prognostic benefit of younger age. BMC Med. Genomics1:52. 10.1186/1755-8794-1-52

  • 62

    LeekJ. T.StoreyJ. D. (2007). Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet.3:e161. 10.1371/journal.pgen.0030161

  • 63

    LermanR. I.YitzhakiS. (1984). A note on the calculation and interpretation of the Gini index. Econ. Lett.15, 363368. 10.1016/0165-1765(84)90126-5

  • 64

    LiangJ. (2011). Uncertainty and feature selection in rough set theory in International Conference on Rough Sets and Knowledge Technology (Berlin; Heidelberg: Springer), 815.

  • 65

    LipseyM. W.WilsonD. B. (2001). Practical Meta-Analysis.Thousand Oaks, CA: Sage Publications.

  • 66

    LiuH.MotodaH. (2012). Feature Selection for Knowledge Discovery and Data Mining, Vol 454. Berlin: Springer Science & Business Media.

  • 67

    LiuW.KohlenW.LilloA.Op Den CampR.IvanovS.HartogM.et al. (2011). Strigolactone biosynthesis in Medicago truncatula and rice requires the symbiotic GRAS-type transcription factors NSP1 and NSP2. Plant Cell23, 38533865. 10.1105/tpc.111.089771

  • 68

    MahdiL. K.DeihimiT.ZamansaniF.FruzangoharM.AdelsonD. L.PatonJ. C.et al. (2014). A functional genomics catalogue of activated transcription factors during pathogenesis of pneumococcal disease. BMC Genomics15:769. 10.1186/1471-2164-15-769

  • 69

    MahdiL. K.EbrahimieE.AdelsonD. L.PatonJ. C.OgunniyiA. D. (2013). A transcription factor contributes to pathogenesis and virulence in Streptococcus pneumoniae. PLoS ONE8:e70862. 10.1371/journal.pone.0070862

  • 70

    MarshJ. F.RakocevicA.MitraR. M.BrocardL.SunJ.EschstruthA.et al. (2007). Medicago truncatula NIN Is essential for rhizobial-independent nodule organogenesis induced by autoactive calcium/calmodulin-dependent protein kinase. Plant Physiol.144, 324335. 10.1104/pp.106.093021

  • 71

    MiddletonP. H.JakabJ.PenmetsaR. V.StarkerC. G.DollJ.KalóP.et al. (2007). An ERF transcription factor in Medicago truncatula that is essential for nod factor signal transduction. Plant Cell19, 12211234. 10.1105/tpc.106.048264

  • 72

    OláhB.BrièreC.BécardG.DénariéJ.GoughC. (2005). Nod factors and a diffusible factor from arbuscular mycorrhizal fungi stimulate lateral root formation in Medicago truncatula via the DMI1/DMI2 signalling pathway. Plant J.44, 195207. 10.1111/j.1365-313X.2005.02522.x

  • 73

    PashaiaslM.EbrahimiM.EbrahimieE. (2016a). Identification of the key regulating genes of diminished ovarian reserve (DOR) by network and gene ontology analysis. Mol. Biol. Rep.43, 923937. 10.1007/s11033-016-4025-8

  • 74

    PashaiaslM.KhodadadiK.KayvanjooA. H.Pashaei-AslR.EbrahimieE.EbrahimiM. (2016b). Unravelling evolution of Nanog, the key transcription factor involved in self-renewal of undifferentiated embryonic stem cells, by pattern recognition in nucleotide and tandem repeats characteristics. Gene578, 194204. 10.1016/j.gene.2015.12.023

  • 75

    PomerantzM. M.LiF.TakedaD. Y.LenciR.ChonkarA.ChabotM.et al. (2015). The androgen receptor cistrome is extensively reprogrammed in human prostate tumorigenesis. Nat. Genet.47:1346. 10.1038/ng.3419

  • 76

    QiaoX.ZhangH. H.LiuY.ToddM. J.MarronJ. (2010). Weighted distance weighted discrimination and its asymptotic properties. J. Am. Stat. Assoc.105, 401414. 10.1198/jasa.2010.tm08487

  • 77

    QuandtK.FrechK.KarasH.WingenderE.WernerT. (1995). Matlnd and Matlnspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res.23, 48784884. 10.1093/nar/23.23.4878

  • 78

    RamasamyA.MondryA.HolmesC. C.AltmanD. G. (2008). Key issues in conducting a meta-analysis of gene expression microarray datasets. PLoS Med.5:e184. 10.1371/journal.pmed.0050184

  • 79

    RasmussenS.FüchtbauerW.NoveroM.VolpeV.MalkovN.GenreA.et al. (2016). Intraradical colonization by arbuscular mycorrhizal fungi triggers induction of a lipochitooligosaccharide receptor. Sci. Rep.6:29733. 10.1038/srep29733

  • 80

    RichM. K.CourtyP.-E.RouxC.ReinhardtD. (2017). Role of the GRAS transcription factor ATA/RAM1 in the transcriptional reprogramming of arbuscular mycorrhiza in Petunia hybrida. BMC Genomics18:589. 10.1186/s12864-017-3988-8

  • 81

    RobinsonM. D.OshlackA. (2010). A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol.11:R25. 10.1186/gb-2010-11-3-r25

  • 82

    RosarioS. F.ThangaduraiK. (2015). RELIEF: feature selection approach. Int. J. Innovative Res. Dev. 4, 218224.

  • 83

    SchaarschmidtS.GresshoffP.HauseB. (2013). Analyzing the soybean transcriptome during autoregulation of mycorrhization identifies the transcription factors GmNF-YA1a/b as positive regulators of arbuscular mycorrhization. Genome Biol.14:R62. 10.1186/gb-2013-14-6-r62

  • 84

    ShabalinA. A.TjelmelandH.FanC.PerouC. M.NobelA. B. (2008). Merging two gene-expression studies via cross-platform normalization. Bioinformatics24, 11541160. 10.1093/bioinformatics/btn083

  • 85

    Shamloo-DashtpagerdiR.RaziH.AliakbariM.LindlöfA.EbrahimiM.EbrahimieE. (2015). A novel pairwise comparison method for in silico discovery of statistically significant cis-regulatory elements in eukaryotic promoter regions: application to Arabidopsis. J. Theor. Biol.364, 364376. 10.1016/j.jtbi.2014.09.038

  • 86

    SharifiS.PakdelA.EbrahimiM.ReecyJ. M.Fazeli FarsaniS.EbrahimieE. (2018). Integration of machine learning and meta-analysis identifies the transcriptomic bio-signature of mastitis disease in cattle. PLoS ONE13:e0191227. 10.1371/journal.pone.0191227

  • 87

    ShekoofaA.EmamY.ShekoufaN.EbrahimiM.EbrahimieE. (2014). Determining the most important physiological and agronomic traits contributing to maize grain yield through machine learning algorithms: a new avenue in intelligent agriculture. PLoS ONE9:e97288. 10.1371/journal.pone.0097288

  • 88

    SimsA. H.SmethurstG. J.HeyY.OkoniewskiM. J.PepperS. D.HowellA.et al. (2008). The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets–improving meta-analysis and prediction of prognosis. BMC Med. Genomics1:42. 10.1186/1755-8794-1-42

  • 89

    SmitP.RaedtsJ.PortyankoV.DebelléF.GoughC.BisselingT.et al. (2005). NSP1 of the GRAS Protein Family Is Essential for Rhizobial Nod Factor-Induced Transcription. Science308, 17891791. 10.1126/science.1111025

  • 90

    SoyanoT.KouchiH.HirotaA.HayashiM. (2013). NODULE INCEPTION directly targets NF-Y subunit genes to regulate essential processes of root nodule development in Lotus japonicus. PLoS Genet.9:e1003352. 10.1371/journal.pgen.1003352

  • 91

    TianT.LiuY.YanH.YouQ.YiX.DuZ.et al. (2017). agriGO v2.0: a GO analysis toolkit for the agricultural community, 2017 update. Nucleic Acids Res.45, W122W129. 10.1093/nar/gkx382

  • 92

    TromasA.ParizotB.DiagneN.ChampionA.HocherV.CissokoM.et al. (2012). Heart of endosymbioses: transcriptomics reveals a conserved genetic program among arbuscular mycorrhizal, actinorhizal and legume-rhizobial symbioses. PLoS ONE7:e44742. 10.1371/journal.pone.0044742

  • 93

    TruongH. N.ThalineauE.BonneauL.FournierC.PotinS.BalzergueS.et al. (2015). The Medicago truncatula hypermycorrhizal B9 mutant displays an altered response to phosphate and is more susceptible to Aphanomyces euteiches. Plant Cell Environ.38, 7388. 10.1111/pce.12370

  • 94

    TsengG. C.GhoshD.FeingoldE. (2012). Comprehensive literature review and statistical considerations for microarray meta-analysis. Nucleic Acids Res.40, 37853799. 10.1093/nar/gkr1265

  • 95

    VolpeV.GiovannettiM.SunX. G.FiorilliV.BonfanteP. (2016). The phosphate transporters LjPT4 and MtPT4 mediate early root responses to phosphate status in non mycorrhizal roots. Plant Cell Environ.39, 660671. 10.1111/pce.12659

  • 96

    XiaJ.FjellC. D.MayerM. L.PenaO. M.WishartD. S.HancockR. E. (2013). INMEX—a web-based tool for integrative meta-analysis of expression data. Nucleic Acids Res.41, W63W70. 10.1093/nar/gkt338

  • 97

    XiaJ.GillE. E.HancockR. E. (2015). NetworkAnalyst for statistical, visual and network-based meta-analysis of gene expression data. Nat. Protoc.10, 823844. 10.1038/nprot.2015.052

  • 98

    YoungN. D.DebelléF.OldroydG. E.GeurtsR.CannonS. B.UdvardiM. K.et al. (2011). The Medicago genome provides insight into the evolution of rhizobial symbioses. Nature 480:520. 10.1038/nature10625

  • 99

    ZinatiZ.ZamansaniF.KayvanjooA. H.EbrahimiM.EbrahimiM.EbrahimieE.et al. (2014). New layers in understanding and predicting α-linolenic acid content in plants using amino acid characteristics of omega-3 fatty acid desaturase. Comput. Biol. Med.54, 1423. 10.1016/j.compbiomed.2014.08.019

Summary

Keywords

machine learning, meta-analysis, regulatory mechanism, symbiosis, systems biology

Citation

Mohammadi-Dehcheshmeh M, Niazi A, Ebrahimi M, Tahsili M, Nurollah Z, Ebrahimi Khaksefid R, Ebrahimi M and Ebrahimie E (2018) Unified Transcriptomic Signature of Arbuscular Mycorrhiza Colonization in Roots of Medicago truncatula by Integration of Machine Learning, Promoter Analysis, and Direct Merging Meta-Analysis. Front. Plant Sci. 9:1550. doi: 10.3389/fpls.2018.01550

Received

16 April 2018

Accepted

03 October 2018

Published

12 November 2018

Volume

9 - 2018

Reviewed by

Jedrzej Jakub Szymanski, Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung (IPK), Germany; Alessandra Salvioli, Università degli Studi di Torino, Italy

Updates

Copyright

*Correspondence: Manijeh Mohammadi-Dehcheshmeh

This article was submitted to Plant Systems and Synthetic Biology, a section of the journal Frontiers in Plant Science

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Outline

Figures

Cite article

Copy to clipboard


Export citation file


Share article

Article metrics