Peripheral Blood-Based Gene Expression Studies in Schizophrenia: A Systematic Review

Schizophrenia is a disorder characterized by delusions, hallucinations, disorganized speech or behavior, and socio-occupational impairment. The required duration of observation and the variability of symptoms can make accurate diagnosis difficult. Identification of biomarkers for schizophrenia (SCZ) can aid early diagnosis, confirmation of the diagnosis, and the development of effective treatment strategies. Here we review peripheral blood-based gene expression studies aimed at identifying gene expression biomarkers for SCZ. A literature search was carried out in the PubMed and Web of Science databases for blood-based gene expression studies in SCZ. A list of differentially expressed genes (DEGs) was compiled and analyzed for overlap with genetic markers, differences based on the drug status of the participants, functional enrichment, and the effect of antipsychotics. This literature survey identified 61 gene expression studies, 17 of which were based on expression microarrays. A comparative analysis of the DEGs (n = 227) from the microarray studies revealed differences between drug-naive and drug-treated SCZ participants. Of the 227 DEGs, 11 genes (ACOT7, AGO2, DISC1, LDB1, RUNX3, SIGIRR, SLC18A1, NRG1, CHRNB2, PRKAB2, and ZNF74) also showed genetic and epigenetic changes associated with SCZ. Functional enrichment analysis of the DEGs revealed dysregulation of proline and 4-hydroxyproline metabolism, and arginine and proline metabolism was the most functionally enriched pathway for SCZ in our analysis. Follow-up studies identified an effect of antipsychotic treatment on peripheral blood gene expression. Of the 27 genes compiled from the follow-up studies, AKT1, DISC1, HP, and EIF2D showed no change in expression status upon antipsychotic treatment. Despite differences in study design, ethnicity of the study population, and the gene expression analysis method used, we identified several coherent observations.
An overlap, though limited, of genetic, epigenetic, and gene expression changes supports an interplay of genetic and environmental factors in SCZ. The studies validate the use of blood as a surrogate tissue for biomarker analysis. We conclude that well-designed cohort studies across diverse populations, the use of high-throughput sequencing technology, and artificial intelligence (AI)-based computational analysis will significantly improve our understanding of, and diagnostic capabilities for, this complex disorder.

INTRODUCTION
Schizophrenia is a multifactorial disorder with 1.13 million incident cases globally, and it accounted for 12.66 million disability-adjusted life years (DALYs) in 2017 (He et al., 2020). The death rate in schizophrenia patients is much higher than in healthy individuals (Saha et al., 2007; Lomholt et al., 2019). Schizophrenia is associated with substantial excess mortality worldwide, with cardiovascular disease as the leading cause of death, followed by suicide and respiratory and cancer-related disorders (Bushe et al., 2010). The current treatment for SCZ combines antipsychotics and behavioral therapy. The development of second-generation antipsychotics has led to effective treatment of both positive and negative symptoms of SCZ (Zhang et al., 2013; Mauri et al., 2014; Gründer et al., 2016; Chen and Nasrallah, 2019). The introduction of third-generation antipsychotics has also contributed to lowering extrapyramidal side effects (Stepnicki et al., 2018). Although antipsychotics have evolved into relatively specific drugs for the clinical pathology, they remain a form of symptom management. A cure for SCZ still lies in the future; however, with timely intervention using the available treatments, affected individuals can live a relatively symptom-free life.
Schizophrenia has been known as a disorder for over 100 years, and its complexity has been highlighted since its identification. Eugen Bleuler proposed the term "Schizophrenia" in 1908, describing it as a group of disorders existing together and leading to a disoriented state of mind (Heckers, 2011; Moskowitz and Heim, 2011). These co-existing features are now categorized into positive and negative symptoms. Hallucinations, delusions, and derailment of speech are positive symptoms, while diminished emotional expression and related symptoms are categorized as negative. The presence of positive and negative symptoms forms the basis of the current diagnostic criteria laid down by the International Classification of Diseases (ICD) (World Health Organization, 2004) and the Diagnostic and Statistical Manual of Mental Disorders (DSM) (American Psychiatric Association, 2013). Because it is specific to psychiatric disorders, the DSM is widely accepted for the diagnosis of SCZ. The fifth edition of the DSM (DSM-5) has eliminated the subtypes of SCZ owing to their limited diagnostic value and identifies SCZ as a single disorder (American Psychiatric Association, 2013).
Even with the criteria laid down by the DSM and ICD, misdiagnosis in psychiatry is quite common (Mukherjee et al., 1983; Shen et al., 2018; Tzur Bitan et al., 2018; Coulter et al., 2019). Moreover, under these guidelines a person is required to show a set of behavioral symptoms for a specific period, which generally exceeds 1 month under both the DSM and the ICD. Further, the symptoms need to be pronounced enough to be considered for diagnosis. The time required for diagnosis may delay necessary early treatment. Early and reliable diagnosis may help to reduce vulnerability, the rate of conversion to psychosis, time to remission, incidence of the disorder, the economic burden on society, and premature deaths (Bahn et al., 2011). Identification of biomarkers can support the current diagnostic process and enable accurate and timely diagnosis of SCZ. However, extreme care should be taken in the development and use of possible biomarkers for schizophrenia, as they may lead to stigmatization and other ethical dilemmas.
Brain imaging methods have shown anatomical changes in the whole brain as well as in specific regions in SCZ patients. Smith and co-workers carried out the first Magnetic Resonance Imaging (MRI) study in SCZ in 1984 (Smith et al., 1984). Since then, a plethora of imaging studies have identified structural changes in the brain associated with SCZ (Andreasen et al., 1990; Staal et al., 1998, 2000; Gur et al., 1999; Levitt et al., 1999; Downhill et al., 2000; Sanfilipo et al., 2000). The findings of these imaging studies are limited to structural and volumetric changes in the brain (Shenton et al., 2001). The development of functional MRI (fMRI) has raised new hope for the discovery of imaging biomarkers for psychiatric disorders (Du et al., 2012; Yoon et al., 2012; Su et al., 2013; Koch et al., 2015). Identification of functional changes that correlate with the loss of cognitive functions may help to differentiate the SCZ phenotype from others. However, for a heterogeneous disorder such as SCZ, where affected individuals experience different sets of symptoms with different intensities, the structural and volumetric changes in the brain may not be disorder-specific (Linden, 2012). Lack of reproducibility, small sample sizes, low feasibility of clinical application, and high cost limit the use of imaging analysis as a corroborative diagnostic tool for behavioral disorders such as SCZ (Linden, 2012).
A combination of genetic and environmental factors is suspected to be responsible for the development of SCZ (Tsuang, 2000; Howes et al., 2017). Genome-Wide Association Studies (GWAS) have identified single nucleotide polymorphisms (SNPs) (Ripke et al., 2014) and copy number variations (CNVs) (Marshall et al., 2017) in a large number of quantitative trait loci (QTLs). However, these genetic variations together do not fully account for the estimated 70-80% heritability of SCZ (Hilker et al., 2018). The risk of developing SCZ in the offspring of affected and unaffected identical twins suggests a role for gene-environment interaction in the development of SCZ (Gottesman and Bertelsen, 1989; Kringlen and Cramer, 1989). Environmental factors often affect genes through covalent modification of the DNA, such as methylation, thereby altering gene expression.
Recently, various studies have examined epigenome-wide changes such as DNA methylation for their association with the disorder (Wockner et al., 2014; Jaffe et al., 2015; Viana et al., 2017). However, the limited accessibility of brain tissue has restricted the number of epigenetic studies of psychiatric disorders. Non-target tissues such as peripheral blood have therefore also been used for epigenome profiling, and these studies have identified differentially methylated loci in SCZ. Site-specific DNA methylation changes in peripheral blood have also been explored as potential biomarkers (Ikegame et al., 2013; Cheng et al., 2014; Nabil Fikri et al., 2017; Nour El Huda et al., 2018).
In other complex diseases such as diabetes (Nathan et al., 2009) and cancer (Deras et al., 2008; Esserman et al., 2017), blood-based biomarkers are already in clinical use. However, the use of peripheral blood for biomarker discovery in behavioral disorders has long been debated. Recent literature nevertheless shows that the SCZ phenotype is associated with molecular changes in non-target tissues such as blood (Levin et al., 2010; Montano et al., 2016; Gilabert-Juan et al., 2019). The presence of inflammatory markers in the brain suggests the existence of a blood-brain relationship (Black and Miller, 2015; Trovão et al., 2019). Also, the immune hypothesis (Kinney et al., 2010; Muller and Schwarz, 2010) suggests that the onset of SCZ may begin with immune cells from the peripheral tissue crossing the blood-brain barrier (Capuron and Miller, 2011; Khandaker and Dantzer, 2016; Van Kesteren et al., 2017). A transcriptome-wide mega-analysis has revealed a correlation between blood and brain gene expression changes in SCZ (Hess et al., 2016). Discovery of SCZ-specific molecular markers in peripheral blood does not necessarily indicate a cause-and-effect relationship. However, the specificity and sensitivity of such markers can be exploited for their potential use as biomarkers.
With the recognized scope for identifying molecular biomarkers, RNA and protein expression studies in the peripheral blood of schizophrenia patients are on the rise. The surge in the search for protein-based biomarkers stems from the important role of proteins as functional molecules in cellular processes. The deregulated proteome in SCZ may be a result of the underlying pathophysiology of the disorder (Martins-de-Souza et al., 2009; Reis-de-Oliveira et al., 2020). A few proteomic studies have revealed dysregulated immune system pathways in the peripheral blood of SCZ patients (Ezeoke et al., 2013; Goldsmith et al., 2016). Further, proteomic studies with varied approaches have identified differentially expressed proteins with significant diagnostic potential for SCZ (Nascimento and Martins-De-Souza, 2015; Comes et al., 2018). Even with mass spectrometry, the identification and quantification of low-abundance proteins remains a limitation (Pradet-Balade et al., 2001; Chandramouli and Qian, 2009). One might argue that established tools such as ELISA can be used for protein biomarker discovery in SCZ. However, ELISA-based techniques depend on the abundance of the target protein in the sample and on the availability of specific antibodies for detection (Del Campo et al., 2015). In contrast, nucleic acids such as messenger ribonucleic acid (mRNA) are much more sensitive to genomic and epigenomic changes. For a complex disorder like SCZ, where genetic and epigenetic mechanisms are suspected to play an important role in the development and progression of the disorder, gene expression profiling may be a feasible option for biomarker discovery.
Recent transcriptomic studies using peripheral blood have shown a significant correlation between gene expression profiles and the clinical features of SCZ (Bousman et al., 2010b; Wu et al., 2016; Zheutlin et al., 2016). Focusing on these gene expression changes may shed light on the molecular pathways involved in the development of the disorder (Middleton et al., 2005; Wu et al., 2016). Researchers have also identified similarities and differences in peripheral blood gene expression profiles among schizophrenia, bipolar disorder (BPD), and major depressive disorder (MDD) (Cattane et al., 2015; Miyamoto et al., 2020). Altogether, the findings suggest that the sensitive and specific nature of gene expression changes in peripheral tissues such as blood can be exploited for the development of diagnostic, prognostic, and predictive biomarkers.
Considering the prospects of translating differentially expressed genes into the clinic, here we review gene expression studies in SCZ that used peripheral blood. We summarize the methodologies and findings of these studies, discuss the limitations of current approaches, and propose possible solutions to overcome them.

METHODOLOGY
Design
We performed a systematic review of peripheral blood gene expression biomarkers for SCZ. The guidelines set by Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) were followed to conduct this review (Moher et al., 2009). Covidence software was used for screening of the research articles (Covidence systematic review software, Veritas Health Innovation).

Search Strategy
Research studies were identified from the PubMed and Web of Science electronic databases using the keywords "Gene expression," "Peripheral blood," "Biomarkers," and "Schizophrenia" or "Schizophrenia spectrum." We included research articles published until February 2021 for further screening (Figure 1).

Eligibility
We included peer-reviewed original research articles aimed at identifying differentially expressed genes in the peripheral blood of SCZ participants. Studies using diagnostic schemes other than the DSM and ICD were also included, as were cross-sectional and follow-up studies with drug-naive and drug-treated participants. We also came across studies aimed at identifying non-coding RNAs (miRNA, lncRNA, circRNA) in peripheral blood in association with SCZ. However, studies on lncRNA (Chen et al., 2016; Melbourne et al., 2018; Fallah et al., 2019; Safari et al., 2019) and circRNA (Yao et al., 2019) were excluded because they were too few for a systematic review. Further, miRNA dysregulation in SCZ and its application as a biomarker have already been reviewed, so these studies were excluded as well (Beveridge and Cairns, 2012; Wang et al., 2014; Liu et al., 2017).
Research studies on psychiatric disorders other than SCZ, gene expression studies using transformed peripheral blood mononuclear cells (PBMCs), protein expression studies, and in silico analyses without supporting wet-lab validation of the markers were excluded from the review. Research articles published in languages other than English were not considered.

Data Extraction
In addition to the authors and year of publication, we extracted participant information (demographic details, number of participants, and medication status), detection methods, potential biomarkers, significant findings, and study type for all studies (Supplementary Table 1). We then narrowed our focus to microarray-based transcriptomic studies because of their exploratory approach. Studies based on custom microarrays and those with no information on the medication status of the participants were excluded from further analysis. Significantly differentially expressed genes (p < 0.05) from the microarray studies were compiled. Gene symbols were verified against the symbols approved by the HUGO Gene Nomenclature Committee (HGNC) using the Multi-Symbol Checker tool (Braschi et al., 2019). The differentially expressed genes (DEGs) between cases and controls were categorized by the drug status ("drug-treated" or "drug-naïve") of the participants.
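The compilation step described above amounts to filtering, symbol harmonization, and grouping. A minimal Python sketch of that logic follows, assuming one record per reported gene. The ReportedGene type, the small alias map, and the input records are hypothetical stand-ins: symbol checking in this review was done with the HGNC Multi-Symbol Checker web tool, and the Supplementary Table 1 data are not reproduced here.

```python
from dataclasses import dataclass

# Illustrative alias map (assumption): a tiny stand-in for what the HGNC
# Multi-Symbol Checker reports; the review used the web tool, not this code.
ALIAS_TO_APPROVED = {
    "EIF2C2": "AGO2",  # previous symbol of AGO2
    "DARC": "ACKR1",   # previous symbol of ACKR1
    "LGTN": "EIF2D",   # previous symbol of EIF2D
}

P_CUTOFF = 0.05  # significance threshold used when compiling DEGs


@dataclass(frozen=True)
class ReportedGene:
    symbol: str       # gene symbol as printed in the source study
    p_value: float    # reported case-versus-control p-value
    drug_status: str  # study-level label: "drug-treated" or "drug-naive"


def approved_symbol(symbol: str) -> str:
    """Return the approved symbol for a known alias, else the symbol unchanged.

    Case is preserved because approved symbols such as C11orf49 are mixed-case.
    """
    cleaned = symbol.strip()
    return ALIAS_TO_APPROVED.get(cleaned, cleaned)


def compile_degs(records):
    """Keep genes with p < P_CUTOFF, harmonize symbols, and group by drug status."""
    groups = {"drug-treated": set(), "drug-naive": set()}
    for rec in records:
        if rec.p_value < P_CUTOFF:
            groups[rec.drug_status].add(approved_symbol(rec.symbol))
    return groups


# Hypothetical input records, not data from the reviewed studies.
records = [
    ReportedGene("EIF2C2", 0.01, "drug-treated"),
    ReportedGene("AKT1", 0.03, "drug-naive"),
    ReportedGene("LGTN", 0.20, "drug-naive"),  # p >= 0.05, so it is dropped
]
degs = compile_degs(records)
print({status: sorted(genes) for status, genes in degs.items()})
# {'drug-treated': ['AGO2'], 'drug-naive': ['AKT1']}
```

Using sets means a gene reported by several studies in the same group is counted once, which is what a compiled DEG list needs.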

Data Analysis
We performed functional enrichment analysis of the DEGs using the online tools g:Profiler (Raudvere et al., 2019) and the Database for Annotation, Visualization, and Integrated Discovery (DAVID) (Dennis et al., 2003). To assess similarities between the DEGs compiled in this literature survey and the findings of genetic and epigenetic studies, we queried SZDB2.0 (Wu et al., 2020). A comparative analysis between the drug-naïve and drug-treated groups was performed by plotting a Venn diagram of the DEGs. Genes in each category were also subjected to gene ontology and pathway enrichment analysis as described above. We also identified studies that evaluated biomarkers for their diagnostic potential, and compiled the potential biomarkers, the evaluation methods, and the results in terms of accuracy, sensitivity, specificity, and related measures. Further, from the 61 gene expression studies identified earlier, we selected those aimed at evaluating the effect of antipsychotics on gene expression. The DEGs and the reported changes in their expression levels before and after antipsychotic treatment were tabulated (Table 5).

Screening of Peripheral Blood-Based Gene Expression Studies
We identified 180 studies from PubMed and 118 studies from Web of Science. After removal of duplicates and abstract screening, 103 articles were retained. Full-text screening against the inclusion criteria resulted in 61 research articles relevant to the topic of interest (Figure 1). Most of these 61 studies used the DSM for diagnosis of SCZ. Thirty-seven of the 61 studies provided explicit information about ethnicity. Also, 47 of the shortlisted studies were cross-sectional, whereas the others were follow-up studies (Supplementary Table 1). The number of peripheral blood-based biomarker studies for schizophrenia has increased steadily over the past two decades (Figure 2A). Most of the studies took a targeted, PCR-based approach. Among transcriptomic studies, microarray-based articles considerably outnumbered RNA-Sequencing-based ones. Most of the PCR-based studies validated previously reported DEGs in their respective cohorts (Figure 2B). The effect of antipsychotics on peripheral gene expression appears to be well recognized, as a quarter of the screened studies included drug-naive participants to identify potential diagnostic and clinical state biomarkers (Figure 2C).

Transcriptomic Studies for Schizophrenia
We came across 17 microarray studies published during 2005-2019. For an unbiased comparison of the findings, studies using custom microarrays were excluded (Zvara et al., 2005; Bowden et al., 2006). The resulting 15 transcriptomic studies (Table 1) differed from each other mainly in the medication status of the SCZ participants. One study with no information on the medication status of the participants was excluded from the comparative analysis (Middleton et al., 2005). The remaining 14 studies, for which medication status was available, were used for the comparative analysis described in this review.

Genetic and Epigenetic Changes, and DEGs
A total of 227 DEGs relative to healthy controls were reported in the 14 shortlisted microarray studies. Genetic studies have identified several SNPs and CNVs associated with the SCZ phenotype. These polymorphisms and variations may result in differential expression of genes across tissues. To assess similarities between the DEGs identified in this literature survey and previously reported GWAS findings, we queried SZDB2.0 (Wu et al., 2020). Interestingly, CTNNA1, SATB2, SPATA31D1, and SLC45A1 from this literature survey were also reported by the Psychiatric Genomics Consortium (PGC2) (Ripke et al., 2014) and the Clozapine Clinic UK (CLOZUK) GWAS (Pardiñas et al., 2018). In a genome-wide analysis of CNVs, Marshall et al. identified 16 independent CNV loci associated with SCZ (Marshall et al., 2017). Among the genes affected by these CNVs, PRKAB2 and ZNF74 were common to the 227 DEGs, and Marshall et al. also reported an elevated risk of SCZ with copy number gains in these genes (Marshall et al., 2017). Beyond GWAS and CNV studies, genetic linkage and association studies have linked the following DEGs to the SCZ phenotype: SLC18A1, DISC1, NEUROG1, PRODH, CCKAR, DAO, NRG1, DAOA, AKT1, NTF3, CHRNB2, CHI3L1, CHGA, ACKR1, and DGCR6 (Lewis et al., 2003; Allen et al., 2008; Sun et al., 2008; Ng et al., 2009). Further, 54 of the 227 DEGs overlapped with genes identified by exome analysis; two of these 54 genes, CTNNA1 and SPATA31D1, were also identified by GWAS.
In addition to genetic changes, we explored DNA methylation changes reported in the peripheral blood of SCZ participants by querying the peripheral blood methylation studies in SZDB2.0 with the 227 DEGs. Of these, 33 DEGs have been reported to be differentially methylated in the peripheral blood of individuals with SCZ (Kinoshita et al., 2014; Hannon et al., 2016; Montano et al., 2016). Specifically, Montano et al. reported differential methylation of CCNE1, MCM3, AGO2, SCAP, and TGFA in SCZ participants, all of which are among the 227 DEGs compiled in this review (Montano et al., 2016). Similarly, Hannon et al. reported ZNF74, SLC45A1, ACOT7, GOT2, PRICKLE2, C11orf49, CARD11, NRG1, CD81, DISC1, and AGO2 to be differentially methylated in SCZ (Hannon et al., 2016). Also, 22 differentially methylated genes reported by Kinoshita et al. (viz. RUNX3, ARF1, CD81, ZBTB4, SLC18A1, SALL1, RNPS1, KIF23, SIGIRR, CD6, GOT2, UBAP2L, PRKAB2, RAB11FIP3, SRPK1, NAF1, BRCA1, MMP9, SLC18A1, CHRNB2, ACOT7, LDB1) were common to the DEGs compiled in this review (Kinoshita et al., 2014). AKT1 and BRCA1, two of the compiled DEGs, have also been reported to be differentially methylated in the prefrontal cortex (Wockner et al., 2014). Despite the limited number of epigenomic studies, the overlap between differential methylation and differential expression is noteworthy. Interestingly, 11 of the 33 differentially methylated genes mentioned above (ACOT7, AGO2, DISC1, LDB1, RUNX3, SIGIRR, SLC18A1, NRG1, CHRNB2, PRKAB2, and ZNF74) were also associated with genetic alterations in SCZ. This observation suggests that both genetic and epigenetic changes may result in differential expression of genes in SCZ.
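The overlap counts above come down to set operations on gene symbols. The sketch below uses only the gene lists quoted in this section (the SZDB2.0 query itself and the full list of 227 DEGs are not reproduced). It checks that the three methylation studies together contribute 33 distinct genes, and that the 11 genes showing both genetic and epigenetic changes are among them.

```python
# Gene lists exactly as quoted in this section.
montano_2016 = {"CCNE1", "MCM3", "AGO2", "SCAP", "TGFA"}
hannon_2016 = {"ZNF74", "SLC45A1", "ACOT7", "GOT2", "PRICKLE2", "C11orf49",
               "CARD11", "NRG1", "CD81", "DISC1", "AGO2"}
# SLC18A1 appears twice in the quoted Kinoshita list; a set keeps one copy.
kinoshita_2014 = {"RUNX3", "ARF1", "CD81", "ZBTB4", "SLC18A1", "SALL1", "RNPS1",
                  "KIF23", "SIGIRR", "CD6", "GOT2", "UBAP2L", "PRKAB2",
                  "RAB11FIP3", "SRPK1", "NAF1", "BRCA1", "MMP9", "SLC18A1",
                  "CHRNB2", "ACOT7", "LDB1"}

# Union across studies: genes reported by more than one study count once
# (AGO2, CD81, GOT2, and ACOT7 are shared between studies).
methylated_degs = montano_2016 | hannon_2016 | kinoshita_2014

# The 11 genes named above as showing both genetic and epigenetic changes.
genetic_and_epigenetic = {"ACOT7", "AGO2", "DISC1", "LDB1", "RUNX3", "SIGIRR",
                          "SLC18A1", "NRG1", "CHRNB2", "PRKAB2", "ZNF74"}

print(len(methylated_degs))                       # 33
print(genetic_and_epigenetic <= methylated_degs)  # True (subset check)
```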

Comparative Gene Expression Analysis
Of the 227 DEGs identified in the 14 shortlisted microarray studies, 110 came from studies with drug-treated participants and 117 from studies with drug-naive participants (Supplementary Table 2). Three independent follow-up studies reported a profound effect of antipsychotic treatment on gene expression levels (Kumarasinghe et al., 2013; Xu et al., 2016; Gilabert-Juan et al., 2019). For an overview of the reported DEGs, we compared the studies on drug-treated participants with those on drug-naive participants. UBD, BRCA1, AKT1, CCDC134, ZIC2, EIF2D, and TOX were identified as DEGs common to both groups (Figure 3 and Table 2). This analysis highlighted SCZ-associated DEGs that can be tested for their application as biomarkers.
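A two-set Venn diagram partitions the genes into three regions: genes reported only in drug-treated studies, genes reported in both, and genes reported only in drug-naive studies. The sketch below illustrates this with small subsets. These are an assumption made for illustration: the seven shared DEGs named above plus three group-specific genes per group that are mentioned in the Discussion. The complete lists of 110 and 117 genes are in Supplementary Table 2.

```python
# Illustrative subsets only (assumption): the seven shared DEGs plus three
# group-specific genes per group; the full lists are in Supplementary Table 2.
drug_treated = {"UBD", "BRCA1", "AKT1", "CCDC134", "ZIC2", "EIF2D", "TOX",
                "SRPK1", "DEFA4", "ACO1"}
drug_naive = {"UBD", "BRCA1", "AKT1", "CCDC134", "ZIC2", "EIF2D", "TOX",
              "UBAP2L", "TKTL1", "KIF23"}


def venn_regions(first: set, second: set) -> dict:
    """Split two gene sets into the three regions of a two-set Venn diagram."""
    return {
        "only_first": first - second,   # e.g. reported only with drug-treated participants
        "shared": first & second,       # reported irrespective of drug status
        "only_second": second - first,  # e.g. reported only with drug-naive participants
    }


regions = venn_regions(drug_treated, drug_naive)
print(sorted(regions["shared"]))
# ['AKT1', 'BRCA1', 'CCDC134', 'EIF2D', 'TOX', 'UBD', 'ZIC2']
```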

Gene Enrichment Analysis
We performed functional enrichment analysis using g:Profiler for the DEGs identified in the shortlisted transcriptomic studies. The complete list of 227 DEGs relative to controls showed significant enrichment of biological processes such as 4-hydroxyproline and proline metabolism. The same set of genes was used for pathway enrichment analysis with the DAVID functional annotation tool. Arginine and proline metabolism and ErbB signaling were the top two enriched pathways (p < 0.05; Table 3). Significant enrichment of 2-oxocarboxylic acid metabolism was also observed. A similar functional enrichment analysis was carried out separately for the 110 DEGs from studies on drug-treated participants and the 117 DEGs from studies on drug-naive participants. Arginine and proline metabolism pathways were significantly enriched in drug-naive participants, whereas the ErbB signaling pathway was enriched in drug-treated participants.
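Both tools report over-representation statistics. g:Profiler documents a cumulative hypergeometric test, and DAVID uses a modified Fisher's exact test (the EASE score). The sketch below illustrates the underlying idea by computing a one-sided hypergeometric p-value for a single term. It omits the multiple-testing correction that both tools apply, and the background size, term size, and overlap in the example are hypothetical, not values behind Table 3.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N background genes, K annotated, n drawn).

    N: number of genes in the background (universe)
    K: number of background genes annotated to the pathway or GO term
    n: number of query genes (e.g., the DEG list)
    k: number of query genes annotated to the term
    """
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


# Hypothetical numbers for illustration only (not taken from Table 3):
# 20,000 background genes, 50 in a pathway, 227 query genes, 4 overlapping.
p = hypergeom_upper_tail(k=4, N=20_000, K=50, n=227)
expected_overlap = 227 * 50 / 20_000  # about 0.57 genes expected by chance
print(f"expected overlap = {expected_overlap:.2f}, P(X >= 4) = {p:.2e}")
```

Exact integer arithmetic through math.comb avoids overflow for large backgrounds. The price is speed, which is why dedicated tools use optimized implementations.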

Potential Biomarkers
Candidate biomarkers must be tested for their diagnostic potential before their clinical application can be suggested. Nine of the 61 studies identified earlier systematically evaluated the performance of the identified biomarkers (Table 4). Most of these studies used computational methods to identify biomarkers. qPCR-based methods that test only one or two genes tend to show poor sensitivity or specificity. Irrespective of drug status and other differences between cohorts, most of the models appear to diagnose SCZ with high efficiency.
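Table 4 reports performance as accuracy, sensitivity, specificity, and related measures. A minimal sketch of how these figures are computed from a classifier's calls on cases and controls follows, together with the area under the ROC curve (AUC) in its rank-based form. The counts and scores are hypothetical and do not reproduce any study in Table 4.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity, and specificity from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # fraction of SCZ cases called positive
        "specificity": tn / (tn + fp),  # fraction of controls called negative
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


def auc(case_scores, control_scores) -> float:
    """ROC AUC as the probability that a random case outscores a random control.

    Ties count as one half (the Mann-Whitney formulation).
    """
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            wins += 1.0 if c > h else 0.5 if c == h else 0.0
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical cohort: 50 cases and 50 controls classified by some model.
print(diagnostic_metrics(tp=45, fn=5, tn=40, fp=10))
# {'sensitivity': 0.9, 'specificity': 0.8, 'accuracy': 0.85}
print(auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]))  # 8 of 9 pairs ranked correctly
```

Accuracy depends on the case-to-control ratio of the cohort, whereas sensitivity, specificity, and AUC do not. This is one reason studies typically report several measures side by side.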

Effect of Antipsychotic Treatment
Antipsychotics are known to induce significant gene expression changes in peripheral blood. Multiple studies have evaluated this effect by following up drug-naive or drug-free individuals with SCZ after treatment. These studies suggest that the effect of antipsychotics can be observed within 1 week of treatment (Zhang et al., 2008) and can persist for the duration of medication (Okazaki et al., 2016). From the 61 studies of SCZ identified earlier, we identified 13 (4 microarray-based and 9 PCR-based) that evaluated the effect of antipsychotics on gene expression. The DEGs and the reported changes in their expression levels before and after antipsychotic treatment were tabulated (Table 5). We found 27 genes reported to be differentially expressed upon antipsychotic treatment. Sixteen of these 27 genes (RPS25, DISC1, SLC2A3, UBD, NRG1, BRCA1, AKT1, MAL, RXRA, CCDC134, ZIC2, NLN, DAAM2, DGCR6, EIF2D, and MMP9) were among the 227 DEGs identified earlier. The treatment period in these studies varied from 1 week to 2 years. Importantly, most of the studies used drug-naive participants as baseline samples; however, the types and dosages of antipsychotics varied across studies and participants. A few studies evaluated the effect of specific antipsychotics such as risperidone, haloperidol, and olanzapine on the expression levels of target genes (Shariati et al., 2009; Ota et al., 2015). Some follow-up studies reported no change in the expression of AKT1 (Kumarasinghe et al., 2013; Xu et al., 2016), DISC1 (Kumarasinghe et al., 2013), HP (Trossbach et al., 2019), and EIF2D (Gilabert-Juan et al., 2019) upon antipsychotic treatment, which may indicate a stronger association of these genes with the disorder itself. These genes can be further explored for their potential application as biomarkers and as drug targets for the development of effective treatments.
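The follow-up designs summarized in Table 5 compare each participant's expression before treatment (drug-naive or drug-free baseline) and after treatment. One common way to test such within-participant change is a paired t-test. The sketch below computes the paired t statistic and compares it with the two-sided critical value for five pairs. This is not necessarily the method used by the studies in Table 5. The gene names and log2 expression values are hypothetical, and a real analysis would also correct for testing many genes.

```python
import math
from statistics import mean, stdev


def paired_t(baseline, follow_up) -> float:
    """Paired t statistic for within-participant change (follow-up minus baseline)."""
    diffs = [after - before for before, after in zip(baseline, follow_up)]
    m = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation, n - 1 denominator
    if sd == 0:
        # Identical differences: no spread, so t is 0 or infinite with m's sign.
        return 0.0 if m == 0 else math.copysign(math.inf, m)
    return m / (sd / math.sqrt(len(diffs)))


T_CRIT_DF4 = 2.776  # two-sided alpha = 0.05 critical value for df = 4 (5 pairs)

# Hypothetical log2 expression for five participants: (baseline, after treatment).
genes = {
    "GENE_A": ([5.0, 5.2, 4.9, 5.1, 5.0], [6.0, 6.1, 5.8, 6.2, 5.9]),
    "GENE_B": ([5.0, 5.2, 4.9, 5.1, 5.0], [5.1, 5.1, 5.0, 5.1, 5.1]),
}
for name, (before, after) in genes.items():
    t = paired_t(before, after)
    status = "changed" if abs(t) > T_CRIT_DF4 else "no significant change"
    print(name, f"t = {t:.2f}", status)
```

The paired design removes between-participant differences in baseline expression, which is why follow-up studies can detect treatment effects with relatively few participants.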

DISCUSSION
A significant number of gene expression studies have reported DEGs associated with SCZ. However, we did not come across any systematic review of gene expression studies aimed at biomarker discovery using peripheral blood. Here we summarize 61 gene expression studies on peripheral blood, with special interest in microarray-based expression profiling. The studies were based on diverse ethnic groups, including Caucasian, Han Chinese, and Japanese populations. Dedicated studies on Southeast Asian, African, and other populations were fewer (Glatt et al., 2009; Bousman et al., 2010b; Kumarasinghe et al., 2013; Yee et al., 2017). This bias needs to be addressed, since studies from more diverse populations will help establish robust biomarkers for SCZ. As mentioned earlier, the majority of the identified studies were cross-sectional. Cross-sectional studies can provide important insight into the stage of the disease at a given point in time. However, for a multiform disorder like SCZ, which is known to progress in more than one direction, prospective cohort studies are important for biomarker discovery. Prospective studies with clinically high-risk participants can also help identify the biological processes underlying SCZ endophenotypes.
A well-designed cohort study can provide significant leads for the discovery of biomarkers for SCZ. An adequate sample size, uniform clinical diagnosis, and extensive clinically relevant information (e.g., smoking, drug abuse, body mass index, years lived with the disorder, age of onset, family history, prescribed medication, and nutritional status) are of paramount importance when establishing a cohort for biomarker discovery. Very few studies on our list provided such detailed information (Bousman et al., 2010a; Glatt et al., 2011; Gardiner et al., 2013). Further, to address the diversity of clinical symptoms in SCZ, symptom-based correlation of biomarkers is necessary. Unlike many other complex disorders, SCZ may present with different sets of symptoms of different intensities at onset. Hence, clinical interviews and symptom rating scales (e.g., PANSS and BPRS) are especially important in cohort establishment. Although the majority of the studies we shortlisted used the DSM for diagnosis, only a few used structured clinical interviews. Routine use of clinical interviews and rating scales can help identify more accurate biomarkers for SCZ and its severity.
Over the past couple of decades, the use of microarray technology has gained pace and has provided compelling evidence for its application in biomarker discovery. However, cutting-edge technologies like next-generation sequencing (NGS) remain underused in transcriptomic studies of SCZ compared with their use in genetic studies. Unlike microarrays, NGS is less affected by technical problems such as background noise and data normalization issues. We came across only two studies that used RNA sequencing (RNA-Seq) to identify gene expression changes in peripheral blood (Gilabert-Juan et al., 2019; Zhang et al., 2020). Gilabert-Juan et al. reported differential expression of EIF2D at the onset of the disorder and at the 3-month and 1-year post-treatment follow-ups compared with controls (Gilabert-Juan et al., 2019). Zhang et al. identified co-expressed genes associated with abnormal psychomotor behavior in SCZ (Zhang et al., 2020). Of the 506 DEGs reported by Zhang et al., SRPK1, DEFA4, and ACO1 were common to the 110 DEGs from microarray studies with drug-treated participants, and UBAP2L, TKTL1, and KIF23 were common to the 117 DEGs from microarray studies with drug-naive participants identified in this literature survey. More genome-wide transcriptomic studies using NGS could accelerate biomarker discovery for psychiatric disorders. NGS-based studies will not only provide better insight into the functional genomics of SCZ but will also generate a substantial number of leads that can later be validated as biomarkers.
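Overlaps like the ones described above reduce to set intersections between lists of gene symbols. The minimal sketch below uses only the six genes named in that comparison as toy subsets (the full 506-, 110-, and 117-gene lists are not reproduced), plus placeholders GENE_R1, GENE_T1, and GENE_N1 that are purely hypothetical; the helper name shared_degs is also hypothetical. In practice, symbols and aliases should first be harmonized (for example, to current HGNC symbols) and microarray probes mapped to genes, otherwise true overlaps can be missed.

```python
def shared_degs(reference: set[str], **comparisons: set[str]) -> dict[str, list[str]]:
    """For each named comparison list, return the genes it shares with the
    reference list, sorted alphabetically."""
    ref = {g.upper() for g in reference}  # compare symbols case-insensitively
    return {name: sorted(ref & {g.upper() for g in genes})
            for name, genes in comparisons.items()}


# Toy subsets only; GENE_* entries are hypothetical non-shared placeholders.
rna_seq_degs = {"SRPK1", "DEFA4", "ACO1", "UBAP2L", "TKTL1", "KIF23", "GENE_R1"}
microarray_drug_treated = {"SRPK1", "DEFA4", "ACO1", "GENE_T1"}
microarray_drug_naive = {"UBAP2L", "TKTL1", "KIF23", "GENE_N1"}

print(shared_degs(rna_seq_degs,
                  drug_treated=microarray_drug_treated,
                  drug_naive=microarray_drug_naive))
# {'drug_treated': ['ACO1', 'DEFA4', 'SRPK1'], 'drug_naive': ['KIF23', 'TKTL1', 'UBAP2L']}
```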
As described earlier, we screened 14 microarray-based gene expression studies in SCZ for analysis. We categorized these studies into "drug-treated" and "drug-naïve" groups and compiled the reported gene expression changes. We found 7 genes common to the two groups (viz. AKT1, EIF2D, CCDC134, ZIC2, TOX, UBD, and BRCA1), suggesting their potential role as biomarkers irrespective of drug status (Figure 3). In total, four studies independently reported differential expression of AKT1 (Kumarasinghe et al., 2013; Liu et al., 2016; Xu et al., 2016; Mostaid et al., 2017). A diagnostic accuracy of 86.5% for EIF2D and TOX was reported using Weighted Gene Co-Expression Network Analysis (WGCNA) and a predictive mathematical model (Gilabert-Juan et al., 2019). These studies suggest that the DEGs common to the drug-treated and drug-naïve groups hold promise for diagnosis irrespective of the medication status of the participants. In addition, information about drug status may help improve diagnostic accuracy. Various computational methods have also been used to develop diagnostic models. Generally, classification methods based on fewer genes suffer from relatively poor sensitivity or specificity (Vachev et al., 2015; Trossbach et al., 2019). Artificial neural network-based methods seem to improve diagnostic accuracy (Takahashi et al., 2010; Zhu et al., 2021); however, they often lack the "explainability" required for clinical applications. The development of explainable machine-learning models could be crucial in this respect.
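Because a single accuracy figure can hide poor sensitivity or specificity, especially when patients and controls are unequal in number, diagnostic models are more informative when all of these metrics are reported from the confusion matrix. The sketch below computes them; the class name and the counts in the usage line are hypothetical and do not come from any reviewed study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # participants with SCZ classified as SCZ
    fn: int  # participants with SCZ misclassified as controls
    tn: int  # controls classified as controls
    fp: int  # controls misclassified as SCZ

    def sensitivity(self) -> float:
        """Fraction of SCZ participants correctly detected."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """Fraction of controls correctly ruled out."""
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        """Fraction of all participants classified correctly."""
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    def balanced_accuracy(self) -> float:
        """Mean of sensitivity and specificity; robust to group imbalance."""
        return (self.sensitivity() + self.specificity()) / 2


# Hypothetical, deliberately unbalanced sample: 90 patients, 10 controls
m = ConfusionMatrix(tp=90, fn=0, tn=5, fp=5)
print(m.accuracy(), m.sensitivity(), m.specificity(), m.balanced_accuracy())
# 0.95 1.0 0.5 0.75
```

Here 95% accuracy coexists with only 50% specificity, because controls make up a tenth of the hypothetical sample; balanced accuracy exposes the gap.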
A few studies have also identified transcriptional changes in unaffected siblings and discordant twins compared with their affected counterparts (Middleton et al., 2005; Glatt et al., 2011; Zheutlin et al., 2016). The same studies also reported gene expression differences between unaffected siblings and controls. Similarly, a PCR-based study reported overexpression of TH, IL-1β, and TNF-α in both participants with SCZ and their siblings compared with controls (Liu et al., 2010). This suggests that these DEGs could serve as biomarkers of inherited vulnerability. Along similar lines, Ota et al. identified transcriptional changes that could distinguish clinical high-risk participants, first-episode individuals, and participants with chronic SCZ from each other (Ota et al., 2019). Studies of the genetic and epigenetic components of SCZ have identified a large number of quantitative trait loci and differentially methylated genes. A significant number of the DEGs compiled in this review were previously reported to be differentially methylated in peripheral tissues such as blood. As mentioned earlier, we also found an overlap between the DEGs and the genetic loci associated with SCZ. Altogether, these findings suggest that genetic and epigenetic changes associated with SCZ can influence gene expression in peripheral tissues such as blood. Such gene expression changes could offer greater specificity as biomarkers for diagnosing SCZ at an early stage of the disease.
We further performed a gene enrichment analysis using the 227 DEGs compiled from microarray studies. The enrichment of the arginine and proline metabolism pathway in our analysis agrees with metabolomics studies that reported altered amino acid signatures in individuals with SCZ (He et al., 2012; Parksepp et al., 2020). In addition, Chen et al. reported arginine and proline metabolism as one of the differentially altered metabolic pathways in participants with violent schizophrenia, and identified 4-hydroxyproline as a potential metabolic marker (Chen et al., 2020). These findings are consistent with our observation that 4-hydroxyproline metabolism was one of the enriched metabolic processes in the gene enrichment analysis (Table 3). Similarly, the enrichment of the ErbB signaling pathway and the differential expression of NRG1 in our compiled data suggest dysregulation of the NRG-ErbB pathway in participants with SCZ. Alterations in the NRG-ErbB pathway in SCZ have also been reported in independent studies (Wu et al., 2016; Mostaid et al., 2017).
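Over-representation of a pathway in a DEG list is commonly assessed with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The sketch below illustrates that test; it is not necessarily the method or tool behind Table 3, the function name is hypothetical, and the background size and pathway counts in the example are hypothetical. When many pathways are tested, the resulting p-values should be corrected for multiple testing (e.g., Benjamini-Hochberg).

```python
from math import comb


def overrepresentation_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background (e.g., all genes measured on the platform)
    K: background genes annotated to the pathway
    n: genes in the DEG list
    k: DEGs annotated to the pathway
    """
    upper = min(K, n)  # the overlap cannot exceed either set
    # Sum the probabilities of observing k, k+1, ..., upper pathway genes by chance
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical: 5 of 227 DEGs fall in a 50-gene pathway, 20,000-gene background
p = overrepresentation_p(k=5, n=227, K=50, N=20000)
print(f"{p:.3g}")
```

Exact integer arithmetic via math.comb avoids rounding error in the individual terms; only the final ratio is converted to a float.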
Besides the enriched terms and pathways identified by our analysis, metabolomic studies have also identified impaired glucose and lipid metabolism in association with SCZ (Orešič et al., 2011; Liu et al., 2015). Two independent pathway analyses, irrespective of medication status, have highlighted significant enrichment of pathways regulating the immune response (Wu et al., 2016). A KEGG-based pathway analysis by Xu et al. identified olfactory transduction and protein digestion and absorption as enriched pathways in drug-naive participants (Xu et al., 2016). Similarly, Kumarasinghe et al. reported enrichment of pathways involving AKT1 signaling in drug-naive participants (Kumarasinghe et al., 2013). On the other hand, glutamate metabolism, chondroitin sulfate biosynthesis (Bousman et al., 2010a), and neural signaling pathways (Wu et al., 2016) have been found to be enriched in drug-treated participants with SCZ. Glucose and lipid metabolism are suspected to be dysregulated as a result of antipsychotic medication and an unhealthy lifestyle (Henderson et al., 2005; Ventriglio et al., 2015). In contrast, dysregulated amino acid metabolism, immune pathways, and AKT and ErbB signaling are suspected to be involved in the pathogenesis of SCZ and can therefore be explored further as potential targets for drug discovery and biomarker research (Buonanno, 2010; Zheng et al., 2012; Saleem et al., 2017; Van Kesteren et al., 2017).
To assess the effect of antipsychotic treatment on the DEGs, we screened 13 follow-up studies (9 PCR-based and 4 microarray-based) from the 61 studies identified earlier. Tabulating the expression status of the DEGs before and after treatment revealed that antipsychotic treatment does influence peripheral blood gene expression in SCZ, underscoring the importance of including drug-naive participants in diagnostic biomarker discovery for SCZ. Of the 27 genes identified from the follow-up studies, only AKT1, DISC1 (Kumarasinghe et al., 2013; Xu et al., 2016), HP (Trossbach et al., 2019), and EIF2D (Gilabert-Juan et al., 2019) showed no change in expression status following antipsychotic treatment (Table 5). These genes can therefore be explored further for potential application as diagnostic biomarkers. Conversely, genes whose expression status is influenced by antipsychotic treatment can serve as potential candidates for predictive biomarker discovery (Table 5). However, these findings need further validation in follow-up studies with multiple time points.
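The before/after tabulation described above amounts to comparing each gene's expression status at the two time points and splitting genes into candidate diagnostic (unchanged) and candidate predictive (changed) groups. In this minimal sketch the gene names, the status labels ("up", "down", "ns" for not significant versus controls), and the function name are all hypothetical placeholders, not entries from Table 5.

```python
def classify_by_treatment_effect(status: dict[str, tuple[str, str]]) -> dict[str, list[str]]:
    """Split genes into those whose expression status (vs. controls) is the
    same before and after treatment and those whose status changes."""
    groups: dict[str, list[str]] = {"unchanged": [], "changed": []}
    for gene, (before, after) in sorted(status.items()):
        groups["unchanged" if before == after else "changed"].append(gene)
    return groups


# Hypothetical genes: (status before treatment, status after treatment)
example = {
    "GENE_A": ("up", "up"),      # stable: candidate diagnostic marker
    "GENE_B": ("down", "ns"),    # normalized after treatment
    "GENE_C": ("up", "down"),    # reversed after treatment
}
print(classify_by_treatment_effect(example))
# {'unchanged': ['GENE_A'], 'changed': ['GENE_B', 'GENE_C']}
```

A two-time-point comparison like this cannot separate treatment effects from disease progression; multi-time-point designs would be needed for that.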
Recent gene expression studies in SCZ have provided evidence supporting their potential use as biomarkers to complement current diagnostic practice. Along with ease of detection, gene expression can offer valuable insight into the complexity of the disorder. Recent developments in sequencing technology have much to offer for biomarker discovery in SCZ. The use of microarrays for biomarker discovery has generated a significant amount of transcriptomic data. These datasets could prove useful in meta-analyses aimed at identifying strong candidates for biomarker discovery. However, very few researchers make their raw data publicly available. Of the microarray studies reviewed in this article, only one research group (Bousman et al., 2010b) has deposited its data in the Gene Expression Omnibus (GEO) (Barrett et al., 2013). The discovery of gene expression biomarkers for SCZ requires the integration of clinical psychiatry tools, statistical analysis, transcriptomic datasets, bioinformatics pipelines, and artificial intelligence-based predictive modeling. Collaboration among experts from these fields is key to the discovery of biomarkers for schizophrenia.
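If expression data were shared more widely, one simple way to combine evidence for a single gene across independent studies is Fisher's method: under the null hypothesis, X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom for k studies. The sketch below is one such approach, assuming the studies are independent and test the same hypothesis; it is not drawn from the reviewed studies, the function name is hypothetical, and the p-values in the example are hypothetical. Effect-size-based random-effects models are often preferred when heterogeneity between cohorts and platforms is expected.

```python
import math


def fisher_combined_p(p_values: list[float]) -> float:
    """Combine independent p-values for one gene with Fisher's method.

    X = -2 * sum(ln p_i) is chi-square with 2k degrees of freedom under the
    null. For even degrees of freedom the survival function has the closed form
        P(chi2_2k >= X) = exp(-X/2) * sum_{j=0}^{k-1} (X/2)**j / j!
    """
    if not p_values or any(not 0 < p <= 1 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    term, total = 1.0, 1.0                        # j = 0 term of the series
    for j in range(1, len(p_values)):
        term *= half_x / j                        # builds (X/2)**j / j! incrementally
        total += term
    return math.exp(-half_x) * total


# Hypothetical p-values for one gene from three independent cohorts
print(f"{fisher_combined_p([0.04, 0.10, 0.03]):.4f}")
```

With a single study the function returns that study's p-value unchanged, which is a quick sanity check on the closed-form survival function.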

CONCLUSION
Here we reviewed the potential of DEGs to serve as biomarkers at various stages of schizophrenia. The insight that DEGs offer into the complexity of schizophrenia and the ease of their clinical application give them an edge over other proposed molecular biomarkers. Our preliminary functional analysis of previously reported DEGs highlights arginine, proline, and hydroxyproline metabolism in association with SCZ. According to our literature survey, AKT1 remains the most frequently reported differentially expressed gene associated with SCZ across diverse study designs and detection techniques. The current literature provides sufficient evidence for the existence of specific gene expression patterns in SCZ. NGS (RNA-Seq) and machine learning approaches have yet to be fully exploited for the detection of robust biomarkers. Further, efforts toward establishing prospective cohorts of younger populations with a multi-omics approach will contribute substantially to the discovery of gene expression biomarkers for SCZ.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors.

AUTHOR CONTRIBUTIONS
VW and SK constructed the theme and structure of the research article. VW, SK, and VP wrote the manuscript. VP and SA provided inputs on the clinical aspects of the review. VW extracted and compiled the data. TP and PV verified the data. All authors have read and approved the manuscript.

FUNDING
The study was funded by an intramural research grant (MjRP/19-20/1516) from Symbiosis Center for Research & Innovation (SCRI), SIU, Pune, India. VW and PV received research fellowships from UGC, New Delhi, India, and MjRP, Symbiosis Center for Research & Innovation (SCRI), SIU, Pune, India, respectively. SK is also a beneficiary of a DST SERB SRG grant (SRG/2020/001414).