Blood Gene Expression Profile Predicts Response to Antipsychotics

Antipsychotic drugs are one of the largest types of prescribed drugs and have large inter-individual differences in efficacy, but there is no methodology to predict their clinical effect. Here we show a four-gene blood expression profile to predict the response to antipsychotics in schizophrenia patients before treatment. We sequenced total mRNA from blood samples of antipsychotic naïve patients who, after 3 months of treatment, were in the top 40% with the best response (15 patients) and in the bottom 40% with the worst response (15 patients) according to the Brief Psychiatric Rating Scale (BPRS). We characterized the transcriptome before treatment of these 30 patients and found 130 genes with significant differential expression (Padj value < 0.01) associated with clinical response. Then, we used Random Forests, an ensemble learning method for classification and regression, to obtain a list of predictor genes. The expression of four genes can predict the response to antipsychotic medication with a cross-validation accuracy estimation of 0.83 and an area under the curve of 0.97 using a logistic regression. We anticipate that this approach is a gateway to select the specific antipsychotic that will produce the best response to treatment for each specific patient.


INTRODUCTION
Antipsychotic medications are the mainstay of schizophrenia treatment (Insel, 2010). They are also used for several other clinical conditions (i.e., other psychoses, bipolar disorder, delirium, depression, personality disorders, dementia and autism; Alexander et al., 2011;Carton et al., 2015;Roberts et al., 2016) and are therefore one of the most widely used and costly types of drugs having experienced a significant increase in overall prescription in recent years (Kantor et al., 2015). Unfortunately, only 55%-60% of first episode patients will have significantly reduced the severity of their psychopathology with adequate doses of antipsychotic drugs (Crespo-Facorro et al., 2007) and 30% of patients will fail to respond to two antipsychotics after adequate trials (Meltzer, 1997;Zhang and Malhotra, 2011;Pouget et al., 2014). Research to find predictors of the response to antipsychotic treatment is an old field of psychiatry; however, despite decades of research to find clinical biomarkers, there is not a useful molecular test available to predict the response to treatment (Prata et al., 2014). In a previous study (Crespo-Facorro et al., 2015), we analyzed the blood transcriptome of 22 schizophrenia patients before and after medication with atypical antipsychotics and we found that 17 genes had significantly altered expression after medication (P value adjusted < 0.05). With the goal of generating an expression profile that could predict the outcome of treatment, we characterized the transcriptome of a drug-naïve (before any dose of antipsychotic medication was taken) cohort of 37 first episode schizophrenia patients. Patients were then divided into two groups according to their response after 3 months of antipsychotic treatment according to their Brief Psychiatric Rating Scale (BPRS; Lukoff et al., 1986) total scores: the first group included the top 40% of patients who had the best response to treatment (highest absolute decrease of BPRS score) further referred to as ''best-responders'', and the other group included the bottom 40% of patients with the worst response (lower absolute decrease of BPRS score) further referred to as ''worst-responders'' ( Table 1). The transcriptomes at baseline of both groups were compared using the program Deseq (Anders and Huber, 2010) to define the genes with significant differential expression.

Study Setting and Subjects
The cohort analyzed in this study was obtained from an ongoing epidemiological and 3-year longitudinal intervention program of first-episode psychosis (PAFIP) conducted at the outpatient clinic and the inpatient unit at the University Hospital Marques de Valdecilla (Cantabria, Spain). Conforming to international standards for research ethics, this study was approved by the Cantabria Ethics Institutional Review Board (IRB). Patients meeting inclusion criteria and their families provided written informed consent to be included in the PAFIP. The biological samples of patients included in the study were provided by the Valdecilla biobank. All referrals to PAFIP were screened for patients who met the following criteria: (1) 15-60 years old; (2) living in the catchment area (Cantabria); (3) experiencing a first episode of psychosis; (4) having received no prior treatment with antipsychotic medication; (5) DSM-IV criteria for schizophrenia, schizophreniform disorder, schizoaffective disorder, or brief psychotic disorder. Patients were excluded for any of the following reasons: (1) meeting DSM-IV criteria for drug dependence; (2) meeting DSM-IV criteria for mental retardation; and (3) having a history of neurological disease or head injury. Only patients with a history of drug dependence (DSM-IV diagnosis; but not drug abuse) during the last 12 months were excluded of the study. The diagnoses were confirmed using the Structured Clinical Interview for DSM-IV (SCID-I) carried out by an experienced psychiatrist 6 months on from the baseline visit. Our operational definition for a ''first episode of psychosis'' includes individuals with a non-affective psychosis (meeting the inclusion criteria defined above) who have not previously received antipsychotic treatment regardless the duration of psychosis. The 37 individuals who gave written consent to their participation in the program, who fulfilled inclusion criteria at 6 months, and had mRNA samples at baseline and at 3 months, were included in our analyses.
After informed consent was signed, patients were included in a prospective, randomized, flexible-dose, open-label study (EudraCT number 2013-005399-16). We used a simple randomization procedure. At study intake, all patients were antipsychotic naïve and were randomized to aripiprazole (N = 17), risperidone (N = 20). Initial dose ranges were 5-10 mg/day of aripiprazole and 1-2 mg/day of risperidone. Rapid titration schedule (5-day), until optimal dose was reached, was used as a rule unless severe side effects occurred. Dose range was 5-30 mg/day aripiprazole and 2-6 mg risperidone. The dose and type of antipsychotic medication could be changed, at the physician's discretion, based on clinical efficacy and the profile of side effects during the follow-up period. Initial prescribed medications should be switched to olanzapine (range dose 5-20 mg) due to lack of clinical response at 3 weeks (less than 30% of total BPRS score reduction and CGI ≥ 5) or persisting distressing adverse drug reactions. For patients who fail to reach a clinical response after two anti-psychotic agents, clozapine should be started. However, despite this recommendation, clozapine treatment should not be commenced at clinician's discretion.
For the present investigation, drug-naïve first episode patients were divided into two groups based on their clinical response after 3 months of antipsychotic treatment according to their BPRS (Lukoff et al., 1986) total scores: the first group included the top 40% of patients who had the best response to treatment (highest absolute decrease of BPRS score) and are referred to as ''best-responders'', and the other group included the bottom 40% of patients with the worst response (lower absolute decrease of BPRS score) and are referred to as ''worst-responders''.
No statistically significant differences between the two groups (best vs. worst clinical response patients) with respect to percentage of concomitant use of antidepressants, antimuscarinics, hypnotics and benzodiazepines at 3 months were observed (all p's > 0.542). After 6 months follow-up patients were diagnosed as follows, in the group of patients with the best-response to antipsychotic treatment: nine schizophrenia, four schizophreniform disorder, one brief psychotic disorder, and one schizoaffective disorder; in the group of patients with worst-response to antipsychotic treatment: nine schizophrenia, four schizophreniform disorder and two brief psychotic disorder. No significant differences between groups were either found.

Premorbid and Sociodemographic Variables
Premorbid and sociodemographic information was collected from patients (Table 1), relatives and previous medical records. Age of onset of psychosis was defined as the age when the emergence of the first continuous psychotic symptom occurred. Duration of untreated psychosis (DUP) was defined as the time from the first psychotic symptom to the initiation of antipsychotic drug treatment. Duration of untreated illness (DUI) was defined as the time from the first unspecific symptoms related to psychosis to the initiation of antipsychotic drug treatment.

Laboratory Assessments
To minimize the effects of diet and technique, blood samples were obtained from fasting subjects from 8:00 to 10:00 a.m. by the same staff, in the same setting. None of the patients had a chronic inflammation or infection, or were taking medication that could influence the results of blood tests.

RNA Extraction
Total RNA was extracted from blood using the Tempus TM Blood RNA Tube and Tempus TM Spin RNA Isolation Kit (Applied Biosystems, Foster City, CA, USA) following the manufacturer protocols. To define expression profiles, a key factor is that the RNA is intact. To select only high-quality RNA, the RNA Integrity Number (RIN) was characterized with a Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) and samples with a RIN of at least 7.2 were selected. The selected samples have RINs that range from 8 to 10 with an average of 9.11.

RNA Next Generation Sequencing
Total RNA was extracted from peripheral whole blood of each individual. The mRNA obtained from blood was sequenced at the Centro Nacional de Análisis Genómico (CNAG) using Illumina HiSeq instruments (San Diego, CA, USA). The mRNA was isolated from the total RNA and was fragmented once transformed into cDNA. Fragments of 300 bp on average were selected to construct the libraries for sequencing. Pair-end sequences of 70 nucleotides for each end were produced.

Alignment of Reads to the Human Genome Reference
Alignment of the reads was performed in an SLURM HPC server running Tophat 2.0.6 with default options . Tophat aligns RNA-Seq reads to genomes using the Bowtie 2.0.2 alignment program (Langmead et al., 2009), and then analyses the mapping results to identify splice junctions between exons.

Differential Expression Statistical Analyses
Bedtools 2.17.0 (multicov option; Quinlan and Hall, 2010) was used to count the amount of reads mapped to each gene. The Reference Sequence (RefSeq) gene coordinates were defined using the RefFlat file from the UCSC Genome Bioinformatics Site (as February 28th, 2014). DESeq 1.4 package (Anders and Huber, 2010), setting up fit-only as fitting method, was used to test for differential expression using gene-count data. Two sided Fisher tests were carried out to identify functional enrichment of biological annotations.

Prediction Method
Gene selection was performed with the implementation of Random Forest method in the Random Forest 4.6-12 package (Breiman, 2001) of R. Expression values of the 130 genes with significantly different expression between the best-responders and worst-responders was used as input with default parameters. Genes with the best Gini were selected for the predictor. It was trained using Logistic regression (Cox, 1958) with the glm function of R 3.2.3 and the calculation of the estimated crossvalidation was performed with the cv.glm function of boot package which implements bootstrapping methods (Hinkley, 1988).

Differential Gene Expression Between Best-Responders and Worst-Responders Before Treatment
We found 130 genes with significant differential expression between the best-responders and the worst-responders (P adj value < 0.01; Supplementary Table S1). These genes were significantly enriched for schizophrenia related genes according to the scientific literature in the Gene Reference into Function (GeneRIF) database (Mitchell et al., 2003). We found 14 schizophrenia differential expression genes between the best-responders and the worst-responders (13.4% observed vs. 6.8% expected; Fisher P value = 0.016).

Differential Gene Expression Between the Best-Responders and the Worst-Responders After Treatment
To obtain more information, we sequenced the transcriptome of the patients after 3 months of treatment with antipsychotics. We defined 219 differential expression genes between the best-responders and the worst-responders after 3 months of medication (Supplementary Table S2). These genes were enriched significantly for schizophrenia (21 genes or 11.3% vs. 6.8%; Fisher P value = 0.027). After 3 months of medication with antipsychotics, 6 out of the 14 schizophrenia-annotated genes with differential expression before medication between the best-responders and the worst-responders no longer had differential expression: VWF, a glycoprotein with increased levels in plasma of non-medicated patients and in bipolar disorder and schizophrenia compared to control individuals (Yri et al., 2012); UGT1A1, a gene with promoter variations in patients with schizophrenia that result in lower serum bilirubin levels (Vitek et al., 2010); HMOX1, an enzyme that has anti-inflammatory properties and mediates the first step of heme catabolism involved in the production of carbon monoxide, a putative neurotransmitter, is over-expressed in transgenic mice with schizophrenia-like features (Song et al., 2012); IL8 (also known as CXCL8), a chemokine with altered expression in the dorsolateral prefrontal cortex of individuals with schizophrenia (Fillman et al., 2013); NTNG2, a gene with haplotypes associated with schizophrenia and isoform expression significantly different in schizophrenic and control brains (Aoki-Suzuki et al., 2005); and PTGDS, a prostaglandin that acts as a neuromodulator as well as a trophic factor in the central nervous system and was studied as a schizophrenia candidate (Ruano et al., 2007) and with reduced mRNA expression in peripheral blood of bipolar disorder patients compared with healthy control subjects (Munkholm et al., 2015). The differential expression genes between the best-responders and the worst-responders after 3 months of medication indicates that 5 out the 11 genes involved in ''drug processing'' with differential expression before medication still had differential expression and similar expression profile after medication. These five genes are: GSTM1, a glutathione S-transferase involved in the detoxification of electrophilic compounds, such as therapeutic drugs, by conjugation with glutathione, with genetic variations that affect the toxicity and efficacy of certain drugs (Li et al., 2013); THBS1, an adhesive glycoprotein that mediates cellto-cell and cell-to-matrix interactions and related to drug resistance (Rath et al., 2006); PRAME, an antigen related to cytotoxic drug sensitivity (Kewitz and Staege, 2013); GSTT1, a glutathione S-transferase that functions as a drug metabolizing enzyme (Yri et al., 2012); and SLC22A16, a solute carrier reported to be involved in drug response (Aouida et al., 2010).

Differential Gene Expression of the Best-Responders Before and After Treatment and of the Worst-Responders Before and After Treatment
We also compared the transcriptome of the best-responders before and after medication and characterized 176 genes with differential expression (Supplementary Table S3). These genes were also enriched for schizophrenia annotations (9.4% observed vs. 6.7% expected). When we defined the differential expression genes of the worst-responders, before and after medication, we found only 23 genes (Supplementary Table S4) and these were not enriched for schizophrenia annotations (5.2% vs. 6.7%). These data indicate that the individuals that respond worst to treatment have fewer altered genes (a 7.6 fold decrease) in their expression by antipsychotics than the bestresponders.

Predictor Test
To generate a predictor test before medication of response to antipsychotics, we analyzed all 130 genes with significantly different expression between the best-responders and the worstresponders using Random Forests, an ensemble learning method for classification and regression, among other tasks, that operates by constructing a multitude of decision trees (Díaz-Uriarte and Alvarez de Andrés, 2006; Koo et al., 2013). Functional analyses of the 30 genes with the highest predictive power (Figure 1; Supplementary Table S5) indicated a significant enrichment of genes related to schizophrenia (29.2% observed vs. 6.9% expected; Fisher P value = 0.0009) and bipolar disorder (16.7% vs. 2.6%; Fisher P value = 0.003). In the complete list of 130 genes, the enrichment for schizophrenia genes was smaller (13.4%).
Using logistic regression we defined the area under the curve (AUC), representing the combined predictive power of the genes, and a cross-validation estimate of accuracy for the prediction using the first two, three and four genes. We found that the prediction using two genes (SLC9A3 and HMOX1) had an area under the curve of 0.92 and a cross-validation estimate of accuracy of 0.73; with three genes (adding SLC22A16) the values were 0.96 and 0.833 respectively; and with four genes (adding LOC284581) they were 0.97 and 0.833 respectively (Figure 2). These data indicate that a test with the top four or three of the most predictive genes could be an appropriate choice.

DISCUSSION
We found a significantly different number of schizophrenia differential expression genes between the groups of best-responders and worst-responders, suggesting that the response to treatment could be due, at least partially, to the fact FIGURE 1 | Variable Importance (Gini) for the top 30 predictor genes. Gini variable importance measures reflect the mean decrease in impurity by splits of a given variable in the classification tree, weighted by the proportion of samples reaching that node. A greater "mean decrease Gini" indicates that the gene plays a greater role in partitioning the data into the defined classes.
Frontiers in Molecular Neuroscience | www.frontiersin.org Our results indicate that 6 out of the 14 schizophreniaannotated genes with differential expression before medication between the best and worst-responders had no longer differential expression after 3 months of medication. These six genes are excellent candidates to be the targets used by the drugs to improve the symptoms of schizophrenia. The data show that the group of best-responders has a significant enrichment of schizophrenia-annotated genes with differential expression before and after 3 months of medication. However, we do not observe this significant enrichment of schizophrenia-annotated genes in the group of worst-responders, suggesting that the efficacy of antipsychotics is dependent of the expression profile of the patient before medication and that a predictor could be generated using a gene expression profile of untreated patients.
With the purpose of finding genes that can predict the response to antipsychotics, we characterized 130 genes with differential expression between the best and worst responders before treatment. We found that the 30 genes with the highest predictive power among these 130 genes, had a higher enrichment of schizophrenia-annotated genes, indicating that genes related to schizophrenia tend to have a higher predictive value. This would suggest that schizophrenia genes are involved in the response and that the schizophrenia causative genes tend to be different between the best-responders and the worstresponders.
We tested the predictive power of the top four predictor genes in our patients, and found that the test would accurately predict 93% of the worst-responders and 87% of the best-responders (Figure 3). The gene with the highest predictive value, SLC9A3, is a Na/H exchanger and belongs to several pathways such as the transmembrane transportation of small molecules and the SLC-mediated transmembrane transport that facilitate the movement of a specific substrate either against or following their concentration gradient. Currently, there are novel drugs being developed targeting SLC9A3 that could be of interest for the field FIGURE 3 | Predicted probability of response to antipsychotics using a 4-gene test. Scatter plot, which represents the predicted probability of response (y-axis) for each input sample (x-axis) in the logistic regression predictor. Red points represent worst-responder patients and black points represent best-responder patients.
of schizophrenia (Dominguez Rieg et al., 2016). The second best predictor is HMOX1, or Heme oxygenase, an essential enzyme in heme catabolism, that cleaves heme to form biliverdin, which is subsequently converted to bilirubin by biliverdin reductase, and carbon monoxide, a putative neurotransmitter. The relationship of this gene with the generation of a neurotransmitter could explain the relationship of his expression with the response to antipsychotics and open the gate to possible medical actions. The previous observations are consistent with the fact that the mouse homologous gene has been related to schizophrenia-like features in transgenic mice, what can facilitate new therapeutics for schizophrenia (Song et al., 2012). HMOX1 has been also related to drug resistance in acute myeloid leukemia (Zhe et al., 2015).
The third predictor, SLC22A16, encodes a member of the organic zwitterion transporter protein family which transports carnitine. The encoded protein has also been shown to be involved in the transport of anticancer drugs such as bleomycin (Aouida et al., 2010) and successful treatment has been correlated with the level of activity of this transporter in tumor cells. Variant alleles of SLC22A16 are associated with response and levels of toxicity caused by doxorubicin and cyclophosphamide therapy in the treatment of breast cancer which is consistent with the fact that doxorubicin is a substrate this transporter (Bray et al., 2010). Interestingly, SLC22A16 is located within a schizophrenia susceptibility locus in chromosome 6q (Cao et al., 1997). The fourth gene, LOC284581, is not annotated but it maps within a Parkinson's disease locus of 15.8 Mb (Satake et al., 2009). Interestingly, all three annotated genes are involved in drug resistance, toxicity or transportation and offer new possibilities to generate therapeutic actions for schizophrenia.
A potential limitation of the study is the small sample size (n = 30 or n = 15 per group). However, we are not aware of any full transcriptome study on drug-naïve patients (not having received a single dose of antipsychotics) that had studied similar or higher number of samples than our study. Besides, our study sample size is larger than in most studies based on RNASeq data (according to the mean of samples reported by the Gene Expression Omnibus database). A possible confounding factor could be the cannabis use that has been reported to enhance negative outcomes such as suicidal risk (Serafini et al., 2012). However, it is unlikely that this factor could affect the results of our study given that only eight patients reported as cannabis users and were equally divided in both groups, four in the best-responders and four in the worst-responders. The data we are presenting here is a first step to generate a simple test (expression levels of a small number of genes) to predict the response to antipsychotics, one of the most prescribed types of drugs worldwide, and to provide a tool to select the antipsychotic expected to generate the best clinical response.

AUTHOR CONTRIBUTIONS
JS, CP and BC-F designed the study, analyzed the data and wrote the manuscript; JS and BC-F directed and supervised the research; BC-F performed the clinical analysis; CP and JS performed the bioinformatic work; FR-J provided technical assistance and was responsible for sample managing.

FUNDING
This work was supported by the Ministerio de Ciencia e Innovación (MICINN) in a coordinated project (grant SAF2010-20840-C02-01/02) and by the Spanish Ministry of Economy, Industry and Competitiveness (grants SAF2013-46292-R; PTA2015-10483-I); and by the National Institutes of Health (grant 5R01HD056465-07. Sub-award #320793 from The Children's Hospital of Philadelphia). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

ACKNOWLEDGMENTS
Bioinformatics work was partially performed using the Altamira supercomputer (Spanish Supercomputing Network). We thank the Valdecilla Biobank for blood RNA sampling handling and storage and the CNAG for mRNA sequencing. We also wish to thank the participants and their families for enrolling in this study.