Genome Plasticity in Cultured Leishmania donovani: Comparison of Early and Late Passages

Leishmania donovani possesses a complex heteroxenic life cycle where infective metacyclic promastigotes are pre-adapted to infect their host and cope up with intracellular stress. Exploiting the similarities between cultured and sandfly derived promastigotes, we used early and late passage cultured promastigotes to show specific changes at genome level which compromise pathogen fitness reflected in gene expression and infection studies. The pathogen loses virulence mostly via transcriptional and translational regulations and long-time cultivation makes them struggle to convert to virulent metacyclics. At the genomic level very subtle plasticity was observed between the early and the late passages mostly in defense-related, nutrient acquisition and signal transduction genes. Chromosome Copy number variation is seen in the early and late passages involving several genes that may be playing a role in pathogenicity. Our study highlights the importance of ABC transporters and calpain like cysteine proteases in parasite virulence in cultured promastigotes. Interestingly, these proteins are emerging as important patho-adaptive factors in clinical isolates of Leishmania. We found that the currently available genome of Leishmania in the NCBI database are from late passages. Our early passage genome can act as a reference for future studies on virulent isolates of Leishmania. The annotated leads from this study can be used for virulence surveillance and therapeutic studies in the Indian subcontinent.


INTRODUCTION
Leishmania donovani is an intracellular obligate parasite of mammalian macrophages and causes visceral leishmaniasis or kala-azar, which is a fatal disease if not treated on time. The disease severity varies from host to host, region to region as well as between parasite strains. Extensive research by different groups exploiting genomic, proteomic, metabolomic, immunologic, and animal models have pointed toward tripartite determinants mediating the development of this disease viz. the vector, host and pathogen. Although vector and host characteristics are important for symptomatic disease, parasite characteristics majorly determine the fate of the disease (McCall et al., 2013). Promastigotes are the culturable form of Leishmania. Stationary phase culture of Leishmania is expected to contain a large number of metacyclic parasites and have been routinely used for experimental infections. Metacyclic promastigotes are injected into the host during blood meal and are the infective stage of the parasite. We and others have observed that continuous axenic cultivation of Leishmania leads to loss of virulence over a period of time (Dey et al., 2002;Ali et al., 2013). The relative virulence of stationary phase promastigotes is proportional to peanut agglutinin negative promastigotes contained within these populations (da Silva and Sacks, 1987) which is attributable to exposed surface carbohydrates. It has also been observed that the in vitro maintenance of Leishmania promastigotes by cultivation over longer periods may reduce their ability to differentiate into amastigote forms Ali et al., 2013). Studies at the mRNA and proteomic levels indicated differential expression of virulence related genes in both promastigotes and amastigotes cultured axenically for long periods compared to freshly isolated/transformed or early passage parasites (Lei et al., 2010;Pescher et al., 2011;Ali et al., 2013;Magalhaes et al., 2014). These studies were done with the aim of standardizing protocols for the use of axenic Leishmania cultures in infection studies and to identify virulence factors. Studies at genomic level mostly identified parasite evolution in the Indian subcontinent under drug pressure or disease phenotype (Zhang et al., 2014;Imamura et al., 2016). As promastigotes are the infective form of Leishmania, and cultured stationary phase promastigotes show remarkable consistency with sandfly derived metacyclic promastigotes (Inbar et al., 2017), we chose the former system to unravel the genetic mechanism of parasite pre-adaptation to host environment and how it is lost with continuous in vitro passages. We report here the global changes in the genome and transcriptome of serially passaged L. donovani AG83 strain which cause specific alterations in few genes and their expression that lead to loss of virulence reflected in the infection studies. We also try to link these adaptive features with parasite's nutritional environment and complex life cycle, and how closely they relate to the global evolution of L. donovani clinical isolates as available through published work. This study also throws light on the pathologically significant genes common to clinical and laboratory strains which must be considered for virulence surveillance at least in the Indian subcontinent.

Ethics Statement
All animal experiment protocols adhered to the guidelines of the Committee for the Purpose of Control and Supervision on Experimental Animals (CPCSEA), Ministry of Environment and Forest, Government of India, and were approved by the Animal Ethics Committee (147/1999/CPSCEA) of CSIR-IICB.

Parasite Culture and Maintenance in Animals
Leishmania donovani (MHOM/IN/1983/AG83; ATCC repository number PRA R -413 TM ) amastigotes from infected hamster spleen were transformed to promastigotes in Schneider's Drosophila medium (Sigma-Aldrich) at 22 • C and sub-cultured as promastigotes in M 199 (Sigma-Aldrich) both supplemented with 100 U/ml penicillin, 100 µg/ml streptomycin, and 10% FCS (Sigma-Aldrich) (Banerjee et al., 2008). Parasites were subcultured till 25th passage for some experiments. Golden Syrian Hamsters, 5-6 weeks old, reared in institute facilities were used for the purpose of parasite maintenance.

Macrophage Infection in Vitro
Macrophages isolated from the peritoneal cavity of hamsters (12-16 weeks old) by injecting chilled RPMI-1640-10% FCS were pooled and cultured overnight on glass cover slips as described elsewhere (Bhowmick et al., 2010). AG83 promastigotes of different passages were allowed to infect peritoneal macrophages (MoI of 10:1) for 3 h, washed with warm 20 mM PBS, pH 7.2 and further incubated in fresh medium till 72 h. At 24 and 72 h postinfection, cells were fixed in methanol and stained with Giemsa for microscopic determination of intracellular parasite numbers per 100 host cells. Data were analyzed using GraphPad Prism 5 software.

Determination of Splenic and Hepatic Parasite Burden
Hamsters (5-6 weeks) were infected by injecting 2 × 10 7 stationary phase promastigotes by intracardiac injection suspended in 20 mM PBS. Animals were sacrificed 8 weeks post-infection for determination of parasite burden by Leishman Donovan Units (LDU) in Giemsa stained impression smears and limiting dilution assay in fivefold serial dilutions of homogenized organs as described previously (Banerjee et al., 2008). Data analysis was performed on GraphPad Prism 5 software.

High Throughput Genome Sequencing
High quality genomic DNA was extracted from stationary phase promastigotes of the early (HTI4) and late (HTI5) passage of L. donovani AG83 using a commercial procedure as recommended by the manufacturer (Qiagen, Germany). Size check, integrity and presence of contaminants in the DNA samples were assessed through gel electrophoresis. DNA purity was measured using a Nano Drop 2000 spectrophotometer (Thermo Scientific, Waltham, MA, United States). Two separate libraries of L. donovani AG83 were prepared one each for early and late passages. De novo paired end sequencing was done on Illumina HiSeq 2500 with the read length of 125 base pairs with an insert size of 250 bp. A total of 66 and 44 million raw reads were generated from early and late passages, respectively.

Transcriptome Sequencing
For each passage, parasites were harvested by chilling on ice, spun down, and washed once in cold PBS solution, pH 7.4, and suspended in RNA Later. Total RNA was isolated from promastigotes of early (2nd), intermediate (11th), and late (25th) passage (referred to as q1, q2, and q3, respectively, in the figures) by using "Roche High Pure RNA Isolation Kit, " (Product no.11828665001) according to manufacturer's protocol. The purity and concentration of each RNA sample was checked by using the Agilent 2100 bioanalyzer (Agilent Technologies, CA, United States) before proceeding for further downstream analyses. Paired end transcriptome sequencing was carried out on Illumina HiSeq platform, with the read length of 125 base pairs with an insert size of 250 bp generating 37-42 million raw reads (approximately). The library has been sequenced following manufacturer's instructions using the HiSeq SBS Kit v4 (Part # 15034097Rev.B) to generate paired-end reads. Additional quality control of raw data using FastQC (11) was performed. The reads were preprocessed using Trimmomatic and the poor quality sequences, contaminated sequences were removed using the blast and blat based searches using the reference (LdBPK82A1; Bioproject: PRJNA171503) from Genbank. The high quality filtered reads were used for downstream analyses.

Genome and Transcriptome Assembly and Functional Annotation
The early and late passage genomes were assembled using Allpaths-LG assembler (Ribeiro et al., 2012). We simulated reads with 6 K and 20 K insert sizes from the existing genomes of LdBPK282A1 retrieved from Genbank (PRJNA171503) with WGsim (Li et al., 2008).
The assembly from Allpaths was resolved up to chromosome level reference based genome assembly by nucleotide alignment using Nucmer and further using our in-house scripts 1 . RNAseq reads from all passages were aligned using blat (Kent, 2002) with the Genbank data of LdBPK282A1 for removal of contaminants. The cleaned reads were used for transcriptome assembly using Trinity transcriptome assembler (Grabherr et al., 2011;Haas et al., 2013) (Supplementary Table S1). We used BUSCO (Simão et al., 2015) for predicting the genome completeness and the core ortholog analysis was computed using eukaryotic genomes (Supplementary Table S2).
We predicted protein coding genes of the assembled genomes using Augustus (Stanke et al., 2008) and Scipio pipeline with LdBPK282A1 coding sequences as training material (Bioproject: PRJNA171503). Predicted genes were annotated using BLAST (Altschul et al., 2009) as well as InterProScan (Quevillon et al., 2005) searches. Whole genome protein sequences were submitted to the gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis. GO ids were mapped back to their ontology functions using in house scripts. GO enrichment was analyzed in the categories of molecular function, cellular component and biological process. Metabolic pathways analysis was done using the KASS server (Moriya et al., 2007).
Additionally, we used companion server (Steinbiss et al., 2016) for predicting orthologous genes from the two culture passages using Genbank Leishmania major as the reference. This server predicted many interesting features including the pseudogenes present in the genomes that were not predicted by Augustus. Genomes of two passages were compared using MUMmer, NUCmer, and PROmer (Kurtz et al., 2004).

Differential Expression Analysis of Transcripts
We carried out differential expression analysis of transcripts using Tuxedo suit tools (Langmead et al., 2009; 1 https://github.com/madhubioinfo/STLab-assembler Trapnell et al., 2012;Kim et al., 2013) by aligning the RNAseq reads of three passages with the reference genome of LdBPK282A1. The pattern of expression across samples was calculated by using the normalized Fragments per Kilo base of transcript per Million (FPKM) values. R scripts were used to construct box plots and perform hierarchical clustering (Supplementary Figure S1). Putative secretary functions were assigned to the DEGs by using Secretome tool (Bendtsen et al., 2004).
Total RNA was isolated from 4 × 10 7 parasite promastigotes of different passages, using TriZOL-chloroform method. Pure RNA pellet was suspended in 50 µl DEPC treated RNase free water and concentration was measured using Nanodrop 2000 (Thermo Fisher Scientific). Almost 2 µg pure RNA was used for cDNA synthesis following manufacturer's protocol using Revert Aid First Strand cDNA synthesis kit (Thermo Fisher, MA, United States). Differential expression analysis of parasite genes was performed using SYBR green master mix (Roche). Ld GAPDH was used as reference gene. Reaction was run in Roche light cycler 96 (Roche) and fold change of different genes was expressed as 2 − Ct , where Ct = gene Ct-reference gene Ct and ( Ct) = Ct test -Ct control. RNA was collected from n = 3 hamster and proceeded for further long passage preparation. Also data per group (A2, A11, and A25th passage) represents n = 3 experiments performed for each group.

SNP Identification
Sequenced illumina genomic reads of early and late passages were aligned with the reference genome of LdBPK282A1 using Burrows-Wheeler Aligner (Li and Durbin, 2009), After marking the duplicates using PICARD tools the variants were called using the Genome Analysis toolkit (GATK) (Van der Auwera et al., 2013), obtained variants were filtered with the cutoff of QD < 2.0 ||MQ < 40||FS > 60.0|| ReadPosRankSum < −8.0.

Copy Number Variant Detection
The reads mapped by using BWA were used to estimate the read depth for each sample at each chromosome using the coverage values from Samtools v0.1.18 . After removing the duplicates using Picard the CNVnator (Abyzov et al., 2011) was used to identify copy number variants. We used 100 bp as the window size for creating the histogram bins of read depth in CNVnator. For CNVnator, we removed calls which were less than the cutoff of t-test calculated e-value of 1e-5.

Cell Viability and Apoptotic Cell Death Assay
Promastigotes of early (2nd) and late (25th) passages were treated with graduated doses of miltefosine for 24 h and the percent viability was determined by enzymatic reduction of 3-[4,5dimethylthiazole-2-yl]-2,5-diphenyltetrazolium bromide (MTT) to MTT-formazan. In parallel experiments the miltefosine treated promastigotes were analyzed for apoptotic cell death by AnnexinV-FITC and propidium iodide staining acquired in FITC and PE channels, respectively, on FACS LSR Fortessa and analyzed for percent FITC positive and FITC-PI double positive cells on FACS Diva software.

Isoelectric Focusing
The isoelectric focusing (IEF) was performed using the PROTEAN IEF Cell system (Bio-Rad). 500-600 µg of whole cell protein was added to 300 µl of rehydration buffer for 11 cm IPG strip (pH 4-7 non-linear, Bio-Rad). Protein sample was applied to the IPG strips and kept at room temperature (16-18 h) for passive rehydration. IEF was performed at 250 V for 20 min; 10,000 V for 2 h 50 min; 10,000-40,000 V/h and an optional step of 500-1000 V 10-15 h.

Second Dimension-SDS-PAGE
Before the second dimension, proteins in the IEF strips were reduced in equilibration buffer (6 M urea, 2% SDS, 300 mM Tris-HCl pH 8.8, 20% glycerol) containing 20 mg/ml DTT and alkylated in dark with 25 mg/ml iodoacetamide, again dissolved in equilibration buffer, for 15-20 min each. Strips were washed with SDS-PAGE running buffer and separated across 12% SDS-PAGE gels (30% acrylamide, 0.8% bis-acrylamide) using a vertical system (Bio-Rad) and standard Tris/glycine/SDS buffer. Gels were run at 30 mA/gel until the tracking dye left the gel. The protein standard was purchased from Bio-Rad (prestained SDS-PAGE broad range). For western blot analysis, total cell lysates of early and late passage promastigotes were run on SDS-PAGE followed by incubation with antibodies for gp63 and HDAC. The images were developed using Luminata Forte (Thermo Fisher, MA, United States) on Gel Doc XR (Bio-Rad). The anti-gp63 antibody raised in rabbit (in house) and antimouse HDAC antibody (Cell Signaling Technology) were used for the assay. The 2-DE gels were stained with Coomassie Brilliant Blue G-250. Gel images were taken on the Gel Doc XR+ (Bio-Rad) and analysis was done using the PD Quest software version 8.0.1 (Bio-Rad Laboratories) and manual checking. The gel having a higher number of spots was used to locate the corresponding protein spots within different gels. In this study, we have applied a cut-off of at least 1.5-fold up-regulation/down-regulation of protein for the study. Experimental molecular weights and isoelectric points were calibrated according to Bio-Rad standard 2-DE PAGE marker. Spots were manually excised and processed according to the In-Gel Tryptic Digestion Kit (Thermo Fisher, MA, United States). The trypsin digested samples were concentrated using speed-vac before loading the samples for MALDI-TOF Mass Spectrometry analysis on 4800 MALDI ToF/ToF analyzer (Applied Biosystems, CA, United States). Zip Tip protein concentrator (Millipore, MA, United States) was used for concentrating and de-salting of the low abundance proteins. The samples were dissolved in a solvent consisting of 0.1% trifluoroacetate and 50% acetonitrile in MilliQ Water. Then, 0.5 µl of sample solution was mixed with 0.5 µl of matrix solution (1 mg/ml α-cyano-4-hydroxycinnamic acid dissolved in the aforementioned solvent), applied to a 384-MALDI sample target plate, and dried in air. Peptides were evaporated with a ND: YAG laser at 355 nm, using a delayed extraction approach. They were accelerated with 25 kV injection pulse for ToF analysis. Each spectrum was the cumulative average of 1000 laser shots. The MS/MS spectrum was collected in MS/MS 1 kV positive reflectron mode with fragments generated by post source decay. The MS/MS mass tolerance was set to ±20 ppm. After processing, 10 MS/MS precursors were selected (Minimum signal to noise ratio-50). Before each analysis, the instrument was calibrated with the Applied Biosystems 4700 Proteomics Analyzer Calibration Mixture. Data interpretation was performed using the GPS Explorer Software (Applied Biosystems, CA, United States), and an automated database search was performed using the MASCOT program (Matrix Science Ltd., London, United Kingdom).

Data Availability
The Genome sequences of Early and Late passages of L. donovani MHOM/IN/1983/AG83 is available at NCBI with submission numbers GCA_001989955.1 and GCA_001989975.1, respectively.

Continuous in Vitro Passaging Leads to Loss of Virulence of Promastigotes
We first compared the infection causing capacity of promastigotes from different passages in vitro and in vivo. Stationary promastigotes recovered from different passages were quantified and employed in the experiments. Hamsters and hamster derived macrophages were highly susceptible to infection, with infection persisting up to 25th passage (Figures 1, 2) and displayed a decline in infectivity with more time spent in the culture. The average number of amastigotes per macrophage was 14.4 ± 3.2 at 24 h, which increased to 21.6 ± 3.9 at 72 h ( Figure 1A) post-infection with early passage. There was over a 50-fold reduction in the number of intra-macrophagic amastigotes compared to the 2nd passage when parasites of 25th passage were used for infection (Figure 1, p < 0.0001, One way Anova). Parasites from the 2nd passage could infect  96.83 ± 1.4% of macrophages at 72 h while only 8 ± 1.6% cells were infected with promastigotes from 25th passage ( Figure 1B). From these experiments it seemed the infectivity is affected at two stages, first at the level of host-parasite interaction evident from lower infection at 24 h, and second, parasite sustenance inside macrophages, as seen in clearance of parasites after 72 h. Hamsters infected with 25th passage promastigotes presented over 6.5-fold less parasite burden in both liver (LDU of 515.6 ± 82.44) and spleen (LDU of 211.5 ± 42.21) as compared to those infected with the 2nd passage (LDU of 3358 ± 132.9 and 1416 ± 126.6, respectively, in liver and spleen) (Figures 2A,B, p < 0.0001, One Way Anova) at 8 week post-infection. As the number of live amastigotes inside the organs gives an idea of the disease severity, we additionally assessed the parasite burden in the infected liver and spleen of differentially infected hamsters by serial dilution method. The number of parasites showed a gradual decline with passage number thus reinstating the fact that continuous axenic culture of promastigotes leads to loss of virulence of the parasite. As shown in Figure 2C, the number of live amastigotes came down from 10 16 in the liver of hamsters infected with virulent parasites (2nd passage) to as low as 100 parasites in the animals with attenuated infection (25th passage) (p < 0.0001, One Way Anova). The splenic infection level was very high in hamsters infected with early passage promastigotes, and reached levels as high as 10 21 ( Figure 2D, p < 0.0001, One Way Anova) whereas the parasites failed to survive and multiply in the animals infected with late passage, and only about 100 parasites could be detected in the piece of organ examined by us. The hamsters with mild infection also demonstrated reduced liver and spleen size ( Figure 2E) unlike the severely infected groups. Thus early passage promastigotes displayed increased fitness inside host cells reflected in higher parasite burden while late passage promastigotes were successfully cleared by macrophages.

Late Passage Genome Has Higher Number of Pseudogenes
Genome plasticity in Leishmania has been linked to its ability to adapt to different environments (Sterkers et al., 2012). This prompted us to check for the genetic adaptations which may account for adaptability to growth in culture medium and the associated loss of infectivity as a result of continuous passaging. The genomes of early and late passages were assembled into complete 36 chromosomes with a genome size of 32.2 and 32.1 MB, respectively ( Table 1A). The genomes of these two passages, however, had about 11.3% gaps compared to 3.67% gaps in LdBPK282A1 from Genbank [Accession number: GCA_000227135.2]. A chromosome wise comparison of early and late passages with LdBPK282A1 (Supplementary Figure  S2) didn't reveal much difference, whereas the whole genome comparisons clearly pointed out the small but prominent changes in the genome of the late passage, which was comparable to the NCBI reference LdBPK282A1 strain (Figure 3, upper panel). We used RATT and companion for transferring annotation from LdBPK282A1 strain (Downing et al., 2011) that resulted in 7563 and 7552 predicted protein coding genes, respectively, in early and late passages, compared to 7967 protein coding genes in LdBPK282A1 (Table 1B). The less number of protein coding genes from our cultures are mostly attributed to the presence of assembly gaps. The numbers of predicted pseudogenes in early and late passages were 73 and 82, respectively, and this number is significantly higher than 54 pseudogenes predicted in LdBPK282A1 strain, but similar to what is reported in the Srilankan L. donovani (70) strains (Zhang et al., 2014). Presence of large protein families such as calpain-like proteases and amastins are found to occur in large directional clusters in Genbank strain as well as our strains. We noticed a large amastin cluster in chromosome 8 in Genbank that are present in chromosome 10 in our strain ( Table 2).
The GO Biological Process annotation of early and late passages indicates there are no changes in the gene numbers of processes in early and late passages. However, genes responsible for defense system (GO: 0006952) and positive regulation of cell proliferation (GO: 0008284) are missing in later passages (Figure 3, lower panel and Supplementary Data Sheet S1). Thus repeated ex host passages have made the late passage parasites suited to extracellular life where selection pressure against host defense system is not needed. Further, the cellular machinery needed for transformation to infective form is also perturbed which may impair their ability to infect the host.

Chromosome Copy Number Variation Drives Gene Expression in Leishmania
The previous studies on L. donovani (Leprohon et al., 2009;Dumetz et al., 2017) reported that aneuploidy and Copy number variants regulate the gene expression. In our study we identified SNPs from early and late passages. Total 4390 heterozygous loci were found in early passage and 4356 heterozygous loci were found in late passage of the genome.
The local copy number variant detection was performed on the basis of read depth binning ratio approach. It was already known that deletion and duplication event plays an important role in these genomes to maintain the virulence in the different environmental conditions (Dumetz et al., 2017).
The copy Number variation on our passages was done using LdBPK282A1 as reference, with an e-value cut-off of 1e−5 for filtering. Three hundred and eighty CNV events were qualified in late passage and 365 CNV events were qualified in early passage (Supplementary Data Sheet S2). The size of CNV events ranged from 0.7 to 271 kb. We checked for the regions which were overlapping with the protein coding genes. A total of 230 deletion events and 135 duplication events were identified in the early passage while 234 deletions events and 146 duplication events were identified in the late passage genome (Supplementary Data Sheet S2). Uniq gene lists that had undergone changes are listed in Supplementary Data Sheet S2: uniq_CNV. The genes undergoing structural changes in later passages compared to earlier passages comprised of several ABC Transporters (10), Amino acid transporters (7), Amastins (3), GP63, calpain like cysteine proteases to name a few. Interestingly, we have also reported changes in gene   14,18,29 * One or more genes not present due to assembly gaps. * * Gene prediction error. # Gene absent in the scaffold due to assembly gaps.
expression profiling of GP63 and Calpain like cysteine proteases between the late and early passages. We found more CNV events in chromosomes 5, 6, 8, 15, and 31 in the genome of early passage as per earlier reports (Laffitte et al., 2016;Iantorno et al., 2017). High number of CNV events play a major part in the gene expression regulation of in vitro cultured promastigotes (Dumetz et al., 2017). Interestingly we observed an increased frequency of CNVs in the protein coding regions of the late passage particularly in chromosome 3, 4, 13, 16, and 20. Protein coding regions corresponding to calpain like cysteine protease genes on chromosome 20 displayed multiple duplication events. Phosphoglycerate kinase B, cytosolic fragment on chromosome 20 showed an insertion event in the late passage which was not found in the earlier passage. This gene was reported to be over expressed in Srilankan L. donovani strains causing visceral disease versus those causing cutaneous lesions (Zhang et al., 2014). Noticeable protein coding genes falling in the variant regions on chromosome 13 of late passage genome were some acetyl transferases including histone acetyltrasferase, N-acetyl transferase subunit ARD1, RAS-related protein RAB5, mitogen activated protein kinase 2, etc. which may have a role in cell cycle progression and morphogenesis of Leishmania (Wiese, 1998;Yadav et al., 2016).

Single Nucleotide Polymorphisms in the ABC Transporter Coding Genes
The evolution of pathogenicity in microbes has been attributed to ordered changes in the functionality of the genes as a result of physiological constraints encountered by the organism in their immediate environment. A KEGG pathway analysis didn't show major differences in the number of genes involved (Supplementary Data Sheet S3) although detailed mutational analysis revealed interesting changes linked directly or indirectly to loss of virulence in the later passage. Among the defense gene products, ABC transporters are important proteins involved in drug resistance, nutrient acquisition and pathogen virulence (Glavinas et al., 2004) and have expanded in the pathogenic protists as a parasitic adaptation (Zhang et al., 2014;Jackson et al., 2016). Genomes of L. donovani AG83 in early and late passages contain 42 copies of the genes spread over the genome. However, there are at least two instances where the ABC transporters are undergoing polymorphism that could result in reduced pathogen fitness. ABC transporter at LDON_230007700.1; HTI5:chr23: 91219-96174 in late passage has undergone several changes at the nucleotide level leading to an inactive ABC transporter in the late passages. Our Copy Number variant analysis also detected CNV duplication event of 12000 bp in the region of 86801-98800 in chromosome 23 in late passage. In earlier passages, a functional ABC transporter [HTI4:Chr23:91908-92025] resided inside a larger gene locus at HTI4:chr23:89807-101617 (Figure 4). We found a CNV duplication event as well in chromosome 23: 86801-98700 in early passage that corroborates this finding. Interestingly, there is another ABC transporter gene in chromosome 31 with gene id LDON_310017500.1 in HTI4 and gene id LDON_310017000.1 in HTI5 where a substitution at 5012th position (GGG ->GCA a non-synonymous substitution which leads to a change from G->A amino acid at 1671st position in HTI5 (Figure 5A). The same gene undergoes two substitutions at the 5010th position (from AGC ->AGT) ( Figure 5B) leading to synonymous substitution. The Genbank L. donovani strain LdBPK282A1 gene corresponding to this gene has the ' A' variant in this locus ( Figure 5C). This observation is also supported by our CNV analysis in chromosome 31 as well as by Imamura et al. (2016). This gene codes for pentamidine resistance transporter protein. Major facilitator Protein (MFS class) is responsible for solute transport via membrane. Recently their role in stress responses and virulence has been proposed by various groups (Shah et al., 2014;Zhang et al., 2014). One such gene in HTI4 has been pseudogenized and this gene is absent in HTI5 (assembly gaps can't be ruled out as one of the plausible causes though). A small segment of the gene (297-360 bp) is copied at two places both in HTI5 and HTI4 with one point non-synonymous mutation (AAC->AGC) (gene id: LDON_290021200 in HTI4) resulting in N ->S (Figure 6). The presence of this duplicated domain and its role in gene regulation and pathogenicity may be intriguing. Functional members of this family is present in duplicates in HTI5 as well as HTI4 (HTI5: Chr29: 711070-711429; HTI5: Chr29: 710625-710970 and HTI4: chr29: 711414-711772; HTI4:chr29: 710661-711007). This indicates selection pressure is working on MFS classes of proteins. Studies on pathogenic organisms demonstrate that the acquisition of iron is very important in the intracellular pathogenesis process and Leishmania possesses molecular machinery for iron regulation (Huynh and Andrews, 2008). Interestingly, these transporter proteins have links to the MFS superfamily (Huynh et al., 2006) and insight into genetic structure, mechanism and regulation via domain duplication in this class of proteins appears intriguing in designing targeted therapies.
A fully functional acetyl-CoA synthatase gene in HTI4 has undergone modification [chr23: 199275-199598 in HTI5 and chr23: 198391-19851 in HTI4] leading to loss of function (Figure 7). There is also domain duplication in HTI4: LDON_230010500.1 from 1 to 118th position. The stringency in nutrient acquisition is key to preparedness of the parasite to intracellular environment (Carman et al., 2008) and loss of function of acetyl-CoA synthetase may restrict the parasite's capacity to thrive in nutrient poor conditions using alternative carbon sources (McConville et al., 2015). Domain duplication in the early passage promastigotes (HTI4) indicate enhanced intracellular survival strategy.
Calpain like proteases play a very important role in infection process. A single insertion in 172519th position in HTI5 on chromosome 27 (Chr27: 172458-188744) leads to frameshift mutation causing this gene to become a pseudogene. The corresponding functional gene in early passage is present at Chr27: 172465-188750 (Figure 8). In case of Genbank L. donovani genome, the protein sequence is more identical to the HTI5 protein sequence. Recent reports have pointed toward the regulatory role played by expressed pseudogenes in cancer cells and parasites (Wen et al., 2011). To check the downstream impact of mutations in ABC transporters and CALPs, we checked the relative sensitivity of early and late passage promastigotes to the drug miltefosine. Early passage parasites displayed an IC 50 of 30 µM whereas the late passage promastigotes showed increased apoptotic death under drug pressure and an IC 50 of 18 µM (Supplementary Figure S3). Interestingly, parasite CALPs may serve other functions in the intracellular form which determine disease outcome and host responses (Branquinha et al., 2013) and are thus potent drug targets. This may open up new avenues in understanding Leishmania biology. The implications of these modifications need further investigation.

Comparative Transcriptome Profiles Between Passages Reveal Calpain Proteins Play an Important Role in Maintaining Virulence
The information from genomic studies done in this work as well as previous studies on axenic promastigotes indicated a strong link between parasite pre-adaptation and virulence. We hypothesized that continuous axenic cultivation may lead to altered transcript expression. To check for specific changes we analyzed the differential levels of transcript expression in three different culture stages (early, intermediate, and late passages) using RNAseq. Overall, there was no massive transcript expression switch between the early and late passages (Figure 9 and Supplementary Figure S1) although specific genes presented differential expression (Supplementary Data Sheet S4) which correlated to certain extent with the CNV. The relative gene expression of a few genes was validated by RT-PCR (Supplementary Figure S4). Among the genes coding for surface active proteins, there was down regulation in membrane bound acid phosphatase and surface antigen like protein transcript levels (Supplementary Data Sheet S4) in the later passages. Acid phosphatase activity is needed for virulence and is also involved in endosome sorting (Katakura and Kobayashi, 1988). This may also be an adaptive response to the hydrolytic environment encountered inside sandfly gut and mammalian cells (Papadaki et al., 2015). Cyclin-dependent kinase pho85like protein is a morphogenesis related protein having strong link to virulence in Ustilago (Castillo-Lluva et al., 2007) although its function in Leishmania has not been elucidated to date; it is implicated in environmental signaling in yeast (Carroll and O'Shea, 2002). This gene was under-expressed in the later passages. Among the cytoskeletal and flagellar proteins, a  paraflagellar rod protein 1D (PFR 1D) was over expressed in early passage compared to the intermediate passage consistent with previous proteomic data (Magalhaes et al., 2014) on differentially passaged promastigotes. PFR 1D is essential for proper flagellar motility, a marker of promastigote virulence (Ginger et al., 2008) and it seems that flagellar morphology gets affected after few   passages. Some of the calpains have lost protease domain to gain microtubule organization function like SMP-1 (Branquinha et al., 2013). SMP-1 is required in all promastigote stages and also for transformation of amastigotes to promastigotes for the development of flagella (Tull et al., 2010). Thus over expression of these atypical CALPs in the later passages indicate struggle of the promastigotes to convert to infective forms. The role of calpains in signal transduction and cytoskeletal remodeling is well documented in trypanosomatids (Galetovic et al., 2011), and structural deformities may affect the virulence function of the parasite.
Metabolic reprogramming is a hallmark of Leishmania life-cycle where the parasite switches between two extreme environmental conditions. In the sandfly gut glucose and amino acids are the primary source of carbon while once inside the host, fatty acids and amino acids provide the necessary carbon for metabolism. Increased transcript abundance of 3-hydroxyisobutyryl-coenzyme a hydrolase-like protein in the early passages relative to the late is in consistency with the preparedness of the infection ready parasites to use beta-oxidation of fatty acids, as observed by others (Alcolea et al., 2009) (Supplementary Data Sheet S4). As opposed to the earlier observations (Moreno et al., 2014), level of tyrosine amino transferase presented a reverse trend, as it was slightly downregulated in the early passage, although protein expression may reveal different results as observed previously for Trypanosoma cruzi (dos Santos et al., 2012). Purine acquisition and inter-conversion is an important determinant of environmental sensing and morphogenesis in Leishmania. We observed a decrease in expression of a putative ribonucleosidediphosphate reductase small chain, involved in this pathway in the late passage promastigotes in sync with earlier observation in L. donovani under purine stressed condition (Martin et al., 2014). On the other hand an ATP diphosphohydrolase was upregulated in the virulent parasites indicating preparedness for hostcell invasion and immunomodulation (Figueiredo et al., 2016) reflected in the differential host response in our infection studies. Amino acid transporter (aATP11) which has also been associated with purine starved condition (Martin et al., 2014) triggering parasite differentiation to metacyclic form (Inbar et al., 2017) was upregulated in the late passage, again reinstating the fact that these parasites lag in morphological transformation and try hard to attain infective status when growing in culture for a long time. Prostaglandin f2-alpha synthase involved in prostaglandin biosynthetic pathway is reduced in expression in the more infective promastigotes while the gene expression is more pronounced in the later passages indicating non-completion of metacyclogenesis as reported earlier (Araujo-Santos et al., 2014). Leishmania salvage folate and pteridines from their host or vector as they cannot synthesize them. The expression of these genes is higher in infective promastigotes and is known to aid in survival under extreme intra macrophagic conditions (Papadopoulou et al., 2002;Alcolea et al., 2010). As expected a gene for this protein on chromosome 6 was upregulated till 11th passage followed by a significant under expression in the late passage (Figure 9 and Supplementary Data Sheet S4).
Altogether it signifies that nutrient availability is strongly linked to gene expression in Leishmania and that many metabolites and classical metabolic enzymes are capable of altering the state of cellular differentiation and development as has been extensively studied in cancer cells (van der Knaap and Verrijzer, 2016). Metallopeptidases are important virulence factors in Leishmania and we already observed differential regulation of the major surface peptidase gp63 at protein level. Here, we observed slightly upregulated expression of two peptidases a metallopeptidase and a thimet oligopeptidase in the late passage. Transcriptomic data from late logarithmic phase promastigotes of Trypanosoma brucei growing in normal and purine supplemented medium displayed under expression of thimet oligopeptidase A in the former system linking it to purine stressed condition when approaching metacyclic state (Fernandez-Moya et al., 2014) corroborating our finding.
FIGURE 9 | Hierarchical Clustering Heatmap of differentially expressed genes in early, intermediate and late passages. Heat maps were generated to show the comparative log 2 abundance ratios between early, intermediate, and late passages using RNAseq data. Upregulated proteins are depicted by blue bars, downregulated proteins by red bars.
As pointed out above, genetic and epigenetic control of gene expression is central to adaptation of the parasite to its immediate and prospective environment. Consistent with that we observed an increased abundance of a histone-like transcription factor (CBF/NF-Y) and archaeal histone transcript in early passage relative to the late passage. The histone modifying enzyme histone deacetylase was also over expressed in the early passage. This protein has been associated with the disease causing stage of the parasite (Vergnes et al., 2005) and its higher activity may be linked to more invasiveness in the parasite. The higher protein level expression of HDAC in early passage promastigotes was also confirmed by western blot on whole cell lysates (Supplementary Figure S5). Together with this, protein modifying enzyme farnesyltransferase presented a downward trend in the late passage (Figure 9 and Supplementary Data Sheet S4). This gene product has been associated with severe growth impairment in trypanosomatids (Gillespie et al., 2007). A putative pre-mRNA splicing factor ATP-dependent RNA helicase, member of DEAD box RNA helicase family presented increased expression till 11th passage. These proteins were shown to be strongly associated with mRNA of differentiating promastigotes as well as interactome of in vitro cultured amastigotes (Saxena et al., 2007). These proteins are mostly transiently upregulated or downregulated controlling the translational machinery. One of them has been associated with growth defects in L. infantum (Barhoumi et al., 2006). DNA repair related MutS-like proteins play important role in preventing genotoxicity and oxidative damage, a survival strategy adapted by many prokaryotic and eukaryotic pathogens (Genois et al., 2014), was upregulated in the virulent passage. Among the other differentially expressed transcripts are non-protein coding rRNA genes and conserved proteins with unknown functions. Non-coding RNAs along with proteins in the secreted exosomal milieu of Leishmania are known to modulate host-parasite interaction and changes in the composition of these exosomes may affect infectivity (Lambertz et al., 2015).
The results indicate that subtle genomic and transcriptomic variations drive the adaptation of promastigotes to culture condition and these changes mostly lead to loss of virulence in addition to morphological and metabolic modifications. Of particular interest is the role played by peptidases of different classes mainly metallopeptidases and calpain-like cysteine proteases in host-parasite interaction. Another important aspect is the importance of purine stress and modes of purine acquisition which modulate parasite and host responses to infection.

Significant Regulation of Virulence Proteins in Comparative Protein Expression Studies
Till date the loss of infectivity in late passage culture has been explained by reduced expression of virulence factors, particularly those which can interact with and modulate the host cells. Proteomic comparison of crude membrane extracts of L. donovani AG83 promastigotes (LAg) revealed altered protein expression, particularly a decrease in expression of some known virulence factors after continuous passaging (Table 3 and Supplementary Figure S6). There was very little overlap between transcriptomic and proteomic data which reinstates the fact that leishmanial gene expression is majorly controlled at post-transcriptional level (Lahav et al., 2011). A total of 35 differentially expressed spots were identified, out of which 31 spots presented higher expression in the early passage while four spots were highly expressed in the late passage. For example, gp63 (Olivier et al., 2012), HSP 70 (Ramirez et al., 2013a), elongation factor 1 alpha (Nandan et al., 2003), etc. have all been reported as virulence factors and were over expressed in the early passage. Western blot analysis on whole cell lysates of early and late passage promastigotes confirmed increased expression of gp63 in the virulent parasites (Supplementary Figure S5). The pyruvate dehydrogenase complex and its component dihydrolipoamide acetyltransferase has been associated with multiple functions including oxidative defense and regulation of gene expression in various pathogenic microbes (Spalding and Prigge, 2010), and presented an increased expression in the early passage. Apart from these, many hypothetical proteins also displayed differential protein expression and may determine the invasive capacity of the parasite. Transcript expression of each gene is also presented in Table 3 and presented a similar trend. Most of the proteins which displayed differential expression in our study form a part of the infectious exosomal proteins (Silverman et al., 2010). The changes in exosomal milieu probably changes the modulatory capacity of the parasites of late passage inside macrophages, which in turn make them more susceptible to microbicidal attack. The ultimate protein repertoire required for successful hostparasite interaction is the result of ordered changes in the genetic make-up and/or gene expression in the promastigotes.
An upregulated calpain like protein related to small myristoylated protein-1 (SMP-1) may also signify the struggle of late passage promastigotes to transform into infective metacyclic forms and as already mentioned in the previous section. It is also interesting to note that some calpain-like proteins, particularly those lacking the protease domain, are upregulated in the late passages while some others with protease activity are downregulated or psedogenized in the avirulent late passage parasites. Another consistently upregulated protein in the late passage was a putative ATPase beta subunit. Although there is no report in protozoans, a report linked increased acid sensitivity of bacterial pathogen to ATPase overexpression (McEntire et al., 2004). This may explain partly the non-viability of late passage parasites inside acidic host phagolysosomes. Beta tubulin was significantly up regulated in the non-infectious stage of the pathogen (Coulson et al., 1996). On the contrary alpha tubulin was mostly down regulated in the later passages. Tubulin plays a very important role in cytoskeleton formation and its expression is post-transcriptionally controlled and tightly linked to parasite transformation (Ramirez et al., 2013b). During the infection cycle, the pathogen utilizes hosts cytoskeletal machinery for pathogenicity (Gruenheid and Finlay, 2003) probably leading to less expression of these proteins.
Our study indicates that pathogenicity in L. donovani is a complex mechanism wherein the parasites are pre-adapted to a pathogenic lifestyle where nutritional stress and environmental sensing lead to global changes in both genes and gene expressions,  ultimately leading to the production of effector molecules inside the host culminating in infection. The heteroxenic lifecycle and pleomorphic nature of the parasite adds another dimension to this process as successful transformation to metacyclic promastigotes determines virulence phenotype both in in vivo (other studies) in sandfly and in vitro (others and our study) in culture medium. It is re-established from these studies that long term cultivation of Leishmania promastigotes in axenic environment diminishes their adaptability toward intracellular life, and many of the factors involved in this process are markers of natural virulence attenuation. Leishmania have evolved several strategies to sense their immediate environment individually as well as collectively and respond to the same by controlling gene expression, and ultimately cell division and transformation. Transporter proteins are known to be involved in this pathway in other pathogens (Hlavacek et al., 2009). Continuous axenic cultivation possibly reverses this pathoadaptive response by inducing mutations in these genes which leads to loss of virulence. Further, pseudogenization of calpain like proteins involved in signal transduction and parasite transformation results in growth arrest at non-infective stages. Studies on in vitro cultured drug resistant and sensitive clinical isolates of L. donovani provided evidence that the former system has added advantage of increased metacyclogenesis in culture and higher infectivity index in host cells (Vanaerschot et al., 2010), which is also reflected in the miltefosine sensitivity assay done on the early and late passage promastigotes. Moreover, early and late passage promastigotes display differential infectivity (this work) because interaction of virulent L. donovani with host macrophages triggers differential transcriptional modulation in the latter as compared to avirulent strains (manuscript communicated elsewhere). A parallel gene expression study in the intracellular amastigotes also pointed toward higher expression of transporter and calcium-dependent cysteine-type endo peptidase activity genes (GO: 0005215; GO: 0004198) in more virulent forms (manuscript communicated elsewhere), strengthening the relevance of pre-adaptation in virulent parasites. The most important factor responsible for adaptive changes is the nutritional status of the immediate environment, sensing which the promastigotes undergo morphological and metabolic switch. This phenomenon is also strongly connected with transporter system in parasitic protists (Dean et al., 2014). In our genetic studies, we found domain duplication in acetyl CoA synthetase gene in the early passage promastigotes. Recent works have highlighted the importance of this enzyme in L. donovani infectivity owing to lipid and ergosterol biosynthesis (Soumya et al., 2017). On a broader note, this enzyme and its product acetyl CoA has been linked to global gene expression and chromatin regulation (van der Knaap and Verrijzer, 2016). This supports the theory of differential gene expression, and epigenetic changes in the virulent versus avirulent parasites, and reiterates the fact that nutrition and transcription are tightly linked. Another facet of the study relates to the importance of DNA repair and recombination activity prevalent in these parasites which promote antigenic variation suitable for intracellular adaptability and pathogenicity and needs further delving. This was indicated by differential regulation of MutS like protein at transcriptional level which has been implicated in quorum sensing independent developmental regulation in Trypanosoma recently (Zimmermann et al., 2017). Nevertheless, this in vitro cultured parasite model of gene regulation and virulence provides a basis for understanding not merely parasite adaptation to culture conditions and its genetic basis but also underlines the fact that common pathways are involved in intraspecific and intrastrain variations observed in Leishmania. Most of the analysis done so far in literature is based on comparison of several clinical isolates in Leishmania. However, comparison of various cultured passages of promastigotes are reported here.
Interestingly, in this study, we report a large number of ABC transporters undergoing structural changes in later passages, thus their inactivation may be a possible cause of loss of virulence. We have also found that in addition to several important genes such as GP63 and heat shock proteins which are already known to play major role in virulence, calpain like cysteine proteases may have significant role in loss of virulence since the genes undergo alteration and the expression levels change.
This work re-establishes the findings of other studies at transcriptomic and proteomic level to exercise caution while using serially passaged promastigotes for infectivity and therapeutic studies but also adds genetic analysis to this list. The revelation from our study that the NCBI reference strain was derived from clinical isolate which was cultured in vitro for multiple passages reiterates that subtle changes in the genome due to axenic cultivation may have important implications on the virulence phenotype. Moreover, similar genomic modifications may come into play when Leishmania infects different hosts, different vectors or even different sites of the same host which leads to differential host responses (Elso et al., 2004). Thus within-host or even within-vector selection pressure appears to be the major factor driving the modifications in the parasite which make them adaptable to their environment (Shaw, 1997), and is accompanied by no major gene losses but rather subtle polymorphisms that possibly alter the functionality of the genes. It is further evidenced from the higher expression of transporter and calpain-like proteins in visceralizing L. donovani strains from Sri Lanka (Zhang et al., 2014) compared to strains causing cutaneous lesions. Virulence is a function of nutritional status of the parasites and more time spent in the nutrient rich culture medium diminishes their pathoadaptive characteristics. Significant is the fact that changes in genes and gene expression occur in same/similar set of genes both in culture and in nature for virulence attenuation and cultured promastigotes can well mimic their sandfly counterparts but long-term cultures should be avoided. Changes in transporters and certain housekeeping genes can be markers of virulence. Additionally the new class of calpain peptidases are emerging as important virulence determinants along with more established metallopeptidases and cathepsin-like cysteine peptidases.

CONCLUSION
The work presented here highlights the importance of newly emerging pathoadaptive factors like transporters and calpain-like proteins in Leishmania virulence. Our work can add to the pool of information on adaptive genomics in L. donovani and careful mining of these genes can be used for virulence surveillance at least throughout the Indian subcontinent. The genome sequences of early and late passage promastigotes can act as references for future genomic studies on Leishmania, particularly related to virulence and drug resistance. Annotated leads from this study and further annotation and functional characterization of hypothetical proteins can provide novel drug/vaccine targets for disease management.