Epitope-Based Immunoinformatics and Molecular Docking Studies of Nucleocapsid Protein and Ovarian Tumor Domain of Crimean–Congo Hemorrhagic Fever Virus

Crimean–Congo hemorrhagic fever virus (CCHFV), the fatal human pathogen is transmitted to humans by tick bite, or exposure to infected blood or tissues of infected livestock. The CCHFV genome consists of three RNA segments namely, S, M, and L. The unusual large viral L protein has an ovarian tumor (OTU) protease domain located in the N terminus. It is likely that the protein may be autoproteolytically cleaved to generate the active virus L polymerase with additional functions. Identification of the epitope regions of the virus is important for the diagnosis, phylogeny studies, and drug discovery. Early diagnosis and treatment of CCHF infection is critical to the survival of patients and the control of the disease. In this study, we undertook different in silico approaches using molecular docking and immunoinformatics tools to predict epitopes which can be helpful for vaccine designing. Small molecule ligands against OTU domain and protein–protein interaction between a viral and a host protein have been studied using docking tools.


INTRODUCTION
Crimean-Congo hemorrhagic fever (CCHF) is a viral hemorrhagic fever of the Nairovirus group, primarily distinguished as zoonosis. Occurrence rate of sporadic cases and outbreaks of CCHF affecting humans have been reported. It has a history of endemic infections in South Africa, Pakistan, Kosovo, Turkey, United Arab Emirates and some parts of Asia. It is so named because it was first observed in Cremia of Congo in 1944(WHO, 2011. It belongs to Bunyaviridae family of Nairovirus group. It spreads in animals through ticks or fleas and can infect human beings through the animal reservoir (Whitehouse, 2004). CCHF virus, abbreviated as CCHFV, recently noted in Gujarat, killing four people, the first outbreak reported in India. National Institute of Virology (NIV), India confirmed this outbreak and found to be due to hyalomma ticks (International Society for Infectious Diseases, 2011). The infected person normally develops febrile illness with headache, joint, lumbar, and abdominal pain with nausea, initially. After 3-5 days, hemorrhaging develops with skin discoloration and blood in saliva, urine, and vomit, leading to vascular collapse and death around 10 days after onset (WHO, 2011).
Crimean-Congo hemorrhagic fever virus has a tripartite genome consisting of large (L), medium (M), and small (S) negative-sense RNA segments that encode the viral RNA polymerase, the glycoproteins, and the nucleocapsid protein (NP), respectively (Clerx et al., 1981). Viral infection is detected by host cells via Toll-like receptor (TLR)-dependent and -independent mechanisms which results in ubiquitination and ISGylation of target proteins to induce the innate immune system. Ubiquitination involves a regulatory protein called Ubiquitin (Ub) which conjugates with target (viral) proteins and forms an Ub-tag which in turn, recognized by proteosome for destruction (Staheli et al., 2007). ISGylation engages an Ub-like molecule known as interferon-stimulated gene product 15 (ISG15), which covalently conjugates and induce innate response (Arguello and Hiscott, 2007). The viral L protein has an ovarian tumor (OTU) domain of cysteine protease family, located in N terminus, is dispensable for RNA-dependent RNA polymerase (RdRp) and deubiquitinating (DUB) activity. In order to invade host antiviral mechanisms (Figure 1), OTU domain conjugates with Ub and ISG15 to establish viral state (Bergeron et al., 2010). It was also reported that host system makes use of interferon (IFN) -induced defense mechanism against a number of viruses belonging to Bunyaviridae. MxA protein, a member of dynamin superfamily of large GTPases is exclusively induced by alpha and beta IFNs (Haller and Kochs, 2002). It has been found that MxA interacted with NP and the yield of progeny virus was reduced to 1,000-fold in CCHFV infected host cells. Hence, it is clear that human MxA protein has antiviral activity against CCFHV (Andersson et al., 2004). Current status for the treatment of CCHFV infections utilizes ribavirin to combat the infection (WHO, 2011). Treatment in infected individuals was shown to exhibit the in vivo genotoxicity and potential side effects caused by ribavirin (Tatar et al., 2005).
In the present study, in silico drug designing and immunoinformatics strategies have been exploited using bioinformatics software and programs. Epitope-based immunoinformatics study was carried out for NP in order to predict informative epitopes www.frontiersin.org which can be helpful for future vaccine designing. To examine the protein-protein interaction and the background of biochemical functionality between a viral and a host protein, protein-protein docking analysis of NP with human MxA protein was performed through structure-based approach. Finally, identification of effective ligand molecules to inhibit viral OTU domain were made using small molecule docking simulations.

Protein sequence retrieval and epitope prediction parameters
We retrieved the protein sequence of NP of CCHFV (Accession No. AAX86921.1) from National Centre for Biotechnology Institute (NCBI) database (Benson et al., 2005). The locations of continuous epitopes have been correlated with various parameters (such as hydrophilicity, flexibility, accessibility, turns, exposed surface, polarity, and antigenic propensity of polypeptide chain). The epitope predictions are based on propensity scales for each of the 20 amino acids.

Assessment of solvent accessibility regions
The prediction for regions of surface accessibility is based on Emini's surface accessibility scale. Accessibility profile is predicted using the formula Sn = ( i−1 Π 6 δn + 4 + i) × (0.37) −6 where Sn is the surface probability, δn is the fractional surface probability value, and i vary from 1 to 6. A hexapeptide sequence with Sn greater than 1.0 indicates an increased probability for being found on the surface (Emini et al., 1985).

Epitope flexibility prediction
To strengthen the prediction accuracy, we used the Karplus and Schulz flexibility scale. This scale is derived from the mobility of protein segments based on the known temperature B factors of the alpha carbons of 31 proteins of known structure. The calculation is performed taking the center amino acid as the first amino acid of the six amino acid window length (threshold setting = 1.000; Karplus and Schulz, 1985).

Prediction of antigenicity
Peptides could be predicted in such a manner that it should possess the antigenic character and are likely to be antibody responsive. Antigenicity prediction was carried out using Kolaskar and Tongaonkar antigenicity scale. This prediction is based on a semiempirical approach, developed on physicochemical properties of amino acid residues and their frequencies of occurrence in experimentally known segmental epitopes and has the efficiency to detect antigenic peptides with about 75% accuracy (threshold setting = 1.000; Kolaskar and Tongaonkar, 1990). The location of linear B-cell epitope was predicted using Bepipred linear epitope prediction which utilizes a combination of a hidden Markov model and a propensity scale method (threshold setting = 0.350; Larsen et al., 2006).

Hydrophobicity and hydrophilicity analysis
The amino acids making up the epitope are usually charged and hydrophilic in nature. Parker hydrophilicity scale was applied to evaluate the hydrophilicity. It is based on peptide retention time in high-performance liquid chromatography (HPLC) on a reversedphase column. Using a window of seven residues, these experimental values are calibrated for each of the seven residues and the arithmetical mean of the seven residue value was assigned to the fourth, (i + 3), residue in the segment (threshold setting = 1.678; Parker et al., 1986).

Frontiers in Genetics | Bioinformatics and Computational Biology
To understand the distribution of polar and apolar residues along a protein sequence, hydrophobicity plot was studied using Kyte-Doolittle hydropathicity index (threshold setting ≥0 are hydrophobic) available at Expasy Protscale server (Gasteiger et al., 2005). Kyte-Doolittle is a widely applied scale for delineating hydrophobic character of a protein and useful in predicting membrane-spanning domains, potential antigenic sites, and regions that are likely exposed on the protein surface (Kyte and Doolittle, 1982).

HLA binding peptide prediction
Major histocompatibility complexes (MHC-I and MHC-II) display specificity for binding with their respective epitopes. In human, these MHC molecules are known by human leukocyte antigen (HLA) alleles. The HLA binding peptides were predicted (threshold setting = 3.000) using TmhcPred server. This prediction is based on the virtual and quantitative matrices based on 97 MHC alleles using position specific scoring matrices (PSSMs) and utilizes supervised learning method called support vector machine (SVM; Bhasin et al., 2003).
Note that the predictions based on the above mentioned scales viz. Emini, Karplus and Schulz, Kolaskar and Tongaonkar, Bepipred linear epitope prediction and Parker are different webbased tools centralized in a repository implemented by immune epitope database and analysis resource (IEDB) for the prediction and analysis of immune epitopes (Zhang et al., 2008).

Ab initio modeling of NP
An attempt was made to construct a homology model of NP using Swiss Model (Schwede et al., 2003). No suitable template was found by target-template alignment programs such as BLAST and HHSearch with the significant cutoffs for template identification (BLAST: e-value = 0.0001; HHSearch: e-value = 0.0001 and p-value = 50). Therefore, an ab initio model was constructed using Bhageerath server. Bhageerath, an energy based approach which narrow downs the search space of tertiary structures employing sequence and secondary structure information as building stage; and later filtered by atomic potential based biophysical methodology to present 10 plausible candidate structures (Jayaram et al., 2006).

Structure validation and energy minimization of modeled NP
Modeled NP structure was validated for structure correctness and stereochemistry using Ramachandran plot (Ramachandran et al., 1963) from RAMPAGE server (Lovell et al., 2003). The validated structure was subjected to energy minimization using NOMAD-Ref server (Lindahl et al., 2006). Normal mode analysis, deformation, and refinement, abbreviated as NOMAD-Ref, make use of elastic network model (ENM) as well as classical force fields to calculate normal modes for structural refinement and optimization.

Functional site identification of MxA protein
The protein structure of human MxA protein (Gao et al., 2010) was retrieved from Protein Databank (PDB ID: 3LJB; Bernstein et al., 1977). The MxA protein structure is comprised of two unique chains, functional site identification was carried out for a single chain using ConSurf server (Ashkenazy et al., 2010). This enables the identification of functionally important regions on the surface of a protein or domain of known three-dimensional protein structure and relied on the phylogenetic relations between its close sequence homologs.

Protein-protein docking
Protein-protein docking simulations were performed using global range molecular matching (GRAMM)-X protein-protein docking web server v.1.2.0 (Tovchigrechko and Vakser, 2006). In this process, optimal functional site of MxA protein deduced by Con-Surf was implemented as potential interface region for docking simulations in order to avoid blind search and modeled NP was given as ligand input and best 10 returned docked conformations were analyzed. GRAMM, an empirical approach which smoothes the intermolecular energy function by changing the range of the atom-atom potentials, locates the area of the global minimum of intermolecular energy using a characteristic six-dimensional exhaustive search through the relative translations and rotations of the molecules. Moreover, this docking simulation was repeated using ClusPro protein-protein docking program (Kozakov et al., 2010). The program recruits PIPER, a FFT (Fast Fourier Transform) based rigid docking program in its first stage to generate 1,000 low energy docked conformations using pairwise interaction potentials. In the second stage, ClusPro clusters these conformations and retains 30 largest clusters having low energy. Later, the retained clusters are analyzed by SDU (Semi-Definite programming based Underestimation) program which predicts the stability of the clusters using medium-range optimization algorithm (resembling the funnel-like behavior of the free energy to attain local minima) and the stable clusters are then refined using Monte-Carlo simulation.

Sequence retrieval and multiple sequence alignment to find active site
Active site of viral OTU domain was identified using multiple sequence alignment (MSA) with human Ub-specific protease otubains 1 and 2 (Accession No. AAO27702.1 and AAO27703.1). Protein sequences were retrieved from NCBI (Benson et al., 2005) and MSA was performed using EBI ClustalW program (Blosum weighting matrix, gap opening penalty = 10 and gap extension penalty = 0.1; Larkin et al., 2007). Further, active site pocket was structurally analyzed using CASTp program (Dundas et al., 2006).

Preparation of protein target structure and ligand molecules
The protein structure of viral OTU domain protease (James et al., 2011) was retrieved from PDB (ID: 3PT2). The dataset comprised of four ligands having known inhibitory activity chosen from literature, out of which one molecule is N -ethylmaleimide (NEM) and the other three were its analogs. All the molecules were computationally designed using Marvin Sketch 5.2 and subsequently, geometrically optimized using three-dimensional cleaning utility (ChemAxon LLC, 2008).

Protein-small molecule docking
Active site residues predicted from sequence alignment was specified for active site directed molecular docking. Ligand dataset under study were docked separately into the binding site of the receptor using PATCHDOCK (Duhovny et al., 2005). PATCH-DOCK performs docking simulations in three phases viz. molecular shape representation, surface patch matching, and finally, filtering and scoring. Molecular shape representation step initially computes the molecular surface of given molecule, followed by detection of geometric patches using segmentation algorithm. Surface patch matching stage applies geometric hashing and poseclustering techniques to match the filtered-hot spot residues containing patches. The candidate molecule is refined at last from patch penetration between ligand and receptor molecules and ranked by shape complementarity score in its final stage.

NP EPITOPE PREDICTIONS
Epitope prediction for the NP showed regions spanning the sequence between positions 39 and 125, and scored well when all the scales were considered (Figure 2). Kolaskar and Tongaonkar antigenicity scale predicted a 24 length peptide in the positions 16-39. When all the scales were taken into consideration, this peptide did not show any significance in terms of surface accessibility, flexibility, and hydrophobicity/hydrophilicity. Similar setback occurred when a Bepipred linear epitope prediction showed an 11 length peptide in the positions of 8-18, which is in low significance region. Therefore, these two predicted epitopes were discarded. Peptides in the sequence positions 42-48, 68-77, 79-88, and 104-125 predicted using Kolaskar and Tongaonkar antigenicity scale displayed antigenicity. Further, Bepipred analysis (B-cell epitope prediction) revealed a continuous predicted epitope with 13 amino acid residues in the sequence positions 56-68 (Table 1). Beside, we found that these predicted epitope regions were observed in the surface of the modeled NP protein. Hence, the region spanning the sequence positions of 42-48, 56-68, 68-77, and 79-88 will be of greater importance for epitope-based vaccine design.

HLA BINDING PEPTIDES PREDICTIONS OF NP AND OTU DOMAIN
Nucleocapsid protein and OTU domain sequences generated 141 and 179 nanomers, respectively. Promiscuous HLA binders are those peptides which bind most HLA alleles and its respective sub-alleles with increased affinity. Prediction of promiscuous HLA peptide binders displayed similarity as well as variation. Dissimilar pattern was found in NP analysis. For example, H-2Db and H-2Dd alleles expressed different peptides, 96-RVNANTAAL-104 and 71-VEVPKIEQL-79. Similar peptide prediction was observed in OTU domain sequence (PDB ID: 3PT2 chain A). For example, HLA-B7 and B8 alleles exhibited similar peptide 81-EARLVGLSL-89 ( Table 2). Despite the use of scoring system of Tmhcpred server to rank its predictions, an in vitro and pharmacogenetic study will help to detect promiscuous HLA binders.

PROTEIN-PROTEIN DOCKING STUDY OF MxA-NP PROTEINS
Among the 10 plausible candidate structures generated, a best ab initio model for the NP was selected using Ramachandran plot. Selected model exhibited 98% amino acids in allowed regions with the exception of two outliers ( Figure 3A). Subsequently, the selected model displayed the structure with energy minimized to −5760.383 KJ/mol ( Figure 3B).
In order to evade blind docking of MxA protein, functional site identification was carried out to understand the region of protein interaction sites ( Figure 4A). It is well known that MxA protein is a homodimer of unique chains. Dimer interface site Frontiers in Genetics | Bioinformatics and Computational Biology  was predicted to be non-interacting site since it is required for its dimerization. The regions exposed to solvent were predicted to be entirely accessible for protein-protein interaction. Further, MSA of MxA protein with its close sequence homology revealed similar results. Hence, regions excluding dimer interface was utilized as potential receptor-interaction site for protein-protein docking. Protein-protein docking simulations with MxA protein as receptor and modeled NP as ligand molecule was also performed.
Among the best 10 docked conformations, a best docked conformation ( Figure 4B) was selected in which the docked regions between two proteins exhibited significant surface complementarity. To examine the possibility of any penetrations between two molecules, the selected conformation was manually studied using standard three-dimensional protein structure visualizer.
Further, this docking simulation was repeated using Clus-Pro protein-protein docking program to confirm the interactions of MxA-NP protein. Rigid body dock PIPER program yielded a weighted score of −1066.40 Kcal/mol whereas Clus-Pro provided a largest cluster (first cluster as per rank) in which 154 docked conformations were found with cluster center scored −923.80 Kcal/mol. The variation observed in the energy value of lowest energy docked conformer with its cluster centered conformer was −142.60 Kcal/mol. The comparative analysis of the docked conformations generated from GRAMM-X and ClusPro showed that the interacting site was found to be the MxA protein's dimer interface and in good correspondence with the predictions of ConSurf server. Such studies on expression level of human MxA protein in CCHFV infected human subjects will help us to demonstrate that a purified MxA protein usually a product of recombinant genetic engineering will be a best candidate for CCHFV vaccinomics, if the MxA protein expression in infected human cells is expected to below.

PROTEIN-SMALL MOLECULE DOCKING OF VIRAL OTU DOMAIN
Active site prediction of OTU domain was achieved using sequence and structure-based approaches. MSA of OTU domain with human otubains 1 and 2 revealed that a conserved pattern, DGNCFY was preserved in these cysteine proteases. The MSA analysis ( Figure 5A) showed that the active site consisted of Asp40 and Cys43. Further, structure-based active site prediction was made to find whether the conserved pattern containing the two active site residues, Asp40 and Cys43, is exposed or buried. It was observed that conserved motif is exposed and solvent accessible ( Figure 5B).
Small inhibitor molecules targeting the active site residues will be of greater importance for designing specific inhibitors for viral OTU domain. Since OTU domain belongs to cysteine protease family and has DUB activity, the literature survey could not reveal any DUB small molecule inhibitors. Hence, search was carried on for cysteine protease small molecule inhibitors having proven inhibitory activity and the potential to modify active site cysteine residue (Singh and Liu, 2000). N -ethylmaleimide (NEM) and its analogs, N -Phenyl maleimide (NPM), N,N -(1,2-phenylene) dimaleimide (NNPD) and N -(1-pyrenyl) maleimide (NPYM) was used as ligand dataset (Figure 6). Conserved motif was employed as active site residues (residues: 37-42) for active site directed docking procedure using PATCHDOCK. Among the 200 docked conformations generated for each procedure, best conformation was selected using the shape complementarity score and area of docking translations (Table 3; Figure 6). Due to non-specificity of potent cysteine proteases having the ability to harm all the cellular proteins exposing cysteine residues on surface, there arises a requirement of potent cysteine protease inhibitors which should www.frontiersin.org    target the complete conserved motif. On the other hand, fragment based and de novo drug design of DUB inhibitors will prove to be a best drug candidate to treat against this deadliest CCHFV infection.

DISCUSSION
The present study was focused on in silico epitope-based immunoinformatics and molecular docking against the CCHFV deadliest virus. Despite the hindrances to initiate research on CCHFV as it requires biosafety level-4 (BSL-4), the high pathogenicity to establish endemic and the non-availability of effective treatments poses a great threat to human community. These predicted epitopes may play a highly informative and crucial role in antidote production against CCHFV. Common antigenic peptides are also important for synthetic peptide vaccine production or antidote production. Similarly, small molecule based drug designing can also be initiated simultaneously. Highly specific NEM analogs and/or DUB inhibitors designed using fragment based or de novo approaches will prove to be the best drug candidates to www.frontiersin.org establish effective treatment against this infection. The real consequences will arise if this pathogenic form takes the form of a biological warfare as similar to Anthrax. We hope that our current findings would help the pharmacologists as well as scientific research bodies to instigate research on this very recent outbreak.