Contriving Multi-Epitope Subunit of Vaccine for COVID-19: Immunoinformatics Approaches

COVID-19 has recently become the most serious threat to public health, and its prevalence has been increasing at an alarming rate. The incubation period for the virus is ~1–14 days and all age groups may be susceptible to a fatality rate of about 5.9%. COVID-19 is caused by a novel single-stranded, positive (+) sense RNA beta coronavirus. The development of a vaccine for SARS-CoV-2 is an urgent need worldwide. Immunoinformatics approaches are both cost-effective and convenient, as in silico predictions can reduce the number of experiments needed. In this study, with the aid of immunoinformatics tools, we tried to design a multi-epitope vaccine that can be used for the prevention and treatment of COVID-19. The epitopes were computed by using B cells, cytotoxic T lymphocytes (CTL), and helper T lymphocytes (HTL) base on the proteins of SARS-CoV-2. A vaccine was devised by fusing together the B cell, HTL, and CTL epitopes with linkers. To enhance the immunogenicity, the β-defensin (45 mer) amino acid sequence, and pan-HLA DR binding epitopes (13aa) were adjoined to the N-terminal of the vaccine with the help of the EAAAK linker. To enable the intracellular delivery of the modeled vaccine, a TAT sequence (11aa) was appended to C-terminal. Linkers play vital roles in producing an extended conformation (flexibility), protein folding, and separation of functional domains, and therefore, make the protein structure more stable. The secondary and three-dimensional (3D) structure of the final vaccine was then predicted. Furthermore, the complex between the final vaccine and immune receptors (toll-like receptor-3 (TLR-3), major histocompatibility complex (MHC-I), and MHC-II) were evaluated by molecular docking. Lastly, to confirm the expression of the designed vaccine, the mRNA of the vaccine was enhanced with the aid of the Java Codon Adaptation Tool, and the secondary structure was generated from Mfold. Then we performed in silico cloning. The final vaccine requires experimental validation to determine its safety and efficacy in controlling SARS-CoV-2 infections.


INTRODUCTION
In December 2019, COVID-19, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was first discovered in China and has rapidly spread across the world. As of 12:00 noon on June 4, a total of 6,392,319 confirmed cases of COVID-19 have been reported globally, including 383,318 deaths. The prevalence of the disease has been increasing at an alarming rate.
There were 1,849,852 cases in the United States, 555,383 in Brazil, 431,715 in Russia, 281,270 in the United Kingdom, and 3,275,736 in a number of other countries (1).
The incubation period for the virus is ∼1-14 days, and all age groups are susceptible to a fatality rate of about 5.9%. The most common clinical manifestations are low-grade fever, dry cough, fatigue, and gastrointestinal symptoms (2). About half of all patients with COVID-19 develop shortness of breath, and severe cases may rapidly develop SARS, septic shock, difficult-to-correct metabolic acidosis, and coagulation disorders (3). COVID-19 may also affect other organs, most commonly the heart and kidneys (4)(5)(6). Some patients may have mild symptoms, without fever, and may recover after 1-4 weeks (7). Other patients may show signs of serious illness and some may die; however, most patients show favorable progress (8). Male individuals with the disease and aged patients have the worst prognosis. In children, the disease is relatively mild (9).
COVID-19 is caused by a novel single-stranded, positive (+) sense RNA beta coronavirus, which is a pathogen of the Coronaviridae family, named SARS-CoV-2 (10). The full-length genome sequences revealed that SARS-CoV-2 has the greatest genetic similarity to bat coronavirus, ∼45-90% similarity to severe acute respiratory syndromerelated coronavirus (SARSr-CoV), and a smaller similarity of 20-60% to the Middle East respiratory syndrome-related coronavirus (MERS-CoV) (10). Thus, a bat might be the original host of SARS-CoV-2, but the intermediate host remains undiscovered (10).
The genes of SARS-CoV-2 encode structural proteins and non-structural proteins. Four structural proteins are absolutely vital for viral assembly and invasion of SARS-CoV-2. Spike protein homotrimers constitute the spikes on the viral surface, and these spikes are responsible for attachment to host cells by binding to their receptors (10). The M protein has three transmembrane domains, which determine the shape of the virion, facilitate membrane curvature, and bind to the nucleocapsid. The E protein plays an important role in virion assembly and release, as well as involved in viral pathogenesis. The N protein has two different domains, both of which bind to the viral RNA genome via totally different mechanisms. In addition, some reports have shown that non-structural proteins are essential for the replication of coronaviruses (10).
Vaccination is a vital tool for the control and elimination of the virus, and the development of a vaccine for SARS-CoV-2 remains an urgent need (11). Traditional methods of vaccine development are time-consuming and very labor-intensive (12). The realm of immunoinformatics tools considers the mechanism of the host immune response to yield additional methodologies in the design of vaccine against diseases are cost-effective and convenient, as in silico predictions can reduce the number of experiments needed (13,14). Dozens of studies have generated epitope-based peptide vaccine of SARS-CoV-2. Baruah and Bose (15) used immunoinformatics tools to discover cytotoxic T lymphocyte (CTL) and B cell epitopes for the spike protein of SARS-CoV-2. Then, Abraham et al. developed a multi-epitope vaccine that was designed using immunoinformatics tools that potentially trigger both CD4 + and CD8 + T-cell immune responses (16).
Although there are many vaccines generated by immunoinformatics tools, most of these are based on spike protein. The spike protein is responsible for attachment to host cells by binding to angiotensin-converting enzyme 2 (ACE2) (17). A vaccine based on the spike protein could induce antibodies to block SARS-COV binding and fusion or neutralize virus infection (18). But there are still many obstacles, spike protein-based SARS vaccine may induce harmful immune responses that cause liver damage of the vaccinated animals (19). Other virus proteins are considered as the candidates for designing vaccine with protective and less harmful immune responses (20). Vaccine-based on structural and non-structural proteins of the virus is revealed potential vaccine inducing protective immune responses (20,21). Pandey et al. reported the more scientifically rigorous strategy of multi-epitope subunits based on multiple proteins against parasitic and viral diseases, such as malaria, visceral leishmaniasis, and HIV (22)(23)(24). In this present, we employed immunoinformatics to predict multiple immunogenic proteins from the SARS-CoV-2 proteome and thereby design a multi-epitope vaccine. These proteins included non-structural and structural sequences of SARS-CoV-2, their reference sequences were retrieved from the National Center for Biotechnology Information (NCBI) database.

Retrieving COVID-19 Protein Sequences
The proteins of the SARS-CoV-2 have been reported and reference could get from NCBI (25,26). The reference sequences of SARS-CoV-2 proteins were retrieved from NCBI Protein Database (https://www.ncbi.nlm.nih.gov/protein) and accession numbers in Table 1, then we stored the reference sequences as a FASTA data type. The proteins with <100 amino acid sequences which are too short to predict epitopes were excluded, the remaining proteins were used for further analysis.

Identifying Antigenicity of Protein Sequences
VaxiJen is the first server for alignment-independent prediction of protective antigens, which overcome the limitations of alignment-dependent methods (27). To identify the potential antigenicity of SARS-CoV-2 proteins, an online prediction server, VaxiJen v2.0 (http://www.ddg-pharmfac.net/vaxijen/ VaxiJen/VaxiJen.html) was used to predict the antigenic values of each protein (28). This identification was applied according to the default parameters of the server. Proteins having antigenicity were sorted according to an antigenic score of ≥ 0.5 (Threshold for this model is 0.5) and were selected for further structural modeling (27).

Structural Modeling of SARS-COV-2 Proteins
There are no available experimental structures of SARS-COV-2 proteins, Phyre 2 provide model regions trough a new ab initio folding simulation with no detectable homology (29). The SARS-CoV-2 proteins were modeled by Phyre 2 server (http://www.sbg. bio.ic.ac.uk/phyre2/). Because the SARS-COV-2 proteins with no detectable homology protein to finish the modeling, we chose the intensive search and output the accurate alignment by the alignment of hidden Markov models. ModRefiner was used by the GalaxyRefine server (http:// galaxy.seoklab.org /cgi-bin/submit.cgi?type=REFINE) (30). The structure assessment was performed by the SWISS-MODEL workspace (https://swissmodel.expasy.org/assess) (31). The three dimensional (3D) models were used for the conformational (discontinuous) B-cell epitope predictions while the sequences were utilized in linear B-cell and T-cell epitope predictions.

Prediction of CTL Epitopes
NetCTL-1.2 is demonstrated to have a high predictive performance (32). The NetCTL 1.2 server (http://www.cbs. dtu.dk/services/NetCTL/) was applied to predict CTL epitopes for the SARS-CoV-2 at the threshold value of 0.75 with high sensitivity and specificity (32). To cover ∼90% of the world's population, three supertypes (A2, A3, and B7) were selected based on artificial neural networks, to predict MHC class I binding epitopes (33). The best candidates for the SARS-CoV-2 vaccine construction were sorted for further prediction, based on a half-maximal inhibitory concentration (IC50) < 500 nm and an integrated score. The IC50 < 500 nm represents epitope has a high affinity to receptor. The integrated score indicated the transporter of antigenic peptides (TAP) transport efficiency, class I binding, and proteasomal cleavage prediction (34)(35)(36). Then the specific Treg epitopes were screened and excluded by the EpiToolKit (https://epivax.com/).

Prediction of Helper T Lymphocyte (HTL) Epitopes
For MHC class II T cell epitope predictions, The Immune Epitope Database server predicted binders based on the percentile rank or MHC binding affinity (37). The Immune Epitope Database server (IEDB; http://tools.iedb.org/mhcii/) was used to predict helper T lymphocyte (HTL) epitopes (37). We chose the combinatorial approach which recommended by IEDB to predict HTL epitopes. The combinatorial approach combined NN-align, SMM-align, CombLib, Sturniolo, and NetMHCIIpan methods (38)(39)(40)(41)(42). The 17 alleles of the human leukocyte antigen (HLA) were selected for the prediction at α and β chains, separately (43). For final construction, epitopes were selected based on their scores (low scores indicated favorable binding), the release of interferongamma (IFN-γ), induction of emergent properties, and the IC50 < 500 nm.

Prediction of IFN-γ Inducing Epitopes
The IFN-γ cytokine makes a major contribution to antiviral mechanisms. It excites both native and specific immune responses by activating macrophages and natural killer cells (44). Further, IFN-γ augments the response of MHC to antigens. The IFN-γ epitope server (http://crdd.osdd.net/raghava/ifnepitope/ scan.php) was used to recognize IFN-γ epitopes (45). We entered the HTL epitopes with low scores into the IFN-γ epitope server. Positive IFN-γ induction was predicted based on the support vector machine (SVM) hybrid approach. The final HTL epitopes were determined based on IFN-γ induction and MHC Class II binding, both of which facilitate the stimulation of T-helper cells (46).

Prediction of Line and Conformational B Cell Epitopes
The ABCpred (http://crdd.osdd.net/raghava/abcpred/) and BepiPred linear epitope prediction (http://tools.iedb.org/bcell/ result/) servers were utilized to predict linear B cell epitopes. The ABCpred server is based on an artificial neural network (ANN) (47,48). The linear B cell epitopes of the SARS-CoV-2 protein were predicted at a threshold of 0.5. The BepiPred linear epitope prediction server is based on seven methods:  (49)(50)(51)(52)(53)(54). We used these seven methods separately to predict the average threshold. The overlap between ABCpred and BepiPred severs was selected to determine the candidate epitopes for the SARS-CoV-2 vaccine construction (55).
Unlike T-cell epitopes that are linear continuous stretches of residues, B-cell epitopes are generally conformational (discontinuous) (56). In this study, the ElliPro servers (http://tools.iedb.org/ellipro/) were applied to predict the conformational B-cells epitopes (57). The server predicts epitopes based on PI (Protrusion Index) value. The epitope with PI = 0.9 would include 90% of residues with 10% being outside the ellipsoid, discontinues B-cells epitopes with the top PI value was selected for vaccine designing (57).

Multi-Epitope Subunit Vaccine Design
To develop the final vaccine, epitopes determined by various immunoinformatics software were linked together with the aid of separate linkers. The CTL epitopes were linked by the AAY linker, HTL epitopes by the GPGPG linker, and B cells were linked by the KK linker (48,58,59). To increase the vaccine immunogenicity, the β-defensin (45 mer) amino acid sequence was adjoined to the N-terminal of the vaccine with the help of the EAAAK linker (60). The β-defensin peptides provoke innate immunity cells and recruit naive T cells through the chemokine receptor-6 (CCR-6) (61). The pan-HLA DR binding epitopes (13aa) as well as added to the N-terminal of the vaccine with the aid of the same linker (59). The pan-HLA DR binding epitopes in vaccine construct facilitating binding to many different types of mouse and human MHC-II alleles to induce CD4-helper cell responses (59). To enable the intracellular delivery of the modeled vaccine, a TAT sequence (11aa) was appended to C-terminal (62). Linkers (AYY, KK, and GPGPG) play vital roles in producing an extended conformation (flexibility), protein folding, and separation of functional domains, and therefore, make the protein structure more stable (59).

Prediction of Allergenicity, Antigenicity
The allergenic proteins induce a harmful immune response, allergenicity of the vaccine should be non-allergic (63). The non-allergic character of the vaccine sequence was evaluated by the AlgPred server (http://www.imtech.res.in/raghava/algpred/) (63). We predicted allergenicity of vaccine sequences choosing a hybrid approach (SVMc+IgE epitope+ARPs BLAST+MAST) with the highest accuracy and sensitivity (63).
The Vaxijen v2.0 server (http://www.ddgpharmfac.net/ vaxijen/VaxiJen/VaxiJen.html) was applied to evaluate the antigenicity of the vaccine (27). The antigenicity prediction method was solely based on the physicochemical properties of proteins without recourse to sequence alignment. The precision rate of the server ranged from 70 to 89%.

Immune Simulations
To determine immune response profile of this multi-epitope vaccine, computational immune simulations were performed by the C-ImmSim online server at (http://kraken.iac.rm.cnr.it/ C-IMMSIM/) (64). The C-ImmSim utilizes the Celada-Seiden model for describing both humoral and cellular profiles of a mammalian immune system against designed vaccine. As per the literature, three injections were administrated at different intervals of 1 month. The simulation was performed with default parameters. The vaccine sequence was administered 4 weeks apart. The simulation volume was 1,000, simulation steps was 1,000, random seed was 12,345, and the vaccine injection with no LPS (64).

Prediction of Various Physicochemical Properties
The ProtParam tool (http://web.expasy.org/protparam/) was used to evaluate the physicochemical properties of the final vaccine protein (65). The physicochemical properties included the number of amino acids, molecular weight, theoretical isoelectric point (pI), amino acid composition, atomic composition, formula, extinction coefficients, estimated half-life, instability index, aliphatic index, and grand average of hydropathicity (GRAVY) (66). The molecular weight and theoretical pI were computed by user-entered sequences. The amino acid and atomic compositions were self-explanatory. The extinction coefficient of a protein was based on information about its amino acid composition. The instability index of a protein indirectly indicated the stability of the protein. If the computed instability index of protein was <40, it was regarded as a stable protein, while values >40 were regarded as unstable.
In vivo half-life evaluation of proteins was based on the principle of the "N-end rule." Furthermore, GRAVY is a measurement of the hydrophobic nature of the protein, which is calculated by determining the total hydropathy of all amino acids divided by the number of amino acid residues in the protein.
To avoid inducing pathogenic priming and autoimmunity, the sequence homology of the final vaccine to human protein was screened by BLASTp online server (https://blast.ncbi.nlm.nih. gov/Blast.cgi) (67). An ideal vaccine should have non-sequence to human proteins.

Prediction, Refinement, and Quality Assessment of the Tertiary Structure of the Developed Vaccine Construct
The designed vaccine was a reconstructed protein with no detectable homology (29). Phyre2 incorporates an ab initio folding simulation to model regions of proteins with no detectable homology. The Phyre 2 server (http://www.sbg.bio. ic.ac. uk/phyre2/) was used to predict the three-dimensional structure of the designed vaccine. The server generates a fulllength 3D model of a protein sequence by employing both multiple template modeling and simplified ab initio folding simulation (29).
To enhance the overall and partial structural quality of the protein, the output 3D structure of the final vaccine from the Phyre 2 server was further refined by the GalaxyRefine server (http://galaxy.seoklab.org/cgi-bin/submit.cgi?type=REFINE) (30). The GalaxyRefine server predicted five refined models of our developed vaccine construct, in which Model 1 was made by the structural perturbation based simply on the clusters of the side chains; whereas, Models 2-5 were generated by deeper perturbations of loops and secondary structural elements (30).
For the assessment of the tertiary structure of the final vaccine protein, a Ramachandran plot was performed by the SWISS-MODEL workspace (https://swissmodel.expasy.org/ assess) (31). The Ramachandran plot illuminates favored regions for backbone dihedral angles against amino acid residues in protein structure (31). The Structure Assessment page shows the most relevant scores provided by Molprobity and help we easily identify where residues of low quality lie in their model or structure (31). Then, ProSA-web (https://prosa.services.came. sbg.ac.at/prosa.php) was employed in the final vaccine protein structure validation. A positive Z-score commonly means an erroneous or erratic section found in the generated 3D protein model (68).

Molecular Docking of the SARS-CoV-2 Vaccine Construct With the Related Antigenic Recognition Receptor
To revealing the binding affinity between the vaccine construct and antigenic recognition receptors of toll-like receptor-3 (TLR3, 2A0Z) and major histocompatibility complex (MHC-I, 4WUU, and MHC-II, 3C5J) present on the surface of immune cells (69). Docking analysis was performed using the ClusPro server (https://cluspro.bu.edu/login.php?redir/queue. php). TLR3 act as receptors for antigenic recognition. The ClusPro server computed the models based on electrostatic interactions and desolvation energy (69). To reconfirm the binding affinity of the designed vaccine construct between these receptors, the PatchDock server (https://bioinfo3d.cs.tau.ac.il/ PatchDock/) was used for docking (70). The server predicted the potential complex with the help of three algorithm-molecular shape representations, surface patch matching, filtering, and scoring (70). After the acquisition of the output from the PatchDock server, the complexes were refined by the FireDock algorithm, which predicted the optimal complex with the aid of energy functions (70).

Molecular Dynamic Simulation
The pdb file of vaccine protein and receptor complex (TLR3, MHC-I, and MHC-II) were used to start the molecular dynamic (MD) simulations. The complexes were placed in a octahedron box of water molecules represented by the threepoint charge SPC model, whose boundary is at least 10 Å from any protein atoms. The solvated protein was subsequently neutralized by chloridions. Covalent bonds involving hydrogen atoms were constrained using the LINCS algorithm, and longrange electrostatic interactions were treated with particle-mesh Ewald employing a real-space cutoff of 10 Å. The system was first briefly minimized with backbone atoms restrained to the initial coordinates to remove close contacts, and the restrained system was gradually heated to 300 K under constant volume conditions in 100 ps. Each system was equilibrated for 1 ns using the constant isothermal-isobaric ensemble at 1 atm and 300 K without any restraints. The Parrinello-Rahman barostat and a V-rescale thermostat were used with an integration time step of 2 fs. Production run MD simulations were performed for 10 ns with coordinates recorded every 10 ps. All simulations were performed using GROMACS 2018.2 along with the GROMOS96 54a7 force field (16,24).

Codon Adaptation and in silico Cloning
For the purpose of cloning, codon adaptation of the designed vaccine was performed for analyzing the codon usage by the prokaryotic organism (Escherichia coli, E. coli). The Java Codon Adaptation tool (http://www.jcat.de/) was used to optimize codon (71). Then the secondary structure of mRNA was predicted by Mfold (http://unafold.rna.albany.edu/? q=mfold) (72). For raising the expression efficiency of the final vaccine protein, the E. coli K12 strain was chosen. For the valid translation of the vaccine gene, we proofread and avoided rho-independent transcription termination, prokaryote ribosome binding site, and cleavage site of restriction enzymes. Restriction endonuclease sites XhoI and BamHI were appended to N and C terminals of vaccine, respectively. Then, it was inserted into the pET28a (+) vector between the XhoI and BamHI. The flow chart of the designed work is shown in Figure 1.

RESULTS
The strategy of vaccine construction is presented in Figure 1.

Antigenicity Analysis of SARS-CoV-2 and Selection of Protein Sequences for Vaccine Construction
The proteome of SARS-CoV-2 was retrieved, which comprised 27 proteins. The reference sequences of those proteins were retrieved in the FASTA format and their details are presented in Table 1. Five proteins with <100 amino acid sequences are too short to predict epitopes (ORF6 protein, ORF10 protein, ORF7b protein, nsp11, and envelope protein) were excluded.
In order to develop a subunit vaccine, it is critical to identify candidate proteins that are important for inducing a protective immune response (27). The remaining 22 proteins sequence were relayed to the VaxiJen server to determine their antigenicity based on the antigenic scores ( Table 1). Proteins with antigenic scores >0.5 were selected for further analysis (28). Nine proteins, namely ORF7a protein, ORF8 protein, nsp9, nsp6, nsp3, endoRNAse, ORF3a protein, membrane glycoprotein, and nucleocapsid phosphoprotein were finally selected for further epitope prediction.
There is no available experimental structures of these nine proteins, we predicted homology models for the nine proteins applying the normal mode of phyre2 online server. The most suitable templates for the nine proteins were identified to be the PBD entries ( Table S1). All of the modeled structures were showed over 90% residues in the Ramachandran favored region Figure S1 and Table S2.

Identification of Cytotoxic T Cell Epitopes
The prediction of CTL epitopes (9 mer) was performed by the NetCTL server. The binder sites were determined based on three supertypes (A2, A3, and B7), with a 95% coverage rate of the world's population. Nine proteins were selected based on antigenicity. One epitope of each supertype was selected based on the highest score and an IC50 value < 500 nm. Then the specific Treg-inducing epitopes were excluded by Epitoolkit. A total of 18 epitopes were selected from nine proteins as the candidates for the construction of the vaccine ( Table 2).

Identification of Line and Conformational B-Cell Epitopes
We used the ABCpred and BepiPred servers to identify the line B cell candidate epitopes. All predicted epitopes from both servers were compared, and only the overlapping epitopes were selected for the development of the vaccine. The line epitopes identified by ABCpred had prediction scores ranging from 0.52 to 0.93, and line epitopes identified by BepiPred had prediction scores ranging from 0.5 to 1. Among these line epitopes, only 12 (16 mer) were found to be common or partly common in both servers ( Table 4). These 12 line epitopes were selected for vaccine construction ( Table 4).
The non-continuous B cell epitopes were predicted by the ElliPro severs, a total number of 27 non-continuous B cell epitopes were generated from ElliPro. Amino acid residues, sequence location, the number of residues, and the PI scores of the predicted conformational epitopes are shown in Table 5 and the graphical depiction of these epitopes can be seen in Figure S2. Twenty-four epitopes were excluded because it added the allergenicity of vaccine, three epitopes were marked red and selected for vaccine construction.

Construction of the Subunit Vaccine
The best candidate epitopes were used for the construction of the vaccine. A total of 18 CTL epitopes, 14 HTL epitopes, 12

FPRGQGVPI (3.82)
The half-maximal inhibitory concentration (IC50) value was > 500 nm, which ensured a higher binding capability of the selected epitopes to MHC molecules.
linear, and three non-continuous B cell epitopes were fused together with the aid of linker sequences. The CTL epitopes were linked by AYY (The AAY liner helps the epitopes produce suitable sites for binding to TAP transporter and enhances epitope presentation), the HTL epitopes were combined with the aid of GPGPG (The GPGPG linker stimulate HTL responses and conserve conformational dependent immunogenicity of helpers as well as antibody epitopes), and B cell epitopes were merged with the aid of KK. The final to enhance vaccine immunogenicity, the human β-defensin-3 sequence (45aa) and pan-HLA DR binding epitopes (The pan-HLA DR binding epitopes in vaccine construct facilitating binding to many different types of mouse and human MHC-II alleles to induce CD4-helper cell responses.) was added to the Nterminal of the vaccine with the aid of the EAAK linker.
To enable the intracellular delivery of the modeled vaccine, a TAT sequence (11aa) was appended to C-terminal. The vaccine was developed to be 864 amino acids in length ( Figure S3). The sequence homology of final vaccine protein to human protein sequence shown that there were no significant alignments ( Figure S4).

Evaluation of Allergenicity, Antigenicity, and Physiochemical Parameters of the Vaccine
The allergenic character of the vaccine was determined by the AlgPred server and was based on the hybrid approach (SVMc + IgE epitope + ARPs BLAST + MAST) with a 93.5% coverage. The vaccine was non-allergen with 84% accuracy and 82.78% sensitivity at threshold value was −0.2. Similarly, the antigenic  The half-maximal inhibitory concentration (IC50) value was < 500 nm, which ensured a higher binding capability of selected epitopes to MHC molecules.
nature of the vaccine construct was evaluated and showed that the protein was a favorable antigen with a global prediction score of a protective antigen of 0.5308 (Probable antigen). The default threshold value for antigenicity was 0.4 in the virus model. Moreover, the vaccine constructs contained 864 amino acids, and its molecular weight was 95.4 kDa. The theoretical pI was predicted to be 9.71. The vaccine contained 63 negatively charged residues and 125 positively charged residues. The vaccine construct was composed of 13,541 atoms, and its chemical The Immune Response Profile in silico Immune Simulation The immune stimulation of the final vaccine was performed using C-ImmSim online server, which gives the immune profiles of the designed vaccine. The proliferation in the secondary and tertian immune response were identified by IgG1 + IgG2 and IgM, as well as, the decreasing of the antigen count IgG + IgM showed the proliferated (Figure 2A). The stimulation result revealed the development of immune response after immunization. B cell population was highly stimulated upon immunization ( Figure 2B). Similarly, the cytotoxic and helper T cell levels were proliferated that suggested the development of secondary and tertian immune response (Figures 2C,D). During the exposure time, it was also observed that the production of IFN-γafter immunization ( Figure 2E). These results were significant for the immune response against SARS-CoV-2. Hence,

Prediction, Refinement, and Quality Assessment of the Tertiary Structure of the Developed Vaccine Construct
The tertiary structure of the full-length vaccine sequence was predicted by Phyre 2, and it was applied for refinement and further analysis. Twenty-five templates were employing modeling as Figure S5 shown. There were three templates from human defensin which were we added in to enhance the immunogenicity, others from virus ( Figure S5). The immune epitopes were not structural homology to human proteins that could avoid inducing autoimmune. The secondary structure of the predicted model contained 18% alpha-helix, 21%TM helices 44% beta-sheets, and 27% disordered Figure S6.
To optimize the 3D structure of the modeled protein, the initial model was refined in the GalaxyRefine server. The GalaxyRefine server-generated five models based on the rootmean-square deviation (RMSD) and MolProbity algorithm. The details of the five models are shown in Table S3. Model 1 with the top Ramachandran favored, therefore selected for docking purposes (Figure 3). A model with more residues in the Ramachandran favored region, less in outliers region and rotamer region was considered as a more ideal one. The initial model generated from Phyre 2 server and refine model from GalaxyRefine were evaluated with the aid of the SWISS-MODEL workspace. The initial model was 63.46% of residues in the Ramachandran favored region, 19.49% in the Ramachandran outliers region, and only 10.22% in the rotamer region (Figure 4). The refine model was 89.1% of residues in the Ramachandran favored region, 2.09% in the Ramachandran outliers region, and only 0.15% in the rotamer region (Figure 3). Other favorable parameters of the refined model were as follows: GDT score of 0.9922, RMSD value of 0.260, MolProbability of 2.049, clash score of 8.9, and poor rotamers totaling 0.3 ( Table S1).
The quality and potential errors in the final vaccine 3D model were verified by ProSA-web. The Z-score indicates overall model quality, the model with a lower Z-score was considered as the higher quality one. The z-score of the initial model was −2.81, refine model is −3.64 (Figure 5).

Molecular Dynamic Simulation
To accomplish the estimate of the stability of the vaccinereceptor complex, we performed the simulation of the docked complexes (vaccine and TLR-3, MHC-I, and MHC-II) with the help of GROMACS. Then, various analysis like energy minimization, pressure assessment, temperature, and potential energy calculations were performed. The temperature and pressure of the simulation system during the production run was around 300 K and 1 atmosphere, respectively, indicating a stable system and successful md run. The temperature and pressure of the three simulation systems (vaccine and TLR-3, MHC-I, and MHC-II complexes) during the production run were around 300 K and 1 atmosphere, respectively, indicating the stable systems and successful MD run (Figures 7A-F). The complex root mean square deviation (RMSD) plot represents the structural fluctuation of the overall structure of the complex of vaccine and immune receptor. The RMSD of vaccine-TLR3 complex has large fluctuation during 0-6 ns simulation. After 6 ns, the RMSD value was kept around 1.25 nm, indicating that the conformation of this complex was stable ( Figure 7G). Otherwise, the RMSD of vaccine-MHC-I and -MHC-II complexes has large  fluctuation during 0-4 ns simulation. After 4 ns, the RMSD value were kept around 1 nm, indicating that the conformation of the two complexes were stable (Figures 7H,I)    has low RMSF value, indicating these residues has low structural flexibility. By contrast, residue 0-200 and 600-800 has relatively higher RMSF value, indicating the larger flexibility during those regions (Figures 7J-L).

In silico Cloning and Prediction of RNA Secondary Structure
To fuse the final vaccine to an expression vector, codon conversion of the vaccine protein was performed by the Java Codon Adaptation tool. Restriction site XhoI and Bam HI were added to N and C terminals of the codon sequence, then was inserted into the pET28a (+) vector between the XhoI and BamHI (Figure 8). The RNA secondary structure using the Mfold program was generated foldings contain 4,381 base pairs out of 2.3% in the energy dot plot. Mfold predicted an identical secondary structure of 4,381 bp formed by nucleotide fragments (Figure S7).

DISCUSSION
SARS-CoV-2 is characterized by high infectivity and high transmission speed; thus, a prophylactic vaccine is needed (11). The availability and advantages of the multi-peptide vaccine developed by immunoinformatics methods have been confirmed by previous studies (73,74). Ojha et al. used the immunoinformatics methods to develop a multiepitope subunit vaccine to Epstein-Barr virus-associated malignancy (73). In recent studies, genomics and proteomics information of SARS-CoV-2 have been retrieved, stored, and utilized (75,76). In the present research, we tried to develop a multi-epitope subunit prophylactic vaccine of SARS-CoV-2, with the help of immunoinformatics tools. A line of research have tried to develop the vaccine of SARS-CoV-2 by immunoinformatics tools. Baruah and Bose (15) used immunoinformatics tools to discover cytotoxic T lymphocyte (CTL) and B cell epitopes for the spike protein of SARS-CoV-2. Then, Abraham et al. developed a multi-epitope vaccine that was designed using immunoinformatics tools that potentially trigger both CD4+ and CD8+ T-cell immune responses (16). Most of those research just focus on the spike protein-based vaccine. A vaccine based on the spike protein could induce antibodies to block SARS-COV-2 binding and fusion or neutralize virus infection (18), as well as induce harmful immune responses that cause liver damage (19). Other proteins should be ideal candidates for designing vaccines.
In the present report, we selected nine proteins with positive antigenicity for further epitope prediction. All proteins from SARS-CoV-2 with <100 amino acid sequences were excluded, and the antigenic nature of the remaining proteins was evaluated. This method can facilitate the discovery of potential antigens of SARS-CoV-2 when the precise immunity mechanisms are unknown. To design an effective vaccine, we selected the SARS-CoV-2 protein through the above-mentioned methods for epitope prediction. In recently, Asaf et al. reported that identify multiple epitopes for CD4 + 12 and CD8 + T cells based on muti-protein (77). Their protein list was the same as this in our research. In Asaf 's report, they just predicted the T cell epitopes, non-B cell, B cell peptide was not predicted (77).
The B cell epitopes are antigenic determinants from the antigen that are recognized by the B cell surface membrane receptor and evoke the production of specific antibodies. The persistent challenge in immunological prediction tools is the prediction of epitopes to a higher level of accuracy (78). To determine accurate linear B cell epitopes from the antigenic proteins, we used two bioinformatics tools based on different algorithms of prediction. We identified nine overlapping linear B cell epitope candidates from two different bioinformatics tools. This method was superior to the prediction of epitopes from a single tool (78). Moreover, we also have predicted the noncontinue B-cell epitopes.
The B cell immune response is preferred in the design of a vaccine. However, T cells may also elicit a strong immunoreaction. The vaccine that activates both CTLs and HTLs should be more effective than a vaccine that only targets CTL responses (79). To generate a more effective vaccine, we predicted both CTL epitopes and HTL epitopes. The T cell epitopes were decomposed fragments from the antigen presented by the MHC molecules of T cells and stimulated the production of effector T cells, immunological memory T cells, and IFN-γ. The cellmediated immune response induced by CTLs plays a vital role in the defense against viral infections through the recognition of intracellular viral pathogens by MHC class I molecules.
In the present report, MHC-I binding epitopes were predicted by choosing A2, A3, and B7 alleles, which cover ∼95% of world's population. We selected 18 CTL epitopes. The HTLs play a vital role in the antiviral immune response by producing IFN-γ. Moreover, HTLs are able to induce and maintain CTL responses. Furthermore, 14 HTLs epitopes were chosen based on both the binding capability and IFN-γ induction. Bhattacharya et al. also used the spike protein sequence predicted for MHC-I and MHC-II epitopes of SARS-CoV-2, but not predicted capability of producing IFN-γ (80). The T cell epitopes enhanced IFN-γ inducing capability, which evokes both the native and specific immune responses by activating macrophages and natural killer cells, and augmenting the response of the MHC to the antigen (81,82).
In this study, the immunogenic epitopes from B cells, CTLs, and HTLs were chosen to develop a more valid, reliable, and effective vaccine against SARS-CoV-2. A multiepitope approach was used by splicing together epitopes with the aid of their respective linkers. To improve the immunogenicity of this multiepitope vaccine, an adjuvant β-defensin and pan-HLA DR binding epitopes (13aa) were fused to the N-terminal with the aid of an EAAAK linker, then A TAT sequence (11aa) was appended to C-terminal with the added of KK. The final vaccine constituted 864 amino acids. The allergenicity, antigenicity, and stability of the designed vaccine constructs were then evaluated. The tertiary structure of the generated vaccine was predicted by using the Phyre 2 server and then refined by the GalaxyRefine server. The binding affinity of complexes of the developed vaccine and receptors, in which TLR-3, MHC-I, and MHC-II (present on the surface of the immune cell) were confirmed by the ClusPro server was based on molecular docking. Furthermore, to ensure the translation efficiency of the designed vaccine in a specific expression system, the mRNA of the vaccine was enhanced with the aid of the Java Codon Adaptation Tool. The restriction enzyme cutting sites of Xho? and BamH? were then appended to the N and C terminals, respectively. The vaccine sequence was subsequently cloned in pET28a (+), the expression vector. Further experimental validation of the safety and efficacy of the designed vaccine for SARS-CoV-2 is warranted.

DATA AVAILABILITY STATEMENT
All datasets presented in this study are included in the article.

AUTHOR CONTRIBUTIONS
RD and ZC performed the experiments. RD and YZ wrote the paper. YZ and FY edited the final version. All authors participated in the experimental design, data analysis, and agreed with the final version of the paper.