Identification of Unique Peptides for SARS-CoV-2 Diagnostics and Vaccine Development by an In Silico Proteomics Approach

Ongoing evolution of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus strains is posing new COVID-19 diagnosis and treatment challenges. To help efforts to meet these challenges, we examined data acquired from proteomic analyses of human SARS-CoV-2-infected cell lines and samples from COVID-19 patients. Initially, 129 unique peptides were identified, which were rigorously evaluated for repeats, disorders, polymorphisms, antigenicity, immunogenicity, toxicity, allergens, sequence similarity to human proteins, and contributions from other potential cross-reacting pathogenic species or the human saliva microbiome. We also screened SARS-CoV-2-infected NHBE and A549 cell lines for the presence of antigenic peptides, and identified paratope peptides from crystal structures of SARS-CoV-2 antigen-antibody complexes. We then selected four antigen peptides for docking with known viral unbound T-cell receptor (TCR), class I and II peptide major histocompatibility complex (pMHC), and identified paratope sequences. We also tested the paratope binding affinity of SARS-CoV T- and B-cell peptides that had been previously experimentally validated. The resultant antigenic peptides have high potential for generating SARS-CoV-2-specific antibodies, and the paratope peptides can be directly used to develop a COVID-19 diagnostics assay. The presented genomics and proteomics-based in silico approaches have apparent utility for identifying new diagnostic peptides that could be used to fight SARS-CoV-2.


INTRODUCTION
According to a World Health Organization report issued in May 2021, the SARS-CoV-2 virus has infected more than 158 million people, causing more than 3.3 million deaths worldwide (1,2). Moreover, ongoing evolution of SARS-CoV-2 strains is posing constant challenges to develop new COVID-19 diagnoses and treatments for shifting life-threatening symptoms, inter alia, fever,

Collection of SARS-CoV-2 Virus Sequences to Explore Genomic Variability in the Spike and Nucleocapsid Proteins
All available SARS-CoV-2 spike and nucleocapsid nucleotide and protein sequences were extracted from the NCBI database using combinations of the keywords "COVID-19", "SARS-CoV-2", "spike," and "nucleocapsid" both singly and in combinations with the Boolean operator AND. To generate a protein dataset, a local BLAST database was searched to find sequences with ≥ 95% similarity using protein sequences of Wuhan-Hu-1 isolates of SARS-CoV-2 (MN908947.3) as references. Sequences with nonstandard amino acids were removed, and the remaining sequences were clustered using CD-HIT software with 100% sequence identity setting (28). To explore the genomic variability among the sequenced isolates, we applied multiple sequence alignment with ClustalW (29). Conserved and variable regions of the spike protein were identified using Gblocks software (30). To avoid selecting peptides with poor diagnostic potential, mutations in the protein detected in variants in all countries that had reported more than 10 spike protein sequences were analyzed. A binary matrix was generated for clustering based on the presence and absence of each identified mutation in the spike protein with respect to countries. This was done using the Clustvis web tool (31) and the following parameters. Clustering distance for rows and columns: binary. Clustering method for rows and columns: average. Tree ordering: tightest cluster first.
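The presence/absence matrix and the "binary" clustering distance described above can be illustrated with a minimal Python sketch. It is not the ClustVis implementation: it assumes that the "binary" distance corresponds to the asymmetric binary (Jaccard-type) distance, and the function names are hypothetical. The country lists for F5L and F12S are a subset of those reported in the Results, and the mutation "M_x" is a placeholder added only to make the example non-trivial.

```python
# Toy input: mutation -> countries in which it was observed.
# "M_x" is a hypothetical placeholder, not a mutation from the study.
observed = {
    "F5L": {"Australia", "Bahrain"},
    "F12S": {"Egypt"},
    "M_x": {"Australia", "Egypt"},
}


def binary_matrix(observed):
    """Return (mutations, {country: [0/1 per mutation]})."""
    mutations = sorted(observed)
    countries = sorted(set().union(*observed.values()))
    matrix = {
        country: [1 if country in observed[m] else 0 for m in mutations]
        for country in countries
    }
    return mutations, matrix


def binary_distance(a, b):
    """Share of positions where only one vector is 1, among positions
    where at least one vector is 1 (assumed meaning of "binary")."""
    either = sum(1 for x, y in zip(a, b) if x or y)
    if either == 0:
        return 0.0  # both rows empty: treated as identical (a choice made here)
    differ = sum(1 for x, y in zip(a, b) if x != y)
    return differ / either


mutations, matrix = binary_matrix(observed)
for country, row in matrix.items():
    print(country, row)
print(binary_distance(matrix["Australia"], matrix["Bahrain"]))
```

Pairwise distances of this kind would then be passed to an average-linkage hierarchical clustering routine for both rows and columns.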

Peptide Cataloging of the SARS-CoV-2 Proteome From Mass Spectrometric Proteome Data
The ProteomeXchange database was explored to extract SARS-CoV-2 mass spectrometric proteomic data using keywords such as "SARS-CoV-2", "COVID-19", and "spike." Two cell-line proteomes (PXD017710 and PXD018581) and four naturally infected patient proteomes (PXD019686, PXD021328, PXD018682, and PXD019423) were used to identify expressed SARS-CoV-2 peptides with Proteome Discoverer software (32-35). The extracted SARS-CoV-2 protein sequences and raw proteome files were the initial input for peptide identification with the following settings: 5% maximum false discovery rate (FDR) at the protein level, at most one missed cleavage, a charge range of 2-3, and an m/z range of 396-1,600. A mass tolerance of 10 ppm was set for parent ions and 0.8 Da for fragment ions. The cell-line and patient sample proteomes were processed separately using human and virus reference sequences to explore differences between the two kinds of proteomes associated with infection by the virus.
Key immune regulator genes among the identified proteins were examined using the STRING 11.0 database with a threshold confidence score of 0.4 (36). The resulting interaction network was imported into Cytoscape 3.8.0 software for visualization. The Cytoscape plugin Cytohubba, which implements 11 node-ranking methods, was used to analyze the protein-interaction network. In addition, the degree of association and bottleneck approach was used to identify hubs and bottlenecks in the interaction network generated by the Network Analyzer plugin of Cytoscape (37).
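To make the precursor settings listed above concrete (10 ppm parent-ion tolerance, charge states 2-3, m/z window 396-1,600), the following Python sketch shows how a parts-per-million error is computed and how the three criteria combine. It is only an illustration of the arithmetic, not the Proteome Discoverer workflow; the function names and the example m/z values are hypothetical.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def precursor_passes(observed_mz, theoretical_mz, charge,
                     tol_ppm=10.0, charges=(2, 3), mz_range=(396.0, 1600.0)):
    """True if a precursor satisfies the charge, m/z-window and ppm criteria."""
    in_window = mz_range[0] <= observed_mz <= mz_range[1]
    within_tol = abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm
    return charge in charges and in_window and within_tol


# Hypothetical precursor: 0.004 m/z away from an expected 800.000 m/z.
print(round(ppm_error(800.004, 800.0), 2))
print(precursor_passes(800.004, 800.0, charge=2))
print(precursor_passes(800.020, 800.0, charge=2))
```

At m/z 800, a 10 ppm window corresponds to roughly ±0.008 m/z, which is why the second hypothetical precursor (0.020 m/z away) is rejected.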

Filtering of Cross-Reacting Peptides
All the peptides in the generated catalogs similar to peptides of humans and other pathogens were removed to avoid misleading results from cross-reactive antibodies. Expressed human and human saliva microbiome peptides were extracted from The Human Protein Atlas (https://www.proteinatlas.org/) and the ProteomeXchange database (PXD003028), respectively. SARS-CoV-2 peptides similar to peptides of host origin were filtered out using the phmmer program with default parameters (38). Peptides similar to those of pathogens inducing a clinical presentation similar to COVID-19, such as SARS-CoV, Influenza, Middle East respiratory syndrome coronavirus, Pneumoniae, Respiratory syncytial virus, Rhinovirus, Staphylococcus aureus, and Streptococcus species in the UniProt database, were also filtered out using phmmer. The SARS-CoV-2-infected NHBE and A549 cell-line proteomes were then explored for evidence of the selected peptides' presence (26). Peptides expressed in all three experimentally generated data sources (cell lines, human patients, and proteome generated from cell-line RNA-Seq data) were retained for further study.
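The final retention step described above, keeping only peptides seen in all three data sources after cross-reactive peptides have been removed, amounts to a set intersection followed by a set difference. The Python sketch below illustrates this under a simplifying assumption: peptides are compared by exact string match, whereas the study used phmmer similarity searches. The function name and the peptide strings are placeholders, not peptides from the study.

```python
def retain_shared(cell_line, patient, rnaseq, cross_reactive):
    """Peptides present in all three sources and not flagged as cross-reactive."""
    shared = set(cell_line) & set(patient) & set(rnaseq)
    return sorted(shared - set(cross_reactive))


# Placeholder peptide strings used only for illustration.
cell_line = ["AAAK", "CCCK", "DDDR", "EEEK"]
patient = ["AAAK", "DDDR", "EEEK", "FFFR"]
rnaseq = ["AAAK", "DDDR", "EEEK"]
cross_reactive = ["EEEK"]  # e.g. flagged as similar to a host peptide

print(retain_shared(cell_line, patient, rnaseq, cross_reactive))
```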

Assessment of Antigenicity and Potential Immunogenicity of the Generated Peptides
In accordance with widely accepted definitions, the antigenicity of a peptide is regarded here as its capacity to bind specifically with a paratope, and its immunogenicity as its ability to induce an immune response, specifically production of antibodies against the antigenic protein (26). We used the Predicted Antigenic Peptides server (http://imed.med.ucm.es/Tools/antigenic.pl) to explore identified peptides' antigenic potential and the Immune Epitope Database (IEDB) toolkit to explore their class-I pMHC immunogenicity (http://tools.iedb.org/immunogenicity), CD4 T-cell immunogenicity (http://tools.iedb.org/CD4episcore/), and binding to both class-I MHC (http://tools.iedb.org/mhci/) and class-II MHC (http://tools.iedb.org/mhcii/). Peptides with a predicted half-maximal inhibitory concentration (IC50) ≤ 900 nM were considered binders of the corresponding MHC class-I and class-II genes and alleles (39). B-cell epitopes for the spike protein RBD domain were identified using the BepiPred-2.0 server with default parameter settings. All predicted epitopes were compared with those predicted by other tools for B-cell epitope prediction (BcePred, ABCpred, and SVMTriP) (40).
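The IC50 cutoff described above can be applied to a table of binding predictions with a simple filter. The sketch below assumes the predictions have already been exported as (peptide, allele, IC50 in nM) records; the record layout, the function name, and all values are hypothetical and do not come from the IEDB output of this study.

```python
IC50_CUTOFF_NM = 900.0  # cutoff stated in the text


def select_binders(predictions, cutoff_nm=IC50_CUTOFF_NM):
    """Keep (peptide, allele, ic50_nm) records with IC50 at or below the cutoff."""
    return [rec for rec in predictions if rec[2] <= cutoff_nm]


# Hypothetical prediction records; peptide strings are placeholders.
predictions = [
    ("AAAK", "HLA-A*02:01", 120.0),
    ("AAAK", "HLA-B*07:02", 2500.0),
    ("DDDR", "HLA-A*02:01", 900.0),
]

for record in select_binders(predictions):
    print(record)
```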

Paratope Identification: Antigen-Binding Peptide Sequences
Complementarity determining regions (CDRs) are antibodies' main antigen-binding domains, and most antigen-binding residues (ca. 80%) in paratopes are in CDRs (41). To explore the SARS-CoV-2 antigen-binding peptide sequences, available crystal structures of antibody-antigen complexes involved in SARS-CoV-2 infection (PDB IDs: 7BWJ, 7BZ5, 7B3O, and 6W41) were downloaded from the RCSB Protein Data Bank (PDB) to extract light and heavy chain protein sequences. The Paratome (42) and Parapred (43) servers were used in conjunction with the extracted sequences to identify paratopes. Parapred applies a deep-learning architecture to integrate functionality from all local neighborhoods, while Paratome applies a machine learning approach based on multiple structure alignment (MSTA) of all available Ab-Ag complexes in the RCSB database. Only paratope sequences that included sequences predicted by both tools were selected. The identified paratope peptides were assembled using the synthetic peptide linker GSGSGS to prevent undesired interactions between the discrete domains (44).

Three-Dimensional Interaction Analysis of Selected Antigenic Peptides With Known Viral TCR, Class I and II MHC, and Paratope Peptides
Next, structural information on 19 well-known T-cell receptors (TCR) and 28 pMHC structures for different viruses was downloaded from the TCR3d database (45) for use in docking studies to assess the identified antigenic peptides' structural compatibility with them. The antigen binding affinity of SARS-CoV-2 peptides was assessed by docking with selected paratopes of B-cell and T-cell peptides (46,47). 3D structures of B- and T-cell epitopes and those of the paratope peptides were predicted using the PEP-FOLD3 server (48). The identified SARS-CoV-2 peptides were docked with TCR and pMHC proteins using ClusPro 2.0, while paratopes were docked with the identified antigens, the independently predicted antigens of the RBD protein, and whole spike and RBD proteins using ClusPro 2.0 in antibody mode (49). Protein-paratope complexes were visualized and hydrogen bonds analyzed using the UCSF Chimera (50) and LIGPLOT software (51).

RESULTS
Numerous groups have studied the severity of COVID-19 since the pandemic began, resulting in massive genomics and proteomics resources in the public domain. Therefore, we have developed a strategic approach to identify unique SARS-CoV-2 antigenic peptides and potential paratope peptides to detect viral antigens using publicly available experimental resources. This involves a multi-step genomic and proteomic approach (Figure 1) for diagnostic peptide identification and validation. Our study demonstrates a practical and precise approach for identifying diagnostic peptides when access to experimental sample data is limited. The identification of SARS-CoV-2 viral proteins highlights the value of today's protein informatics resources in responses to a public health emergency.
In total, 149 spike mutations were identified in samples from all the countries. Mutation D614G, which increases transmissibility (52), was found in samples from 40 countries, while mutations F5L and F12S were found in samples from seven countries (Australia, Bahrain, Bangladesh, Canada, France, India, USA), and three countries (Egypt, Hong Kong, The Philippines), respectively. The numbers of protein sequences before and after clustering and a heat map illustrating distributions of mutations in them are presented in Supplementary File 1 (Table S1, Figures S1, S2). Distribution of the mutations in countries and a binary matrix are provided in Supplementary File 2 (Tables S1, S2).

SARS-CoV-2 Peptide Identification From Proteomic Data
Two cell-line and four naturally infected human patient proteomes were selected for the high-confidence identification of peptides using viral and human protein sequences as references. In total, 361 and 81 peptides of viral origin were identified in the cell lines and patient samples, respectively. Only three viral peptides in the cell-line and patient samples were identical. Analysis of the peptides revealed that they are encoded by various parts of the viral genome, such as the ORF1ab, nucleocapsid, envelope, and spike gene regions. Multiple peptides with varying lengths from different parts of the same proteins were found, including 57 component peptides of the spike protein. Of these 57 peptides, 28, 29, and one are components of the S1 (14-685), S2 (686-1273), and RBD (319-541) regions of the spike protein, respectively. The selected proteomes, samples, numbers of peptides, and identified viral proteins are briefly described in Table 1 and Supplementary File 2 (Tables S3-S5).
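Assigning a spike peptide to the S1, S2, or RBD region follows directly from the residue ranges given above. The Python sketch below shows one way to do this, assuming a peptide is assigned to a region only when it lies entirely within that region's range; this rule, the function name, and the example coordinates are illustrative assumptions rather than the procedure used in the study. Because the RBD range lies inside the S1 range, a peptide can be assigned to both.

```python
# Residue ranges (inclusive) as stated in the text for the spike protein.
SPIKE_REGIONS = {"S1": (14, 685), "S2": (686, 1273), "RBD": (319, 541)}


def regions_for(start, end):
    """Regions whose range fully contains the peptide spanning start..end."""
    return [name for name, (low, high) in SPIKE_REGIONS.items()
            if low <= start and end <= high]


# Hypothetical peptide coordinates.
print(regions_for(330, 340))
print(regions_for(700, 710))
print(regions_for(680, 690))
```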

Functional Analysis of the Proteomes From Infected Cell Lines and Samples From Naturally Infected Patients
Like any virus, SARS-CoV-2 must enter host cells and manipulate host responses to enable its replication. Therefore, exploration of protective immune responses to infection can provide important insights regarding viral pathogenesis. Thus, we explored host responses to the virus in both cell lines (Colon Carcinoma-2 and H1299) and naturally infected COVID-19 patients' samples (mouth gargle, nasal swab, and respiratory tract). In total, 323 and 143 human peptides were identified in the cell-line and patient samples, respectively. Only five (MDGA1, PIK3C2A, FOXP2, DCAF5, and IVD) were detected in both sets of samples. MDGA1 plays a role in formation or maintenance of inhibitory synapses (53), whereas PIK3C2A is involved in several intracellular trafficking and signaling pathways (54). FOXP2 is a transcription factor that may regulate hundreds of genes in several tissues, including the brain (55). DCAF5 is a receptor of CUL4-DDB1 E3 ubiquitin-protein ligase (56), and IVD is an essential enzyme for mitochondrial fatty acid beta-oxidation. Many of the other proteins are involved in immune system-related biological processes such as regulation of immune responses, autophagy, immune system development, leukocyte migration, antigen processing and presentation, or leukocyte-mediated cytotoxicity, and were detected in both cell-line and naturally infected patient proteomes. Proteins involved in biological processes such as production of molecular mediators of immune response and myeloid cell homeostasis were only found in the cell-line proteome. As anticipated, peptides associated with the immune response and leukocyte activation were only found in the proteome of infected patients. In total, 58 and 23 unique genes related to immune system biological processes were found in the cell-line and naturally infected patient proteomes, respectively (Supplementary File 3: Tables S1-S4).
The human innate immune system, which plays a crucial role in preventing infection and killing pathogens, involves various kinds of cells, including natural killer cells, macrophages, neutrophils, dendritic cells, and mast cells. Therefore, identifying proteins associated with both these cells and SARS-CoV-2 infection through analysis of experimental resources such as cell-line and patient datasets can improve understanding of interactions between the virus and human hosts. We found 33 innate immunity proteins that matched entries in the InnateDB database. Most of these proteins are involved in immune-related functions such as protein binding (TAB1, SREBF2, HSP90AA1, RB1, STAT3, DCN, IL1R1, BNT3A2, PIK3R2, CCR6), transferase activity (TREM2, ABL1, S100A12, C4BPB), protein dimerization (UBE2N, CSF1R), and lipopeptide binding (EPS8, CD36). TAB1 may be involved in up-regulation of TAK1, IRF7, and IFN signaling during activation of the antiviral innate immune system (57). STAT3 has a well-known role in inflammation and immunity (58), and IL-1R signaling in CD4+ T-cells promotes Th17 immunity and atherosclerosis (59). TREM2 controls phagocytic pathways, which are involved in removal of neuronal debris (60). ABL1 is involved in regulating release of filoviruses through VP40 protein phosphorylation and might also be involved in the virus life cycle (61). EPS8 is a key regulator of the LPS-stimulated TLR4-MyD88 interaction and contributes to macrophage phagocytosis (62), while CD36 is a known scavenger receptor involved in immunity, metabolism, and angiogenesis (63). The major challenge was to identify key expressed immune genes in the complex network of the immune system. Therefore, the identified proteins related to immune system processes from the cell-line and patient proteomes were used to generate a protein-interaction network (Figure 3; Supplementary File 1, Tables S2, S3).
The generated protein-interaction network, which includes 403 nodes and 671 edges, was used to identify the top-ranked hubs and bottlenecks (Supplementary File 1, Table S2).
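As a rough illustration of degree-based hub ranking in an interaction network like the one described above, the Python sketch below counts the number of distinct interaction partners per node and sorts nodes by that count. It is a simplified stand-in for the Cytoscape plugins named in the Methods, not a reimplementation of them. The gene names appear in the text, but the edges listed here are hypothetical and are not taken from the STRING network of the study.

```python
from collections import Counter


def degree_ranking(edges):
    """Rank nodes by degree in an undirected network.

    Duplicate edges (in either orientation) and self-loops are ignored.
    Ties are broken alphabetically by node name.
    """
    unique = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for edge in unique:
        for node in edge:
            degree[node] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical edges between genes named in the text.
edges = [
    ("STAT3", "HSP90AA1"),
    ("STAT3", "CREBBP"),
    ("HSP90AA1", "STAT3"),  # duplicate of the first edge, reversed
    ("STAT3", "HSPA8"),
    ("HSPA8", "HSP90AA1"),
]

for node, deg in degree_ranking(edges):
    print(node, deg)
```

Bottleneck detection would additionally require shortest-path information (for example, betweenness), which this degree-only sketch does not compute.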

Selection of Diagnostic Peptides From the Generated Peptide Catalog
Antigenic peptides must, by definition, have sufficient antigenicity and immunogenicity to bind detectably to antigen-specific receptors on lymphocytes or the Fab region of antibodies. The antigenicity of a peptide is determined by surface epitopes of 5-7 amino acid residues, whereas four intrinsic properties of peptides determine their immunogenicity: chemical composition, molecular size, foreignness, and heterogeneity for processing and presentation on the surface of antigen-presenting cells (APCs). Therefore, we applied multi-step filtering to identify potential diagnostic peptides. Initially, to avoid future cross-reactivity, the identified peptides (442) were filtered to exclude human and human saliva microbiome peptides (leaving 418) and subsequently peptides of a targeted group of pathogenic bacteria and viruses (leaving 129). Next, to avoid selection of poor peptides for diagnostic purposes, the selected peptides' expression was checked using results of the infected cell-line RNA-Seq data analysis. Finally, four peptides (Table 2 and Figure 2), present in the NHBE and A549 cell lines, infected patient samples, and the RNA-Seq-derived proteome, were selected after conservation analysis (Supplementary File 4: Tables S1-S3, and S7). A sequence alignment of all matched peptides from the three types of sources is provided in Figure S3 of Supplementary File 1.
MHC genes, a set of closely linked polymorphic genes, encode crucial cell surface proteins that bind antigens, thereby alerting the immune system. Therefore, we evaluated the identified peptides' antigenicity and CD4 immunogenicity to enable potency-based selection (Table 3). The average immunogenicity and antigenicity scores of the peptides were approximately 89.06 and 1, respectively, which indicates the potential of the selected peptides.
Class I and II MHC molecules have small grooves that present self-antigens and pathogen-derived peptides. Members of class I present intracellular antigens such as viruses, intracellular bacteria, or parasites to T cells, whereas the MHC class II presents exogenous antigens to professional APC, including

Paratope Identification for Selected Peptides
Paratopes, sequences of 5-10 amino acids on antibodies that bind specific antigens, are present at the three CDR regions (CDR1, CDR2, and CDR3), which are thus key regions for paratope identification. Two light chain (L1 and L2: IYAASTLQSGV and TCRASQGISSYLAWY, respectively) and one heavy chain (H: VIYSGGSTY) paratope sequences were identified using two prediction approaches (Supplementary File 5, Table S3). The 3D structure of all three paratopes is shown in Figure 2. To increase the specificity of paratope sequences for the SARS-CoV-2 antigen, the three paratope peptides were linked with a peptide linker (GSGSGS) to ensure that each assembled paratope peptide could work protein-independently, thus reducing unspecific antigen binding. The light chain paratope L1 and heavy chain paratope H were stitched at the N and C termini of the first linker, and the second linker was attached to the C terminus of the heavy chain paratope (H) and the N terminus of light chain paratope L2. Each linker is a triplet of the glycine-serine dimer (GSGSGS), giving the assembled paratope peptide IYAASTLQSGVGSGSGSVIYSGGSTYGSGSGSTCRASQGISSYLAWY. The binding potential of the paratopes for the independently identified RBD antigens was also explored (Supplementary File 5, Figure 4). Our analysis indicates that the assembled paratope has strong binding affinity for the four identified antigens, RBD protein antigen, and whole spike protein. Moreover, the assembled paratope showed lower binding affinity for SARS-CoV T-cell (KCYGVSATKL and NYNYKYRYLR) and B-cell (ISPYNTIVAKLR and LSPLGALVACYK) epitopes.
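The assembly order described above (L1, linker, H, linker, L2) can be reproduced with a few lines of Python. The sequences and the linker are taken from the text; the variable and function names are illustrative only.

```python
LINKER = "GSGSGS"          # glycine-serine linker from the text
L1 = "IYAASTLQSGV"         # light chain paratope L1
H = "VIYSGGSTY"            # heavy chain paratope H
L2 = "TCRASQGISSYLAWY"     # light chain paratope L2


def assemble(parts, linker=LINKER):
    """Join paratope peptides N- to C-terminus with the linker between them."""
    return linker.join(parts)


assembled = assemble([L1, H, L2])
print(assembled)
print(len(assembled))
```

The resulting 47-residue string matches the assembled paratope sequence reported in the text.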

DISCUSSION
RT-PCR is a widely accepted method for COVID-19 detection that involves sample collection, RNA extraction, reverse transcription, and targeted amplification of cDNA using appropriate primers for conserved regions: procedures that require high technical expertise. In addition to the long processing time (24-48 hours), RT-PCR also requires continuous monitoring of the genomic evolution of the virus to ensure that the primers are still valid. For these reasons, several COVID-19 diagnosis kits are available for testing. However, most kits lack field applicability, lack sufficient sensitivity, have long processing times, and give undesirably high rates of false-positive results. Globally, the number of cases is increasing due to various mutant strains. Therefore, fast, simple, and reliable diagnosis methods that can be applied using readily portable equipment are required for large-scale screening.
To assist efforts to develop such methods, we applied in silico techniques to identify unique SARS-CoV-2 peptides using experimentally generated data. The data explored in this study were originally generated with specific objectives. The PXD017710 cell line proteome was first used to identify drug targets and host cellular response players (34), the PXD018581 proteome was generated to compare SARS-CoV and SARS-CoV-2 virus disease progression, and the PXD021328, PXD019686, and PXD019423 proteomes were generated from infected human samples (mouth gargle, nasal and oral swab). Our primary objective was to develop robust, convenient, diagnostic methods for large-scale screening of human patient samples (20,32,33). However, none of the studies aligned with our aim and objectives. In this study, extracted virus protein and human proteome sequences were used to identify peptides from mass spectrometry data using a dedicated data-processing workflow (PWF_QE_Precursor_Quan and LFQ_MPS_SequestHT_Percolator). Several peptides of different lengths were identified from the whole genome of SARS-CoV-2 from both the cell line and patient proteomics data (Table 1). These results indicated that the trypsin digestion originally used was an appropriate choice for detecting viral peptides. To select mutation-free peptides, spike protein sequences were subjected to mutation analysis, and three major mutations (F5L, F12S, and D614G) were identified in samples from several countries. The D614G mutation, found in samples from all the countries, might be involved in viral conformational plasticity, increasing viral fitness (65). F5L and F12S mutations were also found in samples from several countries, but their impact on infection and disease progression is unclear. In a recent study (66), the E484K mutation was detected in a new variant (B.1.1.33) of the SARS-CoV-2 virus in Brazil. The E484K mutation has raised concern because it may increase the transmissibility of the virus. In our study, the E484K mutation was found in the Bahrain spike protein dataset.
The SARS-CoV-2 spike protein is one of the crucial targets for disease prevention, diagnosis and therapeutic antibody development. Its S1 region is responsible for binding to the host ACE2 receptor, and the S2 region is responsible for membrane fusion (67). Our results highlight four immunodominant SARS-CoV-2 peptides of the S1 region (A26, A349, A194, and A343). The identified peptides have high diagnostic potential due to appropriate proportions of hydrophilic residues (lysine, arginine, histidine, aspartic acid, glutamic acid, serine, threonine, tyrosine, asparagine, and glutamine) and immunogenic residues (lysine, arginine, glutamic acid, aspartic acid, glutamine, and asparagine), a low number of internal cysteine residues, and absence of the arginine-glycine-aspartic acid (RGD) tripeptide motif. Moreover, analysis of RNA-Seq data confirmed that the identified peptides are expressed in NHBE and A549 cells. The identified antigens were expressed in four cell lines (Colon Carcinoma-2, H1299, NHBE, and A549) and three types of human patient samples (mouth gargle, nasal, and oral swab), corroborating their expression in various cell types.
The SARS-CoV-2 virus binds to the human ACE2 receptor through the spike protein's RBD (Figure 5A), and enters cells via a mechanism involving a series of conformational changes in both viral and cell membrane proteins followed by an endocytic process. The identified expressed human genes reflect a protective immune response to SARS-CoV-2. Combined cell-line and human proteome data analysis captured immune proteins involved in different phases of the protective immune response, including antigen processing and presentation, and autophagy (Figures 5B, C). Various identified proteins such as DCTN2, KIF3B, and AP2A1 are involved in antigen processing for MHC class II molecules and the binding of antigen-MHC-II complexes to TCRs (68,69). Other proteins, like STAT3, ABL1, and IL1R1, help in the activation and multiplication of helper T-cells (Figure 5D) (70,71). Activation of helper T-cells leads to B-cell activation and differentiation with OPTN, PLCG2, KLHL6, and TXLNA followed by production of B-cell antibodies (Figure 5E) (72,73). Key immune hub and bottleneck genes were identified through protein-interaction network analysis. According to gene ontology analysis, most of the key genes are involved in host-virus interaction (STAT3, CREBBP, HSPA8, and HSP90AA1), innate immunity, T-cell differentiation, and the inflammatory response (Supplementary File 3, Tables S2, S3). Three paratopes [one heavy chain paratope from the CDR2 region (VIYSGGSTY) and two light chain paratopes, one from CDR2 (IYAASTLQSGV) and one from CDR1 (TCRASQGISSYLAWY)] were identified from available X-ray crystallographic structures of antibodies (Figures 5F-H). Docking methods enable evaluation of the strength and nature of binding between biomolecules and hence validation of putative in vitro or in vivo interactions. Therefore, all four antigenic peptides were docked with 19 TCRs and MHC receptors, and the results indicate that they had high binding affinity. Three paratope peptides were identified for diagnostic purposes, and most showed high binding affinity with all antigens. However, paratope L2 had the strongest binding affinity and formed several interacting hydrogen bonds (Table 4).

Table 4 (fragment)
Antigen | Paratope (score) | Hydrogen bonds | Interacting residues
A26 | L1 (-145.8) | 5 | ILE1-PHE25, TYR2-PHE25, TYR2-GLU4, ALA3-GLN23, ARG40-THR6
A26 | L2 (-242.3) | 7 | TYR11-LEU18, TRP14-HR22, TYR15-CYS15, TYR15-MET1, TYR15-PHE2, TYR11-THR19, TRP14-THR22
A26 | H (-202.6) | 3 | TYR3-SER12, SER7-ASN15, THR8-ASN15
A194 | L1 (-163.2) | 3 | TYR2-LEU18, THR20-TYR2, THR20-TYR2
A194 | L2 (-233.6) | 3 | TYR11-LEU18, TYR15-THR22, TYR11-THR22
A194 | H (-190.1) | 5 | THR8-THR22, TYR9-VAL16, SER4-ARG21, TYR3
To increase the diagnostic specificity for SARS-CoV-2 antigens, all the paratopes were then linked with commercially available glycine-serine-rich linkers. Docking studies showed that the designed paratope combination (IYAASTLQSGVGSGSGSVIYSGGSTYGSGSGSTCRASQGISSYLAWY) had stronger binding affinity to different antigens and to the whole SARS-CoV-2 RBD and spike protein than the individual paratopes. The binding affinity of the assembled paratope peptide was also evaluated for experimentally validated B- and T-cell epitopes of SARS-CoV. The assembled paratopes showed higher binding affinity for SARS-CoV-2 antigens and proteins than for SARS-CoV (Supplementary File 5, Tables S5-S7). Hence, the three identified paratopes and their assembled configuration with a glycine-serine rich linker were

CONCLUSION
Various experimental and in silico efforts have provided valuable knowledge and resources (including massive genomic and proteomic datasets) to explore (inter alia) structural mechanisms of host-pathogen interactions, immune responses, drug candidates, antibodies, epitopes, genomic sequences and variation, and infection rates. In this study, we explored available in silico resources, namely the cell-line and naturally infected COVID-19 patients' proteomes, and identified four SARS-CoV-2 antigens and three antigen-binding peptides that could be used to develop diagnostic assays. The proposed antigenic peptides can be used for antibody generation, and the paratope sequences can be used directly for COVID-19 diagnostic assay and vaccine development. Moreover, the developed method and approaches can also be used to explore other infectious diseases.

DATA AVAILABILITY STATEMENT
The RNA-Seq data used are available in the NCBI SRA database under project accession number PRJNA615032. Proteomic data are available in the ProteomeXchange database (cell-line proteomes PXD017710 and PXD018581; naturally infected patient proteomes PXD019686, PXD021328, PXD018682, and PXD019423).