A reverse vaccinology approach identifies putative vaccination targets in the zoonotic nematode Ascaris

Ascariasis is the most prevalent helminthic disease affecting both humans and pigs and is caused by the roundworms Ascaris lumbricoides and Ascaris suum. While preventive chemotherapy continues to be the most common control method, recent reports of anthelminthic resistance highlight the need for development of a vaccine against ascariasis. The aim of this study was to use a reverse vaccinology approach to identify potential vaccine candidates for Ascaris. Three Ascaris proteomes predicted from whole-genome sequences were analyzed. Candidate proteins were identified using open-access bioinformatic tools (e.g., Vacceed, VaxiJen, Bepipred 2.0) which test for different characteristics such as sub-cellular location, T-cell and B-cell molecular binding, antigenicity, allergenicity and phylogenetic relationship with other nematode proteins. From over 100,000 protein sequences analyzed, four transmembrane proteins were predicted to be non-allergen antigens and potential vaccine candidates. The four proteins are a Piezo protein, two voltage-dependent calcium channels and a protocadherin-like protein, are all expressed in either the muscle or ovaries of both Ascaris species, and all contained high affinity epitopes for T-cells and B-cells. The use of a reverse vaccinology approach allowed the prediction of four new potential vaccination targets against ascariasis in humans and pigs. These targets can now be further tested in in vitro and in vivo assays to prove efficacy in both pigs and humans.


Introduction
The giant roundworm Ascaris is the most prevalent soil-transmitted helminth (STH) infection in humans, being responsible for 0.861 million disability-adjusted life years (DALYs) worldwide (1). In parallel, Ascaris infections remain an issue on many pig farms worldwide. This parasite is especially important in farms with lower levels of biosecurity, such as organic farms, and/or in lower to medium income countries, where backyard . /fvets. .
farming is common (2,3). Being a zoonotic disease, the close contact between humans and pigs in many highly endemic areas increases the risk Ascaris transmission among both hosts (3,4). Preventive chemotherapy, bolstered by implementation of water, sanitation and hygiene (WASH) protocols, continues to be the mainstay of ascariasis control in humans, as advocated in the World Health Organization (WHO) roadmap for the control of Neglected Tropical Diseases (NTDs) (5). Reduced efficacy and resistance to benzimidazole drugs have already been reported in human Ascaris infections (6,7), and this directly affects the implementation of control protocols in endemic areas, necessitating the development of effective vaccines and vaccination protocols. Vaccination against Ascaris has had some degree of efficacy in mouse and pig animal models, but no candidate vaccine has undergone human clinical trials yet (8). The proteins used in the past vaccination assays were either secreted proteins or those retrieved from crude extracts of adult worms. Recently, vaccination with a chimeric protein led to up to 73% larval burden reduction in mice, in contrast to a 99.8% reduction when mice were repeatedly infected with Ascaris sterile eggs (9)(10)(11)(12). These results highlight that there are other antigens that could be tested. These antigens could then be incorporated in a multi-component vaccine along with the already known vaccination targets to stimulate a more complete immune response.
Several annotated genomes are now available for Ascaris lumbricoides and Ascaris suum, which enable the search for putative vaccine candidates using a reverse vaccinology analysis. A reverse vaccinology approach combines genome information and bioinformatic tools for identification of vaccine candidates, and has been successfully applied to other nematodes before, such as Toxocara canis and Trichuris muris (13,14). Such methodology allows researchers to uncover vaccine targets from predicted proteomes by assessing if proteins have useful characteristics. Different analyses, such as protein sub-cellular location and the prediction of B-cell and T-cell epitopes, are examples of steps used in these approaches that help select proteins for further testing. This is especially important when funding is limited, and vaccination trials need to be focused. The aim of this study was to apply an in silico methodology to analyse protein sequences predicted from three Ascaris annotated proteomes to identify potential new vaccination targets that could be used in vaccination assays against Ascaris.

Protein subcellular location analysis
The retrieved proteomes were first visualized and analyzed with BioEdit v7.2.5 (20). A total of 2,604 protein sequences that included unknown amino acids (aa) (indicated with the symbol "X") were excluded from further analysis as they tend to be a consequence of poor annotations and most of the bioinformatics tools used do not analyse protein sequences with unknown amino acids.
As Vacceed makes use of different bioinformatic tools, there was the need to check how much each tool influences the final protein scores. After retrieving the Vacceed scores from runs including all programs, the process was repeated six times excluding one program per run (i.e., first run with all the bioinformatic tools except TMHMM, second with all the bioinformatic tools except DeepLoc, and so on) (28). The proteins scores were then compared between the different assessments through a Pearson correlation test using R v3. 6 was used against the 27 human leukocyte antigen (HLA) allele reference set, covering around 99% of the worlds human population (33). The default peptide epitope length of 15 aa with a core of 9 aa was selected. The protein sequences which had epitopes ranked from zero to one (from the maximum of 100) and were simultaneously predicted to bind to all the 27 alleles in the used reference set were selected for further testing. These MHC-II binding predictions were only made for human alleles due to the lack of in silico tools that make binding predictions for swine MHC-II alleles.

Allergenicity, antigenicity, and function predictions
The protein sequences retrieved with MHCII-IEDB were tested for antigenicity and allergenicity. Protein antigenicity was analyzed using IEDB Kolaskar and Tongaonkar antigenicity scale and VaxiJen 2.0 (34,35). The IEDB Kolaskar and Tongaonkar antigenicity scale method, available at http://tools. iedb.org/bcell/, was used with default setting and threshold of 1.000 and the VaxiJen 2.0 tool, server accessible at http://www. ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html, was used with the default threshold of 0.5. Allergenicity was evaluated using AllerTop 2.0 and AllergenFP (36, 37). The AllerTop 2.0 server, https://www.ddg-pharmfac.net/AllerTOP/index.html, and the AllergenFP tool, http://www.ddg-pharmfac.net/AllergenFP/ index.html, analyse protein sequences and compare them to a training set of 2,427 known allergens and 2,427 non-allergens using two different algorithms for the better recognition of allergens and non-allergens, respectively, for each tool. Proteins that were classified as both potential antigens and non-allergens were considered as good vaccination targets.
The function of the selected proteins, their protein family and tissue location in both A. lumbricoides and A. suum were analyzed based on the most recent genomic and transcriptomic studies (4,19). These assays were able to map the location where all the genes and, therefore, proteins were transcribed in both Ascaris species as well as in what stages of the parasite lifecycle. This information is important as it allows us to infer what roles these proteins could play in the Ascaris lifecycle and how useful they would be as components in a vaccine.

CD + Th cell binding epitope selection
To further streamline the epitope selection process, we integrated the information gathered using the Phobius and MHCII-IEDB tools to select extracellular CD4+ Th binding epitopes. We used Phobius to assess the extracellular domains of the predicted proteins and identify if the previously predicted epitopes were present in those areas. Proteins that lacked the full 15 aa epitopes in these areas were disregarded. For each protein, the two non-redundant epitopes (without overlapping aa) predicted to bind to the largest combined number of unique alleles in MHCII-IEDB tool analysis were retrieved. These epitopes were submitted to a BLASTp (Basic Local Alignment Search Tool protein) search (available at: https://blast.ncbi.nlm. nih.gov/Blast.cgi) to check for identity in humans and pigs. Epitopes with 100% identity to human or pig epitopes were discarded. The BLASTp analysis was conducted with a threshold of e-value=0.05. Each epitope was also submitted to Allertop 2.0 (36) to confirm they were not potential allergens. The best expressed protein transcript for each vaccine target, as found in the most recent genomic studies (4,19), was used to predict the presence of glycosylation sites using NetNGlyc 1.0 (38). Only epitopes that were not in direct contact with glycosylation sites were selected. These transcripts were also submitted to Protter to draw the two-dimensional (2D) protein structure using the information retrieved with Phobius (27, 39).

B-cell linear epitope prediction and selection
The presence of B-cell linear epitopes was assessed using Bepipred 2.0 webserver (available at: http://www.cbs.dtu.dk/ services/BepiPred/) (40). Using the same methodology as in the previous step, proteins that had epitopes found in extracellular domains, exposed to the host immunological system by not being in predicted glycosylation sites, predicted to be nonallergens, and present in most of the protein transcripts of the different genes were considered good vaccination targets. The two highest scored non-allergen epitopes, with a size between 8 and 40 aa, were retrieved for each protein.

Protein phylogenetic analysis
To assess the relationship between the predicted vaccine targets and their orthologs in other nematodes, a phylogenetic analysis of the predicted vaccine targets was performed including predicted orthologs/similar proteins present in annotated genomes of other parasitic nematodes and Caenorhabditis elegans available in WormBase ParaSite database (15). Proteins were selected according to the list of orthologs provided by WormBase and, if no ortholog was present for a given nematode, a BlastP analysis was carried out and proteins were retrieved according to the combination of query coverage, identity percentage and e-value. The protein sequences were aligned using MUSCLE (http://www.ebi.ac.uk/Tools/muscle/ index.html), the phylogenetic analysis was performed with Maximum-likelihood method using the JTT + G substitution model in MEGA-X (41) and the predicted trees were visualized in iTOL (42). Nodal support was tested using bootstrap values, which were calculated with 500 replicates.

Initial protein selection
From the 100,114 protein sequences extracted from the combined A. suum and A. lumbricoides proteomes, 97,510 were analyzed with the Vacceed framework for protein subcellular location. A total of 28.085 proteins were selected for further analysis (Figure 1), from which 5,984 protein sequences were from A. lumbricoides, and 5,004 and 17,097 were A. suum protein sequences from the two A. suum proteomes, BioProject PRJNA80881 and BioProject PRJNA62057, respectively. The BioProject PRJNA62057 has a higher number of proteins in its proteome due to the way it was assembled, where all the expressed transcripts were kept instead of being merged into a single protein sequence. The Vacceed final scores for each protein in the three proteomes are provided in Supplementary Table 1.
The correlation assessment of the Vacceed framework demonstrated a high correlation between the results of the different Vacceed runs (Figure 1). The removal of TMHMM from the framework resulted in a slightly lowered correlation score when compared with the original Vacceed run (r = 0.97).

Selection of vaccine targets and identification of epitopes
Before the CD4+ Th cell binding prediction was carried out, a further 2,906 duplicate sequences were identified and removed from the A. suum PRJNA62057 dataset. A total of 25,179 protein sequences were then analyzed for epitope binding to MHC-II alleles. When analyzing the final output, 32 protein sequences had epitopes predicted to bind to all the 27 MHC-II alleles in the reference set with a rank between 0.01 and 1. From this set of 32 protein sequences, 9 proteins belonged to the A. lumbricoides proteome, while the other 23 proteins were from the two A. suum proteomes, 3 from BioProject PRJNA80881 and 20 BioProject PRJNA62057. Only three of these protein sequences were predicted to be secreted proteins. A table detailing the WormBase protein identifiers and protein aa length for these 32 candidate proteins is provided in Supplementary Table 2.
The analysis with AllerTop 2.0 and AllergenFP did not predict any allergens in the 32 protein sequences. However, antigen prediction tools classified four proteins sequences as non-antigenic, and these proteins were disregarded. The remaining protein sequences that had similar results between their respective orthologs in A. lumbricoides and A. suum were grouped into five clusters, represented by distinct genes in all three Ascaris genomes. The five genes and their respective transcripts are predicted to be responsible for the transcription of four membrane transporters and one cell adhesion protein.
Four genes and respective proteins were selected as promising vaccination targets based on epitope exposure to the host immunological system: "Voltage-dependent T-type calcium channel subunit alpha" [ATtype (WormBase gene identifiers: GS_24322, ALUE_0000418301 and AgB13X_g094)], "Piezo-type mechanosensitive ion channel component" [APiezo (WormBase gene identifiers: GS_03113, ALUE_0000666901 . /fvets. . According to the most recent transcriptomic data, both ATtype and AProto are highly expressed in the ovaries while APiezo and ALtype are highly expressed in the muscle of adults (4,19). These proteins were predicted to have both CD4+ Th cell and B cell binding epitopes in extracellular areas (Figure 2). For each protein, two non-allergen CD4+ Th cell binding epitopes were selected based on the results of the Allertop 2.0, MHCII-IEDB tool and their presence in the extracellular areas of the protein (Table 1). Two B cell epitopes were also selected for each target. Supplementary Table 3 lists the epitopes found for each predicted vaccine target using MHCII-IEDB that scored between 0 and 1 and the MHC-II alleles they were predicted to bind to (only epitopes that were predicted to bind to two or more MHC-II alleles were retrieved). None of the selected epitopes were in the direct vicinity of predicted glycosylation sites (Supplementary material 4). A workflow diagram of the analysis can be seen in Figure 3.

Phylogenetic relationships of vaccine targets identified across nematode species
To analyse the potential relationship between the predicted vaccine targets and proteins in other nematodes, orthologs of the selected vaccine targets were retrieved from other nematode proteomes present in WormBase Parasite (Supplementary Table  5), aligned and used for the generation of a phylogenetic tree. There is a general clustering of the proteins within each nematode clade. The exception occurred in the APiezo orthologs where clade III nematodes were separated in two different groups, with the Ascaridomorpha nematodes (e.g., Ascaris, Toxocara and Parascaris) being more closely related to the nematodes in Clade IV and the Spiruromorpha nematodes (e.g., Onchocerca, Dirofilaria, Loa, Brugia and Wuchereria) being closer to the Clade V nematodes. These relationships can be seen in Figure 4.

Discussion
In this study we have used bioinformatic approaches to identify four different Ascaris proteins that could be included in multi-epitope vaccines against human and pig Ascaris infections. These genes are highly expressed in two distinct regions of the parasite: the ovaries and early egg stages, in the case of ATtype and Aproto, and in the muscle, the case for APiezo and ALtype (4,19). While both ATtype and ALtype are predicted to be calcium channels that promote calcium import with muscle and smooth muscle contraction through a voltage mechanism, APiezo is a mechanosensitive calcium channel that in C. elegans was shown to affect reproductive tissue development and malfunction (43). One interesting relationship shown in Figure 4B is how APiezo appears to be more closely related between blood feeding parasites, such as Necator americanus and filarial parasites such as Loa loa, suggesting a role in adaptation to contact with blood. The AProto protein is the most unique as to our knowledge, as protocadherins have not been reported before in nematodes. The structural domains appear to be more closely related to that of Flamingo/Stan cadherins due to the presence of both laminin-G and EGF-like receptors with seven transmembrane domains (with these last highlighted in Figure 2D) (4,44). Being highly transcribed in the ovaries and early egg stages, while predicted to be responsible for homophilic cell adhesion, AProto could have a role in early oocyte development. Although present in regions usually considered difficult recognize by Frontiers in Veterinary Science frontiersin.org . /fvets. .

FIGURE
Workflow and summary diagram of the reverse vaccinology approach used in this study to identify and select potential vaccine targets in Ascaris lumbricoides and A. suum. the host immune system, such as the parasite's muscle and ovaries, an IgG immunoblot assay showed that this is possible, as proteins highly transcribed in the muscle, ovaries and intestines of the parasite were recognized by serum from pigs infected with Ascaris (45). It is also interesting to highlight the close relationship between the predicted vaccine targets and the orthologs in other nematodes, suggesting that these orthologs might also be useful in the control of the respective species. Previous attempts to generate a vaccine against ascariasis used crude extracts, recombinant proteins and, more recently, chimeric proteins (9-12). The highest lung larvae burden reduction achieved using a multi-epitope or recombinant protein vaccine was 73.5%, when using a chimeric protein containing B-cell epitopes of the As14, As16 and As37 Ascaris proteins (12). In comparison, vaccination against trichuriasis in mice resulted in up to 97% reduction of adult nematodes (46). The vaccine against Trichuris is based on recombinant proteins and showed efficacies vastly superior to similar vaccines against ascariasis. With vaccine development against ascariasis lagging behind other parasitic diseases, there is a need to discover other antigens to be tested as vaccine candidates. The reverse vaccinology approach used in our study is based on genomic and proteomic data to predict which proteins may be usable as vaccine candidates prior to new in vivo studies. This methodology allows researchers to focus down vaccination assays to a smaller set of proteins, effectively reducing costs and time. This reverse vaccinology approach has been successfully applied to identify vaccination targets for other helminths, for example T. canis (14), T. muris (13), Echinococcus granulosus (47), and Schistosoma mansoni (48, 49).
Our reverse vaccinology analysis used all the proteins predicted in the three Ascaris proteomes. Previous in silico studies on vaccine candidate prediction in Ascaris focused exclusively on secreted proteins (50). The workflow applied in this study allows the testing of all the proteins predicted from a genome analysis, without automatically excluding non-secreted proteins. In a recent study, an A. lumbricoides multi-epitope vaccine candidate was developed using in silico methodology and proteins were selected based on their binding to the HLA-DRB1 * 07:01 and HLA-DRB1 * 15:01 MHC-II alleles (51). Although these MHC-II alleles are known to recognize Ascaris antigens in humans, they only cover up to 15% of the human population in areas where human ascariasis is endemic (50). This might prove detrimental in in vivo studies that cover population that do not have these MHC-II alleles, limiting its usefulness.
Each of the three genomes included in the analysis were predicted to have over 5,000 proteins that could be further investigated as good vaccine targets according to Vacceed, which corresponds to 25-29% of all the proteins present. As the number of secreted proteins in Ascaris is predicted to be 254 proteins (50), this suggests that the number of potential vaccination targets might be vastly superior to the ones that are usually investigated in these species and other helminths. The final candidates are all predicted to be nonsecreted proteins. This contrasts with most of the previously studied vaccine candidates, except for the muscle membranebound As37 protein (52). As37 was not predicted to be a good target in Vacceed (with scores of 0.001 in all three genomes), showing some limitations to the method we used. However, our predictions and the protection achieved with the As37 recombinant protein support the idea that only targeting secreted proteins can be detrimental to the selection of good vaccination targets in Ascaris. Recent work in Toxocara canis, a parasite of the same family as Ascaris, showed that membrane proteins are capable to induce protection in a mouse model, .
/fvets. . reinforcing the idea that it is an error to disregard these proteins when looking for vaccination targets in nematodes (14). The underlying host's immune responses against Ascaris are still only partially understood (53). This is a disadvantage when selecting the right tools to predict which proteins could be useful as vaccination targets. MHC-II molecules appear to have a prominent role in the control of nematode infections in mammals, including A. lumbricoides infections in humans (53,54). This role makes the discovery of epitopes that bind to these molecules a priority for the design of multi-epitope/subunitbased vaccines. The MHCII-IEDB tool was chosen for this purpose as it was used in selecting epitopes for vaccination assays against other nematodes and is one of the most accurate tools available (13,50). A reference set of 27 different alleles was chosen due to the fact that heterogenicity throughout the human population leads to different immune responses (50). Thus, we wanted to selected proteins that would be able to induce a helpful immune reaction in a large portion of the population, and not just focus on one specific allele. Unfortunately, this biases toward selecting larger proteins as they contain a larger number of epitopes. These predictions were made only for human alleles due to unavailability of similar open-access bioinformatic tools for swine MHC-II alleles. This is a limitation of the reverse vaccinology approach to discovering vaccine targets in pigs. Bepipred 2.0 predicted the presence of B cell binding epitopes in same proteins which would enhance the host immune response against these proteins in their native form in the parasite. Another feature that needs to be considered when selecting vaccination targets and respective epitopes for a multi-epitope vaccine is the presence of glycosylation sites in the native antigen. In general, glycosylation is known to play a key role in regulating T-cell activity and highly glycosylated areas in proteins will downregulate the activity of these cells (55, 56). Selecting epitopes that are not predicted to be in these glycosylation sites would then improve the chance of the host reacting to the native antigen after receiving the multi-epitope vaccine (56). However, in Haemonchus contortus vaccination assays, glycosylation was fundamental for the recognition and protection induced by some antigens (57). Glycosylation in Ascaris proteins and how it affects the host is still not entirely understood and requires more work, but it is known to be a strategy that the parasite has developed to modulate the host immune response and make it less responsive (58, 59). Another consideration is how difficult or impossible it is to mimic the protein glycosylation that happens in nematodes using the most common recombinant protein production systems, such as the ones using Escherichia coli (60). To replicate glycosylation the use of other systems (e.g., mammalian cell lines) is required that tend to be less productive and harder to maintain than bacterial systems (61).
Whilst the proteins we selected have been predicted to be useful and have similarities to other proteins tested in vaccination assays against other nematodes, there is still the need to test them in vitro and in vivo. The next step should be to confirm in vitro that both humans and pigs are able to recognize these proteins targets and their respective epitopes. Should these proteins prove useful in stimulating an immune reaction in the host, we propose that the proteins and respective epitopes identified in this work should be incorporated into a multi-epitope vaccine which, ideally, would include CD4+ Th cell and B-cell binding epitopes from other proteins, such as As14, As16 and As37. Ideally, such a vaccine should be tested using a pig model to assess its potential effect on both larvae and adult Ascaris, impossible to assess using a mouse model. This is more relevant when testing for the utility of incorporating ATtype and AProto epitopes due to protein higher expression in the ovaries of adult female Ascaris and the released eggs (4,19). Should these proteins not prove useful, a protocol optimization should be done to identify new targets so that, in the future, it might also be applied to other species that are in need of further studies.
In conclusion, this study highlights the role that reverse vaccinology and in silico methodology can play in identification of vaccine candidates for parasitic diseases. The genome-wide approach, without bias toward secreted proteins, led to the prediction of four novel candidates that were not identified in previous studies but show the promise of promoting a useful immune response in vaccination assays against A. lumbricoides and A. suum. These proteins should now be tested in vitro and in combination with already known vaccine targets. Ultimately, the findings of this study will support the future development of a vaccine against both ascariasis in humans and pigs, thus promoting the health of both populations by reducing the need to use mass-drug administration and decreasing the risk of anthelmintic resistance appearing.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary materials, further inquiries can be directed to the corresponding author.

Author contributions
Concept and design, data interpretation, and critical review and editing of the manuscript: FE, AV, SL, and MB. Data acquisition and analysis and manuscript draft: FE. All authors contributed to the article and approved the submitted version.

Funding
This study was funded by a Doctoral College Studentship Award from the University of Surrey awarded to FE.