# NOVEL FRONTIERS IN HELMINTH GENOMICS

EDITED BY: Jose F. Tort, Gabriel Rinaldi, Makedonka Mitreva and Klaus Rüdiger Brehm

PUBLISHED IN: Frontiers in Genetics

#### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88966-070-4 DOI 10.3389/978-2-88966-070-4



Topic Editors:

Jose F. Tort, Universidad de la República, Uruguay; Gabriel Rinaldi, Wellcome Sanger Institute (WT), United Kingdom; Makedonka Mitreva, Washington University School of Medicine in St. Louis, United States; Klaus Rüdiger Brehm, Julius Maximilian University of Würzburg, Germany

Citation: Tort, J. F., Rinaldi, G., Mitreva, M., Brehm, K. R., eds. (2020). Novel Frontiers in Helminth Genomics. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88966-070-4

# Table of Contents


Guillermo Lamolle, Santiago Fontenla, Gastón Rijo, Jose F. Tort and Pablo Smircich

*Males, the Wrongly Neglected Partners of the Biologically Unprecedented Male–Female Interaction of Schistosomes*

Zhigang Lu, Sebastian Spänig, Oliver Weth and Christoph G. Grevelding


Stephen R. Doyle, Geetha Sankaranarayanan, Fiona Allan, Duncan Berger, Pablo D. Jimenez Castro, James Bryant Collins, Thomas Crellen, María A. Duque-Correa, Peter Ellis, Tegegn G. Jaleta, Roz Laing, Kirsty Maitland, Catherine McCarthy, Tchonfienet Moundai, Ben Softley, Elizabeth Thiele, Philippe Tchindebet Ouakou, John Vianney Tushabe, Joanne P. Webster, Adam J. Weiss, James Lok, Eileen Devaney, Ray M. Kaplan, James A. Cotton, Matthew Berriman and Nancy Holroyd

*A Case for Using Genomics and a Bioinformatics Pipeline to Develop Sensitive and Species-Specific PCR-Based Diagnostics for Soil-Transmitted Helminths*

Jessica R. Grant, Nils Pilotte and Steven A. Williams

*Profiling Transcriptional Regulation and Functional Roles of* Schistosoma mansoni *c-Jun N-Terminal Kinase*

Sandra Grossi Gava, Naiara Clemente Tavares, Franco Harald Falcone, Guilherme Oliveira and Marina Moraes Mourão

*Complex I and II Subunit Gene Duplications Provide Increased Fitness to Worms*

Lucía Otero, Cecilia Martínez-Rosales, Exequiel Barrera, Sergio Pantano and Gustavo Salinas

*The Effect of Gut Microbiome Composition on Human Immune Responses: An Exploration of Interference by Helminth Infections*

Ivonne Martin, Maria M. M. Kaisar, Aprilianto E. Wiria, Firdaus Hamid, Yenny Djuardi, Erliyani Sartono, Bruce A. Rosa, Makedonka Mitreva, Taniawati Supali, Jeanine J. Houwing-Duistermaat, Maria Yazdanbakhsh and Linda J. Wammes

*Genomic Epidemiology in Filarial Nematodes: Transforming the Basis for Elimination Program Decisions*

Shannon M. Hedtke, Annette C. Kuesel, Katie E. Crawford, Patricia M. Graves, Michel Boussinesq, Colleen L. Lau, Daniel A. Boakye and Warwick N. Grant

# Editorial: Novel Frontiers in Helminth Genomics

#### Jose F. Tort<sup>1</sup>\*, Makedonka Mitreva<sup>2</sup>, Klaus R. Brehm<sup>3</sup> and Gabriel Rinaldi<sup>4</sup>

<sup>1</sup> Department of Genetics, Faculty of Medicine, University of the Republic, Montevideo, Uruguay, <sup>2</sup> Department of Medicine, Washington University School of Medicine, St. Louis, MO, United States, <sup>3</sup> Institut für Hygiene und Mikrobiologie, Julius Maximilian University of Würzburg, Würzburg, Germany, <sup>4</sup> Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom

Keywords: flatworm, nematodes, genomics, helminths, neglected diseases

#### **Editorial on the Research Topic**

#### **Novel Frontiers in Helminth Genomics**

At the brink of the third decade of the twenty-first century, almost a quarter of the world's population is infected by parasitic worms, mainly in deprived areas. Despite huge and promising efforts from the World Health Organization, there is still a long way to go in the control of these highly prevalent neglected diseases. The current issue highlights how the knowledge of helminth genomics is being translated into novel strategies for diagnosis, control, and the general understanding of parasite biology and adaptation.

Determining infection burdens precisely is critical for surveillance, for evaluating the impact of control programs and for making informed decisions in helminth eradication programs. Two articles in this issue demonstrate how genomics can contribute to these efforts. Grant et al. show that it is feasible to exploit available genomic information to identify species-specific repeated sequences in order to generate qPCR-based diagnostic assays for diverse nematodes. They also evaluate and discuss how these approaches compare with traditional parasitological methods in field tests for soil-transmitted helminths.

Hedtke et al. highlight the relevance of the epidemiological perspective based on genomics in control program decision making. For filarial parasites in Africa, they show that the combination of genomic epidemiology and genome-wide associations could be helpful in defining transmission zones, identifying local or introduced parasites as sources of reinfection and distinguishing genetic markers associated with parasite response to chemotherapy. These are key elements to elaborate surveillance and eradication strategies.

An important requirement for these epidemiological approaches is the ability to obtain suitable parasite samples from natural infections, usually in difficult field conditions where the quantity and quality of samples are limited. Doyle et al. describe the optimization of different methods for extraction and sequencing of high-quality DNA from minimal amounts of parasitic material in several nematodes and trematodes. Although difficult, the authors show that it is feasible to extract DNA from single parasite eggs and early larval stages, a breakthrough for epidemiological surveillance and population genetic studies, among other relevant applications.

Host-parasite interactions have always caught the attention of parasitologists. It is well documented that parasites modulate the immune responses of their hosts, but in this issue Martin et al. extend this view by including the gut microbiome. Based on a randomized placebo-controlled anthelminthic trial, the work is the first to analyze changes in the gut microbiome, the presence of parasitic helminths and whole blood cytokine responses in parallel, and opens the way to a novel and more complex picture of host-parasite interaction.

The work of Jasmer et al. is a good example of multi-omics approaches applied to understand the complexity of parasitic helminths at the molecular level. They focus on the nematode intestine as a target for therapies, collecting and combining genomics, transcriptomics, proteomics, and miRNA profiling data from representative species of different nematode clades that live and feed at diverse locations within the host digestive tract. The review provides a comprehensive view of the advancements, methods and resources available to analyze gene and protein expression and regulation in the nematode intestine.

#### Edited and reviewed by:

John R. Battista, Louisiana State University, United States

> \*Correspondence: Jose F. Tort jtort@fmed.edu.uy

#### Specialty section:

This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Genetics

Received: 12 March 2020 Accepted: 03 July 2020 Published: 14 August 2020

#### Citation:

Tort JF, Mitreva M, Brehm KR and Rinaldi G (2020) Editorial: Novel Frontiers in Helminth Genomics. Front. Genet. 11:791. doi: 10.3389/fgene.2020.00791

Translational research that involves mining available genomic data to understand relevant biological processes is exemplified by the work of Otero et al. They have taken advantage of available genomes to analyze the basis of alternative mitochondrial electron transport chains used by helminths in hypoxic conditions. They show that, although the mechanism seems to be conserved, different duplications of the genes involved in the pathway are evident in diverse lineages of both nematodes and flatworms, and they evaluate the significance of these duplications by functional analysis in C. elegans.

Reliable methods of functional genomics are needed for analyzing gene function. Lok reviews the methodologies for DNA and RNA transformation in parasitic nematodes, and the seminal works that established proof of principle for CRISPR/Cas9 mutagenesis in Strongyloides worms. The value of dominant mutations and regulatable Cas9 expression for functional validation of genes in parasitic nematodes is thoroughly discussed.

Two articles in this issue focus on the use of genomics to analyze evolutionary aspects of helminths. Compositional features of flatworm genomes are analyzed in detail by Lamolle et al. Extreme biases in base composition are correlated with strong preferences in codon usage in diverse lineages, with differences between and within free-living and parasitic flatworms. The evolutionary implications of these findings are discussed.

By sequencing the tapeworm Echinococcus oligarthrus and comparing it with other species responsible for cystic, multicystic and alveolar echinococcosis, Maldonado et al. provide a novel genomic perspective on cestode evolution, showing that this sylvatic species responsible for unicystic echinococcosis is basal to the genus, and highlighting the relevance of studying sylvatic species.

The use of transcriptomics approaches to shine a light on diverse biological phenomena in helminths is illustrated by works on trematodes. One of the more intriguing aspects of helminth biology is the existence of separate sexes in schistosomes, and the need for pairing for female sexual maturation. While several efforts have focused on the egg-producing female, Lu et al. review the male partner of the couple, highlighting the role of genes related to the nervous system and signaling pathways in this complex interaction.

Helminth parasite survival is supported by massive egg production. Grossi Gava et al. use a combination of RNAi and transcriptomics to demonstrate the involvement of c-Jun N-terminal kinase in egg production. Using comparative genomics, they also show that orthologous genes in the nematode C. elegans are related to sterility and oocyte maturation.

Regulation of gene expression by diverse RNA mediators is now a central topic in biology, and helminths are no exception. Maciel et al. investigated long non-coding RNAs (lncRNAs) in S. mansoni by reanalyzing more than 600 RNAseq libraries. Their extensive characterization shows that lncRNA expression is regulated epigenetically, and they identified sets of correlated gene co-expression modules comprising lncRNA and mRNAs across diverse stages.

In brief, this Research Topic highlights the diversity and richness of genomics-based perspectives used to advance the study of critical aspects of the biology of helminths, which still today inflict much suffering on the most deprived people in the world.

## AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Tort, Mitreva, Brehm and Rinaldi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# CRISPR/Cas9 Mutagenesis and Expression of Dominant Mutant Transgenes as Functional Genomic Approaches in Parasitic Nematodes

*James B. Lok\**

*Department of Pathobiology, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, PA, United States*

DNA transformation of parasitic nematodes enables novel approaches to validating predictions from genomic and transcriptomic studies of these important pathogens. Notably, proof of principle for CRISPR/Cas9 mutagenesis has been achieved in *Strongyloides* spp., allowing identification of molecules essential to the functions of sensory neurons that mediate behaviors comprising host finding, invasion, and location of predilection sites by parasitic nematodes. Likewise, CRISPR/Cas9 knockout of the developmental regulatory transcription factor *Ss-daf-16* has validated its function in regulating morphogenesis of infective third-stage larvae in *Strongyloides stercoralis*. While encouraging, these studies underscore challenges that remain in achieving straightforward validation of essential intervention targets in parasitic nematodes. Chief among these is the likelihood that knockout of multifunctional regulators like *Ss*-DAF-16 or its downstream mediator, the nuclear receptor *Ss*-DAF-12, will produce phenotypes so complex as to defy interpretation and will render affected worms incapable of infecting their hosts, thus preventing establishment of stable mutant lines. Approaches to overcoming these impediments could involve refinements to current CRISPR/Cas9 methods in *Strongyloides* including regulatable Cas9 expression from integrated transgenes and CRISPR/Cas9 editing to ablate specific functional motifs in regulatory molecules without complete knockout. Another approach would express transgenes encoding regulatory molecules of interest with mutations designed to similarly ablate or degrade specific functional motifs such as the ligand binding domain of *Ss*-DAF-12 while preserving core functions such as DNA binding. Such mutant transgenes would be expected to exert a dominant interfering effect on their endogenous counterparts. Published reports validate the utility of such dominant-negative approaches in *Strongyloides*.

Keywords: transgenesis, parasitic nematode, CRISPR/Cas9, dominant transgene, mutagenesis

## INTRODUCTION

The advent of transgenesis in parasitic nematodes has opened new means of ascertaining the functions of specific genes in these important pathogens by both gain- and loss-of-function approaches. The first successes in this area involved translating methods for transgenesis in *Caenorhabditis elegans* to parasites in the genera *Strongyloides* and *Parastrongyloides*, which were apt subjects given their capacity to undertake one or more generations of free-living development. Transgenesis in *Strongyloides* spp. has enabled targeted mutagenesis by CRISPR/Cas9, and this method has already found application in studies that assigned function to genes regulating morphogenesis and host-finding behavior in infective larvae. Transgenes designed to exert dominant interfering effects over specific genes have also revealed essential functions of these genes in growth and development of preparasitic larvae of *Strongyloides stercoralis*. Robust new methods for integrative transgenesis in *Brugia malayi* promise to open these and other avenues of functional genomic study in the filariae and potentially in other obligately parasitic nematodes. None of these functional genomic approaches are new to the basic fields of cellular and molecular biology, but their application to parasitic helminths marks a recent innovation that ranks among the salient advances in parasitology in the past two decades. In view of this, the following review discusses these current findings in detail and proposes new directions in their application to increasingly informative and practical studies of gene function in a group of pathogens that impose the risk of morbidity and mortality on a large segment of the world's population and exact a heavy toll on the health and welfare of domestic animals.

#### *Edited by:*

*Makedonka Mitreva, Washington University Medical Center, United States*

#### *Reviewed by:*

*Anil Kumar Challa, University of Alabama at Birmingham, United States Rui Chen, Baylor College of Medicine, United States*

> *\*Correspondence: James B. Lok jlok@vet.upenn.edu*

#### *Specialty section:*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Genetics*

*Received: 28 March 2019 Accepted: 21 June 2019 Published: 16 July 2019*

#### *Citation:*

*Lok JB (2019) CRISPR/Cas9 Mutagenesis and Expression of Dominant Mutant Transgenes as Functional Genomic Approaches in Parasitic Nematodes. Front. Genet. 10:656. doi: 10.3389/fgene.2019.00656*

### TRANSFORMATION WITH DNA OR RNA CONSTRUCTS IS A PREREQUISITE TO DEPLOYING CONTEMPORARY METHODS FOR ASSESSMENT OF GENE FUNCTION IN PARASITIC NEMATODES

#### Approaches to Transgenesis in Clade III and IV Parasitic Nematodes

Parasites in the genera *Strongyloides* and *Parastrongyloides* were the first to be successfully transformed with plasmid-based constructs (Lok and Massey, 2002; Grant et al., 2006a; Li et al., 2006; Li et al., 2011). The main factor allowing this advancement was these parasites' unique ability to execute one or more generations of development as free-living males and females with their progeny in the soil (Viney and Lok, 2015; Grant et al., 2006b). These free-living cycles provide access to the adult germlines of *Strongyloides* and *Parastrongyloides* that is unavailable in obligately parasitic nematodes. Free-living generations of *Strongyloides* and *Parastrongyloides* can be maintained in agar plate culture using methods adapted from those used for the free-living nematode *C. elegans* (Grant et al., 2006a; Lok and Unnasch, 2013). *Parastrongyloides trichosuri* can execute an indefinite number of sequential free-living generations in plate culture, enhancing its utility as a subject for genetic study. Most crucially, the body plans of free-living female *Strongyloides* and *Parastrongyloides* and of *C. elegans* hermaphrodites are so similar that it was straightforward to adapt methods for gene delivery by **gonadal microinjection** from *C. elegans* (Fire, 1986; Fire et al., 1990a; Fire et al., 1990b; Mello et al., 1991; Mello and Fire, 1995) to the parasites (**Figure 1A**) (Lok and Massey, 2002; Grant et al., 2006a; Li et al., 2006; Li et al., 2011). A similar approach was used to achieve transgenesis in *Strongyloides stercoralis* by microinjection of plasmid constructs into the testes of free-living males (Shao et al., 2017). These approaches, involving transfer of plasmid-based vector constructs, allow promoter-regulated, tissue-specific transgene expression in F1 transformants (**Figure 1A**) (Grant et al., 2006a; Li et al., 2006; Junio et al., 2008; Li et al., 2011; Shao et al., 2017), and integration of transgene sequences into the chromosomes of *Strongyloides ratti* permits establishment of stable transgenic lines of this parasite (**Figure 1A**) (Shao et al., 2012). These methods have enabled studies of gene function in *Strongyloides* and *Parastrongyloides* that have revealed anatomical and temporal patterns of gene expression and essential functions of specific genes in development and survival of the pre-infective stages of these parasites (**Figure 1A**) (Massey et al., 2003; Grant et al., 2006a; Junio et al., 2008; Castelletto et al., 2009; Stoltzfus et al., 2012; Massey et al., 2013; Yuan et al., 2014a; Yuan et al., 2014b; Lei et al., 2017; Yuan et al., 2017).

Significant strides have also been made towards transgenesis in the filaria *Brugia malayi*. This parasite, and all filariae, are far more challenging subjects for molecular genetic study, as they have no free-living generation comparable to that of *Strongyloides* and related genera, and furthermore, unlike soil-transmitted parasitic nematodes, their pre-infective stages require an arthropod vector for development. These barriers notwithstanding, transient DNA transformation of *B. malayi* (Higazi et al., 2002; Higazi and Unnasch, 2013) and *Litomosoides carinii* (Jackstadt et al., 1999) has been achieved by particle bombardment or biolistics (**Figure 1B**). In the case of *B. malayi*, the primary object of this approach has been to investigate the structure and function of gene regulatory sequences by transforming embryos within the uteri of female worms with plasmid constructs encoding fluorescent reporters linked to wild-type or mutant regulatory sequences, promoters mainly, and evaluating function by quantifying reporter activation. These bombarded embryos are not developmentally competent, but this approach has, nevertheless, yielded a wealth of basic information on the structural motifs required for promoter activity (Shu et al., 2003; Higazi et al., 2005; de Oliveira et al., 2008; Bailey et al., 2011), trans-splicing (Higazi and Unnasch, 2004; Liu et al., 2007; Liu et al., 2010a), and operon function (Liu et al., 2010b) in *B. malayi*. This approach to transient transgenesis has also revealed the function of microRNAs in regulation of *B. malayi* genes (Liu et al., 2015), the presence of tetracycline-regulatable promoters (Liu et al., 2011), and the transcriptional targets of ecdysone signaling in this parasite (Tzertzinis et al., 2010; Liu et al., 2012). Notably, the approach of transiently transfecting embryos has also been applied to *Ascaris suum* (**Figure 1E**) (Davis et al., 1999).

transformed F1 larvae may be investigated for reporter expression patterns or phenotypes induced by dominant transgenes or CRISPR/Cas9-mediated gene disruption or editing. Lines of parasites with transgenes integrated by the co-injected *piggyBac* transposon system may be selected and maintained by alternating host and culture passage. (B) Biolistic transfer of reporter DNA constructs into *Brugia malayi* embryos within adult female worms has enabled definition of basic gene regulatory elements and has identified elements for ecdysteroid signaling in this parasite. Embryos transformed in this manner have thus far been incapable of further development. (C) Chemically mediated vector transfer in *B. malayi* by co-injection of parental infective third-stage larvae (L3i) into susceptible gerbils. Transgenic adult parasites and transgenic F1 microfilariae result from this procedure. (D) Refinement of chemically mediated transfection involving lipofectant transfer of plasmid vectors with the *piggyBac* transposon system to parental L3i of *B. malayi* undergoing the molt to L4 in culture and then inoculated intraperitoneally into susceptible gerbils. F1 transgenic microfilariae include integrants and may prove amenable to selection of stable lines by mosquito and gerbil passage. (E) Biolistic transfer of DNA and RNA into embryos of *Ascaris suum*. This approach has enabled studies of RNA processing in these parasites.

Stable, heritable transformation has been achieved in *B. malayi* by chemically mediated transfer of plasmid vector constructs into developing infective third-stage larvae (L3i; **Figures 1C**, **D**). Initial applications of this approach involved forming calcium phosphate co-precipitates with plasmid DNA constructs encoding a luciferase (GLuc) reporter (Xu et al., 2011). Mosquito-derived infective third-stage larvae (iL3) were then co-injected along with the calcium phosphate plasmid DNA co-precipitate into the peritoneal cavities of susceptible gerbils. Substantial proportions of *B. malayi* adults and their microfilarial progeny recovered from gerbils inoculated with transduced iL3
expressed GLuc, proving inheritance of the reporter transgene in the F1 generation (**Figure 1C**). This method was subsequently refined such that DNA transfer was achieved by lipofection of mosquito-derived L3i of *B. malayi* that were undergoing the L3i-L4 molt in co-culture with bovine embryo skeletal muscle cells (Liu et al., 2018). Such cultured larvae co-transformed with GLuc and reporters encoding fluorescent proteins and flanked by inverted tandem repeats specific for the *piggyBac* transposon system as well as a plasmid encoding the *piggyBac* transposase were capable of establishing patent infection in susceptible gerbils (**Figure 1D**). Analysis of adult *B. malayi* and F1 microfilariae revealed transgene expression and also integration of transgene sequences into the parasite genome. Altogether, this represents a remarkable advancement in functional genomic methodology for filariae and for all parasitic nematodes. For the former, it marks the first integrative transformation of *B. malayi*, the most widely used model in experimental filariasis, and the first deployment of fluorescent reporters that will allow rapid visual selection of transgenic parasites and assessment of temporal and anatomical patterns of transgene expression. For all parasitic nematodes, the work on *B. malayi* identifies chemically mediated DNA transfer as an alternative to gonadal microinjection as a means of transgenesis. This method has the advantage of DNA transfer into the larval germline, making it applicable to obligate parasitic nematodes in which the adult germline is not as easily accessible as it is in the free-living adults of *Strongyloides* and related parasites. **Chemically mediated gene transfer** has the additional and paramount advantage of being a "hands-off" method not requiring the meticulous and painstaking process of microinjection and having the potential for high-capacity transformation of hundreds or thousands of parasites in a single replication of the procedure (**Figure 1B–E**). The common factor in success of the two chemically mediated approaches used to date in *B. malayi* (Xu et al., 2011; Liu et al., 2018) appears to have been that DNA transfer was undertaken during an intermolt period, in this case the L3–L4 molt, in which the nascent cuticle was transiently competent to take up either the calcium phosphate–DNA co-precipitate or the lipofectamine–DNA micelles. Parasitic nematodes such as hookworms or trichostrongyles and related rodent parasites such as *Nippostrongylus brasiliensis* and *Heligmosomoides polygyrus*, whose pre-parasitic larvae develop in the soil or on herbage, would seem to be particularly amenable to this approach, as they can be cultured through two molt cycles (L1–L3) in simple laboratory media. The same is true of *Strongyloides* and related genera, where the combination of life cycles containing both free-living and parasitic generations and a hands-off, high-capacity method of DNA transfer would make a particularly powerful model for molecular genetics in parasitic nematodes.

### FUTURE DIRECTIONS IN TRANSGENESIS FOR PARASITIC NEMATODES

#### Gene Editing *via* Transgenesis in Pre-Parasitic Larvae of Soil-Transmitted Parasitic Nematodes

The microinjection-based methods of gene transfer into free-living females of *Strongyloides* spp. and of *Parastrongyloides trichosuri* have not been adapted to obligately parasitic nematodes whose pre-parasitic larvae develop in and are transmitted from the soil or vegetation and whose adult stages reside solely in vertebrate animal hosts. These include the hookworms and trichostrongyles of paramount importance to human medicine and animal agriculture. While the adult germlines of these parasites are not readily accessible for gene transfer, the rudimentary gonads of their pre-parasitic larvae, which are easily cultured in the laboratory, are accessible. Certainly, the chemically mediated and lipofectant-facilitated systems of gene transfer that have been so successful for *ex vivo* iL3 of *Brugia malayi* (Xu et al., 2011; Liu et al., 2018) should be investigated as possible approaches to gene transfer into hookworms and trichostrongyles of humans and domestic animals and their rodent counterparts that are important subjects for immunological research (Bouchery et al., 2017).

In addition to these "hands-off" approaches to gene transfer, it is also possible to transfer CRISPR/Cas9 elements into pre-parasitic larvae of soil-transmitted parasites by microinjection (Adams et al., 2019). Mutations were targeted to the *rol-6* ortholog in *S. stercoralis* by microinjecting a nucleoprotein comprising Cas9 and a target-specific gRNA along with lipofectamine into the body cavities of iL3. This approach produced P0 mutant larvae exhibiting a classic "roller" phenotype identical to that seen in *rol-6* mutants of *C. elegans*. This approach might be adapted to the pre-infective larvae of other soil-transmitted parasitic nematodes as well.

#### Deployment of "Hands-Off", High-Capacity Gene Transfer in *Strongyloides* and Other Soil-Transmitted Parasitic Nematodes

A major shortcoming of gene transfer by gonadal microinjection in *Strongyloides* and related parasites has been the limited numbers of F1 transgenic larvae that can be produced by this approach (**Figure 1A**). This creates difficulties in amassing samples of transgenic larvae of sufficient size to achieve statistical power and is a particular hurdle to the establishment of transgenic lines of *S. stercoralis*, where the most susceptible rodent host, the gerbil, requires an approximate minimum of 100 transgenic larvae to mount a patent infection yielding transgenic F2 progeny. This situation calls for a concerted effort to achieve chemically or lipofectant-mediated gene transfer into pre-infective larvae of *Strongyloides* spp. in a manner similar to the successful methods now available for *ex vivo* iL3 of *Brugia* spp. (Xu et al., 2011; Liu et al., 2018). It seems likely that these "hands-off" methods (**Figure 1B–E**) could be similarly deployed for gene transfer into pre-parasitic larvae of hookworms, trichostrongyles, and their rodent-infecting counterparts.

### CRISPR/CAS9 MUTAGENESIS IS FEASIBLE IN PARASITIC NEMATODES

DNA transformation of parasitic nematodes opens the formal possibility of deploying CRISPR/Cas9 for targeted mutagenesis as a means of directly assessing gene function in these pathogens. Proof of principle for **CRISPR/Cas9 mutagenesis** has been achieved in *Strongyloides stercoralis* and *Strongyloides ratti*. This includes preliminary evidence of precisely targeted double-stranded DNA breaks (DSB) and homology-directed repair with a co-transfected DNA template (Lok et al., 2017) and comprehensive demonstration and phenotypic evaluation of targeted disruption of *Ss-unc-22* and *Sr-unc-22* in *S. stercoralis* and *S. ratti*, respectively (Gang et al., 2017). Mutations have been created in *Strongyloides* by transducing free-living females with basic CRISPR elements, including gRNA, Cas9 codon-optimized for expression in *Strongyloides*, and selectable markers encoded in plasmid vectors (Gang et al., 2017; Lok et al., 2017) or by microinjecting gonads of free-living females with pre-formed nucleoproteins comprising recombinant Cas9 and *in vitro* transcribed gRNAs (Gang et al., 2017; Adams et al., 2019). To date, transduction of parental worms with plasmid-encoded CRISPR/Cas9 elements has resulted in more efficient mutagenesis than is seen with delivery of pre-formed nucleoproteins (Gang et al., 2017).
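As a rough illustration of how candidate gRNA target sites can be located, the Python sketch below scans a DNA sequence for 20-nt protospacers immediately followed by an NGG protospacer-adjacent motif (PAM), the requirement of the widely used *Streptococcus pyogenes* Cas9. That this particular Cas9 is the nuclease in question is an assumption made here for illustration; the function name `find_target_sites` and the example sequence are likewise hypothetical and are not taken from the studies cited above. Only the forward strand is scanned.

```python
import re


def find_target_sites(seq, protospacer_len=20):
    """Return (start, protospacer, pam) for each N20-NGG site on the forward strand."""
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Hypothetical sequence for illustration only (not a real Strongyloides locus).
example = "ATGCATGCATGCATGCATGCAGGTT"
for start, protospacer, pam in find_target_sites(example):
    print(start, protospacer, pam)
```

A practical design workflow would also scan the reverse complement and screen candidates for off-target matches elsewhere in the genome; both steps are omitted here for brevity.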

Informative phenotypes have been associated both with large deletions at target sites mediated by non-homologous end joining (NHEJ) and with insertional mutagenesis *via* homology-directed repair (HDR) of CRISPR/Cas9-induced DSB. Most notably, a prominent "twitcher" phenotype, which is exacerbated by nicotine exposure, results from CRISPR/Cas9 disruption of *Ss-unc-22* and *Sr-unc-22*. These genes, like their *C. elegans* ortholog, encode the intracellular muscle protein twitchin (Moerman and Baillie, 1979; Moerman et al., 1988; Gang et al., 2017). The *unc-22* phenotypes in *Strongyloides* spp. are highly reminiscent of the twitcher phenotype resulting from mutations in *C. elegans unc-22*. Notably, the *C. elegans* twitcher phenotype is dominant, occurring in both homozygous and heterozygous *unc-22* mutants. Homology-directed repair has been exploited to integrate short disrupting oligonucleotides containing multiple stop codons and novel priming sites for genotyping at the site of CRISPR/Cas9-induced DSBs (Lok et al., 2017) as well as a full reporter cassette encoding a fluorescent reporter that serves both to disrupt the target gene and to provide a convenient marker for mutant selection (Gang et al., 2017). In the latter instance, F1 larvae expressing the red fluorescent protein mRFPmars under the body wall-specific promoter for *Ss-act-2* from the repair template in an episomal array or from the template integrated into a CRISPR/Cas9 cut site in *Ss-unc-22* could be distinguished by PCR using primers specific for the repair template and the genomic region flanking the cut site and by the consistent pattern of body wall expression in integrants (Gang et al., 2017). Integrants in this case exhibited a uniform twitcher phenotype, providing proof of principle for the use of reporter-encoding repair templates to disrupt target genes by CRISPR/Cas9- and HDR-mediated integration.
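The idea of a short disrupting insert carrying multiple stop codons can be checked computationally. The sketch below reports which of the three forward reading frames of an insert contain a stop codon. The design goal that every frame should be covered is an assumption made here, not a detail stated in the cited work, and the function name `frames_with_stop` and the example insert are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def frames_with_stop(seq):
    """Return the set of forward reading frames (0, 1, 2) that contain a stop codon."""
    seq = seq.upper()
    frames = set()
    for frame in range(3):
        # Step through complete codons only, starting at this frame's offset.
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                frames.add(frame)
                break
    return frames


# Hypothetical insert for illustration: one stop codon per reading frame.
insert = "TAAATAGATGA"
print(sorted(frames_with_stop(insert)))
```

Frames here are numbered relative to the first base of the insert. Which of them coincides with the reading frame of the target gene depends on where the insert lands relative to the cut site, which is why covering all three is a convenient precaution.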

Most significantly, CRISPR/Cas9-induced mutations in *S. stercoralis* are heritable and may be propagated by passage in gerbils, which constitute the most authentic rodent model of the entire spectrum of *S. stercoralis* infection and disease in humans (Nolan et al., 1993; Gang et al., 2017).

In addition to the studies of *Ss-unc-22*, which demonstrated that its encoded muscle component is required for normal motility of iL3 (Gang et al., 2017), CRISPR/Cas9 and the *Strongyloides* model have found application in studies of genes involved in the sensory biology of these infective larvae. Soil-transmitted parasitic nematodes such as hookworms and *Strongyloides* spp. invade the host by skin penetration and develop as free-living pre-parasitic larvae in contaminated soil. iL3 of these parasites orient toward potential hosts using a complex of chemical and physical cues (Bryant and Hallem, 2018b; Castelletto et al., 2014). Orienting towards a thermal source is a behavior that many species of environmentally motile parasitic nematode iL3 use to acquire a host (Castelletto et al., 2014; Bryant and Hallem, 2018a; Bryant and Hallem, 2018b; Bryant et al., 2018). In *C. elegans*, thermal signals are received by receptor guanylate cyclases in sensory neurons of the amphidial complex, which are in contact with the external environment. These receptors signal through a heteromeric cGMP-activated cation channel, comprising TAX-2 and TAX-4 subunit proteins, that regulates polarization of the sensory neuron and potentiation of thermotactic responses (Komatsu et al., 1996; Bargmann and Mori, 1997; Komatsu et al., 1999). The *tax-4* ortholog in *S. stercoralis* has been disrupted by CRISPR/Cas9, and knockout iL3 exhibit diminished thermotaxis towards 34°C compared to controls derived from parental worms receiving all CRISPR/Cas9 elements except the Cas9 endonuclease. This provides direct evidence that *tax-4* is required for normal thermotaxis by iL3 of this parasite (Bryant and Hallem, 2018a; Bryant and Hallem, 2018b; Bryant et al., 2018). Given the involvement of the TAX-2/TAX-4 channel in other sensory signaling pathways in free-living nematodes (Komatsu et al., 1996; Komatsu et al., 1999), it is likely that *S. stercoralis tax-4* knockouts will exhibit decrements in chemotactic as well as thermotactic behaviors when these mutants are subjected to the appropriate assays (Bryant and Hallem, 2018b; Bryant et al., 2018).

### FUTURE DIRECTIONS FOR CRISPR/CAS9 IN PARASITIC NEMATODES

#### Deployment of CRISPR/Cas9 in Filariae

Robust systems for both integrative (Liu et al., 2018) and nonintegrative (Xu et al., 2011) transgenesis are now available for *Brugia malayi*. With these systems in place, targeted mutagenesis and gene editing in this important model of human filariasis are imminently possible. The toolkit for integrative transgenesis assembled by Liu et al. (2018) comprises a set of multifunctional vectors encoding multiple elements of the *piggyBac* transposon system that could be readily adapted to encode CRISPR/Cas9 elements such as the Cas9 endonuclease, gene-specific gRNAs, and templates for homology-directed repair in a compact and efficient gene delivery system. The time is right for this advancement.

#### Single-Copy Transgene Integrations

Among the very significant findings that arose from proof of principle for CRISPR/Cas9 in *Strongyloides* (Gang et al., 2017) was that full gene coding sequences, in this case an *Ss-act-2* transcriptional reporter encoding mRFPmars, can be integrated into the parasite genome at CRISPR/Cas9 cut sites by homology-directed repair. This approach provides a distinct advantage over current transposon-based methods for transgene integration, which integrate multiple transgene copies at random sites in the genome. Integrations by *piggyBac* typically favor coding sequences, creating the possibility of confounding insertional mutagenesis (Shao et al., 2012; Lok, 2013).

#### Regulatable CRISPR/Cas9

The capability to target transgene integrations to precise locations within the *Strongyloides* genome by CRISPR/Cas9 creates the potential to establish lines of these parasites that stably express the Cas9 endonuclease from single transgene copies. Parasites from such lines could subsequently be transformed to express gene-specific gRNAs to disrupt target genes and assess resulting phenotypes. Constitutive expression of Cas9 in *C. elegans* exerts some deleterious effects on fitness of the transgenic worms (Waaijers et al., 2013). Likewise, disrupting genes essential for invasion or establishment of the worms in mammalian hosts would prevent propagation of mutants and assessment of phenotypes affecting host infectivity or crucial host–parasite interactions. A solution to these problems could be conditional or inducible Cas9 expression from integrated transgene sequences in stable parasite lines. We envision two possible approaches to regulating Cas9 expression in this context. The first would be to express Cas9 under promoters that are activated in the presence of a small molecule regulator such as tetracycline (Tet). Tet-responsive genes have been identified in *Brugia malayi* (Liu et al., 2011; Wurmthaler et al., 2019), and a tet-regulated promoter has been deployed for inducible transgene expression in *C. elegans* (Liu et al., 2011; Wurmthaler et al., 2019). These findings argue for identifying Tet-responsive genes in *Strongyloides* that could be similarly incorporated into **regulatable transgene expression** systems generally and particularly for the regulated expression of Cas9 in stable parasite lines.

Another approach to regulatable expression of Cas9 and other transgene-encoded factors would be to fuse these to degradation domains that are stabilized in the presence of small molecules. Under these conditions, the transgene product would be stabilized in the presence of the small molecule but targeted to the proteasome for degradation in its absence. Such a system, incorporating the degradation domain of dihydrofolate reductase, which is stabilized in the presence of the folate-targeting drug trimethoprim, has been used to regulate transgene expression in *Plasmodium falciparum* (Muralidharan et al., 2011), and it has shown promise in preliminary experiments with transgenic *Strongyloides* (Lok et al., 2017).

### TRANSGENES WITH DOMINANT MUTATIONS ARE TOOLS FOR FUNCTIONAL GENOMIC STUDY IN PARASITIC NEMATODES

#### Functions of Genes Encoding a Variety of Catalytic or Regulatory Proteins may be Interrogated With Dominant Interfering Transgenes

Prior to the advent of targeted mutagenesis, transgenesis in *Strongyloides* spp. enabled a functional genomic approach based on expression of transgenes encoding mutant proteins designed to exert a dominant interfering effect on the target of interest. This approach may be used to suppress the functions of catalytic, signaling, and regulatory proteins that comprise separate functional domains and domains that bind to substrates, ligands, or DNA (**Figure 2**). The strategy holds that when overexpressed from multi-copy arrays relative to the single-copy endogenous target in the subject organism, transgene-encoded mutant proteins that have disrupted functional domains and intact binding domains will out-compete their endogenous wild-type counterparts for binding partners but fail to undertake their putative functions, thus resulting in dominant loss of function in the target gene of interest. Basic mechanisms for this approach to simulate loss of function in target genes are illustrated for cytoplasmic signaling kinases and metabolic enzymes (**Figure 2A**), membrane receptor kinases (**Figure 2B**), and transcription factors (**Figure 2C**).
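The competition argument can be made concrete with a deliberately simple model. The sketch below assumes that every gene copy, transgenic or endogenous, yields the same amount of protein and that mutant and wild-type proteins bind their partner with equal affinity, so that each species captures a share of the binding partner proportional to its copy number. These assumptions, the function names `mutant_occupancy` and `residual_function`, and the array sizes used are illustrative only and do not come from the studies discussed here.

```python
def mutant_occupancy(mutant_copies, wildtype_copies=1):
    """Share of binding partner captured by the mutant protein.

    Assumes equal expression per gene copy and equal binding affinity,
    so the share is simply proportional to copy number.
    """
    if mutant_copies < 0 or wildtype_copies < 0:
        raise ValueError("copy numbers must be non-negative")
    total = mutant_copies + wildtype_copies
    if total == 0:
        raise ValueError("at least one gene copy is required")
    return mutant_copies / total


def residual_function(mutant_copies, wildtype_copies=1):
    """Share of binding partner still engaged by functional wild-type protein."""
    return 1.0 - mutant_occupancy(mutant_copies, wildtype_copies)


if __name__ == "__main__":
    # Hypothetical array sizes, chosen only to show the trend.
    for copies in (0, 1, 10, 100):
        print(copies, round(residual_function(copies), 3))
```

Under these assumptions, residual wild-type function falls as the mutant array grows and recovers when additional wild-type copies are supplied. Real arrays will deviate from this picture wherever expression per copy or binding affinity differs between the mutant and wild-type proteins.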

#### A Dominant Transgene Approach has been Used to Investigate the Function of *Ss*-DAF-16 in *S. stercoralis*

The gene encoding this transcription factor is the ortholog of *daf-16* in *C. elegans*, which encodes a forkhead transcription factor that is regulated by insulin-like signaling and controls lifespan and the switch between dauer and continuous larval development (Lin et al., 1997; Ogg et al., 1997; Lee et al., 2001; Massey et al., 2003; Massey et al., 2006). A loss-of-function approach to interrogating the function of *Ss-daf-16* employed a dominant interfering construct. In this construct, AKT phosphorylation sites necessary for cytoplasmic localization of the protein were disrupted and the C-terminal domain of the molecule was truncated sufficiently to delete predicted transactivator binding sites (**Figure 2C**). Sequence encoding the DNA-binding domain of the mutant *Ss*-DAF-16 was left intact. The putative **dominant interfering transgene** also fused the coding sequence of the mutant *Ss-daf-16* with that of *gfp* to confirm constitutive nuclear localization of the encoded protein (Castelletto et al., 2009). This construct, therefore, was designed to express a mutant form of *Ss*-DAF-16 that is constitutively localized to the nucleus, but lacks the transactivating functions of the wild-type transcription factor. We also assume that the mutant transgene, like other plasmid-encoded transgenes introduced to nematodes by gonadal microinjection, will be incorporated into a multi-copy episomal array (Mello et al., 1991) and therefore overexpressed relative to the endogenous single-copy target gene (**Figure 2C**). Indeed, a proportion of F1 progeny of free-living female *S. stercoralis* microinjected with the mutant *Ss-daf-16* construct exhibited pronounced nuclear localization of GFP and a range of phenotypes including severe defects in intestinal morphology, loss of secretory granules from intestinal cells, retention of rhabditiform pharyngeal morphology in third-stage larvae (L3), and initiation of an aberrant L3–L4 molt (Castelletto et al., 2009). These last two phenotypes are particularly noteworthy given that 100% of wild-type progeny of free-living *S. stercoralis* females arrest their development as infective third-stage larvae, which have filariform, as opposed to rhabditiform, morphology (Schad, 1989; Viney and Lok, 2015). In contrast to parasites expressing the putative dominant interfering construct, F1 progeny transformed with a control construct fusing the wild-type *Ss-daf-16* coding sequence to *gfp* exhibited normal intestinal architecture, with intestinal cells replete with secretory granules, filariform pharyngeal morphology in L3, and no evidence of L3–L4 molting (Castelletto et al., 2009).

#### A Dominant Interfering Transgene Construct was Used to Assess the Function of *Ss-riok-1* in *S. stercoralis*

*Ss-riok-1* encodes a RIO protein kinase homologous to RIOK-1 in *C. elegans* (Yuan et al., 2014b). RIO kinases are conserved in eukaryotic organisms and are essential for ribosomal biogenesis and cell cycle progression (Angermayr et al., 2002; Widmann et al., 2012; Yuan et al., 2014b). The putative dominant interfering construct encoded *Ss-riok-1* with a D282A mutation in its catalytic site that served to disrupt kinase activity. Sequence encoding the substrate binding site of *Ss-riok-1* was left intact in the putative dominant interfering construct (**Figure 2A**). As with the *Ss-daf-16* constructs described above, both the wild-type and mutant constructs encoding *Ss*-RIOK-1 fused its coding sequence to that of *gfp* to confirm an authentic pattern of expression under the *Ss-riok-1* promoter. As expected, both mutant and wild-type constructs were expressed in head and tail neurons and hypodermal cells of transgenic *S. stercoralis* larvae (Yuan et al., 2014b; Yuan et al., 2017). It was also expected that, by virtue of its situation in a multi-copy episomal array, the mutant protein expressed from the dominant transgene construct would outcompete endogenous single-copy *Ss*-RIOK-1 for substrate binding but fail to exert its kinase function (**Figure 2A**). Larvae transformed with the construct encoding the kinase-dead *Ss*-RIOK-1 exhibited profound decrements in motility and a significant blockade in development to the infective third stage. By contrast, *S. stercoralis* larvae expressing a control construct encoding a fusion of *Ss*-RIOK-1 and GFP exhibited typical progressive motility and developed normally to the iL3 (Yuan et al., 2017). The study of *Ss-riok-1* function included an additional and crucial control experiment in which co-expression of wild-type *Ss*-RIOK-1 with the kinase-dead mutant served to rescue the motility and developmental phenotypes to a degree roughly proportional to the concentration of wild-type vector plasmid microinjected into parental free-living female worms (Yuan et al., 2017). This important control experiment served to confirm the specificity of the dominant interfering transgene's effects on *Ss*-RIOK-1 function.

**Figure 2** […] (A) […] for dominant interfering transgenes targeting functions of metabolic enzymes and cytoplasmic signaling kinases. Overexpression of a mutant construct with an intact substrate binding domain and a catalytic site disrupted by mutation serves to outcompete the endogenous gene product for substrate while yielding no phosphorylated or enzyme-metabolized product. (B) Design of dominant interfering transgenes targeting functions of membrane receptor kinases. Ablation of the kinase domain by mutation, an intact ligand binding domain, and overexpression combine to out-compete the endogenous receptor kinase for ligand while failing to phosphorylate downstream cytoplasmic signaling elements. (C) Design of dominant interfering transgenes targeting the action of transcription factors. Overexpression of a mutant transgene encoding an intact DNA binding domain, mutations in phosphorylation sites to effect constitutive nuclear localization, and disruption of the transactivating domain serve to outcompete the endogenous transcription factor for genomic response elements while failing to execute gene regulatory function.

### FUTURE DIRECTIONS IN THE USE OF DOMINANT TRANSGENE CONSTRUCTS TO STUDY GENE FUNCTION IN PARASITIC NEMATODES

#### Controlling Experiments Involving Dominant Interfering Constructs

Both studies employing dominant interfering constructs to evaluate specific gene function in *S. stercoralis* (Castelletto et al., 2009; Yuan et al., 2017) controlled for confounding effects of transgene overexpression by comparing phenotypes in cohorts of parasites overexpressing wild-type target proteins fused to GFP to those exhibited by parasites expressing mutant ones. This is crucial given the possibility that overexpression alone can, in some instances, produce dominant effects in transgenic organisms (Chandler and Werr, 2003). Another crucial control introduced in the study of *Ss*-RIOK-1 function was rescue of dominant interfering phenotypes by co-overexpression of the wild-type protein. In this case, the dose dependency of rescue was established by assessing frequencies of mutant phenotypes in F1 larvae derived from parental females microinjected with the mutant construct along with increasing concentrations of vector plasmid encoding the wild-type protein fused to GFP. There was a decreasing trend in the frequency of mutant phenotypes with increasing concentration of wild-type vector plasmid, confirming the specificity of the mutant construct's dominant effect (Castelletto et al., 2009; Yuan et al., 2017). Inclusion of both of these essential controls will be crucial to the correct interpretation of data in future experiments with dominant interfering transgene constructs.

#### Deployment of Dominant Transgenes Incorporating Heterologous Activator or Suppressor Domains

To date, dominant interfering effects have been achieved with transgenes encoding the target gene with synthetic loss-of-function mutations in functional domains and wild-type sequence in DNA, protein, or substrate binding sites to facilitate competition for binding partners with the product of the endogenous target gene (**Figure 2**). Another approach to creating dominant interfering constructs targeting transcription factors in parasitic nematodes is to design the constructs to encode a chimeric protein comprising a heterologous transcriptional repressor domain fused to the DNA binding domain of the target factor. This serves to direct a strong repressor to promoters of genes regulated by the target transcription factor. An example of this approach is the use of the *Drosophila* ENGRAILED homeodomain to impart repressor function to transcription factors, thereby creating loss-of-function phenotypes in both invertebrate and mammalian models (Badiani et al., 1994; John et al., 1995; Tolkunova et al., 1998; Chandler and Werr, 2003). Conversely, transgene constructs comprising viral activator domains such as VP16 from herpes simplex virus can activate target genes when fused to GAL-4 (Sevin-Pujol et al., 2017). The specific gene targeting capabilities of nucleoproteins comprising an inactive form of the Cas9 endonuclease, dCas9, and gRNAs can also be harnessed to tether repressor domains such as KRAB or activators such as VP64 to precise genomic loci to achieve transcriptional repression (CRISPRi) or activation (CRISPRa) of target genes (Dominguez et al., 2016).

### CONCLUSIONS

Transgenesis in *Strongyloides* and related parasitic nematodes in phylogenetic Clade IV and in the filaria *Brugia malayi* within Clade III opens possibilities for unprecedented studies of gene function in these pathogens that degrade the health of hundreds of millions of people. These approaches have enabled experiments employing dominant interfering transgenes to reveal functions of genes essential to ribosomal biogenesis and to morphogenesis and developmental arrest of infective larvae in *Strongyloides stercoralis*. Moreover, transgenesis has recently supported proof of principle for targeted mutagenesis *via* CRISPR/Cas9 in *Strongyloides* spp. Recent successes with integrative transgenesis in *Brugia malayi* create the possibility that similar studies of gene function will be possible in the filariae. Recent work with *Brugia* (Liu et al., 2018) and *Strongyloides* (Adams et al., 2019) illustrates the potential of lipofectant-mediated gene transfer, alone or in combination with microinjection, to facilitate applications of transgenesis and CRISPR/Cas9 gene disruption and editing to functional genomic study of a wide range of obligately parasitic nematodes with vector-borne and free-living pre-parasitic larvae.

### KEY CONCEPTS

**Gonadal microinjection** refers to a method of gene transfer whereby solutions of plasmid or linear transgene constructs are injected into the germlines of parental male or female worms. Nematode gonads frequently have syncytial regions that comprise germ cell nuclei within a common cytoplasm. Infusion of transgene construct solutions into these gonadal syncytia results in significant numbers of transgenic oocytes or sperm.

**Chemically mediated gene transfer**, in the present context, involves gene transfer across the cuticles or exposed cellular surfaces of nematodes facilitated by incorporating transgene DNA into calcium phosphate co-precipitates or lipofectant micelles. Presenting these DNA preparations during intermolt periods of developing larvae, presumably when nascent cuticles are competent to take them up, may be crucial to this process.

**Regulatable transgene expression** is a strategy that allows transgene expression to be activated by the experimenter. This may be accomplished by using promoters that are activated by small molecules such as tetracycline or by coupling transgene-encoded proteins of interest to degradation domains that are stabilized in the presence of a small molecule and degraded in the proteasome in its absence.

**CRISPR/Cas9.** Based on a prokaryotic defense response to foreign DNA, CRISPR/Cas9 harnesses sequence-specific targeting of the Cas9 endonuclease to precise gene loci by short "guide RNAs" to induce double-stranded DNA breaks. These may be repaired by non-homologous end joining, leaving random insertions or deletions, or by homology-directed repair, which can serve to incorporate a synthetic DNA repair template.

**Dominant interfering transgenes** encode mutations that ablate functional domains of a recombinant protein of interest, but leave crucial protein- or DNA-binding domains intact. When overexpressed, products of these dominant interfering transgenes outcompete the endogenous wild-type gene product for binding partners but fail to execute their designated functions, creating a loss-of-function phenotype in the subject.

### AUTHOR CONTRIBUTIONS

JL formulated the concepts in the review, searched and analyzed the articles referenced, wrote all drafts of the text, and designed and drafted the figure, which was then rendered by a graphic artist in the author's department, who is acknowledged in the paper.

### FUNDING

The author has received support from grants AI50688, AI105856, AI144572, and OD P40-10939 from the US National Institutes of Health and from a grant from the University of Pennsylvania Research Foundation.

### ACKNOWLEDGMENTS

The author is grateful to Drs. Hongguang Shao, Xinshe Li, and Tegegn Jaleta for helpful discussions and to Ms. Deborah Argento for assistance in preparing this paper.

### REFERENCES

Higazi, T. B., Deoliveira, A., Katholi, C. R., Shu, L., Barchue, J., Lisanby, M., et al. (2005). Identification of elements essential for transcription in *Brugia malayi* promoters. *J. Mol. Biol.* 353, 1–13. doi: 10.1016/j.jmb.2005.08.014

Higazi, T. B., Merriweather, A., Shu, L., Davis, R., and Unnasch, T. R. (2002). *Brugia malayi*: transient transfection by microinjection and particle bombardment. *Exp. Parasitol.* 100, 95–102. doi: 10.1016/S0014-4894(02)00004-8

Higazi, T. B., and Unnasch, T. R. (2004). Intron encoded sequences necessary for trans splicing in transiently transfected *Brugia malayi*. *Mol. Biochem. Parasitol.* 137, 181–184. doi: 10.1016/j.molbiopara.2004.04.014

Higazi, T. B., and Unnasch, T. R. (2013). Biolistic transformation of *Brugia malayi*. *Methods Mol. Biol.* 940, 103–115. doi: 10.1007/978-1-62703-110-3\_9

Jackstadt, P., Wilm, T. P., Zahner, H., and Hobom, G. (1999). Transformation of nematodes *via* ballistic DNA transfer. *Mol. Biochem. Parasitol.* 103, 261–266. doi: 10.1016/S0166-6851(99)00089-4

John, A., Smith, S. T., and Jaynes, J. B. (1995). Inserting the Ftz homeodomain into engrailed creates a dominant transcriptional repressor that specifically turns off Ftz target genes *in vivo*. *Development* 121, 1801–1813.

Junio, A. B., Li, X., Massey, H. C., Jr., Nolan, T. J., Todd Lamitina, S., Sundaram, M. V., et al. (2008). *Strongyloides stercoralis*: cell- and tissue-specific transgene expression and co-transformation with vector constructs incorporating a common multifunctional 3' UTR. *Exp. Parasitol.* 118, 253–265. doi: 10.1016/j.exppara.2007.08.018

Komatsu, H., Jin, Y. H., L'Etoile, N., Mori, I., Bargmann, C. I., Akaike, N., et al. (1999). Functional reconstitution of a heteromeric cyclic nucleotide-gated channel of *Caenorhabditis elegans* in cultured cells. *Brain. Res.* 821, 160–168. doi: 10.1016/S0006-8993(99)01111-7

Komatsu, H., Mori, I., Rhee, J. S., Akaike, N., and Ohshima, Y. (1996). Mutations in a cyclic nucleotide-gated channel lead to abnormal thermosensation and chemosensation in *C. elegans*. *Neuron* 17, 707–718. doi: 10.1016/S0896-6273(00)80202-0

Lee, R. Y., Hench, J., and Ruvkun, G. (2001). Regulation of C. elegans DAF-16 and its human ortholog FKHRL1 by the daf-2 insulin-like signaling pathway. *Curr. Biol.* 11, 1950–1957. doi: 10.1016/S0960-9822(01)00595-4

Lei, W. Q., Lok, J. B., Yuan, W., Zhang, Y. Z., Stoltzfus, J. D., Gasser, R. B., et al. (2017). Structural and developmental expression of *Ss-riok-2*, an RIO protein kinase encoding gene of *Strongyloides stercoralis*. *Sci. Rep.* 7, 8693. doi: 10.1038/s41598-017-07991-2

Li, X., Massey, H. C., Jr., Nolan, T. J., Schad, G. A., Kraus, K., Sundaram, M., et al. (2006). Successful transgenesis of the parasitic nematode *Strongyloides stercoralis* requires endogenous non-coding control elements. *Int. J. Parasitol.* 36, 671–679. doi: 10.1016/j.ijpara.2005.12.007

Li, X., Shao, H., Junio, A., Nolan, T. J., Massey, H. C., Jr., Pearce, E. J., et al. (2011). Transgenesis in the parasitic nematode *Strongyloides ratti*. *Mol. Biochem. Parasitol.* 179, 114–119. doi: 10.1016/j.molbiopara.2011.06.002

Lin, K., Dorman, J. B., Rodan, A., and Kenyon, C. (1997). *daf-16*: an HNF-3/forkhead family member that can function to double the life-span of *Caenorhabditis elegans*. *Science* 278, 1319–1322. doi: 10.1126/science.278.5341.1319

Liu, C., Chauhan, C., and Unnasch, T. R. (2010a). The role of local secondary structure in the function of the trans-splicing motif of *Brugia malayi*. *Mol. Biochem. Parasitol.* 169, 115–119. doi: 10.1016/j.molbiopara.2009.10.003

Liu, C., de Oliveira, A., Higazi, T. B., Ghedin, E., DePasse, J., and Unnasch, T. R. (2007). Sequences necessary for trans-splicing in transiently transfected *Brugia malayi*. *Mol. Biochem. Parasitol.* 156, 62–73. doi: 10.1016/j.molbiopara.2007.07.012

Liu, C., Enright, T., Tzertzinis, G., and Unnasch, T. R. (2012). Identification of genes containing ecdysone response elements in the genome of *Brugia malayi*. *Mol. Biochem. Parasitol.* 186, 38–43. doi: 10.1016/j.molbiopara.2012.09.005

Liu, C., Kelen, P. V., Ghedin, E., Lustigman, S., and Unnasch, T. R. (2011). Analysis of transcriptional regulation of tetracycline responsive genes in *Brugia malayi*. *Mol. Biochem. Parasitol.* 180, 106–111. doi: 10.1016/j.molbiopara.2011.09.004

Liu, C., Mhashilkar, A. S., Chabanon, J., Xu, S., Lustigman, S., Adams, J. H., et al. (2018). Development of a toolkit for *piggyBac*-mediated integrative transfection of the human filarial parasite *Brugia malayi*. *PLoS Negl. Trop. Dis.* 12, e0006509. doi: 10.1371/journal.pntd.0006509

Liu, C., Oliveira, A., Chauhan, C., Ghedin, E., and Unnasch, T. R. (2010b). Functional analysis of putative operons in *Brugia malayi*. *Int. J. Parasitol.* 40, 63–71. doi: 10.1016/j.ijpara.2009.07.001


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Lok. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Omics Driven Understanding of the Intestines of Parasitic Nematodes

#### *Douglas P. Jasmer1\*, Bruce A. Rosa2, Rahul Tyagi2 and Makedonka Mitreva2,3\**

*1 Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, WA, United States, 2 McDonnell Genome Institute, Washington University, St. Louis, St. Louis, MO, United States, 3 Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO, United States*

The biological and molecular complexity of nematodes has impeded research on development of new therapies for treatment and control. We have focused on the versatility of the nematode intestine as a target for new therapies. To that end, it is desirable to establish a broad and deep understanding of the molecular architecture underlying intestinal cell functions at the pan-Nematoda level. Multiomics data were generated to uncover the evolutionary principles underlying both conserved and adaptable features of the nematode intestine. Whole genomes were used to reveal the functional potential of the nematodes, tissue-specific transcriptomes provided a deep assessment of genes that are expressed in the adult nematode intestine, and comparison of selected core species was used to determine a first approximation of the pan-Nematoda intestinal transcriptome. Differentially expressed transcripts were also identified among intestinal regions, with the largest number expressed at significantly higher levels in the anterior region, identifying this region as the most functionally unique compared to middle and posterior regions. Profiling intestinal miRNAs targeting these genes identified the conserved intestinal miRNAs. Proteomics of intestinal cell compartments assigned proteins to several different intestinal cell compartments (intestinal tissue, the integral and peripheral intestinal membranes, and the intestinal lumen). Finally, advanced bioinformatic approaches were used to predict intestinal cell functional categories of seminal importance to parasite survival, which can now be experimentally tested and validated. The data provide the most comprehensive compilation of constitutively and differentially expressed genes, predicted gene regulators, and proteins of the nematode intestine. 
The information provides knowledge that is essential to understand molecular features of nematode intestinal cells and functions of fundamental importance to the intestine of many, if not all, parasitic nematodes.

Keywords: nematode, intestine, genome, transcriptome, proteome, miRNA, dsRNA

#### *Edited by:*

*Baolei Jia, Chung-Ang University, South Korea*

#### *Reviewed by:*

*Eileen Devaney, University of Glasgow, United Kingdom; Garry Wong, University of Macau, China*

#### \**Correspondence:*

*Douglas P. Jasmer djasmer@vetmed.wsu.edu; Makedonka Mitreva mmitreva@wustl.edu*

#### *Specialty section:*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Genetics*

*Received: 06 April 2019 Accepted: 19 June 2019 Published: 25 July 2019*

#### *Citation:*

*Jasmer DP, Rosa BA, Tyagi R and Mitreva M (2019) Omics Driven Understanding of the Intestines of Parasitic Nematodes. Front. Genet. 10:652. doi: 10.3389/fgene.2019.00652*

## INTRODUCTION

Infections caused by parasitic nematodes result in substantial mortality and morbidity in human populations, especially in tropical regions of Africa, Asia, and the Americas (Pullan et al., 2014). Infections by parasitic nematodes also compromise the health and productivity of livestock species on a global basis. The impact on food production most significantly affects human health in underdeveloped regions of the world where small holders of livestock depend on food animals for basic nutrition and commerce (Perry and Grace, 2009). Losses in livestock production directly deplete resources needed for basic nutrition, income, and trade. The negative impact of parasitic nematode infections in humans and animals is reduced by anthelmintic treatments; however, the propensity of these pathogens to acquire resistance to anthelmintics (Kaplan, 2004; Leathwick, 2013) threatens the existing methods to prevent and treat the diseases they cause. Hence, there is a continuing and compelling need to identify new approaches, therapeutic targets, and applications to prevent and treat infections by nematode pathogens.

This review summarizes the progress made in the last 15 years on an approach that focuses on a single tissue of importance for survival of parasitic nematodes (the intestine) with a goal of developing an integrated research model that will have pan-Nematoda application for deriving essential and broadly conserved intestinal cell functions of these pathogens. One anticipated application of this knowledge is advancement toward the development of pharmaco- and immunotherapeutics for the treatment, prevention, and control of infections caused by nematode pathogens. Earlier studies have validated the value of the intestine of parasitic nematodes for this purpose (next section), which prompts the need to develop resources that can accelerate research on this tissue. The summarized progress relates to the establishment of a multiomics database approach to identify pan-Nematoda conserved intestinal cell functions, coupled with the development of methodologies that can be used to experimentally dissect those conserved functions with new applications to therapeutics as one end goal.

### THE NEMATODE INTESTINE AS A TARGET FOR NEW ANTHELMINTIC THERAPIES

#### General Considerations

Multiple approaches can be considered when aiming to identify targets for pharmacotherapy, including a search for targets that are expressed in all tissues and parasitic stages and conserved across diverse nematode pathogens (**Figure 1**). However, requiring expression in all tissues overlooks the idiosyncrasies of individual parasite tissues, especially since many, although not all, marketed anthelmintics appear to specifically target one system, the neuromuscular system. Other tissue/organ systems of nematodes, if disrupted, can theoretically offer the same value for pharmacological targets as the neuromuscular system. The nematode intestine is but one example where experimental data support this view, as will be described in this section. One obstacle to faster progress on tissue-specific research is the relatively shallow knowledge of the specific functions of tissues in parasitic nematodes, and of the molecules that perform or regulate those functions. Furthermore, a focus on an individual tissue does not exclude the possibility that findings generated from one tissue will have application to many or all other tissues and/or stages of parasitic nematodes.

The tubular intestine of nematodes is composed of polarized epithelial-like cells arranged as a single cell layer (**Figure 2**). Some parasitic nematodes, such as the filarial nematodes, have a somewhat vestigial intestine, possibly due to relocating elements of nutrient acquisition to the cuticular surface (Howells and Chen, 1981). Nevertheless, more recent evidence has clarified roles that intestinal cells of filariae appear to perform (Morris et al., 2015; Ballesteros et al., 2018). The apical intestinal membrane (AIM) of the nematode intestine is continuous with the external environment *via* the nematode stoma and, as such, represents in parasitic species an internal interface with the host. Numerous essential functions are known, or expected, to exist on the AIM related to nutrient digestion and acquisition, general homeostasis of intestinal cells and the intestinal tract, and protection from toxins. Additionally, regeneration of damaged tissue *via* cell proliferation is not known to occur in nematodes. In the absence of effective regeneration, which still requires additional investigation to validate, therapies that cause intestinal cell death and breach of the single cell layer separating the lumen from the pseudocoelom are likely to have lethal outcomes for the parasite. Several lines of research indicate that the intestine of parasitic nematodes has high potential for targeting in immunotherapies and pharmacotherapies that are relevant to treatment, prevention, and control of infections by parasitic nematodes as discussed here.

#### Immunotherapeutic Applications of Nematode Intestinal Antigens

From an immunotherapeutic perspective, digestive enzymes localized to the AIM provide one example of antigens for inducing protective immune responses, and antibodies in particular. The expectation is that when the relevant antibodies are ingested by parasitic nematodes, they can inhibit nutrient digestion leading to parasite expulsion. Progress on this approach has been most impressive in blood-feeding parasitic nematodes, such as *Haemonchus contortus* and hookworms, leading to vaccine preparations with potential value (Knox et al., 2003; Knox, 2011; Hotez et al., 2013; Bassetto et al., 2018). However, application of this approach to parasitic nematodes that feed at other host locations has not been sufficiently investigated. For instance, *Ascaris* spp. live in the lumen of the small intestine, and *Trichuris* spp. burrow into the lining of the colon and cecum. Additionally, it remains to be determined if the immune mechanism(s) involved depends on inhibition of functions within the intestinal lumen of the parasite, which has been an underlying premise of this approach. Intestinal antigens from *H. contortus* also localize to the abomasal mucosa and induce local mucosal responses during infection (Jasmer et al., 1996; Jasmer et al., 2007), the importance of which to protective immunity remains undetermined. Despite lack of knowledge of mechanisms involved, intestinal antigens have proven value for inducing immunity against some parasitic nematodes and have likely application to others.

A more recent nuance of possible immunization with intestinal antigens involves exosomes from *Heligmosomoides polygyrus* that are isolated from excretory–secretory products. The exosomes are thought to derive at least in part from the intestine of *H. polygyrus* released in intestinal excretions from the worm (Buck et al., 2014). Immunization with the exosome preparation induced protective immunity against challenge infections in mice (Coakley et al., 2017). It will be of interest to learn if there is a role of intestinal antigens in inducing this immunity.

#### Pharmacotherapies That Cause Irreparable Damage to the Nematode Intestine

Effective pharmacotherapy that targets the nematode intestine has previously been demonstrated. The first and most thoroughly investigated examples involve benzimidazole anthelmintics. Benzimidazole treatment of host animals infected with *Ascaris suum*, *H. contortus*, or other parasitic nematodes leads to disruption of intestinal cell architecture and frank disintegration of intestinal tissue; the range of observations depends on species and experimental designs (Borgers and De Nollin, 1975; Borgers et al., 1975; Atkinson et al., 1980; Comley, 1980; Zintz and Frank, 1982; Jasmer et al., 2000; Hanser et al., 2003; O'Neill et al., 2015). The anterior region of the *H. contortus* intestine displayed hypersensitivity regarding these effects, which reflected the occurrence of relatively rapid (within 12 h), irreparable intestinal cell damage, inclusive of nuclear DNA fragmentation, from which the parasite is unlikely to recover (Jasmer et al., 2000). Meanwhile, in *Brugia malayi*, flubendazole exposure does significantly less damage to the intestine, and after a brief exposure *in vitro*, the intestinal and hypodermal damage to adult worms resolved over time after transplantation into the host (O'Neill et al., 2016). The observations clearly show that idiosyncrasies of specific tissues and tissue regions are of importance in the context of inducing irreparable tissue damage, which has high value as an end point for effective anthelmintics. Regarding a possible mechanism, current evidence indicates that benzimidazoles inhibit microtubule-mediated apical vesicle transport of hydrolytic enzymes (digestive enzymes) in intestinal cells, which then become dispersed in the cytoplasm where intracytoplasmic digestion of intestinal cells/tissue might occur (Shompole et al., 2002).
Stating this possibility in another way, benzimidazoles appear to induce a pathologic process that leads to irreparable intestinal cell damage, which goes beyond simple disruption of nutrient acquisition as might be expected for a downstream effect on intestinal cells by neuromuscular toxins that inhibit pharyngeal pumping. It seems clear that elucidating the actual mechanism responsible will present opportunities to identify alternative methods (targets and drugs) to induce this irreparable damage. If the specific hypothesis is correct, then multiple cellular components of the apical secretory process represent potential targets for inducing this lethal effect in nematode intestinal cells. An additional possibility is that other pathologic processes contribute to, or are fully responsible for, the irreparable damage to intestinal cells. Hence, it will be important to develop methods that can distinguish among these possible explanations. The pan-Nematoda multiomics resource described below can greatly facilitate progress on sorting these alternative explanations and identification of potential new anthelmintic targets.

A second example involves use of suramin against filarial nematodes (Howells et al., 1983) in which the anthelmintic effect was attributed to activity against the intestine, producing ultrastructural changes. Notably, the timeframe for efficacy extended to 5 weeks posttreatment, indicating survival for a prolonged period despite apparent intestinal tissue damage.

A third example of pharmacotherapy targeting of the intestine involves the use of *Bacillus thuringiensis* crystal (CRY) protein toxins. The pore-forming CRY toxins bind to the AIM upon ingestion by insects and nematodes and initiate an irreparable pathologic process that leads to death of the insect or nematode, presumably resulting from generation of pores in the AIM (Hu and Aroian, 2012). This effect also appears (Huffman et al., 2004; Hu and Aroian, 2012) to involve only intestinal cells in the live invertebrates. CRY proteins have also demonstrated interesting utility as anthelmintics for parasitic nematode infections when administered to host animals (Cappello et al., 2006). Once again, it is possible that other inducible pathologic processes contribute to the demise of intestinal cells initiated by CRY toxin. For instance, intestinal factors (p38 MAP kinase pathway) that confer protection against CRY toxins have been identified in *C. elegans*, and better understanding of the protective mechanisms could aid understanding of pathologic mechanisms involved.

A fourth example involves trans-cinnamaldehyde-induced destruction of intestinal tissue in *A. suum* larval stages (Williams et al., 2015). It is worth noting that the pathologic mechanism(s) induced is unclear, and the damage is not restricted to intestinal cells; nonetheless, prior studies have contributed to our knowledge on the susceptibility of intestinal cells in parasitic nematodes to diverse anthelmintic modalities.

#### Take-Home Lessons From Past Findings

Major points that can be taken from the foregoing examples include: 1) biological idiosyncrasies (cellular processes specific to the intestine, or accessibility by location on an internal interface with the host) make the intestine of parasitic nematodes a particularly compelling target for immunotherapies and pharmacotherapies; 2) components of the AIM are implicated in roles related to mechanisms of action for at least three of the five (AIM antigens, benzimidazole anthelmintics, and CRY toxins) described immunotherapeutic and pharmacotherapeutic examples, which makes this membrane surface especially interesting; 3) specific processes either involving inhibition of apical secretion or disruption of the AIM have been implicated in pathology induced by benzimidazole anthelmintics and CRY toxin anthelmintics, respectively; and 4) idiosyncrasies of the intestine translate into apparent irreparable tissue damage for at least two of the four (benzimidazoles and CRY toxins) pharmacotherapy examples discussed. The various immunological and pharmacological findings have different conceptual origins (e.g., direct inhibition of digestion and nutrient acquisition, inhibition of apical secretion, or direct disruption of the AIM, respectively), which might lead to an array of new targets to achieve similar outcomes once the mechanisms are better understood for each of them. Hence, there are compelling reasons to improve capabilities for research on intestinal cells in parasitic nematodes, including generation of deeper molecular understanding of nematode intestinal cell functions and developing experimental capabilities to test predictions, progress on both of which are described in the sections below.

### PAN-NEMATODA AND MULTIOMICS APPROACH TO RESEARCH ON THE INTESTINE

#### General Rationale

Despite most nematodes having a recognizable intestinal tract, the morphology and organization of this tissue are diverse among groups, suggesting functional diversity. At a tissue level, intestinal development ranges from superficial or somewhat vestigial (Eisenback, 1985), to the more common occurrence of a fully developed intestinal tract (Munn and Greenwood, 1984). Nevertheless, intestinal cells reportedly can be organized as syncytia (multinucleate, Munn and Greenwood, 1984), or not, with polyploid nuclei that vary among species from an estimated 4N for adult *A. suum* (Anisimov et al., 1975) to 32N for adult *C. elegans* intestinal cells (Hedgecock and White, 1985). Otherwise, phylogenetic variation of intestinal cells has generally been understudied. More recent progress is summarized in the following sections to clarify basic intestinal cell functions that are conserved at molecular and cellular levels among most nematode species. This progress was facilitated by the selection of core species that span the phylum and represent much of the phylogenetic distance across the Nematoda. The design supported development of a pan-Nematoda multiomics database that was interrogated to investigate broadly conserved biological functions of intestinal cells/tissue for applications toward new pharmacotherapies and immunotherapies.

#### Selection of Research Core Species That Sample the Broad Phylogenetic Diversity of the Nematoda

Core adult parasite species have been selected based on four main criteria: 1) each represented a major lineage that collectively represented much of the phylogenetic diversity of the Nematoda and concurrently incorporated the fewest species needed for this purpose; 2) each species could easily be dissected to provide sufficient intestinal tissue for high quality transcriptomic data generation and analyses; 3) extensive genome sequence was available, or near-term obtainable; and 4) each species presented potential for development, or was already established, as an experimental model for research on the intestine. The core species selected (**Table 1**) included: 1) *H. contortus*, a clade V haematophagous parasitic nematode of small ruminants and longtime model for research on its intestine; 2) *A. suum*, a clade III parasitic nematode of the small intestine of swine and humans, the large size of which provided unique advantages that facilitated this research; and 3) *Trichuris suis*, a clade I parasitic nematode of the cecum and large intestine of swine that supported our criteria and provided a needed connection to the most ancient clade of the Nematoda. Each of these core species also has importance for human or veterinary medicine and provides examples of soil-transmitted helminths in their own context. Finally, each also occupies distinct anatomical locations in the host and obtains nutrients from distinct host compartments (blood, intestinal lumen content/epithelial cells, or nutrients accessible from attachment to the mucosa, respectively). Thus, these three core species represent the extensive phylogenetic and biologic diversity sought with this design.

The multiomics approach delineated the elements of genomes, transcriptomes, proteomes (predicted or/and determined by mass spectrometry), and regulomes (microRNA) that serve intestinal cell functions in a pan-Nematoda context. This effort identified intestinal genes, predicted RNAs, microRNAs, and proteins for which degree of conservation among species was documented to the extent possible. Resulting knowledge bases generated are summarized in the sections below and in **Table 2**.

### ANTECEDENTS TO DEEP PAN-NEMATODA INTESTINAL TRANSCRIPTOMICS DATABASE DEVELOPMENT

#### Genome Resources

Next generation sequencing technologies revolutionized many biomedical research fields including parasitology. As the technology advanced over the past two decades, genomes and predicted proteome sequences for nematode species accumulated and were subsequently improved. These efforts first produced low-coverage genome survey sequences and draft genomes, before good-quality assemblies and annotations became available. While great progress has been made (Coghlan et al., 2019), most of the available parasitic nematode genomes are draft assemblies with many gaps to be closed and accuracy to be improved (with only a few exceptions, such as *H. contortus* and *O. volvulus*) (Cotton et al., 2016; Laing et al., 2016). Postgenomic applications frequently require comparative genomics at the gene and single nucleotide level, and draft genomes are inadequate for these analyses. It is more important than ever to continue efforts on minimizing gene fragmentation, eliminating gene model errors, and resolving the collapse of recently duplicated and diverged sequences.


*\*Improved version of published genome (Laing et al., 2013).*

#### TABLE 2 | Available omics resources useful for the study of the helminth intestine.


*\*Inferred using the closest significant C. elegans ortholog.*

#### Early Intestinal Transcriptome Research

Experimental observations listed in the previous section motivated early efforts to expand knowledge of intestinal proteins that are expressed by individual parasitic nematodes (Gao et al., 2016), those that are conserved among nematodes (Wang et al., 2015), and those that differentiate intestinal versus other tissues in nematodes (Rosa et al., 2014). Initial comparisons utilized expressed sequence tag (EST) (Yin et al., 2008) and gene microarray (Wang et al., 2013b) methods, and identified a modest number of predicted intestinal protein families (IntFams) that are orthologs conserved among *H. contortus*, *A. suum*, and *C. elegans*, as well as major differences. One notable difference involved a large family of intestinal cathepsin B-like cysteine proteases expressed by *H. contortus*, whereas a single intestinal family member was detected for *A. suum* and 12 family members have been described for *C. elegans* (Jasmer et al., 2004). This early foundation fostered anticipation that while many intestinal functions may be conserved among nematodes, there are likely many adaptations that differentiate nematode species, and possibly phylogenetic lines.

Due to its large size, *A. suum* (male, 15–30 cm; female, 20–35 cm) is a particularly good nematode model to study tissue transcriptomics, since the individual tissues can be cleanly and accurately dissected. In the first comprehensive tissue-specific RNAseq-based transcriptome study for any parasitic nematode, comparative transcriptomics was performed on three nonreproductive tissues (head, pharynx, and intestine) in both male and female worms, as well as four reproductive tissues (testis, seminal vesicle, ovary, and uterus) (Rosa et al., 2014). This study identified thousands of genes associated with the different tissues (or combinations of tissues), including 1,387 genes overexpressed in the intestine samples (male or female). Examination of the 5′ untranslated region (UTR) sequences preceding the coding sequences for the intestine-associated genes identified enrichment for the motif that is bound by the transcription factor ELT-2, the predominant transcription factor controlling intestinal development and function in *C. elegans* (McGhee et al., 2009). Intestine-enriched functions included nine "Molecular Function" child terms of "hydrolase activity" (GO:0016787; including "cysteine-type endopeptidase activity"). Gene coexpression networks linked some of the most highly intestine-overexpressed genes to other tissues, including the head-overexpressed FAR-1 ortholog, a proposed anthelmintic target with a crucial role in parasitism (Bradley et al., 2001; Basavaraju et al., 2003). Overall, this study provided a foundation for cataloguing and profiling intestinal expression in *A. suum*, and a gene expression and annotation database was built using its datasets (Rosa et al., 2014).
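The upstream-motif enrichment described above can be illustrated with a minimal sketch. It is not the analysis used in the cited study. The motif string, the toy sequences, the gene names, and the choice of a one-sided hypergeometric test are all assumptions made for illustration. In particular, `TGATAA` is used only as a stand-in GATA-type core site, since the review does not state the motif sequence.

```python
from math import comb

# Assumption: ELT-2 is treated as a GATA-type factor and "TGATAA" is a
# stand-in core motif; the review itself does not give the sequence.
MOTIF = "TGATAA"


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def has_motif(upstream: str, motif: str = MOTIF) -> bool:
    """True if the motif occurs on either strand of the upstream sequence."""
    upstream = upstream.upper()
    return motif in upstream or reverse_complement(motif) in upstream


def hypergeom_enrichment(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K carry the motif."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical toy sequences, not real A. suum data.
intestine_utrs = {"gene_a": "CCTGATAAGG", "gene_b": "AATTATCAGT", "gene_c": "GGGCCC"}
other_utrs = {"gene_d": "ACGTACGT", "gene_e": "TTTTGGGG",
              "gene_f": "CTGATAAC", "gene_g": "GCGCGC"}

k = sum(has_motif(s) for s in intestine_utrs.values())      # intestine genes with motif
K = k + sum(has_motif(s) for s in other_utrs.values())      # all genes with motif
n = len(intestine_utrs)                                     # intestine gene count
N = n + len(other_utrs)                                     # all genes examined
p_value = hypergeom_enrichment(k, n, K, N)
```

With real data, the background set would be all genes that have usable upstream sequence, and a correction for multiple testing would be needed if many candidate motifs were scanned.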

#### Pan-Nematoda Database of Intestine Expressed Gene Transcripts and Predicted Proteins

The next step after the *A. suum* comparative tissue expression data was to determine the level of conservation of intestinal cell functions among nematode lineages and species. Three omics resources were central to this investigation. The first was annotated genomes and predicted proteome information for each of the core species under investigation. The second was deep intestinal transcriptomes (generated by RNAseq) for each of the core species representing clades I, III, and V of the Nematoda. These two resources allowed for direct assessment of intestinal genes/proteins expressed by each of the phylogenetically diverse core species and transcriptomes from other tissues. A third essential omics resource was genome and predicted proteome information from each of the other nematode and outgroup species included in the analysis. As a first step, orthologous protein groups were predicted from combined data for all species (10 nematodes and 5 outgroups) included in the analysis. Then, transcriptomic data from each core species were related to predicted proteome data from each and the orthologous protein groups derived from all 15 species. Intestinal protein families (IntFams) could then be inferred across orthologous protein families identified at any of numerous phylogenetic levels based on the expression from the core species. The relationship of the core species data to inferred intestinal proteomes from other nematode species is shown in **Figure 3**. The study based on intestinal expressed sequence tags identified 241 predicted IntFams from *H. contortus*, *A. suum*, and *C. elegans* (Yin et al., 2008) (based on 3,121 and 1,755 identified transcripts from *A. suum* and *H. contortus*, respectively), whereas the pan-Nematoda RNAseq study (Rosa et al., 2014) expanded the number of IntFams to 10,772 (based on complete gene sets from nematode species spanning the phylum).
Phylogenetic assessment of IntFam gene births and deaths among the 10 species analyzed produced phylogenetic tree associations expected for trees of this species assemblage based on other quantitative criteria (Rosa et al., 2014).

The phylogenetic affinities and conservation (predicted orthologs) of the IntFams determined by this analysis were parsed into 14 categories, ranging from IntFams/functions that are conserved among all the core nematode species (**cIntFams**, 2,853 from intestinal transcriptome data) to the ones conserved in all 10 of the diverse nematode species investigated (**nem-cIntFams**, 1,863). One can rationalize the biological or practical importance for each of these groupings, including the taxonomically restricted ones. The most inclusive group for intestinal genes included orthologs conserved in all 10 nematodes and 5 outgroups (**uni-cIntFams**). Conservation of this kind might be viewed to preclude utility of proteins in therapeutic approaches. Nevertheless, insertion/deletion (indel) analysis identified distinctions of potential practical significance between some nematode and mammalian members of these IntFams. Thus, the differences existed below the threshold utilized for parsing orthologous proteins. We can also expect for members of this group that amino acid substitutions may exist that are nemato-centric and have functional significance, as is the case for many beta tubulins in nematodes (Hoti et al., 2003). Although this particular analysis was not conducted in a comprehensive manner, the database exists to do so. In some cases, IntFam members were detected based on transcripts from each of the core species (cIntFam) but not necessarily from genomes of all nematodes investigated. This level of conservation (clades I, III, and V) based on direct sampling of intestinal transcripts reflects broad conservation across many parasitic nematode species, although maybe not all. Some IntFams were not represented by intestinal transcripts from every core species but nevertheless had ortholog representation from genomes of all 10 nematode species evaluated. This situation might reflect false negatives (sampling artifact) from intestinal transcriptomes of individual core nematode species, or expression of some intestinal transcripts might be conditional and transient. Consequently, interpretation of the data warrants some caution, and the relationships described should be viewed as a beginning for developing hypotheses rather than a final determination of conservation.
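As a rough illustration of how orthologous groups could be parsed into conservation categories, the sketch below labels a group from two species sets: those with a genomic ortholog and those core species with an intestinal transcript. Only three of the 14 categories are shown. The species placeholders, the function name, and the assumption that nem-cIntFams and uni-cIntFams are nested inside cIntFams are illustrative choices, not definitions taken from the cited studies.

```python
from typing import Set

# The three core species are named in the text; the remaining seven
# nematodes and five outgroups are placeholders (not listed in this section).
CORE = {"H_contortus", "A_suum", "T_suis"}
NEMATODES = CORE | {f"nematode_{i}" for i in range(4, 11)}
OUTGROUPS = {f"outgroup_{i}" for i in range(1, 6)}


def classify_group(genome_species: Set[str], intestinal_species: Set[str]) -> Set[str]:
    """Label one orthologous group.

    genome_species: species with a predicted ortholog in the group.
    intestinal_species: core species with an intestinal transcript in the group.
    Assumption: nem-cIntFam and uni-cIntFam are treated as subsets of cIntFam.
    """
    labels: Set[str] = set()
    if intestinal_species & CORE:
        labels.add("IntFam")            # intestinal expression in >= 1 core species
    if CORE <= intestinal_species:
        labels.add("cIntFam")           # intestinal expression in all core species
        if NEMATODES <= genome_species:
            labels.add("nem-cIntFam")   # plus orthologs in all 10 nematodes
            if OUTGROUPS <= genome_species:
                labels.add("uni-cIntFam")  # plus orthologs in all 5 outgroups
    return labels


# Hypothetical groups for illustration.
universal = classify_group(NEMATODES | OUTGROUPS, CORE)
core_only = classify_group(CORE, CORE)
partial = classify_group(NEMATODES, {"A_suum"})
```

The `partial` case corresponds to the situation discussed above, where genomic orthologs exist in all nematodes but intestinal transcripts were sampled from only some core species; such groups stay labeled as plain IntFams here.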

Beyond information that can be gained on the molecular evolution of the intestine in adult parasitic nematodes, we envision multiple uses for this pan-Nematoda database of IntFams. From a basic omics perspective, the information renders less mysterious the specific intestinal proteins, protein isoforms, amino acid sequences, predicted functions, and diversity of those proteins that carry out essential functions of the intestine among nematode species. By organizing the data as predicted orthologs with specific proteins identified from each species investigated, researchers have a template to explore questions of individual interest and with some explicit gene and protein sequence information that can support experimental design efforts. As one example, protein isoforms from multimember families that are differentially expressed in the intestine or other tissues can be prospectively identified, by integrating the IntFam database and the comparative-tissue transcriptome database for *A. suum*, thus resolving substantial complexity ensconced in whole genome and predicted proteome data. Such resolution is critical when the biology of individual tissues is under investigation.

### INCREASING RESOLUTION BY STUDYING INTESTINAL REGIONS AT A TRANSCRIPTOME AND PROTEOME LEVEL

#### Differential Expression of Genes Along the Length of the Intestine and Their Posttranscriptional Regulation

Another dimension of functional dissection involves differential expression of intestinal genes along the length of the intestine. The anterior region of the *H. contortus* intestine is hypersensitive to benzimidazole treatments, by comparison to the posterior region (Jasmer et al., 2000). Consequently, biological differences among intestinal regions exist that appear to have practical applications. To study this in more depth, *A. suum* intestinal regions were examined using an RNAseq approach, which resolved expression differences in a single, comprehensive effort. The results will enhance the use of this species in intestinal cell research and likely will have application to other nematode species.

Deep sequencing of transcripts obtained from contiguous anterior, middle, and posterior regions of the intestine from male and female *A. suum* worms expanded the set of previously known intestinal genes (Gao et al., 2016). No major differences were identified based on gender, with >80% of intestinal genes expressed in both males and females. Genes expressed among the three regions were similar, with only 803 genes being differentially expressed. However, most of these (696/803) had higher expression levels in the anterior region as compared to the middle and posterior regions, supporting the view that the anterior intestinal region is functionally distinct from the rest of the intestine (**Figure 4**). This difference was also indicated by assembling the genes based on different expression profiles along the intestinal axis. This less stringent analysis identified genes expressed at higher levels in the anterior region including those encoding certain hydrolases and those with functions related to signal transduction and membrane dynamics. Given that a vast majority of genes were expressed throughout the intestine, although most of these are expressed at relatively low levels, one way of delineating major intestinal functions is to analyze genes with relatively high overall intestinal expression. Based on a comparison with the expression levels of all genes, transcripts for 795 genes were considered to be highly expressed in the intestine. As expected, some of the functions encoded by these highly abundant intestinal transcripts were related to nutrient uptake and energy metabolism. Other enriched functions included protein synthesis and protein and lipid binding.

*… the anterior, middle, and posterior intestine regions among all intestine-overexpressed genes and genes differentially expressed between the regions. \*\*\*$P < 10^{-5}$.*

Samples from the same regions of the *A. suum* intestine were profiled for miRNAs. This provided another layer of possible differentiation among intestinal regions, i.e., regulators of intestinal gene expression. miRNAs were known to be expressed in gametogenesis and early developmental stages of *A. suum* (Wang et al., 2011). In addition, intestinal miRNAs have previously been identified in *C. elegans*, and these were shown to cause effective downregulation of mRNA expression (Kato et al., 2016). We leveraged the availability of samples from different intestinal segments to further expand the knowledge base of nematode miRNAs, their interactions with the transcriptome, and differential expression along the intestinal tract (Gao et al., 2016). As a result of this work, 277 miRNAs were identified in the *A. suum* intestine, and 11 of those were differentially expressed among intestinal regions. To integrate the information from mRNA and miRNA expression sets, potential targets of these miRNAs were predicted for the ~6,000 genes of the *A. suum* genome that had a 3'UTR sequence available and could be used for miRNA–target prediction. Out of these, 2,063 were predicted to be targeted by at least one intestinal miRNA. As is usually the case with predicted miRNA targets, the predicted miRNA-target network was dense, and many-to-many targeting interactions were commonly encountered (i.e., multiple targets associated with a given miRNA and vice versa). These predictions need validation, as does the existence of such a highly active regulatory network.
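As a rough illustration of how a 3'UTR can be screened for candidate miRNA sites, the sketch below applies the canonical seed rule (miRNA nucleotides 2 to 8 pairing with a complementary motif in the target). The prediction tool used in the study is not named here, so this is only an assumed, simplified stand-in; the function names and the example sequences are hypothetical.

```python
# Minimal seed-match sketch (assumption: canonical seed = miRNA nucleotides 2-8).
# Real prediction tools add further criteria; this only finds exact seed matches.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna: str) -> str:
    """Return the mRNA motif (5'->3', RNA alphabet) complementary to the miRNA seed."""
    seed = mirna.upper().replace("T", "U")[1:8]  # nucleotides 2-8, 0-based slice
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_sites(mirna: str, utr: str) -> list[int]:
    """Return 0-based start positions of exact seed matches in a 3'UTR sequence."""
    motif = seed_site(mirna)
    utr = utr.upper().replace("T", "U")
    return [i for i in range(len(utr) - len(motif) + 1)
            if utr[i:i + len(motif)] == motif]


# Illustrative inputs only (a let-7-like miRNA and a made-up UTR fragment).
example_mirna = "UGAGGUAGUAGGUUGUAUAGUU"
example_utr = "AAACUACCUCAAA"
sites = find_sites(example_mirna, example_utr)
print(seed_site(example_mirna), sites)
```

Because each miRNA seed can match many UTRs and each UTR can carry several motifs, even this simple rule produces the many-to-many network described above.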

The large number of samples included in this study provided abundance data for both mRNAs and miRNAs that supported a novel statistical approach for predicting miRNA–target associations. Correlations between sample abundances of miRNAs and corresponding predicted mRNA targets identified 503 pairs categorized as most likely to be associated with each other (LAMPs: likely associated miRNA-mRNA pairs). In a similar vein, mean correlations of some miRNAs with corresponding target mRNAs were markedly higher than mean correlations of other miRNAs. These miRNAs were classified as the most likely influential miRNAs (LIMs). Interestingly, the proteins encoded by many of the predicted targets of the 22 LIMs showed significant functional enrichment [GO (The Gene Ontology, 2019), InterPro (Mitchell et al., 2019), etc.], potentially indicating real biological functions under miRNA regulation. An analysis of miRNA mature and seed sequences identified some miRNAs and miRNA families whose intestinal expression, and hence, potential intestinal function is conserved between *A. suum* and *C. elegans* or *Heligmosomoides bakeri* (Gao et al., 2016). The intestinal miRNA databases offer guidance to determine functions of these regulatory molecules in intestinal gene expression and functions of proteins encoded by target mRNAs.
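The correlation-based idea behind LAMPs and LIMs can be sketched as follows. This is a minimal illustration, not the published procedure: the abundance values are invented, the use of Pearson correlation, the choice of negative correlation as evidence of association, and the cutoff of -0.8 are all assumptions made for this example.

```python
# Sketch: score predicted miRNA-mRNA pairs by correlation of abundances across samples.
# All data, names, and the cutoff are hypothetical.
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


mirna_abundance = {"miR-a": [1, 2, 3, 4, 5, 6], "miR-b": [5, 5, 4, 6, 5, 5]}
mrna_abundance = {"gene1": [12, 10, 8, 6, 4, 2],
                  "gene2": [3, 1, 4, 1, 5, 9],
                  "gene3": [6, 5, 4, 3, 2, 1]}
predicted_pairs = [("miR-a", "gene1"), ("miR-a", "gene3"), ("miR-b", "gene2")]
CUTOFF = -0.8  # assumed threshold for a "likely associated" pair

pair_r = {(m, g): pearson(mirna_abundance[m], mrna_abundance[g])
          for m, g in predicted_pairs}

# Pairs whose correlation passes the cutoff (analogous to LAMPs).
lamps = [pair for pair in predicted_pairs if pair_r[pair] <= CUTOFF]

# miRNAs whose mean correlation over all predicted targets passes the cutoff
# (analogous to LIMs).
lims = sorted(m for m in mirna_abundance
              if mean(r for (mi, _), r in pair_r.items() if mi == m) <= CUTOFF)

print(lamps, lims)
```

The same per-miRNA averaging explains why a miRNA with a few strongly correlated targets can still fall short of LIM status if most of its predicted targets show weak correlation.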

Resolution of gene expression by transcriptomics along longitudinal regions of the intestine as done with *A. suum* is far more challenging for many other nematodes, parasitic or not. However, integrating information on differentially expressed genes and proteins across *A. suum* intestinal regions with the pan-Nematoda intestinal omics databases can lead to useful predictions on intestinal expression patterns of orthologous genes and proteins from other nematode species.

#### Intestinal Functions Detected at a Protein Level

Traditional research on intestinal antigens targeted in vaccine research has identified a modest number of AIM proteins, including many proteases (Andrews et al., 1995; Jones and Hotez, 2002; Smith et al., 2003a; Smith et al., 2003b; Williamson et al., 2003a; Williamson et al., 2003b; Loukas et al., 2004; Williamson et al., 2004; Jasmer et al., 2007; Knox, 2011). Factors that hindered more comprehensive identification of AIM functions include the relatively narrow focus of those investigations, available methods, and limitations posed by the nematodes investigated. In contrast, more recent studies took advantage of the large size of *A. suum*, which when coupled with extensive databases of intestinal genes/predicted proteins, supported a proteomic approach that greatly increased knowledge of proteins and predicted functions on the AIM and other cellular compartments of the intestine. Cannulation of the adult *A. suum* intestine (**Figure 5**), below the pharynx, with ordinary blunt-ended hypodermic needles (Rosa et al., 2015) facilitated the perfusion of the intestine with, in this case, phosphate-buffered saline (PBS) to directly collect lumen content and directly identify proteins in this perfusate by mass spectrometry (**Figure 6**). Intestinal antigen research in *H. contortus* (Jasmer et al., 2007) indicated that many of the intestinal proteases are associated with the AIM (Williamson et al., 2003b), but because those proteins had predicted signal peptides and most often lacked evidence of transmembrane regions, it was suggested that they function as peripheral membrane proteins. Thus, *A. suum* intestinal lumen was also perfused with 4 M urea (4 MU) to solubilize peripheral membrane proteins. Additionally, the glycans on many of the *H. contortus* AIM proteases have figured centrally in methods used to isolate, identify, and functionally characterize these proteins (Jasmer et al., 1993; Smith et al., 1993; Smith et al., 1994; Rosa et al., 2015). 
Similarly, the lectin concanavalin A predominately bound to the AIM in *A. suum* intestinal tissue and was used to isolate and characterize apparent AIM glycoproteins by lectin affinity chromatography. These analyses of intestinal perfusates, lectin binding fractions, additional intestinal membrane fractions, and whole intestinal lysates by mass spectrometry have identified over 1,000 intestinal proteins that were assigned to cellular/tissue compartments based on the methods of generating each fraction and on predicted physical properties, including charge, signal peptides, and transmembrane regions.


This approach identified numerous proteins located in the *A. suum* lumen or on the AIM along with predicted functions of those proteins (Rosa et al., 2015). Experimental evidence was also gained on a range of proteases and related activities that were detected in perfusates of the intestinal lumen (Jasmer et al., 2015). About 157 distinct proteins were obtained in the PBS and 4 MU perfusates. Two major groups of proteins attracted the greatest interest in these fractions, digestive hydrolases and channel/transport proteins, which are expected to contribute to ion and nutrient transport. The hydrolases included proteases (28 in 5 classes of proteases) with metallopeptidases being best represented, including 16 different metallopeptidases. Aspartic proteases and serine carboxypeptidases were the next best represented with five and four proteins, respectively, followed by cysteine and threonine proteases representing the remainder. The relative abundance and diversity of apparent proteases identified in these fractions answered questions regarding the comparative need for protein digestion in *A. suum* which lives in the host small intestine where host protein digestion is nearing completion. Apparently, *A. suum* has substantial need for digestion of proteins consumed in this host location. O-Glycosyl hydrolases comprised a second group of hydrolases, which likely contribute to saccharidase activity previously investigated in *A. suum* (Gentner and Castro, 1974), and some of these hydrolases were identified in previous investigations on *A. suum* larval stages and intestinal tissue (Wang et al., 2013a). A third group of hydrolases included several lipases, thus rounding out a basic set of digestive enzymes in the lumen of *A. suum* intestine. The study detected many other predicted functions that fall outside these groups. These collective data can help formulate ideas to target digestive processes in anthelmintic strategies against *A. suum* and possibly other parasitic nematodes.

FIGURE 6 | Proteomics-based inference of *A. suum* intestinal proteins in different compartments. (A) Anatomy of model intestinal nematode species *Ascaris suum* (transverse section). (B) Protein sets detected by MS/MS proteomics from samples harvested from adult *A. suum* worms (left) have other protein sets strategically removed (center) to deduce final protein sets in different intestinal compartments (right). "Integral intestinal membrane" proteins are not labelled as "basal" because they may include some proteins from the apical intestinal membrane as well. \*Proteins annotated with "cellular compartment" Gene Ontology terms for endoplasmic reticulum, mitochondria, Golgi apparatus, and nucleus were removed to reduce contamination from proteins embedded in these organelles rather than the external cellular membrane. \*\*Only proteins annotated with predicted classical or nonclassical secretion signals were included since these are better candidates for proteins that are transported to the membrane.
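The subtractive logic summarized in Figure 6, where protein sets from some fractions are removed from others to deduce compartment assignments, can be illustrated with plain set operations. The sketch below is only a guess at the general shape of such rules: the protein identifiers are invented, and the specific subtraction order and filters are assumptions rather than the published scheme.

```python
# Sketch of compartment assignment by set subtraction (all IDs and rules hypothetical).

# Proteins detected by mass spectrometry in each sample type.
pbs_perfusate = {"P1", "P2", "P3"}
urea_perfusate = {"P2", "P3", "P4", "P5"}          # 4 M urea (4 MU) perfusate
membrane_fraction = {"P4", "P5", "P6", "P7", "P8"}

# Annotation-based filters mentioned in the Figure 6 footnotes.
organelle_annotated = {"P7"}                        # ER, mitochondria, Golgi, nucleus GO terms
secretion_signal = {"P1", "P2", "P3", "P4", "P6", "P8"}

# Assumed rule 1: anything recovered with PBS alone is treated as intestinal lumen (IL).
intestinal_lumen = set(pbs_perfusate)

# Assumed rule 2: proteins released only by 4 MU, and carrying a secretion signal,
# are treated as candidate peripheral membrane proteins.
peripheral_membrane = (urea_perfusate - pbs_perfusate) & secretion_signal

# Assumed rule 3: proteins seen only in the membrane fraction, after removing
# organelle-annotated proteins, are treated as integral intestinal membrane proteins.
integral_membrane = (membrane_fraction - urea_perfusate - pbs_perfusate) - organelle_annotated

print(sorted(intestinal_lumen), sorted(peripheral_membrane), sorted(integral_membrane))
```

Under these assumed rules, a protein found in both the PBS and 4 MU perfusates is assigned to the lumen, which mirrors the caveat that some proteases classified as IL proteins may in fact function as peripheral membrane proteins.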

The detected channel/transport proteins that mediate transport of ions and nutrients across intestinal membranes are of interest because interference with these functions has potential for anthelmintic approaches, and new information about them should facilitate development of these approaches. Information in this context was largely derived from the 4 MU perfusates and the membrane preparation, which included many proteins detected in the 4 MU perfusate as a subset. Over 100 proteins were classified as channel/transport proteins, with predicted transmembrane (integral membrane proteins) or other annotation supporting this assignment, and detection in membrane enriched fractions added further support. However, many of the channel/transport proteins were detected only in the membrane fraction, which does not resolve apical, basolateral, or intracellular membrane localization in intact intestine, although an attempt was made to exclude proteins associated with cytoplasmic organelles. Distinguishing membrane location is important particularly if antibody-mediated inhibition is a goal in vaccine research, as this would likely require localization to the AIM. Accessibility by localization to the AIM may also be an important factor for pharmacotherapy, which becomes much more feasible with knowledge of potential targets that are accessible on the AIM. An example here is the apical localization of the receptor for CRY protein toxins, the absence of which on the AIM is likely to render CRY proteins less effective (Griffitts et al., 2003; Williamson et al., 2003b). Nevertheless, for a large number of predicted integral membrane proteins identified, channel/transport proteins and others, the actual membrane where they reside remains to be determined. 
Alternatively, some of these membrane proteins were also, or exclusively, detected in the 4 MU fraction, which provided evidence for an AIM localization, although it does not exclude localization to the basolateral intestinal membrane (BIM) as well. One example highlighted by this research involves subunits of the vacuolar ATPase V1 domain. Seven of the eight subunits were detected in the 4 MU fractions, which likely reflects solubilization by 4 MU (Rosa et al., 2015). The results provide evidence that, similar to *C. elegans* (Allman et al., 2009), V-ATPases occupy the AIM and contribute to H+ transport across this surface in *A. suum*. This kind of evidence was obtained for 57 other channel/transport proteins, nevertheless leaving the majority of the predicted integral membrane proteins with little direct evidence for the membrane(s) on which they function.

Examining the sequence conservation of the orthologous proteins detected in different cellular compartments across species spanning the phylum Nematoda revealed significant variability. Most variable were the proteins detected in the intestinal lumen (IL), and integral intestinal membrane proteins were the most conserved among nematodes. Concurrently, proteins homologous to most of the *A. suum* IL proteins were detected in other nematode species, and because hydrolytic enzymes constitute a substantial subset of IL proteins, the result may reflect diversification from common ancestral digestive enzymes among species. Although neutral evolution could account for diversification in this background, directional evolution might be expected if adaptations are required of digestive enzymes for a given species to best exploit nutrients presented by different host niches. Hemoglobinases from hookworms may exemplify this idea in that human hemoglobin and serum proteins were digested more efficiently than orthologous proteins from dogs by the hemoglobinase Na-APR-2 from the hookworm of humans, *Necator americanus* (Williamson et al., 2003a). Despite the lower level conservation for individual protease proteins, major classes of IL proteases (aspartic, cysteine, metallo-, and serine) appear to be conserved among many nematode species and appear to retain broad functional characteristics related to inhibitors of the protease classes, which may have pharmacotherapeutic applications. The omics data utilized to identify probable IL proteins from other nematodes may have application toward identification of IL proteins from other nematode pathogens. It is important to note that many of the predicted proteases that colocalized to the PBS and 4 MU fractions were classified as IL proteins, even though they may function as peripheral membrane proteins that then are released into the lumen.

The proteomics analysis summarized here significantly clarifies functions that might be sited in the various intestinal cell and tissue compartments, and available evidence can be consulted for prospective development of research hypotheses stimulated by information in this resource. As one example, much of the past vaccine research has focused on proteases and their inhibition by host neutralizing antibodies ingested by the parasite. Alternatively, channel/transport proteins sited on the AIM have at least as much attraction as hydrolases for antibody-mediated inhibition of nutrient acquisition and other homeostatic functions (such as ion transport). The numerous specific examples of prospective AIM channel/transport proteins can now be prioritized for investigation in this direction. Additional membrane proteins were identified, some of which likely function on the AIM and can be investigated individually. However, and despite the progress reported, with only about 1,000 intestinal proteins identified by mass spectrometry, the advances are modest by comparison to what can be achieved with *A. suum* in more concerted proteomic analyses of nematode intestinal cell compartments. Not many nematode species can support the methods and tissue demands needed to gain the insight offered on this topic by the *A. suum* intestinal system. Given that the tools exist to place resulting information into a pan-Nematoda context, this area should be exploited to the full extent that the *A. suum* system can support.

We provide here an example of one of the many ways the generated multiomics database can be interrogated. Published reports provide supplementary tables in which the data have been integrated into a database that includes (for every documented *A. suum* gene) functional annotation, pan-Nematoda phylogeny, RNAseq expression and differential expression, and proteomic data spanning all of the available datasets (**Table 2**) from the publications discussed in this review. This resource allows for straightforward and convenient cross-study dataset comparisons and the prioritization of genes of interest to suit the goals of new studies.
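One way such an integrated table might be interrogated is sketched below. The record layout, field names, gene identifiers, and thresholds are all hypothetical and do not reflect the actual supplementary tables; the example only shows how expression, conservation, and proteomic evidence could be combined into a single filter.

```python
# Sketch: prioritize genes from an integrated multiomics table (hypothetical schema).

records = [
    {"gene": "gene_A", "annotation": "metallopeptidase",
     "intestine_tpm": 250.0, "species_with_ortholog": 9, "fractions": {"PBS", "4MU"}},
    {"gene": "gene_B", "annotation": "V-ATPase subunit",
     "intestine_tpm": 80.0, "species_with_ortholog": 10, "fractions": {"4MU"}},
    {"gene": "gene_C", "annotation": "unknown",
     "intestine_tpm": 5.0, "species_with_ortholog": 2, "fractions": set()},
    {"gene": "gene_D", "annotation": "lipase",
     "intestine_tpm": 120.0, "species_with_ortholog": 3, "fractions": {"PBS"}},
]


def prioritize(rows, min_tpm, min_species, required_fraction):
    """Keep genes that pass all three filters, most highly expressed first."""
    hits = [r for r in rows
            if r["intestine_tpm"] >= min_tpm
            and r["species_with_ortholog"] >= min_species
            and required_fraction in r["fractions"]]
    return sorted(hits, key=lambda r: r["intestine_tpm"], reverse=True)


# Example query: well-expressed, broadly conserved genes with 4 MU perfusate evidence.
candidates = [r["gene"] for r in prioritize(records, 50.0, 8, "4MU")]
print(candidates)
```

Changing the thresholds or the required fraction adapts the same filter to other study goals, such as species-specific genes or lumen-only proteins.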

In addition to our ongoing *A. suum* intestinal omics research, other nematode intestinal omics datasets are available in the literature including a transcriptomics study of the *Ancylostoma ceylanicum* intestine, a proteomic study comparing the intestine, body wall, and reproductive tissues of *B. malayi*, and a proteomic study of larval-stage *A. suum* excretory–secretory (ES) products that provides more information about potential intestine-produced products during the larval stage (Wang et al., 2013a). As we have performed for our projects, future research will benefit from intersecting the results from these valuable intestinal omics resources, to provide more information about genes of interest and better insights into conserved and specific intestinal functions.

### ADVANCES ON METHODS TO EXPERIMENTALLY MANIPULATE *A. SUUM* INTESTINAL CELLS

#### The Intestinal Cannulation and Perfusion System

As with many aspects of helminth research, the small size of many nematodes, limited access to live parasite tissues of interest, and poor *in vitro* survival of the life cycle stages of interest all present challenges for investigations on intestinal cells. Again, *A. suum* presents distinct advantages relative to obtaining abundant biological materials for biochemical analyses. Its large size has facilitated improvements to experimentally investigate intestinal cell functions, particularly as relates to cannulation and perfusion of the intestine (**Figure 5**). This technique supports delivery of controlled treatments (amount, timing) into the lumen for experimental manipulation of the AIM and other intestinal cell functions. Delivery *via* perfusion leads to the immediate achievement of treatment concentrations in the lumen. In contrast, delivery by feeding is variable and difficult to quantify or is unachievable for many parasitic species. Maintenance of the cannulated intestine *in situ* where it is bathed by pseudocoelomic fluid leads to expectations for faithful replication of *in vivo* functions in this setup, particularly over short-term experiments. Cannulated *A. suum* can be maintained in culture media for at least 4 days after which time worms are still motile, although stable maintenance of intestinal cell functions during this time period requires verification. Experimental treatments can also be injected into the pseudocoelom for delivery across the BIM. The advantage of BIM delivery is that pseudocoelomic injections can be done with intact worms, which can then be maintained for several days in *in vitro* culture. Pseudocoelomic delivery of bacteria was shown to induce expression of intestinal transcripts encoding *A. suum* antibacterial factors (ASABF) (Pillai et al., 2003) as one example. 
However, in many cases, transport across the AIM is not expected to be replicated by the BIM, and transporters may have different polarities if found in both the AIM and BIM. Because BIM delivery of experimental treatments to intestinal cells is not an obvious alternative to dissect AIM functions, simple methods like cannulation and perfusion of the intestinal lumen represent valuable capabilities to investigate functions peculiar to the AIM. It is also likely that the *A. suum* system will support experimental dissection of functions on the BIM, about which even less is known. Nevertheless, double-stranded RNA (dsRNA) treatments delivered by both routes cause knockdown of target intestinal cell transcripts, as discussed below. Hence, the two different routes of delivery may have complementary characteristics for at least some treatments, as described in the next section.

#### Manipulation of Intestinal Cell Transcripts by Perfusion of dsRNA

The ability of the intestinal perfusion method to support experimental manipulation of intestinal cell functions has been tested using dsRNA as the treatment for several *A. suum* intestinal genes (Rosa et al., 2017). The genes used in this study were selected based on a number of criteria, including evidence of results with dsRNA from orthologous genes in heterologous species. A series of experiments was conducted in which dsRNA constructs were tested for each of five genes. Based on the results, dsRNA was selected for one of these genes (GS\_08011) to assess treatment variables, including dsRNA construct (intact long or dsRNA fragmented by ribonuclease III), amount, and comparative effectiveness of delivery across the AIM or BIM. These studies showed that while knockdown of transcripts was impressive for some genes (up to 99.7% expression reduction), there was also some variability among genes, which is not unusual for *A. suum* or other parasitic nematodes (Selkirk et al., 2012). Perfused quantities down to 0.5 µg per worm caused effective knockdown of transcripts, and knockdown was achieved to approximately the same level in anterior and posterior halves of the intestine. Likewise, knockdown of intestinal transcripts was achieved to approximately the same level for dsRNA perfused into the intestinal lumen or injected into the pseudocoelom (the effect of which was previously investigated for pseudocoelomic delivery of dsRNA) (McCoy et al., 2015). These effects were achieved within a 24-h treatment period.
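For readers less familiar with how a figure such as "99.7% expression reduction" is derived, the short sketch below computes percent knockdown from control and treated expression values. The function name and the input numbers are illustrative assumptions; the study's actual normalization procedure is not described here.

```python
# Sketch: percent knockdown from normalized expression values (inputs are illustrative).

def percent_knockdown(control_expr: float, treated_expr: float) -> float:
    """Percent reduction of treated expression relative to the control."""
    if control_expr <= 0:
        raise ValueError("control expression must be positive")
    return (1.0 - treated_expr / control_expr) * 100.0


# A transcript at 1000 units in controls and 3 units after dsRNA treatment
# corresponds to roughly a 99.7% reduction.
example = percent_knockdown(1000.0, 3.0)
print(round(example, 1))
```

A value near zero indicates no knockdown, and a negative value indicates that expression rose after treatment.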

#### Assessment of Off-Target dsRNA Effects

Even when the effectiveness of dsRNA delivery and knockdown of intestinal cell transcripts is established, it is important to evaluate off-target dsRNA effects by RNAseq and the challenges they might present for interpretation of experimental results. The *A. suum* dsRNA perfusion study (Rosa et al., 2017) quantified global transcript effects for treatments with dsRNA constructs from two different genes, for control worms (decapitated at time 0 or cannulated for 24 h), and for anterior and posterior intestinal regions of worms from each treatment group. By comparison to other species/model systems, only modest effects were observed for up- or downregulation of off-target genes related to dsRNA treatments. For instance, transcripts for only eight and four intestinal genes showed significant up- or downregulation, respectively, by both dsRNA gene constructs in both anterior and posterior intestinal regions (Rosa et al., 2017). Additional, but still modest, effects were observed that were specific to individual dsRNA gene treatments or to anterior or posterior intestinal regions. Overall, off-target effects will require attention in relation to functional studies, perhaps in the form of using multiple distinct dsRNA constructs to rule out construct-specific effects and incorporating results from complementary approaches for functional analyses.
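The criterion of genes affected "by both dsRNA gene constructs in both anterior and posterior intestinal regions" amounts to an intersection across four comparisons. The sketch below shows that logic with invented gene names and differential expression calls; it is an assumed simplification, not the statistical pipeline used in the study.

```python
# Sketch: shared off-target genes across constructs and regions (all data hypothetical).

# Significant differential expression calls per (construct, region) comparison.
significant_calls = {
    ("dsRNA_1", "anterior"): {"g1": "up", "g2": "down", "t1": "down", "g5": "up"},
    ("dsRNA_1", "posterior"): {"g1": "up", "g2": "down", "t1": "down"},
    ("dsRNA_2", "anterior"): {"g1": "up", "g2": "down", "t2": "down", "g6": "down"},
    ("dsRNA_2", "posterior"): {"g1": "up", "g2": "down", "t2": "down"},
}
intended_targets = {"t1", "t2"}  # genes the dsRNA constructs were designed against


def shared_off_targets(calls, direction, targets):
    """Genes changed in the given direction in every comparison, minus intended targets."""
    per_comparison = [{gene for gene, d in genes.items() if d == direction}
                      for genes in calls.values()]
    return set.intersection(*per_comparison) - targets


shared_up = shared_off_targets(significant_calls, "up", intended_targets)
shared_down = shared_off_targets(significant_calls, "down", intended_targets)
print(sorted(shared_up), sorted(shared_down))
```

Genes that appear in only one comparison (such as `g5` or `g6` in this toy data) would fall into the construct-specific or region-specific categories mentioned above.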

#### Summary of Progress on the *A. suum* Intestinal Research Model

Attempts to manipulate *A. suum* intestinal cells using the perfusion system have been successful. Overall, the published results indicate that the system provides a relatively facile method to manipulate intestinal cell functions, in this case knockdown of target transcripts by dsRNA. Although concomitant knockdown of protein will be an important follow-up to assess the utility of the dsRNA approach to determine protein functions, the research demonstrated the ability of the system to support exogenous manipulation of intestinal cells. The intestinal perfusion system should enable many questions to be investigated regarding functions that reside in the lumen, on the AIM, and in intestinal cells, thus adding motivation to gain a more complete picture of compartments occupied by proteins expressed by intestinal cells. In context of omics resources for the *A. suum* intestinal model coupled with pan-Nematoda IntFam resources, *A. suum* offers valuable attributes to a cooperative multispecies model aimed to derive essential functions of intestinal cells common to many parasitic nematodes. It remains to be determined how the other two core species (*H. contortus* and *T. suis*) will add to the cooperative model. Concomitantly, one can now predict intestinal genes in other parasitic nematodes, and *Strongyloides* spp. provide parasite examples where transgenic methods can be used to test functional predictions *in vivo* (Shao et al., 2017). Pan-Nematoda IntFams also provide a connection that can guide the use of *C. elegans* to investigate intestinal functions with broad application to nematode pathogens.

### THE FUTURE OF PARASITIC NEMATODE INTESTINE RESEARCH

#### Further Development of the *A. suum* Experimental System

The *A. suum* experimental system is well recognized as an important model for parasite research. Creative approaches by many investigators have capitalized on the advantages that this large parasite offers; yet, the potential of the model is far from fully realized. Development of the model for research on intestinal cell biology is at an early stage, with progress driving desire for much more detail, and extensive intestinal omics data will facilitate derivation of many hypotheses. For instance, while longitudinal queries by RNAseq delineate rather gross assessment of differentiation along the intestinal tract, functional nuances among individual intestinal cells interrogated by single cell sequencing (Hwang et al., 2018) remain an intriguing goal to achieve. Major interests of ours include AIM functions and intracellular processes that mediate secretion and renewal of functions that are sited on the AIM. Additionally, more recent findings have implicated exosomal vesicles containing miRNAs derived from the *H. polygyrus* intestine (presumably secreted from the AIM) in modifying host immune responses (Buck et al., 2014; Coakley et al., 2017). Improving the knowledge base and experimental methods for the nematode intestine may contribute to research on the worm side of the interaction. In these contexts, pan-Nematoda omics data coupled with the cannulation/perfusion system offer substantial capabilities to investigate hypotheses related to AIM biology, and efforts are needed to identify the best ways to maintain and manipulate intestinal cells outside of the parasite. Although proteomics data provide valuable information on AIM constituents, this resource can and should be expanded to more comprehensively document functions on the AIM. The adult intestinal cell system will support many experiments; however, the effort and cost to obtain adult *A. suum* are not without drawbacks. Larval stages of *A. 
suum* (lung stage larvae or early intestinal stages) can provide a less expensive and more ready source of parasite material and are likely to support investigation of some hypotheses on intestinal cells while extending application to additional life cycle stages. Larval stages have generated some useful findings relevant to intestinal cell biology (Wang et al., 2013a). Thus, effort is needed to establish capacity to utilize larval stages for experimental purposes directed at intestinal cell functions. The list can go on, but the areas mentioned are the ones that have priority for us.

#### Drug Targets and Therapies

Omics data for intestinal cells and compartments, such as the AIM, provide enormous potential when integrated with drug and target databases to identify prospective parasite targets for existing inhibitors. Areas that are ripe for investigation include enzymes and membrane transporters localized to the AIM. Alternatively, an approach we are following involves predictions of essential cellular pathways and components, which preferentially are pan-Nematoda conserved. One example here involves pathways that transport secretory vesicles containing digestive enzymes and other cargo to the AIM. The goal is to more broadly clarify parasite components that are involved in the exocytic process and identify inhibitors that might replicate irreparable intestinal damage caused by benzimidazole anthelmintics. Whether or not new anthelmintics result, experimentation of this kind is likely to produce new information on basic intestinal cell biology that has potential for pan-Nematoda applications. Further, although the research is intestino-centric, results may have applications to other tissues and cells. Using secretion as an example, secretions in intestinal cells and neurons likely have similar and perhaps identical cellular components, although the consequences of inhibition may be idiosyncratic to a given tissue (tissue degeneration, or disruption of neurotransmission, respectively). There is evidence in *C. elegans* that some proteins that mediate/regulate exocytosis (e.g., syntaxin) have similar functions in each tissue type. Thus, learning more about intestinal cell secretion and inhibitors may have application to other parasite tissues, including the nervous system.

#### Vaccine Targets and Immune Control

Development of subunit vaccines against parasitic helminths (with some exception of tapeworms) has proven an elusive goal. Although not without caveats, given the remarkable levels of protection using intestinal antigens against blood-feeding nematodes, progress summarized has identified multiple categories of proteins with potential application to immune control. In addition to likely digestive enzymes, numerous predicted integral membrane and transport proteins were identified that are likely to be localized to the AIM, and host antibody consumed by the parasite would have access to them. Many of these proteins have predicted orthologs in other nematode species, which creates opportunities for research in those species. This dataset, and ideally in the future a larger dataset, identifies numerous functions that can be investigated with antibody-mediated inhibition of those functions. Recognizing that the best immunization results have been achieved thus far in context of blood-feeding parasites, the approach may also have application to somatically migrating stages of parasites such as *A. suum*, which will be in contact with antibodies found in serum. Transcripts for many adult intestinal genes have already been identified, inclusive of those encoding AIM-localized proteins. These genes are expressed in migratory larvae as well, although it remains unclear how extensively these larvae rely on feeding during their migration to the intestine. In any case, the omics data presented stimulate many lines of thought on possible applications of this information to future vaccine development related studies.

#### AUTHOR CONTRIBUTIONS

DJP, RT, and MM all contributed to the writing and all approved the content of this review article.

#### FUNDING

The research outlined in this study was supported by the National Institute of General Medical Sciences Grant R01GM097435 to MM.

#### REFERENCES

generation of pathology? *Trends Parasitol.* 17 (10), 471–475. doi: 10.1016/S1471-4922(01)02036-0


novel members of ASABF in the nematode Ascaris suum. *Biochem. J.* 371 (Pt 3), 663‒668. doi: 10.1042/bj20021948


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Jasmer, Rosa, Tyagi and Mitreva. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Revisiting the Phylogenetic History of Helminths Through Genomics, the Case of the New *Echinococcus oligarthrus* Genome

*Lucas L. Maldonado1\*, Juan Pablo Arrabal2, Mara Cecilia Rosenzvit1, Guilherme Corrêa De Oliveira3 and Laura Kamenetzky1\**

*1 IMPaM, CONICET, Facultad de Medicina, Universidad de Buenos Aires, Buenos Aires, Argentina, 2 INMet, Instituto Nacional de Medicina Tropical, Puerto Iguazú, Argentina, 3 Instituto Tecnológico Vale, Belém, Brazil*

#### *Edited by:*

*Gabriel Rinaldi, Wellcome Trust Sanger Institute (WT), United Kingdom*

#### *Reviewed by:*

*Luis Carlos Guimarães, Federal University of Pará, Brazil Isheng Jason Tsai, Biodiversity Research Center, Academia Sinica, Taiwan*

#### *\*Correspondence:*

*Laura Kamenetzky lkamenetzky@fmed.uba.ar Lucas L. Maldonado lucas.l.maldonado@gmail.com*

#### *Specialty section:*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Genetics*

*Received: 29 March 2019 Accepted: 04 July 2019 Published: 07 August 2019*

#### *Citation:*

*Maldonado LL, Arrabal JP, Rosenzvit MC, Oliveira GCD and Kamenetzky L (2019) Revisiting the Phylogenetic History of Helminths Through Genomics, the Case of the New Echinococcus oligarthrus Genome. Front. Genet. 10:708. doi: 10.3389/fgene.2019.00708*

The first parasitic helminth genome sequence was published in 2007; since then, only ~200 genomes have become available, most of them draft assemblies. Despite the global medical and economic impact of helminthic infections, parasite genomes remain underrepresented in public databases. Recently, through an integrative approach involving morphological, genetic, and ecological aspects, we demonstrated that the complete life cycle of *Echinococcus oligarthrus* (Cestoda: Taeniidae) is present in South America. The neotropical *E. oligarthrus* parasite is capable of developing in any felid species and producing human infections. Neotropical echinococcosis is still poorly understood and requires a complex medical examination to provide the appropriate intervention. Only a few cases of echinococcosis have been unequivocally identified and reported as a consequence of *E. oligarthrus* infections. Regarding phylogenetics, analyses of mitogenomes and nuclear datasets have resulted in discordant topologies, and there is no unequivocal taxonomic classification of *Echinococcus* species so far. In this work, we sequenced and assembled the genome of *E. oligarthrus* isolated from naturally infected agoutis (*Dasyprocta azarae*) and performed the first comparative genomic study of a neotropical *Echinococcus* species. The *E. oligarthrus* genome assembly consisted of 86.22 Mb, showed ~90% identity and 76.3% coverage with *Echinococcus multilocularis*, and contained 85.0% of the total expected genes. Whole-genome variant analysis revealed a higher rate of intraspecific genetic variability (23,301 SNPs; 0.22 SNPs/kb) than in the genomes of *E. multilocularis* and *Echinococcus canadensis* G7 but a lower rate than in *Echinococcus granulosus* G1. Comparative genomics against *E. multilocularis*, *E. granulosus* G1, and *E. canadensis* G7 revealed 38,762, 125,147, and 170,049 homozygous polymorphic sites, respectively, indicating a higher genetic distance between *E. oligarthrus* and *E. granulosus sensu lato* species. The SNP distribution in chromosomes revealed a higher SNP density in the longest chromosomes. Phylogenetic analysis using whole-genome SNPs demonstrated that *E. oligarthrus* is one of the basal species of the genus *Echinococcus* and is phylogenetically closer to *E. multilocularis*. This work sheds light on the *Echinococcus* phylogeny and lays the basis for studying sylvatic *Echinococcus* species and their developmental evolutionary features.

Keywords: *Echinococcus oligarthrus*, genome, phylogeny, single nucleotide polymorphism, chromosomes, parasites

### INTRODUCTION

Helminth parasites are a highly diverse group that includes many species of biomedical, veterinary, and economic importance, among them roundworms (nematodes) and flatworms (Platyhelminthes: trematodes and cestodes). Taxonomic assignment of parasitic helminth taxa is a particularly arduous task. Parasites are typically difficult to culture and analyze independently of their hosts, and parasite body fossils are scarce due to their small size, lack of hard parts, and lifestyle within the host. In addition, molecular analyses often do not include all the species or include only partial sequences (De Baets et al., 2015). Among Platyhelminthes, the classification and nomenclature within the genus *Echinococcus* is a controversial topic (McManus, 2013; Lymbery et al., 2015). The strobilar stage of these parasites occurs in the small intestine of a definitive carnivore host; the metacestodes develop in the organs of an herbivorous intermediate host that is the prey of the final hosts. In past decades, several subspecies of *Echinococcus granulosus* were proposed based mainly on intermediate host specificity (Verster, 1969). However, most of the early proposed subspecific taxa were relegated to synonymy under the name *E. granulosus* due to their sympatric distributions and because they are indistinguishable at the morphological level (Rausch et al., 1967). Consequently, based on the production of a distinctive form of echinococcosis in humans, only four species were retained: *E. granulosus sensu lato*, which causes cystic echinococcosis; *Echinococcus oligarthrus*, which causes unicystic echinococcosis; *Echinococcus vogeli*, which causes polycystic echinococcosis; and *Echinococcus multilocularis*, which causes alveolar echinococcosis.
The degree of pathogenicity of echinococcosis infections depends on the characteristics of metacestode development, which differs in each of the four species mentioned above (Lymbery, 2017). More recently, genetic and mitochondrial genomic analyses have allowed revision of the phylogeny of the genus *Echinococcus*, assigning species rank to nine taxa: *E. granulosus sensu stricto*, *Echinococcus canadensis*, *Echinococcus ortleppi*, *Echinococcus equinus*, *Echinococcus felidis*, *E. oligarthrus*, *E. vogeli*, *E. multilocularis*, and *Echinococcus shiquicus* (Xiao et al., 2006; Nakao et al., 2007; Hüttner et al., 2008; Nakao et al., 2013a; Kinkar et al., 2017). Nuclear DNA has also been used to reconstruct the phylogeny of taeniid parasites, which differs from the phylogeny obtained with mitochondrial data (Saarma et al., 2009; Knapp et al., 2011). Even so, certain features of the taxonomy are common regardless of the origin of the molecular data: *E. felidis* and *E. granulosus s. s.* are sister species, and *E. ortleppi* is closely related to the different genotypes of *E. canadensis*. However, the position of the neotropical species, *E. vogeli* and *E. oligarthrus* (basal or non-basal), and whether *E. multilocularis* and *E. shiquicus* are sister species remain unknown (Lymbery, 2017). Several authors agree that further analyses using more nuclear DNA sequences are required in order to completely resolve the relationships among putative species within the genus (Saarma et al., 2009; Nakao et al., 2013a; Lymbery, 2017). In particular, neotropical species have been the least studied, and only a few cases have been reported so far (D'Alessandro and Rausch, 2008; Soares et al., 2013; Arrabal et al., 2017). Hence, better sampling and understanding of the neotropical species will help to resolve the *Echinococcus* phylogeny. In 2013, with the publication of the first tapeworm genomes, which included *E. multilocularis* and *E. granulosus* G1 (Tsai et al., 2013; Zheng et al., 2013), the "Tapeworm genome era" began. In 2017, we sequenced and assembled the *E. canadensis* G7 genome and performed several comparative genomic analyses confirming the species status of the taxon (Maldonado et al., 2017). In this work, with the aim of obtaining a deeper description of the *Echinococcus* phylogeny, we sequenced the complete *E. oligarthrus* genome and performed whole-genome variant analysis among all the *Echinococcus* species with currently available genomes. The results presented here demonstrate the basal origin of the neotropical *Echinococcus* species and indicate that the use of complete genome data is crucial for unequivocal helminth phylogenetic studies.

#### MATERIALS AND METHODS

#### Sample Collection, DNA Extraction, and Next-Generation Sequencing

#### Parasite Material

*E. oligarthrus* cysts were collected from Iguazú National Park, in the North of Misiones province, Argentina. Cysts were obtained from the livers of naturally infected agoutis (*Dasyprocta azarae*). The animals involved in this study were not subjected to any experimental procedure. All the samples used in this study were collected post-mortem from road-killed animals. For genome sequencing purposes, protoscoleces were aseptically removed from the cysts and extensively washed in phosphate buffer saline and visualized under an optical microscope. The species and genotype were determined by sequencing a fragment of the mitochondrial cytochrome C oxidase subunit 1 (COX1) (Arrabal et al., 2017).

#### DNA Isolation, Library Construction, and DNA Sequencing

The isolation of high-quality genomic DNA was performed by the phenol/chloroform method as previously described (Maldonado et al., 2017). Briefly, the samples were quantified using a Qubit Fluorometer (Invitrogen), and the quality was evaluated by the 260/280 and 260/230 OD ratios using a NanoDrop (ThermoFisher Scientific). MiSeq Illumina libraries were prepared as follows. For each library preparation, 50 ng of DNA was subjected to a random tagmentation reaction, in which DNA was simultaneously fragmented and linked to specific adapters using the Nextera® XT DNA Sample Preparation Kit, according to the manufacturer's instructions. Two libraries of 530-bp fragment size were obtained and subjected to 500 sequencing cycles (2 × 250 bp) using the MiSeq v2 Reagent Kit. The quality of the Illumina reads was evaluated with FastQC v0.10.1, and the reads were trimmed and end-clipped to a Phred score of 33 using Trimmomatic (Bolger et al., 2014).

#### *De Novo* Assembly of *E. oligarthrus* NGS Reads

The genome of *E. oligarthrus* was assembled from a combination of two paired-end libraries sequenced on the Illumina MiSeq platform. The genome assembly involved several steps. First, a preliminary *de novo* assembly was performed using SPAdes 3.6 (Bankevich et al., 2012; Safonova et al., 2014). The assembled sequences were screened against the NCBI nt database (available at ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nt.gz) using Nucleotide–Nucleotide BLAST v2.6.0+ in megablast mode, with an e-value cutoff of 1e−25 and a culling limit of 2, and against SwissProt using DIAMOND tblastx (Buchfink et al., 2015). Raw, paired-end Illumina reads were mapped against the assembly using Bowtie2 (Langmead and Salzberg, 2012). The output was converted to a BAM file using Samtools (Li and Durbin, 2009). Blobtools v1.1 (Laetsch and Blaxter, 2017) was used to create taxon-annotated GC-coverage plots for the *E. oligarthrus* genome assembly and to identify the target sequences and target reads, using as input the Nucleotide–Nucleotide and DIAMOND BLAST results (Buchfink et al., 2015) and the raw read mapping results. Sequences that did not match the Platyhelminthes taxon as a top BLAST hit at the phylum level were filtered out. After removing contaminants, sequences that did match as a top BLAST hit at the phylum level and whose base coverage was >4× were used to recover the target reads. Before the assembly, we used Jellyfish to create a k-mer histogram, and the convergence was analyzed with GenomeScope in order to evaluate whether the reads remaining after cleaning with Blobtools could be used to obtain an acceptable assembly. The target reads extracted from the BAM files were used to re-assemble the *E. oligarthrus* genome using SPAdes 3.12 (Bankevich et al., 2012; Safonova et al., 2014). The scaffolds of *E. oligarthrus* were obtained with Chromosomer (Tamazian et al., 2016) using *E. multilocularis* as the reference genome [WormBase ParaSite, Version WBPS12 (WS267)].
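The taxon- and coverage-based screen described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Blobtools implementation: the `Contig` record, its field names, and the toy values are hypothetical, and only the two criteria stated in the text (top BLAST hit in Platyhelminthes, base coverage >4×) are applied.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    """Hypothetical per-contig summary (field names are illustrative)."""
    name: str
    top_hit_phylum: str  # phylum of the best BLAST hit, "" if no hit
    coverage: float      # mean base coverage from read mapping


def select_target_contigs(contigs, phylum="Platyhelminthes", min_cov=4.0):
    """Keep contigs whose top hit is the target phylum and coverage > min_cov."""
    return [c for c in contigs
            if c.top_hit_phylum == phylum and c.coverage > min_cov]


# Toy input: one target contig, one host-like contig,
# one low-coverage contig, and one contig without any hit.
contigs = [
    Contig("c1", "Platyhelminthes", 21.5),
    Contig("c2", "Chordata", 60.2),
    Contig("c3", "Platyhelminthes", 2.1),
    Contig("c4", "", 15.0),
]
kept_names = [c.name for c in select_target_contigs(contigs)]
print(kept_names)  # ['c1']
```

Reads mapping to the retained contigs would then be the ones passed on to the second assembly round.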
Redundant unplaced and unlocalized sequences were discarded from the final assembly. Following Tamazian et al. (2016), a fragment sequence was considered unplaced if it had two or more alignments located on different reference sequences, or unlocalized if it had two or more alignments located on the same reference sequence. The contigs that did not map to the reference genome were included in the final assembly. The standard quality metrics of the assembly, such as N50, the total number of contigs, and the total length of the assembly, were evaluated using QUAST (Gurevich et al., 2013). Putative non-target contigs shorter than 500 bp were removed from the final assembly. The completeness of the gene space was validated using BUSCO2 (Simão et al., 2015) and the eukaryote CEGGs database. Furthermore, the core of cestode genes [genes contained in all cestode species according to Maldonado et al. (2017)] was screened using BLAST. Coverage and depth coverage were calculated with custom scripts. Depth coverage refers to the number of times that the same region or position in the reference genome is represented by the assembled genome. Coverage refers to the percentage of the total length of the reference genome that is represented by the assembled genome.
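The metrics named above can be illustrated with a short sketch. It is not the custom scripts used in the study or the QUAST implementation: the function names are hypothetical, alignments are assumed to be half-open `(start, end)` intervals on a single reference sequence, and depth is averaged over the whole reference length, which is one possible convention.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def coverage_and_depth(ref_length, intervals):
    """Return (percent of reference covered, mean depth over the reference).

    intervals: half-open (start, end) alignments on the reference.
    """
    depth = [0] * ref_length
    for start, end in intervals:
        for pos in range(max(start, 0), min(end, ref_length)):
            depth[pos] += 1
    covered = sum(1 for d in depth if d > 0)
    return 100.0 * covered / ref_length, sum(depth) / ref_length


print(n50([10, 8, 5, 3, 2]))                       # 8
print(coverage_and_depth(10, [(0, 4), (2, 6)]))    # (60.0, 0.8)
```

The per-position array keeps the example readable; for chromosome-scale references a sweep over sorted interval endpoints would be the more economical choice.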

### Gene Prediction and Annotation

The gene annotation of *E. oligarthrus* was performed by transferring the gene annotations from the *E. multilocularis* genome [WormBase ParaSite, Version WBPS12 (WS267)] using a manual approach supported by custom scripts. CDS and protein sequences of *E. multilocularis* and the scaffolds of *E. oligarthrus* were used for this purpose. First, the gene-containing regions were identified using BLAST with an e-value cutoff of 1e−10 and one best target hit. The CDS fragments of *E. oligarthrus* were extracted using bedtools (Quinlan and Hall, 2010; Quinlan, 2014), and the transcripts were re-assembled using Chromosomer (Tamazian et al., 2016). GeneWise (Birney et al., 2004) was used to find the correct frame of the CDS and to obtain the final datasets of CDS and proteins. The coordinates of the annotations were assessed using Exonerate (Slater et al., 2005) and added to a final GFF file. The performance of gene annotation and basic statistics for *E. oligarthrus* gene models, including the average intron/exon lengths and the number of introns, were calculated using Eval (Keibler and Brent, 2003). The functional gene annotation was performed using InterProScan-5.7-48.0 (Quevillon et al., 2005), and InterPro2GO databases were used to assign Gene Ontology (GO) terms (Ashburner et al., 2000). Gene models were subjected to BLAST search (Stemmer et al., 2013) against UniprotDB, and Blast2GO (https://www.blast2go.com/) was used to define the final annotations and GO term statistics. The GO terms were analyzed with custom scripts using the GO.db database in R (Bioconductor release 3.8). Proteins studied in this work were searched using BLAST (Stemmer et al., 2013) against UniProtKB/Swiss-Prot databases. Protein domains were screened against the PFAM and Prosite databases using PFAM\_scan (Finn, 2006) or HMMscan 3.0.
Global pairwise alignments were performed using Needle software (Rice et al., 2000) in order to identify the orthologous genes related to host–parasite interactions between *E. oligarthrus* and *E. multilocularis* (Brehm and Koziol, 2017). Identity and coverage stats were calculated and only hits with more than 50% of coverage were selected.
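As an illustration of the identity and coverage statistics mentioned above, the sketch below computes both from a gapped pairwise alignment and applies the >50% coverage cutoff. The definitions are assumptions made for this example (identity over columns where both sequences have a residue; coverage as the fraction of reference residues paired with a query residue) and may differ from those reported by Needle or used in the study; the sequences are toy data.

```python
def identity_and_coverage(aln_ref, aln_query, gap="-"):
    """Percent identity and percent reference coverage of a gapped pairwise alignment."""
    if len(aln_ref) != len(aln_query):
        raise ValueError("aligned sequences must have equal length")
    ref_len = sum(1 for a in aln_ref if a != gap)
    paired = sum(1 for a, b in zip(aln_ref, aln_query) if a != gap and b != gap)
    matches = sum(1 for a, b in zip(aln_ref, aln_query) if a == b and a != gap)
    identity = 100.0 * matches / paired if paired else 0.0
    coverage = 100.0 * paired / ref_len if ref_len else 0.0
    return identity, coverage


# Toy alignment: reference ortholog vs. a partial query gene model.
identity, coverage = identity_and_coverage("MKTAYIAK", "MKSAY---")
print(identity, coverage)   # 80.0 62.5
keep_hit = coverage > 50.0  # cutoff stated in the text
```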

### Phylogeny With Nuclear Molecular Markers

Molecular markers of the *Echinococcus* nuclear genome were downloaded from GenBank (Kinkar et al., 2017). For each locus, homologous regions were extracted from complete *Echinococcus* genomes [WormBase ParaSite, Version WBPS12 (WS267)] and from the *E. oligarthrus* genome obtained in this work using custom scripts. Homologous genomic sequences from *Taenia solium* were employed as the outgroup. The DNA sequences were aligned with ClustalX (v2.0.12), and multiple alignments were edited with BioEdit (v7.1.3). Phylogenetic analyses of concatenated nuclear markers were performed using the maximum likelihood method and the Tamura-Nei model implemented in MEGA X software (Kumar et al., 2018). The bootstrap consensus trees inferred from 1000 replicates were retained. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test is shown for each node when it exceeds 50%. The concatenated data resulted in an alignment of 1552 bp for seven taxa. A second phylogeny was calculated using MrBayes 3.1.2 (Ronquist et al., 2012). The evolutionary model was set to the generalized time reversible substitution model with gamma-distributed rate variation across sites and a proportion of invariable sites (GTR + G + I model), with at least 200 samples from the posterior probability distribution and diagnostics calculated every 1000 generations. We used an exponential prior on the branch length with mean = 0.1 substitutions/site.
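The concatenation of aligned nuclear markers into a single matrix can be sketched as below. This is only an illustration of the general technique, assuming each locus is already aligned and stored as a taxon-to-sequence mapping; the taxon labels and sequences are hypothetical, and padding a missing taxon with gaps is a choice made for this example rather than something stated in the text.

```python
def concatenate_loci(loci, taxa, gap="-"):
    """Concatenate per-locus alignments into one supermatrix.

    loci: list of dicts mapping taxon -> aligned sequence (equal length within a locus).
    taxa: taxa to include; a taxon missing from a locus is padded with gaps.
    """
    parts = {taxon: [] for taxon in taxa}
    for locus in loci:
        widths = {len(seq) for seq in locus.values()}
        if len(widths) != 1:
            raise ValueError("sequences within a locus must be aligned to equal length")
        width = widths.pop()
        for taxon in taxa:
            parts[taxon].append(locus.get(taxon, gap * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy data: two short loci, the second lacking the outgroup sequence.
loci = [
    {"Eoli": "ACGT", "Emul": "ACGA", "Tsol": "TCGA"},
    {"Eoli": "GG-", "Emul": "GGC"},
]
supermatrix = concatenate_loci(loci, ["Eoli", "Emul", "Tsol"])
print(supermatrix)
```

Every row of the resulting matrix has the same length, which is what tree-building programs expect from a concatenated alignment.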

#### Phylogeny Using Whole-Genome SNPs and Artificial Genome Sequence Construction

Variant calling was performed as described in the "Whole-Genome SNP Analysis" section. Genome-wide SNPs were used to perform phylogenetic analysis as follows: first, heterozygous SNPs were removed, and only the homozygous SNPs with a depth coverage >20× and strictly covered in all the *Echinococcus* species were selected to correct for incomplete lineage sorting. Afterwards, the homozygous SNP loci were concatenated, and the resulting sequence alignment was used to create a phylogenetic tree by implementing the maximum likelihood and Bayesian methods as above. PartitionFinder 2 (Lanfear et al., 2016) was used to select the best-fit partitioning schemes and models of evolution for the phylogenetic analysis. The selected model was then used for the phylogenetic analysis.
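The site selection and concatenation steps described above can be sketched as follows. This is an illustrative reading of the text, not the pipeline used in the study: the data layout (a dictionary keyed by `(chromosome, position)` holding two alleles and a depth per species), the species labels, and the decision to keep only sites that vary among species are all assumptions of this example.

```python
def build_snp_alignment(sites, species, min_depth=20):
    """Concatenate homozygous, well-covered SNP sites shared by all species.

    sites: {(chrom, pos): {species: (allele1, allele2, depth)}}
    Returns {species: concatenated bases}.
    """
    columns = []
    for key in sorted(sites):
        calls = sites[key]
        if any(sp not in calls for sp in species):
            continue  # site not covered in every species
        bases = []
        for sp in species:
            allele1, allele2, depth = calls[sp]
            if allele1 != allele2 or depth <= min_depth:
                break  # heterozygous call or insufficient depth
            bases.append(allele1)
        else:
            if len(set(bases)) > 1:  # keep variable sites only
                columns.append(bases)
    return {sp: "".join(col[i] for col in columns) for i, sp in enumerate(species)}


species = ["Ecan", "Egra", "Emul", "Eoli"]
sites = {
    ("chr1", 100): {"Ecan": ("A", "A", 35), "Egra": ("G", "G", 40),
                    "Emul": ("A", "A", 50), "Eoli": ("A", "A", 25)},
    ("chr1", 250): {"Ecan": ("C", "T", 30), "Egra": ("C", "C", 33),
                    "Emul": ("C", "C", 41), "Eoli": ("C", "C", 27)},  # heterozygous
    ("chr2", 80): {"Ecan": ("T", "T", 30), "Egra": ("T", "T", 12),
                   "Emul": ("C", "C", 44), "Eoli": ("C", "C", 31)},   # low depth
    ("chr2", 90): {"Ecan": ("G", "G", 30), "Egra": ("A", "A", 32),
                   "Emul": ("G", "G", 44)},                            # missing species
    ("chr3", 10): {"Ecan": ("T", "T", 22), "Egra": ("T", "T", 30),
                   "Emul": ("C", "C", 41), "Eoli": ("C", "C", 28)},
}
alignment = build_snp_alignment(sites, species)
print(alignment)
```

The resulting strings form a gap-free alignment that can be written to FASTA or PHYLIP for maximum likelihood or Bayesian tree inference.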

### Whole-Genome SNP Analysis

Genome sequences of *Echinococcus* species were ordered in chromosomes according to the latest version of the *E. multilocularis* genome [WormBase ParaSite, Version WBPS12 (WS267)] using Chromosomer (Tamazian et al., 2016) and were used as mapping templates for the variant calling analyses. Variant calling was performed as follows: first, all of the raw reads from the *E. canadensis* G7, *E. granulosus* G1, *E. multilocularis*, and *E. oligarthrus* libraries were processed and filtered by quality. Then, the reads were mapped first against their own reference genomes and then against the genomes of the other three *Echinococcus* species using Bowtie2 (Langmead and Salzberg, 2012). Mapping statistics were calculated with Bamtools (Barnett et al., 2011), and duplicates were marked and discarded using picard-tools v2.18 (http://broadinstitute.github.io/picard/). The variant calling was performed using bcftools (Narasimhan et al., 2016; Danecek and McCarthy, 2017) and GATK (McKenna et al., 2010) with the following parameters: variation frequency was set to >40% with a depth coverage of at least 70% of the total mean coverage; the base quality of both the reference site and the variation site was set to >30. Insertions and deletions (indels) were filtered out with VCFtools (Danecek et al., 2011), and SNPs located less than 10 bp from indels were removed to avoid false-positive SNPs. In order to annotate the heterozygous and homozygous polymorphic sites, the reads were mapped first against their own reference genomes and then against the other target genomes. Heterozygous sites were retained only if both forward and reverse reads mapped against the reference and alternative allele at a given nucleotide position whose depth coverage was at least 70% of the total mean coverage supporting that position.
Homozygous polymorphic sites were annotated if the forward and reverse reads mapped onto the alternative allele with at least 70% of the total mean coverage supporting that position and if there were no reads supporting the reference allele. Homozygous and heterozygous variant sites were registered for all of the species. The transition/transversion ratios were calculated using VCFtools (Danecek et al., 2011), and the annotation and classification of SNPs based on their effect on annotated genes were carried out with SnpEff v4.0 (Cingolani et al., 2012). Graphics were built using R software (https://www.r-project.org/).
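The filtering thresholds and the homozygous/heterozygous distinction described in this section can be summarized in a small sketch. It is a simplification under stated assumptions rather than the bcftools/GATK procedure: the function and parameter names are hypothetical, a single base-quality value stands in for the per-site qualities, and the requirement that both forward and reverse reads support a call is omitted.

```python
def classify_site(ref_reads, alt_reads, base_quality, mean_coverage,
                  min_alt_fraction=0.40, min_depth_fraction=0.70, min_quality=30):
    """Return "hom", "het", or None (site rejected) for one candidate SNP."""
    depth = ref_reads + alt_reads
    if depth == 0 or depth < min_depth_fraction * mean_coverage:
        return None  # depth below 70% of the mean coverage
    if base_quality <= min_quality:
        return None  # base quality must exceed 30
    if alt_reads / depth <= min_alt_fraction:
        return None  # variant frequency must exceed 40%
    # Homozygous only when no read supports the reference allele.
    return "hom" if ref_reads == 0 else "het"


def far_from_indels(snp_pos, indel_positions, min_distance=10):
    """True if the SNP lies at least min_distance bp from every indel."""
    return all(abs(snp_pos - p) >= min_distance for p in indel_positions)


print(classify_site(0, 30, 35, mean_coverage=20))   # hom
print(classify_site(12, 10, 35, mean_coverage=20))  # het
print(classify_site(3, 5, 35, mean_coverage=20))    # None
print(far_from_indels(105, [100, 300]))             # False
```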

## RESULTS

## The *E. oligarthrus* Genome

The *E. oligarthrus* genome was assembled from two paired-end libraries sequenced on the Illumina MiSeq platform. High-quality genomic DNA was purified from two cysts isolated from its natural host, *D. azarae*, which was naturally infected and found in Iguazú National Park, Misiones province, Argentina. Microscopic and macroscopic findings indicated that it was a neotropical *Echinococcus* species. The species and the genotype were confirmed by polymerase chain reaction (PCR) amplification of cytochrome oxidase 1 (cox1) followed by direct sequencing. The *E. oligarthrus* group 1 cox1 sequence was determined (Arrabal et al., 2017). The *E. oligarthrus* genome assembly was performed using a *de novo* assembly strategy, and the best assembly was chosen based on its quality metrics. Because metacestode development involves close interaction with the host's tissue, and despite careful extraction of the parasite material, we sequenced a high proportion of host DNA. Indeed, preliminary analysis of raw reads revealed that the sequencing yield of parasite DNA was ~22%, whereas the remaining ~78% of the sequences was presumably from *D. azarae*, whose genome has not been sequenced yet. Therefore, we screened for potential contaminants before obtaining the final assembly. Since the genome of *D. azarae* is unknown and no related sequences are available, we made a thorough identification of sequences truly derived from the target genome using Blobtools software (Laetsch and Blaxter, 2017). This software was specially designed for DNA contamination assessment and is particularly useful for organisms of parasitic origin. Here, we performed a taxonomic selection of the target sequences using the Platyhelminthes taxon (Taxonomy ID: 6157) and created a taxon-annotated GC-coverage plot that allowed us to identify Platyhelminthes contigs and discard all the non-target sequences (**Supplementary Figure 1**).
After removing the contaminants, sequences that did match as a top BLAST hit at the phylum level and whose base coverage was >4× and %GC ~ 41% (typical GC content of *Echinococcus* genus) were used to recover the target reads and re-assemble the *E. oligarthrus* genome (see Materials and methods for more details). The final assembly contained 74513 contigs that were further used to obtain the genome scaffolds. Redundant unplaced and unlocalized sequences (19.7 Mb) were removed from the contigs to obtain the final scaffolds assembly (see Materials and methods for more details). The contigs that did not map to the reference genome (did not allocate in scaffolds) were included in the final assembly. The final *E. oligarthrus* genome assembly was composed of 3764 sequences comprising a total of 86.22 Mb with an average GC content of 41% and showed an N50 ~ 10 Mb and ~20× in depth coverage. The assembly metrics for contigs and scaffolds are shown in **Table 1** and **Supplementary Table S1**. The nuclear genome of *E. oligarthrus* showed 76.3% coverage and ~90% identity with the genome of *E. multilocularis* whose genome size is ~115 Mb [WormBase ParaSite, Version WBPS12 (WS267)]. The percentage average of nucleotide identity among *Echinococcus* chromosomes ranged from 88.7% to 92.7% (**Supplementary Table S2**). The assembly also included the *E. oligarthrus* mitochondrial genome composed of one scaffold that was obtained from the contigs Eoli\_00665 and Eoli\_02629. The scaffold length was 13,893 bp with 98% coverage and 96% nucleotide identity to the *E. oligarthrus* mitochondrial reference genome (GenBank accession number AB208545). As expected, the nucleotide identity of the mitochondrial genomes was lower in comparison with other *Echinococcus* species (**Supplementary Figure 2**). 
To assess the completeness of the genome assembly, we evaluated the gene space using "Benchmarking Universal Single-Copy Orthologs" (BUSCO) (Simão et al., 2015), which measures genome completeness based on evolutionarily informed expectations of gene content. In this analysis, we identified 77.6% (235 of 303) of the core genes that are expected to be present in all metazoans, comprising 125 complete and duplicated and 110 fragmented orthologs; 68 orthologs were missing. The fragmented nature of the assembly may have prevented many genes from meeting the stringent matching criteria implemented by BUSCO. Indeed, the BLAST results suggest that most of the core genes are identifiable in the genome, even though many genes are present as fragments within the assembly. Also, due to the fragmentation level of the genome assembly, a hybrid and *ab initio* gene prediction is likely impracticable. Since all the available gene annotation transfer tools tried here showed low efficiency, the gene annotation was performed by manual gene annotation transfer as described in the Materials and methods section. First, we used BLAST to search for the most suitable *Echinococcus* species to be used as a template. In this regard, we found that *E. multilocularis* was the most suitable species because of the integrity of its assembly and its higher identity with the genome of *E. oligarthrus* (**Supplementary Table S2**). The final set of *E. oligarthrus* genes comprised 8753 protein-coding genes (4494 genes with coverage > 50% and 4259 genes with coverage < 50%) (**Supplementary Table S3**). The whole set of genes represented 85.0% of the total genes expected to be found in species of the genus *Echinococcus*. The predicted proteins were also screened, using BLAST, against the conserved core of genes that are present in all cestode species according to Maldonado et al. (2017). In this regard, 4872 out of 5203 genes were found for *E. oligarthrus* using e-values of <1e−12, which comprised 93.6% of the total conserved core of genes in cestodes, indicating that the genome contains useful molecular data (**Supplementary Table S4**).

TABLE 1 | Genome-wide statistics for the *E. oligarthrus* assembly and gene findings.

(\*) *Redundant sequences (19.7 Mb) were removed from the contigs to obtain the final scaffolds assembly.*

#### *E. oligarthrus* Genes

GO terms were assigned to 40% of the *E. oligarthrus* proteins. For the Molecular Function GO terms, the two most frequent categories were "binding (GO:0005488)" and "catalytic activity (GO:0003824)", in accordance with the GO term frequencies observed in other cestode genomes (Hahn et al., 2014) and *Echinococcus* species (Maldonado et al., 2017). For the Biological Process GO terms, the most frequent categories were "cellular process (GO:0009987)" and "metabolic process (GO:0008152)", also in accordance with the GO term frequencies observed in related organisms (**Supplementary Figure 3**).

From the total gene set, we searched for genes that have already been described as having a role in host–parasite interactions in *Echinococcus* (Brehm and Koziol, 2017). In this regard, we found 56 genes involved in host–parasite interactions whose coverages and identities to *E. multilocularis* orthologous genes were >50% (**Supplementary Table S5**). In particular, two genes encoding epidermal growth factor (EGF) tyrosine kinase receptors (Eoli\_000075800 and Eoli\_000617300) were identified, with identities of 59.8% and 52.6% and coverages of 65.3% and 51.4%, respectively. We also found a gene encoding a fibroblast growth factor (FGF) receptor tyrosine kinase (Eoli\_000833200) whose identity and coverage values were 76% and 100%, respectively. Regarding nuclear hormone receptors, we found nine orthologous genes with high identity and coverage (83.8% and 51.8% on average, respectively), including the cestode-specific nuclear hormone receptor Eoli\_000937000. Non-kinase receptors were also found, comprising four genes, including a Frizzled G protein-coupled receptor (GPCR) (Eoli\_000682100) with high identity and coverage (88.6% and 87.6%, respectively). Finally, we identified amino acid transporters (DAACS family), lipid binding proteins (FABPs), and antigens (AgB and Eg95), which are specific and conserved proteins in cestodes/*Echinococcus*.

#### Comparative Whole Genome Based on SNPs Analysis

The nucleotide variation in the genomes of different *Echinococcus* species was assessed through variant calling analysis. Genetic variants were identified among the genomes of *E. oligarthrus*, *E. multilocularis*, *E. granulosus* G1, and *E. canadensis* G7. Here, we focused on the study of single-nucleotide polymorphisms (SNPs). In order to perform this analysis, NGS raw reads were mapped against a reference genome composed of chromosomes and contigs. For all the analyses, the reads were mapped first against their own reference genomes and then against the genome of the corresponding analyzed species. Homozygous and heterozygous variant sites were identified and were marked in both the reference and the alternative allele (see Materials and Methods for more details). First, we evaluated the intraspecific variation of the *Echinococcus* genomes. As described in Maldonado et al. (2017), the *E. granulosus* G1 genome exhibited the highest number of intraspecific variant sites (74,796 SNPs, 0.65 SNPs/kb), followed by *E. oligarthrus* (23,301 SNPs, 0.23 SNPs/kb), *E. canadensis* G7 (10,791 SNPs, 0.095 SNPs/kb), and *E. multilocularis* (1,287 SNPs, 0.011 SNPs/kb). With regard to the genetic diversity observed among the *Echinococcus* species, the SNP distribution was similar to the distribution described in our previous research (Maldonado et al., 2017). The number of SNPs between *E. canadensis* G7 and *E. granulosus* G1 (842,322, Ts/Tv = 2.97) was higher than that between *E. canadensis* G7 and *E. multilocularis* (314,176, Ts/Tv = 2.97) and that between *E. granulosus* G1 and *E. multilocularis* (272,138, Ts/Tv = 2.94). Furthermore, the pair *E. canadensis* G7 and *E. granulosus* G1 showed roughly 20 times more SNPs than the pair *E. oligarthrus* and *E. multilocularis* (38,911, Ts/Tv = 3.25). The number of SNPs between *E. oligarthrus* and *E. granulosus* (126,472, Ts/Tv = 2.99) was similar to the number of SNPs between *E. oligarthrus* and *E. canadensis* G7 (171,135, Ts/Tv = 3.00). We also identified homozygous and heterozygous variant sites for the four *Echinococcus* species, for both the reference and the alternative allele in each case. The number of homozygous SNPs was 313,992 for *E. canadensis* G7 and *E. multilocularis*, 266,180 for *E. granulosus* G1 and *E. multilocularis*, 38,762 for *E. oligarthrus* and *E. multilocularis*, 830,768 for *E. canadensis* G7 and *E. granulosus* G1, 125,147 for *E. oligarthrus* and *E. granulosus* G1, and 170,049 for *E. oligarthrus* and *E. canadensis* G7 (**Table 2**). Moreover, the SNP density was assessed in terms of the number of SNPs per 1-Mb length of the *Echinococcus* chromosomes. For all the *Echinococcus* genomes, the highest SNP density (SNPs/Mb of the chromosome) was found for chromosome 1.
The chromosomes 2, 3, and 4 showed a slightly lower SNP density than chromosome 1 and chromosomes 5, 6, 7, 8, and 9 showed the lowest SNP density (**Figure 1**). However, the intraspecific SNP density distribution for *E. granulosus* G1 was higher in chromosomes 1, 5, and 9. We also evaluated the SNP distribution of *E. oligarthrus* in the coding and non-coding genomic regions (intergenic, exons, and introns) of *E. multilocularis.* In this regard, the 38,911 SNPs distributed with a higher rate in exons and introns rather than in the intergenic regions. However, those distributed in the coding regions exhibited a higher rate of synonymous changes (67.3%) rather than missense changes (32.6%) (**Supplementary Figure 4**).
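The two summary statistics used in this section, the transition/transversion (Ts/Tv) ratio and the SNP density per Mb of chromosome, can be illustrated with a short sketch. It is not the VCFtools computation or the scripts used in the study; the function names are hypothetical, and the SNP lists, counts, and chromosome lengths below are toy values, not data from the article.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ts_tv_ratio(snps):
    """Ts/Tv ratio for an iterable of (ref_base, alt_base) pairs."""
    transitions = transversions = 0
    for ref, alt in snps:
        if ref == alt:
            continue  # not a substitution
        if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
            transitions += 1   # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    return transitions / transversions if transversions else float("inf")


def snp_density_per_mb(snp_counts, chrom_lengths):
    """SNPs per Mb for each chromosome (lengths given in bp)."""
    return {chrom: snp_counts.get(chrom, 0) / (length / 1e6)
            for chrom, length in chrom_lengths.items()}


ratio = ts_tv_ratio([("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")])
density = snp_density_per_mb({"chr1": 500, "chr2": 120},
                             {"chr1": 20_000_000, "chr2": 10_000_000})
print(ratio)    # 3.0
print(density)  # {'chr1': 25.0, 'chr2': 12.0}
```

Normalizing by chromosome length is what allows densities to be compared across chromosomes of different sizes.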


*aRef/Alt : Hom/Hom: 6; Ref/Alt : Hom/Het:0; Ref/Alt : Het/Het:1; Ref/Alt : Het/Hom:9. bRef/Alt : Hom/Hom: 7; Ref/Alt : Hom/Het:0; Ref/Alt : Het/Het:0; Ref/Alt : Het/Hom:11. cRef/Alt : Hom/Hom: 0; Ref/Alt : Hom/Het:0; Ref/Alt : Het/Het:0; Ref/Alt : Het/Hom:2. dRef/Alt : Hom/Hom: 944; Ref/Alt : Hom/Het:0; Ref/Alt : Het/Het:13; Ref/Alt : Het/Hom:9238. eRef/Alt : Hom/Hom: 55; Ref/Alt : Hom/Het:0; Ref/Alt : Het/Het:0; Ref/Alt : Het/Hom:603. f Ref/Alt : Hom/Hom: 7; Ref/Alt : Hom/Het:0; Ref/Alt : Het/Het:7; Ref/Alt : Het/Hom:35.*

#### *Echinococcus* Phylogeny Reconstruction by Whole-Genome SNP Analysis

In order to evaluate the contribution of the SNPs to the genetic diversity among the different *Echinococcus* species, we performed phylogenetic analyses using three different approaches. The first used the whole-genome variant sites for the four *Echinococcus* species (only the SNPs), which consisted of analyzing 244,246 sites in each genome, giving a total of 2.08 Mb. The second used genome regions for the four *Echinococcus* species whose depth of coverage was >20× and that strictly contained variant sites supported by more than 20 reads, which involved the analysis of 40,179,279 sites in each genome, giving a total of ~162 Mb. The third used only coding regions for the four *Echinococcus* species whose depth of coverage was >20× and that strictly contained variant sites supported by more than 20 reads, which involved the analysis of 42,200 sites in each genome, giving a total of 168.8 kb. In all cases, only homozygous SNPs were used to perform the phylogenetic analyses. The sites and sequences retained were concatenated, and the resulting alignment was used to build the phylogenetic trees with the maximum likelihood and Bayesian methods (see Materials and methods for more details).
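The concatenation step described above can be pictured with a short sketch. It assumes genotype calls are already available per site and per species; the data structure, the function names, and the choice to drop invariant columns (as in the SNP-only approach) are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch: build a concatenated SNP alignment from homozygous calls.
def concatenate_homozygous_sites(calls, species):
    """calls: list of dicts (one per site) mapping species -> genotype tuple,
    e.g. ("A", "A"). Sites with a missing or heterozygous call in any species
    are skipped, as are sites where all species carry the same base."""
    columns = []
    for site in calls:
        genotypes = [site.get(sp) for sp in species]
        if any(gt is None or gt[0] != gt[1] for gt in genotypes):
            continue  # keep homozygous sites only
        bases = [gt[0] for gt in genotypes]
        if len(set(bases)) < 2:
            continue  # invariant column carries no SNP
        columns.append(bases)
    return {sp: "".join(col[i] for col in columns) for i, sp in enumerate(species)}

def to_fasta(alignment):
    """Serialize the alignment so tree-building software can read it."""
    return "".join(f">{name}\n{seq}\n" for name, seq in alignment.items())

# Toy calls (hypothetical): site 2 is heterozygous in Ecan, site 4 is invariant.
species = ["Ecan", "Egra", "Emul", "Eoli"]
calls = [
    {"Ecan": ("A", "A"), "Egra": ("G", "G"), "Emul": ("A", "A"), "Eoli": ("A", "A")},
    {"Ecan": ("C", "T"), "Egra": ("C", "C"), "Emul": ("T", "T"), "Eoli": ("T", "T")},
    {"Ecan": ("T", "T"), "Egra": ("T", "T"), "Emul": ("C", "C"), "Eoli": ("C", "C")},
    {"Ecan": ("G", "G"), "Egra": ("G", "G"), "Emul": ("G", "G"), "Eoli": ("G", "G")},
]
alignment = concatenate_homozygous_sites(calls, species)
print(to_fasta(alignment))
```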

The phylogenetic tree built with the Bayesian method from whole-genome SNPs showed a topology in which the genetic distance between *E. canadensis* G7 and *E. granulosus* G1 was large, whereas *E. multilocularis* and *E. oligarthrus* diverged equidistantly, at a very short distance, from their common node (**Figure 2A**). *E. multilocularis* and *E. oligarthrus* exhibited low genetic diversity between each other. PartitionFinder 2 (Lanfear et al., 2016) was used to select the best-fit partitioning schemes and models of evolution for the phylogenetic analysis. The transversional substitution model (TVMef) was the best-fitting model. Phylogenetic analysis using TVMef was consistent with the previous result (**Figure 2B**). The topologies of the trees were also consistent with the topology obtained using the maximum likelihood method (**Supplementary Figure 5A**). Phylogenetic analyses using genomic regions with depth of coverage >20× and coding regions with depth of coverage >20× also showed the same topology, although branch lengths differed (**Supplementary Figures 5B, C**). The total length of the coding sequences used here was ~42.2 kb, and the genes sampled for this purpose are listed in **Supplementary Table S6**.

In addition, we recorded how many polymorphic loci carrying the same polymorphism are shared among the *Echinococcus* species. In previous research, we determined that the number of shared loci was higher when *E. multilocularis* was used as the reference genome, which pointed to it as the basal species and ruled out *E. granulosus* G1 and *E. canadensis* G7 as possible candidates (Maldonado et al., 2017). In order to evaluate whether *E. oligarthrus* occupies a basal position in the genus, we incorporated *E. oligarthrus* and analyzed the number of shared loci using both *E. multilocularis* and *E. oligarthrus* as reference genomes and compared the results. Because the genome assemblies differ in quality and coverage, we normalized the number of SNPs in shared loci using the effective length of the sampled regions, under the assumption that all genome regions are equally subject to mutation. In this regard, we found that 12,282 loci shared the same nucleotide change among *E. canadensis* G7, *E. granulosus* G1, and *E. oligarthrus* with respect to *E. multilocularis* (*E. multilocularis* used as reference). Moreover, the number of shared loci between *E. oligarthrus* and *E. granulosus* G1, between *E. oligarthrus* and *E. canadensis* G7, and between *E. canadensis* G7 and *E. granulosus* G1 was 4,115, 8,919, and 92,326, respectively. Most of the loci were unique to *E. granulosus* G1 (226,350) and *E. canadensis* G7 (335,728) rather than to *E. oligarthrus* (40,386). On the other hand, the number of shared loci containing the same nucleotide change among *E. canadensis* G7, *E. granulosus* G1, and *E. multilocularis* was 12,277 with respect to *E. oligarthrus* (*E. oligarthrus* used as reference). The number of shared loci between *E. multilocularis* and *E. granulosus* G1, between *E. multilocularis* and *E. canadensis* G7, and between *E. canadensis* G7 and *E. granulosus* G1 was 8,675, 8,875, and 54,792, respectively. Furthermore, and similar to the above results, the highest numbers of unique loci were found for *E. granulosus* G1 (133,417) and *E. canadensis* G7 (242,815), rather than for *E. multilocularis* (60,455) (**Figure 3**).
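The shared and unique locus counts above amount to partitioning variant sites by the set of species that carry them, followed by a length normalization. The following is a rough sketch under assumed data structures: the variant tuples, species labels, and function names are hypothetical, and the normalization shown (counts per Mb of effective length) is one plausible reading of the procedure described.

```python
# Minimal sketch: partition variant loci by the species that carry them.
def partition_loci(variants):
    """variants: dict species -> set of (chrom, pos, alt) called against one
    reference genome. Returns a dict mapping each frozenset of carrier species
    to the number of loci carried by exactly that set."""
    species = sorted(variants)
    all_loci = set().union(*variants.values())
    counts = {}
    for locus in all_loci:
        carriers = frozenset(sp for sp in species if locus in variants[sp])
        counts[carriers] = counts.get(carriers, 0) + 1
    return counts

def per_mb(count, effective_length_bp):
    """Normalize a locus count by the effective length of the sampled regions."""
    return count / (effective_length_bp / 1e6)

# Toy variant sets (hypothetical), all relative to the same reference.
variants = {
    "Ecan": {("chr1", 10, "A"), ("chr1", 20, "T")},
    "Egra": {("chr1", 10, "A"), ("chr2", 5, "G")},
    "Eoli": {("chr1", 10, "A")},
}
counts = partition_loci(variants)
for carriers, n in sorted(counts.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(carriers), n)
```

Because a locus is keyed by position and alternative allele together, two species count as sharing a locus only when they carry the same nucleotide change, which matches the criterion stated in the text.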

In order to gain accuracy and supporting evidence for our previous results, we reconstructed the *Echinococcus* phylogeny with previously described nuclear molecular markers (Kinkar et al., 2017). Nuclear molecular marker sequences of all the available *Echinococcus* species were concatenated, aligned, and analyzed using both the Bayesian and maximum likelihood methods. This analysis covered 1,552 nucleotides, which allowed us to root the trees and reinforced the basal position of *E. oligarthrus* (**Supplementary Figure 5D**). The phylogenetic topology obtained was consistent with previous phylogenetic analyses (Nakao et al., 2013b) that placed *E. oligarthrus* (and *E. vogeli*) in a basal position for the genus *Echinococcus*. Similar phylogenetic topologies were obtained with the two methods employed here (**Supplementary Figure 5E**).

#### DISCUSSION

In this work, we sequenced the first sylvatic species of *Echinococcus*, *E. oligarthrus*, isolated from its natural host and performed comparative genomics between domestic and sylvatic species. One of the limitations of obtaining complete genomes of wildlife parasites from natural infections lies in the difficulty of obtaining DNA samples free of host material, particularly for parasites whose development occurs in intimate contact with the host tissue, as is the case for *E. oligarthrus*. Here, we made an extensive effort to identify the target sequences of the parasite and assemble the genome. Several steps were applied before the final assembly was obtained, and the final assembly was compared with the *Echinococcus* genomes we obtained previously (Maldonado et al., 2017). Even though the quality of the *E. oligarthrus* genome assembly was lower than that of other *Echinococcus* genomes, it was high enough to locate the genes and perform comparative genomic analysis.

We used these data to unravel the phylogeny of this genus. In previous studies, we described the genetic variation among three *Echinococcus* species (*E. canadensis* G7, *E. granulosus* G1, and *E. multilocularis*) and assessed the distribution of SNPs in the whole genome as well as the effect and the type of SNPs in the coding regions. In this work, we added a new genome to the analysis of genetic variants and studied the SNP distribution in each chromosome. Regarding the genetic diversity among the *Echinococcus* species, we found roughly 20-fold more SNPs between *E. canadensis* G7 and *E. granulosus* G1 than between *E. oligarthrus* and *E. multilocularis.* The SNP distribution observed is similar to the distribution described in our previous research (Maldonado et al., 2017), where the genetic diversity within the *E. granulosus sensu lato* species was high. On the other hand, the genetic diversity between the sylvatic species *E. oligarthrus* and *E. multilocularis* is remarkably lower. Furthermore, the genetic variability was unequal among chromosomes. This was revealed by a higher SNP density in larger chromosomes than in the smallest ones. In previous studies, we have also reported a higher gene density in larger chromosomes (Maldonado et al., 2018). However, since most of the SNPs produce synonymous nucleotide changes, the amino acid sequences derived from these genes, even those located in the larger chromosomes, are not altered by such changes, and presumably neither is their function.

For several years, the mitochondrial sequences were employed to analyze the *Echinococcus* phylogeny (for a review, see Nakao et al., 2013a). However, the construction of phylogenetic trees based only on mitochondrial DNA data may be biased because it is maternally inherited and, therefore, under particular evolutionary forces that may not represent the evolutionary history for each species (Lymbery, 2017). Indeed, it has been suggested that nuclear sequences should be used when evaluating the phylogenetic positions of new *Echinococcus* isolates (Saarma et al., 2009). Here, we performed maximum likelihood and Bayesian phylogenetic analyses using nuclear DNA sequences and compared the results implementing different models including the best-fitted evolutionary model predicted by PartitionFinder 2 (Lanfear et al., 2016). For this purpose, we used all the shared loci within SNPs that were identified among the *Echinococcus* genomes in both whole genome and coding regions and under the strict criteria of having more than 20× depth coverage in all the species. Hereby, and adding previously described nuclear molecular markers, which provides high accuracy to our results, the tree topology retrieved as the most frequent reconstruction placed *E. oligarthrus* in a basal position. Based on these analyses, we conclude that *E. oligarthrus* may be one of the basal species of the genus *Echinococcus*, together with *E. multilocularis*. These findings also agree with our previous studies (Maldonado et al., 2017) where we proposed a basal sylvatic species that could have accumulated mutations over time until a speciation phenomenon could have given rise to *E. granulosus* G1 and *E. canadensis* G7, which afterwards would have diverged, independently increasing the genetic diversity. The fact that *E. canadensis* G7 and *E. granulosus* G1 share more homozygous polymorphic loci with the same variant supports the hypothesis of a basal sylvatic species. 
However, since the number of homozygous polymorphic loci with the same variant shared among three of the species is almost equal (12,282 for *E. multilocularis* and 12,277 for *E. oligarthrus* as reference genomes), this result does not resolve whether *E. multilocularis*, *E. oligarthrus*, or another unknown related species is the ancestral species from which modern *Echinococcus* genomes could have arisen, and the question thus remains open. This hypothesis could be further probed with complete genome analyses of more *Echinococcus* species, which would help to describe the complete evolutionary history of these parasites. One of the most interesting implications of the nuclear phylogeny based on SNP analysis found in this work is the position of the sylvatic *Echinococcus* species as basal to the genus *Echinococcus*; in addition, it also demonstrates relevant genetic similarities between *E. multilocularis* and *E. oligarthrus.* This evolutionary framework may enable data-driven investigation of morphological features and developmental evolutionary studies that would provide relevant information about neotropical echinococcosis. The generation of genome datasets from additional cestode species would further improve these findings. Phylogenetic studies have allowed the development of hypotheses about the evolutionary history of several taxonomic groups from other perspectives. Such is the case of parasitic organisms and the phylogeography of their hosts, which helps to interpret parasite evolution in relation to the migratory patterns of their hosts and vice versa. In the neotropical region, several felids serve as definitive hosts for *E. oligarthrus*. Recently, we determined for the first time the presence of *E. oligarthrus* in the ocelot (*Leopardus pardalis*) and the puma (*Puma concolor*) in the north of Argentina using nuclear and mitochondrial molecular markers (Arrabal et al., 2017).
Indeed, all the cases reported so far have come from South America (D'Alessandro and Rausch, 2008). In terms of phylogeography, the most suitable explanation is that carnivores originated from immigrants from North America and that the ancestral species of *Echinococcus* migrated to South America together with their felid hosts. Early studies suggested that the differentiation of the species of *Leopardus* was likely facilitated by the formation of the Panamanian land bridge. The current hypothesis about felid evolution based on molecular phylogenetic studies suggests that the endemic neotropical felids (genus *Leopardus*) diverged from other main felid lineages and that they could have migrated to South America before the emergence of the Panamanian isthmus (Johnson et al., 2006). The finding of an archaic lineage of *Trichinella* in South America also supports the hypothesis of this early carnivore expansion (Pozio et al., 2009). Although it originated in North America, the puma currently has an extensive geographic range in South America, which could explain the presence of *E. oligarthrus* in several felid species from the neotropical region. Regarding intermediate hosts, the rodents of the *Hystricomorpha* suborder, natural hosts of *E. oligarthrus*, are known to have been the dominant small terrestrial herbivores in South America by the Miocene (Eisenberg et al., 1989). Hence, both the neotropical *Echinococcus* species and their respective hosts seem to have an ancient origin (Nakao et al., 2013b). The need for more genomes and analyses of host–parasite interactions is evident in order to further understand the co-evolution between this parasite and its felid host and the lifestyles of *Echinococcus* species.

Before the *E. oligarthrus* genome was sequenced, molecular data for this organism were scarce. Indeed, only 47 nucleotide sequences have been reported so far, representing only nine genes. Most of them were obtained with the sole purpose of being used as molecular markers (e.g., the nuclear 18S rRNA gene or the mitochondrial cox1 gene). The scarce sequence information about this parasite was significantly improved through our effort to sequence, assemble, and annotate the genome of *E. oligarthrus.* Despite the fragmented nature of the assembly, we have thoroughly analyzed the gene content in comparison with other members of the genus. Most of the core genes are identifiable in the genome even though many genes are present as fragments within the assembly. Although many genes presented coverage < 50%, the distribution of GO terms found for *E. oligarthrus* was consistent with that observed in other *Echinococcus* species. Moreover, we found that many genes are conserved in all the cestode species. Indeed, these genes make up the core set of cestode genes according to Maldonado et al. (2017), which means that they can be found in all the cestode species whose genomes have been sequenced so far. Therefore, both the genome and the gene annotations of *E. oligarthrus* are suitable for use in bioinformatic and comparative analyses as well as to guide wet-lab and molecular assays. Several works have reported the molecular and cellular mechanisms implicated in the larval development of *E. multilocularis* and *E. granulosus sensu stricto* (Brehm and Koziol, 2017), but not much is known about the neotropical *Echinococcus* species, which have a particular larval morphology (D'Alessandro and Rausch, 2008).
By means of this new genome, we expanded the repertoire of available genes and reached 85% of the genes expected to be present in *Echinococcus*, providing for the first time a large set of proteins of a neotropical *Echinococcus* species that can be further studied. Until now, only two families of coding genes implicated in host–parasite interactions had been sequenced from *E. oligarthrus*: the antigen Eg95 and the partial sequence of the antigen B (Haag et al., 2006; Haag et al., 2009). In this regard, we searched for proteins already known to be involved in host–parasite interactions, including genes that are responsible for larval development and signaling pathways in *Echinococcus*. Genes such as Wnt or the TNFα receptor, as well as putative regulatory genomic sequences of thousands of important genes, were described here. Through this work, the genes and genomic regulatory regions of the neotropical species *E. oligarthrus* are now accessible to the scientific community. The data obtained here will allow the design of data-driven gene expression experiments that will provide clues about the particular behavior of the parasite within its mammalian hosts and about the differences between sylvatic and domestic species.

### DATA AVAILABILITY

The assembled sequences of the *E. oligarthrus* genome were deposited in ENA (BioProject PRJEB31222, http://www.ebi.ac.uk/ena/data/view/PRJEB31222). The sequences and the annotation data can also be downloaded from our FlatDB project web page (http://www.bmhid.org/flatdb/). All the data generated and analyzed in this study are included in this published article and within the **Supplementary Material**.

### AUTHOR CONTRIBUTIONS

LM performed the bioinformatics analysis and wrote the manuscript. JA collected the parasite material and performed the genetic and morphological analysis. LM and LK designed the study. GO constructed the libraries and performed the sequencing. LM, LK, MR, and GO wrote and revised the manuscript. All authors read and approved the manuscript.

### FUNDING

This study was supported by the MinCyT CAPES BR/RED 1413 (L.K.), Secretaria de Políticas Universitarias-Ministerio de Educación, Cultura, Ciencia y Tecnología, Argentina (CAPG-BA 070/13) (L.K. and G.O.) and Sistema Nacional de Computación de Alto Desempeño (SNCAD-MiNCyT) (L.K.).

### ACKNOWLEDGMENTS

The authors thank Gabriel Lichtenstein for server administration and maintenance.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2019.00708/full#supplementary-material

#### REFERENCES


*felidis* Ortlepp, 1937 (Cestoda: Taeniidae) from the African lion. *Int. J. Parasitol.* 38, 861–868. doi: 10.1016/j.ijpara.2007.10.013


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Maldonado, Arrabal, Rosenzvit, Oliveira and Kamenetzky. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Compositional Analysis of Flatworm Genomes Shows Strong Codon Usage Biases Across All Classes

*Guillermo Lamolle1, Santiago Fontenla1, Gastón Rijo1, Jose F. Tort1\* and Pablo Smircich2,3\**

*1 Departamento de Genética, Facultad de Medicina, Universidad de la Republica, UDELAR, Montevideo, Uruguay, 2 Departamento de Genómica, Instituto de Investigaciones Biológicas Clemente Estable, IIBCE, MEC, Montevideo, Uruguay, 3 Laboratorio de Interacciones Moleculares, Facultad de Ciencias, Universidad de la Republica, UDELAR, Montevideo, Uruguay*

#### *Edited by:*

*Daniel Yero, Autonomous University of Barcelona, Spain*

#### *Reviewed by:*

*Deng-Feng Xie, Sichuan University, China Arif Uddin, Assam University, India*

#### *\*Correspondence:*

*Jose F. Tort jtort@fmed.edu.uy Pablo Smircich psmircich@fcien.edu.uy*

#### *Specialty section:*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Genetics*

*Received: 24 May 2019 Accepted: 22 July 2019 Published: 05 September 2019*

#### *Citation:*

*Lamolle G, Fontenla S, Rijo G, Tort JF and Smircich P (2019) Compositional Analysis of Flatworm Genomes Shows Strong Codon Usage Biases Across All Classes. Front. Genet. 10:771. doi: 10.3389/fgene.2019.00771*

In the present work, we performed a comparative genome-wide analysis of 22 species representative of the main clades and lifestyles of the phylum Platyhelminthes. We selected a set of 700 orthologous genes conserved in all species and measured changes in GC content, codon usage, and amino acid usage in orthologous positions. Values of GC content at the third codon position spanned a wide range, making it possible to discriminate two distinct clusters within freshwater turbellarians, cestodes, and trematodes, respectively. Furthermore, a hierarchical clustering of codon usage data differs remarkably from the phylogenetic tree. Additionally, we detected a synonymous codon usage bias that was more dramatic in extremely GC-poor or GC-rich genomes, i.e., GC-poor schistosomes preferred synonymous codons ending in A or T, while GC-rich *M. lignano* showed the opposite behavior. Interestingly, these biases impacted amino acid usage, with preferred amino acids encoded by codons following the GC content trend. These are associated with non-synonymous substitutions at orthologous positions. The detailed analysis of the synonymous and non-synonymous changes provides evidence for a two-hit mechanism in which both mutation and selection drive the diverse coding strategies of flatworms.

Keywords: flatworms, GC content, synonymous codons, codon usage, non-synonymous substitutions, amino acid usage, mutation, selection

### INTRODUCTION

The phylum Platyhelminthes, with more than 30,000 species, is one of the major phyla of invertebrate animals and contains an enormous diversity of life forms that have colonized very diverse niches (Caira and Littlewood, 2013). Almost three quarters of the flatworms are parasitic and belong to the Neodermata, a monophyletic clade characterized by a syncytial tegument and the presence of diverse specialized organs for attaching to hosts, such as suckers and hooks. The Neodermata comprise three classes: the Monogenea (primarily external parasites of cold-blooded aquatic vertebrates), the Cestoda (obligate endoparasites of vertebrates), and the Trematoda (endoparasites of vertebrates as adults, with intermediate stages endoparasitic in other invertebrates, mainly mollusks) (Caira and Littlewood, 2013). Besides the parasitic Neodermatans, an enormous diversity of species occurs in seas, rivers, and lakes and on all continental land masses, making this one of the most successful phyla of invertebrates (Collins, 2017). A few species exist as either commensals or occasional parasites of invertebrates, but most of them are free-living predator species forming a single paraphyletic group collectively referred to as "turbellarians" (Caira and Littlewood, 2013). Studies based on rRNA (Larsson and Jondelius, 2008; Laumer and Giribet, 2014) and transcriptomic data (Egger et al., 2015; Laumer et al., 2015) showed that the phylum Platyhelminthes split early into two clades: the ancestral Catenulida and the Rhabditophora, which includes several free-living orders and the Neodermatans (**Table 1**). Taxonomically, the Macrostomorpha was placed as the earliest diverging Rhabditophoran lineage and the order Tricladida, which contains the model organism *Schmidtea mediterranea*, as part of the later-evolved "turbellarians."

The huge diversity of flatworm life forms seems to be paralleled at the genomic level. The recent publication of several genome assemblies from the phylum (most of them corresponding to parasitic Neodermatans) has revealed a wide genomic diversity. For example, genome sizes range from 67 or 104 Mb in the monogenean *Gyrodactylus salaris* or the cestode *Hydatigera taeniaeformis*, respectively, to 1,200 Mb in the trematode *Fasciola hepatica* (Coghlan et al., 2019). Interestingly, this variation has little correlation with gene set completeness among genomes and is mostly due to non-coding elements, including repetitive and non-repetitive elements, with repeat content ranging from less than 4% in the smallest cestode genomes to 68% in *Fasciola hepatica*. Additionally, guanine and cytosine (GC) contents are very diverse, from 28% in the planarian *S. mediterranea* and 33% in the monogenean *Gyrodactylus salaris* to more than 45% in *M. lignano* and the food-borne trematodes (FBT) *F. hepatica*, *C. sinensis,* and *O. viverrini* (Coghlan et al., 2019).

We wondered if these large variations in genomic composition and structure could be correlated with the morphological and ecological diversity. It is well known that genomic GC content determines codon usage across species (Bernardi and Bernardi, 1985; Plotkin and Kudla, 2011) and the use of alternative synonymous codons is a non-random process (Sharp et al., 2010; Plotkin and Kudla, 2011). Due to the degeneracy of the genetic code, most amino acids, with the exceptions of methionine and tryptophan, are encoded by more than one codon. Codon usage bias (CUB) is a phenomenon where synonymous codons are not used with equal frequencies in coding DNA. It has been suggested that codon usage bias is the result of an equilibrium between mutational bias and natural selection and that natural selection could be acting in presumably highly expressed genes (Sharp et al., 2010; Plotkin and Kudla, 2011). Besides the effect at synonymous codon usage, it has been shown that strong GC bias could lead to changes in amino acid frequencies (Behura and Severson, 2013; Li et al., 2015). While this has not been explored widely in flatworms, several advances have been made in nematodes (Cutter et al., 2006; Mitreva et al., 2006; Mazumder et al., 2017a; Mazumder et al., 2017b). It is not clear yet how genomic GC differences could be influencing the codon usage and amino acid composition of proteins in Platyhelminthes and if these variations correlate with the ecological and physiological diversity in the phylum.

The first reports of flatworm codon usage predated the genomic era and were based on a small number of sequences from schistosomes and *Echinococcus*. Heterogeneity was evident, since schistosomes preferred A+T-rich codons, while *Echinococcus* favored GC3-rich codons (Meadows and Simpson, 1989; Alvarez et al., 1993; Kalinna and McManus, 1994; Milho and Tracy, 1995). Further analysis of larger sets of genes showed that codon bias was not uniformly distributed between genes, introducing the possibility of isochores (regions that differ in GC content) in the genomes of flatworms (Ellis and Morrison, 1995; Ellis et al., 1995). In agreement, a more recent compositional analysis of the *S. mansoni* genome reported an isochore-like organization (Lamolle et al., 2016). Early studies analyzing the forces behind codon bias found evidence of both mutational pressure (Musto et al., 1998) in *S. mansoni* and selection (Fernandez et al., 2001) in *Echinococcus* spp. as preponderant forces. More recently, studies on *S. haematobium* and *S. japonicum* confirmed a major role of natural selection in shaping the codon usage bias in these species (Mazumder et al., 2017a; Mazumder et al., 2017b). Several studies analyzed codon usage in the available genomes and transcriptomes of *Taeniidae* species, showing weak codon bias and a higher GC3 in highly expressed genes, explained by combined mutational and selection forces (Chen et al., 2013; Yang et al., 2014; Yang et al., 2015; Huang et al., 2017). A more preponderant contribution of selection in shaping codon usage was identified in a comparative analysis of *Echinococcus* species (Maldonado et al., 2018). While these studies highlight that platyhelminths are compositionally varied, they focus only on the schistosomes and tapeworms. We took advantage of the wide array of transcriptomes and genomes now available to extend the study to a phylum-wide analysis of codon usage patterns, as a proxy of the molecular organization of flatworm genomes. We performed a comparative analysis at the genomic level of 22 species representative of the main clades and lifestyles of the phylum Platyhelminthes. Within these species, we picked a set of 700 orthologous gene groups conserved across the 22 species and measured changes in GC content, codon usage, and amino acid usage in orthologous positions. We found a wide, class-independent diversity in codon and amino acid usage. Based on the study of orthologous positions in selected pairs of species with diverse GC content, we provide evidence of a combined contribution of mutational forces and selection that enforced synonymous codon usage bias and differential amino acid usage.

TABLE 1 | List of species analyzed and their GC content. FL, free-living; PR, parasitic; G.GC, genomic GC percentage; T.GC, transcript (CDS) GC percentage.

## METHODS

#### Data Acquisition

Genomic and coding sequences of 22 flatworm species were used in this work. To ease data visualization, a four-letter code was used to name the species (**Table 1**). Genomic and transcriptomic data of Mlig, Smed, Gsal, Pxen, Csin, Oviv, Fhep, Treg, Sjap, Sman, Mcor, Hdim, Egra, Emul, and Ssol were obtained from the public repository WormBase ParaSite (Howe et al., 2017) (https://parasite.wormbase.org/). Transcriptomic data on Sleu, Gapp, Pvit, Rros, Mfus, Kamp, and Bsem were generated by Laumer et al. (2015) and downloaded from the public repository Data Dryad (doi:10.5061/dryad.622q4).

#### Orthologues Determination

In-house Perl and Bash scripts that implemented a BLASTp best reciprocal hit strategy were used to identify a core of orthologous genes. An e-value cutoff of 1e-5 was used to define significant hits. This restrictive method produced one orthologous gene per species. A total of 700 orthologous groups were detected in all 22 species, and these sequences were used for the analysis, adding up to more than 8 million codons analyzed (8,242,428).
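The best reciprocal hit strategy can be sketched in a few lines. The authors used in-house Perl and Bash scripts that are not shown in the text, so the following Python version is only an illustration: it assumes BLASTp hits have already been parsed into (query, subject, e-value, bit score) tuples, and it ranks hits by bit score, which is an assumption about the tie-breaking rule.

```python
# Minimal sketch of reciprocal best hits between two proteomes A and B.
def best_hits(rows, max_evalue=1e-5):
    """rows: iterable of (query, subject, evalue, bitscore).
    Returns query -> best subject among hits passing the e-value cutoff."""
    best = {}
    for query, subject, evalue, bitscore in rows:
        if evalue > max_evalue:
            continue  # not a significant hit
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a, max_evalue=1e-5):
    """Keep pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    forward = best_hits(a_vs_b, max_evalue)
    reverse = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)

# Toy hits (hypothetical identifiers and scores).
a_vs_b = [("a1", "b1", 1e-50, 200.0), ("a1", "b2", 1e-20, 90.0), ("a2", "b2", 1e-3, 30.0)]
b_vs_a = [("b1", "a1", 1e-48, 198.0), ("b2", "a1", 1e-18, 88.0)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Extending this from two species to 22 requires an additional rule for combining the pairwise results into groups, for example keeping only genes that are reciprocal best hits against a chosen anchor species in every comparison; the text does not specify which rule was used.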

Expression data for *S. mansoni* in reads per kilobase per million (RPKM) were taken from the study of Protasio et al. (2012). Expression data were available for 696 of the 700 *S. mansoni* orthologs. Expression data for the adult stages of *F. hepatica*, *E. granulosus*, *H. diminuta*, *S. mediterranea*, and *M. lignano* were downloaded from WormBase Parasite (Howe et al., 2017).

### Gene Alignment and Phylogenetic Tree

Each group of 22 orthologous sequences was translated with an in-house Perl script and aligned individually with MAFFT (Katoh and Standley, 2013). Individual alignments were concatenated into a single alignment. This alignment was used to build a phylogenetic tree with PhyML (Guindon et al., 2010). PhyML was run with the following options: -b -4, to calculate statistical branch support; -s BEST, for tree topology estimation; -m LG, to indicate the substitution model matrix; and -o tl, for tree topology and branch length optimization.

For the hierarchical clustering based on GC content of synonymous codons (RSCU), several clusters were built using hclust from R Stats package with different option settings (R Core Team, 2019). A final consensus cluster was made with the Ape package (Paradis et al., 2004), which retained the most frequent groupings.

### Codon Usage Analysis

Codon usage and compositional analyses were done in R with the package seqinR (Charif and Lobry, 2007). Correlations between the frequency of each codon and GC3 were represented as heatmaps with the R "Corrplot" package (Wei and Simko, 2017). In-house R scripts were used to evaluate the significance of changes in frequencies between high- and low-expression gene sets and to define preferred codons. A codon was considered "preferred" if its frequency (RSCU) significantly increased in a set of high-expression genes compared with a low-expression set, regardless of whether it became the main codon for that amino acid or not. Correspondence analysis (COA) was performed in R.
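For readers unfamiliar with RSCU, the following Python sketch shows the standard definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The analyses above were run with seqinR in R; the function name `rscu` and the compact encoding of the standard genetic code are illustrative assumptions.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Relative synonymous codon usage for an in-frame DNA coding sequence.

    Amino acids with a single codon (Met, Trp), stop codons, and amino
    acids absent from the sequence are skipped.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for aa, synonymous in families.items():
        total = sum(counts[c] for c in synonymous)
        if total == 0 or len(synonymous) == 1:
            continue
        for codon in synonymous:
            # observed count / (total / number of synonymous codons)
            values[codon] = counts[codon] * len(synonymous) / total
    return values


example = rscu("GCTGCTGCTGCA")  # four Ala codons: three GCT, one GCA
```

An RSCU of 1 indicates no bias for that codon; values above 1 indicate over-representation relative to equal usage within the synonymous family.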

#### Neutrality and Effective Number of Codons Plots

Neutrality plots (Sueoka, 1988) (GC3 vs GC12) of the 22 species were used to evaluate the relationship among the three codon positions. Additionally, a single plot showing general GC3–GC12 for all species was calculated by using a concatenated super gene for each species.
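A minimal sketch of the quantities behind a neutrality plot is given below, assuming in-frame coding sequences as input. The function names are hypothetical and the regression is an ordinary least-squares slope of GC12 on GC3; the authors' own R code is not reproduced here.

```python
def gc_by_position(cds):
    """Return GC12 (mean of 1st and 2nd positions) and GC3 for a non-empty in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    gc = [0, 0, 0]
    for i in range(usable):
        if cds[i] in "GC":
            gc[i % 3] += 1  # i % 3 is the codon position (0, 1, or 2)
    n_codons = usable // 3
    gc1, gc2, gc3 = (count / n_codons for count in gc)
    return {"GC12": (gc1 + gc2) / 2, "GC3": gc3}


def regression_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# One point per gene: GC3 on the x axis, GC12 on the y axis.
genes = ["GCGGCAATT", "ATGAAATTT", "GGCGCCGCG"]
points = [gc_by_position(g) for g in genes]
slope = regression_slope([p["GC3"] for p in points], [p["GC12"] for p in points])
```

A slope close to 1 would indicate that all three codon positions respond equally to compositional pressure, whereas a slope close to 0 indicates that GC12 is constrained independently of GC3.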

The effective number of codons (ENC) is used to quantify the variation in codon usage, ranging from 20 (when only one codon per amino acid is used) to 61 (when all possible codons are used). GC3 vs ENC charts are useful to estimate selection contribution to CUB. Expected values of ENC based on mutation pressure generate a bell curve, so in these charts, the points that fall directly on the curve represent genes with neutral evolution, while the points under the curve suggest action of natural selection (Wright, 1990).
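The expected curve referred to above is usually computed from Wright's (1990) approximation, ENC = 2 + s + 29 / (s² + (1 − s)²), where s is the GC3 fraction. The sketch below evaluates that expression; the function name is an illustrative choice.

```python
def expected_enc(gc3):
    """Expected ENC under GC3 compositional bias alone (Wright, 1990).

    `gc3` is the fraction (0 to 1) of G+C at synonymous third positions.
    """
    s = gc3
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


# Tabulate the curve at a few GC3 values for plotting against observed ENC.
curve = [(s / 10, expected_enc(s / 10)) for s in range(0, 11)]
```

Genes whose observed ENC lies well below `expected_enc` at their GC3 value use fewer codons than composition alone would predict, which is the pattern interpreted as evidence of selection.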

### Amino Acid and Codon Substitutions Matrices

From the amino acid alignments of each COG, the sites that had gaps in one or more sequences were eliminated. Degapped COGs with fewer than 35 amino acids were eliminated. The resulting sequences were then concatenated, generating a "super-peptide" (without gaps) for each species. Then, the amino acid changes between each pair of species were counted (with a homemade R script), creating 20 by 20 substitution matrices. Each value of the matrix ($A_{Z,X}$) represents how many times amino acid Z is present in one species while amino acid X is present in the corresponding orthologous position in the other species; the matrix is therefore asymmetric. The diagonal of the matrix represents the unchanged sites, while the sum of the remaining values in each column or row represents the total substitutions for each amino acid. To test for deviations in amino acid usage between species, the total count for each amino acid in the two species of a pair was calculated, and the average was taken as the expected value to perform chi-square tests (**Figure 5**). For each pair of reciprocal changes in the matrix ($A_{Z,X}$, $A_{X,Z}$), a chi-square test was performed taking the average of the two counts as the expected value. For simplicity of analysis, we focused on three comparisons between species with different global GC: cestodes (Hdim and Egra), trematodes (Sman and Fhep), and free-living species (Smed and Mlig). The last comparison involved the two most divergent species in GC content.
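The counting and the reciprocal-change test described in this paragraph can be sketched in Python as follows. The original analysis used a homemade R script; the function names here are hypothetical, and the statistic is the ordinary chi-square with the average of the two reciprocal counts as the expected value (one degree of freedom, so values above 3.84 correspond to p < 0.05).

```python
from collections import Counter


def substitution_counts(pep_a, pep_b):
    """Count aligned residue pairs between two gap-free, equal-length peptides.

    counts[(z, x)] is the number of sites with residue z in species A
    and residue x in species B; z == x entries are unchanged sites.
    """
    if len(pep_a) != len(pep_b):
        raise ValueError("super-peptides must have the same length")
    return Counter(zip(pep_a, pep_b))


def reciprocal_chi_square(counts, z, x):
    """Chi-square statistic comparing z->x against x->z changes."""
    forward = counts[(z, x)]
    reverse = counts[(x, z)]
    expected = (forward + reverse) / 2
    if expected == 0:
        return 0.0
    return ((forward - expected) ** 2 + (reverse - expected) ** 2) / expected


toy = substitution_counts("AAAS", "SSAA")  # two A->S, one A->A, one S->A
toy_stat = reciprocal_chi_square(toy, "A", "S")
```

As a worked check, reciprocal counts of 263 and 169 (the Ala/Ser values reported in the Results for *F. hepatica* versus *S. mansoni*) give an expected value of 216 and a statistic of about 20.5, well above the 3.84 threshold.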

Based on the back-translation of the alignments, we generated a 61 × 61 (stop codons deleted) codon substitution matrix for the six selected species following similar procedures as the ones described in the previous section.

#### RESULTS

#### Global GC Composition Varies Across Diverse Flatworm Taxa

As a first approach to analyze whether there is a GC compositional difference in the phylum Platyhelminthes, we inspected the difference in the global genomic and transcriptomic G+C content. At first glance, it was clear that there is no correlation between genomic and transcriptomic GC, so it was not possible to use transcript GC to infer genomic GC. In most of the species, transcripts were GC richer than the global genome, with the only exception of *T. regenti* (**Table 1**). However, while Schistosomatidae species show almost no difference in GC content between the overall genome and the coding region, cestode transcripts, for example, were on average 9.7% GC richer than the genome considered as a whole.

#### GC Composition Varies Across Diverse Flatworm Taxa

To further analyze the GC composition in the coding region, we searched for a set of orthologous conserved genes in the available genomes and transcriptomes. Based on a best reciprocal hit BLAST search, we selected 700 orthologous genes present in the 22 species. A maximum likelihood tree confirmed that the orthologous groups strongly represented the accepted phylogeny of the species analyzed (organisms from the same group cluster together, with the only exception of the two Monogenea, which represent two distinct subclasses) (**Figure 1A**). Since GC varies across the different species, we calculated the relative synonymous codon usage (RSCU) and the GC values by codon position in this set of conserved genes. The clustering of the species based on the RSCU data showed an important reorganization with respect to the phylogenetic tree (**Figure 1B**). Three main clusters were clearly appreciable: the first with high GC3 (with free-living species), the second with low GC3 values (including other free-living species and the schistosomatids), and a third with intermediate GC3 values. GC2 was the least variable between groups (0.42, 0.38, and 0.41 on average in groups 1, 2, and 3, respectively). Additionally, we noticed three subgroups within group 3: one that had lower GC1–2 than the rest but had high GC3, composed only of the monogenean *G. salaris*; the cluster of *P. xenopodis*, *P. vittatus*, *M. fusca*, and the cestode *H. diminuta*, which had lower GC3; and the subgroup composed of trematodes (*F. hepatica*, *O. viverrini*, *C. sinensis*) and cestodes (*S. solidus*, *M. corti*, *E. multilocularis*, and *E. granulosus*), which had higher GC1–3 compared with other species of the group. This shows that global synonymous codon usage varies widely across the phylum.

The relation of GC values in the 1st and 2nd positions versus those in the 3rd codon position (neutrality plot) is usually used to evaluate whether the variations in codon usage are driven by mutation or selection. Neutrality plots for the 22 organisms based on the 700 orthologous genes were analyzed. In all cases, low slopes were found for the regression line (maximum value of 0.2) (**Supplementary Figure 1**). Careful inspection of the plots indicates that this can be explained by a low variability of GC1/2 among the genes (ranging between 0.4 and 0.6). These results suggest a contribution of selection in shaping codon usage for these organisms. To visualize all species together, we plotted the GC1–2 versus GC3 of the concatenated orthologue groups (COGs) for each species (see Methods). As expected, while GC1–2 presented little variation across species, the best discriminator was variation at GC3 (**Figure 2**). For example, between the most GC-biased genomes, the GC-poor *S. mediterranea* and the GC-rich *M. lignano*, there was only 10% variation on the GC1–2 axis but 40% variation in GC3. Interestingly, based on GC3 variability, we found two distinctive clusters within each of the freshwater "turbellarians," trematodes, and cestodes. The trematode species clearly separated the blood-dwelling flukes, grouped on the lowest side of the GC3 spectrum, from the food-borne liver flukes, located in the middle-upper GC3 range. Similarly, cestodes tend to cluster on the upper side of the GC3 range, with the exception of the Hymenolepididae, which fall in the lower-middle of the GC range. Freshwater "turbellarians" showed the largest variability in both the GC12 and GC3 ranges, grouping into very distant clusters. However, we found no clear correlation between evolutionary relationships and GC content, as species belonging to different lineages were mixed in both groups.

values are indicated for each node. (B) Hierarchical clustering based on GC content of relative synonymous codons (RSCU). Histograms reflect overall GC content in all orthologues (gray bar) and by codon position (black bars). GC3 is the most variable between species.

### GC Bias in Coding Sequences Affects Codon Usage

Codon usage bias is a general feature of genomes that has been widely associated with GC content (mutational bias) and natural selection (Bernardi and Bernardi, 1985; Sharp et al., 2010; Plotkin and Kudla, 2011). In this context, we decided to study its extent and its relationship with the genomic GC frequency discussed in the previous section. To this end, heatmaps were plotted to visualize the correlations between GC3 and codon usage (Palidwor et al., 2010). While codon usage bias is observed for all species, the more compositionally skewed organisms (the three plots on the right) show more intense correlations, indicating that the phenomenon is stronger in these organisms, as might be expected. Also, in most cases, the correlation values were positive for GC-ended codons and negative for the AT-ended ones (**Figure 3**). To further characterize this relationship, the distribution of the frequency of synonymous codons was analyzed for all organisms. A dramatic split of GC- vs AT-ended codons is observed in the species with the more GC-biased genomes, such as the AT-rich model trematode *S. mansoni*. Notably, the split is seen in opposite directions in the free-living flatworms *S. mediterranea* and *M. lignano*, which are at the extremes of the GC distribution (**Figure 4**). A less marked but significant difference is seen within the cestodes, consistent with a more balanced GC content, a feature confirmed in the analysis of the 22 species across flatworm diversity (**Supplementary Figure 2**). These observations suggest that genome-wide mutational bias is a major contributor to the observed codon frequency profiles for each organism.

The GC3 vs ENC charts for the analyzed species (**Supplementary Figure 3**) show a combined contribution of selection and mutation for most of the species, supporting the trends observed previously, while the heavily biased genomes of schistosomes fall on the curve, suggesting a strong effect of mutational bias.

### Differential Codon Usage Is Associated With Expression Levels

While mutation bias influences codon usage in a genome-wide fashion, selection may also act on coding sequences to select for specific codons. This theory predicts that more frequent codons are actually more efficient and/or more accurate during translation of the mRNA (Sharp et al., 2010; Plotkin and Kudla, 2011). To test if this phenomenon is observed in flatworms, steady-state mRNA levels for the available species were collected to differentiate high- and low-expression genes. As shown in **Figure 5**, where the two main components of a PCA of codon usage for *S. mansoni* are plotted, high- and low-expression genes do present a distinct usage profile. Interestingly, when comparing the 10% higher- and lower-expressed genes in the adult stage, a preference for using GC-rich codons is observed, narrowing the distribution in the highly expressed genes and extending it in the lowly expressed ones (**Supplementary Figure 4**). Even though these results may be explained by biased repair mechanisms acting on highly transcribed sequences, they are also compatible with translational selection acting on these genes to drive the observed bias.

#### Amino Acid Usage Is Also Biased in Diverse Flatworm Lineages

The strong bias observed in codon usage is expected to be associated with the 3rd codon position allowing synonymous changes. However, variations might also exist at the amino acid level (Li et al., 2015). To investigate this, we analyzed the amino acid usage within the set of 700 orthologous genes in pairs of species. Subtle but significant differences in the amino acid frequencies can be detected in cestodes and trematodes, mainly involving the amino acids encoded by AT-rich [Ile (AUH), Asn (AAY), Lys (AAR)] or GC-rich [Arg (CGN), Ala (GCN)] codons (**Figure 6** and **Supplementary Table 1**). The variations are more pronounced in the comparison of the free-living species, and in all cases, the variation follows the GC trend of the species. Since these results are based on a set of orthologous genes, the variations in amino acid frequencies indicate that not only synonymous changes account for the variability observed but also that non-synonymous substitutions are taking place.

#### Synonymous and Non-Synonymous Substitutions in Conserved Orthologous Genes

We decided to investigate if particular directional changes could be detected when analyzing orthologous positions in the three paired species. For this, we selected the ungapped regions of each pairwise alignment of orthologues and generated substitution matrices based on the aligned orthologous positions. Amino acid conservation was generally high, with tryptophan (Trp) and glycine (Gly) as the most conserved residues in all the species, confirming Ile, Ser, Ala, Asn, and Arg as the most variable (**Supplementary Table 2**). As expected, the most frequent changes involved amino acids with similar properties (**Figure 7A**), particularly those involving aliphatic and hydrophilic residues. However, several reciprocal changes showed significant differences in the counts (**Figure 7B**). Similar effects can be observed in the other pairwise comparisons (**Supplementary Figure 5**).

FIGURE 3 | Heatmaps of correlations between codon usage and GC3. Correlation values are coded according to the color scale depicted in the side bar (blue = positive, red = negative). Color intensity and the size of the rectangle are proportional to the correlation coefficients. Black squares: ATG (Met), TGG (Trp), and STOP codons. Organism name abbreviations are as in Table 1.

and p values are indicated.

To gain further insights into these phenomena, we evaluated the substitutions at the codon level, generating substitution matrices for the three pairs of species selected (**Supplementary Table 3**).

The conservation at codon level as expected was much lower with a strong component of synonymous changes (**Supplementary Table 3**). The lower frequency of GC-rich codons in *S. mansoni* (depicted in **Figure 4**) is explained by a marked increase of synonymous substitutions toward AT-rich codons (**Figure 8**). Similarly, in *M. lignano*, synonymous substitutions toward GC-rich codons are associated with reduced AT codon counts (**Supplementary Figure 6**).

The analysis of the non-synonymous changes at the codon level showed an increased complexity (**Supplementary Table 3**). One striking feature is that amino acid changes involving two substitutions are more common than those explained by simple substitutions. A detailed example is presented in **Figure 9**.

Ala is encoded by relatively GC-rich codons (GCN) and is frequently substituted by the more GC-neutral Ser (TCN + AGY) and vice versa (**Figures 7** and **9**). When this substitution takes place, it is expected that the GCN codon would change to the corresponding TCN variant, i.e., that GCA would turn into TCA and GCG into TCG. In 49 positions in the alignment, a GCG coding for Ala is present in *F. hepatica*, while Ser codons are present in *S. mansoni* (second row). The simple transversion GCG to TCG is underrepresented, with only six occurrences, while the changes toward TCA and TCT are more abundant (23 and 15 occurrences, respectively). Similarly, the GCC to TCC transversion (third row) represents only 15% of the Ala (GCC) changes to Ser, while the more AT-rich variants (TCA and TCT) represent more than 56% of the substitutions. Notably, when the AT-ending Ala codons (GCA and GCT, 1st and 4th rows) are substituted, the most common codon is the one expected from a single 1st position change.

The reciprocal Ser (in *F. hepatica*) to Ala (in *S. mansoni*) changes are less common (169 times vs 263 Ala to Ser), but again, in 77% of the cases, they lead to a T or A in the 3rd position irrespective of the original Ser codon. Similar effects can be seen when analyzing other amino acid changes (**Supplementary Table 3**), particularly those regarded as significant (from **Figure 7**), such as Ile to Val or Lys to Arg (**Supplementary Figure 7**).

Taken together, these results are strongly suggestive of a combined effect of mutation and selection in order to maintain both the compositional GC skew of the species and the property of the coded amino acid. In other words, whenever an amino acid change occurs through a simple substitution, this is then rapidly switched to those that follow the GC of the species.
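The distinction between one-step and two-step codon changes in the Ala-to-Ser example can be made explicit by counting nucleotide differences between codons. The Python sketch below uses only the three counts quoted above for GCG (Ala) in *F. hepatica* (6 TCG, 23 TCA, 15 TCT); the remaining positions of the 49 are not itemized in the text and are left out. Function and variable names are illustrative.

```python
from collections import Counter


def nucleotide_differences(codon_a, codon_b):
    """Number of positions at which two codons differ (Hamming distance)."""
    return sum(a != b for a, b in zip(codon_a, codon_b))


# Ser codons found in S. mansoni at sites where F. hepatica has GCG (Ala),
# restricted to the counts reported in the text.
observed = {"TCG": 6, "TCA": 23, "TCT": 15}

changes_by_steps = Counter()
for ser_codon, n_sites in observed.items():
    changes_by_steps[nucleotide_differences("GCG", ser_codon)] += n_sites

two_step_fraction = changes_by_steps[2] / sum(changes_by_steps.values())
```

With these counts, 38 of the 44 itemized changes differ from GCG at two positions, against 6 at a single position, which is the excess of two-step changes described above.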

#### DISCUSSION

Platyhelminthes classes show a wide range of GC composition, even within groups. Our results show that GC3 content explains most of the observed variability in codon usage, as reflected by the variation in the RSCU values. Based on GC3 variability, we found different clusters within the free-living species, trematodes, and cestodes. This can be clearly seen when the species tree is compared with the tree representing codon usage similarity. Indeed, Platyhelminthes show great differences between both trees, while this phenomenon is not seen in other models as different as bacteria and hexapoda (Behura and Severson, 2012; Dilucca et al., 2018). A similar study in nematodes shows a comparably wide distribution of GC values, although the variations are more consistent with the phylogeny (Cutter et al., 2006). These results suggest more recent and stronger compositional shifts for these groups of organisms. Further work is needed to explain this particular phenomenon in flatworms.

Codon usage bias is a generalized feature of the genomes of many organisms that is deeply influenced by evolutionary phenomena and results basically from the balance between mutational bias and natural selection (see Plotkin and Kudla, 2011, for a review). To assess the relative influence of these two factors, a plot of GC1–2 vs GC3 for all the species taken together was generated. This "pseudo" neutrality plot shows a slight slope, indicating that GC3 behaves differently from GC1–2, a result that is generally taken as evidence that selection is acting to shape codon usage (Sueoka, 1988). Furthermore, this plot shows clearly distinctive clusters within trematodes, cestodes, and free-living species based on GC content. Notably, these differences seem to loosely reflect the diversity of lifestyles and niches of the diverse flatworms. The observed differences question the use of a single species as a model for each class; a clear demonstration of this is the difference between the model *S. mansoni* and other trematodes observed in the boxplot of **Figure 4**. Codon usage bias in flatworm mitochondrial genomes has also been reported (Le et al., 2004; Mazumder et al., 2018). However, considering the large genic and genomic differences, mitochondrial genomes may follow evolutionary pressures independent of those acting on the nuclear genomes.

Interestingly, a similar study across nematodes found robust evidence for selection on codon usage bias in free-living species, a feature found marginally in parasitic ones, and particularly in the most compositionally biased (Cutter et al., 2006). The association of selective bias in free-living or parasitic species is not clear-cut in the case of flatworms, which might be reflecting diverse evolutionary strategies.

In agreement with the hypothesis of translational selection driving synonymous codon usage bias, we observe a clear association of gene expression levels with codon usage, where highly expressed genes are rich in GC-rich codons, while the opposite is observed for low-expression genes. Similar results have been previously reported for cestodes, among others (Chen et al., 2013; Yang et al., 2014; Yang et al., 2015; Huang et al., 2017; Maldonado et al., 2018). Even in highly AT-biased genomes, as observed for the schistosomes, the GC content of highly expressed genes is relatively high when compared with the general trend. It is worth mentioning that bias in the repair mechanisms of actively transcribed DNA has also been proposed to explain this observation.

The observed differences in GC and codon usage among these organisms are also reflected in the amino acid composition. Recently, Li et al. (2015) showed the strong relationship between synonymous codon usage and differential amino acid usage, using a strategy based on classifying amino acids into three groups (high, medium, and low GC content) according to the GC composition of their corresponding codons. Our results on amino acid frequencies in the different flatworm species are consistent with these observations.

Furthermore, when orthologous positions are considered, we mainly observed amino acid substitutions that conserve the physicochemical properties, as would be expected. However, these changes frequently involve codons of completely different GC content that follow the differences observed in the general GC content of the genomes, e.g., Ile (AUH) vs Leu (CUN, UUR). In this way, AT-rich genomes accumulate changes to amino acids in the low GC group, while the opposite is observed in GC-rich genomes.

Remarkably, when the frequency of a certain amino acid substitution is not reciprocal between two given organisms, the amino acids involved belong to the different groups defined by Li et al. (2015). An interesting case is observed for the Lys to Arg substitution. Even though these amino acids have similar physicochemical properties, they belong to opposite groups, Lys being coded by the most AT-rich group of codons, while Arg is on the highest GC content side.

*F. hepatica.* Detail of the substitution matrix of Supplementary Table 3 of the changes involving Ala and Ser (*S. mansoni* codons in columns, *F. hepatica* in rows). Note the lower-than-expected counts of changes toward GC3-rich codons and the enrichment in non-synonymous changes that imply two substitutions.

A detailed analysis of the non-synonymous changes showed a higher-than-expected frequency of codon changes involving two nucleotides. This is paralleled by a marked reduction in the counts of the expected codon substitutions involving a single change. A plausible explanation for this phenomenon is offered by a two-hit mechanism, providing a clear example of the combined effect of mutation and selection. The two-hit hypothesis proposed implies that when a mutation changes the coded amino acid, this non-synonymous substitution is rapidly adapted to the general GC content of the genome by a second synonymous change.


### CONCLUSIONS

GC bias has a great influence on synonymous codon and amino acid usage across Platyhelminthes, a feature not shared by all metazoans. Both free-living and parasitic species show the phenomenon, and no clear correlation with lifestyles or evolutionary closeness is evident so far. The changes introduced by GC bias have an impact not only on synonymous codon usage but also on amino acid frequencies. The evidence so far suggests that both mutation and selection are acting to shape the coding strategies of the diverse flatworms.

### DATA AVAILABILITY

All datasets generated for this study are included in the manuscript/supplementary files.

### AUTHOR CONTRIBUTIONS

GL and SF performed the bioinformatics analysis and contributed to writing the manuscript. GR performed bioinformatics analysis. PS and JT participated in the design of the study and the interpretation of data, drafting the manuscript, and critical revision of its content. All authors read and approved the final manuscript.

### FUNDING

Comisión Sectorial de Investigación Científica-Universidad de la República (CSIC-UdelaR), Uruguay. Award number: I+D16-516. Specific budget for publication was included in the project.

SF, GL, PS, and JT are researchers of Programa de Desarrollo de las Ciencias Básicas (PEDECIBA) and members of the SNI program of the Agencia Nacional de Investigación e Innovación (ANII). GR received a postgraduate scholarship from ANII.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2019.00771/full#supplementary-material.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Lamolle, Fontenla, Rijo, Tort and Smircich. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Males, the Wrongly Neglected Partners of the Biologically Unprecedented Male–Female Interaction of Schistosomes

*Zhigang Lu1,2, Sebastian Spänig3, Oliver Weth2 and Christoph G. Grevelding2\**

*1 Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom, 2 Institute for Parasitology, BFS, Justus Liebig University Giessen, Giessen, Germany, 3 Department of Mathematics & Computer Science, University of Marburg, Marburg, Germany*

#### *Edited by:*

*Jose F. Tort, University of the Republic, Uruguay*

#### *Reviewed by:*

*Jianbin Wang, University of Colorado Denver, United States Patrick Skelly, Tufts University, United States*

#### *\*Correspondence:*

*Christoph G. Grevelding Christoph.Grevelding@vetmed.uni-giessen.de*

#### *Specialty section:*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Genetics*

*Received: 05 April 2019 Accepted: 30 July 2019 Published: 06 September 2019*

#### *Citation:*

*Lu Z, Spänig S, Weth O and Grevelding CG (2019) Males, the Wrongly Neglected Partners of the Biologically Unprecedented Male–Female Interaction of Schistosomes. Front. Genet. 10:796. doi: 10.3389/fgene.2019.00796*

Schistosomes are the only platyhelminths that have evolved separate sexes, and they exhibit a unique reproductive biology because the female's sexual maturation depends on a constant pairing contact with the male. In the female, pairing leads to gonad differentiation, which is associated with substantial morphological changes, and controls, among other things, the expression of gonad-associated genes. In the male, no morphological changes have been observed after pairing, although first data indicated an effect of pairing on gene transcription. Comprehensive transcriptomic approaches have revealed an unexpectedly high number of genes that are differentially transcribed in the male after pairing. Their identities suggest roles for the male that are not restricted to feeding and enhanced muscular power to transport the paired female and, as assumed before, to induce its sexual maturation by one "magic" factor. Instead, a more complex picture emerges in which both partners live in a reciprocal sender-recipient relationship that not only affects the gonads of both genders but may also involve tactile stimuli, transforming growth factor β signaling, nutritional parts, and neuronal processes, including neuropeptides and G protein-coupled receptor signaling. This review provides a summary of transcriptomics including an overview of genes expressed in a pairing-dependent manner in schistosome males. This may stimulate further research in understanding the role of the male as the recipient of the female's signals upon pairing, the male's "capacitation," and its subsequent competence as a sender of information. The latter process finally transforms a sexually immature, autonomous female without completely developed gonads into a sexually mature, partially non-autonomous female with fully differentiated gonads and enormous egg production capacity.

Keywords: schistosomes, male–female interaction, transcriptomics, pairing-dependent gene expression, TGFβ signaling, neuropeptide, G protein-coupled receptor

### SCHISTOSOMES AND THE MALE–FEMALE INTERACTION

Schistosomes are parasitic platyhelminths causing schistosomiasis (bilharzia), an infectious disease of worldwide importance for humans and animals. The World Health Organization has listed schistosomiasis as one of the neglected tropical diseases. More than 200 million people required preventive treatment in 2016 (World Health Organization, 2019; McManus et al., 2018). This disease is most prevalent in Africa but occurs also in Asia and South America due to the presence of tropical water snails as intermediate hosts, which prefer warm (sub)tropical habitats. Unexpectedly, there is recent evidence of an autochthonous site also in southern Europe (Corsica), where the snail-host species occurs due to moderate climate conditions (Boissier et al., 2016).

As a waterborne disease, schistosomiasis affects humans and animals exposed to water infested with cercariae, the infectious larval stage originating from snails. Some schistosome species have zoonotic potential, which increases the risk of infection (Standley et al., 2012). Together, these facts make disease control difficult and contribute to an additional, socioeconomic problem (Garchitorena et al., 2017).

The pathology of schistosomiasis is triggered by eggs that paired females deposit in the bloodstream of vertebrate hosts. These eggs eventually lodge in organs such as the liver where they cause inflammation and fibrosis (Olveda et al., 2014; McManus et al., 2018). The prerequisite for egg production is the complete development of the female gonads. This, however, is only achieved if a constant pairing contact with a male has been established. To this end, the female resides within a ventral groove formed by the male, the gynecophoral canal. This close liaison can last over years, an exceptional phenomenon in nature (Basch, 1991; Grevelding, 2004). Males play a pivotal role in controlling schistosome reproduction by inducing mitoses and differentiation processes in the reproductive organs (ovary and vitellarium, the latter providing vitelline cells for the production of mature eggs) of the paired female, a process that comes along with a significant increase of its body size (Popiel and Basch, 1984a; Den Hollander and Erasmus, 1985; Kunz, 2001). Pairing even controls the expression of female-specific expressed genes with functions in the vitellarium (LoVerde and Chen, 1991; Grevelding et al., 1997). Although the molecular consequences of pairing on females have been a strong focus of basic research (Hoffmann, 2004; LoVerde et al., 2009; Beckmann et al., 2010), pairing-dependent processes in males are somewhat neglected.

### A HISTORICAL SNAPSHOT OF THE MALE SCHISTOSOMES' PERSPECTIVE

With respect to their sexual biology, schistosome males appear "ready to go." Being independent of pairing, they possess fully developed testes and seminal vesicles filled with differentiated sperm, as confirmed by morphological analyses (Neves et al., 2005; Beckmann et al., 2010). Sperm was excluded as a factor inducing the sexual maturation of the female, and irradiated or surgically manipulated males lacking testes were still capable of mating and inducing sexual maturation in paired females as well as egg production (Armstrong, 1965; Michaels, 1969). Alternatively, the male was proposed to deliver specific molecules during pairing that supervise body length and the sexual maturation of the female (Armstrong, 1965; Basch and Basch, 1984), which depends on the pairing status; unpaired females are infertile because their reproductive organs have not fully developed (Neves et al., 2005; Beckmann et al., 2010). Furthermore, female sexual maturation was hypothesized to be a consequence of local activities of molecules (Michaels, 1969; Popiel and Basch, 1984b). In addition, a tactile impulse was proposed (Basch and Basch, 1984). Finally, one or more male-secreted hormonal factors were suggested to be transferred to the female (Ruppel and Cioli, 1977; Shaw et al., 1977; Atkinson and Atkinson, 1980; Basch and Nicolas, 1989). However, none of these hypotheses resulted in the identification of the "magic male factor".

From the metabolic perspective, glucose and cholesterol were demonstrated to be delivered by the male during pairing, and it was hypothesized that nourishment contributes to female sexual maturation (Cornford and Huot, 1981; Cornford and Fitzpatrick, 1985; Haseeb et al., 1985; Silveira et al., 1986). The most persuasive evidence for an important player in the game resulted from studies about the gynecophoral canal protein (GCP). First detected in adult *Schistosoma mansoni*, *Sm*GCP was identified as a glycoprotein putatively transferred from the male to the female (Gupta and Basch, 1987) and was later shown, by immunolocalization, to be widely distributed on the surface of paired females (Bostic and Strand, 1996). Structurally, *Sm*GCP lacks a transmembrane domain but contains short, conserved repeat regions with sequence similarity to fasciclin I, a neuronal cell-adhesion protein. In males, *Sm*GCP expression appeared to be limited to the gynecophoral canal region, the mating partners' interface. Furthermore, *Sm*GCP seemed to be down-regulated in unpaired males. These findings suggested that *Sm*GCP is diffusible and delivered by the male during a pairing contact (Bostic and Strand, 1996). Indeed, results of a subsequent study in *Schistosoma japonicum* indicated the importance of *Sj*GCP for pairing. RNA interference experiments against *Sj*GCP resulted in reduced pairing stability *in vitro* and *in vivo* (Cheng et al., 2009). Finally, evidence was found for the regulation of *Sm*GCP *via* transforming growth factor β (TGFβ)-dependent signaling in *S. mansoni* (Osman et al., 2006). Although the biochemical activity of GCP has not yet been clearly addressed, there is accumulating evidence for its participation in male–female interaction.

In earlier studies, the DNA synthesis marker [³H]thymidine was used in incorporation assays with females *in vitro* to determine mitosis rates as a function of pairing. Comparing females paired, in the presence of thymidine, with either pairing-experienced males (bM, bisex males) or pairing-inexperienced males (sM, single-sex males) demonstrated that male maturity is decisive. To induce mitogenic activity in females, sM required a significantly longer mating period (≥24 h) than bM (Den Hollander and Erasmus, 1985), which stimulated mitogenic activity in females within the first 24 h of pairing. This early study already pointed toward bidirectional communication between the partners during the initial phase of pairing. Furthermore, this result suggests that males have to pass through a process of capacitation before they acquire competence to supervise female sexual maturation, part of which is the induction of mitoses (Knobloch et al., 2002).

### TRANSCRIPTOMIC PERFORMANCE OF MALE SCHISTOSOMES

During the last 15 years, *omic* studies have allowed unprecedented insights in the life processes of a great variety of organisms (Weissenbach, 2016), including schistosomes (Verjovski-Almeida et al., 2003; Hu et al., 2003; Berriman et al., 2009; Schistosoma japonicum Genome Sequencing and Functional Analysis Consortium, 2009; Protasio et al., 2012; Young et al., 2012; Anderson et al., 2015; Smit et al., 2015; Cai et al., 2016; Sotillo et al., 2017; Wang et al., 2017; Giera et al., 2018). Whereas the majority of these studies applied RNA-seq techniques, microarray analyses and serial analysis of gene expression (SAGE/SuperSAGE) were alternatively used. Among others, these techniques were also applied to compare bM and sM. One SAGE-based approach found differential regulation for transcripts contributing to developmental processes, metabolism, and the redox system (Williams et al., 2007). Even before the genome project was finished, an early microarray analysis found 30 genes to be exclusively transcribed in bM and 66 in sM (Fitzpatrick and Hoffmann, 2006). The identities of these differentially expressed genes indicated their involvement in RNA metabolic processes, which was independently supported in another microarray study (Waisberg et al., 2007). In another approach combining SuperSAGE (a second-generation SAGE technique allowing the identification of longer RNA sequence tags) and microarray analyses, corresponding data sets were produced to get a comprehensive overview of genes differentially transcribed between bM and sM. Among 6326 sense transcripts detected by both analyses, 29 were found to be significantly differentially transcribed (Leutner et al., 2013). Besides differences in the transcript levels of genes involved in metabolic processes, evidence was obtained for additional differences in neuronal processes and TGFβ signaling. In this context, a *S. 
mansoni* ortholog of follistatin (*Sm*Fst; Smp\_123300) was found to be differentially transcribed with an interesting bias toward sM. The latter was independently confirmed by a subsequent RNA-seq approach including paired and unpaired adults and their gonads (Lu et al., 2016; Lu et al., 2017) as well as by independent quantitative polymerase chain reaction (qPCR) analyses (Leutner et al., 2013; Haeberlein et al., 2019). Based on the corresponding results from microarray, SuperSAGE, RNA-seq, and qPCRs, all exhibiting higher transcript occurrence in sM, *Sm*Fst is probably the most intensively studied gene with respect to expression profiling. Thus, it can be used in future studies as a marker for differential transcription in bM versus sM (Haeberlein et al., 2019). Follistatins are known antagonists in TGFβ signaling pathways and block ligands of the TGFβ family such as TGFβ, activin, and bone morphogenetic protein (BMP) (Massagué and Chen, 2000; Moustakas and Heldin, 2009; Heldin and Moustakas, 2016). The first characterization of *Sm*Fst showed testes localization (by *in situ* hybridization). By yeast two-hybrid analyses, an interaction potential with *S. mansoni* orthologs of the TGFβ ligands *Sm*InAct and *Sm*BMP was found. The agonists colocalized with *Sm*Fst in the testes (Leutner et al., 2013). These results suggest that TGFβ signaling also plays a role in male–female interaction and is part of the bidirectional communication between both genders. As such, *Sm*Fst could represent one of several competence factors of males expressed in response to pairing. Indeed, a recent *in vitro* study with paired, separated, and re-paired males demonstrated an immediate influence of pairing on the on/off transcriptional status of *Sm*Fst (Haeberlein et al., 2019). This finding adds to previous hypotheses that TGFβ signaling is involved in pairing-dependent reproductive processes in schistosomes (LoVerde et al., 2007; Buro et al., 2013).
One role of *Sm*Fst in sM might be the prevention of the activation of one or more of its TGFβ pathways before pairing. Figuratively seen, *Sm*Fst in sM appears like a systemic handbrake of a specific, male competence-related biological driving route needed after pairing to reach maturation – a hypothesis that awaits corroboration.

Today, RNA-seq represents the state-of-the-art technology for transcriptome analysis providing both a wide analytical range and the quantification of study samples. Theoretically, RNA-seq can cover all transcripts of a biological sample, an advantage over microarrays or SAGE/SuperSAGE (Marioni et al., 2008). Recently, RNA-seq was applied for comparative transcript profiling in paired and unpaired adult *S. mansoni* and their gonads. Of more than 7,000 transcripts detected in the gonads, 243 (testes) and 3,600 (ovaries) were transcribed in a pairing-dependent manner. In addition to genes preferentially or specifically transcribed in adults and gonads of both genders, evidence was obtained for pairing-dependent processes in the gonads affecting genes with, for example, stem cell-associated functions. This was particularly expected for females due to the pairing-induced differentiation processes in the vitellarium and the ovary (Erasmus, 1973; Shaw, 1987; Kunz, 2001; Neves et al., 2005; Beckmann et al., 2010). Remarkably, from their annotation, many differentially transcribed genes appeared to be involved in neuronal processes. This perception substantiated one of the results of the combinatory SuperSAGE/microarray approach comparing sM and bM transcript profiles (Leutner et al., 2013).

#### "NERVOUS" MALE SCHISTOSOME

One objective of human neuroscience is to understand how neuronal circuits direct behavior, how humans perceive the world, how they learn from experience, how memory works, how movements are directed, and how communication is realized (National Research Council (US) Committee on Research Opportunities in Biology, 1989). The basis for integrating all these interactions and requirements *via* neuronal circuits was laid in evolution. In principle, similar objectives also apply to schistosomes. From the male's perspective, the questions are (i) how does it behave within the final host perceiving its environment to organize migration to target locations such as the portal system of the liver, (ii) how does it find its mating partner, (iii) how does it "learn" from a first pairing experience (capacitation and gaining competence), (iv) how does it move from the liver further on to the mesenteric veins in the gut area while carrying its mate inside the gynecophoral canal, and (v) how does the male "communicate" with its partner during this process and later on after reaching the final destination to organize large-scale egg production and longevity? It appears obvious that molecular communication at different parallel levels is part of the answer, and all transcriptomics data obtained thus far are in favor of this assumption. Data analyses have indicated, among other things, that regulatory RNAs (Cai et al., 2016; Zhu et al., 2016) and kinase activity (Grevelding et al., 2018) but also neuronal regulation are involved (Cai et al., 2016; Wang et al., 2017; Hahnel et al., 2018). In an RNA-seq analysis of *S. mansoni* (Lu et al., 2016; Lu et al., 2017), 39 genes with potential function in neuronal processes (Berriman et al., 2009) were identified to be transcribed in the adult stage with varying transcript levels in whole worms as well as gonads and, to a large extent, in a pairing-dependent manner (**Figure 1A**).
Many of these genes were found to be preferentially transcribed in bM and sM but also in unpaired females (sF, single-sex females) (Lu et al., 2016). Remarkably, the transcript levels of 64% of these genes decreased in mature females (bF, bisex females) after pairing. Similarly, transcripts of some genes detected in ovaries (bO, ovaries of bF; sO, ovaries of sF) and testes (bT, testes of bM; sT, testes of sM) occurred in a tissue-preferential and/or pairing-dependent manner. This included genes with functions in neuronal stem cells such as (i) an ortholog of IRX6 (Smp\_149230), which is a homeodomain transcription factor of the iroquois family known to regulate interneuron development (Star et al., 2012) as well as germ-cell maturation in gonads (Kim et al., 2011), and (ii) a neuroglian ortholog (Smp\_176350), possibly involved in regulating neuronal circuits (Boerner and Godenschwege, 2010). Of these, a neuroglian ortholog was also listed as a gene showing male-biased transcript occurrence in *S. japonicum* (Cai et al., 2016).

Additional support was obtained from studies about neuropeptidergic signaling, which has been discussed as playing fundamental roles in flatworm locomotion, feeding, host finding, regeneration, and reproduction (McVeigh et al., 2005; McVeigh et al., 2009; Collins et al., 2010). In an *in silico* analysis, 46 potential flatworm neuropeptides emerging from 32 neuropeptide precursors (npps) have been predicted (Koziol et al., 2016). Of these, transcripts of 7 *npp* genes were localized in the protoscolex of *Echinococcus multilocularis* and appeared to be expressed in the nervous system. In another study, RNA-seq data of the schistosome esophagus area showed that transcripts of one of the *npps* (*Sm*\_*npp*\_20a; Smp\_088360 with transcripts enriched in males and unpaired females but not in gonads; **Figure 1B**) are enriched in the head part of male worms (Wilson et al., 2015). Furthermore, transcriptional profiling comparing *S. mansoni* female head and tail showed that 23 of 27 listed *Sm*\_*npps* expressed in adult *S. mansoni* (**Figure 1B**) preferentially localized in the head part (Wang and Collins, 2016). Therefore, we assume that these NPPs are enriched in the head, possibly in the nervous system. Transcript levels of almost all 27 *Sm*\_*npps* dominated in bM, sM, and sF (**Figure 1B**). Although transcript levels may not be representative of protein levels and/or protein half-life, and although even a low amount of protein can be of high cell biological and/or physiological importance, it is tempting to speculate that the comparatively reduced *Sm\_npp* transcript levels in bF may point to a lower importance of these neuropeptides (and associated neuronal processes) for females after pairing. In contrast, females before pairing may have different physiological requirements comprising more neuronal and further processes. This view is in line with previous studies concluding that, from a transcriptomic point of view, schistosome females express divergent gene repertoires regulated by pairing (Fitzpatrick and Hoffmann, 2006) and that unpaired females are much more closely related to males than to paired females (Lu et al., 2016; Grevelding et al., 2018). In view of the male–female interaction of *S. mansoni*, these results also suggest that neuropeptide-mediated regulation circuits are more active in males than in paired females, which may point to a higher importance of neuronal processes for males. Interestingly, a similar tendency with respect to *npp* gene transcription was found in a study about male and female *S. japonicum* (Wang et al., 2017). Among other things, the authors investigated different time points throughout sexual development, from pairing to maturation. This included day 16 after final-host infection, when pairing starts, to day 28, when paired females produce eggs. Based on a search for *npp* orthologs of *S.
japonicum* in WormBase ParaSite (https://parasite.wormbase.org) and looking for their transcript profiles within the data set provided by Wang et al. (2017), a clear tendency can be registered for a reduction of *Sj\_npp* transcript levels in females after pairing from day 18 on. In contrast, the transcript levels of these *Sj\_npps* in males remained at a constant level or, in one case (Sjp\_0097680, potential ortholog of Smp\_052880), increased from day 18 on (**Supplementary Data 1**). Figuratively, one could think that schistosome females "hand over" the responsibility for maintaining most of the neuronal circuits involving neuropeptide signaling to their partners after establishing the pairing contact. Indirect support for this assumption was obtained by the analysis of *S. mansoni* G protein-coupled receptors (GPCRs), of which some might represent *Sm*\_*npp* targets. Based on the RNA-seq data, a comparative analysis of the GPCR*ome* generally revealed a pattern of transcriptional activity for the majority of the investigated GPCRs that resembled the patterns of the majority of *Sm*\_*npps*: compared to bF, transcripts of these GPCRs occurred in a higher abundance in sM, bM, and sF (Hahnel et al., 2018). Preliminary data of a deorphanization approach to uncover GPCR-neuropeptide interaction are in support of the assumption that some of the *Sm*\_*npps* and GPCRs correspondingly regulated in the mentioned way may indeed interact (Weth et al., *in preparation*).

Finally, a detailed analysis of transcript levels of two additional genes involved in neuronal processes fits into this scenario. These genes are *S. mansoni* orthologs of a dopa decarboxylase/tyrosine decarboxylase (*Sm-tdc*, Smp\_135230), involved in neurotransmitter metabolism (De Luca et al., 2003), and *ebony* (*Sm-ebony*, Smp\_158480), a gene controlling neurotransmitter inactivation (Richardt et al., 2003; Hartwig et al., 2014). Transcript profiling by RNA-seq showed high transcript levels of *Sm-tdc* in bM compared to sM and no transcripts in bF or sF. The meta-analysis of all life stages showed that *Sm-tdc* is expressed stage- and gender-specifically in males, being significantly up-regulated after pairing (Lu and Berriman, 2018). Similarly, the amount of *Sm-ebony* transcripts dominated in bM but also in sF, whereas sM and bF showed significantly reduced transcript levels. The meta-analysis showed stage-specific expression in adults with a significant up-regulation in bM and sF (Lu and Berriman, 2018). A comprehensive quantitative reverse transcription-PCR analysis with RNA from males after pairing, separation, and re-pairing *in vitro* finally confirmed the direct influence of pairing on the transcript levels of both genes with a clear bias toward bM (Haeberlein et al., 2019). With respect to *Sm-tdc* and *Sm-ebony*, similar pairing-influenced transcript patterns were found for the orthologs, *Sj*AADC and *Sj*-*ebony*, of *S. japonicum*. Wang et al. (2017) observed an increase of *Sj*AADC (Sjp\_0075370) and *Sj*-*ebony* (Sjp\_0068110) transcripts in males after pairing, whereas in females transcript levels of these genes remained constant at a low level after pairing. The localization of *Sj*AADC transcripts at the gynecophoral canal region (Wang et al., 2017), the interface between paired male and female schistosomes, further supports the view that neuronal processes may govern at least part of the male–female interaction.

### OVERVIEW OF GENES DIFFERENTIALLY TRANSCRIBED IN PAIRED VERSUS UNPAIRED MALES

Based on the existence of three transcriptomics data sets for schistosome males, an overview was generated about all genes commonly found to be significantly differentially transcribed between bM and sM. These data sets were independently obtained by different methods with varying pros and cons depending on the technical basis of these methods (microarray, SuperSAGE, and RNA-seq; for details, see Leutner et al., 2013; Lu et al., 2016; Lu et al., 2017). However, these data sets were produced with RNA of the same origin, a Liberian strain of *S. mansoni* (Grevelding, 1995).

After merging these three data sets, transcripts of 5352 genes were detected by all approaches. Applying the same significance cutoff values used before in these studies (RNA-seq FDR < 0.05; microarray q < 0.01; SuperSAGE p < 1e-10), 154 genes were found to be significantly up-regulated and 153 genes significantly downregulated after pairing as identified by at least two approaches (**Figure 2** and **Supplementary Table 1**). In particular, we identified 43 genes that were up-regulated (21; bM > sM) or down-regulated (22; bM < sM) in all three data sets, including follistatin (highlighted in **Supplementary Table 1**; sheet 2, no. 21).
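The selection rule described above (a gene counts as differentially transcribed when at least two of the three approaches agree, each judged by its own cutoff) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names and values are made up, the field names are hypothetical, and a positive log fold change is assumed to mean bM > sM. It is not the code used for Figure 2.

```python
# Cutoffs quoted in the text; each value is an upper bound on the test statistic.
CUTOFFS = {"rnaseq_fdr": 0.05, "microarray_q": 0.01, "supersage_p": 1e-10}


def significant_directions(gene):
    """List 'up' (bM > sM) or 'down' (bM < sM) for every approach passing its cutoff."""
    calls = []
    for approach, cutoff in CUTOFFS.items():
        stat, log_fc = gene[approach]  # (statistic, log fold change bM vs sM) -- assumed layout
        if stat < cutoff and log_fc != 0:
            calls.append("up" if log_fc > 0 else "down")
    return calls


def classify(gene, min_support=2):
    """Return 'up' or 'down' if at least min_support approaches agree, else None."""
    calls = significant_directions(gene)
    for direction in ("up", "down"):
        if calls.count(direction) >= min_support:
            return direction
    return None


# Made-up genes for illustration only (hypothetical identifiers and values).
example_genes = {
    "gene_a": {"rnaseq_fdr": (0.001, 1.5), "microarray_q": (0.005, 0.9), "supersage_p": (1e-3, 0.2)},
    "gene_b": {"rnaseq_fdr": (0.01, -2.0), "microarray_q": (0.001, -1.1), "supersage_p": (1e-12, -3.0)},
    "gene_c": {"rnaseq_fdr": (0.2, 0.4), "microarray_q": (0.001, 0.8), "supersage_p": (0.5, 0.1)},
    "gene_d": {"rnaseq_fdr": (0.01, 1.0), "microarray_q": (0.001, -1.0), "supersage_p": (0.5, 0.1)},
}

two_of_three = {name: classify(g) for name, g in example_genes.items()}
all_three = {name: classify(g, min_support=3) for name, g in example_genes.items()}
```

Setting `min_support=3` corresponds to the stricter subset reported in all three data sets; a gene with conflicting directions across approaches (such as `gene_d`) is not called in either direction.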

### CONCLUSION AND PERSPECTIVE

The analysis of transcriptomic data obtained thus far has provided conclusive evidence for a substantial molecular contribution of the male to the male–female interaction and the reproductive biology of schistosomes. The male's effort is not restricted to nutritional support and muscle power for carrying the paired female around, which lodges inside the male's gynecophoral canal. Instead, the pairing scenario appears more complex, involving different signaling systems mediating communication between the partners. This includes neuronal processes whose management asymmetrically shifts to the male side upon pairing. Therefore, this review about the males' perspective of the reciprocal sender-recipient relationship of schistosome couples may stimulate future research in this area. Understanding male–female interaction will give a twofold return: (i) for basic science, solving one of the most interesting but as yet unanswered questions of schistosome biology, and (ii) for applied research, in view of the high demand for alternative treatment concepts to fight schistosomiasis (Bergquist et al., 2017).

[…] genes, the values were compared among the data sets before one representative was chosen. Significantly differentially regulated genes were selected using the same thresholds used in the mentioned studies: q < 0.01 for microarray, p < 1e-10 for SuperSAGE, and FDR < 0.05 for RNA-seq. Transcripts meeting the criteria of occurrence in at least two approaches were picked, and the corresponding genes were visualized using the R package plotly (https://plot.ly). An interactive visualization of the genes differentially expressed between bM and sM in the three approaches is available at http://schisto.xyz/male-deg-3d/ (see also Supplementary Table 1); users can use the mouse to zoom in/out and rotate the plot as well as to see detailed information for each gene.

## DATA AVAILABILITY

Publicly available datasets were analyzed in this study. These data can be found here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4976352/

## AUTHOR CONTRIBUTIONS

ZL, SS, and OW prepared data and figures and substantially contributed to the work. CG conceived and wrote the manuscript.

### FUNDING

Studies leading to this review were supported by grants of the Wellcome Trust (FUGI, 107475/Z/15/Z) and the Deutsche Forschungsgemeinschaft (GR 1549/7-3).

## ACKNOWLEDGMENTS

The authors acknowledge the effort of the Wellcome Trust for supporting parasite *omics* and for maintaining appropriate database resources.

### SUPPLEMENTARY MATERIAL

The Supplementary Material (Supplementary Table 1 and Supplementary Data 1) for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2019.00796/full#supplementary-material

### REFERENCES

Anderson, L., Amaral, M. S., Beckedorff, F., Silva, L. F., Dazzani, B., Oliveira, K. C., et al. (2015). *Schistosoma mansoni* egg adult male and female comparative gene expression analysis and identification of novel genes by RNA-Seq. *PLoS Negl. Trop. Dis.* 9 (12), e0004334. doi: 10.1371/journal.pntd.0004334


variation in *Drosophila* longevity. *Nat. Genet.* 34 (4), 429–433. doi: 10.1038/ng1218


technique. *Dev. Growth Diff.* 44, 559–563. doi: 10.1046/j.1440-169X.2002.00667.x


lysosomal hydrolase gene expression: implications for blood processing. *PLoS Negl. Trop. Dis.* 9 (12), e0004272. doi: 10.1371/journal.pntd.0004272


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Lu, Spänig, Weth and Grevelding. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Weighted Gene Co-Expression Analyses Point to Long Non-Coding RNA Hub Genes at Different *Schistosoma mansoni* Life-Cycle Stages

*Lucas F. Maciel1,2, David A. Morales-Vicente1,3, Gilbert O. Silveira1,3, Raphael O. Ribeiro1,3, Giovanna G. O. Olberg1, David S. Pires1, Murilo S. Amaral1 and Sergio Verjovski-Almeida1,3\**

#### *Edited by:*

*Gabriel Rinaldi, Wellcome Trust Sanger Institute (WT), United Kingdom*

#### *Reviewed by:*

*Thiago Motta Venancio, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Brazil*

*Zhigang Lu, Wellcome Trust Sanger Institute (WT), United Kingdom*

> *\*Correspondence: Sergio Verjovski-Almeida verjo@iq.usp.br*

#### *Specialty section:*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Genetics*

*Received: 04 April 2019 Accepted: 09 August 2019 Published: 12 September 2019*

#### *Citation:*

*Maciel LF, Morales-Vicente DA, Silveira GO, Ribeiro RO, Olberg GGO, Pires DS, Amaral MS and Verjovski-Almeida S (2019) Weighted Gene Co-Expression Analyses Point to Long Non-Coding RNA Hub Genes at Different Schistosoma mansoni Life-Cycle Stages. Front. Genet. 10:823. doi: 10.3389/fgene.2019.00823*

*1 Laboratório de Expressão Gênica em Eucariotos, Instituto Butantan, São Paulo, Brazil, 2 Programa Interunidades em Bioinformática, Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, Brazil, 3 Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, Brazil*

Long non-coding RNAs (lncRNAs) (>200 nt) are expressed at levels lower than those of the protein-coding mRNAs, and in all eukaryotic model species where they have been characterized, they are transcribed from thousands of different genomic *loci*. In humans, some four dozen lncRNAs have been studied in detail, and they have been shown to play important roles in transcriptional regulation, acting in conjunction with transcription factors and epigenetic marks to modulate the tissue-type specific programs of transcriptional gene activation and repression. In *Schistosoma mansoni*, around 10,000 lncRNAs have been identified in previous works. However, the limited number of RNA-sequencing (RNA-seq) libraries that had been previously assessed, together with the use of old and incomplete versions of the *S. mansoni* genome and protein-coding transcriptome annotations, have hampered the identification of all lncRNAs expressed in the parasite. Here we have used 633 publicly available *S. mansoni* RNA-seq libraries from whole worms at different stages (n = 121), from isolated tissues (n = 24), from cell-populations (n = 81), and from single-cells (n = 407). We have assembled a set of 16,583 lncRNA transcripts originated from 10,024 genes, of which 11,022 are novel *S. mansoni* lncRNA transcripts, whereas the remaining 5,561 transcripts comprise 120 lncRNAs that are identical to and 5,441 lncRNAs that have gene overlap with *S. mansoni* lncRNAs already reported in previous works. Most importantly, our more stringent assembly and filtering pipeline has identified and removed a set of 4,293 lncRNA transcripts from previous publications that were in fact derived from partially processed mRNAs with intron retention. We have used weighted gene co-expression network analyses and identified 15 different gene co-expression modules. 
Each parasite life-cycle stage has at least one highly correlated gene co-expression module, and each module comprises hundreds to thousands of lncRNAs and mRNAs having correlated co-expression patterns at different stages. Inspection of the most highly connected genes within the modules' networks has shown that different lncRNAs are hub genes at different life-cycle stages, being among the most promising candidate lncRNAs to be further explored for functional characterization.

Keywords: parasitology, RNA-seq, single-cell sequencing data, *Schistosoma mansoni*, long non-coding RNAs, weighted genes co-expression network analysis

### INTRODUCTION

Schistosomiasis is a neglected tropical disease caused by flatworms of the genus *Schistosoma*, with an estimated 250 million or more people infected worldwide, and is responsible for 200 thousand deaths annually in Sub-Saharan Africa (WHO, 2015). *Schistosoma mansoni*, prevalent in Africa and Latin America, is one of the three main species related to human infections (CDC, 2018). In the Americas, it is estimated that 1 to 3 million people are infected by *S. mansoni* and over 25 million live in risk areas, with Brazil and Venezuela being the most affected (Zoni et al., 2016). The prevalence of this disease is correlated with socioeconomic and environmental factors (Gomes Casavechia et al., 2018).

This parasite has a very complex life-cycle comprised of several developmental stages, with a freshwater snail intermediate-host and a final mammalian host (Basch, 1976). Recently, it has been shown that epigenetic changes are required for life-cycle progression (Roquis et al., 2018). However, little is known about the genes and molecules that drive this process through the lifecycle stages of *S. mansoni*. A better understanding of the gene expression regulation mechanisms and of their components may lead to new therapeutic targets (Batugedara et al., 2017), and one key element could be the long non-coding RNAs (lncRNAs) (Blokhin et al., 2018).

LncRNAs are defined as transcripts longer than 200 nucleotides, without apparent protein-coding potential (Cao et al., 2018). The term "apparent" is included because it is already known that some lncRNAs actually have dual functions, being functional both as lncRNAs and through peptides shorter than 100 amino acids that they encode (Nam et al., 2016; Choi et al., 2018). In mammals, lncRNAs regulate gene expression through different mechanisms (Bhat et al., 2016), including mediating epigenetic modifications (Hanly et al., 2018), and were shown to be important in vital processes, such as cell cycle regulation (Kitagawa et al., 2013), pluripotency maintenance (Rosa and Ballarino, 2016), and reproduction (Golicz et al., 2018).

In *S. mansoni*, the expression of lncRNAs at different life-cycle stages was first detected by our group in 2011 using microarrays (Oliveira et al., 2011). Subsequently, large-scale identification of *S. mansoni* lncRNAs has been reported in three studies from our group and from others that analyzed high-throughput RNA-sequencing (RNA-seq) data (Vasconcelos et al., 2017; Liao et al., 2018; Oliveira et al., 2018), but each of them used a limited number of data sets (from 4 to 88 RNA-seq libraries). Because each work used different mapping tools and parameters (Vasconcelos et al., 2017; Liao et al., 2018; Oliveira et al., 2018), and given that Liao et al. (2018) did not compare their lncRNAs with the previously published ones, part of the lncRNAs are redundant among the three reports. In addition, the lncRNAs were annotated against the old version 5.2 of the genome and protein-coding transcriptome (Protasio et al., 2012); as a result, a set of transcripts that were previously annotated as lncRNAs (Vasconcelos et al., 2017; Liao et al., 2018; Oliveira et al., 2018) now seem to represent partially processed pre-mRNAs arising from novel protein-coding genes annotated in the new version 7.1 of the transcriptome (https://parasite.wormbase.org/Schistosoma\_mansoni\_prjea36577/); these transcripts were previously annotated as having no coding potential due to intron retention, as exemplified in **Supplementary Figure S1**. Besides, these three works used expression data from whole parasites, while it is known from other species that lncRNAs have tissue- and cell-specific expression (Wu et al., 2016; Credendino et al., 2017).

The aim of the present work is to identify and annotate a robust and more complete set of lncRNAs that agrees with the most updated transcriptome annotation, and to analyze RNA-seq data sets still non-annotated for the presence of lncRNAs—e.g., gonads (Lu et al., 2016) and single-cell (Tarashansky et al., 2018; Wang et al., 2018) RNA-seq libraries. The goal is to provide a foundation that will enable future studies on the role of lncRNAs in *S. mansoni* biology, which could eventually identify potential new therapeutic targets.

#### MATERIALS AND METHODS

#### Transcripts Reconstruction

To identify new lncRNAs, 633 publicly available RNA-seq libraries from whole worms at different stages (miracidia, n = 1; sporocysts, n = 1; cercariae, n = 8; schistosomula, n = 11; juveniles, n = 9; adult males, n = 34; adult females, n = 37; and mixed adults, n = 20), from tissues (testes, n = 6; ovaries, n = 5; posterior somatic tissues, n = 3; heads, n = 5; and tails, n = 5), from cell populations (n = 81), and from single cells (from juveniles, n = 370, and mother sporocyst stem cells, n = 37) were downloaded from the SRA and ENA databases (**Supplementary Table S1**). The only whole-worm stage that was not included was eggs, because there is a single RNA-seq library available in the public domain (Anderson et al., 2016), which has only 252,000 egg reads. This amount is roughly fourfold lower than the minimum number of reads per library in the other whole-worm libraries that we used (namely 1 million good-quality reads) and is too low a coverage for an unbiased detection of stage- or tissue-specific lncRNAs in complex organisms (Sims et al., 2014). The new versions of the genome (v 7) and transcriptome (v 7.1), which were used as reference in this study, were downloaded from the WormBase ParaSite resource (Howe et al., 2017) at https://parasite.wormbase.org/Schistosoma\_mansoni\_prjea36577/.

Quality control was done with fastp v 0.19.4 (Chen et al., 2018) (default parameters), removing adapters and low-quality reads. The reads in each library were then mapped against the genome with STAR v 2.6.1c in two-pass mode, with the parameters indicated by STAR's manual as the best ones to identify new splicing sites and transcripts (Dobin et al., 2013). RSeQC v 2.6.5 (Wang et al., 2012) was used to identify the strandedness of each RNA-seq library, which was then used in transcript reconstruction and in the quantification of expression levels. For each library, multi-mapped reads were removed with Samtools v 1.3 (Li et al., 2009) and uniquely mapped reads were used for transcript reconstruction with Scallop v 10.2 (--min\_mapping\_quality 255 --min\_splice\_boundary\_hits 2) (Shao and Kingsford, 2017). A new splicing site had to be confirmed by at least two reads to be considered. A consensus transcriptome from all libraries was built using TACO v 0.7.3 (--filter-min-length 200 --isoform-frac 0.05), an algorithm that reconstructs the consensus transcriptome from a collection of individual assemblies. As described by Niknafs et al. (2016), TACO employs change point detection to break apart complex loci and correctly delineate transcript start and end sites, and a dynamic programming approach to assemble transcripts from a network of splicing patterns.
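The removal of multi-mapped reads can be illustrated with a minimal sketch. It assumes SAM-formatted text input and relies on STAR's default behavior of assigning a mapping quality (MAPQ) of 255 to uniquely mapped reads, which is also the value passed to Scallop above. The function name `keep_unique_alignments` is hypothetical; the study itself used Samtools for this step, and this pure-Python version is only meant to show the logic.

```python
def keep_unique_alignments(sam_lines, unique_mapq=255):
    """Yield SAM header lines and alignments whose MAPQ equals `unique_mapq`.

    Assumption: the aligner marks uniquely mapped reads with MAPQ 255
    (STAR's default). This is an illustration, not the authors' command.
    """
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # A SAM alignment has at least 11 mandatory columns; column 5 is MAPQ.
        if len(fields) >= 11 and int(fields[4]) == unique_mapq:
            yield line


example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tSM_V7_1\t100\t255\t50M\t*\t0\t0\t" + "A" * 50 + "\t" + "I" * 50,
    "read2\t0\tSM_V7_1\t200\t3\t50M\t*\t0\t0\t" + "C" * 50 + "\t" + "I" * 50,
]
kept = list(keep_unique_alignments(example))
```

In this toy input, the header and `read1` are retained, whereas `read2` (MAPQ 3, i.e., multi-mapped) is discarded.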

#### LncRNAs Classification

In the consensus transcriptome, transcripts that were shorter than 200 nt, monoexonic, or had exon-exon overlap with protein-coding genes on the same genomic strand were removed from the set. The coding potential of the remaining transcripts was evaluated by means of the FEELnc tool v 0.1.1 (Wucher et al., 2017) in shuffle mode, which uses a random forest machine-learning algorithm and classifies these transcripts into lncRNAs or protein-coding genes, and also by CPC2 v 0.1 (Yang et al., 2017), which classifies through a support vector machine model using four intrinsic features. Only transcripts classified as lncRNAs by both tools were kept. ORFfinder v 0.4.3 (https://www.ncbi.nlm.nih.gov/orffinder/) was used to extract the longest putative open reading frames (ORFs); these putative peptides were then submitted to orthology-based annotation with the eggNOG-mapper webtool (HMMER mapping mode) (Huerta-Cepas et al., 2017). Transcripts with no hits against the eukaryote eggNOG database were then considered as lncRNAs. If any transcript isoform was classified as a protein-coding mRNA at any step, all transcripts mapping to the same genomic *locus* were removed to avoid possible pre-mRNAs. After this final step, a lncRNAs GTF file was created.
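The filtering logic described above can be summarized in a short sketch. The `Transcript` record and its field names are hypothetical and assume that the outputs of FEELnc, CPC2, and eggNOG-mapper have already been parsed into booleans. The sketch also assumes that a transcript called coding by either tool, or with an eggNOG hit, counts as "classified as protein-coding" for the locus-level removal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    # All field names are illustrative; they are not taken from any tool's output.
    transcript_id: str
    locus_id: str
    length_nt: int
    n_exons: int
    overlaps_coding_exon_same_strand: bool
    feelnc_noncoding: bool
    cpc2_noncoding: bool
    has_eggnog_hit: bool


def passes_structural_filters(t):
    """Keep multi-exonic transcripts of >= 200 nt without same-strand exon overlap."""
    return (
        t.length_nt >= 200
        and t.n_exons > 1
        and not t.overlaps_coding_exon_same_strand
    )


def looks_coding(t):
    """Assumption: a coding call by either tool, or any eggNOG hit, flags the transcript."""
    return (not t.feelnc_noncoding) or (not t.cpc2_noncoding) or t.has_eggnog_hit


def select_lncrnas(transcripts):
    """Apply structural filters, then drop every locus with a coding-like isoform."""
    candidates = [t for t in transcripts if passes_structural_filters(t)]
    coding_loci = {t.locus_id for t in candidates if looks_coding(t)}
    return [t for t in candidates if t.locus_id not in coding_loci]


demo = [
    Transcript("A.1", "A", 900, 3, False, True, True, False),
    Transcript("A.2", "A", 1200, 4, False, True, False, False),  # CPC2 says coding
    Transcript("B.1", "B", 450, 2, False, True, True, False),
    Transcript("C.1", "C", 800, 1, False, True, True, False),    # monoexonic
    Transcript("D.1", "D", 150, 2, False, True, True, False),    # too short
]
selected_ids = [t.transcript_id for t in select_lncrnas(demo)]
```

In this toy example only `B.1` survives: locus A is removed entirely because one of its isoforms is called coding, which mirrors the conservative locus-level rule used to avoid pre-mRNAs.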

#### Histone Marks

To identify histone H3 lysine 4 trimethylation (H3K4me3) and H3 lysine 27 trimethylation (H3K27me3) marks near the transcription start site (TSS) of lncRNAs, we used 12 libraries of Chromatin Immunoprecipitation Sequencing (ChIP-Seq) data generated by Roquis et al. (2018) for cercariae, schistosomula, and adults (**Supplementary Table S1**), which had more than 90% overall mapping rate. The reads were downloaded from the SRA database and mapped against the genome v 7 with Bowtie2 v 2.3.4.3 (Langmead and Salzberg, 2012) (parameters --end-to-end, --sensitive, --gbar 4). Because there are no input data sets publicly available in the SRA database for the Roquis et al. (2018) paper, we were not able to exactly reproduce the pipeline that was described in the Methods section of that paper, which used the input as a reference for peak calling. Instead, we used HOMER v 4.10 (Heinz et al., 2010) for removing multi-mapped and duplicated reads and for significant peak calling as described by Anderson et al. (2016), an approach also used by Vasconcelos et al. (2017) in the first large-scale annotation of lncRNAs in *S. mansoni*. The number of reads in a peak had to be at least fourfold higher than in the peaks of the surrounding 10-kb area, and the Poisson p-value threshold cutoff was 0.0001. The lncRNAs with significant histone mark peaks within a 1-kb distance upstream or downstream of their TSS were annotated. The lncRNAs with overlapping marks are shown in an intersection diagram that was plotted using the UpSetR tool v 1.3.3 (Lex et al., 2014). The Venn diagram tool at http://bioinformatics.psb.ugent.be/beg/tools/venn-diagrams was used for generating the lists of lncRNA genes belonging to each intersection set.
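As an illustration of the TSS annotation step, the following sketch assigns histone marks to a transcript when a called peak overlaps the window of 1 kb on either side of its TSS. The tuple layouts, the function names, and the choice of "any overlap with the window" as the criterion are assumptions made for this example; the peaks themselves would come from the HOMER peak calling described above.

```python
def tss_position(start, end, strand):
    """Return the TSS for 1-based inclusive coordinates (5' end on the given strand)."""
    return start if strand == "+" else end


def marks_near_tss(transcript, peaks, window=1000):
    """Return the sorted mark names with a peak overlapping TSS +/- `window` bp.

    transcript: (chrom, start, end, strand)
    peaks: iterable of (chrom, start, end, mark_name)
    Assumption: a peak counts if any part of it falls inside the window.
    """
    chrom, start, end, strand = transcript
    tss = tss_position(start, end, strand)
    lo, hi = tss - window, tss + window
    found = set()
    for p_chrom, p_start, p_end, mark in peaks:
        if p_chrom == chrom and p_start <= hi and p_end >= lo:
            found.add(mark)
    return sorted(found)


peaks = [
    ("chr1", 9900, 10200, "H3K4me3"),
    ("chr1", 4000, 5500, "H3K27me3"),
    ("chr2", 9000, 9100, "H3K27me3"),
]
minus_strand_marks = marks_near_tss(("chr1", 5000, 9000, "-"), peaks)
plus_strand_marks = marks_near_tss(("chr1", 5000, 9000, "+"), peaks)
```

The example highlights why strand matters: the same genomic interval has its TSS at position 9000 on the minus strand and at position 5000 on the plus strand, so different peaks are assigned in each case.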

#### Co-Expression Networks

The lncRNAs GTF file was then added to the *S. mansoni* public protein-coding transcriptome version 7.1 GTF file, and the resulting protein-coding + lncRNAs GTF was used as the reference together with the genome sequence v 7 for mapping the reads of each RNA-seq library under study, again using the STAR tool, now in one-pass mode, followed by gene expression quantification with RSEM v 1.3 (Li and Dewey, 2011). Weighted gene co-expression network analyses (WGCNA v 1.68) (Langfelder and Horvath, 2008) were then performed to identify modules related to the life-cycle stages and tissues of the organism. For this purpose, only libraries from whole worms or from tissues with more than 50% of the reads uniquely mapped were used. To reduce noise, only transcripts with expression greater than 1 transcript per million (TPM) in at least half of the libraries in one or more stages/tissues were considered. Expression levels were measured in log space with a pseudocount of 1 (log2(TPM+1)), and we set the transcript expression to zero when log2(TPM+1) < 1. For the construction of the adjacency matrix, the power adjacency function for signed networks was applied with the soft-thresholding beta parameter equal to 14, which resulted in a scale-free topology model fit index of *R*² = 0.935. The adjacency matrix was then converted to the Topological Overlap Matrix (TOM) and the dissimilarity TOM (1 − TOM) was calculated (Langfelder and Horvath, 2008).
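The expression filter, the log transformation, and the signed power adjacency can be sketched in a few lines. The function names are hypothetical, and the adjacency shown is the standard signed-network definition, a = ((1 + r) / 2)^β with r the Pearson correlation, which is assumed here to be what the WGCNA call computed with β = 14. The actual analysis was run with the WGCNA R package, so this is only a didactic re-implementation.

```python
import math


def passes_expression_filter(tpm_by_stage, min_tpm=1.0):
    """True if TPM > min_tpm in at least half of the libraries of any stage/tissue."""
    return any(
        sum(v > min_tpm for v in values) >= len(values) / 2
        for values in tpm_by_stage.values()
        if values
    )


def transform_tpm(tpm):
    """log2(TPM + 1), set to zero when the transformed value is below 1."""
    value = math.log2(tpm + 1.0)
    return value if value >= 1.0 else 0.0


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def signed_adjacency(x, y, beta=14):
    """Signed-network power adjacency: ((1 + r) / 2) ** beta."""
    return ((1.0 + pearson(x, y)) / 2.0) ** beta


profile = [transform_tpm(v) for v in (0.5, 1.0, 3.0)]
```

With β = 14, two uncorrelated transcripts (r = 0) receive an adjacency of 0.5^14, about 6 × 10⁻⁵, so that only strongly and positively correlated pairs contribute appreciably to the network.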

Correlation between the modules and the stages was calculated based on the Pearson correlation coefficient between the expression levels of the transcripts belonging to each module along the stages, as suggested in the WGCNA tutorial (https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/). As miracidia and sporocysts have only one library each, are closely related stages of development, and were clustered together as an outgroup based on their overall expression patterns (as shown in the Results), we decided to consider both stages together as one group (miracidia/sporocysts) to calculate the correlation and p-values between modules and stages.

The Gene Trait Significance (GS) was calculated based on the correlation between an individual transcript and the trait, which in our case was always the stage with the highest absolute Pearson correlation coefficient with the module to which the transcript belongs. For example, for a transcript that belongs to the red module (most highly correlated with testes, see Results), the correlation was calculated between the expression of the transcript in the testes libraries and the expression of the transcript in all other non-testes libraries.
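One common way to express such a stage-versus-rest comparison as a correlation is to encode the trait as a binary indicator (1 for libraries of the stage of interest, 0 otherwise) and to correlate it with the transcript's expression vector. The sketch below follows that convention; the encoding and the function names are assumptions made for illustration and may differ in detail from the authors' implementation.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def gene_significance(expression, library_stages, stage_of_interest):
    """Correlate expression with a 0/1 indicator of the stage of interest.

    Assumption: the trait is encoded as a binary vector over libraries.
    The result lies between -1 and 1.
    """
    indicator = [1.0 if s == stage_of_interest else 0.0 for s in library_stages]
    return pearson(expression, indicator)


stages = ["testes", "testes", "ovaries", "males", "females"]
gs_testes = gene_significance([5.0, 6.0, 0.0, 0.0, 1.0], stages, "testes")
gs_depleted = gene_significance([0.0, 0.0, 5.0, 6.0, 4.0], stages, "testes")
```

A transcript expressed almost exclusively in testes libraries obtains a GS close to 1, whereas a transcript that is depleted in testes obtains a negative GS.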

#### Gene Ontology (GO) Enrichment

Protein-coding genes were submitted to eggNOG-mapper (Huerta-Cepas et al., 2017) for annotation of GO terms. Based on this annotation (available in **Supplementary Table S2**), we performed GO enrichment analyses with BINGO (Maere et al., 2005). For each module, we used a hypergeometric test with the whole annotation as the reference set, and FDR ≤ 0.05 as the significance threshold.
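The statistical test behind this enrichment analysis can be sketched as follows. The hypergeometric upper tail gives the probability of observing at least k genes annotated with a GO term among the n genes of a module, given that K of the N genes in the reference set carry the term. The multiple-testing correction shown is the Benjamini-Hochberg procedure, which is assumed here to be the FDR method that was selected; the function names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for n genes drawn from N, of which K carry the GO term."""
    total = comb(N, n)
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order of `p_values`."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical numbers: 8 of 40 module genes carry a term found in 50 of 5,000 genes.
p_enriched = hypergeom_upper_tail(8, 40, 50, 5000)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
significant = [q <= 0.05 for q in adjusted]
```

The counts in the usage example are invented for illustration only and do not correspond to any module in this study.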

#### Single-Cell Analyses

The expression levels were quantified in single-cell RNA-seq libraries from juveniles' stem cells (Tarashansky et al., 2018) and mother sporocysts stem cells (Wang et al., 2018) by RSEM. We used Scater v 1.10.1 (Mccarthy et al., 2017) to normalize and identify high-quality single-cell RNA-Seq libraries, i.e., those that have at least 100,000 total counts and at least 1,000 different expressed transcripts, as recommended by Mccarthy et al. (2017); all libraries were classified as high quality.
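The two quality thresholds stated above amount to a simple per-library check, sketched below. The function name is hypothetical and "expressed" is assumed to mean a count greater than zero; the actual quality control was performed with Scater.

```python
def is_high_quality(counts, min_total=100_000, min_expressed=1_000):
    """Check a single-cell library against the two thresholds used in the text.

    counts: per-transcript read counts for one library.
    Assumption: a transcript is 'expressed' when its count is above zero.
    """
    total = sum(counts)
    expressed = sum(1 for c in counts if c > 0)
    return total >= min_total and expressed >= min_expressed


passes = is_high_quality([100] * 1000)
too_few_transcripts = is_high_quality([1000] * 999 + [0])
```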

Next, we used the R package Single-Cell Consensus Clustering (SC3) tool v 1.10.1 (Kiselev et al., 2017), which performs an unsupervised clustering of scRNA-seq data. Based on the clusters identified, we used the plot SC3 markers function to find marker genes based on the mean cluster expression values. These markers are highly expressed in only one of the clusters and indicate the specific expression at the cell level. As described by Kiselev et al. (2017), the area under the receiver operating characteristic (ROC) curve is used to quantify the accuracy of the prediction. A p-value is assigned to each gene by using the Wilcoxon signed rank test. Genes with the area under the ROC curve (AUROC) > 0.85 and with p-value < 0.01 are defined as marker genes.
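To make the marker criterion concrete, the sketch below computes the AUROC of a gene as the probability that a randomly chosen cell from the cluster of interest has higher expression than a randomly chosen cell from the other clusters, counting ties as one half. This pairwise formulation is a standard equivalent of the area under the ROC curve; it is not SC3's own code, and the p-value is taken as an externally computed input rather than re-derived here.

```python
def auroc(in_cluster, out_cluster):
    """AUROC of expression as a classifier of cluster membership.

    Equals the fraction of (in, out) cell pairs in which the in-cluster cell
    has the higher expression value, with ties counted as 0.5.
    """
    if not in_cluster or not out_cluster:
        raise ValueError("both groups must contain at least one cell")
    wins = 0.0
    for a in in_cluster:
        for b in out_cluster:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(in_cluster) * len(out_cluster))


def is_marker(in_cluster, out_cluster, p_value, min_auroc=0.85, max_p=0.01):
    """Apply the thresholds from the text: AUROC > 0.85 and p-value < 0.01."""
    return auroc(in_cluster, out_cluster) > min_auroc and p_value < max_p


perfect = auroc([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
uninformative = auroc([1.0, 2.0], [1.0, 2.0])
```

A gene that perfectly separates the cluster from all other cells reaches an AUROC of 1, whereas a gene with identical expression distributions in both groups scores 0.5.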

#### Parasite Materials

All parasite materials were from a BH isolate of *S. mansoni* maintained by passage through golden hamster (*Mesocricetus auratus*) and *Biomphalaria glabrata* snails. Eggs were purified from livers of hamsters previously infected with *S. mansoni*, according to Dalton et al. (1997). After purification, eggs were added to 10 ml of distilled water and exposed to a bright light. Supernatant containing hatched miracidia was removed every 30 min for 2 h and replaced by fresh water. The supernatants containing the miracidia were pooled and chilled on ice, and miracidia were then recovered by centrifugation at 15,000*g* for 20 s (Dalton et al., 1997). Supernatant was discarded and miracidia stored in RNAlater (Ambion) until RNA extraction.

Cercariae were collected from snails infected with 10 miracidia each. Thirty-five days after infection, the snails were placed in the dark in water and then illuminated for 2 h to induce shedding. The emerging cercariae were collected by centrifugation, washed with PBS once, and then stored in RNAlater (Ambion) until RNA extraction.

Schistosomula were obtained by mechanical transformation of cercariae and separation of their bodies as previously described (Basch, 1981), with some modifications. Briefly, cercariae were collected as described above and then suspended in 15 ml of M169 medium (Vitrocell, cat number 00464) containing penicillin/streptomycin, amphotericin (Vitrocell, cat number 00148). Mechanical transformation was performed by passing the cercariae 10 times through a 23G needle. To separate schistosomula from the tails, the tail-rich supernatant was decanted and the sedimented bodies resuspended in a further 7 ml of M169 medium. The procedure was repeated until less than 1% of the tails remained. The newly transformed schistosomula were maintained for 24 h in M169 medium (Vitrocell, cat number 00464) supplemented with penicillin/streptomycin, amphotericin, gentamicin (Vitrocell, cat number 00148), 2% fetal bovine serum, 1 μM serotonin, 0.5 μM hypoxanthine, 1 μM hydrocortisone, and 0.2 μM triiodothyronine at 37°C and 5% CO2. Schistosomula cultivated for 24 h were collected, washed three times with PBS and stored in RNAlater (Ambion) until RNA extraction.

Adult *S. mansoni* worms were recovered by perfusion of golden hamsters that had been infected with 250 cercariae, 7 weeks previously. Approximately 200 *S. mansoni* (BH strain) adult worm pairs were freshly obtained through the periportal perfusion of hamster, as previously described (Anderson et al., 2016; Vasconcelos et al., 2017). After perfusion, the adult worm pairs were kept for 3 h at 37°C and 5% CO2 in Advanced RPMI Medium 1640 (Gibco, 12633-012) supplemented with 10% fetal bovine serum, 12 mM HEPES (4-(2-hydroxyethyl) piperazine-1-ethanesulfonic acid) pH 7.4, and 1% penicillin/streptomycin, amphotericin (Vitrocell, cat number 00148). After 3 h of incubation, the adult worm pairs were collected, washed three times with PBS, and stored in RNAlater (Ambion) until RNA extraction. Before the extraction of RNA from males or females, adult worm pairs were manually separated in RNAlater (Ambion) using tweezers.

#### RNA Extraction, Quantification, and Quality Assessment

Total RNA from eggs (E), miracidia (Mi), cercariae (C), and schistosomula (S) was extracted according to Vasconcelos et al. (2017). Briefly, 100,000 eggs, 15,000 miracidia, 25,000 cercariae, or 25,000 schistosomula were ground with glass beads in liquid nitrogen for 5 min. Then, the Qiagen RNeasy Micro Kit (Cat number 74004) was used for RNA extraction and purification according to the manufacturer's instructions, except for the DNase I treatment, in which the amount of DNase I was doubled and the time of treatment was increased to 45 min.

Male (M) or female (F) adult worms were first disrupted in Qiagen RLT buffer using glass potters and pestles. RNA from males or females was then extracted and purified using the Qiagen RNeasy Mini Kit (Cat number 74104), according to the manufacturer's instructions, except for the DNase I treatment, which was the same used for egg, miracidia, cercariae, and schistosomula RNA extraction.

All the RNA samples were quantified using the Qubit RNA HS Assay Kit (Q32852, Thermo Fisher Scientific), and the integrity of RNAs was verified using the Agilent RNA 6000 Pico Kit (5067-1513, Agilent Technologies) in a 2100 Bioanalyzer Instrument (Agilent Technologies). Four biological replicates were assessed for each life-cycle stage, except for schistosomula, for which three biological replicates were assessed.

#### Reverse Transcription and Quantitative PCR (qPCR) Assays

The reverse transcription (RT) reaction was performed with 200 ng of each total RNA sample using the SuperScript IV First-Strand Synthesis System (18091050; Life Technologies) and random hexamer primers in a 20-μL final volume. The obtained complementary DNAs (cDNAs) were diluted four times in DEPC water, and quantitative PCR was performed using 2.5 μL of each diluted cDNA in a total volume of 10 μL containing 1X LightCycler 480 SYBR Green I Master Mix (04707516001, Roche Diagnostics) and 800 nM of each primer in a LightCycler 480 System (Roche Diagnostics). Primers for selected transcripts (**Supplementary Table S3**) were designed using the Primer 3 tool (http://biotools.umassmed.edu/bioapps/primer3\_www.cgi), and each real-time qPCR was run in two technical replicates. The results were analyzed by the comparative Ct method (Livak and Schmittgen, 2001). Real-time data were normalized in relation to the level of expression of the Smp\_090920 and Smp\_062630 reference genes.
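The comparative Ct calculation of Livak and Schmittgen (2001) can be written as a short worked example. The relative expression is 2^(−ΔΔCt), where ΔCt is the Ct of the target minus the Ct of the reference, and ΔΔCt is the ΔCt of the sample minus the ΔCt of a calibrator sample. How the two reference genes were combined is not stated in the text; averaging their Ct values, as done below, is an assumption, as are the function name and the Ct values in the example.

```python
def relative_expression(ct_target, ct_refs, ct_target_cal, ct_refs_cal):
    """Relative expression by the comparative Ct method, 2 ** (-ddCt).

    ct_target / ct_refs: target Ct and reference-gene Cts in the sample.
    ct_target_cal / ct_refs_cal: the same quantities in the calibrator sample.
    Assumptions: reference Cts are averaged; amplification efficiency is ~100%.
    """
    delta_sample = ct_target - sum(ct_refs) / len(ct_refs)
    delta_calibrator = ct_target_cal - sum(ct_refs_cal) / len(ct_refs_cal)
    delta_delta_ct = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta_ct)


# Invented Ct values: the target crosses threshold 2 cycles earlier in the sample.
fold_change = relative_expression(22.0, [18.0, 20.0], 24.0, [18.0, 20.0])
```

In this invented example ΔCt is 3 in the sample and 5 in the calibrator, so ΔΔCt = −2 and the target is estimated to be fourfold more abundant in the sample.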

#### RESULTS

#### LncRNAs Identification and Annotation

Using 633 publicly available *S. mansoni* RNA-seq libraries from whole worms at different stages, from isolated tissues, from cell populations, and from single cells (see Methods), our pipeline assembled a consensus transcriptome comprising 78,817 transcripts, of which 7,954 were classified as intergenic lncRNAs (lincRNAs), 7,438 as antisense lncRNAs, and 1,191 as sense lncRNAs, totaling 16,583 lncRNA transcripts originating from 10,024 genes (on average, 1.65 lncRNA isoforms per lncRNA gene); the summary of all six filtering steps in the pipeline is presented in **Table 1**. With the FEELnc lncRNA classification tool (**Table 1, step 5**), the most important feature for transcript classification was the ORF coverage (**Supplementary Figure S2A**), i.e., the fraction of the total length of the transcript that is occupied by the longest predicted ORF. In the FEELnc model training process, an optimal coding probability cutoff (0.348) was identified, which resulted in 0.962 sensitivity and specificity of mRNA classification (**Supplementary Figure S2B**). Analogous information is not provided in the output of the CPC2 classification tool (**Table 1, step 5**). Only the lncRNAs classified as such by both prediction tools were retained in the final set (**Table 1**).

From the total set of 16,583 lncRNAs obtained here, 11,022 are novel *S. mansoni* lncRNAs, whereas the remaining 5,561 transcripts comprise 120 lncRNAs that are identical to previously published ones, and 5,441 lncRNAs that have gene overlap with *S. mansoni* lncRNAs already reported in previous works (Vasconcelos et al., 2017; Liao et al., 2018; Oliveira et al., 2018) (**Supplementary Table S4**). In particular, among the 7,029 lincRNAs previously reported by our group (Vasconcelos et al., 2017), a total of 4,368 transcripts have partial or complete sequence overlap with the lncRNAs obtained here, whereas the remaining 2,661 (37.8%) transcripts previously annotated by Vasconcelos et al. (2017) are no longer in the present updated *S. mansoni* lncRNAs data set.

Among the transcripts in the public data set that were previously classified as lncRNAs (Vasconcelos et al., 2017; Liao et al., 2018; Oliveira et al., 2018) and are now excluded, a total of 4,293 were reconstructed in our assembly; however, they were removed from our set of lncRNAs because they were partially processed pre-mRNA transcripts that have exon-exon overlap with new protein-coding genes of version 7.1. The remaining transcripts previously classified as lncRNAs were reconstructed here but were removed by the more stringent filtering steps used in the present work. We have created a track on the *S. mansoni* UCSC-like genome browser (http://schistosoma.usp.br/), where the set of 16,583 lncRNAs obtained here can be visualized and the GTF and BED files can be downloaded. In **Figure 1**, we show a selected protein-coding desert genomic *locus* on chromosome 2 covering 245 kilobases, which harbors only three protein-coding genes and where we identified seven lincRNAs, two sense lncRNAs, and one antisense lncRNA that were not previously described.

TABLE 1 | Summary of transcripts removed at each filtering step and the final set of *S. mansoni* lncRNAs.

To identify the contribution of each type of RNA-seq library to the final lncRNAs set, we used the TACO transcriptome assembler to obtain the transcriptomes of the following four groups: whole organisms, tissues, cell populations, and single cells. The result is presented in **Figure 2** and shows that each type of sample contributed at least 1,000 unique lncRNAs, detected only in that group. It is worth mentioning that around 4% of the 16,583 lncRNAs are lost when the four transcriptomes are reconstructed separately.

Almost all lncRNAs encode short canonical ORFs within their sequences; however, as described by Verheggen et al. (2017), one can evaluate whether these ORFs arise only from random nucleotide progression by comparing the relative sizes of ORFs, using the reverse-complement of the sequence as a control. As presented in **Figure 3**, it is very clear that the size distribution of *bona fide S. mansoni* mRNA ORFs (sense) from the annotated v 7.1 transcriptome is greatly shifted toward longer sizes, compared with the size distribution of random ORFs found in their reverse-complement sequences. It is also possible to observe that the size distributions of ORFs found within the lncRNAs (sense) and within their reverse-complement sequences are very similar to each other, and are also similar to the size distribution of random ORFs in the reverse-complement sequence of mRNAs.
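The control described above can be sketched with a simplified ORF scan. For a given sequence, the longest ORF (from an ATG to the first in-frame stop codon, in any of the three forward frames) is measured on the sense strand and on its reverse complement, and the two lengths are compared. The function names are hypothetical, and the scan is deliberately simpler than ORFfinder, which was the tool used in this work; for instance, ORFs lacking a stop codon are ignored here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_orf_length(seq):
    """Length in nt (ATG through stop codon) of the longest ORF on this strand.

    Simplification: only complete ORFs (with an in-frame stop) are counted.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def sense_vs_control(seq):
    """Return (longest ORF on the sense strand, longest ORF on the reverse complement)."""
    return longest_orf_length(seq), longest_orf_length(reverse_complement(seq))


toy = "CCATGAAATAGCC"
sense_len, control_len = sense_vs_control(toy)
```

Applying `sense_vs_control` to every mRNA and every lncRNA would yield the two pairs of length distributions that are compared in the text: a shift toward longer sense ORFs is expected for mRNAs but not for lncRNAs.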

#### Histone Marks at the TSS of LncRNAs as Evidence of Regulation

As reported earlier, cross-matching of the lncRNAs genomic coordinates with the genomic coordinates of different publicly available histone mark profiles, obtained by ChIP-Seq at different life-cycle stages, adds another layer of functionality evidence for this class of RNAs (Vasconcelos et al., 2017; Cao et al., 2018). We used the data for two different histone marks obtained by Roquis et al. (2018) in cercariae, schistosomula, and adult parasites, namely, H3K4me3, which is generally associated with active transcription, and H3K27me3, which is associated with transcription repression (Barski et al., 2007). First, we analyzed the histone mark profiles of H3K4me3 and H3K27me3 around the TSS of protein-coding genes through the stages, and they were very similar to the ones presented by Roquis et al. (2018) (**Supplementary Figure S3**). **Figure 4** shows that these marks are also present around the TSS of *S. mansoni* lncRNAs at the three different life-cycle stages; a comparison with **Supplementary Figure S3** shows that these marks are less abundant at lncRNA loci than at protein-coding gene loci, and are spread farther from the TSS of lncRNAs than from the TSS of protein-coding genes. This profile is similar to that observed by Sati et al. (2012) when comparing histone marks around the TSS of human protein-coding genes and lncRNAs. A total of 8,599 lncRNA transcripts have at least one histone modification mark within 1 kb from their TSS (**Supplementary Table S5**), comprising 3,659 lincRNAs, 4,188 antisense lncRNAs, and 752 sense lncRNAs. A comparison of the lists of lncRNAs having a given histone mark at their TSS at any of the three different life-cycle stages (**Figure 5**) shows that the most abundant mark is the transcriptional repressive mark, H3K27me3. This mark is present at the TSS of different sets of lncRNAs at each of the three stages, with abundances ranging from 1,334 lncRNAs with the H3K27me3 mark exclusively in schistosomula to 1,147 lncRNAs with the mark exclusively in adults and 1,024 lncRNAs with the mark exclusively in cercariae (**Figure 5, red**). In addition, the transcriptional activating mark H3K4me3 is present at the TSS of a different set of lncRNAs, with abundances ranging from 740 lncRNAs with the H3K4me3 mark exclusively in schistosomula to 282 lncRNAs with the mark exclusively in cercariae, and 214 lncRNAs with the mark exclusively in adults (**Figure 5, green**). Interestingly, among the lncRNAs with the most abundant patterns of marks at their TSS, there are 316 lncRNAs in cercariae that have the characteristic marks of bivalent poised promoters (having both H3K4me3 and H3K27me3 marks at their TSS) (Voigt et al., 2013) (**Figure 5, blue**). This is analogous to the marks at the TSS of protein-coding genes in cercariae, where most genes have the bivalent mark (Roquis et al., 2018), indicating that lncRNAs are under a similar transcriptional regulatory program as the protein-coding genes in cercariae. **Supplementary Table S5** has a complete UpSet plot similar to that of **Figure 5**, showing the number of lncRNAs found in all different intersections, along with the lists of lncRNAs belonging to each intersection set.

H3K27me3 marks (blue) mapping within 10 kb around the TSS of all lncRNAs in (A) adults, (B) schistosomula and (C) cercariae was computed.

#### Gene Co-Expression Analyses

Once we identified our final lncRNAs set, we applied weighted gene co-expression network analyses (WGCNA) to integrate the expression level differences observed for lncRNAs and mRNAs among all life-cycle stages and the gonads, using all RNA-seq libraries available. The file containing expression levels (in TPM) for all transcripts in all 633 RNA-seq libraries is available at http://schistosoma.usp.br/. After normalization and gene filtering (see Methods), 90 libraries out of the 112 from the different stages (mixed-sex adults were not included) remained in the WGCNA analyses, and 19,258 transcripts were retained (12,693 protein-coding genes and 6,565 lncRNAs).

Samples from miracidia, sporocysts, schistosomula, cercariae, and gonads (testes and ovaries) were correctly clustered together by their expression correlation, based on Euclidean distance metrics (**Figure 6**). For samples from adult worms, in spite of the fact that we have one cluster branch mainly composed of females, and another mainly composed of males, there are some male samples among the female ones, and vice versa. Besides, due to the known similarity between males and juveniles (Wang et al., 2017), their samples were not well separated. It is interesting to note that immature females, which were shown to have an expression profile similar to that of males (Lu et al., 2016), are clustered here in the male branch. As WGCNA performs an unsupervised co-expression analysis, we decided to keep all male and female samples in the analysis, including those that are clustered apart from their main group, in order not to add a bias in the construction of modules.

of lncRNAs in each intersection set are shown in Supplementary Table S5. The intersection set in blue shows the number of lncRNAs with the simultaneous H3K4me3\_C/H3K27me3\_C marks at their TSS in cercariae, characteristic of poised promoters.

We identified 15 different lncRNAs/mRNAs co-expression modules **(Figure 7)**, the sizes ranging from 215 to 3,318 transcripts (**Table 2** and **Supplementary Table S6**). The ratio between the number of lncRNAs and mRNAs that comprise each module varies among the modules; thus, whereas lncRNAs comprise 86% of the transcripts in the cyan module, only 5% of the transcripts from the black module are lncRNAs (**Table 2**).

A Pearson correlation analysis indicates that each stage/tissue has at least one module whose gene expression has a statistically significant positive correlation with that stage or tissue (**Figure 8**). Some stages also have modules that have a statistically significant negative correlation, such as the black module that is negatively correlated with miracidia/sporocysts. For the black module, the transcripts that compose the module have an expression in miracidia/sporocysts that is lower when compared with the overall expression of those transcripts across the other stages. The gray color represents the group of transcripts with a highly heterogeneous co-expression pattern that could not cluster into any of the 15 modules. In fact, it can be seen in **Figure 8** that in this group, the best correlation coefficient, obtained in juveniles, is lower (|r| = 0.32), and the p-value is much higher (p = 0.002) than the best parameters that were obtained in at least one stage for any module (|r| ≥ 0.51 and p ≤ 3e-07). Here, our choice of keeping in the WGCNA analysis those male and female samples that cluster apart from their main group (**Figure 6**) has an impact: it decreases the correlation coefficient of the modules most correlated with males or females (pink or turquoise, respectively) when compared with the correlation coefficients in the other stages/tissues; nevertheless, these modules still have a statistically significant high correlation.

male data sets, whose clustering pattern is the most spread one.

We chose three RNA-seq library samples from each of the nine different stages/tissues (among all the libraries under analysis) to construct a representative expression heatmap (**Figure 9**). This heatmap shows the expression across all stages of the top 50 transcripts with the highest gene module membership (GMM) to the most correlated module of each stage (as seen in **Figure 8**) (for GMM definition see WGCNA background and glossary, available at https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/) (Langfelder and Horvath, 2008). The heatmap (**Figure 9**) confirms that the top transcripts belonging to one module are more expressed in one given stage/tissue, which is the stage/tissue with which the module has the highest correlation. It is noteworthy that female library SRR5170160, which clustered inside the male group (**Figure 6**) when all filtered transcripts under analysis were used for clustering, now is correctly clustered with the other female samples (**Figure 9**) when only the top 50 transcripts with the highest GMM are considered. Also, juveniles share with adult males a similar expression pattern of the top 50 male genes, which is in line with the clustering of juveniles along with males in the analysis of **Figure 6**.

#### Validation of lncRNAs Expression by RT-qPCR

We designed PCR primer pairs for a selected set of eleven lincRNAs belonging to five different modules, as determined by WGCNA, to detect their expression along the different *S. mansoni* life-cycle stages and to eventually validate their different expression levels at the stages. Our selection was based on the Gene Trait Significance score (GS score) (**Supplementary Table S7**) of each lncRNA in the module where it belongs, which varies from −1 to 1, using the stages as external information (see Methods). The higher the absolute value of the GS score, the more biologically significant and correlated to the stage of interest is the transcript expression. For the RT-qPCR assays, we used samples from eggs (E), miracidia (Mi), cercariae (C), schistosomula (S), adult males (M), and females (F).

First, we measured the expression of five protein-coding genes that were used as stage markers (Parker-Manuel et al., 2011; Anderson et al., 2016), and we found that in our RNA samples, they were more highly expressed at the predicted stages (**Supplementary Figure S4**).

Then, we tested the selected eleven lincRNAs and detected that they were expressed in at least one of the six stages that were assayed; specifically, six lincRNAs were more highly expressed at the stage predicted by the correlation with the modules (**Figure 10**), at four life-cycle stages: two more highly expressed in miracidia (SmLINC158013-IBu and SmLINC123205-IBu, purple module), two in cercariae (SmLINC123474-IBu and SmLINC134196-IBu, tan module), one in schistosomula (SmLINC105065-IBu, magenta module), and one in males (SmLINC100046-IBu, turquoise module). In **Supplementary Figure S5** we present the values in transcripts per million reads (TPM) from the RNA-seq libraries for each of these six validated lincRNAs. Additionally, the five other lincRNAs that were tested were detected as expressed across all stages; however, they were not differentially expressed as predicted by the RNA-seq (**Supplementary Figure S6**). This indicates that there is variability in lncRNA expression between the experimental conditions and parasite strain used in our assays and those found among the dozens of samples that are publicly available.

#### Protein-Coding Genes Ontology Enrichment and lncRNA Hub Genes in the Modules

Gene ontology (GO) enrichment analyses show that the protein-coding genes belonging to the red module, which has a correlation of 0.99 with testes, are enriched with processes related to sperm motility such as cilium movement and axoneme assembly (**Figure 11A**). Besides, the green module, correlated with both ovaries and testes, is enriched with proteins associated with cellular replication (**Figure 11B**). All other modules with GO enrichment, which in general are enriched with proteins associated with general metabolism, are presented in **Supplementary Figures S7–S10**. The black, cyan, midnight blue, purple, and tan modules have no significantly enriched GO terms due to the small number of protein-coding genes with GO annotation within each of these modules.

All transcripts that belong to the same module are connected; however, to better visualize this, gene co-expression networks were constructed only with the most connected genes (as determined by the adjacency threshold) (**Figure 12**), and they show, along with the correlation values presented in **Supplementary Table S7**, that some lncRNAs are hub genes of the network. **Figures 12A**, **B** show lncRNA hub genes in the co-expression networks from the purple and tan modules, which are strongly correlated with the miracidia/sporocysts and cercariae life-cycle stages, respectively. In both modules, the lncRNAs represent around half of the transcripts that comprise the modules (see **Table 2**). However, there are some cases, such as in the red module, where three quarters of the member transcripts are lncRNAs, and among the most connected genes in that co-expression network, almost all are lncRNAs (**Figure 12C**). In contrast, in the blue module, only 16% of the member transcripts are lncRNAs, and only one is among the most connected genes in the co-expression network (**Figure 12D**). All the gene networks for all modules, in a format compatible with Cytoscape, are available in **Supplementary Table S8**. An adjacency cutoff threshold of 0.1 was used.

to calculate the statistical significance of the expression differences among the parasite stage samples (\*p value≤0.05; \*\*p value≤0.01; \*\*\*\*p value≤0.0001). For clarity purposes, we show only the highest p value obtained in the ANOVA Tukey test for expression comparisons against one another among the stages.

## LncRNAs Expressed in Single Cells

Finally, analyses using single-cell data from two stages, mother sporocyst stem cells and juvenile stem cells, identified three different clusters. Cluster 1 is composed of a subgroup of juvenile stem cells, cluster 2 is composed of all mother sporocyst stem cells, and cluster 3 is composed of a second and smaller subgroup of juvenile stem cells (**Figure 13A**). The marker gene analyses show, for the first time in *S. mansoni*, that lncRNAs also have specific expression at the single-cell level: of the top 10 markers that allow us to differentiate mother sporocyst stem cells from juvenile stem cells, eight are lncRNAs (**Figure 13B**), confirming the stage specificity of lncRNAs also seen in whole-worm analyses by WGCNA. In addition, another lncRNA was identified as a marker for cluster 3 when compared with the other two clusters (**Figure 13B**).

## DISCUSSION

When the human genome was first sequenced, the vast genomic regions that lie between protein-coding genes (intergenic regions) were considered junk DNA; one decade later, the Encyclopedia of DNA Elements (ENCODE) project found that 80% of the human genome serves some biochemical purpose (Pennisi, 2012), including giving rise to the transcription of nearly 10,000 lncRNAs (Derrien et al., 2012). Although the study of lncRNAs is still at an early stage, with the vast majority of their roles and mechanisms of action in human beings still unknown, it is now clear that most lncRNAs are transcribed from intergenic regions and are key regulators in vital processes (Kitagawa et al., 2013; Rosa and Ballarino, 2016; Golicz et al., 2018), and that they are associated with several pathologies in humans, such as cancer (Fang and Fullwood, 2016), Alzheimer's (Zijian, 2016), and cardiac diseases (Simona et al., 2018).

In *S. mansoni*, with the release in 2012 of version 5.2 of the genome and annotated transcriptome (Protasio et al., 2012), and with the accumulation until 2017 of large amounts of information on gene expression obtained through 88 publicly available RNA-seq libraries, our group decided to map the RNA-seq data and identify the lncRNAs repertoire expressed in this parasite (Vasconcelos et al., 2017); this was followed by two other papers that provided an additional set of lncRNAs

FIGURE 11 | Top 30 Gene Ontology most significantly enriched terms for protein-coding genes belonging to the red and green co-expression network modules. At left are the enriched GO term annotations. For the (A) red (testes) and the (B) green (gonads) modules, the enriched GOs are separately represented into the three major GO term categories, namely Biological Process, Cellular Component and Molecular Function. No Molecular Function term was significantly enriched in the green module. The size of the circles is proportional to the number of genes (counts scale on the right) in each significantly enriched GO category, and the colors show the statistical significance of the enrichment, as indicated by the -log10 FDR values (color-coded scales at the right).

(Liao et al., 2018; Oliveira et al., 2018). In the present work, by extending the analysis to 633 publicly available RNA-seq libraries, and by performing a detailed curation of the assembled transcripts, we observed that at the sequencing depth obtained with the current RNA-seq data sets, a considerable amount of partially processed pre-mRNAs is being sequenced. These pre-mRNAs give rise to assembled transcript units showing intron retention and frequent stop codons in the retained introns, and therefore, these transcripts can be mistakenly annotated as lncRNAs. In fact, the failure to identify partially processed pre-mRNA in previous publications (Vasconcelos et al., 2017; Liao et al., 2018) may explain the report of probable protein-coding genes as lncRNAs (**Supplementary Figure S1**). Our current pipeline has removed at step 4 a total of 31,183 assembled transcripts that had partial or total exon-exon overlap on the same genomic strand with known *S. mansoni* protein-coding genes, and this included around 14,000 assembled transcripts that represented fully processed mature protein-coding transcripts that exactly matched the annotated v 7.1 transcripts from the Wellcome Sanger Institute, as well as some 17,000 assembled transcripts that for the most part represent partially processed pre-mRNAs with intron retention; among the latter are 4,293 transcripts that were previously classified as lncRNAs (Vasconcelos et al., 2017; Liao et al., 2018; Oliveira et al., 2018) and are now excluded. With the six stringent filtering steps used in the present work, we are confident that our final set of 16,583 lncRNAs is a robust representation of the lncRNA complement expressed in *S. mansoni*, of which 11,022 transcripts are novel lncRNAs, and 5,561 have gene overlap with lncRNAs already reported in previous works (Vasconcelos et al., 2017; Liao et al., 2018; Oliveira et al., 2018).
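The step 4 criterion described above (partial or total exon-exon overlap with a known protein-coding gene on the same genomic strand) can be sketched as follows. This is only an illustration of the overlap rule, not the pipeline used in the study: the function names, the dictionary layout, the 1-based inclusive coordinates, and the toy records are assumptions.

```python
def exons_overlap(exons_a, exons_b):
    """True if any exon in exons_a shares at least one base with any exon
    in exons_b. Exons are (start, end) tuples, 1-based and inclusive."""
    return any(
        a_start <= b_end and b_start <= a_end
        for a_start, a_end in exons_a
        for b_start, b_end in exons_b
    )

def remove_same_strand_overlaps(transcripts, coding_genes):
    """Return IDs of transcripts with no exon-exon overlap with any
    protein-coding gene on the same chromosome and the same strand."""
    kept = []
    for transcript in transcripts:
        clash = any(
            gene["chrom"] == transcript["chrom"]
            and gene["strand"] == transcript["strand"]
            and exons_overlap(transcript["exons"], gene["exons"])
            for gene in coding_genes
        )
        if not clash:
            kept.append(transcript["id"])
    return kept

# Hypothetical records: coordinates and identifiers are invented.
coding_genes = [
    {"chrom": "chr1", "strand": "+", "exons": [(100, 200), (300, 400)]},
]
transcripts = [
    {"id": "T1", "chrom": "chr1", "strand": "+", "exons": [(150, 250)]},  # retained-intron-like, removed
    {"id": "T2", "chrom": "chr1", "strand": "-", "exons": [(150, 250)]},  # antisense, kept
    {"id": "T3", "chrom": "chr1", "strand": "+", "exons": [(210, 290)]},  # within the intron only, kept
    {"id": "T4", "chrom": "chr2", "strand": "+", "exons": [(100, 200)]},  # other chromosome, kept
]

print(remove_same_strand_overlaps(transcripts, coding_genes))
```

A transcript such as T1, which extends from an annotated exon into the adjacent intron, is the kind of intron-retention unit that this rule discards, whereas an antisense transcript over the same coordinates is retained.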

One question that has been raised about lncRNAs is the possibility that their function is executed through translation into short peptides, a concern that arises from the fact that almost all lncRNAs encode short canonical ORFs within their sequences (Verheggen et al., 2017); the fact that the size distribution of ORFs found within our set of lncRNAs (sense) is very similar to

value for each transcript.

and another from mother sporocysts' stem cells, were analyzed with the SC3 tool that performed an unsupervised clustering of the cells based on the single-cell gene expression data. Principal component analysis plot, where the symbol colors and sizes indicate the three clusters identified by SC3, and the shapes indicate the two life-cycle stages from which the stem cells were isolated. The symbol size is inversely related to the number of cells that belong to the cluster. (B) In the marker-gene expression matrix (log-transformation, represented by the color scale), the statistically significant gene markers are the rows, and the cells are columns. The life-cycle stage from which each cell was isolated is indicated by the color bar at the top (stages). The clusters of cells are separated by white vertical lines and are indicated by the second color bar at the top (clusters). The cluster marker genes are separated by white horizontal lines, the markers groups are indicated at left, and the names of the marker genes at right. Only the top 10 most significant marker genes are shown for cluster 2.

the size distribution of random ORFs found within their reverse-complement sequences and within the reverse-complement sequence of mRNAs suggests that the putative short ORFs from the lncRNAs identified here are indeed random ORFs, most probably not translated into short functional peptides. Nevertheless, future functional characterization in *S. mansoni* of selected lncRNAs may eventually include a search for a possible dual function role (Nam et al., 2016; Choi et al., 2018) both as lncRNA and through a translated short peptide.
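The comparison between ORFs in a transcript and ORFs in its reverse complement can be sketched in a few lines. This is an assumed, simplified ORF definition (ATG to the first in-frame stop codon, three forward frames, length counted in codons without the stop), not the tool or parameters used in the study, and the toy sequence is invented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def orf_lengths(seq):
    """Sorted lengths, in codons excluding the stop codon, of ORFs that run
    from an ATG to the first in-frame stop, over the three forward frames."""
    seq = seq.upper()
    lengths = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                lengths.append((i - start) // 3)
                start = None
    return sorted(lengths)

# Invented toy sequence; real transcripts would be read from a FASTA file.
toy_transcript = "ATGAAATAGCTACCCCAT"

sense_orfs = orf_lengths(toy_transcript)
control_orfs = orf_lengths(reverse_complement(toy_transcript))
print(sense_orfs, control_orfs)
```

Applied to a whole transcript set, the two resulting length distributions are what the argument above compares: if the sense ORFs are no longer than those arising by chance in the reverse complement, they are consistent with random ORFs.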

Histone marks were found here at the TSS of lncRNAs, and the identification of different sets of lncRNAs that have at their TSS the transcriptional activation H3K4me3 mark, or the repressive H3K27me3 mark, when the three life-cycle stages are compared, suggests that lncRNA expression in *S. mansoni* is regulated by an epigenetic program. This finding reinforces the hypothesis that different lncRNAs may play important roles along the parasite life-cycle, and the sets of lncRNAs identified in this analysis might be the first candidates to be explored for further functional characterization.

Gene co-expression networks correlated to the different *S. mansoni* life-cycle stages were identified by our analyses, and they pointed to sets of protein-coding genes and lncRNAs with expression most correlated to one given stage. This information provides an initial platform for prioritizing the lncRNAs to be selected for further direct functional characterization, which will include a search for altered *S. mansoni* phenotypes upon knockdown of lncRNA candidates. In *Plasmodium falciparum*, the knockdown of antisense lncRNAs down-regulated the active var gene, a gene related to immune evasion, erasing the epigenetic memory and substantially changing the var gene expression pattern (Amit-Avraham et al., 2015). By analogy, it is expected that characterization of lncRNAs in *S. mansoni* will help to recognize the biochemical pathways where they play a functional role, will allow the identification of their interacting protein partners, and will eventually point to relevant ways of intervening in the parasite physiology.

Due to the complex and diverse mechanisms displayed by lncRNAs in regulating protein-coding genes and miRNAs, the majority of studies have not progressed beyond cell or animal models, and progression toward the clinic has been slow (Harries, 2019). Nevertheless, lncRNAs represent potentially good therapeutic targets (Matsui and Corey, 2017; Blokhin et al., 2018; Harries, 2019). As reviewed by Matsui and Corey (2017), in a mouse model of Angelman syndrome, the administration of antisense oligonucleotides (ASOs), which target the Ube3a‐ATS lncRNA for degradation, partially reversed some cognitive defects associated with the disease in the animals (Meng et al., 2014). Also, in xenograft melanoma models, the intravenous injection of ASOs targeting the lncRNA SAMMSON caused p53 activation, tumor growth suppression, decreased cell proliferation, and increased apoptosis (Leucci et al., 2016). In this respect, it is noteworthy that lncRNAs are considerably less conserved between species when compared with protein-coding genes (Pang et al., 2006; Blokhin et al., 2018), and that only a few dozen ancient lncRNAs have conserved orthologs between ancient non-amniote *Xenopus* and the closest amniote chicken model animals (Necsulea et al., 2014), which shows that lncRNAs have evolutionarily conserved gene regulatory functions but low sequence conservation across distant species (Necsulea et al., 2014). This feature reduces the chances that targeting a lncRNA in *S. mansoni*, for example, with ASOs, will cause unwanted off-target effects against the mammalian host.

#### DATA AVAILABILITY

The data sets analyzed in this study can be found in the SRA repository (https://www.ncbi.nlm.nih.gov/sra) and in the ENA repository (https://www.ebi.ac.uk/ena). The specific accession numbers for each and all data sets that were downloaded from these databases and used here are given in **Supplementary Table S1**.

#### ETHICS STATEMENT

All protocols involving animals were conducted in accordance with the Ethical Principles in Animal Research adopted by the Brazilian College of Animal Experimentation (COBEA), and the protocol/experiments have been approved by the Ethics Committee for Animal Experimentation of Instituto Butantan (CEUAIB Protocol number 1777050816).

### AUTHOR CONTRIBUTIONS

LM, MA and SV-A conceived the project. LM and SV-A designed the experiments and wrote the paper. LM and DM-V performed the *in silico* analyses. MA, GS, RR, and GO performed the wet lab experiments and analyses. LM and SV-A analyzed and interpreted the data. DP contributed with informatic resources.

### FUNDING

This work was supported by the Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) grant numbers 2014/03620-2 and 2018/23693-5 to SV-A. LM, GS and RR received FAPESP fellowships (grant numbers 2018/19591- 2, 2018/24015-0 and 2017/22379-2, respectively) and DM-V received a fellowship from Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq). SV-A laboratory was also supported by institutional funds from Fundação Butantan and received an established investigator fellowship award from CNPq, Brasil.

#### ACKNOWLEDGMENTS

We thank Dr. J.C. Setubal for access to the computational facilities of the Bioinformatics Laboratory of Instituto de Química, Universidade de São Paulo (USP). We also acknowledge Patricia Aoki Miyasato and Dr. Eliana Nakano, Laboratorio de Malacologia, Instituto Butantan, for maintaining the *S. mansoni* life cycle.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2019.00823/full#supplementary-material

#### REFERENCES


cis-regulatory elements required for macrophage and B cell identities. *Mol. Cell* 38, 576–589. doi: 10.1016/j.molcel.2010.05.004


explain observed absence of lncRNA translation products. *J. Proteome Res.* 16, 2508–2515. doi: 10.1021/acs.jproteome.7b00085


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Maciel, Morales-Vicente, Silveira, Ribeiro, Olberg, Pires, Amaral and Verjovski-Almeida. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Evaluation of DNA Extraction Methods on Individual Helminth Egg and Larval Stages for Whole-Genome Sequencing

*Stephen R. Doyle1\*, Geetha Sankaranarayanan1, Fiona Allan2, Duncan Berger 1, Pablo D. Jimenez Castro3,4, James Bryant Collins3, Thomas Crellen1,5, María A. Duque-Correa1, Peter Ellis 1, Tegegn G. Jaleta6, Roz Laing7, Kirsty Maitland7, Catherine McCarthy 1, Tchonfienet Moundai 8, Ben Softley 1, Elizabeth Thiele9, Philippe Tchindebet Ouakou8, John Vianney Tushabe1,10, Joanne P. Webster 11, Adam J. Weiss 12, James Lok 6, Eileen Devaney7, Ray M. Kaplan3, James A. Cotton1, Matthew Berriman1 and Nancy Holroyd1\**

*1 Parasites and Microbes, Wellcome Sanger Institute, Hinxton, United Kingdom, 2 Department of Life Sciences, Natural History Museum, London, United Kingdom, 3 Department of Infectious Diseases, College of Veterinary Medicine, University of Georgia, Athens, GA, United States, 4 Grupo de Parasitologia Veterinaria, Universidad Nacional de Colombia, Bogotá, Colombia, 5 Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom, 6 Department of Pathobiology, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, PA, United States, 7 Institute of Biodiversity Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom, 8 Ministry of Public Health, N'Djamena, Chad, 9 Department of Biology, Vassar College, Poughkeepsie, NY, United States, 10 Medical Research Council/Uganda Virus Research Institute and London School of Hygiene & Tropical Medicine Uganda Research Unit, Entebbe, Uganda, 11 Centre for Emerging, Endemic and Exotic Diseases, Department of Pathology and Population Sciences, Royal Veterinary College, University of London, Herts, United Kingdom, 12 Guinea Worm Eradication Program, The Carter Center, Atlanta, GA, United States*

Whole-genome sequencing is being rapidly applied to the study of helminth genomes, including *de novo* genome assembly, population genetics, and diagnostic applications. Although late-stage juvenile and adult parasites typically produce sufficient DNA for molecular analyses, these parasitic stages are almost always inaccessible in the live host; immature life stages found in the environment for which samples can be collected noninvasively offer a potential alternative; however, these samples typically yield very low quantities of DNA, can be environmentally resistant, and are susceptible to contamination, often from bacterial or host DNA. Here, we have tested five low-input DNA extraction protocols together with a low-input sequencing library protocol to assess the feasibility of whole-genome sequencing of individual immature helminth samples. These approaches do not use whole-genome amplification, a common but costly approach to increase the yield of low-input samples. We first tested individual parasites from two species spotted onto FTA cards—egg and L1 stages of *Haemonchus contortus* and miracidia of *Schistosoma mansoni*—before further testing on an additional five species—*Ancylostoma caninum*, *Ascaridia dissimilis*, *Dirofilaria immitis*, *Strongyloides stercoralis*, and *Trichuris muris*—with an optimal protocol. A sixth species—*Dracunculus medinensis*—was included for comparison. Whole-genome sequencing followed by analyses to determine the proportion of on- and off-target mapping revealed successful sample preparations for six of the eight species tested with variation both between species and between different life stages from some species described. These results demonstrate the feasibility of whole-genome sequencing of individual parasites, and highlight a new avenue toward

#### *Edited by:*

*Jose F. Tort, University of the Republic, Uruguay*

#### *Reviewed by:*

*Masoud Zamani Esteki, Maastricht University Medical Centre, Netherlands Neil David Young, The University of Melbourne, Australia Guilherme Corrêa De Oliveira, Vale Technological Institute (ITV), Brazil*

#### *\*Correspondence:*

*Stephen R. Doyle stephen.doyle@sanger.ac.uk Nancy Holroyd neh@sanger.ac.uk*

#### *Specialty section:*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Genetics*

*Received: 26 April 2019 Accepted: 12 August 2019 Published: 20 September 2019*

#### *Citation:*

*Doyle SR, Sankaranarayanan G, Allan F, Berger D, Jimenez Castro PD, Collins JB, Crellen T, Duque-Correa MA, Ellis P, Jaleta TG, Laing R, Maitland K, McCarthy C, Moundai T, Softley B, Thiele E, Ouakou PT, Tushabe JV, Webster JP, Weiss AJ, Lok J, Devaney E, Kaplan RM, Cotton JA, Berriman M and Holroyd N (2019) Evaluation of DNA Extraction Methods on Individual Helminth Egg and Larval Stages for Whole-Genome Sequencing. Front. Genet. 10:826. doi: 10.3389/fgene.2019.00826*


generating sensitive, specific, and information-rich data for the diagnosis and surveillance of helminths.

Keywords: helminths, genomics, whole-genome sequencing, DNA extraction, low input, diagnostics

#### INTRODUCTION

Accurate methods for diagnosis and surveillance of helminth infections are of increasing interest in both human and animal health settings. Such approaches are typically proposed to monitor the presence and ultimately decline of populations targeted by large-scale control measures, such as mass drug administration (MDA) for the prevention and/or treatment of human helminth infections, or prophylactic treatment of domesticated animals. An ideal diagnostic will be sensitive to detect the parasite if in fact present, and specific, to identify the targeted parasite species in the presence of non-target material, such as other parasite species or the host. Ideally, samples taken for diagnostic purposes could be used to gather additional information beyond the presence or absence of a specific parasite, so the same material could be used, for example, to predict how well the infection will respond to drug treatment or how the parasite is related to other endemic or imported parasites. As most parasitic stages of helminths of humans and animals are naturally inaccessible *in vivo* (not accounting for potential availability of some mature stages of helminths following chemo-expulsion, for example, *Ascaris lumbricoides* and *Trichuris trichiura*), a diagnostic should also be informative on non-invasive stages of the parasite, such as eggs deposited in feces, or intermediate stages of the parasite's life cycle that exist in the external environment.

A key challenge of working with environmental stages of helminth parasites is that they are often immature, for example, eggs or early stage larvae, and extremely small (for example, *Haemonchus contortus* eggs are approximately 75 × 44 μm and *Schistosoma mansoni* miracidia approximately 140 × 55 μm), limiting the amount of accessible material (e.g., DNA) available to be assayed. They are often environmentally resistant, and the same features that naturally protect the DNA from damage prior to reinfection make it difficult to extract DNA. In many cases, they are isolated from host feces, and so are susceptible to bacterial contamination, or from host tissues, and so become contaminated with host DNA. Furthermore, samples may need to be transported efficiently to a laboratory setting without a significant loss of this already limited material. A number of approaches have been tested to preserve macromolecules from individual parasites for transport and storage, including ethanol, RNAlater, and Whatman® FTA® cards, from which robust PCR and microsatellite data could be profiled (Gower et al., 2007; Webster, 2009; Webster et al., 2012; Xiao et al., 2013; Marek et al., 2014; Boué et al., 2017; Campbell et al., 2017). Although, under ideal conditions, the detection of a single DNA molecule is possible, the limited material available per parasite has, to date, largely restricted assaying to a small number of loci, limiting the amount of information obtained from any individual parasite.

Genomic approaches offer an information-rich technology for diagnostic and surveillance applications. Increasing throughput and decreasing costs of whole-genome sequencing have resulted in the recent and steadily growing application of genomics in helminth parasitology. For diagnostic applications, for example, high-throughput amplicon sequencing for helminth species identification and community composition (Avramenko et al., 2015) and for detecting the presence of drug resistance alleles (Avramenko et al., 2019) has been described. Although low DNA concentrations are typically prohibitive for genome-wide approaches on individual parasites, a number of studies have successfully used whole-genome amplification on DNA extracted from single larval stages to perform reduced representation (Shortt et al., 2017) and exome (Le Clec'h et al., 2018; Platt et al., 2019) sequencing on miracidia of *Schistosoma* spp*.*, and whole-genome sequencing of *Haemonchus contortus* L3 stage larvae (Doyle et al., 2018) and microfilariae of *Wuchereria bancrofti* (Small et al., 2018). Whole-genome amplification protocols do, however, add considerable expense per sample and can introduce technical artefacts, such as uneven and/or preferential amplification (potentially of contaminant sequences), chimeric sequences, and allele dropout (Tsai et al., 2014; Sabina and Leamon, 2015), that may lead to a reduction in genetic diversity and, in turn, in relevance to the original unamplified material. The field of genomics is, however, rapidly advancing toward very low minimum sample input requirements, and single-cell protocols for DNA and RNA sequencing are now available. Such approaches have begun to be used on parasitic species, such as *Plasmodium* spp*.* (Trevino et al., 2017; Ngara et al., 2018; Reid et al., 2018; Howick et al., 2019), but are yet to be adopted by helminth parasitologists.
Although these low-input, high-throughput approaches are not designed, and are perhaps not currently suitable, for diagnostic applications, the development of molecular biology techniques for low-input sequencing could aid in the use of genomics for helminth applications. Here, we tested a number of low-input DNA extraction approaches for individual helminth samples stored on Whatman® FTA® cards, followed by low-input library preparation without whole-genome amplification and whole-genome sequencing. A total of five DNA extraction approaches were initially tested, after which the most promising approach was applied to multiple life stages from seven helminth species (with an additional species presented for comparison). The results presented here demonstrate the advancement of low-input whole-genome sequencing and may be broadly applicable to helminth and non-helminth species for which low-input whole-genome sequencing is to be performed. Finally, we discuss our results in the context of helminth diagnostics and surveillance.

### METHODS

#### Sample Collection

Samples representing accessible, immature life stages of a total of eight helminth species were tested, the collection of which is described below. A number of protocols and substrates or solutions have been described for the safe, effective storage of samples for downstream molecular biology applications; we chose Whatman® FTA® cards as a substrate to store and transport parasite material, as they are a relatively cost-effective method for storing samples without the need for specialized storage and transport conditions, for example, a cold chain from collection to sequencing. Removal of the logistical hurdles and their associated costs is particularly important for collection of specimens from endemic regions, for example, *S. mansoni* miracidia that were isolated from patients and purified in Africa before being transported and stored on FTA cards for downstream processing in the UK, as described here.

*Ancylostoma caninum:* Fresh feces from a research purpose-bred laboratory beagle (University of Georgia, AUP A2017 10-016-Y1-A0) infected with the Barrow isolate (drug-susceptible isolate from Barrow County, Georgia, USA) were collected and made into a slurry with water, filtered through 425 and 180-µm sieves, and centrifuged at 2500 rpm for 5 min, after which the supernatant was discarded. Kaolin (Sigma-Aldrich, St. Louis, MO) was then added and resuspended in sodium nitrate (SPG 1.25–1.3) (Feca-Med®; Vedco, Inc. St Joseph, MO, USA). The tube was then centrifuged at 2500 rpm for 5 min, after which the supernatant was passed through a 30-µm sieve and rinsed with distilled water, and reduced to a volume of 10 to 15 mL. The volume was adjusted to one egg per 5 µl using distilled water. The eggs were stored at room temperature for 2 h before placing them onto the Whatman® FTA® cards. Eggs were also placed onto Nematode Growth Medium (NGM) plates (Sulston and Hodgkin, 1988) and incubated at 26°C to obtain the first-stage (L1) larvae. After 48 h, larvae were rinsed off the plate with distilled water and centrifuged at 1000 rpm for 5 min. Larvae were counted, and the concentration adjusted to one larva per 5 µl. The larvae were stored at room temperature for 2 h before placing them onto the Whatman® FTA® cards. To obtain third-stage (L3) larvae, eggs were isolated from fresh feces from a research purpose-bred laboratory beagle (University of Georgia, AUP A2017 10-016-Y1-A0) infected with the Worthy isolate (Worthy 3.1F3Pyr; multiple-drug resistant isolate originally isolated from a greyhound dog, Florida, USA). Eggs were placed onto NGM plates and incubated at 26°C. After 7 days, larvae were rinsed off the plate with distilled water and centrifuged at 1000 rpm for 5 min. Larvae were counted, and the concentration adjusted to one larva per 5 µl. The larvae were stored at room temperature for 2 h before placing them onto the Whatman® FTA® cards.

*Ascaridia dissimilis*: Eggs of *A. dissimilis* (Isolate Wi: North Carolina, USA) were isolated from excreta of experimentally infected turkeys. Water was added to the excreta and made into a slurry, which was filtered using a 425-µm and 180-µm sieve to remove large debris. The remaining particulates were placed into 50-mL centrifuge tubes and centrifuged at 433*g* for 7 min. Supernatant was removed, and the pellet was resuspended in a saturated sucrose solution with a specific gravity of 1.15. The suspension was centrifuged as before, and eggs were isolated from the top layer. Eggs were rinsed over a 20-µm sieve with water to remove residual sucrose and then concentrated to one egg per 5 µl using deionized water. Multiple 5-µl aliquots of the egg solution were dispensed using a micropipette onto the Whatman® FTA® card.

*Dirofilaria immitis*: Blood was taken from a dog infected with the macrocyclic lactone (ML)-resistant Yazoo strain (Yazoo: originally isolated from a dog in Yazoo City, Mississippi, USA; see Maclean et al. (2017) for complete history). To obtain microfilariae, blood was collected in heparin tubes and centrifuged for 30 min at 2500 rpm, after which the supernatant was discarded. The pellet was suspended in 3.8% sodium citrate (Sigma-Aldrich, St. Louis, MO) and 15% saponin (Sigma-Aldrich) was added in a 1:7 dilution. The tube was then vortexed and centrifuged for 30 min at 2500 rpm, after which the supernatant was discarded and the pellet resuspended in 3.8% sodium citrate to the original blood volume, vortexed, and then centrifuged for 4 min at 2500 rpm. The pellet was then resuspended and mixed in a 1:9 solution of 10× phosphate-buffered saline (PBS) (Thermo Fisher Scientific, Waltham, MA) and distilled water. The tube was then centrifuged for 4 min at 2500 rpm, and the pellet resuspended in PBS. The microfilariae were then counted and adjusted accordingly to have one microfilaria per 5 µl and stored at room temperature for 2 h before placing them onto the Whatman® FTA® cards.

*Dracunculus medinensis*: Individual L1 samples were obtained as progeny of an adult female worm manually extracted from an infected dog in Tarangara village, Chad (9.068611 N, 18.708611 E) in 2016. This extraction forms part of the standard containment and treatment procedure for Guinea worm infections, as agreed upon and sanctioned by the World Health Organization and country ministries of health. The adult worm was submerged in ethanol in a microcentrifuge tube for storage; L1 stage progeny that were found settled on the bottom of the tube were collected for analysis.

*Haemonchus contortus:* Eggs representing the F5 generation of a genetic cross (described in Doyle et al. (2018)) were collected from fresh feces from experimentally infected sheep housed at the Moredun Research Institute, UK. All experimental procedures were examined and approved by the Moredun Research Institute Experiments and Ethics Committee and were conducted under approved UK Home Office licenses (PPL 60/03899) in accordance with the Animals (Scientific Procedures) Act of 1986. Briefly, feces were mixed with tap water and passed through a 210-µm sieve, then centrifuged at 2500 rpm for 5 min in polyallomer tubes. The supernatant was discarded, before adding kaolin to the fecal pellet, vortexing, and resuspending in a saturated salt solution. After centrifugation at 1000 rpm for 10 min, the polyallomer tube was clamped to isolate eggs, which were collected on a 38-µm sieve and rinsed thoroughly with tap water. Eggs were incubated on NGM plates at 20°C for 48 h to hatch to L1 stage larvae. In addition to freshly collected material, eggs collected in the same manner from a previous generation of the cross, then stored at −20°C, were also tested. Eggs and L1 larvae were resuspended in PBS and spotted onto Whatman FTA cards in 3 µl per egg/L1.

*Schistosoma mansoni*: Three collections of *S. mansoni* samples were used in this work. The first were field samples collected from humans in Lake Victoria fishing villages in Uganda as part of the LaVIISWA trial (Sanya et al., 2018). Ethical approval for this trial was given by the Uganda Virus Research Institute (reference number GC127), the Uganda National Council for Science and Technology (reference number HS 1183), and the London School of Hygiene & Tropical Medicine (reference number 6187). Parasite eggs were collected from participants' stool samples using a Pitchford-Visser funnel, washed with mineral water until clean, and transferred into a petri dish with water to be hatched in direct sunlight. After hatching, the miracidia were picked in 2-µl water using a pipette and placed on a Whatman® FTA® card for storage. The second were field samples collected as part of a repeated cross-sectional study of MDA exposure in school children in Uganda. Patient enrolment, including written consent, and sample collection have been described previously (Crellen et al., 2016). Ethical approvals for this study were granted by the Uganda National Council of Science and Technology (MoU sections 1.4, 1.5, 1.6) and the Imperial College Research Ethics Committee (EC NO: 03.36. R&D No: 03/SB/033E). Host stool was sampled 1 to 3 days prior to treatment with praziquantel (40 mg/kg) and albendazole (400 mg). A Pitchford-Visser funnel was used to wash and filter stool to retain parasite eggs. The filtrate was kept overnight in water and hatched the following morning in sunlight. Individual miracidia were isolated with a 20-μL pipette and transferred into petri dishes of nuclease-free water twice before spotting onto Whatman® FTA® cards. The third source of miracidia was derived from the livers of experimentally infected mice kept to maintain the *S. mansoni* life cycle at Wellcome Sanger Institute. 
Mouse infection protocols were approved by the Animal Welfare and Ethical Review Body (AWERB) of the Wellcome Sanger Institute, and in accordance with the UK Home Office approved project license P77E8A062. The AWERB is constituted as required by the UK Animals (Scientific Procedures) Act 1986 Amendment Regulations 2012. Balb/C mice (6–8 weeks old) were infected with 250 cercariae, after which livers were collected on day 40 postinfection. Eggs were isolated from the liver tissues using collagenase digestion followed by percoll gradient, and were washed well with sterile PBS, before being hatched in sterile conditioned water. The hatched individual miracidia were spotted onto Whatman® FTA® cards.

*Strongyloides stercoralis:* The *S. stercoralis* UPD strain and the isofemale isolate PVO1 were maintained in purpose-bred, prednisone-treated mixed-breed dogs according to protocol 804883 approved by the University of Pennsylvania Institutional Animal Care and Use Committee (IACUC), USA. IACUC-approved research protocols and all routine husbandry care of the animals were conducted in strict accordance with the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health, USA. Feces were collected, moisturized, mixed with an equal volume of charcoal, and cultured at 22°C in 10-cm plates (Lok, 2007). Post-parasitic stage one and two larvae (L1/L2), free-living male and female adults, and infective third-stage larvae (L3) were isolated by the Baermann technique after 24 h, 48 h, and 6 days in these charcoal coprocultures, respectively (Lok, 2007; Jaleta et al., 2017). For this study, free-living females (FL), L1, and L3 larvae were collected from the Baermann funnel sediments and washed three times using PBS.

*Trichuris muris*: Infection and maintenance of *T. muris* was conducted as described (Wakelin, 1967). The care and use of mice were in accordance with the UK Home Office regulations (UK Animals Scientific Procedures Act 1986) under the Project license P77E8A062 and were approved by the institutional AWERB. Female SCID mice (6–10 wk old) were orally infected under isoflurane anesthesia with a high dose (n = 400) of embryonated eggs from *T. muris* E-isolate. Mice were monitored daily for general condition and weight loss. At day 35 postinfection, mice were killed by exsanguination under terminal anesthesia, after which adult worms were harvested from cecums. Adult worms were cultured in RPMI 1640 supplemented with 10% fetal calf serum (v/v), 2 mM L-glutamine, penicillin (100 U/mL), and streptomycin (100 mg/mL; all Invitrogen), for 4 h or overnight, and eggs were collected. The eggs were allowed to embryonate for at least 6 weeks in distilled water, and infectivity was established by worm burden in SCID mice. *T. muris* eggs were hatched to produce sterile L1 larvae using 32% sodium hypochlorite in sterile water for 2 h at 37°C with 5% CO2. Eggs were washed with RPMI 1640 supplemented with 10% fetal calf serum (v/v), 2 mM L-glutamine, penicillin (100 U/mL), and streptomycin (100 mg/mL; all Invitrogen), and incubated at 37°C with 5% CO2 for 4 to 5 days until they hatched.

For each species, unless otherwise stated, pools of individuals were washed in sterile PBS, before being transferred to a petri dish. Individuals were identified under the microscope, after which 5-µl PBS containing an individual parasite was transferred onto a Whatman® FTA® card and dried for a minimum of 20 min at room temperature prior to storage or shipping to the Wellcome Sanger Institute, UK. The Whatman® FTA® cards with samples spotted were stored in a clean plastic bag in the dark at room temperature prior to analysis.

#### DNA Extraction

Five DNA extraction methods were tested for their ability to isolate and purify DNA compatible with whole-genome sequencing approaches. The choice of approaches was not systematic or comprehensive; rather, these were the low-input DNA extraction approaches available to us at the Wellcome Sanger Institute that had the potential for high-throughput and/or automated extraction protocols to generate libraries suitable for whole-genome sequencing. Despite the arbitrary choice of extraction kits, the comparison of multiple kits with multiple helminth species provides a unique data set that is not easily achieved at this scale outside of a research and development environment, and forms the basis for further comparison with other low-input kits as they become available.

To extract DNA, the sample spots on FTA cards were punched out into 96-well plates, either manually using a Harris Punch or automatically using robotics. The DNA extraction was carried out for each method as described below:

1. *Nexttec* (NXT): Extraction using the Nexttec 1-step DNA Isolation Kit for Tissues & Cells (cat: 10N.904; Waendel Technology Limited, UK) was performed according to manufacturer's guidelines. Proteinase lysis buffer (75 µl) was used for digestion.


When the extracted DNA from any of the above methods was present in a volume greater than 25 µl, samples were cleaned with Agencourt AMPure XP beads (Beckman-Coulter) and eluted in 25 µl of nuclease-free H2O. The entire DNA sample was used downstream to make sequencing libraries. A summary of the number of species, life stages, and conditions tested is presented in **Table 1**. The differences in the number of samples per species and per assay largely reflect the samples available to us, which were collected at different times and sometimes for different purposes. For example, most test conditions contained between 6 and 10 replicates, whereas the large number of *D. medinensis* samples (n = 129 for one condition) were prepared specifically for a different study and are included here for comparative purposes. The *S. mansoni* samples (n = 168 for five conditions) were readily available in-house at the Wellcome Sanger Institute, and thus were used first for further validation, before samples from other species that required collection from live animal hosts.

#### Library Preparation and Sequencing

DNA sequencing libraries for all samples were prepared using a protocol designed for library preparation of laser capture microdissected biopsy (LCMB) samples using the Ultra II FS enzyme (New England Biolabs) for DNA fragmentation as previously described (Lee-Six et al., 2019). A total of 12 cycles of PCR were used (unless otherwise stated in **Table S1**) to amplify libraries and to add a unique 8-base index sequence for sample multiplexing. Prior to sequencing, library concentration was determined using a fluorometric dsDNA quantification assay (AccuClear® Ultra High Sensitivity dsDNA Quantitation Kit; Biotium) following the manufacturer's instructions, and measured using a FLUOstar Omega fluorescence plate reader (BMG Labtech). These data were used to normalize the concentration of library DNA for multiplexing.
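As an illustration of how fluorometric concentrations might be used to normalize libraries before pooling, the sketch below computes the volume of each library needed to contribute an equal mass of DNA. The function name, the target mass, and the example concentrations are hypothetical and are not taken from the protocol used here.

```python
def pooling_volumes(conc_ng_per_ul, target_ng):
    """Return the volume (in µl) of each library that contains target_ng of DNA.

    conc_ng_per_ul: mapping of library name -> measured concentration (ng/µl).
    target_ng: mass of DNA each library should contribute to the pool (assumed value).
    """
    volumes = {}
    for library, conc in conc_ng_per_ul.items():
        if conc <= 0:
            # A library with no measurable DNA cannot be normalized by dilution.
            raise ValueError(f"non-positive concentration for {library}")
        volumes[library] = target_ng / conc
    return volumes


# Hypothetical concentrations for three libraries.
example = {"lib_A": 0.5, "lib_B": 2.0, "lib_C": 1.0}
for name, volume in pooling_volumes(example, target_ng=1.0).items():
    print(f"{name}: {volume:.2f} µl")
```

Equal mass corresponds to equal molarity only when fragment size distributions are similar across libraries, which is a simplifying assumption of this sketch.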

Multiplexed libraries were sequenced using the Illumina MiSeq platform with V2 chemistry and 150 bp paired-end (PE) reads. The *D. medinensis* samples were sequenced as part of a different study using the HiSeq 2500 platform with V4 chemistry and 125 bp PE reads. Metadata for each sample, including sample IDs, sequencing lane IDs, ENA sample accession numbers, and data generated are described in **Table S1**. Raw sequence data used in this study are available from the European Nucleotide Archive (ENA; https://www.ebi.ac.uk/ena) under study ID ERP114942.

#### Analysis

We performed sufficient low coverage sequencing on each sample to enable us to identify: (i) the proportion of on-target mapped reads, (ii) the proportion of duplicate reads, i.e. library artefacts, and (iii) the proportion of off-target contaminant reads. On-target reads are defined as reads that mapped to the genome of the species from which DNA extraction was performed, and thus are a proxy for the specificity of the experiment. Similarly, off-target reads are defined as sequencing reads that do not map to the genome of the species from which DNA was extracted, but are derived from putative contaminants. The presence and proportion of contaminant-derived sequencing reads in each sequencing library were further analyzed using the kmer-based classification tool kraken (v0.10.6-a2d113dc8f) (Wood and Salzberg, 2014), which we used to assign taxonomic labels to raw sequencing data. Kmers from raw reads are compared with a kraken database (a custom database built at WSI containing human [GRCh38] and mouse [GRCm38] genomes, all plasmid, bacterial, and viral genomes, as well as all Illumina adapters, from NCBI at the date of generation [v.pi\_qc\_2015521]); any raw sequencing reads that match kmers present in the database are defined as contaminants, whereas reads that do not match the database are defined as "kraken\_unassigned." As no helminth genomes are present in the kraken database, the reads of interest will fall in the kraken\_unassigned category; a high proportion of "kraken\_unassigned" reads is therefore indicative of low contamination (or at least of few known contaminants present in the kraken database).

TABLE 1 | Summary of the species, life stages, and conditions tested using one or more of the DNA extraction approaches followed by whole-genome sequencing.
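The read categories defined above can be expressed as simple percentages. The following is a minimal sketch: the function names are hypothetical, and the use of total reads as the denominator for mapped and duplicate reads is an assumption rather than a documented detail of the analysis. The dictionary keys mirror the column names cited for Table S1.

```python
def contaminant_percent(kraken_unassigned_percent):
    """Putative contaminant reads (%), taken as 100 minus the kraken-unassigned percentage."""
    if not 0.0 <= kraken_unassigned_percent <= 100.0:
        raise ValueError("percentage must lie between 0 and 100")
    return 100.0 - kraken_unassigned_percent


def read_percentages(total_reads, mapped_reads, duplicate_reads):
    """On-target and duplicate reads as percentages of all reads (assumed denominator)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {
        "mapped_reads_percent": 100.0 * mapped_reads / total_reads,
        "duplicate_reads_percent": 100.0 * duplicate_reads / total_reads,
    }


# Hypothetical library: 1,000 reads, 900 mapped, 5 duplicates, 97.5% kraken-unassigned.
print(read_percentages(1000, 900, 5))
print(contaminant_percent(97.5))
```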

Reference genomes from each of the test species were obtained from the helminth genome repository WormBase Parasite (Howe et al., 2017) Release 12. Raw sequence data for each species were mapped to their respective reference genome (which included the mitochondrial genome) using *bwa* (v0.7.17-r1188) (Li, 2013) *mem* using default parameters (with the inclusion of -Y and -M options to use soft clipping for supplementary alignments, and to mark shorter hits as secondary, respectively), after which duplicate reads were marked using *Picard* (v2.5.0; https://github.com/broadinstitute/picard). *Samtools* (v1.3) flagstat and *bamtools* (v2.3.0) stats were used to characterise the outcome of the mapping, the results of which were collated using *MultiQC* v1.3 (Ewels et al., 2016). Data were manipulated and visualized in the *R* (v3.5.0) environment using the following packages: *ggplot2* (https://ggplot2.tidyverse.org/), *patchwork* (https://github.com/thomasp85/patchwork), and *dplyr* (https://dplyr.tidyverse.org/).
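The mapping steps described above could be assembled per sample roughly as follows. This sketch only builds the command strings and does not run them. The file names, the coordinate-sorting step, and the `picard.jar` location are assumptions made for illustration; the authors' actual commands are in their code repository. The reference is assumed to have been indexed beforehand with `bwa index`.

```python
import shlex


def mapping_commands(reference, reads_1, reads_2, prefix, picard_jar="picard.jar"):
    """Build (but do not execute) shell commands to map, mark duplicates, and summarise."""
    q = shlex.quote
    sorted_bam = f"{prefix}.sorted.bam"
    marked_bam = f"{prefix}.markdup.bam"
    return [
        # -Y: soft-clip supplementary alignments; -M: mark shorter split hits as secondary.
        f"bwa mem -Y -M {q(reference)} {q(reads_1)} {q(reads_2)} "
        f"| samtools sort -o {q(sorted_bam)} -",
        # Picard's legacy KEY=VALUE argument syntax.
        f"java -jar {q(picard_jar)} MarkDuplicates INPUT={q(sorted_bam)} "
        f"OUTPUT={q(marked_bam)} METRICS_FILE={q(prefix + '.metrics.txt')}",
        # Summary of mapped and duplicate-flagged reads.
        f"samtools flagstat {q(marked_bam)} > {q(prefix + '.flagstat.txt')}",
    ]


for command in mapping_commands("ref.fa", "sample_1.fastq.gz", "sample_2.fastq.gz", "sample"):
    print(command)
```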

The relative ratio of the mitochondrial to nuclear genome was calculated using *bedtools* (v2.17.0) (Quinlan and Hall, 2010) *makewindows* and *samtools* (v1.6) *bedcov*. This was performed for each life stage of each species described in **Table 2**, whereby average read coverage from the mapped reads (bam files) was determined for the autosomes (defined chromosomes for *S. mansoni* and *H. contortus*; the longest 10 scaffolds for the remaining species) and for the mitochondrial genome. These data were used to compare the theoretical multiplexing that could be performed per species, either for whole-genome sequencing to achieve 30× genome-wide coverage, or alternatively, sufficient sequencing to achieve 100× coverage of the mitochondrial genome (see **Table 2** for results). Here, these data were calculated based on Illumina high-throughput sequencing using the NovaSeq 6000 system with S4 2 × 150 bp PE chemistry generating 2.5 terabases (Tb) of data and accounting for an 85% mapping rate. However, to aid in the design of experiments for other species and sequencing platforms, the number of samples that can be multiplexed can be determined by:
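To illustrate the coverage ratio described above, the sketch below parses lines in the format produced by `samtools bedcov` on windows from `bedtools makewindows` (contig, start, end, and the summed per-base depth in the last column) and divides the mean mitochondrial depth by the mean nuclear depth. The function names and contig labels are hypothetical, and the authors' own implementation may differ.

```python
def mean_coverage(bedcov_lines, contigs):
    """Mean per-base depth across all windows on the named contigs."""
    total_depth = 0
    total_length = 0
    for line in bedcov_lines:
        fields = line.rstrip("\n").split("\t")
        contig, start, end = fields[0], int(fields[1]), int(fields[2])
        depth_sum = int(fields[-1])  # bedcov appends the summed depth as the last column
        if contig in contigs:
            total_depth += depth_sum
            total_length += end - start
    if total_length == 0:
        raise ValueError("no windows found for the requested contigs")
    return total_depth / total_length


def mt_to_nuclear_ratio(bedcov_lines, nuclear_contigs, mt_contig):
    """Mitochondrial coverage normalised to nuclear coverage."""
    lines = list(bedcov_lines)
    return mean_coverage(lines, {mt_contig}) / mean_coverage(lines, set(nuclear_contigs))


# Hypothetical bedcov output: two nuclear windows and one mitochondrial window.
example = ["chr1\t0\t1000\t2000", "chr1\t1000\t2000\t4000", "mt\t0\t100\t30000"]
print(mt_to_nuclear_ratio(example, ["chr1"], "mt"))
```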

$$
\text{Multiplex}_{\text{WGS}} = \frac{\text{sequencing yield (bp)} \times \text{average mapping efficiency}}{\text{genome size (bp)} \times \text{targeted genome coverage}}
$$

where the "sequencing yield" is the total number of base pairs generated per sequencing run, the "genome size" is the estimated number of base pairs in the genome, the "targeted genome coverage" is the intended number of reads covering every base pair of the genome, and the "average mapping efficiency" is the typical proportion of mapped reads from a sequencing library. This latter metric is difficult to predict before performing any sequencing; however, knowledge of it from prior sequencing experiments can improve the actual coverage achieved by applying this scaling factor.

TABLE 2 | Breakdown of sequencing strategies per species based on whole-genome sequencing at 30× coverage and whole-genome sequencing to achieve 100× whole mitochondrial genome coverage.

*1 Genome size: obtained from WormBase ParaSite v12.*

*2 Multiplex: total number of samples per NovaSeq 6000 sequencing run with S4 2 × 150 bp PE chemistry generating 2.5 Tb of data and accounting for 85% mapping rate. Theoretically possible, but may be limited by barcoding available.*

*3 mtDNA/nuclear ratio: based on coverage of mitochondrial and nuclear-derived sequencing reads, normalized to the nuclear genome coverage.*
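The multiplexing calculation for whole-genome sequencing can be sketched in a few lines. This assumes that the number of samples per run is the usable yield (total yield scaled by mapping efficiency) divided by the bases required per sample (genome size multiplied by targeted coverage), rounded down. The function name and the 300 Mb genome size in the example are illustrative assumptions; the 2.5 Tb yield, 85% mapping rate, and 30× target are the values quoted in the text.

```python
import math


def multiplex_wgs(sequencing_yield_bp, genome_size_bp, target_coverage, mapping_efficiency):
    """Number of samples per run that reach target_coverage, rounded down."""
    if genome_size_bp <= 0 or target_coverage <= 0:
        raise ValueError("genome size and target coverage must be positive")
    if not 0.0 < mapping_efficiency <= 1.0:
        raise ValueError("mapping efficiency must be in (0, 1]")
    usable_bp = sequencing_yield_bp * mapping_efficiency
    bp_per_sample = genome_size_bp * target_coverage
    return math.floor(usable_bp / bp_per_sample)


# 2.5 Tb run, hypothetical 300 Mb genome, 30x coverage, 85% mapping efficiency.
print(multiplex_wgs(2.5e12, 300e6, 30, 0.85))
```

In practice the achievable number may be lower, for example where the number of available barcodes is limiting.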

This equation can be extended to low-coverage whole-genome sequencing to target a defined mitochondrial DNA coverage using the following:

where "mtDNA to nuclear ratio" is the number of mitochondrial genomes per nuclear genome, which can be determined from the relative read counts as performed here, or by alternative molecular approaches, such as qPCR.
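As a sketch of this extension (a reconstruction from the definitions given here, not necessarily the exact expression used to generate **Table 2**), assume that mitochondrial coverage equals nuclear coverage multiplied by the mtDNA to nuclear ratio, and that mitochondrial reads make up a negligible share of the total yield. The nuclear coverage required per sample is then the targeted mtDNA coverage divided by the ratio, which gives:

$$
\text{Multiplex}_{\text{mtDNA}} = \frac{\text{sequencing yield (bp)} \times \text{average mapping efficiency} \times \text{mtDNA to nuclear ratio}}{\text{genome size (bp)} \times \text{targeted mtDNA coverage}}
$$

Here "sequencing yield" denotes the total number of bases produced per run (2.5 Tb in the calculation above) and is a label introduced for this sketch. Under these assumptions, a hypothetical ratio of 50 means that 100× mitochondrial coverage requires only 2× nuclear coverage, so 15 times as many samples could be multiplexed as for 30× whole-genome sequencing.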

The code to reproduce the analysis and figures for this manuscript is available at https://github.com/stephenrdoyle/helminth\_extraction\_wgs\_test.

#### RESULTS AND DISCUSSION

The aim of this work was to determine the feasibility and efficiency of using a low-input DNA extraction and library preparation approach for whole-genome sequencing of individual egg and larval stages of helminth parasites. We have targeted immature life stages that are found in the environment and for which it is possible to collect samples non-invasively. We first tested our approaches on the nematode *H. contortus* and the trematode *S. mansoni*. Five approaches were tested using single eggs (fresh and frozen) and L1 of *H. contortus*, and miracidia of *S. mansoni*. Attempts to quantify the raw DNA extractions were inconsistent and largely unsuccessful because of the extremely low DNA yield per extraction; in the life stages tested, we expect this yield to be in the range of tens to hundreds of picograms of DNA per sample. Quantification of the sequencing libraries prepared from these extractions was, however, possible, and revealed significantly more DNA recovered using the CGP and PIP extractions than the other three protocols (**Figure 1**). Although not a direct measure, sequencing library concentration is likely a sufficient proxy for DNA extraction efficiency, given that the library preparation was standardized across all samples.

We determined the success of the library preparation protocols by comparing (i) the proportion of reads mapped to the genome, representing "on-target" mapping as a measure of specificity (**Table S1**: mapped\_reads\_percent); (ii) the proportion of reads that matched a "contamination" database, which did not contain helminth genome sequences, and thus represented DNA derived from "off-target" sources such as host or bacterial species commonly present in these samples; those reads that do not match the contamination database are defined as "kraken\_unassigned" and are putative helminth-derived reads (**Table S1**: kraken\_unassigned\_percent); and (iii) the proportion of duplicate reads, which typically represent library preparation artefacts due to over-amplification of DNA during PCR (**Table S1**: duplicate\_reads\_percent).

We examined the impact of library strength on mapping frequency for all DNA extraction protocols tested (**Figure 2**). For *H. contortus*, a clear correlation between these two variables was observed, with an inflection point at approximately 0.25 ng/µl, below which the proportion of reads mapped rapidly decreased toward 0 (**Figure 2A**). Similarly, below this point, the proportion of reads classified as contamination increased (**Figure 2B**). There was, however, a distinct difference between *H. contortus* and *S. mansoni* in the overall proportion of reads mapped and the frequency of contaminating reads, with greater variation in both parameters in the *S. mansoni* samples. While some of this variation may reflect differences in extracting DNA from the two species, the majority of *S. mansoni* MIR samples were isolated and aliquoted onto FTA cards under less clean conditions in the field and stored for between 2 and 5 years before processing, whereas the laboratory-prepared *H. contortus* samples were collected and prepared directly from a current infection (EGG and LS1) or within the 6 months before processing (EGGf). We achieved some on-target mapping for all extraction kits tested (**Figure S1A**); however, significant variation was observed between approaches. BSP and NXT generally performed poorly, with either low mapping frequency (median = 19.42; median absolute deviation [MAD] = 16.50) or high variance in mapping frequency between samples (median = 66.20; MAD = 26.18), respectively. PIP performed consistently well with high mapping rates across all stages in *H. contortus* (median = 90.30; MAD = 3.96); however, it performed poorly in *S. mansoni* (median = 11.00; MAD = 12.73). The duplication rates of all conditions were within an acceptably low range (median = 0.46; MAD = 0.4), with only 2% of samples having greater than 5% duplicate reads (**Figure S1B**). This suggests that the low DNA input did not noticeably impact duplication rates, and therefore, with greater sequencing depth of the same sequencing libraries, we would expect unbiased genome-wide coverage (at least relative to sequencing of libraries derived from higher DNA input). CGP and FGM performed most consistently between stages and species; however, FGM had higher variance and duplication rates relative to CGP across all samples tested. Considering the higher library strength (suggestive of greater DNA recovery efficiency), consistent mapping efficiency, and the cost-effective protocol, we chose the CGP extraction to explore further.

FIGURE 2 | … sequencing library sample, colored by extraction protocol (A) or proportion of putative contaminating reads (100 - percent\_unclassified\_reads) (B). The dashed grey vertical line represents a library concentration of 0.25 ng/µl. Extraction protocols were performed on *Haemonchus contortus* eggs (EGG), frozen eggs (EGGf), and L1 (LS1) stages, and *Schistosoma mansoni* miracidia (MIR). The number of samples in each comparison is presented in Table 1, and the raw data used are described in Table S1.
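The median and median absolute deviation (MAD) summaries reported above can be computed as in the following sketch. The function name and example values are hypothetical. The `constant` argument is included because R's `mad()` multiplies by 1.4826 by default; which convention underlies the reported values is not stated, so the unscaled form is used here as an assumption.

```python
import statistics


def median_and_mad(values, constant=1.0):
    """Return (median, MAD), where MAD = constant * median(|x - median(x)|)."""
    values = list(values)
    centre = statistics.median(values)
    mad = constant * statistics.median(abs(v - centre) for v in values)
    return centre, mad


# Hypothetical mapping percentages for five libraries prepared with one protocol.
mapping_percent = [88.1, 90.3, 91.0, 93.7, 45.2]
print(median_and_mad(mapping_percent))
```

The MAD is less sensitive than the standard deviation to a single failed library, which makes it a reasonable summary for small replicate groups of this kind.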

We expanded our analysis of the CGP protocol to a total of seven helminth species for which samples were available, including a total of five distinct life stages (**Figure 3**). High variability in mapping was observed between species, with 50% or greater mapping frequency achieved in at least one life stage of five of the seven species tested (**Figure 3A**, **Figure S2A**). Clear differences were observed between multiple life stages tested within a species, likely reflecting differences in extraction efficiency per life stage. For example, for *A. caninum*, reads from eggs (median = 54.98) mapped much more effectively than those from L1 (median = 3.61) or L3 stages (median = 7.57), and in *S. stercoralis*, free-living females (median = 67.84) and L1 (median = 47.19) performed better than L3 (median = 9.01). The proportion of contaminating reads tended to increase as library concentration decreased for all species (**Figure 3B**); however, some samples for which mapping was poor but higher library concentrations were obtained were highly enriched for contaminants, for example, *A. caninum* L1 larvae and *T. muris* eggs. Interestingly, *T. muris* L1 larvae (median = 41.85) and bleached eggs (median = 51.35) performed much better than untreated eggs (median = 2.72); bleaching is used experimentally to promote hatching of *T. muris* by dissolving the egg shell layers, which in turn improves access to the DNA within. However, bleached eggs were embryonated and developmentally more advanced than untreated unembryonated eggs, and therefore would have more DNA available for library preparation. Similar to untreated *T. muris* eggs, *A. dissimilis* eggs performed poorly (median = 0.16); for both species, few if any nematode sequences were recovered and the majority of sequencing reads were derived from bacterial contaminants, perhaps indicative of the challenge of accessing material within the environmentally resistant egg. Analysis of *A. dissimilis* was further limited by the lack of a reference genome for this species; we mapped against the available genome of *Ascaris lumbricoides*, the nearest species for which a reference was available, and therefore, at best, would have expected suboptimal mapping due to sequence divergence. The duplication rates remained low for all species and conditions tested (**Figure S2B**), consistent with our initial observation that library input did not significantly influence duplication rates (**Figure S1B**).

We also generated data from microfilaria of *D. medinensis*; these samples were extracted using NXT rather than CGP as part of a separate study; however, we have included these for comparison (**Figure S3**). High mapping rates were observed for almost all libraries generated; however, the duplication rates per library were also high (median = 36.90); this was unexpected, given NXT did not produce high duplication rates under the initial conditions tested with *H. contortus* or *S. mansoni*, nor were there excessive PCR cycles used in this instance. Duplication rates were not correlated with library concentration.

In summary, we present successful DNA extraction followed by whole-genome sequencing of individual egg and larval stages from six of eight parasite species examined. These results significantly extend the possibility of genomic analyses for life stages that, at best, were previously limited to low-resolution, low-throughput PCR-based assays without the addition of whole-genome amplification. Whatman® FTA® cards provide a convenient substrate for sample collection and storage, and do not limit the application of direct DNA extraction and whole-genome sequencing of parasite samples, even for field samples, as demonstrated for *S. mansoni* miracidia that were collected and processed in Uganda before they were transported to the UK. Further optimization is required to improve the DNA recovery from eggs, for example, from *A. dissimilis* and *T. muris*, to provide greater applicability of our approaches to species that generate particularly environmentally resistant stages, such as the soil-transmitted helminths. The application of whole-genome sequencing to diagnose and monitor helminth infections *at scale* is largely limited by the costs of library preparation and sequencing, and therefore will be restricted to niche applications of the technology. However, the use of low-coverage whole-genome sequencing with the specific aim to target the mitochondrial genome, which is present at a higher copy number than the nuclear genome (**Table 2**), may be a cost-effective alternative and potentially provides greater diagnostic information than low-throughput PCR-based diagnostics. Continued development of genomic technologies and the associated reduction in sequencing and library preparation costs will make screening large numbers of samples by genome sequencing more routine, as in viral (Dudas et al., 2017) and bacterial (Domman et al., 2018) population studies. In doing so, the ability to derive high-resolution data may provide insight into, for example, (i) within-host reproductive dynamics, including within-host population size, differential fecundity (Hildebrandt et al., 2014), and reproductive traits such as polyandry (Doyle et al., 2018), (ii) defining parasite transmission zones and rates of transmission between zones to prioritize treatment foci (Crawford et al., 2019; this special edition, in review), (iii) defining effective population sizes of parasite populations (Sallé et al., 2019), and using this to estimate the impact of control strategies over time, and/or (iv) discrimination between reintroduction and recrudescence of parasites in regions where parasite control has been successful (Koala et al., 2019). The use of genomics to provide information-rich data will be increasingly important for diagnostic and surveillance purposes broadly (Cotton et al., 2018), and will be particularly informative as efforts to control human infective helminths using MDA move from control to elimination.

#### DATA AVAILABILITY

The datasets generated for this study can be found in ENA, ERP114942.

#### ETHICS STATEMENT

Given a number of host/parasite systems are described, we have provided information about the individual ethical approvals obtained associated with each species in the "Sample Collection" subheading of the *Methods* section.

### AUTHOR CONTRIBUTIONS

Conceptualization: SRD, JAC, NH; Methodology: SRD, GS, NH; Software: SRD; Formal analysis: SRD; Investigation: SRD, GS; Resources: FA, TJ, JL, JBC, PC, JW, TC, ET, MD-C, PE, RL, KM, CM, TM, BS, PO, JT, AW, ED, RK, DB, MB; Data curation: SRD, NH; Writing – original draft preparation: SRD; Writing – review and editing: All authors; Visualization: SRD; Supervision: SRD, JAC, NH; Project administration: NH.

### FUNDING

Work performed at the Wellcome Sanger Institute is supported by the Wellcome Trust (grant 206194) and by the Biotechnology and Biological Sciences Research Council (BB/M003949/1). We thank Alison Elliot for access to *S. mansoni* samples collected from Uganda, the collection of which was supported by the Wellcome Trust (grant 095778/Z/11/Z). *S. mansoni* samples were also obtained from the Schistosomiasis Collection at the Natural History Museum (NHM) [SCAN], which is funded with support from the Wellcome Trust (grant no. 104958/Z/14/Z).

### ACKNOWLEDGMENTS

We thank the Carter Center for supporting molecular work on *D. medinensis*, and the Guinea Worm Eradication Program for making samples available. We also thank Ian Still from DNA Pipelines at Wellcome Sanger Institute for his expertise in protocol development.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2019.00826/full#supplementary-material

TABLE S1 | Complete metadata per sample (excel workbook)

FIGURE S1 | Summary boxplots of mapped on target reads (A) and duplicate reads (B) for all extraction kits tested. The number of samples tested per kit, per life stage is shown at the top of (A).

FIGURE S2 | Summary boxplots for mapped on target reads (A) and duplicate reads (B) for all species tested using the CGP protocol. The number of samples tested per species, per life stage is shown at the top of (A).

FIGURE S3 | Analysis of *D. medinensis* using the NXT protocol. (A) Mapped "on-target" reads. (B) Duplicate reads. (C) Comparison of the effect of sequencing library concentration on mapping efficiency, colored by the proportion of duplicate reads. (D) Comparison of the effect of sequencing library concentration on mapping efficiency, colored by the proportion of putative contaminant reads identified using Kraken (100 - percent\_unclassified\_reads).

### REFERENCES


sequencing of archived schistosome miracidia. *Parasitology* 145 (13), 1739–1747. doi: 10.1017/S0031182018000811


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Doyle, Sankaranarayanan, Allan, Berger, Jimenez Castro, Collins, Crellen, Duque-Correa, Ellis, Jaleta, Laing, Maitland, McCarthy, Moundai, Softley, Thiele, Ouakou, Tushabe, Webster, Weiss, Lok, Devaney, Kaplan, Cotton, Berriman and Holroyd. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# A Case for Using Genomics and a Bioinformatics Pipeline to Develop Sensitive and Species-Specific PCR-Based Diagnostics for Soil-Transmitted Helminths

#### *Jessica R. Grant1\*, Nils Pilotte1,2 and Steven A. Williams1,2*

*1 Department of Biological Sciences, Smith College, Northampton, MA, United States, 2 Molecular and Cellular Biology Program, University of Massachusetts, Amherst, MA, United States*

#### *Edited by:*

*Makedonka Mitreva, Washington University Medical Center, United States*

#### *Reviewed by:*

*Emmanuel Dias-Neto, A.C.Camargo Cancer Center, Brazil John Stuart Gilleard, University of Calgary, Canada*

> *\*Correspondence: Jessica R. Grant jgrant@smith.edu*

#### *Specialty section:*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Genetics*

*Received: 26 April 2019 Accepted: 21 August 2019 Published: 23 September 2019*

#### *Citation:*

*Grant JR, Pilotte N and Williams SA (2019) A Case for Using Genomics and a Bioinformatics Pipeline to Develop Sensitive and Species-Specific PCR-Based Diagnostics for Soil-Transmitted Helminths. Front. Genet. 10:883. doi: 10.3389/fgene.2019.00883*

The balance of expense and ease of use vs. specificity and sensitivity in diagnostic assays for helminth disease is an important consideration, with expense and ease often winning out in endemic areas where funds and sophisticated equipment may be scarce. In this review, we argue that molecular diagnostics, specifically new assays that have been developed with the aid of next-generation sequence data and robust bioinformatic tools, more than make up for their expense with the benefit of a clear and precise assessment of the situation on the ground. Elimination efforts associated with the London Declaration and the World Health Organization (WHO) 2020 Roadmap have resulted in areas of low disease incidence and reduced infection burdens. An accurate assessment of infection levels is critical for determining where and when the programs can be successfully ended. Thus, more sensitive assays are needed in locations where elimination efforts are approaching a successful conclusion. Although microscopy or more general PCR targets have a role to play, they can mislead and cause study results to be confounded. Hyperspecific qPCR assays enable a more definitive assessment of the situation in the field, as well as of shifting dynamics and emerging diseases.

Keywords: soil-transmitted helminth, molecular diagnostics, DNA diagnostics, polymerase chain reaction (PCR), quantitative PCR

## INTRODUCTION

Parasitic worms impact the health and economic well-being of billions of people worldwide. Soil-transmitted helminths (STH) are a burden in the tropics and subtropics and contribute to an estimated 1.9 to 2.1 million disability-adjusted life years (DALYs) and US \$7.5 billion to US \$138.9 billion in loss of productivity (Bartsch et al., 2016; Kyu et al., 2018). Efforts are underway to eliminate STH, with the ambitious goal of controlling morbidity by the year 2020 (Becker et al., 2018; Uniting to Combat NTDs). Mass drug administrations (MDA) and water, sanitation and hygiene (WASH) programs across endemic countries are making headway (Hicks et al., 2015; Truscott et al., 2016; Weatherhead et al., 2017; Truscott et al., 2019), but with 2020 fast approaching, there are still many challenges to reaching this goal. An important concern is where to enact and when to cease MDA. This depends on accurately mapping the current burden in communities (2018 Action Group Meeting). Sensitive, species-specific diagnostics are critical to properly evaluating the success of these programs, as well as addressing where to focus efforts and when interventions can be ended (Weatherhead et al., 2017).

Diagnostic techniques need to be inexpensive and practical, and they need to give consistent results across technicians and laboratories. Importantly, they must be accurate, sensitive, and easily interpreted. Microscopy has long been relied on as the standard for diagnosis of intestinal parasites, including soil-transmitted helminths (Beaver and Martin, 1968). Several copromicroscopic methods are in use, including FLOTAC (Cringoli et al., 2010), MINI-FLOTAC (Maurelli et al., 2014), several modifications of the McMaster technique (Mines, 1977), and Kato-Katz (KK; Katz et al., 1972). Of these, KK is the most commonly used for STH diagnosis because it is relatively easy to perform in the field and is generally more sensitive than other microscopic methods (Moser et al., 2018). With any of these tests, even highly trained microscopists can misidentify species or give inconsistent results (Krauth et al., 2012), and the tests are notoriously insensitive in regions with low infection rates (Nikolay et al., 2014; Buonfrate et al., 2015; Speich et al., 2015; Acosta Soto et al., 2017).

Molecular diagnostics have been garnering more interest in the last few years, as their superior sensitivity has been proven and their acceptance by the research community has increased (Easton et al., 2016; Halfon et al., 2017; Holt et al., 2017). However, as with any advance, there are technical problems to overcome. For example, DNA extraction efficiency and preservation of samples prior to testing will affect diagnostic reliability (Andersen et al., 2013; Sarhan et al., 2015; Hidalgo et al., 2018; Papaiakovou et al., 2018). Notably, *Trichuris trichiura* eggs are notoriously difficult to break open, and this impacts the sensitivity of molecular assays, but techniques are being developed and improved to the point where consistently good results are achievable (Harmon et al., 2006; Nunes et al., 2006; Kaisar et al., 2017). Although molecular diagnostics are not inexpensive, microscopy techniques are also expensive and can be difficult to scale up. The costs of qPCR, by contrast, have the potential to decrease: studies show that multiple technical replicates may not be crucial, and other cost-cutting measures, such as cheaper, more effective sample preservation and pooling, are being explored (Easton et al., 2017; Papaiakovou et al., 2018; Truscott et al., 2019). Until recently, most PCR-based assays have targeted well-characterized and conserved regions, such as ITS and 18S (Verweij and Stensvold, 2014; Hii et al., 2018), but increased availability of whole-genome sequence data is facilitating the discovery of more sensitive and species-specific genomic targets (Pilotte et al., 2016a; Papaiakovou et al., 2017).

Repetitive elements are essential parts of eukaryotic genomes that have structural and regulatory functions (Shapiro and Sternberg, 2005; López-Flores and Garrido-Ramos, 2012), and different types of repetitive DNA elements have been studied and classified (Charlesworth et al., 1994; Plohl et al., 2008; López-Flores and Garrido-Ramos, 2012; Biscotti et al., 2015). Ribosomal DNA is found in repeat arrays (Lafontaine and Tollervey, 2001; Pruesse et al., 2007). These have traditionally been used for primer design and can give sensitive results depending on the size of the array (Verweij and Stensvold, 2014). However, the repeat is oftentimes conserved between species and even genera, and rDNA-based assays are often less specific than those designed from other repeat types (Pilotte et al., 2016; O'Connell et al., 2018).

Tandemly repeated DNA is classified by the size of the repeated monomer, resulting in microsatellites (< 9 bp), minisatellites (< 15 bp in arrays of 0.5–30 kb), and satellites (satDNA, up to ~200 bp per monomer, in megabase-sized arrays) that are generally enriched within the centromeric, pericentromeric, and subtelomeric regions of the chromosome (López-Flores and Garrido-Ramos, 2012). Copy number can be quite variable in mini- and micro-satellites but larger satDNA monomers are more consistent within species (López-Flores and Garrido-Ramos, 2012). Microsatellites and minisatellites are not useful for assay design, as the repeats tend to be too short to allow for primer/probe design. Larger satDNA monomers, on the other hand, offer the best options for assay design, in that the repeat monomers are an optimal size for qPCR, they are extremely abundant, and the copy number is relatively stable within species (López-Flores and Garrido-Ramos, 2012).

Other types of repetitive DNA include transposons and retrotransposons, which are dispersed throughout the genome. These include short interspersed nuclear elements (SINEs) which are 100 to 500 bases long, and long interspersed nuclear elements (LINEs) which are larger—6,000 or 7,000 bases long. These may also be useful for repeat-based assay design (Funakoshi et al., 2017).

TABLE 1 | Genome size of representative helminth species and the amount/percent of genome masked as repetitive. Low complexity repeats and simple repeats (for example, di- or tri-nucleotide repeats) are not targets for molecular assays.

The amount of repetitive DNA in any eukaryotic species is variable and can make up quite a large percentage of the genome. A recent study of parasitic worms revealed that both genome size and repeat content of the genomes range widely, with repeat elements forming up to 37% of the genomes in STH of interest (**Table 1**). Repetitive elements make up even greater percentages in other eukaryotes (Brindley et al., 2003; Wickstead et al., 2003; Shapiro and Sternberg, 2005; Coghlan et al., 2019), up to an astonishing 97% in some plants (Flavell et al., 1974; Piednoël et al., 2012). These repetitive elements, because of their abundance in the genome, provide targets for molecular assays of exquisite sensitivity. In addition, since many appear unbound by selective pressure, they can be highly species-specific. This combination of improved sensitivity and species-specificity makes repetitive elements a prime target for molecular diagnostics. However, as mentioned above, some forms of repetitive DNA, such as simple short nucleotide repeats, will be unsuitable for assay design. Another potential stumbling block is sequence variation in the repeat itself, as polymorphism in the primer and probe sites will decrease the sensitivity of the assay. However, although the repeats we have targeted have not been specifically investigated, maintenance of homogeneity in repetitive elements by concerted evolution has been discussed in relation to other repeats in other species (Ganley and Kobayashi, 2007; Teruel et al., 2014). Concerted evolution of repeats conserves the sequence within a species while allowing significant heterogeneity between species. There can, of course, be some variation within repeats (Onyabe and Conn, 1999; Lindner and Banik, 2011), which could lessen the sensitivity of repeat-based molecular assays. Copy number variation of repeats between individuals may also impact the sensitivity of these assays in some populations. Studies have found population-level copy number variation in ribosomal repeats of different species (Bik et al., 2013; Schaap et al., 2013; Mascagni et al., 2018; Zhao and Gibbons, 2018), but there is a suggestion that there are both copy number variable-type repeats and constant-type repeats, whose copy number is consistent within species (Umemori et al., 2013). An understanding of copy number variation and its impact on assay sensitivity will likely need to be studied on an individual species-by-species basis.
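The concern about polymorphism in primer and probe sites can be made concrete with a small example. The following is a minimal sketch, assuming that per-position nucleotide counts for a repeat consensus are available as a list of dictionaries; this data layout, the function names, and the toy values are hypothetical and do not represent the output format of any particular tool. It scores each position by the fraction of reads supporting the most common base and returns the window with the highest mean score, which would be a candidate region for placing a primer or probe.

```python
from typing import Dict, List, Tuple


def conservation(counts: Dict[str, int]) -> float:
    """Fraction of reads supporting the most common base at one position."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return max(counts.values()) / total


def most_conserved_window(profile: List[Dict[str, int]], width: int) -> Tuple[int, float]:
    """Return (start index, mean conservation) of the best window of `width` positions."""
    if width <= 0 or width > len(profile):
        raise ValueError("width must be between 1 and the profile length")
    scores = [conservation(c) for c in profile]
    starts = range(len(scores) - width + 1)
    # max() keeps the first window when several windows tie
    best = max(starts, key=lambda s: sum(scores[s:s + width]))
    return best, sum(scores[best:best + width]) / width


# Toy profile (hypothetical counts): three variable positions,
# four invariant positions, then three variable positions.
profile = (
    [{"A": 60, "C": 40}] * 3
    + [{"G": 100}] * 4
    + [{"T": 70, "A": 30}] * 3
)

start, score = most_conserved_window(profile, 4)
print(f"best window starts at position {start}, mean conservation {score:.2f}")
```

In practice the window width would be set by the intended oligonucleotide length, and a real design would also weigh melting temperature, GC content, and cross-reactivity with related species, none of which this sketch considers.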

The value of repeats as diagnostic tools has been understood for some time but these sequences were more difficult to find in the pre-genomics era (McReynolds et al., 1986; Meredith et al., 1989; Chanteau et al., 1994; Nekrutenko et al., 2000; Hamburger et al., 2001; Rao et al., 2002; Demas et al., 2011; Lodh et al., 2016). Now, abundant genomic data and robust bioinformatics tools are available to make these targets easier to identify and use in developing PCR-based assays. The pipeline leading from low coverage NGS data to hyper-sensitive and specific qPCR assay is not overly complicated or time-consuming, and the ability to repurpose low coverage NGS data from other studies makes this an attractive option for diagnostic development for helminthology and many other fields.

#### FINDINGS

Diagnostics give an estimate of the true prevalence of a disease, with the probability of correctly estimating the truth given by the sensitivity and species-specificity of the diagnostic. WHO guidelines for when to treat a community are informed by the prevalence of disease in that community, and are thus influenced by the sensitivity of the diagnostic used. A study modeling the probability of making the correct treatment decisions given WHO guidelines for treatment and varying the true prevalence and diagnostic sensitivity shows that there is a significant difference in outcome when more sensitive diagnostics are used (Medley et al., 2016). Medley et al. (2016) found this to be especially true in areas of intermediate true prevalence (between 30% and 50%). More sensitive tests allowed the correct treatment decision to be made more often in intermediate cases.
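The way sensitivity feeds into a prevalence-based treatment decision can be illustrated with the standard relationship between true and apparent prevalence. The following minimal sketch is illustrative only and does not reproduce the model of Medley et al. (2016); the 20% treatment threshold, the sensitivity values, the assumption of perfect specificity, and the function names are all assumptions chosen for the example.

```python
def apparent_prevalence(true_prev: float, sensitivity: float, specificity: float = 1.0) -> float:
    """Expected fraction of positive tests: true positives plus false positives."""
    return sensitivity * true_prev + (1.0 - specificity) * (1.0 - true_prev)


def would_treat(true_prev: float, sensitivity: float, threshold: float,
                specificity: float = 1.0) -> bool:
    """Decision based on the measured (apparent) prevalence, not the true one."""
    return apparent_prevalence(true_prev, sensitivity, specificity) >= threshold


# Hypothetical community with 25% true prevalence and an assumed 20% treatment threshold.
TRUE_PREV = 0.25
THRESHOLD = 0.20

for label, sens in [("lower-sensitivity test", 0.60), ("higher-sensitivity test", 0.95)]:
    measured = apparent_prevalence(TRUE_PREV, sens)
    decision = "treat" if would_treat(TRUE_PREV, sens, THRESHOLD) else "do not treat"
    print(f"{label}: measured prevalence {measured:.3f} -> {decision}")
```

Under these assumed values, the less sensitive test places the community below the threshold even though its true prevalence lies above it, which is the kind of misclassification that becomes more likely when true prevalence is close to a decision boundary.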

The aforementioned study measured outcome by looking at DALYs and found that these were not as influenced by diagnostic sensitivity in low or high prevalence areas. However, there are other reasons to prefer more sensitive tests in low prevalence areas. As the goals of the WHO elimination programs are reached, there will be pressure to reallocate the funds spent on MDA to other programs. A well-defined threshold under which recrudescence will not occur is critical to preventing the reoccurrence of disease after the completion of MDA. Restarting such programs would be extremely difficult and expensive once they have ended. Modeling has shown that the threshold must be based on true prevalence (Truscott et al., 2017; Ásbjörnsdóttir et al., 2017), which can only be accurately estimated with highly sensitive and species-specific diagnostics. Improved diagnostics are crucial to meet this need (Kongs et al., 2001; Andersen et al., 2013; Clarke et al., 2018; Hidalgo et al., 2018). Post-MDA surveillance is also necessary. With highly sensitive diagnostics, the reappearance of disease can be recognized and addressed before infection levels rise, increasing the probability of controlling the recrudescence (Farrell and Anderson, 2018). In addition to testing human populations for infection, vectors that transmit helminths (or other parasites) or intermediate hosts can be screened to track disease prevalence in the community without the need for taking human samples (Sanpool et al., 2012; Pilotte et al., 2016b; Ramírez et al., 2018; Zaky et al., 2018). Pooling of samples is a common way to decrease cost but diagnostic tests used for the screening of pooled samples need to be highly sensitive (Masny et al., 2016; Okorie and de Souza, 2016; Pilotte et al., 2017), especially when the infection level is low, so as not to miss positive results in dilute samples.
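The trade-off in pooling can be sketched with two simple calculations. This is a minimal sketch under stated assumptions: samples are independent, each contributes an equal volume to the pool, a single positive sample is diluted uniformly, and the target quantities are in arbitrary units. The function names and numbers are hypothetical and are not taken from the cited studies.

```python
def prob_pool_positive(prevalence: float, pool_size: int) -> float:
    """Probability that a pool of independent samples contains at least one positive."""
    return 1.0 - (1.0 - prevalence) ** pool_size


def max_pool_size(target_per_positive: float, limit_of_detection: float) -> int:
    """Largest pool in which one positive sample, diluted equally, stays at or above
    the assay's limit of detection (both arguments in the same arbitrary units)."""
    if limit_of_detection <= 0:
        raise ValueError("limit_of_detection must be positive")
    return int(target_per_positive // limit_of_detection)


# Hypothetical low-prevalence setting: 2% of samples are positive.
p_pool = prob_pool_positive(0.02, 10)
print(f"chance that a pool of 10 contains a positive: {p_pool:.3f}")

# Hypothetical assays with different limits of detection (arbitrary units).
for lod in (20.0, 1.0):
    print(f"limit of detection {lod}: pools of up to {max_pool_size(100.0, lod)} samples")
```

With the assumed values, lowering the limit of detection directly raises the pool size that can be screened without losing a single dilute positive, which is why pooled screening at low infection levels depends on highly sensitive assays.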

In recent years, efforts to sequence nematode genomes by groups, such as the International Helminth Genomes Consortium, have made great strides in increasing the availability of helminth genomic sequence data (Howe et al., 2016). Although many of the genomes are in draft form, this is sufficient for probing the genome for species-specific repeats. Our method for recovering highly repetitive sequence from low depth, raw, short-read genome sequence data uses the Galaxy-based tool RepeatExplorer (Novák et al., 2010; Novák et al., 2013; Pilotte et al., 2016a). Originally used to investigate repeat sequences in plant genomes, RepeatExplorer takes as input short-read next-generation sequence data and creates graph-based clusters based on the similarity of the sequences. In these graphs (see **Figure 1**), each read is represented by a node, and each sequence pair (by default defined as ≥90% identity over at least 55% of the read lengths) is represented by an edge. The density of the graph represents the number and similarity of reads in the cluster. In low-depth sequence data sets, low copy number sequences will not be well represented and will, therefore, graph as individual nodes or small clusters, whereas high copy number repetitive sequences will be found in dense clusters. The number of reads in a given cluster, combined with the structure and density of that cluster, can be used as a proxy for the representation of the number of repeats of that sequence in the genome. The copy number of the repeat will impact the sensitivity of the assay, since each copy in the genome will be an additional target for the assay. In addition, RepeatExplorer and its sister software TAREAN (Novák et al., 2017) provide information on the count of individual nucleotides in the repeat contigs. These counts can be used to design assays to the most conserved regions, limiting the problems associated with intragenomic variation. We have developed, and made available here, custom Python scripts to parse the output from RepeatExplorer and return highly repetitive sequences (https://github.com/JessicaGrant/RepeatTargetScripts). Primer/probe qPCR assays targeting sequences discovered using this technique have been shown to amplify as little as 20 ag of genomic DNA, or less than the amount of DNA found in a single egg (Pilotte et al., 2016a). Care is needed in choosing which repeat to select for use as a diagnostic target, as some may be found in closely related species (Williams et al., 2000; Rao et al., 2006); however, similar to what has been reported in the literature (Subirana and Messeguer, 2013; Subirana et al., 2015), we have found that many repetitive sequences are species-specific. There may be times when a more general assay (one that will amplify several species of the same genus, for example) may be desired (Rao et al., 2006). A careful search of RepeatExplorer output can often reveal both species- and genera-specific targets.

FIGURE 1 | Description of RepeatExplorer cluster output. (A) Short tandem repeats, including many satellite sequences, form characteristic "star burst"-type clusters. Because they are tandem, and such repeats are of similar size or shorter than the length of an individual sequence read, a very high percentage of reads within the cluster meet the RepeatExplorer-defined criteria for pair formation. This results in each read successfully pairing with a very high percentage of the other reads assigned to the cluster. Because nearly all of the reads within the cluster are paired with nearly all of the other reads within this same cluster, a compact network of very short edges forms between reads. This in turn generates a very dense cluster with a core of paired reads possessing nearly identical sequences. If of sufficient length for assay design, the consensus sequences for these clusters make ideal diagnostic targets, as they contain the greatest number of repeats per read. (B) Long tandem repeats characteristically result in "doughnut"-like clusters. In such clusters, neighboring reads within the underlying scaffold meet the criteria for pair formation. However, because the length of the repeat monomers generating these clusters is longer, reads may be significantly shorter than the monomer itself. This results in many reads within the cluster that do not meet the criteria for pair formation as they map to different regions of the same monomer unit. Yet because they are tandemly arranged, reads spanning a repeat-repeat junction will meet the pair formation criteria, closing the sequence "loop" and resulting in a "doughnut"-shaped cluster. (C) Long interspersed repeats, such as transposable elements, form characteristic "line"-type clusters. While reads neighboring each other in the underlying scaffold meet the criteria for pair formation, the extended length of a repeat monomer means that distant reads within a single monomer will not meet this threshold. This results in similar pairings to those seen in clusters generated from long tandem repeats. However, because these elements are interspersed, reads do not span repeat-repeat junctions, so a "loop" is not formed, and clusters attain a linear appearance.
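The graph construction described above (reads as nodes, qualifying read pairs as edges, and cluster size as a proxy for repeat abundance) can be illustrated with a toy example. This is a minimal sketch and not RepeatExplorer's own algorithm: it assumes that pairwise similarity hits have already been computed and are supplied as tuples, and it groups reads by simple connected components, whereas RepeatExplorer's clustering is more sophisticated. The read names, hit values, and function names are hypothetical; only the default thresholds (90% identity over at least 55% of the read length) come from the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (read_a, read_b, identity, overlap_fraction); assumed to be precomputed elsewhere
Hit = Tuple[str, str, float, float]


def cluster_reads(reads: List[str], hits: Iterable[Hit],
                  min_identity: float = 0.90, min_overlap: float = 0.55) -> List[List[str]]:
    """Group reads into connected components of the similarity graph, largest first."""
    parent: Dict[str, str] = {r: r for r in reads}

    def find(x: str) -> str:
        # Union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, overlap in hits:
        # An edge exists only if the pair meets both thresholds
        if identity >= min_identity and overlap >= min_overlap:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    groups: Dict[str, List[str]] = defaultdict(list)
    for r in reads:
        groups[find(r)].append(r)
    return sorted(groups.values(), key=len, reverse=True)


def read_fraction(cluster: List[str], reads: List[str]) -> float:
    """Share of all reads in a cluster: a rough proxy for the repeat's genome share."""
    return len(cluster) / len(reads)


reads = ["r1", "r2", "r3", "r4", "r5", "r6"]
hits = [
    ("r1", "r2", 0.95, 0.80),
    ("r2", "r3", 0.92, 0.60),
    ("r1", "r3", 0.97, 0.90),
    ("r4", "r5", 0.85, 0.90),  # identity too low: no edge
    ("r5", "r6", 0.95, 0.40),  # overlap too short: no edge
]

clusters = cluster_reads(reads, hits)
for c in clusters:
    print(sorted(c), f"{read_fraction(c, reads):.2f}")
```

In this toy data set the three mutually similar reads form one dense cluster, while the remaining reads stay as single nodes, mirroring how low copy number sequences appear in low-depth data.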

Diagnostics based on targets discovered using this technique have proven useful in both past and ongoing field tests in Kenya (Pickering et al., 2019), Bangladesh (Benjamin-Chung et al., 2019), Ethiopia, Uganda, Timor Leste (Papaiakovou et al., 2017), Thailand (O'Connell et al., 2018), Liberia (Fischer et al., 2018), Japan, Benin, Malawi, India, and the Southern US, and have been adopted for use by large operational research efforts, such as the DeWorm3 cluster randomized trials (Ásbjörnsdóttir et al., 2018). However, testing biological samples, whether for diagnosing individuals or getting an overview of the epidemiological environment of a region, involves a myriad of factors, such as unexpected or emerging parasites, zoonotic infections, and misleading material in the samples. Although most of the criticism of the KK technique has been on the lack of sensitivity and potential for missed infection, there is also the risk of false positives, for example, mistaking other material in stool as eggs (Speich et al., 2015). Some fecal elements may resemble parasite ova, depending on environmental or dietary factors. Confounding elements may include pollen grains, fungal spores, diatoms, or any number of items. An entire chapter in Ash and Orihel's "Atlas of Human Parasitology" is dedicated to artifacts in fecal samples that can mislead copromicroscopic diagnostics (Cushion et al., 1990). Thus, this problem is more frequent than many researchers realize or acknowledge; what follows are some examples demonstrating the importance of this issue.

A field study comparing KK with repeat-based qPCR in Bangladesh (Benjamin-Chung et al., 2019) found that hookworm species and *Trichuris trichiura* prevalence, as measured by qPCR, was significantly higher than was measured by KK. This was expected, given the greater sensitivity of the qPCR assays and the results of many previous studies comparing KK and PCR (Pontes et al., 2003; Stensvold et al., 2006; Knopp et al., 2014; Pilotte et al., 2016a; Easton et al., 2017; Ng-Nguyen et al., 2017). For *Ascaris*, however, prevalence as measured by KK was significantly higher than by qPCR. This surprising result was investigated further, both by qPCR targeting a different part of the *Ascaris* genome, and also by amplicon sequencing that targeted the 18S gene of all eukaryotes in several of the KK positive/qPCR negative samples. All of the samples that were positive by KK but negative by the initial qPCR assay were also negative using the second qPCR target. Additionally, the 18S amplicon sequencing revealed no *Ascaris* in these samples, but did find it in the control samples that were positive by both KK and qPCR. Not one organism was found in the amplicon sequencing that could explain all of the false-positive results. What material in the samples had confounded the microscopists is still unknown, but there was no evidence by any of the molecular assays that *Ascaris* was present in the samples. Had the study relied on copromicroscopic results alone, the conclusion would have been that MDA or WASH interventions were less effective as an *Ascaris* intervention than they likely were, since the true prevalence of the parasite was in fact much lower than was measured by KK.

In a similar case, higher than expected rates of hookworm were noted in a survey of children in rural Rwanda. Further investigation suggested these results may have been confounded by *Caenorhabditis elegans* eggs (Irisarri-Gutiérrez et al., 2016). Additional examples of misidentification of hookworm ova as other eggs (Ralph et al., 2006; Werneck et al., 2007; Yong et al., 2007) show that such confusion may be a more common problem than previously thought. Thus, relying solely on microscopy may be misleading in some instances.

Discrepancies can occur between molecular assays as well, since some PCR targets are less species-specific than others. In developing our pipeline for repeat-based primer discovery, a previously published qPCR assay targeting the internal transcribed spacer region was compared against our newly developed assay targeting an *Ancylostoma duodenale* species-specific repeat (Llewellyn et al., 2016; Pilotte et al., 2016a). Surprisingly, the repeat-based assay failed to detect any of the samples that were determined to be positive for *A. duodenale* by the ITS-based assay. A previously published semi-nested PCR assay (George et al., 2015; Chidambaram et al., 2017) and Sanger sequencing later determined that the discordant results were due to all of the infections being the zoonotic species *Ancylostoma ceylanicum*. The ITS of these two species is highly conserved in the region targeted by the original qPCR assay and so the ITS-based assay did not distinguish between these closely related species. Our repeat-based *A. duodenale* assay, on the other hand, only detects *A. duodenale*, and so all of the *A. ceylanicum*-containing samples were negative. We have since used our pipeline to develop a species-specific qPCR assay for *A. ceylanicum* (Papaiakovou et al., 2017), which is more sensitive and specific than the ITS-based assay and easier to use than semi-nested PCR.

In a similar case, the same ITS-based primer set (Llewellyn et al., 2016) was used to investigate *Ancylostoma duodenale* in a field study of a refugee population in Thailand (O'Connell et al., 2018). Since the most common human *Ancylostoma* parasite is *A. duodenale*, the results were initially believed to indicate *A. duodenale* infection. Again, however, a corroboratory qPCR targeting the highly specific *A. duodenale* repeat failed to detect any *A. duodenale.* Use of the *A. ceylanicum*-specific assay (Papaiakovou et al., 2017), as well as confirmation with semi-nested PCR and Sanger sequencing, revealed that all of these infections were, in fact, caused by *A. ceylanicum* and not *A. duodenale.* In this case, although the more general ITS-based assay misdiagnosed the species causing the infection, the more specific assay for the expected parasite (*A. duodenale*) would have missed the infection. This highlights a risk of using extremely specific qPCR assays in the field if the precise parasite community is unknown. It also highlights that *A. ceylanicum* may be a much more common human pathogen than previously supposed. Here, the repeat-based, species-specific assays can be used to identify the true prevalence of various related parasite species.

The specificity of *Trichuris trichiura* (whipworm) detection by microscopy is assumed to be high, given the relatively distinct morphology of *Trichuris* eggs. However, there are several species of *Trichuris*, including some that infect companion or farm animals. Distinction between species of *Trichuris* relies on size differentiation, but there are overlaps in some species making misdiagnosis by microscopy possible. The most common species in humans is *T. trichiura*, but *Trichuris suis,* which commonly infects pigs, and *Trichuris vulpis,* usually found in dogs, have also been found infecting humans (Areekul et al., 2010; Mohd-Shaharuddin et al., 2019). Microscopy and genera-specific qPCR assays may be confounded by these zoonotic species. Discordance between the ITS-based *Trichuris* qPCR assay and the repeat-based qPCR assay has been noted in one study where all but one of the discordant results were later shown to be *Trichuris ovis*, a species found in sheep and goats that has not been known to infect humans (Pilotte et al., 2016a). Whether this is evidence of human infection or merely false positives due to close contact with infected animals or ingesting food contaminated with animal feces is an open question. In these cases, qPCR targeting highly species-specific repetitive targets alone could easily miss the prevalence of zoonotic infection, leading to a misunderstanding of the health of the population. On the other hand, the use of the repeat-based, highly specific assays gives a true picture of the prevalence of various species.

Despite the distinct morphology of *Trichuris*, the potential of misidentifying unexpected infection with zoonotic species is not the only risk when relying solely on microscopy for investigating whipworm. A study of STH in two regions in Liberia that used both microscopy and qPCR found discordance between the tests for *T. trichiura* in one of the regions (Fischer et al., 2018), with 25 of 27 putatively positive samples for *T. trichiura*, as determined by KK, being negative by qPCR. In the second region, the agreement between the two tests was high. In this case, the discrepancies were investigated further, and the eggs were determined to be a species of *Capillaria,* a human parasite that is associated with eating raw fish and was not expected to be found in this region. Reinvestigation by microscopy in this case elucidated subtle differences in the eggs found in the KK-positive, qPCR-negative samples. A microscopist, highly trained and expecting to have to differentiate between two extremely similar egg morphologies, could have noticed the difference and provided a correct result. However, since *Capillaria* had not previously been reported in Liberia, the eggs that looked like *Trichuris* were reported as such. Without the complementary qPCR, this mistake would not have been discovered. This study also found discrepancies between the microscopy and qPCR measuring the prevalence of *Ascaris lumbricoides* in the same region where *Capillaria* was discovered. Surprisingly, these samples were also determined to be confounded by the presence of *Capillaria*, which can appear rounder and have more subtle polar plugs, leading to its misidentification as the eggs of *A. lumbricoides*. These examples highlight the tendency for microscopists to sometimes see what they are looking for. 
In this case, the unexpected presence of *Capillaria* eggs misled the microscopists and would have resulted in significant misinterpretation of the distribution of STH species in the study region.

#### CONCLUSION

The elimination of STH is a worthy goal, given the distress and disability they cause to a large portion of the global population. The goal is attainable, but will not be easily reached. Monitoring and evaluation of progress is critical and depends on highly accurate reporting. Species-specific repeat-based target discovery and qPCR assays deliver this accuracy. Microscopy and genera-specific molecular assays have a place in this effort, especially in surveys where full mapping of parasite diversity has not occurred. These tools, used in conjunction with highly sensitive and specific molecular assays targeting repetitive elements, can give a clear and accurate assessment where one tool alone could yield misleading results.

New target development is fairly easy, since the web-based bioinformatics tool RepeatExplorer provides output in a manner that makes finding repeat-based qPCR targets straightforward. Only a skim of the genome is necessary, so the data needed to develop new assays is already available for many species and is fairly inexpensive to produce for a species whose genome has not yet been sequenced. We have used this technique to explore new diagnostics for other nematode and protist parasites, and we think that the pipeline for assay development has potential for improving diagnostic sensitivity for many other classes of infectious agents.

Correct identification of species is of interest if one wants to understand evolution, biogeography, and emerging disease. The treatment for infection with one species of helminth is often the same as for another; however, we would argue that lumping all species of a genus together is careless. It is not clear that different species respond the same way under the same drug treatment, and misidentification of species in the past might be confounding the results of studies of resistance or drug response based on microscopy alone. Different species within a genus may also vary in their capacity for animal infection, resulting in some parasites having animal reservoirs while others remain obligate human pathogens. A more detailed understanding of the underlying community structure will offer crucial insight into subjects, such as antihelminthic resistance, emerging or zoonotic diseases, and optimal threshold levels for elimination of disease.

#### BIOINFORMATIC RESOURCES

RepeatExplorer (Novák et al., 2010; Novák et al., 2013) is made freely available here:

http://repeatexplorer.org/

We make our scripts available for others to adapt and use here:

https://github.com/JessicaGrant/RepeatTargetScripts

#### AUTHOR CONTRIBUTIONS

Conceptualization: JG, NP, and SW. Funding acquisition: SW. Writing—original draft: JG. Writing—review and editing: JG, NP, SW.

#### FUNDING

This work received financial support from the Coalition for Operational Research on Neglected Tropical Diseases (CORNTD), which is funded at The Task Force for Global Health primarily by the Bill & Melinda Gates Foundation, by the UK aid from the British government, and by the United States Agency for International Development through its Neglected Tropical Diseases Program. In addition, this work was partially funded through a grant to the DeWorm3 Project, which is funded by a grant to the Natural History Museum from the Bill and Melinda Gates Foundation.

#### ACKNOWLEDGMENTS

The authors would like to thank Marina Papaiakovou for helpful conversations, as well as Dr. Judd Walson and Dr. D. Timothy J. Littlewood (Ásbjörnsdóttir et al., 2018). We also thank Dr. Eric Ottesen and Dr. Patrick Lammie (Task Force for Global Health, Decatur, GA, USA) for their ongoing support and advice.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Grant, Pilotte and Williams. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Profiling Transcriptional Regulation and Functional Roles of *Schistosoma mansoni* c-Jun N-Terminal Kinase

*Sandra Grossi Gava1, Naiara Clemente Tavares1, Franco Harald Falcone2,3, Guilherme Oliveira4 and Marina Moraes Mourão1\**

*1 Laboratório de Helmintologia e Malacologia Médica, Instituto René Rachou, Fundação Oswaldo Cruz, Belo Horizonte, Brazil, 2 Allergy and Infectious Diseases Laboratory, Division of Molecular Therapeutics and Formulation, School of Pharmacy, University of Nottingham, Nottingham, United Kingdom, 3 Institute of Parasitology, BFS, Justus Liebig University, Giessen, Germany, 4 Environmental Genomics Group, Instituto Tecnológico Vale, Belém, Brazil*

#### *Edited by:*

*Jose F Tort, University of the Republic, Uruguay*

#### *Reviewed by:*

*Sergio Verjovski-Almeida, Butantan Institute, Brazil; Pengfei Cai, QIMR Berghofer Medical Research Institute, Australia; Thomas Quack, University of Giessen, Germany*

*\*Correspondence:*

*Marina Moraes Mourão marina.mourao@fiocruz.br*

#### *Specialty section:*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Genetics*

*Received: 28 June 2019; Accepted: 27 September 2019; Published: 18 October 2019*

#### *Citation:*

*Gava SG, Tavares NC, Falcone FH, Oliveira G and Mourão MM (2019) Profiling Transcriptional Regulation and Functional Roles of Schistosoma mansoni c-Jun N-Terminal Kinase. Front. Genet. 10:1036. doi: 10.3389/fgene.2019.01036*

Mitogen-activated protein kinases (MAPKs) play a regulatory role and influence various biological activities, such as cell proliferation, differentiation, and survival. Our group has demonstrated through functional studies that the *Schistosoma mansoni* c-Jun N-terminal kinase (SmJNK) MAPK is involved in the parasite's development, reproduction, and survival. SmJNK can, therefore, be considered a potential target for the development of new drugs. Considering the importance of SmJNK in *S. mansoni* maturation, we aimed to understand the signaling pathways regulated by SmJNK in the parasite, correlating expression data with *S. mansoni* development. To better understand the role of SmJNK in the intravertebrate host life stages of *S. mansoni*, RNA interference knockdown was performed in adult worms and in the schistosomula larval stage. Adult worms with SmJNK knocked down showed a decrease in oviposition and no significant alteration in movement. RNASeq libraries of SmJNK knockdown schistosomula were sequenced. A total of 495 differentially expressed genes were observed in the SmJNK knockdown parasites, of which 373 were down-regulated and 122 up-regulated. Among the down-regulated genes, we found transcripts related to protein folding, purine nucleotide metabolism, the structural composition of ribosomes, and the cytoskeleton. Genes coding for proteins that bind to nucleic acids and proteins involved in the phagosome and spliceosome pathways were enriched. Additionally, we found that the SmJNK and Smp38 MAPK signaling pathways converge, regulating the expression of a large set of genes. *C. elegans* orthologs of these genes were enriched for genes related to sterility and oocyte maturation, corroborating the observed phenotype alteration. This work allowed an in-depth analysis of the SmJNK signaling pathway, elucidating gene targets of regulation and functional roles of this critical kinase for parasite maturation.

Keywords: *Schistosoma mansoni*, mitogen-activated protein kinases, c-Jun N-terminal kinase, RNA interference, signaling pathways, gene expression


## INTRODUCTION

Schistosomiasis is one of the most common human parasitic diseases, and its socioeconomic impact falls on people in developing nations in the tropics and sub-tropics. Preventive chemotherapy (PC) with Praziquantel is required in 52 countries; in 2017 alone, 220 million people needed PC for schistosomiasis, but only 44.9% of this demand was covered (WHO, 2018). Despite the control efforts, schistosomiasis continues to be a major public health problem, principally due to schistosomiasis-related morbidity (French et al., 2018; Nascimento et al., 2019).

The *Schistosoma mansoni* kinome, containing 252 eukaryotic protein kinases (ePKs), was first described by Andrade and co-authors (Andrade et al., 2011) based on an earlier version of the *S. mansoni* genome (Berriman et al., 2009). Recently, 351 kinase genes with evidence of being transcribed in nearly all adult stages were described, of which 268 were PKs and an additional 83 were non-PKs (Grevelding et al., 2017). Although protein kinases have been recognized for years as suitable targets for drug development (Cohen, 2002; Cai et al., 2017), experimental functional evidence exists for only 40 *S. mansoni* proteins, so further research is still needed.

The c-Jun N-terminal kinase signaling pathway is involved in the developmental regulation of various organisms. Its role has already been demonstrated in oocyte maturation and embryogenesis of *Xenopus laevis* (Bagowski et al., 2001) and in spindle assembly during mouse oocyte meiotic maturation (Huang et al., 2011). In *S. mansoni*, SmJNK (Smp\_172240) was previously characterized, demonstrating its importance for the establishment of infection in the mammalian host (Andrade et al., 2014). SmJNK knockdown caused damage to the adult worms' tegument and impaired the maturation of vitelline organs. This resulted in lower oviposition and greater susceptibility to the host immune system. Furthermore, the SmJNK ortholog in *Schistosoma haematobium* was identified as one of the prioritized druggable kinase targets due to its essentiality, based on lethal gene knock-down or knock-out phenotypes in other organisms (Stroehlein et al., 2015).

Here, we sought to assess the role of SmJNK in adult worms *in vitro*, further unraveling the functions of this critical protein and putative druggable target. We also identified genes whose expression is regulated following SmJNK knockdown, aiming to identify novel parasite-specific targets and to contribute to a better understanding of this signaling pathway. Additionally, we checked for crosstalk between the SmJNK pathway and the previously reported data from Smp38 (Avelar et al., 2019). Finally, we identified specific targets that correlate with previously observed phenotypes of developmental impairment (Andrade et al., 2014; Avelar et al., 2019), which is highly desirable for the design of new anti-schistosome drugs.

## METHODS

#### Parasites

*S. mansoni* adult worms of LE strain were recovered from hamster periportal perfusion 40 days after cercariae percutaneous infection (Pellegrino and Siqueira, 1956). Schistosomula were obtained by mechanical transformation of cercariae as previously described (Milligan and Jolly, 2011). Cercariae were supplied by the Mollusk Room 'Lobato Paraense' of the René Rachou Institute–FIOCRUZ, where the parasite cycle is routinely maintained. The sporocysts were prepared following the protocol previously described (Mourão et al., 2009). This work was approved by the Oswaldo Cruz Foundation's Ethics Committee for Animal Use (CEUA) under number LW12/16, according to the Brazilian national guidelines set out in Law 11794/08.

#### DsRNA Synthesis

For the dsRNA synthesis, an SmJNK mRNA fragment corresponding to a region of approximately 570 bp, previously cloned into pGEM-T Easy vector, was amplified by PCR. Primers and cycling conditions used in this study were previously designed and established by Andrade and collaborators (Andrade et al., 2014). After amplification, PCR products were separated on 1% agarose gels, purified using QIAquick Gel Extraction Kit (Qiagen) and used as the template for dsRNA synthesis. DsRNA synthesis was performed using the T7 RiboMAX Express RNAi Systems kit (Promega) according to the supplier's protocol. DsRNA integrity and annealing were verified on 1% agarose electrophoresis.

#### SmJNK Knockdown in Adult Worms by RNA Interference

After perfusion, male and female adult worms were washed and separated manually. Then, eight males and eight females were placed separately in each well containing 100 μL of RPMI 1640 medium with 25 μg of dsRNA, in two technical replicates. The worms were electroporated with specific SmJNK dsRNA or unspecific GFP dsRNA in 4 mm cuvettes at 125 V for 20 ms and cultivated in 24-well plates with 1 mL RPMI 1640 medium supplemented with 10% heat-inactivated Fetal Bovine Serum and 2% Penicillin/Streptomycin. Unless stated otherwise, all culture reagents were from Gibco, Thermo Fisher Scientific. Worm motility was assessed using the WormAssay software (Marcellino et al., 2012) for 10 days, with eight worms cultured per well in 24-well tissue culture plates containing 1 mL of medium. Similarly, to count the number of eggs laid, eight worm pairs were electroporated and cultured in 6-well plates; the medium was changed daily and the eggs were counted. Also, on days 3, 5, 7, and 10 after electroporation, two worm pairs per day were removed and macerated with TRIzol Reagent (Thermo Fisher Scientific) for RNA extraction as described below. SmJNK expression in adult worms was assessed by quantitative real-time PCR (RT-qPCR). Experiments were performed in three biological replicates. Data were analyzed using an unpaired t-test with Welch's correction.
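As an illustration of the statistical comparison named above, the following is a minimal sketch of the Welch statistic and its Welch-Satterthwaite degrees of freedom, written with the Python standard library only. The egg counts are hypothetical placeholders, not data from this study, and the function name `welch_t` is an assumption introduced here. The p-value step is omitted because it needs a t-distribution CDF.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for an unpaired t-test with Welch's correction."""
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / len(sample_a)
    se2_b = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Hypothetical egg counts from three biological replicates per group.
knockdown_eggs = [10, 12, 14]
control_eggs = [20, 22, 24]
t_stat, dof = welch_t(knockdown_eggs, control_eggs)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

Unlike Student's t-test, this form does not assume equal variances in the two groups, which is a cautious choice with only two or three biological replicates.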

#### SmJNK Knockdown in Schistosomula by RNA Interference

Schistosomula cultures (~500,000 worms/condition) were maintained in bottles with 10 mL of Glasgow Minimum Essential Medium (GMEM) (Sigma-Aldrich) supplemented with 0.2 µM triiodothyronine (Sigma-Aldrich); 0.1% glucose; 0.1% lactalbumin (Sigma-Aldrich); 20 mM HEPES; 0.5% MEM vitamin solution (Gibco, Thermo Fisher Scientific); 5% Schneider's Insect Medium (Sigma-Aldrich); 0.5 µM Hypoxanthine (Sigma-Aldrich), 1 µM hydrocortisone (Sigma-Aldrich), 1% Penicillin/Streptomycin (Gibco, Thermo Fisher Scientific), and 2% heat-inactivated Fetal Bovine Serum (Gibco, Thermo Fisher Scientific). SmJNK dsRNA was added shortly after schistosomula transformation to a final concentration of 100 nM. Cultures were kept in a Bio-Oxygen Demand incubator (B.O.D.) at 37°C, 5% CO2, and 95% humidity. Two independent biological replicates were performed. Data were analyzed using unpaired t-test with Welch's correction.

#### Total RNA Extraction

RNA extraction was performed three, five, seven, and ten days after adult worm electroporation with SmJNK dsRNA. For schistosomula, RNA extraction was performed two days after exposure to SmJNK dsRNA because, as previously shown, the reduction in transcript levels was greatest at this exposure time (Andrade et al., 2014). For RNA extraction, TRIzol Reagent (Invitrogen) was used in combination with the RNeasy Mini Kit (Qiagen) according to the manufacturer's guidelines. Samples were treated with Turbo DNase (Ambion, Thermo Fisher Scientific) for removal of genomic DNA. RNAs were quantified using a Qubit Fluorometer 2.0 (Invitrogen) and, for RNASeq, RNA integrity was assessed using the Agilent RNA 6000 Pico kit and BioAnalyzer 2100 (Agilent Technologies).

#### Gene Expression Analysis by Quantitative Real-Time PCR (RT-qPCR)

The cDNA was synthesized using SuperScript™ III Reverse Transcriptase (Invitrogen, Thermo Fisher Scientific) or Illustra Ready-To-Go RT-PCR Beads (GE Healthcare). qPCR was performed using Power SYBR® Green Master mix (Applied Biosystems, Thermo Fisher Scientific) on an ABI 7500 RT-PCR system (Applied Biosystems, Thermo Fisher Scientific) to evaluate SmJNK knockdown. Primers and cycling conditions used in this study were previously designed and established by Andrade and collaborators (Andrade et al., 2014). Cytochrome C oxidase I gene (Smp\_900000) was used to normalize RNA input from schistosomula (Andrade et al., 2014). FAD dependent oxidoreductase domain containing protein (SmFAD, Smp\_089880) and actin-related protein 10 (Sm-arp 10, Smp\_093230) genes were used to normalize RNA input from adult worms (**Supplementary Table S1**). In each plate, two internal controls were included to assess both reagent and genomic DNA contamination (RNA samples). Post-RNAi, SmJNK transcript levels were analyzed using the relative $2^{-\Delta\Delta Ct}$ method (Livak and Schmittgen, 2001) and expressed as the percentage of difference compared to the untreated or unspecific control.
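The relative quantification described above can be sketched as follows. This is a minimal illustration of the Livak and Schmittgen calculation using a single reference gene; the Ct values and the function names are hypothetical and introduced here only for the example. Combining two reference genes, as done for adult worms, would need an extra averaging step that is not shown.

```python
from statistics import mean


def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of the target gene (treated vs. control) as 2^-ddCt."""
    # dCt: target Ct normalized to the reference gene in each condition.
    d_ct_treated = mean(ct_target_treated) - mean(ct_ref_treated)
    d_ct_control = mean(ct_target_control) - mean(ct_ref_control)
    # ddCt: treated condition relative to the control condition.
    dd_ct = d_ct_treated - d_ct_control
    return 2 ** (-dd_ct)


def percent_reduction(fold_change):
    """Express a fold change as percentage reduction relative to control."""
    return (1.0 - fold_change) * 100.0


# Hypothetical technical-replicate Ct values (not data from this study).
fold = relative_expression(
    ct_target_treated=[26.0, 26.5, 25.5],
    ct_ref_treated=[18.0, 18.5, 17.5],
    ct_target_control=[24.0, 24.5, 23.5],
    ct_ref_control=[18.0, 18.5, 17.5],
)
print(f"fold change = {fold:.2f}, reduction = {percent_reduction(fold):.0f}%")
```

The method assumes that target and reference amplify with similar, near-100% efficiency, which is why primer efficiency is checked separately.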

To evaluate SmJNK expression among the different stages of *S. mansoni* we used qPCR absolute quantification using copy number standards, employing a curve containing five points of an SmJNK clone 10-fold dilution. The copy number of each dilution was calculated by the ratio of the clone molecular mass to the Avogadro constant as described (Lee et al., 2006), and the absolute copy number of the SmJNK transcript at each stage was estimated by interpolation of the sample Ct using a standard curve, and expressed as copy number per ng of total RNA.
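The absolute quantification above has two steps: converting the mass of the cloned standard into a copy number, then interpolating sample Ct values on the standard curve. The sketch below illustrates both under stated assumptions: an average of 660 g/mol per base pair of double-stranded DNA, a hypothetical clone length of 3,585 bp, and hypothetical Ct values for a five-point 10-fold dilution. None of these numbers come from the study.

```python
import math

AVOGADRO = 6.02214076e23   # molecules per mole
BP_MASS = 660.0            # assumed average g/mol per base pair of dsDNA


def copies_from_mass(mass_ng, length_bp):
    """Copy number in a given mass (ng) of a double-stranded DNA clone."""
    grams_per_molecule = (length_bp * BP_MASS) / AVOGADRO
    return (mass_ng * 1e-9) / grams_per_molecule


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_ct(ct, slope, intercept):
    """Interpolate a copy number from a Ct using Ct = slope*log10(N) + b."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical standard curve: five points of a 10-fold dilution series.
standard_copies = [1e7, 1e6, 1e5, 1e4, 1e3]
standard_ct = [15.0, 18.3, 21.6, 24.9, 28.2]
slope, intercept = fit_line([math.log10(c) for c in standard_copies],
                            standard_ct)

sample_copies = copies_from_ct(21.6, slope, intercept)
rna_input_ng = 10.0  # hypothetical amount of total RNA in the reaction
print(f"{sample_copies / rna_input_ng:.0f} copies per ng of total RNA")
```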

#### High-Throughput mRNA Sequencing (RNASeq)

After confirming SmJNK knockdown, 2 µg of total RNA extracted from untreated and SmJNK knocked-down schistosomula, two days after dsRNA exposure, were subjected to mRNA purification, fragmentation, cDNA synthesis, and library construction. Next-generation barcode-multiplexed sequencing libraries were constructed using the TruSeq stranded mRNA kit (Illumina) and pooled in equimolar concentrations. Libraries were sequenced as 100-base paired-end reads on a HiSeq 2500 (Illumina) with a HiSeq Rapid SBS sequencing kit v.2 (Illumina). Raw sequence data were preprocessed using the standard Illumina pipeline to segregate multiplexed reads. Sequence quality was assessed using the FastQC program (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). Reads were then mapped to the *S. mansoni* reference genome (v. 7) using the STAR program (v. 2.6.1d) with default parameters except the following: --outSJfilterReads Unique, --sjdbOverhang 99, --outFilterType BySJout, --outFilterMultimapNmax 20, --alignSJoverhangMin 8, --alignIntronMin 25, and --outSAMattributes All (Dobin et al., 2013).
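For readers who want to reproduce the mapping step, the non-default parameters listed above can be assembled into a STAR invocation as sketched below. The file paths, the output prefix, the thread count, and the helper name `build_star_command` are hypothetical; only the seven non-default options come from the text. The sketch builds and prints the command and does not run STAR.

```python
import shlex


def build_star_command(genome_dir, read1, read2, out_prefix, threads=4):
    """Assemble a STAR paired-end alignment command as an argument list."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", read1, read2,
        "--outFileNamePrefix", out_prefix,
        # Non-default parameters reported in the Methods.
        "--outSJfilterReads", "Unique",
        "--sjdbOverhang", "99",          # read length (100) minus 1
        "--outFilterType", "BySJout",
        "--outFilterMultimapNmax", "20",
        "--alignSJoverhangMin", "8",
        "--alignIntronMin", "25",
        "--outSAMattributes", "All",
    ]


# Hypothetical paths; pass the list to subprocess.run(...) to execute it.
command = build_star_command(
    genome_dir="star_index_smansoni_v7",
    read1="sample_R1.fastq",
    read2="sample_R2.fastq",
    out_prefix="sample_",
)
print(shlex.join(command))
```

Keeping the command as a list, not a single string, avoids shell-quoting problems when file names contain spaces.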

#### Screening of Differentially Expressed Genes

Downstream analysis was carried out using the R statistical software package (v. 3.3.2) (R Development Core Team, 2011). To identify genes whose expression changed significantly after SmJNK dsRNA exposure, we used DESeq2 (v. 1.12.4) (Love et al., 2014) to normalize the data and perform the statistical analysis. Genes with an adjusted p-value <0.01 were considered significantly differentially expressed. Clustering analysis was performed using the heatmap function from the gplots (v. 3.0.1) package and PCA plots were generated using the ggplot2 (v. 2.1.1) package.
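The adjusted p-value threshold can be illustrated with a short sketch of the Benjamini-Hochberg procedure, which is the default adjustment in DESeq2. This is not a reimplementation of DESeq2: its normalization, dispersion estimation, and independent filtering are omitted, and the gene names and raw p-values below are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = {"gene_a": 0.0001, "gene_b": 0.004, "gene_c": 0.03, "gene_d": 0.5}
padj = dict(zip(raw, benjamini_hochberg(list(raw.values()))))

# Apply the threshold used in this study (adjusted p-value < 0.01).
significant = sorted(gene for gene, p in padj.items() if p < 0.01)
print(significant)
```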

Gene ontology (GO) (Ashburner et al., 2000; The Gene Ontology Consortium, 2017) and Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa and Goto, 2000; Kanehisa et al., 2016; Kanehisa et al., 2017) pathway analyses were used to identify the enriched molecular functions and associated biological pathways of differentially expressed genes (DEGs). The g:Profiler tool (version e95\_eg42\_p13\_f6e58b9) (Reimand et al., 2007; Raudvere et al., 2019) was applied to perform the GO enrichment and KEGG pathway analysis. The g:SCS method was applied for multiple testing correction, with a threshold of 0.05 on the adjusted p-values. The ReViGO tool (Supek et al., 2011) was used to cluster, summarize, and visualize the list of significantly enriched GO terms based on their semantic similarities (allowed similarity: 0.4).

We also compared the DEGs of SmJNK knocked-down schistosomula with those found in Smp38 knocked-down schistosomula (Avelar et al., 2019). The down-regulated DEGs that overlap in both datasets were submitted to the BioMart tool (https://parasite.wormbase.org/info/Tools/biomart.html) from WormBase ParaSite (Howe et al., 2017) to search for orthologs in the nematode *Caenorhabditis elegans*. The orthologous genes were then analyzed using the WormEnrichr enrichment analysis tool for *C. elegans* (https://amp.pharm.mssm.edu/WormEnrichr/) to search for the RNAi phenotypes associated with this gene list (Chen et al., 2013; Kuleshov et al., 2016).
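The comparison between the two knockdown datasets amounts to a set intersection split by direction of change, followed by a one-to-many ortholog lookup. The sketch below illustrates that logic with hypothetical gene identifiers, hypothetical log2 fold changes, and a hypothetical ortholog table; in practice the ortholog table would be an export from BioMart.

```python
def shared_degs(log2fc_a, log2fc_b):
    """Genes significant in both datasets, grouped by shared direction."""
    shared = {"down": set(), "up": set()}
    for gene in log2fc_a.keys() & log2fc_b.keys():
        if log2fc_a[gene] < 0 and log2fc_b[gene] < 0:
            shared["down"].add(gene)
        elif log2fc_a[gene] > 0 and log2fc_b[gene] > 0:
            shared["up"].add(gene)
    return shared


def map_orthologs(genes, ortholog_table):
    """Return (genes with orthologs, set of distinct ortholog IDs)."""
    mapped = {g: ortholog_table[g] for g in genes if g in ortholog_table}
    # One gene may have several orthologs, so targets can outnumber genes.
    targets = set().union(*mapped.values())
    return mapped, targets


# Hypothetical DEG tables: gene ID -> log2 fold change after knockdown.
jnk_degs = {"Smp_A": -1.2, "Smp_B": -0.8, "Smp_C": 1.5, "Smp_D": -2.0}
p38_degs = {"Smp_A": -0.9, "Smp_B": 0.7, "Smp_C": 2.1, "Smp_E": -1.1}
overlap = shared_degs(jnk_degs, p38_degs)

# Hypothetical ortholog table: S. mansoni gene -> C. elegans genes.
orthologs = {"Smp_A": {"cel-1", "cel-2"}, "Smp_E": {"cel-3"}}
mapped, targets = map_orthologs(overlap["down"], orthologs)
print(sorted(overlap["down"]), sorted(overlap["up"]), sorted(targets))
```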

#### Quantitative Real-Time PCR Validation of Differentially Expressed Genes

Quantitative PCR was performed to validate the expression levels of selected differentially expressed genes. Ten genes were selected at random, five of which are among the DEGs exhibiting large fold changes (**Supplementary Table S1**). Specific primers were designed using the PrimerQuest tool (https://www.idtdna.com/Primerquest/Home/Index) and/or Primer3 software (http://primer3.ut.ee/) and obtained from Integrated DNA Technologies (IDT). The sequences of all primers and the final concentrations established for each qPCR reaction are presented in **Supplementary Table S1**. PCR efficiency for each pair of specific primers was estimated by titration analysis to be 100% ± 10%. RNA extraction, purification, cDNA synthesis, and RT-qPCR reactions were carried out as described above. SmFAD and Sm-arp 10 genes were used to normalize RNA input (**Supplementary Table S1**).
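A common way to estimate primer efficiency from a template titration, assumed here because the text does not give the formula, is to fit Ct against log10 of template amount and convert the slope with E = 10^(-1/slope) - 1. The sketch below applies that conversion and the 100% ± 10% acceptance window; the slopes and function names are hypothetical.

```python
def amplification_efficiency(slope):
    """Percent efficiency from the slope of Ct versus log10(template)."""
    return (10 ** (-1.0 / slope) - 1.0) * 100.0


def within_tolerance(efficiency, target=100.0, tolerance=10.0):
    """True when the efficiency falls inside target +/- tolerance."""
    return abs(efficiency - target) <= tolerance


# Hypothetical titration slopes for two primer pairs.
for name, slope in [("primer_pair_1", -3.3), ("primer_pair_2", -3.6)]:
    eff = amplification_efficiency(slope)
    print(f"{name}: {eff:.1f}% accepted={within_tolerance(eff)}")
```

A slope near -3.32 corresponds to a doubling of product per cycle, that is, 100% efficiency.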

## RESULTS

#### SmJNK Expression Levels Among *S. mansoni* Developmental Stages

The expression profile of SmJNK in developmental stages of *S. mansoni* (cercariae, two- and seven-day schistosomula, adult males, adult females, and sporocysts) was investigated by quantitative PCR. Absolute quantification was employed to assess SmJNK expression among the different developmental stages. The SmJNK gene exhibited the highest expression levels (~4,000 copies/ng of total RNA) in two-day schistosomula. This was about six times the amount found in adult males, which exhibited the second highest expression levels (~635 copies/ng of total RNA) (**Figure 1A**). Cercariae, sporocysts, and seven-day schistosomula presented approximately the same amount of SmJNK transcripts.

#### SmJNK Knockdown in Adult Schistosomes

Adult worms electroporated with SmJNK dsRNA showed up to 62% reduction in SmJNK transcript levels on the fifth day after electroporation (**Figure 1B**). After the successful establishment of SmJNK knockdown in adults, we evaluated phenotypic changes, including worm movement and oviposition. SmJNK knockdown in adult worms did not result in any significant changes in male or female movement over ten days (**Supplementary Figures S1A, B**). However, a significant reduction of 67% in egg laying was observed in worms electroporated with SmJNK dsRNA (**Figure 2**).

#### SmJNK Knockdown in Schistosomula and Quality Assessment of RNASeq Libraries

After confirming successful SmJNK knockdown by RT-qPCR in schistosomula (**Figure 1C**), four paired-end libraries were generated, yielding a total of 158 M paired-end reads with an average sequence length of 100 bp and 38–40% guanine-cytosine (GC) content (**Table 1**). FastQC quality assessment indicated that the RNASeq data were of high quality and adequate for downstream transcriptome analysis. Raw reads were submitted to the Sequence Read Archive (SRA; http://www.ncbi.nlm.nih.gov/sra), under accession numbers PRJNA354932 and PRJNA492452. Sequence mapping to the *S. mansoni* reference genome (v. 7) resulted in uniquely mapped reads ranging from 80.94% to 84.16%.

FIGURE 1 | SmJNK expression profile among developmental stages of *S. mansoni* and after RNAi of adult worms *in vitro*. (A) SmJNK transcript levels in the different *S. mansoni* life stages; cercariae, two and seven days schistosomula, males, females, and sporocysts. Bar graph depicting the absolute SmJNK transcript levels presented as copy number per ng of total RNA. Bars represent the standard deviation of the mean of three technical replicates. (B) Bar graph depicting the relative SmJNK transcript levels in adult worms three, five, seven, and ten days after electroporation with SmJNK dsRNA. Bars represent the relative values of SmJNK transcripts compared to the untreated control (■), or to the unspecific control-GFP (■). The dashed line represents the values of the normalized controls. Data are represented as mean fold-difference (+/− SE). Asterisks represent statistically significant differences. Data were analyzed using Unpaired t-test with Welch's correction (N=3, \**p* < 0.05, \*\**p* < 0.01, \*\*\**p* < 0.001). (C) SmJNK transcript levels in schistosomula exposed to SmJNK dsRNA before RNASeq experiments. Bar graph depicting relative SmJNK transcripts levels in schistosomula two days after exposure to SmJNK dsRNA (■) compared to the untreated control (■). Data are represented as mean fold-differences (+/− SE). Transcript levels were determined by RT-qPCR and data analyzed using the ΔΔCt method (\*p < 0.05, N = 2). Unpaired *t* test with Welch's correction.

TABLE 1 | Summary of sequenced datasets and mapping to the *S. mansoni* reference genome.

The correlation degree between the biological samples was investigated by hierarchical clustering using the heatmap function of the DESeq2 package and means of the Euclidean distances. Clustering analysis grouped libraries from biological replicates in the same branch (**Figure 3A**). Principal component analysis (PCA) was also performed with the genes by multidimensional scaling of the data matrix (**Figure 3B**). The PCA plot of the first two components showed a clear separation between the control and SmJNK dsRNA treated samples in the first dimension.

#### Potential SmJNK Regulated Gene Targets Identified in Schistosomula After RNAi

After checking for data consistency and sample quality, the transcriptional profile of schistosomula two days after SmJNK dsRNA exposure was compared to the profile of the untreated control using the DESeq2 package in R. We found 495 DEGs (padj <0.01) (**Figure 4A**). Ten of the DEGs were selected for validation by RT-qPCR (**Figure 4B**) (**Supplementary Table S1**). All the analyzed genes showed a consistent expression trend between the RNASeq and RT-qPCR analyses, although with some differences in log2FoldChange values, which is expected between two different methodologies and among different biological replicates. The significant correlation (R2 = 0.7474, p = 0.0012) between the log2FoldChange values obtained with the two methodologies supports the accuracy of the data (**Figure 4C**).
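The agreement check between the two methods can be sketched as a Pearson correlation of paired log2 fold changes plus a simple test that each gene changes in the same direction. The values below are hypothetical and are not the study's measurements; the significance test for the correlation is not shown.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def same_direction(xs, ys):
    """True when every paired value has the same sign in both methods."""
    return all(x * y > 0 for x, y in zip(xs, ys))


# Hypothetical log2 fold changes for six validated genes.
rnaseq_lfc = [-2.1, -1.5, -1.0, -0.8, 0.9, 1.4]
qpcr_lfc = [-1.6, -1.9, -0.7, -0.2, 0.5, 1.8]

r = pearson_r(rnaseq_lfc, qpcr_lfc)
print(f"r = {r:.3f}, R2 = {r * r:.3f}, "
      f"consistent trend = {same_direction(rnaseq_lfc, qpcr_lfc)}")
```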

A list of all statistically significant DEGs from schistosomula exposed to SmJNK dsRNA can be found in the **Supplementary Table S2**. Of the 495 DEGs detected in the samples in which SmJNK expression was reduced, 373 are down-regulated and 122 up-regulated in comparison to untreated controls.

FIGURE 4 | Differentially expressed genes (DEGs) after SmJNK knockdown in schistosomula. (A) MA plot depicting a DESeq2 analysis to identify DEGs between SmJNK dsRNA and untreated control. The log-fold change for each transcript is plotted against the mean of normalized counts; each point corresponds to one gene. Significantly altered gene expressions are highlighted in red (padj < 0.01). (B) RT-qPCR validation of the differentially expressed genes in response to SmJNK knockdown. Ten DEGs were selected from a range of up-regulated and down-regulated genes. Expression levels were quantified by RT-qPCR (gray) and the results were compared to those obtained by the RNASeq approach (black). (C) Pearson's correlation of Log2FoldChange in differentially expressed transcripts between RNASeq and RT-qPCR analysis.

#### Gene Set Enrichment Analysis

The g:Profiler tool was used to elucidate the functional roles of the 495 DEGs. DEGs were categorized into three GO categories (GO Biological Process, GO Molecular Function, and GO Cell Component) and KEGG Pathways. The significantly enriched subcategories are shown in **Figure 5** and **Supplementary Table S3**. We observed a decrease in expression for genes encoding proteins related to (1) the structural composition of ribosomes, (2) the cytoskeleton, (3) purine nucleotide metabolism, (4) protein folding, (5) splicing mechanisms, (6) phagosomes, and (7) binding to nucleic acids. For DEGs that showed increased expression, no subcategory showed significant enrichment (**Figure 5A**).
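Over-representation analyses of this kind rest on a cumulative hypergeometric test: given a background of N annotated genes, K of which carry a term, what is the probability of seeing at least k term members in a query list of n genes? The sketch below computes that probability for a single term. The counts are hypothetical, and the g:SCS multiple testing correction applied by g:Profiler is not reproduced here.

```python
from math import comb


def enrichment_p_value(background, term_size, query_size, overlap):
    """P(X >= overlap) for a hypergeometric draw without replacement."""
    upper = min(term_size, query_size)
    total = comb(background, query_size)
    # Sum the probability of every outcome at least as extreme as observed.
    favorable = sum(
        comb(term_size, i) * comb(background - term_size, query_size - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / total


# Hypothetical counts for one GO term (not values from this study).
p = enrichment_p_value(background=10000, term_size=80,
                       query_size=373, overlap=12)
print(f"uncorrected p = {p:.3g}")
```

Python integers have arbitrary precision, so the binomial coefficients stay exact even for genome-scale backgrounds; only the final division is floating point.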

FIGURE 5 | Gene Ontology Enrichment of down-regulated genes from SmJNK knockdown schistosomula. (A) Manhattan plot illustrating the enrichment analysis results separated into four categories: GO:MF (Molecular Function), GO:BP (Biological Process), GO:CC (Cellular Component), and KEGG Pathways. The number in the source name in the x-axis labels shows how many significantly enriched terms were found. The circle corresponds to term size. The term location on the x-axis is fixed and terms from the same GO subtree are located closer to each other. Interactive graph of the representative subset of the GO terms related to biological process (B) and molecular function (C) enriched in down-regulated genes from SmJNK knockdown schistosomula. Bubble colors indicate the p-value and size indicates the frequency of the GO term in the underlying Gene Ontology Annotation database. Highly similar GO terms are linked by edges, where the line width indicates the degree of similarity.

After summarizing the list of GO terms using the ReViGO tool, we obtained a set of non-redundant GO terms, ordered according to the log10 p-value. Among the Biological Process terms enriched in the down-regulated genes, 23 GO terms were representative in SmJNK knockdown schistosomula. Owing to their uniqueness, we highlight: protein folding, biosynthetic process, adenosine triphosphate (ATP) hydrolysis coupled cation transmembrane transport, ATP metabolic process, regulation of cellular catabolic process, gene expression, nitrogen compound metabolic process, RNA splicing, cellular amide metabolic process, protein import into nucleus, positive regulation of proteolysis, translational initiation, and cellular nitrogen compound metabolic process (**Figure 5B**, **Table 2**). The GO terms in the Molecular Function category were summarized into 22 GO terms. We draw attention to structural constituent of ribosome, hydrogen ion transmembrane transporter activity, unfolded protein binding, lactate dehydrogenase activity, peptidyl-prolyl cis-trans isomerase activity, drug binding, translation initiation factor activity, and nucleic acid binding (**Figure 5C**, **Table 2**).

#### Crosstalk Between SmJNK and Smp38 MAPK Pathways

Since the JNK and p38 MAPK signaling pathways are triggered by multiple stimuli that simultaneously activate both pathways and consequently share several upstream regulators (Wagner and Nebreda, 2009), we compared the DEGs found here with those found for Smp38 knocked-down schistosomula (Avelar et al., 2019). The SmJNK and Smp38 MAPK pathways in *S. mansoni* presented an intersection of 311 and 89 genes with decreased and increased expression, respectively (**Figure 6**).

Of the 311 down-regulated genes shared by SmJNK and Smp38 knockdown schistosomula, we found orthologs for 221, which correspond to 291 different genes in *C. elegans*, since some of the genes have more than one ortholog. The *C. elegans* orthologs are enriched in genes associated with the following RNAi phenotypes: sterile, embryonic lethal, maternal sterile, larval arrest, transgene subcellular localization variant, and slow growth, among others (**Figure 7** and **Supplementary Table S4**).

## DISCUSSION

We advanced the understanding of SmJNK gene function by demonstrating that SmJNK is involved in parasite reproduction *in vitro*. We observed a significantly lower number of eggs recovered from SmJNK knocked-down adult worms compared to the control group. It was previously shown that SmJNK knockdown decreased parasite survival and maturation within the definitive host, and that female worms treated with SmJNK dsRNA showed undifferentiated oocytes (Andrade et al., 2014). SmJNK is highly expressed in the ovary of paired females (Lu et al., 2016; Lu et al., 2017), evidencing its role in female reproduction. Therefore, in adult females cultured *in vitro*, SmJNK knockdown may influence oocyte maturation, since the ovary is already fully formed before the knockdown. This is particularly promising because the eggs are fundamentally related to the disease pathology. Granulomas are the result of the host immune response to schistosome eggs. Their formation is triggered by the migration of immune cells that encapsulate the eggs, resulting in damage to liver tissue (fibrosis and, ultimately, cirrhosis) (Espírito Santo et al., 2008; King and Dangerfield-Cha, 2008; Burke et al., 2009). Additionally, in adult worms cultured *in vitro*, SmJNK does not seem to have a role in parasite movement or viability.

Protein kinase domains are highly conserved between organisms (Manning et al., 2002). Therefore, any drug development effort needs to consider cross-binding of small molecules, which could result in undesired side effects. Inhibitors need to be specific and to target essential proteins. The elucidation of gene targets regulated by SmJNK, in addition to contributing to a better understanding of the JNK signaling pathway in the *S. mansoni* parasite, also aims at finding parasite-specific targets. Among the identified DEGs are some that encode proteins of unknown function; these genes could be schistosome-specific. Such proteins may be related to the previously observed phenotypes of impaired parasite development and oviposition.

We observed the down-regulation of genes related to the structural composition of ribosomes in SmJNK knockdown parasites. Ribosomal proteins are major components of ribosomes and play critical roles in protein biosynthesis. Variations observed in the expression of ribosomal proteins in different human tissues are probably related to extraribosomal functions (Bortoluzzi et al., 2001), since ribosomal proteins and RNAs are typically synthesized in stoichiometric amounts (Mager, 1988). Also, the differential expression of ribosomal proteins has been reported in several pathological (Bévort and Leffers, 2000) and stress conditions (Wang et al., 2013).

Genes related to the cytoskeleton showed decreased expression when SmJNK was knocked down. Integrin-linked kinase (ILK), an activator of the SmJNK, extracellular signal-regulated kinase (SmERK), and Akt pathways, is involved in cytoskeletal reorganization and cell survival. The dysregulation of SmILK may contribute to errors in cell division and genomic instability (Fielding and Dedhar, 2009). The SmILK signaling pathway has an influence on egg production, ovarian structure, and oocyte integrity in female schistosomes (Gelmedin et al., 2017), with phenotypes similar to those observed for SmJNK knockdown (Andrade et al., 2014).

Among genes with decreased expression after SmJNK knockdown, we observed genes related to protein folding, such as chaperonins and heat shock proteins (HSPs). Cellular chaperones such as HSPs confer resistance to stress, promoting cell survival. Prolonged oxidative stress causes an increase in misfolded proteins that aggregate in the endoplasmic reticulum. The objective of the endoplasmic reticulum stress response is to inhibit or retard protein synthesis. This is achieved by the phosphorylation of eukaryotic translation initiation factor 2α (eIF2α) by protein kinase R-like endoplasmic reticulum kinase or by JNK (Nagiah et al., 2016). SmJNK knockdown may have inhibited this stress response, resulting in the observed downregulation of chaperone and HSP gene expression.

In rat fibroblasts, positive regulation of the p38 and JNK signaling pathways promotes increased expression of the plasminogen activator inhibitor 1 (PAI-1) (Nutter et al., 2015). JNK positively modulates PAI-1 expression in the neurodegenerative amyloid pathology observed in Alzheimer's disease (Gerenu et al., 2017), whereas JNK inhibition significantly attenuates the induction of PAI-1 (Eurlings et al., 2017). PAI-1 was not detected among DEGs in SmJNK and Smp38 knockdown schistosomula. However, the expression of plasminogen activator inhibitor 1 RNA-binding protein (SERBP1, Smp\_009310), which plays a role in regulating PAI-1 mRNA stability, was down-regulated (**Figure 4B**). *Schistosoma* inhibits host coagulation during infection through stimulation of fibrinolytic pathways (Mebius et al., 2013). The interaction between parasites and the hosts' fibrinolytic system results in the regulation of several functions related to parasite survival mechanisms, for example, the degradation of immunoglobulins and complement components, the activation of matrix metalloproteinases, the stimulation of adhesion, and the degradation of proteins for nutrition (González-Miguel et al., 2016). The detected reduction in SERBP1 transcript levels could influence PAI-1 stability and may therefore interfere with parasite feeding and/or reduce mobility in the host, and thus be related to the impairment of parasite survival (Andrade et al., 2014).

TABLE 2 | List of enriched gene ontology (GO) terms from DEGs down-regulated in SmJNK knockdown schistosomula. The analysis is separated into two GO categories: Biological Process and Molecular Function. The dispensable GO terms are depicted in italics. The log10 p-values refer to the log10 of the adjusted p-values obtained using the g:Profiler tool.

SmJNK knockdown also down-regulated genes involved in RNA splicing. Alternative splicing events alter the protein repertoire of cells and are regulated by the expression patterns of splicing factors (Grosso et al., 2008). Splicing control is as complex and relevant as transcriptional control (Kornblihtt et al., 2004). It has already been shown that the JNK signaling pathway controls splicing events in human T cells (Martinez et al., 2015). This pathway also seems to act in the regulation of alternative splicing in response to extracellular stimuli, promoting changes in splicing patterns (Pelisch et al., 2005). Moreover, *Schistosoma* presents trans-splicing, a peculiar mechanism that shares proteins with the spliceosome and could affect up to 58% of transcripts in the cercarial stage (Boroni et al., 2018). Trans-splicing has been suggested as a target for parasite impairment and would be affected by the alterations detected in categories related to RNA splicing (Mourão et al., 2013).

The ability to deal with adverse environmental conditions is a fundamental need for all organisms, especially parasites. It is, therefore, not surprising that stress-activated protein kinase pathways are among the oldest and most conserved metazoan signaling modules (Twumasi-Boateng et al., 2012). The JNK and p38 MAPK pathways share several upstream regulators and, consequently, multiple stimuli simultaneously activate both pathways (Wagner and Nebreda, 2009). Several studies demonstrate that the extracellular signal-regulated kinase (ERK), JNK, and p38 MAPK signaling pathways act cooperatively, amplifying and integrating signals from various stimuli and promoting appropriate physiological responses, including cell proliferation, differentiation, development, inflammatory responses, and apoptosis in mammalian cells (Zhang et al., 2002).

Orthologs of SmJNK and Smp38 in *C. elegans* regulate phenotypes that result in sterile nematodes. Hence, oocyte morphology alterations, decreased number or lack of this structure, and gonad alterations are redundant terms among the enriched RNAi phenotypes in the orthologs. The JNK (named kgb-1) deletion in *C. elegans* has a sterile phenotype, characterized by the massive presence of immature oocytes (Smith et al., 2002). The *C. elegans* kgb-1 deleted mutant presents low reproduction rate and shortened lifespan, concomitantly with reduced expression of genes for protein biosynthesis, chaperones, and enzymes involved in ubiquitination/proteasomal degradation (Gerke et al., 2014). All those findings corroborate the phenotypes and transcription profile associated with SmJNK and Smp38 knockdown.

Here we have shown that SmJNK activity is related to the reduction of egg production *in vitro*. This is in accordance with previously observed phenotype alterations *in vivo*, which demonstrated that SmJNK and Smp38 inhibitors could lead to sterile females, thereby reducing schistosomiasis pathology in the host. Our results point to other key regulatory proteins that are not well conserved between host and parasite, encouraging the development of small-molecule inhibitors.

### DATA AVAILABILITY STATEMENT

The datasets generated for this study can be found in PRJNA354932 and PRJNA492452.

### ETHICS STATEMENT

This work was approved by the Oswaldo Cruz Foundation's Ethics Committee for Animal Use (CEUA) under number LW12/16, according to the Brazilian national guidelines set out in Law 11794/08.

### AUTHOR CONTRIBUTIONS

SG, GO, and MM contributed to conception and design of the study. SG and NT performed the experiments. SG performed the bioinformatic and statistical analysis. GO and MM contributed with reagents, materials, and analysis tools. SG, FF, and MM wrote the manuscript. All authors contributed to manuscript revision and read and approved the submitted version.

### FUNDING

This work has been supported by funding from the European Commission's Seventh Framework Programme for research, under Grant agreement no. 602080 (AParaDDisE), FAPEMIG (CBB-APQ-0520-13), CNPq grant (302518/2018-5) to MM; and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior–Brasil (CAPES)–Finance Code 001, PCDD Programa CAPES/Nottingham University (003/2014), CNPq grants (470673/2014-1, 309312/2012-4, 304138/2014-2), CAPES (REDE 21/2015), and FAPEMIG (PPM-3500189-13) to GO. The SG and NT fellowships were financed by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior–Brasil (CAPES). The authors thank the Programa de Pós-graduação em Ciências da Saúde, IRR, for its support.

#### ACKNOWLEDGMENTS

The authors thank the Mollusk Room 'Lobato Paraense' of the René Rachou Institute–FIOCRUZ for providing the cercariae used in this study and the Program for Technological Development in Tools for Health-PDTIS FIOCRUZ for use of its facilities. We also thank Anna Salim and Flávio Araújo for their assistance in the RNASeq experimental design and library construction, Núbia Fernandes Braga for kindly providing RNA extracted from sporocysts, and Fábio Passetti and Ricardo Junqueira for sequencing the RNASeq libraries.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2019.01036/full#supplementary-material

#### REFERENCES


*Biochim. Biophys. Acta Mol. Basis Dis.* 1863, 991–1001. doi: 10.1016/j. bbadis.2017.01.023


P granule components, associate with CSN-5 and KGB-1, proteins necessary for fertility, and with ZYX-1, a predicted cytoskeletal protein. *Dev. Biol.* 251, 333–347. doi: 10.1006/dbio.2002.0832


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Gava, Tavares, Falcone, Oliveira and Mourão. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Complex I and II Subunit Gene Duplications Provide Increased Fitness to Worms

*Lucía Otero1†, Cecilia Martínez-Rosales1†, Exequiel Barrera2, Sergio Pantano2 and Gustavo Salinas1\**

*1 Laboratorio de Biología de Gusanos, Unidad Mixta Departamento de Biociencias, Facultad de Química, Universidad de la República–Institut Pasteur de Montevideo, Montevideo, Uruguay, 2 Laboratorio de Simulaciones Biomoleculares, Institut Pasteur de Montevideo, Montevideo, Uruguay*

#### *Edited by:*

*Gabriel Rinaldi, Wellcome Sanger Institute (WT), United Kingdom*

#### *Reviewed by:*

*Sutas Suttiprapa, Khon Kaen University, Thailand James Barron Lok, University of Pennsylvania, United States*

*\*Correspondence: Gustavo Salinas gsalin@fq.edu.uy*

*†These authors have contributed equally to this work*

#### *Specialty section:*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Genetics*

*Received: 24 April 2019 Accepted: 30 September 2019 Published: 25 October 2019*

#### *Citation:*

*Otero L, Martínez-Rosales C, Barrera E, Pantano S and Salinas G (2019) Complex I and II Subunit Gene Duplications Provide Increased Fitness to Worms. Front. Genet. 10:1043.*

Helminths use an alternative mitochondrial electron transport chain (ETC) under hypoxic conditions, such as those found in the gastrointestinal tract. In this alternative ETC, fumarate is the final electron acceptor and rhodoquinone (RQ) serves as an electron carrier. RQ receives electrons from reduced nicotinamide adenine dinucleotide through complex I and donates electrons to fumarate through complex II. In this latter reaction, complex II functions in the opposite direction to the conventional ETC (i.e., as fumarate reductase instead of succinate dehydrogenase). Studies in *Ascaris suum* indicate that this is possible due to changes in complex II, involving alternative succinate dehydrogenase (SDH) subunits SDHA and SDHD, derived from duplicated genes. We analyzed helminth genomes and found that distinct lineages have different gene duplications of complex II subunits (SDHA, SDHB, SDHC, and SDHD). Similarly, we found lineage-specific duplications in genes encoding complex I subunits that interact with quinones (NDUF2 and NDUF7). The phylogenetic analysis of ETC subunits revealed a complex history with independent evolutionary events involving gene duplications and losses. Our results indicated that there is not a common evolutionary event related to ETC subunit genes linked to RQ. The free-living nematode *Caenorhabditis elegans* uses RQ and has two genes encoding SDHA (*sdha-1* and *sdha-2*) and two genes encoding NDUF2 (*nduf2-1* and *nduf2-2*). *sdha-1* and *nduf2-1* are essential genes and have a similar expression pattern during *C. elegans* lifecycle. Using knockout strains, we found that *sdha-2* and *nduf2-2* are not essential, even in hypoxia. Yet, *sdha-2* and *nduf2-2* expression is increased in the early embryo and in dauer larvae, stages where there is low oxygen tension. Strikingly, *sdha-1* and *sdha-2* as well as *nduf2-1* and *nduf2-2* showed inverted expression profiles during the *C. elegans* life cycle. 
Finally, we found that progeny is reduced in *sdha-2* and *nduf2-2* knockout mutant strains. Our results indicate that different complex I and II subunit gene duplications provide increased fitness to worms.

Keywords: rhodoquinone, *C. elegans*, nematode, platyhelminth, electron transport chain, hypoxia, helminth, gas-1

*doi: 10.3389/fgene.2019.01043*

**Abbreviations:** FRD, fumarate reductase; RQ, rhodoquinone; SDH, succinate dehydrogenase; UQ, ubiquinone.

### INTRODUCTION

Energy is essential to all forms of life. Helminths live part of their life cycle under hypoxic conditions and have an energy metabolism different from that of their hosts. Under hypoxia, helminths use fumarate instead of oxygen as the final electron acceptor of the electron transport chain (ETC) (**Figure 1A**) (Hellemond et al., 2003; Harada et al., 2013). This is achieved by means of an alternative ETC, in which rhodoquinone (RQ) serves as an electron carrier (Van Hellemond et al., 1995; Hellemond et al., 2003; Harada et al., 2013). RQ is a redox-active quinone structurally similar to the conventional ubiquinone (UQ), differing only in that the 6-methoxy substituent of UQ is replaced by an amino substituent in RQ (**Figure 1B**) (Moore and Folkers, 1965). This change gives RQ a lower redox potential than UQ (−63 and +43 mV, respectively) (Erabi et al., 1975). This allows RQ to receive electrons from reduced nicotinamide adenine dinucleotide (NADH) through complex I and to reduce fumarate to succinate (redox potential +30 mV) through complex II. Thus, in this alternative ETC, complex II functions as a fumarate reductase (FRD), in the opposite direction to the conventional ETC in which complex II functions as succinate dehydrogenase (SDH) (Van Hellemond et al., 1995; Tielens and Van Hellemond, 1998). SDH oxidizes succinate to fumarate and reduces UQ. Importantly, the alternative ETC also allows proton pumping through complex I. The resulting proton gradient is coupled to ATP synthesis by mitochondrial ATP synthase. The fumarate and NADH used in this ETC are generated in the malate dismutation pathway (Tielens and Van Hellemond, 1998). This pathway yields between 5 and 6 mol of ATP per mol of glucose. A similar ETC that uses RQ is also present in some prokaryotes, protists, and other animals that alternate cycles of normoxia and hypoxia, such as bivalves and freshwater snails (Van Hellemond et al., 1995).
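The thermodynamic argument behind these redox potentials can be made explicit with a short calculation. The sketch below is illustrative only: it applies the standard relation ΔG = −nFΔE to the midpoint potentials quoted above, assuming a two-electron transfer and standard conditions. The function name and the choice of n = 2 are assumptions made for this example, and the actual free-energy change in mitochondria would also depend on the concentrations of the redox partners.

```python
# Faraday constant in coulombs per mole of electrons.
FARADAY = 96485.33212


def delta_g_kj_per_mol(e_donor_mv, e_acceptor_mv, n_electrons=2):
    """Free-energy change (kJ/mol) for passing electrons from donor to acceptor.

    Uses dG = -n * F * dE with dE = E(acceptor) - E(donor), in volts.
    n_electrons=2 is an assumption (two-electron quinol oxidation).
    """
    delta_e_volts = (e_acceptor_mv - e_donor_mv) / 1000.0
    return -n_electrons * FARADAY * delta_e_volts / 1000.0


# Midpoint potentials quoted in the text (mV).
E_RQ = -63.0
E_UQ = 43.0
E_FUMARATE_SUCCINATE = 30.0

rq_to_fumarate = delta_g_kj_per_mol(E_RQ, E_FUMARATE_SUCCINATE)
uq_to_fumarate = delta_g_kj_per_mol(E_UQ, E_FUMARATE_SUCCINATE)

print(f"Rhodoquinol -> fumarate: {rq_to_fumarate:.1f} kJ/mol")
print(f"Ubiquinol   -> fumarate: {uq_to_fumarate:.1f} kJ/mol")
```

Under these assumptions the value for rhodoquinol is negative (favorable), whereas the value for ubiquinol is slightly positive, which is consistent with RQ, rather than UQ, acting as the electron donor for fumarate reduction.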

RQ has been found in all helminths where its presence has been examined, being more abundant in stages that dwell in hypoxic environments (Sato and Ozawa, 1969; Allen, 1973; Van Hellemond et al., 1995). Importantly, RQ has also been associated with hypoxia in the free-living nematode *Caenorhabditis elegans* (Takamiya et al., 1999), an amenable model for genetic studies. It is thought that the use of RQ as an electron carrier requires adjustments in ETC complex II subunits. Several studies in *Ascaris suum*, a nematode in which biochemical studies are feasible, have shown that the use of RQ is possible due to changes in complex II subunits (Takamiya et al., 1993; Kuramochi et al., 1994; Amino et al., 2003; Iwata et al., 2008). Mitochondria from aerobic developmental stages of *A. suum*, larval stages L2 and L3, contain mainly UQ. In contrast, mitochondria from adult worms, which live in an oxygen-deprived environment, contain only RQ (Takamiya et al., 1993). The transition from normoxia to hypoxia is accompanied by an exchange of the SDHA and SDHD subunits of complex II, leading to the change in enzymatic activities of complex II, from SDH-UQ reductase to FRD-rhodoquinol oxidase (Takamiya et al., 1993; Saruta et al., 1995; Amino et al., 2003; Iwata et al., 2008). In the nematode *Haemonchus contortus*, two genes encoding distinct SDHB subunits were identified and showed different expression patterns during development (Roos and Tielens, 1994). This phenomenon has also been linked to a metabolic transition due to a change in oxygen tension. In addition, FRD and SDH activities have been detected in other helminths (Fioravanti et al., 1998; Matsumoto et al., 2008), but the complex II composition has not been examined. In contrast to complex II, no transitions associated with oxygen tension have been reported in helminth complex I (NADH dehydrogenase). The genes encoding ETC complexes I and II have not been studied in detail since helminth genomes became publicly available.

FIGURE 1 | Helminth mitochondrial electron transport chains in normoxia and hypoxia. In the presence of oxygen, electrons from reduced nicotinamide adenine dinucleotide (NADH) and succinate are transferred to UQ through complexes I and II, then from UQ to cytochrome c through complex III, and finally from cytochrome c to O2 through complex IV (A, right). When there is no oxygen available (A, left), the electron transport chain functions with only two complexes. Electrons are transferred from NADH to RQ through complex I and then from RQ to fumarate through complex II. In both cases, a proton gradient across the inner membrane is generated, which is used to produce ATP through complex V (not shown). The structures of UQ (right) and RQ (left) are shown in (B).

We examined nematode and platyhelminth genomes and found diverse gene duplications in complex II subunits. Furthermore, gene duplications in complex I subunits that interact with quinones were identified in helminth genomes. Using *C. elegans*, we studied the role of complex I and II subunits duplications.

#### RESULTS

#### Different Helminths Have Dissimilar Complex I and II Subunit Duplications

Since gene duplications of complex II subunits have been described in some helminths, we examined the genes encoding complex II in representative and well-annotated parasitic and free-living nematode and platyhelminth genomes. Interestingly, we observed diversity in complex II genes. The absence/presence of complex II genes is presented in **Figure 2** (alignments for all subunits studied are available in **Supplementary Figure 1**).

The flavoprotein subunit (SDHA) gene was found to be duplicated in some, but not all, nematodes, whereas this gene duplication is not observed in any platyhelminth lineage (**Figure 2**). The phylogenetic analysis does not indicate a clear evolutionary event associated with *sdha* history (**Figure 3A**) and suggests that independent gene duplications have occurred in filarial and *C. elegans* lineages. Indeed, filarial SDHA-1 and SDHA-2 form different nodes, and they cluster together in a putative ancestral filarial SDHA node. Likewise, *C. elegans* SDHA-1 and SDHA-2 form a node, indicating a recent gene duplication event in this lineage. However, in the case of *A. suum*, the two SDHAs do not group together, suggesting that the two SDHA subunits diverged extensively after duplication. Alternatively, it may represent an ancient gene duplication that was subsequently lost in other lineages. This latter scenario is less parsimonious since it would imply several gene losses and subsequent duplications.

Two genes encoding alternative iron–sulfur subunits (SDHB) have been previously described in *H. contortus* (Roos and Tielens, 1994). This gene duplication is absent in all other nematode genomes analyzed. Interestingly, we found that *Echinococcus* spp. also exhibit an *sdhb* duplication in the platyhelminth lineage. The phylogenetic analysis of SDHB suggests independent gene duplications in these lineages.

Regarding the heme-containing membrane subunits (SDHC and SDHD), the results were unexpected. *sdhc* was found to be duplicated only in *Hymenolepis microstoma*, whereas *sdhd* was found to be duplicated only in *A. suum*. Strikingly, *Fasciola hepatica* and *Trichuris muris* lost *sdhd*, suggesting independent gene losses in these lineages. It is important to note that transcripts corresponding to *sdhd* were absent in these species' transcriptomes. Since ETC subunit genes are highly expressed, the absence of *sdhd* transcripts favors the hypothesis that this gene is absent in these genomes.

[…] that no coding gene was identified. Par and FL denote parasitic and free-living species, respectively. Ticks indicate the reported presence of RQ (Allen, 1973; Fioravanti and Kim, 1988; Van Hellemond et al., 1995; Van Hellemond, 1997; Takamiya et al., 1999; Shiobara et al., 2015). The absence of a tick denotes species in which RQ presence has not been examined.

[…] resolved to dichotomies. Alignments are shown in Supplementary Figure 1.

Since adjustments in complex II are thought to be related to the use of alternative quinones (UQ or RQ), we also analyzed complex I subunits NDUF2 and NDUF7, involved in quinone binding. Interestingly, we found that the same nematodes that have two *sdha* genes have also two *nduf2* genes. In platyhelminths, only *F. hepatica* has two *nduf2* genes. Phylogenetic analysis of NDUF2 in nematodes suggests independent events of gene duplication in *C. elegans* and in clade III lineages. In the case of platyhelminth NDUF2, it is likely that a gene duplication event occurred in the *F. hepatica* lineage and rapidly diverged. However, an ancestral gene duplication (e.g., at the root of trematodes) followed by gene loss in independent lineages cannot be ruled out.

We also found that *T. muris*, *Schmidtea mediterranea*, and *F. hepatica* have two *nduf7* genes. According to the phylogenetic tree, the gene duplications observed in *T. muris* and *S. mediterranea* lineages appear to be recent events. Thus, the most likely explanation is that these gene duplication events occurred independently.

The analysis of the ETC subunit genes revealed a complex history with independent evolutionary events without an evident pattern. In the case of *A. suum*, the different complex II subunits used in the larval and adult stages have been associated with oxygen tension changes found during its lifecycle (normoxia and hypoxia, respectively) and the use of alternative quinones (UQ and RQ, respectively). However, a conclusion supported by our study is the absence of a common evolutionary event related to ETC subunit genes directly linked to RQ, the key metabolite during hypoxia.

#### The Quinone-Binding Site in Normoxia and Hypoxia Complex II Is Highly Conserved

We performed *in silico* docking analysis in order to explore how the change in subunits of complex II may affect the potential binding modes of different quinones. To this end, three structures were used: the crystal structure of *A. suum* complex II purified from adult worms (PDB: 5C2T) and homology models of complex II obtained for *A. suum* larval stages and *C. elegans*. The latter encodes one gene for each subunit involved in binding quinones (SDHB, SDHC, and SDHD). The search space, indicated by a box with black edges in **Figure 4A**, was defined considering the binding site of RQ2 (*i.e.*, RQ with two isoprene units) in 5C2T and giving sufficient volume to explore alternative binding modes. Multiple sequence alignments were performed on the SDH subunits of *A. suum* (adult and larval stages) and *C. elegans*. Results indicated that, within the docking search space, only eight amino acids are not conserved (six for SDHC and two for SDHD, **Supplementary Figure 2**). Those residues face away from the ligand and are not involved in direct interactions with RQ2 (**Figure 4B**).

Docking calculations employing the three complex II models showed that the binding modes with higher affinities for both UQ2 and RQ2 fit without any steric impediment into the quinone binding pocket (**Figure 4D**). These results suggest that SDHB, SDHC, and SDHD gene duplications would not reflect an adaptation to the binding site of UQ or RQ.

[…] SDHC; yellow = SDHD. (B) Inset showing the docking search space (black lines), RQ2 (spheres and sticks representation), and nonconserved residues among *A. suum* and *C. elegans* included in the search space (tubes). (C) Comparison between the best docking solution for RQ2 (cyan spheres and sticks) and its binding site obtained from X-ray crystal structure (transparent surface). The best docking solution showed a root mean square deviation (RMSD) of 0.15 nm. (D) Superimposition of the best docking solutions for UQ2 and RQ2 performed for *C. elegans* and both adult and larva *A. suum* complex II models (transparent surface).

We then analyzed the residues involved in complex II quinone binding in nematodes and platyhelminths. We found that the residues of SDHB and SDHD are conserved in all the analyzed species. SDHC allows some changes in quinone-binding positions (**Figure 5**). Interestingly, in *H. microstoma*, the only species that has two *sdhc* genes, the quinone-binding positions are conserved in both SDHC proteins.

#### *sdha-1* and *sdha-2* as Well as *nduf2-1* and *nduf2-2* Showed an Inverted Expression Profile During *C. elegans* Lifecycle

In the *C. elegans* genome, there are two *sdha* and two *nduf2* genes, as in *A. suum*. *C. elegans* SDHA-1 is essential and has the highest level of homology to *A. suum* SDHA-1 (larval SDHA) and *C. elegans* SDHA-2 to *A. suum* SDHA-2 (adult SDHA). In the case of NDUF2, NDUF2-1 is essential, and both *C. elegans* proteins (NDUF2-1 and NDUF2-2) are more similar to each other than to any NDUF2 proteins of *A. suum*. We analyzed the expression of *sdha* and *nduf2* during *C. elegans* lifecycle. The results are shown in **Figure 6**. In general, *sdha-1* has a higher expression than *sdha-2*, and both genes show an inverted expression profile along the different developmental stages. *sdha-1* expression increases towards the end of the embryonic development and is highest during larval stages L2–L3 and then decreases in the last larval stage (L4) and in the adult worm (**Figure 6A**). In contrast, *sdha-2* expression is highest during the early embryo and then starts to decrease near L1 larval stage. Its expression increases again towards the adult worm (**Figure 6A**). *C. elegans* has an alternative larval stage named dauer, which allows it to endure when environmental conditions are not adequate for normal growth (Cassada and Russell, 1975). After entry in dauer, *sdha-1* expression decreases while *sdha-2* expression increases (**Figure 6A**).

Similar to *sdha-1* and *sdha-2*, *nduf2-1* expression is higher than *nduf2-2* expression throughout the *C. elegans* lifecycle, except at the beginning of embryonic development. The expression patterns of *nduf2-1* and *nduf2-2* echo those of *sdha-1* and *sdha-2*, respectively, in the embryo, larval and adult stages, as well as in the dauer larvae (**Figure 6B**). Interestingly, during embryonic development, *nduf2-1* and *nduf2-2* expression curves cross each other around the time of egg laying.

These observations suggest that there is a tight regulation of the expression of SDHA and NDUF2 coding genes throughout the lifecycle of *C. elegans*.

#### Progeny Is Reduced in *sdha-2* and *nduf2-2* Mutant Worms

As already mentioned, the expression of *nduf2-2* and *sdha-2* increases in the adult worm and is highest in the early embryo. Thus, we assessed the effect of the absence of these genes on the reproductive potential of the worm. The analysis of the brood size of the *nduf2-2* and *sdha-2* KO strains showed that the total number of offspring is significantly reduced in both KO worms compared to the wild-type strain (N2) (**Figure 7**); this reduction was more pronounced in the *sdha-2* KO worms. We also observed a delay of a few hours in development from L1 to the beginning of egg laying in *nduf2-2* KO worms compared to N2.

[…] *C. elegans* development. Expression during embryonic development is shown on the left graphs. Egg laying occurs around 150 min after fertilization. The four larval stages (L1, L2, L3, and L4) and young adults are shown in the middle graphs. The expression levels for dauer larvae at three timepoints (entry, dauer, and exit) are shown on the right graphs.

Since in *A. suum* a duplication in the SDHA coding gene has been associated with an adaptation to environments with different oxygen tensions, we examined whether the duplications in *sdha* and *nduf2* in *C. elegans* could also be related to changes in oxygen availability. Similar to N2, *nduf2-2* and *sdha-2* KO strains survived after 24 h exposure to hypoxia (0.4% O2) or anoxia. We then tested brood size for these strains after a 24-h exposure to hypoxia or anoxia. The phenotype previously observed under normoxic conditions was not exacerbated under hypoxia or anoxia for these mutants (**Supplementary Figure 4**).

Our results indicate that *sdha-2* and *nduf2-2* are not essential for fecundity, but the absence of these genes reduces progeny production of the worm.

#### DISCUSSION

RQ has been identified in all nematodes and platyhelminths that have been analyzed for its presence. Owing to its redox potential (intermediate between complexes I and II), RQ is thought to be the key metabolite in the alternative ETC used under hypoxic conditions. Until now, it was thought that the use of RQ in helminths required alternative complex II subunits derived from duplicated genes. Our results contradict this view. Indeed, some species do not have any complex II gene duplications (e.g., *Meloidogyne hapla*, *Strongyloides ratti*, *Necator americanus*, *S. mediterranea*, and *Schistosoma mansoni*), while others have even lost *sdhd* (*F. hepatica* and *T. muris*). Furthermore, there is not a unique evolutionary event associated with the use of RQ in complex I and II genes. We examined several life history traits for the different species whose genomes were analyzed. In particular, we examined absence/presence of free-living stage(s), parasitized host(s) (intermediate, definitive, vector), parasitized site/tissue, and reproduction mode (monoecious vs. dioecious). We have not found any particular association of life history traits or style that could explain a trend regarding the presence of alternative ETCs in the different lineages. Thus, RQ has not been a driving force in complex II gene history. Although RQ is a metabolic signature for hypoxic metabolism, this metabolite is not necessarily associated with a change in complex I and II composition. Nevertheless, the variety of complex II gene duplications observed in nematodes and platyhelminths suggests that different adaptations have occurred in different lineages.

Previous studies in *A. suum* have shown that both duplicated complex II subunits (SDHA and SDHD) are exchanged when the environment and oxygen tension change. These adjustments are responsible for the change in the SDH-FRD enzymatic activity and have been associated with the need to use different quinones (Tielens and Van Hellemond, 1998; Amino et al., 2000; Hellemond et al., 2003; Iwata et al., 2008; Harada et al., 2013). In our study, we show that both quinones could bind to both *A. suum* SDHD subunits. Moreover, in organisms that have duplicated the subunits that contact quinones (SDHB, SDHC, and SDHD), no changes are observed in the amino acids that interact directly with quinones. Thus, the switch of subunits does not appear to be related to the quinone-binding capacity.

Gene duplications in complex I have not been previously examined in nematodes and platyhelminths. We searched for the nuclear-encoded quinone-binding subunit genes (*nduf2* and *nduf7*) and found that different gene duplication events occurred among different lineages, similar to the phenomenon observed for complex II subunits. So far, these gene duplications have not been associated with any particular phenotype or adaptation. Residues in NDUF2 and NDUF7 that interact with quinones have been identified in a previous study (Degli Esposti, 2015). This study identified a Tyr residue in *A. suum* NDUF2-1 that differs from Lys or Arg residues found in all other NDUF2 analyzed. We propose that this would be an adaptation of nematodes to RQ binding. Although we found that Tyr is the most frequent amino acid, His and Lys can also be found at this position (**Supplementary Figure 3**).

The preferred oxygen concentration for the free-living nematode *C. elegans* is around 7–14% (Chang and Bargmann, 2008). *C. elegans* possesses complex I and II subunit duplications. To address the role of duplicated complex I and II subunits in this organism, we analyzed their expression pattern during its lifecycle and observed developmental regulation of their expression. A striking observation is that the essential subunits NDUF2-1 and SDHA-1 (those for which KO is lethal) show a similar expression pattern, which is inverted relative to the expression pattern of the nonessential subunits (NDUF2-2 and SDHA-2). The essential subunits are generally more highly expressed than the nonessential ones, except in the adult worm and early embryo, suggesting that the nonessential subunits are relevant during these stages. Consistent with this interpretation, the *nduf2-2* and *sdha-2* KO strains have reduced brood size compared to the wild-type strain N2. It is worth noting that *nduf2* and *sdha* duplications co-occur in nematodes. Our results suggest that the duplicated subunits could be part of alternative ETC complexes needed to meet different metabolic demands during the lifecycle of the worm and serve to increase overall organismal fitness. In this sense, it is worth noting that expression of *nduf2-2* and *sdha-2* increases in the early embryo and dauer stage, where a hypoxic metabolism is thought to occur.

The association of an alternative complex II with RQ has been firmly established in *A. suum*, becoming a paradigm in helminth biochemistry. However, our results highlight that *A. suum* is the exception rather than the rule. Furthermore, different variations in the complex I subunits that interact with quinones are also observed in distinct helminth lineages. Dissimilar subunit arrays for ETC complexes appear to have evolved to adapt to different stages of helminth lifecycles, environmental conditions, or to specific cells or tissues within worms.

#### MATERIAL AND METHODS

#### Sequences Identification and Analysis

In order to obtain information about helminth complex II and nuclear-encoded quinone-binding complex I subunits, we analyzed genomes and transcriptomes of six platyhelminths (*E. granulosus*, *E. multilocularis*, *H. microstoma*, *F. hepatica*, *S. mansoni*, and *S. mediterranea*) and nine nematodes (*T. muris*, *A. suum*, *Brugia malayi*, *Onchocerca volvulus*, *M. hapla*, *S. ratti*, *H. contortus*, *N. americanus*, and *C. elegans*). Sequences were retrieved from https://parasite.wormbase.org (WBPS10), UniProt (https://www.uniprot.org) and the *S. mediterranea* database (http://smedgd.neuro.utah.edu).

Searches were initially performed with the Protein Basic Local Alignment Search Tool (BLASTP) against protein databases, using *A. suum* and *C. elegans* SDHA, SDHB, SDHC, SDHD, NDUF2, and NDUF7 sequences as queries. Additionally, TBLASTN searches were performed against genomic sequence and cDNA databases, using as queries the protein sequences previously identified or the sequences of the most closely related organism. This served to confirm the annotated protein sequences and to identify the nonannotated ones. In both cases, identified sequences were confirmed by best reciprocal hits in BLAST.
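The best-reciprocal-hit criterion can be illustrated with a short sketch. It is not the authors' pipeline: it assumes that BLAST results have already been parsed into `(query, subject, bit score)` tuples, and the identifiers `query_A`, `target_1`, etc. are hypothetical placeholders.

```python
from typing import Dict, Iterable, Set, Tuple

Hit = Tuple[str, str, float]  # (query_id, subject_id, bit_score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Return the highest-scoring subject for each query."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward: Iterable[Hit],
                         reverse: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Keep (query, subject) pairs that are each other's best hit."""
    fwd = best_hits(forward)   # reference proteins searched against the target
    rev = best_hits(reverse)   # target candidates searched back against the reference
    return {(q, s) for q, s in fwd.items() if rev.get(s) == q}


# Hypothetical parsed hits (illustrative values only).
forward_hits = [
    ("query_A", "target_1", 900.0),
    ("query_A", "target_2", 700.0),
    ("query_B", "target_1", 850.0),
]
reverse_hits = [
    ("target_1", "query_A", 910.0),
    ("target_2", "query_A", 690.0),
]
confirmed = reciprocal_best_hits(forward_hits, reverse_hits)
```

In this toy input, `query_B` also hits `target_1`, but `target_1` points back to `query_A`, so only the `query_A`/`target_1` pair is retained.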

To study the evolutionary history of complex II and nuclear-encoded quinone-binding complex I subunits, phylogenetic analyses were performed on the phylogeny.fr website (Dereeper et al., 2008; Dereeper et al., 2010) for every set of identified subunits. Multiple sequence alignments were made with MUSCLE 3.8 (Chojnacki et al., 2017) and then curated with Gblocks 0.91b (Castresana, 2000; Talavera and Castresana, 2007). Phylogenetic trees were inferred with PhyML 3.0 (Guindon et al., 2010) and rendered with TreeDyn 198.3 (Chevenet et al., 2006; Dereeper et al., 2008). Alignments for all subunits studied are available in **Supplementary Figure 1**.

#### Modeling of Complex II Structures

Complex II structures for *A. suum* (normoxia) and *C. elegans* were built by homology modeling with Modeller (Webb and Sali, 2016), using as a 3D template the X-ray structure of *A. suum* (hypoxia) deposited in the Protein Data Bank under the code 5C2T. Multiple sequence alignment of complex II between *A. suum* (normoxia and hypoxia) and *C. elegans* was done with Clustal Omega (Chojnacki et al., 2017). To validate the docking technique, we compared the predicted binding site for RQ2 with that observed in the crystal structure 5C2T. The best docking solution showed a root mean square deviation of 0.15 nm (**Figure 4C**). A root mean square deviation cutoff of 0.2 nm is often used as a criterion for a correctly predicted bound structure (Bursulaya et al., 2003).
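As a minimal sketch of the validation criterion, the root mean square deviation between a docked pose and the crystallographic ligand can be computed over matched atoms as follows. The sketch assumes that both poses are expressed in the same receptor frame (no superposition is applied), that atoms are listed in the same order, and that coordinates are in nm; the coordinates shown are invented for illustration.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]

RMSD_CUTOFF_NM = 0.2  # cutoff quoted in the text (Bursulaya et al., 2003)


def rmsd(pose_a: Sequence[Coord], pose_b: Sequence[Coord]) -> float:
    """Root mean square deviation over matched atoms, without superposition."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and contain the same atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b)
    )
    return math.sqrt(squared / len(pose_a))


def is_acceptable(pose_a: Sequence[Coord], pose_b: Sequence[Coord]) -> bool:
    """True when the docked pose falls within the RMSD cutoff."""
    return rmsd(pose_a, pose_b) <= RMSD_CUTOFF_NM


# Hypothetical two-atom ligand: the docked pose is shifted 0.1 nm along z.
crystal_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked_pose = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)]
deviation = rmsd(crystal_pose, docked_pose)
```

With this criterion, the reported value of 0.15 nm would count as a correctly predicted pose.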

Docking calculations were performed with AutoDock Vina 1.1.2 (Trott and Olson, 2010). Input PDBQT files for the receptors and ligands were prepared with AutoDockTools 1.5.6 (Morris et al., 2009). All torsions of the ligand were set as fully rotatable, and the receptors were treated as partially flexible. The maximum number (10) of rotatable side chains allowed by the program was chosen, selecting residues located closer than 0.6 nm from RQ2 in 5C2T. List of flexible residues by domain: SDHB: Trp196, Trp197, His240, Ile242; SDHC: Leu60, Trp69, Ser72, Arg76; SDHD: Asp106 and Tyr107. The search space was defined by a grid with *x*, *y*, and *z* dimensions of 26 × 24 × 24 Å. The grid was centered so that the binding pocket corresponding to RQ2 was included in the search space.

#### Expression Profile Analysis of SDHA and NDUF2 Coding Genes

Transcriptomic data used for this analysis were taken from the GExplore web (Hutter and Suh, 2016). The values of depth of coverage per million bases for each time-point were converted to reads per kilobase per million mapped reads to use a more familiar representation of expression values.

#### Strains and Culture Conditions

*C. elegans* wild-type Bristol N2 (N2) and VC393 (*nduf2-2* (*ok437*) III) were provided by the *Caenorhabditis* Genetics Center. VC393 has a 1,448-bp deletion spanning several exons of *nduf2-2*. *C. elegans* TM1420 (*sdha-2* (*tm1420*) I) was obtained from the National BioResource Project. This strain has a 519-bp deletion in exon 4 of *sdha-2*.

Worms were maintained at 20°C on nematode growth medium agar plates seeded with *Escherichia coli* OP50. To obtain synchronized worms, gravid adults on nematode growth medium agar plates were treated with an alkaline hypochlorite solution, and eggs were collected and placed in M9 medium without food, at 20°C for 24 h (Brenner, 1974).

#### Brood Size Analysis in *sdha-2* and *nduf2-2* KO Strains

Wild-type (N2) and mutant (*sdha-2* KO and *nduf2-2* KO) worms were synchronized, and, after reaching the L4 stage, 15 worms for each strain were separated and plated in groups of 3. When a phenotype was observed, the experiment was performed three times. Offspring production was counted every day for 4 days.

### DATA AVAILABILITY STATEMENT

All datasets generated for this study are included in the article/ **Supplementary Material**.

### AUTHOR CONTRIBUTIONS

LO performed the genomic and phylogenetic analyses. CM-R performed the expression analysis and the experiments with *C. elegans* strains. EB and SP performed the molecular modeling studies of complex II with quinones. LO, CM-R, EB, SP, and GS analyzed the data, drafted, and wrote the manuscript. GS conceptualized the study.

### REFERENCES


#### FUNDING

This study was supported by Agencia Nacional para la Innovación y la Investigación (ANII, FCE\_1\_2017\_1\_136340) and Ministerio de Educación y Cultura (MEC, FVF 2017/030). This work was partially funded by FOCEM (MERCOSUR Structural Convergence Fund), COF 03/11. CM was recipient of a fellowship from ANII (FCE\_1\_2014\_1\_104366). EB is beneficiary of a postdoctoral fellowship of Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina (CONICET).

#### ACKNOWLEDGMENTS

*C. elegans* strains N2 and VC393, and *E. coli* OP50 were provided by the *Caenorhabditis* Genetics Center, which is funded by NIH Office of Research Infrastructure Program: P40 OD010440. *C. elegans* TM1420 strain was acquired from the National BioResource Project. The authors would also like to acknowledge Sebastien Santini (CNRS/AMU IGS UMR7256) who manages the phylogeny.fr site (http://www.phylogeny.fr) used in this work.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2019.01043/full#supplementary-material




**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Otero, Martínez-Rosales, Barrera, Pantano and Salinas. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

Edited by: Feng Gao, Tianjin University, China

#### Reviewed by:

Norman Nausch, Heinrich Heine University of Düsseldorf, Germany Paul Robert Giacomin, James Cook University, Australia

#### \*Correspondence:

Ivonne Martin I.Martin@lumc.nl

#### †Present address:

Ivonne Martin, Department of Epidemiology and Biostatistics, Amsterdam UMC location VUmc, Amsterdam, Netherlands Maria M. M. Kaisar, Department of Parasitology, School of Medicine and Health Sciences, Universitas Katolik Indonesia Atma Jaya, Jakarta, Indonesia

‡These authors have contributed equally to this work

#### Specialty section:

This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Genetics

Received: 12 June 2019 Accepted: 25 September 2019 Published: 06 November 2019

#### Citation:

Martin I, Kaisar MMM, Wiria AE, Hamid F, Djuardi Y, Sartono E, Rosa BA, Mitreva M, Supali T, Houwing-Duistermaat JJ, Yazdanbakhsh M and Wammes LJ (2019) The Effect of Gut Microbiome Composition on Human Immune Responses: An Exploration of Interference by Helminth Infections. Front. Genet. 10:1028. doi: 10.3389/fgene.2019.01028

# The Effect of Gut Microbiome Composition on Human Immune Responses: An Exploration of Interference by Helminth Infections

*Ivonne Martin1,2\**†*, Maria M. M. Kaisar3*†*, Aprilianto E. Wiria3, Firdaus Hamid4, Yenny Djuardi3, Erliyani Sartono5, Bruce A. Rosa6, Makedonka Mitreva6,7, Taniawati Supali3, Jeanine J. Houwing-Duistermaat8,9,10‡, Maria Yazdanbakhsh5‡ and Linda J. Wammes5*

1 Department of Mathematics, Faculty of Information Technology and Science, Parahyangan Catholic University, Bandung, Indonesia, 2 Department of Biomedical Data Sciences, section Medical Statistics, Leiden University Medical Center, Leiden, Netherlands, 3 Department of Parasitology, Faculty of Medicine, Universitas Indonesia, Depok, Indonesia, 4 Department of Microbiology, Faculty of Medicine, Universitas Hasanuddin, Makassar, Indonesia, 5 Department of Parasitology, Leiden University Medical Center, Leiden, Netherlands, 6 McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, United States, 7 Department of Medicine, Washington University School of Medicine, St. Louis, MO, United States, 8 Department of Statistics, University of Leeds, Leeds, United Kingdom, 9 Department of Biostatistics and Research Support, Julius Center, University Medical Centre Utrecht, Utrecht, Netherlands, 10 Alan Turing Institute, London, United Kingdom

Background: Soil-transmitted helminths have been shown to have immune regulatory capacity, which they use to enhance their long-term survival within their host. As these parasites reside in the gastrointestinal tract, they might modulate the immune system by altering the gut bacterial composition. Although the relationships of helminth infections and of the microbiome with the immune system have been studied separately, their combined interactions are largely unknown. In this study we aim to analyze the relationship between bacterial communities and cytokine responses in the presence or absence of helminth infections.

Results: For 66 subjects from a randomized placebo-controlled trial, stool and blood samples were available at both baseline and 21 months after starting three-monthly albendazole treatment. The stool samples were used to identify the helminth infection status and fecal microbiota composition, while whole blood samples were cultured to obtain cytokine responses to innate and adaptive stimuli. When subjects were free of helminth infection (helminth-negative), increasing proportions of Bacteroidetes were associated with lower levels of IL-10 response to LPS {estimate [95% confidence interval (CI)] −1.96 (−3.05, −0.87)}. This association was significantly diminished when subjects were helminth-infected (helminth-positive) (p-value for the difference between helminth-negative versus helminth-positive was 0.002). Higher diversity was associated with greater IFN-γ responses to PHA in helminth-negative [0.95 (0.15, 1.75)] versus helminth-positive [−0.07 (−0.88, 0.73); p-value = 0.056] subjects. Albendazole treatment showed no direct effect on the association between bacterial proportion and cytokine responses, although the Bacteroidetes' effect on IL-10 responses to LPS tended downward in the albendazole-treated group [−1.74 (−4.08, 0.59)] versus placebo [−0.11 (−0.84, 0.62); p-value = 0.193].

Conclusion: We observed differences in the relationship between gut microbiome composition and immune responses, when comparing individuals infected or uninfected with geohelminths. Although these findings are part of a preliminary exploration, the data support the hypothesis that intestinal helminths may modulate immune responses, in unison with the gut microbiota.

Trial Registration: ISRCTN, ISRCTN83830814. Registered 27 February 2008 — Retrospectively registered, http://www.isrctn.com/ISRCTN83830814.

Keywords: helminth, gut microbiome, whole blood cytokine, interleukin-10, Bacteroidetes, diversity, randomized controlled trial

#### INTRODUCTION

Diseases of modernity, such as allergy, autoinflammatory, and metabolic diseases are increasingly observed in industrialized countries. It has been speculated that this growing rate was caused by changes in lifestyle, diet and environmental factors, such as pollutant exposure or hygiene. Hygiene improvement has dramatically decreased the prevalence of certain infectious agents such as parasitic helminths while these may have protective effects against autoinflammatory diseases (Wammes et al., 2016). Studies analyzing the capacity of helminths to modulate the immune system have been carried out in recent decades. However, it has become clear that this is an interplay with several other factors, such as diet, environment and also other gut inhabitants, such as the microbiota.

Early studies showed that gut microbiota is involved in developmental aspects of the immune system and that disturbance can lead to autoinflammatory disorders (Round and Mazmanian, 2009). Already in 1963, it was reported that the immune system of germ-free mice failed to respond to molecular patterns of pathogenic and beneficial microorganisms, causing morphological tissue defects in the intestinal wall (Abrams et al., 1963). In healthy humans, the role of gut microbiota and immune response was studied more recently. It was found that certain bacteria are beneficial for development and function of the immune system and simultaneously the immune system can influence the composition or function of gut microbiota, all relating to inflammatory disorders [reviewed in Belkaid and Hand (2014)].

The presence of parasitic helminths in the gastrointestinal tract may exert a direct influence on the host's gut microbiome as they share the same niche. Although in animal models helminths were shown to increase microbial abundance and diversity (Reynolds et al., 2015), the findings in human studies are not consistent. Several studies analyzing the effect of helminths on gut microbiota have indicated a higher diversity of gut microbiota in helminth-positive subjects compared to helminth-negative subjects (Lee et al., 2014; Ramanan et al., 2016). A study in Ecuador suggested that this difference in diversity might be related to specific helminth species, since no alterations were found in *Trichuris trichiura*-infected children (Cooper et al., 2013). These inconsistencies might be explained by several factors, among which are different bacterial profiling techniques or confounders such as ethnicity, anthelminthic treatment, and environmental differences.

As it has been shown that changes in both gut microbiota and helminth infection status might affect the host's immune response, it is suspected that the presence of helminths might directly or indirectly affect the immune system by altering the gut microbial community (Zaiss et al., 2015). For instance, the transfer of the microbiota of *Heligmosomoides polygyrus bakeri*-infected mice to uninfected mice induced similar protection against allergic airway inflammation as observed with helminth infection (Wilson et al., 2005). In humans, studies on the triangular relationship between helminths, the microbiome and the immune system are still in their infancy.

To our knowledge, the number of longitudinal studies analyzing the association between gut microbiota and immune responses in helminth-endemic areas is still limited. To understand the interaction of the gut microbial community and helminths and their common effect on immune responses, we used data from a household cluster-randomized, double-blind, placebo-controlled trial of albendazole treatment in a helminth-endemic area. In this trial, it has been shown that deworming reduced helminth prevalence and consequently increased several whole blood cytokine responses (Wammes et al., 2016). Helminth infection and anthelminthic treatment separately did not change the gut microbiota (Martin et al., 2018). However, when subjects remained infected while treated with albendazole, a decrease of the *Bacteroidetes*: *Firmicutes* ratio and an increase of the *Actinobacteria*: *Firmicutes* ratio were observed. This led to the hypothesis that there is cross-talk between microbiome composition and immune response that is disrupted by the presence of helminths, and that removing helminths with anthelminthic treatment might affect this communication. Our aim was to characterize the association between bacterial relative abundance and whole blood cytokine responses, and the effect of helminth infections and deworming on this interaction.

#### MATERIALS AND METHODS

#### Participants

Stool samples from 150 subjects from the immunoSPIN study (Wiria et al., 2010) taking place in Ende subdistrict, Indonesia, were analyzed for the fecal microbiome. From these 150 subjects, 66 subjects were included in this study based on complete stool data and available cytokine measurements before and 21 months after the first treatment. The microbiome composition for these subjects is representative of these 150 subjects (**Table S1**). Four different helminth species were found, namely *Ascaris lumbricoides*, hookworms (*Necator americanus* and *Ancylostoma duodenale*) and *T. trichiura*. Details on sample collection and measuring the infection status using PCR are described elsewhere (Wiria et al., 2010). *T. trichiura* infection was assessed only by microscopy, since at that time there was no real-time PCR data available for this species. For this manuscript, we defined a helminth-infected subject as a participant with a positive real-time PCR [cycle threshold (Ct) value ≤ 30] and/or positive microscopy for one or more species of helminths, as described previously (Martin et al., 2018). Subjects with a positive real-time PCR with a Ct above 30 were regarded as uninfected.

**Abbreviations:** AscAg, *Ascaris* antigen; BMI, Body mass index; IFN, interferon; IL, interleukin; LPS, lipopolysaccharide; PHA, Phytohaemagglutinin; TNF, Tumor necrosis factor.

### Microbiome Composition

The amplification and pyrosequencing of the 16S rRNA gene followed the protocols developed by the Human Microbiome Project (HMP) (A framework for human microbiome research. Nature, 2012) at the McDonnell Genome Institute, Washington University School of Medicine in St. Louis, and have been described previously (Martin et al., 2018; Rosa et al., 2018). Briefly, the V1–V3 hypervariable region was PCR-amplified and the PCR products were sequenced on the Genome Sequencer Titanium FLX (Roche Diagnostics, Indianapolis, Indiana), generating on average 6,000 reads per sample. Details of the filtering and analytical processing of 16S rRNA data for this cohort have been described previously in Rosa et al. (2018). The assembled contig count data resulting from RDP classification were organized in a matrix format with taxa in columns and subjects in rows. The entries in the table represent the number of reads for each taxon for each subject.

Our work focuses on the phylum level of gut bacteria. Five categories had average relative abundances larger than 1%, namely the phyla *Actinobacteria*, *Bacteroidetes*, *Firmicutes*, and *Proteobacteria*, and an unclassified category, which consists of sequences that could not be assigned to a phylum. The remaining bacterial phyla, which had lower relative abundance, were combined into a pooled category. In the analysis, we retained the proportions of the three most abundant bacterial phyla, namely *Actinobacteria*, *Bacteroidetes*, and *Firmicutes*. The proportion for each phylum was obtained by dividing its sequence count by the total sequence count per person at each time point.

Along with bacterial proportions, we computed at the phylum level the bacterial diversity within samples (Shannon index) and between samples (Bray-Curtis dissimilarity) using the R package vegan (Oksanen et al., 2017). The Shannon index reflects not only the presence of taxa but also their abundance. A higher diversity index means that no single taxon dominates the community and that the total bacterial abundance is spread out over all taxa. The Bray-Curtis dissimilarity quantifies how different the composition of one sample is from that of another, with values ranging from 0 (identical composition) to 1 (completely dissimilar).
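The two diversity measures can be made concrete with a small sketch. The analysis in this study used the R package vegan; the Python code below is only an illustration of the standard definitions (Shannon index with the natural logarithm, Bray-Curtis on the supplied abundance vectors), and the read counts are invented.

```python
import math
from typing import List, Sequence


def proportions(counts: Sequence[float]) -> List[float]:
    """Convert per-taxon read counts into relative abundances."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("total count must be positive")
    return [c / total for c in counts]


def shannon_index(counts: Sequence[float]) -> float:
    """Shannon diversity H = -sum(p * ln p) over taxa with p > 0."""
    return -sum(p * math.log(p) for p in proportions(counts) if p > 0)


def bray_curtis(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: sum|a - b| / sum(a + b), from 0 to 1."""
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must list the same taxa")
    denominator = sum(sample_a) + sum(sample_b)
    if denominator == 0:
        raise ValueError("at least one sample must contain reads")
    return sum(abs(a - b) for a, b in zip(sample_a, sample_b)) / denominator


# Hypothetical phylum-level read counts for two samples (illustrative only).
even_sample = [1500, 1500, 1500, 1500]
dominated_sample = [5700, 100, 100, 100]

h_even = shannon_index(even_sample)            # equals ln(4) for four equal taxa
h_dominated = shannon_index(dominated_sample)  # lower: one taxon dominates
dissimilarity = bray_curtis(even_sample, dominated_sample)
```

The evenly distributed sample reaches the maximum Shannon index for four taxa, whereas the sample dominated by a single phylum scores lower, which is the behavior described in the text.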

### Whole Blood Cytokine Responses

The methods used to obtain and assess the cytokine responses were described elsewhere (Wiria et al., 2010). In brief, heparinized blood was diluted 1:4 and cultured in 96-well plates. Plates were incubated for 24 (innate responses) or 72 (adaptive responses) hours, after which supernatants were harvested and stored in freezers. Cytokine levels were measured by Luminex bead technology in samples obtained before and 21 months after the start of treatment.

The analyses carried out in this manuscript are limited to innate responses [interleukin (IL)-10 and tumor necrosis factor-alpha (TNF-α)] to lipopolysaccharide (LPS) from *E. coli* and adaptive responses [interferon-gamma (IFN-γ) and IL-5] to *Ascaris* antigen (AscAg) and the general T-cell stimulator phytohemagglutinin (PHA). The AscAg was a homogenate of adult *A. lumbricoides* worms obtained from infected humans (Wammes et al., 2014).

### Statistical Method

All parameters of interest were described as means (± standard deviation) or frequencies. Prevalence rates were calculated and compared using the Pearson chi-square test, while the Student *t*-test was used to compare continuous variables.

To study the relationship between cytokines and microbiome over the two time points, a linear mixed effect regression model was fitted with helminth status and treatment as covariates. All models were initially adjusted for age and sex; however, since neither covariate was significantly associated with the cytokine responses in any model, they were not included in the final analysis. The correlation between observations from the same subject was modeled by including a subject-specific random effect. The microbiome was included in the model either as a bacterial proportion or as the Shannon diversity index. The cytokine responses were log10-transformed [log10(concentration + 1)] to obtain normally distributed variables. First, we analyzed the main effect of bacterial proportion and diversity on cytokine responses. Second, to allow for different effect sizes of bacterial proportion or diversity on cytokine responses in helminth-positive versus -negative subjects, an interaction term between bacterial categories and infection was included in the model. The *p*-value for this interaction term indicated the statistical evidence for different effect sizes in helminth-positive or -negative groups. Due to the limited sample size, we restricted the analysis to estimating the general effect of helminth infection on the relationship between bacterial proportion and cytokine responses. For the same reason, the direct treatment effect could not be estimated. This means that the estimates for the infection effect on the relationship between bacterial proportions or diversity and cytokine responses were obtained from data of subjects who were infected at any time point, regardless of their randomization arm. However, as treatment removes helminth infection, the analysis of the treatment effect on the relationship between bacterial proportion and cytokine responses is viewed as a proxy to understand the role of helminths in this relationship.

For this purpose, a linear mixed effect model was fitted with bacterial proportion or diversity, treatment, and time as covariates. This model was able to characterize three different associations, namely the association between bacterial proportion or diversity and cytokines at pre-treatment, the difference in this association between pre-treatment and post-treatment in the placebo group (time effect), and the difference in the association at post-treatment between the albendazole and placebo groups.
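One way to write down the models described in this section is sketched here. The parametrization and symbols are assumptions made for illustration and are not taken from the study's code; additional main-effect terms may have been included by the authors.

$$
y_{ij} = \beta_0 + \beta_1 x_{ij} + \beta_2 h_{ij} + \beta_3 x_{ij} h_{ij} + b_i + \varepsilon_{ij},
\qquad b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2)
$$

Here $y_{ij}$ is the log10-transformed cytokine response of subject $i$ at time point $j$, $x_{ij}$ is the bacterial proportion or Shannon index, $h_{ij}$ equals 1 when the subject is helminth-positive and 0 otherwise, and $b_i$ is the subject-specific random effect. Under this coding, the association in helminth-negative subjects is $\beta_1$, the association in helminth-positive subjects is $\beta_1 + \beta_3$, and the *p*-value for the interaction tests $\beta_3 = 0$.

For the treatment model, the three associations listed above can be read in the same way from a slope of the form

$$
\frac{\partial y_{ij}}{\partial x_{ij}} = \gamma_1 + \gamma_2 t_{j} + \gamma_3 t_{j} a_{i},
$$

where $t_j$ equals 1 at post-treatment and $a_i$ equals 1 in the albendazole arm: $\gamma_1$ is the pre-treatment association, $\gamma_2$ the time effect in the placebo group, and $\gamma_3$ the post-treatment difference between albendazole and placebo.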

For each outcome separately, these models were fitted on subjects who had at least an observation at pre-treatment. The significance of the covariate effect was obtained from the likelihood ratio test. Bonferroni correction was used to adjust for multiple testing. We applied the correction for the number of non-correlated cytokines per stimulatory condition, which resulted in dividing the alpha cut-off level for significance by 2 (LPS responses), 3 (AscAg responses) and 2 (PHA responses). The statistical analyses were performed in R (R Core Team, 2017), mainly with the lme4 and lmerTest packages (Bates et al., 2015; Kuznetsova et al., 2017). The full record was created using the knitr package in R (Xie, 2017) and is available online at https://github.com/Helminths\_GutMicrobes\_Cytokine/.
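The resulting significance thresholds follow directly from the divisors stated above. The sketch below assumes a nominal alpha of 0.05, which is consistent with the cut-off of 0.025 quoted for LPS and PHA responses in the figure legends; the function names are illustrative.

```python
NOMINAL_ALPHA = 0.05  # assumed nominal level, consistent with the 0.025 cut-off

# Number of non-correlated cytokines per stimulatory condition (from the text).
N_TESTS = {"LPS": 2, "AscAg": 3, "PHA": 2}


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


thresholds = {
    stimulus: bonferroni_alpha(NOMINAL_ALPHA, n) for stimulus, n in N_TESTS.items()
}


def is_significant(p_value: float, stimulus: str) -> bool:
    """Compare a p-value with the corrected threshold for its stimulus."""
    return p_value <= thresholds[stimulus]
```

Under these assumptions the threshold is 0.025 for LPS and PHA responses and about 0.0167 for AscAg responses, so a *p*-value of 0.045 for a PHA response would not be considered significant.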

### RESULTS

#### The Effect of Bacterial Proportions and Diversity on In Vitro Cytokine Responses

Since it is hypothesized that gut bacteria are associated with certain cytokine responses and thereby possibly immune disorders, we set out to explore this relationship by using data from the ImmunoSPIN trial. For 66 subjects, cytokine responses were measured at pre-treatment and 21 months after the start of anthelmintic treatment. At baseline, 40 out of 66 (60.6%) individuals in Ende were infected with one or more helminth species, and hookworm was the most dominant species (31.8%) followed by *A. lumbricoides* (25.7%) and *T. trichiura* (22.7%). The baseline characteristics such as age, gender, BMI and helminth prevalence were similar between the two treatment arms (**Table 1**). Three-monthly albendazole treatment for 21 months reduced the infection prevalence from 65.4% to 19.2%, versus a slight increase in helminth infections from 57.5% to 65% in the placebo group (**Table S2**).

We analyzed proportions of three bacterial phyla (*Actinobacteria*, *Bacteroidetes*, and *Firmicutes*) as these were most abundant in our study population. For the cytokine responses, we selected outcomes representative of different parts of the immune system. We opted for the pro- and anti-inflammatory (TNF and IL-10, respectively) immune responses to LPS to represent the innate responses, and the Th1 and Th2 signature cytokines (IFN-gamma and IL-5, respectively) for the adaptive responses to a helminth antigen (AscAg) and the T-cell stimulator PHA. When fitting the linear mixed model, no direct effect was observed of bacterial proportions or Shannon diversity on whole blood cytokine responses (**Table 2**).

#### Interference by Helminth Infection in the Effect of Bacterial Proportions and Diversity on In Vitro Cytokine Responses

To elucidate the possible role of helminth infections in the interplay of bacteria and immune responses, we conducted analyses in helminth-positive and -negative groups. For this purpose, we used observations at both pre-treatment and post-treatment. Regardless of the randomization arm, we fitted the linear mixed model with each cytokine response as the outcome. The predictors were bacterial proportions and their interaction with helminth infection. A similar analysis was performed to estimate the association between bacterial diversity and cytokine responses. **Figure 1** illustrates the associations between bacterial proportions or diversity and cytokine responses when subjects were helminth-positive or -negative.

In the innate immune response to LPS, the *Bacteroidetes* proportion showed a significant negative association with IL-10 levels in helminth-negative subjects {estimated effect [95% confidence interval (CI)]: −1.96 (−3.05, −0.87), *p*-value = 0.001; **Figure 1B**}; that is, on the log10-transformed scale used for the cytokine responses, a one-unit increase in *Bacteroidetes* proportion corresponds to an estimated decrease of 1.96 in the IL-10 response to LPS. This association was significantly different from that of helminth-positive subjects (*p*-value for the difference = 0.002, **Figure 1**), in which the association was absent [−0.03 (−0.59, 0.53)]. The bacterial diversity had no significant association with IL-10 response to LPS (**Figure 1**). With regard to the helminth-specific cytokine responses, neither IFN-γ nor IL-5 responses to AscAg were significantly associated with bacterial proportions or diversity (**Figure 1**). In the adaptive responses (PHA), none of the cytokine responses were significantly associated with the bacterial proportion in uninfected subjects (**Figure 1**). Although not significant after Bonferroni correction, we noticed lower levels of IFN-γ to PHA with higher *Firmicutes* proportions [−1.57 (−3.08, −0.05), *p*-value = 0.045; **Figure 1**]. This association between *Firmicutes* proportion and IFN-γ to PHA in uninfected subjects was, however, significantly different from that in subjects who were infected (*p*-value for the difference = 0.009, **Figure 1**). At the same time, there was a significantly increasing concentration of IFN-γ to PHA among those who were uninfected when bacterial diversity was higher [0.95 (0.15, 1.75), *p*-value = 0.022; **Figure 1**], although this association was not significantly different from the helminth-positive group [−0.07 (−0.88, 0.73), *p*-value for the difference = 0.056; **Figure 1**]. A similar negative association of *Firmicutes* was observed in IL-5 responses to PHA in uninfected subjects [−1.52 (−3.02, −0.02), *p*-value = 0.05; **Figure 1**]. Conversely, increasing bacterial diversity was associated with slightly higher levels of IL-5 to PHA in the uninfected subjects [0.85 (0.07, 1.63), *p*-value = 0.034; **Figure 1**]. Neither observation was significantly different from the effects in those who were helminth-positive.
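Because the cytokine responses were modeled as log10(concentration + 1), the estimated effects can be translated into multiplicative changes. The sketch below is an interpretation aid rather than part of the reported analysis; it assumes that bacterial proportions enter the model on a 0 to 1 scale, and the choice of a 0.1 increment (10 percentage points) is ours.

```python
def fold_change(beta: float, delta_x: float) -> float:
    """Multiplicative change in (concentration + 1) for a change of delta_x
    in the predictor, given a slope beta on the log10 scale."""
    return 10 ** (beta * delta_x)


# Reported estimate for Bacteroidetes and IL-10 to LPS in helminth-negative subjects.
BETA_BACTEROIDETES_IL10 = -1.96

# Effect of a 10-percentage-point higher Bacteroidetes proportion (delta_x = 0.1).
change_per_10_points = fold_change(BETA_BACTEROIDETES_IL10, 0.1)
```

Under these assumptions, a 10-percentage-point higher *Bacteroidetes* proportion corresponds to (concentration + 1) being multiplied by roughly 0.64, that is, a reduction of about one third.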

#### The Effect of Albendazole on the Relationship Between Bacterial Proportion and Diversity and In Vitro Cytokine Responses

We further investigated whether deworming affects the relationship between bacterial proportions or diversity and cytokine responses. For this purpose, we fitted the linear mixed model on all subjects (n = 66) to characterize the association between bacterial proportions and cytokine responses at two time points and in the two randomization arms. These analyses were irrespective of the infection status. A similar model was applied for the diversity index.

#### TABLE 1 | Characteristics of the participants at baseline.


aDiagnosed by real-time PCR; bdiagnosed by microscopy; #unclassified bacteria represents the category of sequences that could not be assigned to a phylum; \*the pooled category consists of the remaining 13 phyla with average relative abundance less than 1%.

TABLE 2 | The association between bacterial proportion and diversity on cytokine responses.


**Figure 2** displays the associations between the proportions of three major bacterial phyla and diversity with cytokine responses, before and after anthelminthic treatment. With regard to the relationship between *Bacteroidetes* and IL-10 response to LPS, no significant differences were observed between pre- versus post-treatment or between treatment groups (**Figure 2**). While the estimated associations between *Bacteroidetes* proportion and IL-10 to LPS at pre-treatment [estimate (95% CI): −0.47 (−1.23, 0.29)] and at post-treatment in the placebo group [−0.11 (−0.84, 0.62)] were close to zero, the association at post-treatment in the albendazole group tended to be lower [−1.74 (−4.08, 0.59); *p*-value for the difference between placebo and albendazole was 0.193, **Figure 2B**]. The association between IFN-γ in response to PHA and bacterial diversity was also not significant at post-treatment in either the placebo or the albendazole group (**Figure 2D**).

FIGURE 1 | The association of bacterial proportion and diversity with cytokine responses in helminth-negative and -positive subjects. The effect of bacterial proportions on cytokine responses was analyzed for helminth-negative [helminth(−)] and helminth-positive [helminth(+)] groups by a linear mixed model. Estimated effects ± 95% CI are shown for the effect of Actinobacteria (A), Bacteroidetes (B), Firmicutes (C) and diversity (D) on cytokine responses. For assessing statistical significance, a modified Bonferroni correction was applied; \*p-value ≤ 0.025 (for LPS and PHA responses).

The association between higher *Actinobacteria* proportion and a decreasing TNF-α response to LPS was borderline significant at pre-treatment [estimate (95% CI): −1.55 (−2.87, −0.22), *p*-value = 0.024; **Figure 2A**]. This association differed significantly from the effect of *Actinobacteria* at post-treatment in subjects who received placebo [2.02 (0.57, 3.47); *p*-value < 0.001; **Figure 2A**]; however, no difference was observed between the placebo and albendazole groups at post-treatment [albendazole: 1.89 (0.29, 3.49), *p*-value for the difference = 0.907; **Figure 2A**]. A similar result was obtained for the association between *Actinobacteria* and IL-5 responses to AscAg. At pre-treatment, increasing *Actinobacteria* proportions were significantly associated with less IL-5 production in response to AscAg [−3.65 (−6.34, −0.97), *p*-value = 0.009; **Figure 2A**]. This association differed significantly from the effect of *Actinobacteria* at post-treatment in the placebo group [2.90 (−0.03, 5.84), *p*-value = 0.002]. Although the estimated association in the albendazole group was lower [−1.42 (−4.58, 1.73)], it was not significantly different between the treatment groups (*p*-value = 0.052). The effect of *Actinobacteria* on IL-10 responses to PHA changed after treatment [−0.65 (−2.33, 1.02) vs 2.90 (1.07, 4.73) for pre-treatment vs post-treatment placebo group, *p*-value = 0.005] and also differed between the placebo and albendazole groups [albendazole group −1.72 (−3.69, 0.25), *p*-value for placebo vs albendazole = 0.001; **Figure 2A**].
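To make the link between a reported estimate, its 95% CI, and its *p*-value concrete, the following is a minimal sketch that back-calculates an approximate standard error and a two-sided Wald *p*-value from the first estimate quoted above, −1.55 (−2.87, −0.22). It assumes a symmetric, normal-based interval; the function name `wald_p_from_ci` is hypothetical and this is not the authors' analysis code.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def wald_p_from_ci(estimate, lower, upper):
    """Approximate SE, z statistic and two-sided p-value from an estimate and its 95% CI.

    Assumption: the interval is estimate +/- Z_95 * SE (normal approximation).
    """
    se = (upper - lower) / (2 * Z_95)
    z = estimate / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Values taken from the text: Actinobacteria vs TNF-alpha to LPS at pre-treatment.
se, z, p = wald_p_from_ci(-1.55, -2.87, -0.22)
print(f"SE = {se:.3f}, z = {z:.2f}, approximate p = {p:.3f}")
```

Under these assumptions the approximate *p*-value lands close to, but not exactly on, the reported 0.024. A small gap of this kind is expected when the inputs are rounded and when the fitted model uses a reference distribution other than the standard normal.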

FIGURE 2 | The association between bacterial proportion and diversity on cytokine responses at pre-treatment and post-treatment in two randomization arms. After deworming, comparisons were made for all subjects pre-treatment versus post-treatment (placebo group) and post-treatment placebo versus albendazole groups. Estimated effects of a linear mixed model ± 95% CI are depicted for the effect of Actinobacteria (A), Bacteroidetes (B), Firmicutes (C), and diversity (Shannon index) (D). \*p-value ≤ 0.025 (for LPS and PHA responses) for statistical significance between placebo and albendazole groups.

Moreover, while the association between *Firmicutes* and IL-5 response to PHA at pre-treatment was not significantly different from the association at post-treatment in the placebo group, this association differed significantly between the albendazole and placebo groups at post-treatment [estimate (95% CI) for placebo −1.52 (−2.83, −0.22) versus albendazole 1.84 (−0.73, 4.41), *p*-value = 0.024; **Figure 2C**].

#### DISCUSSION

This study aimed to analyze the effect of helminth infections on the relationship between gut microbiota and the immune system. We found a negative association between proportions of *Bacteroidetes* and IL-10 response to LPS in helminth-negative subjects, and the presence of helminths was shown to dampen this effect. Anthelminthic treatment partly recovered this effect, although the recovery was not statistically significant. To our knowledge, this is the first time that the association between the gut microbiome, presence of parasitic helminths and whole blood cytokine responses has been analyzed in a longitudinal study using a randomized placebo-controlled anthelminthic trial.

IL-10 has already been identified as a key anti-inflammatory cytokine involved in the induction of immune suppression by helminths (Yazdanbakhsh et al., 2002). Our observation that helminths counteract the suppressed IL-10 response to LPS in subjects with higher *Bacteroidetes* proportions supports the so-called "old friends hypothesis" (Rook, 2009), which states that certain infectious agents such as helminths may have protective effects against immune dysfunction and inflammatory diseases, possibly through IL-10. This is strengthened by our observed gradient of the relative abundance of *Bacteroidetes* from rural to urban areas, where immune-related diseases are more prevalent (Bach, 2002). In contrast, a recent meta-analysis indicated that inflammatory bowel disease (IBD) patients displayed lower proportions of *Bacteroidetes* (Zhou and Zhi, 2016); however, this was only found when measuring by real-time quantitative PCR (not by conventional culture) and mainly in Asian studies. Furthermore, a member of the *Bacteroidetes* phylum, the gut inhabitant *Bacteroides fragilis*, was shown to protect mice from experimental colitis, mediated by polysaccharide A (PSA), possibly through IL-10 induction (Mazmanian et al., 2008). However, although *B. fragilis* is the best-known pathogen of the *Bacteroidetes*, it is the least common species in the *Bacteroidetes* phylum in the human gut (Wexler, 2007). It could therefore be that other factors or species play a dominant role in the general effect of *Bacteroidetes* on IL-10 responses. Further studies are therefore needed to assess the translation of our findings to a clinical setting, for example the prevalence or activity of IBD or other autoimmune diseases. Moreover, since we measured systemic whole blood cytokine responses, it is unclear whether these are representative of responses in the gut.

A trend of negative association between *Firmicutes* and concentration of IFN-γ to PHA was seen in helminth-negative subjects only. In subjects with helminths, this association was positive, although this difference fell short of statistical significance. Parallel to this trend, the bacterial diversity was positively associated with IFN-γ responses to PHA in subjects who did not carry helminths, and in helminth-positive subjects, this association was dampened. Since a similar opposite trend was observed in the relationship between *Firmicutes* compared to bacterial diversity on IL-5 responses to PHA, we may conclude that not the proportion of *Firmicutes*, but the total bacterial diversity drove this association. *Firmicutes* was the most abundant phylum in this population, and an increasing proportion of *Firmicutes* will obviously reduce diversity. This indicates that analyzing single bacterial phyla without considering the remaining phyla may lead to biased results, as microbiome data are compositional and thus correlated between phyla.
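The compositional argument above can be illustrated with a small numerical sketch. It computes the Shannon index for a hypothetical four-category phylum profile and shows how raising the share of an already dominant phylum forces the other shares down and lowers diversity. The profile values and the helper `raise_dominant` are illustrative assumptions, not study data or the authors' code.

```python
import math


def shannon_index(proportions):
    """Shannon diversity H = -sum(p * ln p) over the non-zero proportions."""
    return -sum(p * math.log(p) for p in proportions if p > 0)


def raise_dominant(proportions, index, new_share):
    """Set one category to new_share and rescale the others so the total stays 1."""
    rest = 1.0 - proportions[index]
    scale = (1.0 - new_share) / rest
    return [new_share if i == index else p * scale
            for i, p in enumerate(proportions)]


# Hypothetical profile (illustrative only):
# Firmicutes, Actinobacteria, Bacteroidetes, pooled remainder.
baseline = [0.50, 0.25, 0.15, 0.10]

for share in (0.50, 0.60, 0.70, 0.80):
    profile = raise_dominant(baseline, 0, share)
    others = [round(p, 3) for p in profile[1:]]
    print(f"Firmicutes={share:.2f} others={others} H={shannon_index(profile):.3f}")
```

Because the proportions must sum to one, any gain in the dominant category is mirrored by losses elsewhere. This is why single-phylum associations and diversity associations cannot be interpreted independently.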

Although deworming removed most helminths, treatment did not significantly alter the effects of bacterial proportions on cytokine responses. Regarding the *Bacteroidetes* effect on the IL-10 response to LPS, we did observe a lower effect in the albendazole group compared to placebo. Although not significant, this might point towards the idea that anthelminthic treatment could restore the (possibly detrimental) interaction of bacteria with immune responses. Surprisingly, we found differences in immune modulation by *Actinobacteria* in the pre- versus post-treatment group. Although there was a significant effect of time (in subjects receiving a placebo), these associations were not significantly different in the albendazole group. The effect of time could be explained by other factors such as diet and possibly improved hygiene, resulting from increased awareness during the presence of our medical team in the study area. In the analysis of treatment's effect on the association between bacterial proportion and diversity, there was a significant difference between the association of *Firmicutes* with the IL-5 response to PHA in the albendazole group compared to the placebo group. In subjects receiving albendazole, *Firmicutes* proportions were positively associated with IL-5 levels, while we observed a negative (non-significant) effect in helminth-negative individuals over time. This result seems contradictory, but might be related to the fact that small numbers were analyzed and not everyone in the albendazole group lost their helminth infection. The analysis of subjects who were infected at baseline and cleared their infection would possibly reveal more clearly how the relationship between bacterial communities and immunity is affected by treatment. This analysis lacks statistical power in our study as the sample size was small (n = 12 out of 17 subjects who were successfully dewormed). Future research involving larger sample sizes needs to be conducted.
With a larger sample size, the current statistical model can be extended to account for different infection status at different time points as well as different randomization arms. Thereby we would gain more insight into the responses within individuals, and how these are affected by worm infection and deworming. Another relevant consideration in this and similar research settings is that although albendazole removes helminths effectively, the immunomodulatory effects of helminths on cytokine responses are long-lasting and cannot be easily corrected by short-term treatment. It was previously reported that the length of periodic treatment is important for altering immune responses (Endara et al., 2010), i.e., that studies with a longer period of treatment (up to 30 months) are more likely to show effects of deworming.

As significant associations between bacterial communities and cytokine responses were only observed when subjects were helminth-negative, factors other than helminths and treatment are clearly also involved in the alteration of the microbiome community and its interaction with the immune system. For example, our study data lack information on diet. Dietary intake was clearly shown to affect bacterial communities in the gastrointestinal tract (Wu et al., 2011). This might also be related to changes in socioeconomic status leading towards a higher-fat diet when moving from rural to urbanized areas. Recent articles reported inconsistencies with regard to the direction of the *Bacteroidetes* to *Firmicutes* ratio in rural to urban comparisons of microbiome profiles from different geographical areas. Studies comparing children from Bangladesh to children from the USA showed an increasing *Bacteroidetes*:*Firmicutes* ratio in the USA, as observed in our data (Lin et al., 2013), while studies in elderly Koreans and children in Burkina Faso showed opposing results, i.e., decreasing *Bacteroidetes*:*Firmicutes* ratios from rural to urban (Park et al., 2015; Filippo et al., 2017). This could be caused by different genera within the *Bacteroidetes* or *Firmicutes* phyla, which might be affected by certain types of diet. Therefore, it will be beneficial for future studies to also include dietary factors from the study participants.

A further limitation is related to the statistical tools available for analyzing this relationship. Here, we characterized the association of three single bacterial proportions with cytokine response in the helminth-positive and -negative groups. Using this approach, we first ignore the effect of compositional structure in the microbiome data: when computing the *p*-value, we assumed that these bacterial categories are independent, while they are in fact correlated. Secondly, the current statistical model ignores the fact that the microbiome is a variable measured with error at a different scale than the cytokine responses (Teixeira-Pinto et al., 2009). In addition, we may have ignored possible unobserved confounders. It is therefore important for future studies in this field to develop a statistical method that characterizes the effects of helminth infection on both outcomes simultaneously by accounting for these unobserved errors with a joint model.

To conclude, our findings support the hypothesis for the role of helminths in modulating the immune response, which might be related to bacterial proportion and diversity. Deworming did not show a particular effect on the observed associations. It is therefore important to repeat such studies with a larger sample size as well as using more advanced statistical models to further analyze this relationship by considering the complex structure of microbiome data and other possible confounders.

### DATA AVAILABILITY STATEMENT

The microbiome datasets generated during the current study are available at the NCBI's Sequence Read Archive (SRA) under accession numbers SAMN07688522 to SAMN07688545. The ~4 million 16S assembled sequences from Indonesia samples are also available for download from Nematode.net (Nematode.net/Microbiome.html). The datasets supporting the conclusions of this article are available at the following GitHub address: https://github.com/IvonneMartin/Helminths\_GutMicrobes\_Cytokine.

### ETHICS STATEMENT

This study was nested within the ImmunoSPIN study, a double-blind placebo-controlled trial conducted in Flores Island, Indonesia (Wiria et al., 2010). The ImmunoSPIN study has been approved by the Ethical Committee of Faculty of Medicine, Universitas Indonesia, ref:194/PT02.FK/Etik/2006 and has been filed by ethics committee of the Leiden University Medical Center. The clinical trial was registered with number: ISRCTN83830814. The subjects gave their informed consent either by written signature or thumb print. Parental consent was obtained for children below 15 years old.

### AUTHOR CONTRIBUTIONS

IM, JH-D, and LW performed the analyses and wrote the manuscript. MK, AW, FH, YD, ES, and LW performed research on cytokine data. BR performed the microbiome data processing. TS, MM, and MY designed the study. All authors read, edited and approved the final manuscript.

### FUNDING

This study was funded by The Royal Netherlands Academy of Arts and Science (KNAW), Contract: 57-SPIN3-JRP. Sequencing data generation was supported by National Institutes of Health (NIH), grant number U54HG003079 and AI081803. The doctoral research of IM was supported by joint scholarships from the Directorate General of Resources for Science Technology and Higher Education (DGRSTHE) of Indonesia – Leiden University. The findings and conclusions contained within are those of the authors. The funders had no role in data collection, analysis, and interpretation of the data, had no role in the writing of the manuscript and in the decision to publish.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2019.01028/full#supplementary-material


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Martin, Kaisar, Wiria, Hamid, Djuardi, Sartono, Rosa, Mitreva, Supali, Houwing-Duistermaat, Yazdanbakhsh and Wammes. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Genomic Epidemiology in Filarial Nematodes: Transforming the Basis for Elimination Program Decisions

Shannon M. Hedtke<sup>1</sup>\*, Annette C. Kuesel<sup>2</sup>, Katie E. Crawford<sup>1</sup>, Patricia M. Graves<sup>3</sup>, Michel Boussinesq<sup>4</sup>, Colleen L. Lau<sup>5</sup>, Daniel A. Boakye<sup>6</sup> and Warwick N. Grant<sup>1</sup>

#### Edited by:

Makedonka Mitreva, Washington University School of Medicine in St. Louis, United States

#### Reviewed by:

Rajeev Kumar Mehlotra, Case Western Reserve University, United States Scott Small, University of Notre Dame, United States Erik Andersen, Northwestern University, United States

\*Correspondence: Shannon M. Hedtke S.Hedtke@latrobe.edu.au

#### Specialty section:

This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Genetics

Received: 30 April 2019 Accepted: 21 November 2019 Published: 09 January 2020

#### Citation:

Hedtke SM, Kuesel AC, Crawford KE, Graves PM, Boussinesq M, Lau CL, Boakye DA and Grant WN (2020) Genomic Epidemiology in Filarial Nematodes: Transforming the Basis for Elimination Program Decisions. Front. Genet. 10:1282. doi: 10.3389/fgene.2019.01282

<sup>1</sup> Department of Physiology, Anatomy and Microbiology, La Trobe University, Bundoora, VIC, Australia, <sup>2</sup> Unicef/UNDP/World Bank/World Health Organization Special Programme for Research and Training in Tropical Diseases (TDR), World Health Organization, Geneva, Switzerland, <sup>3</sup> College of Public Health, Medical and Veterinary Sciences, James Cook University, Cairns, QLD, Australia, <sup>4</sup> Unité Mixte Internationale 233 "TransVIHMI", Institut de Recherche pour le Développement (IRD), INSERM U1175, University of Montpellier, Montpellier, France, <sup>5</sup> Department of Global Health, Research School of Population Health, Australian National University, Acton, ACT, Australia, <sup>6</sup> Parasitology Department, Noguchi Memorial Institute for Medical Research, Accra, Ghana

Onchocerciasis and lymphatic filariasis are targeted for elimination, primarily using mass drug administration at the country and community levels. Elimination of transmission is the onchocerciasis target and global elimination as a public health problem is the end point for lymphatic filariasis. Where program duration, treatment coverage, and compliance are sufficiently high, elimination is achievable for both parasites within defined geographic areas. However, transmission has re-emerged after apparent elimination in some areas, and in others has continued despite years of mass drug treatment. A critical question is whether this re-emergence and/or persistence of transmission is due to persistence of local parasites—i.e., the result of insufficient duration or drug coverage, poor parasite response to the drugs, or inadequate methods of assessment and/or criteria for determining when to stop treatment—or due to re-introduction of parasites via human or vector movement from another endemic area. We review recent genetics-based research exploring these questions in Onchocerca volvulus, the filarial nematode that causes onchocerciasis, and Wuchereria bancrofti, the major pathogen for lymphatic filariasis. We focus in particular on the combination of genomic epidemiology and genome-wide associations to delineate transmission zones and distinguish between local and introduced parasites as the source of resurgence or continuing transmission, and to identify genetic markers associated with parasite response to chemotherapy. Our ultimate goal is to assist elimination efforts by developing easy-to-use tools that incorporate genetic information about transmission and drug response for more effective mass drug distribution, surveillance strategies, and decisions on when to stop interventions to improve sustainability of elimination.

Keywords: population genomics, onchocerciasis, lymphatic filariasis, transmission, parasite elimination, drug resistance, epidemiology

#### INTRODUCTION

Onchocerciasis and lymphatic filariasis (LF) are targeted for elimination, primarily using community-wide mass drug administration (MDA). In April 2019, the World Health Organization (WHO) initiated a global consultation on the "Roadmap for Neglected Tropical Diseases," including current targets for global elimination of LF as a public health problem by 2020 (World Health Organization, 2012a) and elimination of transmission of the parasite that causes onchocerciasis in 80% of affected sub-Saharan countries by 2025 (African Programme for Onchocerciasis Control, 2012). Barriers to elimination include operational challenges for adequate drug distribution or noncompliance leading to low treatment coverage (Churcher et al., 2009; Turner et al., 2013; African Programme for Onchocerciasis Control, 2015), movement of humans or vectors among communities (De Sole et al., 1991; African Programme for Onchocerciasis Control and World Health Organization, 2010; Ramaiah, 2013; Xu et al., 2018b; see section "Delineating Transmission Zones for Sustainable Onchocerciasis and LF Elimination"), and challenges in determining when the transmission cycle has been interrupted and treatment can be stopped (Stolk et al., 2015; Lont et al., 2017; Walker et al., 2017). Elimination of these diseases has been formally verified or validated by the WHO in some geographic regions, and is under assessment in others (see section "Parasite Transmission, Morbidity, and Control/Elimination Strategies"). However, there are areas where transmission has re-emerged after apparent (though not WHO-certified) elimination, and others where transmission has continued despite decades of MDA. 
A critical question is whether "local" parasites continue to persist because of insufficient MDA coverage and/or duration relative to the pre-intervention endemicity, poor parasite drug response, or inadequate methods of assessment and/or criteria for decisions to stop treatment, or whether "immigrant" parasites have been introduced via human or vector movement from another endemic area. Population genetic analyses of parasite DNA sequences may provide data to answer this question. With the increasing technical feasibility of gathering population-level genomic data, it becomes possible to develop tools allowing routine acquisition of such data by elimination programs to inform decisions on MDA strategies, delineate areas to be included in evaluations, and determine whether treatment should be continued or stopped.

#### Parasite Transmission, Morbidity, and Control/Elimination Strategies

#### Onchocerciasis

Onchocerca volvulus is transmitted by blackflies of the genus Simulium. Female blackflies taking a blood meal from an infected person ingest O. volvulus microfilariae, which develop into infective L3 larvae that are transmitted to a different host during subsequent blood meals (Figure 1A). The L3 larvae develop via L4 into adult worms, macrofilariae, which live for around 12–14 years in nodules under the skin and deep in the body, producing millions of microfilariae. The microfilariae migrate through the skin, eyes, and other organs, and live for up to 2 years. The immunological reactions upon death of microfilariae are the major cause of morbidity: primarily itching, skin depigmentation and lesions, and visual impairment that can progress to blindness (Remme et al., 2017). Increasing evidence suggests that high O. volvulus infection can induce epilepsy (Chesnais et al., 2018).

The pattern of onchocerciasis endemicity is determined by ecological conditions favorable for Simulium breeding sites, whose location and productivity (i.e., number of blackflies hatching) can vary seasonally and from year to year. Simulium damnosum sensu lato (s.l.), the main vector in Africa, breeds in fast-flowing rivers and streams with the high level of oxygenation needed for larval development. The flies typically have an active flight range of around 15 km, but can migrate seasonally over hundreds of kilometers if assisted by prevailing winds, and thus transmit the parasite over large geographic ranges (Garms et al., 1979; Le Berre et al., 1979; Baker et al., 1990). Infection rates are highest among those living and/or working close to consistently highly productive breeding sites and decline with increasing distance between where blackflies breed and where people live and work.

FIGURE 1 | Life cycles of (A) Onchocerca volvulus and (B) Wuchereria bancrofti. From Centers for Disease Control and Prevention (accessed April 2019, https://www.cdc.gov/dpdx/az.html).

The morbidity and socio-economic impact of onchocerciasis motivated three control programs. The Onchocerciasis Control Programme in West Africa (OCP, 1974–2002) targeted elimination as a public health problem through weekly aerial larviciding of breeding sites along ~50,000 km of rivers for the life span of macrofilariae, later complemented in some areas by mass drug administration of ivermectin (MDAi). The OCP originally covered ~654,000 km<sup>2</sup>, which was expanded to 1,300,000 km<sup>2</sup> because Simulium from outside invaded the original area (Le Berre et al., 1979; Baker et al., 1990; Remme, 2004; Boatin, 2008).

The 1987 decision of Merck & Co., Inc. (Kenilworth, NJ, USA) to donate ivermectin (Mectizan®) for onchocerciasis control, and subsequent studies showing the safety of MDAi, allowed initiation of MDAi-based control programs. The Onchocerciasis Elimination Program for the Americas (OEPA) was initiated in 1993 to eliminate transmission in 13 generally small foci across six countries with a total at-risk population of 0.56 million. Building on early efforts to control onchocerciasis through nodulectomy campaigns and annual ivermectin treatments, the OEPA implemented health system–directed biannual MDAi (and even quarterly MDAi in four areas in Brazil, Mexico, and Venezuela). Elimination was certified by WHO for all but the large Amazonian area across the border between Brazil and Venezuela, where difficult terrain and lack of roads make it challenging to ensure ivermectin distribution to ~30,500 Yanomami across ~540 migratory communities (Sauerbrey et al., 2018; World Health Organization, 2018; Figure 2).

The African Programme for Onchocerciasis Control (APOC, 1995–2015) initially targeted control of onchocerciasis as a public health problem in Central and East Africa and Liberia with sustainable, community-directed ivermectin treatment (CDTI). Across the 19 APOC countries, ~86 million people were estimated to require CDTI because they lived in meso- and hyperendemic areas, i.e., where 20–40% and >40% of adults, respectively, have subcutaneous nodules (Noma et al., 2014; Figure 3). These nodule prevalences correspond to prevalences of skin microfilariae in the general population of around 40–60% for mesoendemic and >60% for hyperendemic areas associated with an increased risk of onchocercal blindness (Prost et al., 1979; UNDP/World Bank/WHO Special Programme for Research and Training in Tropical Diseases, 1992; Noma et al., 2002; Seketeli, 2002; Seketeli et al., 2002; Fobi et al., 2015). The sizes of these areas range from relatively small to a vast contiguous endemic area of ~2 million km<sup>2</sup> across seven countries (Noma et al., 2014; World Health Organization, 2018). Research and subsequent epidemiological evaluations of infection rates in areas with long-term MDAi (Diawara et al., 2009; Traore et al., 2012; Tekle et al., 2016) suggested that MDAi alone could achieve elimination of transmission in Africa. The APOC objectives were consequently expanded to onchocerciasis elimination in at least 80% of African countries by 2025 (African Programme for Onchocerciasis Control, 2012). In 2015, APOC was closed. The Expanded Special Project for Elimination of Neglected Tropical Diseases (ESPEN) now provides countries with some of the support previously provided by APOC.

MDAi in meso- or hyperendemic areas may have reduced infection prevalence in neighboring hypoendemic areas. Previously unmapped hypoendemic areas might impact elimination goals by acting as a potential source of renewed transmission (Rebollo et al., 2018). These areas need to be identified efficiently, and cost-effective intervention strategies need to be implemented both there and in areas previously identified as hypoendemic but not included in CDTI. Including the population in hypoendemic areas mapped by 2017–2018, WHO estimated that 204 million people required MDAi to achieve elimination (World Health Organization, 2018). In Central Africa where another filarial parasite, Loa loa, is endemic, areas previously identified as requiring interventions based on parasites observed in the skin may require remapping. Recent data suggest that L. loa microfilariae, once assumed to be present only in blood, may also be present in the skin and mistaken for O. volvulus (Nana-Djeunga et al., 2019).

FIGURE 3 | Rapid epidemiological mapping of onchocerciasis (REMO), based on prevalence of nodules across communities with a history of onchocerciasis and covered by the African Programme for Onchocerciasis Control (APOC). Figure from Zouré et al. (2014).

The OCP experience (Remme et al., 1995) informed the 2001 WHO procedures for entomological assessments and criteria for certification of onchocerciasis elimination (World Health Organization, 2001). Those procedures stipulated testing at least 10,000 flies from each endemic community selected for evaluation by the WHO International Certification Team, maximizing the chances of detecting infected flies by collecting during peak transmission season. In 2010, APOC outlined a "Conceptual Framework of Onchocerciasis Elimination with Ivermectin Treatment" which built on the OCP and APOC experience, took into account differences between control via vector elimination and via ivermectin treatment, and emphasized that much is still to be learned (African Programme for Onchocerciasis Control/World Health Organization, 2010; Dadzie et al., 2018). This framework defined elimination as "reduction of infection and transmission to the extent that interventions can be stopped but postintervention surveillance is still necessary" and a transmission zone as a "geographical area where transmission of O. volvulus occurs by locally breeding vectors and which can be regarded as a natural ecological and epidemiological unit for interventions." It identified delineating transmission zones and examining risks of parasite reintroduction through human or vector migration as critical to decisions to stop MDA. 
Acknowledging that "in practice it is difficult to determine with a fair degree of certainty that vectors in a given area are exclusively locally breeding," the framework recommended an operational definition of a transmission zone as a "river basin, or a major section of a river basin, where onchocerciasis is endemic and where the river is the core of the endemic area, with communities with the highest prevalence of infection generally located close to the river and infection levels falling with increasing distance from the river until they become negligible or reach a neighboring transmission zone." In 2016, the WHO issued a new guideline for stopping MDA and verifying elimination of human onchocerciasis (World Health Organization, 2016). This guideline defines interruption of transmission of O. volvulus as "the permanent reduction of transmission in a defined geographical area after all the adult worms (and microfilariae) in the human population in that area have died, been exterminated by some other intervention, or become sterile and infertile." While this guideline does not specify criteria for a "defined geographical area," it defines a transmission zone in the same way as the APOC framework. The 2016 Guideline Development Group recommended that decisions about interruption of parasite transmission, stopping MDA, and confirmation of transmission interruption at the end of the post-treatment surveillance period should be based on a prevalence of infective parous flies of <0.1%. Statistical confidence for this elimination threshold requires that a minimum of 6,000 flies collected from "a transmission zone" should be tested. The guideline does not address delineation of a transmission zone and hence the geographical range over which those flies are to be collected. To date, no validated methodology for transmission zone delineation is available. 
We review evidence here that population genetic measures of the parasite (and its vectors) capture the history of transmission and hence can be used as a proxy for defining transmission zone boundaries.
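The fly-sampling numbers quoted above can be explored with a short calculation. The sketch below assumes a simple binomial model in which every fly is tested individually and none is positive, and asks what upper confidence bound on prevalence that supports. The function names are hypothetical, and this is an illustration of the arithmetic, not the guideline's own derivation.

```python
import math


def upper_bound_zero_positives(n_flies, alpha=0.05):
    """One-sided (1 - alpha) upper bound on prevalence when 0 of n_flies test positive.

    Solves (1 - p)^n = alpha for p under a binomial model.
    """
    return 1.0 - alpha ** (1.0 / n_flies)


def min_flies_for_threshold(threshold, alpha=0.05):
    """Smallest n for which 0 positives puts the upper bound at or below threshold."""
    return math.ceil(math.log(alpha) / math.log(1.0 - threshold))


bound = upper_bound_zero_positives(6000)
print(f"Upper 95% bound with 0/6000 positive flies: {bound:.5%}")
print(f"Flies needed for a 0.1% threshold: {min_flies_for_threshold(0.001)}")
print(f"Flies needed for a 0.05% threshold: {min_flies_for_threshold(0.0005)}")
```

Under these assumptions, 6,000 negative flies support an upper bound of roughly 0.05%, while a 0.1% threshold alone would need only about half as many. One possible reading is that the extra margin allows for only a fraction of collected flies being parous; that interpretation is an assumption here and is not stated in the text above. The calculation also says nothing about where the flies should be collected, which is the delineation problem this review addresses.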

#### Lymphatic Filariasis

LF is a vector-borne disease caused by three species of parasites: Wuchereria bancrofti, Brugia malayi, and Brugia timori. The majority of infections are due to W. bancrofti, which is found throughout the tropics (Figure 2). Chronic infection with these parasites can disrupt lymphatic function and lead to long-term disability due to lymphedema (most commonly swollen lower limbs) and scrotal hydrocele (Taylor et al., 2010; World Health Organization, 2011). Globally, ~68 million people are infected with LF parasites, including 36 million microfilaremic persons, 19 million hydrocele cases, and 17 million lymphedema cases (Ramaiah and Ottesen, 2014). In 1997, following a World Health Assembly resolution, the WHO initiated elimination of LF as a public health problem, using a two-pillar program of annual MDA to reduce microfilaremia to levels that would not sustain continued transmission, and management of morbidity and disability by providing a minimum package of care (Ichimori et al., 2014).

Infective larvae of the LF parasites are transmitted to humans by mosquitoes and migrate to the lymphatic system, where they develop into adult worms that live for ~4 to 6 years (Figure 1B). After mating, female adult worms produce thousands of microfilariae throughout their life, which circulate in the peripheral blood and are available to infect mosquitoes during blood meals to continue the transmission cycle. LF is transmitted by a wide range of mosquito genera including Anopheles, Culex, Mansonia, and Aedes (World Health Organization, 2013). In a striking example of the power of adaptive evolutionary change, the daily periodicity in density of microfilariae in peripheral blood is tuned to the biting times of local vectors.

Human infections are diagnosed by detecting microfilariae on blood films collected at the appropriate time of day, or by detecting circulating filarial antibodies or antigens (Ag). Detecting microfilariae on blood films requires technical skill and equipment, and collecting blood at night (where vectors are nocturnal) is challenging, so serology is usually used rather than microscopy. The current standard test for programmatic surveillance activities in W. bancrofti areas is a rapid diagnostic test for circulating filarial antigens (World Health Organization, 2017b). Antibody tests are not usually used because antibodies can persist for years after the death of adult worms. Both overall and age-specific prevalence of LF infection (usually not peaking until adulthood) are determined by transmission intensity, which in turn depends on vector biting rates and the availability of susceptible humans. The threshold for starting MDA programs is microfilariae or antigen prevalence of ≥1% within an implementation unit (IU), which is defined prior to mapping (World Health Organization, 2011; World Health Organization, 2017b). The size of the IU was left to the judgment of the endemic countries; some chose to map by village or district, while others classified the entire country or regions. In areas in Africa co-endemic for loiasis (L. loa filariasis), the use of antigen-based test strips for mapping has probably overestimated LF prevalence, because of false positives from people with high-intensity L. loa infection (Wanji et al., 2015; Pion et al., 2016). Thus, mapping should be revisited in loiasis co-endemic areas using approaches that increase species specificity.

MDA is conducted annually with a single-dose combination of albendazole with diethylcarbamazine (DEC) or, in onchocerciasis co-endemic countries, with ivermectin. Recently, the three drugs have been combined in some countries (King et al., 2018), including those where elimination targets have not been reached despite completing the required rounds of two-drug MDA (World Health Organization, 2017a). In loiasis co-endemic areas in Central Africa not meso- or hyperendemic for onchocerciasis, biannual albendazole treatment in combination with vector management is recommended, since DEC and ivermectin can cause serious adverse reactions in individuals with high L. loa microfilaremia (Boussinesq et al., 2003; World Health Organization, 2012b).

The WHO recommendation is for MDA to be conducted until large, well-designed transmission assessment surveys (TASs) of 6- to 7-year-old children in schools or communities determine that the estimated antigen prevalence in an evaluation unit has dropped below a critical cutoff value, when transmission is not considered sustainable (World Health Organization, 2011; World Health Organization, 2017b). Critical cutoff values depend on the filarial and vector species, and are calculated such that the likelihood of an evaluation unit passing is at least 75% if true antigen prevalence is 0.5%, and no more than 5% if the true antigen prevalence is 1% (World Health Organization, 2011; World Health Organization, 2017b). For example, in regions where W. bancrofti is endemic and transmission is dominated by Aedes spp. mosquitoes, the threshold is based on an upper 95% confidence limit for antigen prevalence of 1%. The TAS evaluation unit can be the same as the IU, it can be larger and include several IUs, or it can be smaller (by splitting an IU), depending on available information on likely residual LF endemicity. Once a country has passed two post-MDA TASs at 2- to 3-year intervals in all evaluation units, it can apply for validation of elimination of LF as a public health problem. Postvalidation surveillance also uses TAS in 6- to 7-year olds (World Health Organization, 2017b). Morbidity management and rehabilitation must continue as needed.
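The logic behind a critical cutoff can be illustrated with a simple binomial calculation. The sketch below is not the WHO procedure: it assumes simple random sampling of children and ignores the cluster designs and finite-population adjustments that real TAS protocols may apply. The sample size of 1,500 is a hypothetical value chosen for illustration, and all function names are assumptions.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(min(k, n) + 1))


def critical_cutoff(n: int, p_fail: float = 0.01, alpha: float = 0.05) -> int:
    """Largest number of positives d such that P(X <= d | p_fail) <= alpha.

    Returns -1 if even zero positives would pass too often at prevalence p_fail.
    """
    d = -1
    while d + 1 <= n and binom_cdf(d + 1, n, p_fail) <= alpha:
        d += 1
    return d


def pass_probability(n: int, cutoff: int, true_prevalence: float) -> float:
    """Probability that a survey of n children yields no more than cutoff positives."""
    return binom_cdf(cutoff, n, true_prevalence)


if __name__ == "__main__":
    n = 1500  # hypothetical sample size
    d = critical_cutoff(n)
    print("critical cutoff:", d)
    print("P(pass | prevalence 1.0%):", round(pass_probability(n, d, 0.01), 3))
    print("P(pass | prevalence 0.5%):", round(pass_probability(n, d, 0.005), 3))
```

For a given sample size, the cutoff is fixed by the 5% constraint at 1% prevalence. Whether the 75% pass probability at 0.5% prevalence is also met then depends on the sample size, which is why the two conditions jointly determine the survey design.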

### Delineating Transmission Zones for Sustainable Onchocerciasis and LF Elimination

#### Onchocerciasis

The mosaic of onchocerciasis hyper-, meso-, and hypoendemicity in Africa implies an underlying geographic mosaic of parasite transmission zones (Figure 3; African Programme for Onchocerciasis Control/World Health Organization, 2010) that is the product of long-term, historical spatial density and migration patterns of the blackfly vector and of the human host (Blouin et al., 1995; Nadler, 1995; Jarne and Théron, 2001; Criscione and Blouin, 2004; Criscione et al., 2005; Prugnolle et al., 2005; Barrett et al., 2008). This pattern was not important for implementation of "project areas" for control of onchocerciasis as a public health problem (i.e., morbidity reduction and prevention), which were delineated based on the administrative borders of health system units in which meso- and hyperendemic areas were located. However, with the switch from control as a public health burden to elimination of transmission, explicit consideration should be given to the alignment of intervention areas and the areas included in the evaluations for decisions to stop treatment (subsequently referred to as "evaluation areas") with the true, long-term parasite transmission zones.

Although the success of MDAi has driven the move from control to elimination of onchocerciasis, there are some areas in Africa in which many years of MDAi have reduced infection prevalence less than expected (i.e., less than transmission models predict) or where resumption of transmission has been reported following cessation of interventions (Kamga et al., 2016; Tekle et al., 2016; Koala et al., 2017; Bakajika et al., 2018; Komlan et al., 2018; Koala et al., 2019). Transmission models predict that the duration of interventions required to interrupt transmission is dependent on vector abundance (which determines the percentage of the population infected before interventions begin), and the treatment frequency and percentage of the population taking ivermectin (African Programme for Onchocerciasis Control/World Health Organization, 2010; Stolk et al., 2015). Consequently, the apparent failure of MDAi to interrupt transmission in some treatment areas, and of posttreatment recrudescence in others, might be due to highly hyperendemic areas requiring longer-than-assumed duration of MDAi (and/or higher treatment coverage and/or more frequent treatment; Stolk et al., 2015) or higher-than-appropriate thresholds of vector infectivity for deciding to stop interventions. An alternative explanation that has yet to be modeled, or considered explicitly in any guideline, is that continuing transmission/recrudescence may be a result of continuing parasite invasion from neighboring areas; i.e., there may be discordance between an intervention or evaluation area and the parasite transmission zone within which the area is located. In order to achieve sustainable elimination, transmission must be interrupted in the whole transmission zone, which needs to be effectively delineated to ensure that the whole zone is included in both interventions and evaluations.

The risk of parasite invasion is illustrated by two reports from river basins in Burkina Faso where transmission was considered to have been permanently interrupted. In the Comoé basin, retrospective analysis of recrudescence suggested that long-range vector movement was responsible, although immigration of infected people from outside the basin was not excluded. Ongoing transmission was confirmed via entomological assessments at 4/4 capture points (Koala et al., 2017; Koala et al., 2019). The second Burkina Faso report investigated microfilariae-positive individuals in the Upper Mouhoun, Nakambé, and Nazinon river basins and found all but 1 of the 31 infected individuals recently resided in neighboring Côte d'Ivoire (Nikièma et al., 2018). The remaining person had been previously identified as infected in 2008 (Nikièma et al., 2018). Thus, migration between an area with and an area without ongoing transmission accounted for 30/31 of the identified infected people.

The two cases described above point to immigrant parasites as the potential source of renewed transmission and emphasize the importance of knowing whether or not new infections are due to parasites which "immigrated" via vectors or humans for determining areas to be included in appropriate treatment strategies. Similarly, the estimated risk of immigration via humans or vectors is important for elimination programs to make cost–risk based decisions: i.e., evaluate the cost of stopping MDAi based on the risk of resurgence due to immigration from areas that do not yet meet stopping criteria versus the cost of continuing MDAi until these other areas also meet stopping criteria. Finally, such information can provide lessons for evaluations required for decisions to stop interventions, including delineation of evaluation areas, sampling criteria, or thresholds for MDAi cessation, which are all currently determined independent of vector abundance.

#### Lymphatic Filariasis

In 2012, 73 countries were regarded as LF endemic (Ramaiah and Ottesen, 2014), with 55 implementing MDA. By October 2019, 15 countries had been validated as having eliminated LF as a public health problem: Vanuatu, Tonga, Niue, Cook Islands, Republic of the Marshall Islands, Palau, Wallis & Futuna, Kiribati, Cambodia, and Vietnam in the WHO Western Pacific Region (WPRO); Sri Lanka, Maldives, and Thailand in the WHO South East Asian Region (SEARO); Egypt in the WHO Eastern Mediterranean Region (EMRO); and Togo in the WHO African Region (AFRO) (World Health Organization, 2019) (Figure 2).

While 15 countries have achieved validation of elimination so far (and some are nearing this milestone), there are countries where progress toward elimination has stagnated or reversed, including several in the Pacific where LF is transmitted by efficient day-biting Aedes vectors and where there may also be supplementary night-biting vectors (Ichimori and Graves, 2017). These countries include American Samoa, Samoa, French Polynesia, and Fiji.

LF is relatively heterogeneous in its distribution within countries, and thus the role of movement between intervention/evaluation units in maintaining or re-establishing transmission is a concern for many national LF elimination programs. Transmission zones in W. bancrofti are highly dependent on variation in vector density, survival, and competence for transmission and whether vector mosquitoes are diurnal, nocturnal, or sub-nocturnal. In areas with postintervention recrudescence, there are sometimes infection "hot spots": localized areas where prevalence is significantly higher than in surrounding areas (e.g., American Samoa: Lau et al., 2014; Lau et al., 2016; Lau et al., 2017; Sri Lanka: Rao et al., 2014; Rao et al., 2017; Rao et al., 2018). Potential explanations for recrudescence include: expansion of infections from hot spots, when mosquitoes take up and transmit parasites to geographically nearby communities or when infectious people visit other communities; a slow but insidious increase in frequency and intensity of transmission from widespread infections at low prevalence that were not detected or were considered (incorrectly) to be under the threshold for stopping MDA; or continued introduction of new parasites by migrants from other endemic regions, if the evaluation unit is inappropriately small.

The role of population movement between countries and regions within countries in maintaining or re-establishing transmission is a concern for many national elimination programs. Continued surveillance post-MDA has been recommended by researchers when communities are in close proximity to cross-border regions with ongoing transmission (Ramaiah, 2013; Dorkenoo et al., 2018). The possibility that highly mobile migrant workers from Myanmar introduced and maintained LF transmission in Thailand, after effective MDA throughout most of the latter country, resulted in additional rounds of MDA in border communities and sparked epidemiological research on the connectivity between communities (Bhumiratana et al., 2010; Satimai et al., 2011; Bhumiratana et al., 2013; Toothong et al., 2015; Dickson et al., 2017). The influx of thousands of refugees from Haiti, where LF is hyperendemic, to Brazil, where LF transmission has been reduced and even eliminated in some states, has prompted screening and treatment of recent immigrants to reduce risk of re-introduction (Nunes et al., 2016; da Silva et al., 2017; Zuchi et al., 2017). Even in the remote islands of Samoa and American Samoa, models have shown that movement of people between countries could have a significant impact on the transmission of infectious diseases, including LF (Xu et al., 2018b). The role of movement of infected persons in the establishment and persistence of hot spots, and thus the appropriate size of implementation and evaluation units (in terms of both geographic area and total population) for LF programmatic activities, need to be critically evaluated for their impact on elimination progress. In contrast, long distance vector dispersal is not typically considered a major risk for increasing the distribution or prevalence of the LF parasite (Hapairai et al., 2013; Verdonschot and Besse-Lototskaya, 2014, but see Huestis et al., 2019).

Crucial questions to be addressed for sustainable LF elimination are thus: (1) what drives development of hot spots, and their persistence after MDA, (2) what are effective methods for post-MDA surveillance, when very low infection prevalence becomes difficult to detect, (3) how to determine appropriate target thresholds for successful interruption of transmission in different settings (e.g., areas with different mosquito vectors), (4) what strategies should be used to delineate intervention/evaluation units, (5) what role does travel/migration (i.e., population movement and connectedness) play in the transmission of parasites between countries and between regions within countries, and (6) how do transmission dynamics among regions affect planning for future MDA? To stop interventions and enter the "post-MDA surveillance" phase in a specific geographic area requires reasonable certainty that the probability of importation of parasites is too low to re-initiate transmission, i.e., that stopping criteria are met across the whole transmission zone. In other words, stopping MDA for LF also requires delineation of transmission zone boundaries.

#### PARASITE POPULATION GENETICS AND TRANSMISSION ZONES

Filarial nematodes have nuclear genomes on the order of ~90–100 Mbp, mitochondrial genomes of ~13–14 Kbp, and most, but not all, species obligately carry a Wolbachia endosymbiont that has its own genome of ~1 Mbp (Keddie et al., 1998; Keddie et al., 1999; Unnasch and Williams, 2000; Foster et al., 2005; McNulty et al., 2012; Ramesh et al., 2012; Desjardin et al., 2013; Cotton et al., 2016; Small et al., 2016; Chung et al., 2017). Sequencing, assembling, and annotating all three genomes for many helminth species, including the filariae, is the objective of continuing efforts, and the availability and assembly quality of parasite genomes will likely continue to increase rapidly. Currently, the O. volvulus genome has the highest quality assembly, with complete assemblies of mitochondrial and Wolbachia genomes and nuclear genome sequence scaffolds that are approaching whole chromosomes. Genome resources for this and other parasites can be found at https://parasite.wormbase.org/index.html.

With a few notable exceptions (Choi et al., 2016; Small et al., 2016; Doyle et al., 2017; Small et al., 2019), most genomic research for filarial nematodes has focused on predictions of gene function. The aim of many of these "functional" predictions is to identify targets for new drug discovery, usually in the absence of methods for functional genomic validation of targets. Given the time for moving from identification of potential drug targets to regulatory registration and WHO guidelines for new drugs, it is unclear to what extent drugs emerging from such research are going to be available for use by LF and onchocerciasis elimination programs.

In contrast, relatively little attention has been paid to the potential for the explosion of genomic information to support elimination efforts with currently used drugs or new applications of drugs such as moxidectin, which achieved regulatory registration with the U.S. Food and Drug Administration in 2018 and is undergoing additional studies for inclusion in WHO guidelines (Opoku et al., 2018). Understanding the geographic distribution of variation in susceptibility to those drugs (whether, for example, drug-naive populations already contain alleles for reduced susceptibility) may allow us to act to preserve susceptibility to those drugs. Population genomics analyzes whole-genome variation within and between parasite populations, identifying where that variation might impact drug efficacy (the evolution of drug resistance) or elucidating patterns of disease transmission (genetic epidemiology). We review applications of population genomics to informing when and where to cease treatment, to quantifying the risk of recrudescence through migration of infected humans/vectors from other areas (O. volvulus Population Genetic Structure and W. bancrofti Population Genetic Structure), and to assessing the long-term sustainability of MDA if poor response to drugs is heritable (Parasite Population Genomics for Understanding Geographic and Temporal Variability in Drug Response).

#### Conceptual Approach

Genetic structure is the result of movement of parasites within, but not between, transmission zones: parasites within a transmission zone interbreed and are genetically more similar to each other than to parasites in another transmission zone, with whom they do not interbreed. The likelihood that parasites are from the same transmission zone, and thus likelihood of parasite transmission between two locations (invasion), can therefore be inferred from the degree of genetic relatedness between them (e.g., Blouin et al., 1995; Nadler, 1995; Criscione and Blouin, 2004; Small et al., 2014). Statistically significant genetic differentiation at neutral loci (i.e., loci which do not affect a phenotype subject to selection pressure) between two parasite samples indicates that they originate from two separate transmission zones (Figure 4). In contrast, high genetic similarity among parasites indicates either historical or ongoing gene flow (i.e., interbreeding), and that these more closely related parasites are from a single transmission zone, regardless of geographical scale. This genetic population structure, or lack of it, is the result of long-term patterns of transmission. Measures of population genetic structure could thus help programs assess the risk of resurgence associated with alternative scenarios on a longer time scale than intrinsically shorter-term decisions about where and whom to treat. Population genetic measures have proven to be informative for malaria epidemiology and control, particularly for investigating sources of transmission or the origins and prevalence of drug resistant strains (Jennison et al., 2015; Lo et al., 2017; Aydemir et al., 2018; Fola et al., 2018; Waltmann et al., 2018).
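To make "statistically significant genetic differentiation" concrete, the sketch below computes Nei's G_ST between two samples of haplotype labels and assesses it with a permutation test. This is one of several possible differentiation statistics and is not necessarily the one used in the studies cited in this review. The haplotype labels and counts are hypothetical, and the function names are illustrative.

```python
import random
from collections import Counter


def gene_diversity(haplotypes):
    """Expected heterozygosity, 1 - sum(p_i^2), from a list of haplotype labels."""
    n = len(haplotypes)
    return 1.0 - sum((count / n) ** 2 for count in Counter(haplotypes).values())


def g_st(sample_a, sample_b):
    """Nei's G_ST = (H_T - H_S) / H_T for two sites, weighting the sites equally."""
    h_s = (gene_diversity(sample_a) + gene_diversity(sample_b)) / 2.0
    freq_a, freq_b = Counter(sample_a), Counter(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    # Total diversity uses the mean haplotype frequency across the two sites.
    h_t = 1.0 - sum(
        ((freq_a[h] / n_a + freq_b[h] / n_b) / 2.0) ** 2
        for h in set(freq_a) | set(freq_b)
    )
    return 0.0 if h_t == 0.0 else (h_t - h_s) / h_t


def permutation_p_value(sample_a, sample_b, n_perm=999, seed=0):
    """Share of random reassignments of parasites to sites with G_ST >= observed."""
    rng = random.Random(seed)
    observed = g_st(sample_a, sample_b)
    pooled = list(sample_a) + list(sample_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if g_st(pooled[: len(sample_a)], pooled[len(sample_a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical haplotype labels for parasites sampled at two sites.
    site_1 = ["h1"] * 18 + ["h2"] * 2
    site_2 = ["h3"] * 15 + ["h4"] * 5
    print("G_ST:", round(g_st(site_1, site_2), 3))
    print("p-value:", permutation_p_value(site_1, site_2))
```

A high G_ST with a small p-value would be read as evidence for separate transmission zones, whereas a G_ST near zero would be consistent with a single zone. In practice the markers should be neutral, as noted above, and sample sizes large enough to represent each site.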

In population genetic analyses, sequence data are collected from parasites from multiple geographic locations. One approach to analyzing these data is to compare and cluster sequences based on the distribution of genetic variation across a geographic area: members of the same parasite population share similar patterns of genetic sequence variation because they have been interbreeding (e.g., Pritchard et al., 2000; Jombart et al., 2010). An alternative approach is to explore the history of parasite movement by simulating the expected distribution of genetic variation under different parasite migration histories, and comparing test statistics derived from these simulated data to test statistics calculated using the observed genetic variation (e.g., Laval and Excoffier, 2004; Peng and Kimmel, 2005). Development of analytical methods suitable for quantitative assessment of transmission of parasites within and among geographic regions is an active area of research, and more in-depth reviews of these methods can be found elsewhere (e.g., Schawbl et al., 2017; Wesolowski et al., 2018).
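The simulation-based approach can be illustrated with a deliberately small toy model. The sketch below is not one of the simulators cited above. It assumes two equally sized haploid parasite populations, symmetric migration, non-overlapping generations, and independent biallelic loci, and it uses F_ST as the only summary statistic. All parameter values and the observed F_ST are hypothetical.

```python
import random


def simulate_fst(n_per_deme, migration_rate, generations, n_loci, seed=0):
    """Forward Wright-Fisher simulation of two equally sized haploid demes.

    Returns F_ST pooled over loci as (sum H_T - sum H_S) / sum H_T.
    """
    rng = random.Random(seed)
    sum_ht = 0.0
    sum_hs = 0.0
    for _ in range(n_loci):
        p1 = p2 = 0.5  # both demes start from the same allele frequency
        for _ in range(generations):
            # Migration: each deme receives a fraction of the other's alleles.
            q1 = (1.0 - migration_rate) * p1 + migration_rate * p2
            q2 = (1.0 - migration_rate) * p2 + migration_rate * p1
            # Drift: binomial resampling of n_per_deme individuals per deme.
            p1 = sum(rng.random() < q1 for _ in range(n_per_deme)) / n_per_deme
            p2 = sum(rng.random() < q2 for _ in range(n_per_deme)) / n_per_deme
        p_bar = (p1 + p2) / 2.0
        sum_ht += 2.0 * p_bar * (1.0 - p_bar)            # total heterozygosity
        sum_hs += p1 * (1.0 - p1) + p2 * (1.0 - p2)      # mean within-deme value
    return 0.0 if sum_ht == 0.0 else (sum_ht - sum_hs) / sum_ht


def closest_scenario(observed_fst, simulated):
    """Return the migration rate whose simulated F_ST is nearest the observed value."""
    return min(simulated, key=lambda rate: abs(simulated[rate] - observed_fst))


if __name__ == "__main__":
    rates = [0.0, 0.01, 0.2]  # hypothetical migration histories to compare
    simulated = {m: simulate_fst(50, m, 100, 40, seed=1) for m in rates}
    for m, fst in simulated.items():
        print(f"migration rate {m}: simulated F_ST = {fst:.3f}")
    observed = 0.05  # hypothetical F_ST estimated from field samples
    print("closest scenario:", closest_scenario(observed, simulated))
```

Real analyses replace each simplification here: they use many replicate simulations per scenario, several summary statistics, realistic demography, and a formal measure of fit rather than a nearest-value rule.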

Prior to MDA, filarial parasite populations had a widespread, continuous distribution (e.g., onchocerciasis in Central/East Africa; Figure 3). Because MDA (or vector control) reduces or interrupts transmission, parasite population sizes fluctuate over time, and their geographic distribution becomes increasingly patchy. Parasite movement within a large geographic area in which parasite populations were reduced to zero only in some subareas can lead to transmission being re-established where it was previously interrupted. These re-established parasite populations and the parasite populations from which they derive form what in ecology and population genetics is called a "metapopulation" (Levins, 1969; Ariey et al., 2003; Churcher et al., 2008; Tadiri et al., 2018). Where there is a pattern of reestablishment post-intervention, parasites within a metapopulation are genetically similar, and the geographic range of this metapopulation defines the transmission zone.

Parasitic nematodes tend to have higher genetic diversity than vertebrates (e.g., Blouin et al., 1992; Grant and Whitington, 1994; Blouin et al., 1995; Hawdon et al., 2001; Prichard, 2001), possibly due to their complex life cycles, transmission epidemiology (Criscione et al., 2005), and large population size. Although the high genetic diversity in parasites may require a large sample size to ensure the data are representative, estimating population genetic structure does not require a complete, assembled genome. Researchers target genomic regions identified as sufficiently informative to reveal relatedness among individuals. For example, estimating genetic diversity and population structure using mitochondrial sequence data is well established in population genetics. The advantages of mitochondrial markers include that they are maternally inherited, have a higher mutation rate than the nuclear genome (10- to 17-fold; although this may be lower in species associated with Wolbachia; Cariou et al., 2017), have a high copy number relative to the nuclear genome in each cell (useful for DNA sequencing of samples with low concentration or degraded DNA), and do not generally recombine. Mitochondrial genomes upon which markers for population genetics can be built are also widely available across filarial nematodes, including O. volvulus (Keddie et al., 1998; Unnasch and Williams, 2000), W. bancrofti (Ramesh et al., 2012; Small et al., 2016), B. malayi (Williams et al., 2000), L. loa (Higazi et al., 2004), and the canine heartworm, Dirofilaria immitis (Hu et al., 2003).

There are advantages to conducting population genetic analyses that include nuclear markers: they are inherited from both parents, and the extent of recombination between markers can be informative about past demographic events such as reductions in population size or selection on heritable traits (both events cause markers to be linked: two or more DNA variants on a chromosome become associated more often than expected by chance). Nuclear variation has been estimated using partial sequences of multi-copy ribosomal genes or spacers, common repeat sequences, microsatellites, and fingerprinting using electrophoresis of randomly amplified DNA (RAPD). While these approaches have been useful due to their low cost compared to whole-genome sequencing, ease of experimental procedures, and straightforward comparisons between species (e.g., Blouin et al., 1995), capturing only a small part of genome variation underestimates diversity and can result in estimates of genetic variation with low information content (e.g., Keddie et al., 1999; Higazi et al., 2004). Analyses that utilize whole-genome sequencing data, rather than a few selected loci, can also identify the confounding effects of selection on estimates of population size, structure, and migration (see, e.g., Weigand and Leese, 2018). Increasing the information available per individual using whole-genome sequencing of either individuals or pools of individuals, and increasing the information available per population by increasing the total number of individuals sequenced, has improved the sensitivity and power of analyses, and will improve our understanding of disease transmission.

FIGURE 4 | Alternate possibilities for distribution of genetic variation in parasites within a hypothetical country. (A) Haplotype frequencies (each unique sequence has a different color) from three locations within one country where nematodes were collected, and from a second country (haplotypes in grey), indicate all three locations are genetically distinct. The resulting haplotype network indicates sequence divergence between haplotypes. Taken together, these results would support the conclusion that there are three transmission zones in which parasites persist. (B) Haplotype frequencies in three locations where nematodes were collected indicate that they are genetically similar. The resulting haplotype network supports the conclusion that there is one transmission zone, and that recrudescence is local. (C) Haplotype frequencies in three locations indicate that they are genetically similar. The haplotype network indicates that recrudescence may have come from a migrant source, because haplotypes are shared between countries (1 and 2).

### O. volvulus Population Genetic Structure

O. volvulus population genetic structure is determined by the geographic range and movement of its hosts and vectors. The major blackfly vectors in Africa belong to a species complex (S. damnosum s.l.) divided into a number of subspecies based on cytotaxonomy (Boakye, 1993; Boakye and Meredith, 1993; Krueger and Hennings, 2006), although these subspecies are not all distinguishable using DNA sequence data (Tang et al., 1995; Tang et al., 1996). Variation in vector species distribution could influence O. volvulus population structure, because different species may influence transmission dynamics differently. Specific vector–parasite associations have been postulated (Duke, 1981; Basáñez et al., 2009). Blackflies from forest areas transmit more infective larvae of O. volvulus than flies prevalent in savannah areas (Cheke and Garms, 2013), and onchocerciasis-associated ocular damage is more common in savannah than in forest areas (Dadzie et al., 1990).

Because of this higher prevalence of onchocercal blindness in savannah compared to forest communities, early population genetics research focused on methods for differentiating "blinding savannah" and "non-blinding forest" parasite "strains." Nuclear genome variation in the O-150 repeat family differentiated O. volvulus collected from forest and savannah communities in West Africa and Nigeria (Erttmann et al., 1987; Erttmann et al., 1990; Zimmerman et al., 1992; Zimmerman et al., 1994a; Ogunrinade et al., 1999), but not Uganda or Sudan (Fischer et al., 1996; Higazi et al., 2001). DNA sequences derived from a ribosomal internal transcribed spacer (ITS-2) could not distinguish forest and savannah O. volvulus from West Africa (Morales-Hojas et al., 2007). Analyses of nuclear genome sequences quantified admixture between parasites collected from different forest and savannah sites: parasites can have a mix of ecotype-associated markers (Choi et al., 2016), indicating that parasites from forest and savannah can and do interbreed and their respective transmission zones overlap.

Reconstruction of mitochondrial genomes of a small number of geographically diverse adult O. volvulus (Choi et al., 2016) has pointed to greater genetic diversity than previously reported (Zimmerman et al., 1994b; Keddie et al., 1999). Crawford et al. (2019) analyzed whole mitochondrial genomes of ~150 parasites from West Africa and demonstrated that O. volvulus populations are extremely genetically diverse and that the historical population size was likely very large. More importantly in the context of this review, a subset of mitochondrial markers could distinguish O. volvulus sampled in Ghana from those sampled in Mali or Côte d'Ivoire (via discriminant analysis of principal components; Figure 5A). Nuclear sequence data differentiated worms collected in Ghana from those collected in Cameroon, indicating that current, ongoing interbreeding between parasites from these two countries is unlikely. Therefore, they are from different transmission zones (Doyle et al., 2017). At an even larger geographic scale, admixture analysis of genetic sequence data demonstrated that American parasite populations are derived from African populations: competent vectors existed in areas where infected people from Africa were forcibly brought to work as slaves (Zimmerman et al., 1994b; Unnasch and Williams, 2000; Choi et al., 2016).

Analysis of population structure over shorter distances within a single ecotype has been conducted for parasites from different river basins. Nuclear genetic differentiation was clear between individual parasites from the Mbam and Nkam river basins in Cameroon, suggesting no (or limited) parasite transmission between them, and that, consequently, the Mbam and Nkam river basins belong to two different transmission zones (Doyle et al., 2017). In practical terms, this means that decisions to stop treatment in one basin may not need to take into consideration ongoing transmission in the other. In contrast, both nuclear (Doyle et al., 2017) and mitochondrial genomes showed little genetic differentiation among parasites from three river basins across an east-west transect of ~250 km in Ghana: the Daka river basin on the eastern side of Lake Volta, the Pru river basin on the western side, and the Black Volta/Tombe river basin to the north (Figure 5B; Crawford et al., 2019). These results are consistent with weak population structure between the three basins, and significant past or current gene flow between them. The epidemiological interpretation of this population structure is that transmission has occurred and likely continues to occur amongst these three river basins (Crawford et al., 2019). The Black Volta/Tombe, Pru, and Daka river basins should thus be considered a single transmission zone: cessation of treatment at any one location would require interruption of transmission throughout the zone to reduce risk of recrudescence.

The analyses of mitochondrial and nuclear variation presented here suggest that O. volvulus populations (and thus transmission zones) can cover large regions and different river basins. It remains to be seen to what extent these genetic estimates of transmission zones can be correlated with wind-facilitated vector movement, features of the local climate or landscape that impact fly prevalence because of breeding site availability, differences in transmission capacity of the local fly vector species, or factors impacting movement among regions by people such as low habitability, large distances, or seasonal migration of workers.

### W. bancrofti Population Genetic Structure

Within populations of W. bancrofti, genetic diversity (i.e., the number of polymorphic sites, regardless of genetic background) and haplotype diversity (i.e., the number of unique, contiguous sequences) appear to be comparable to O. volvulus (de Souza et al., 2014; Choi et al., 2016; Small et al., 2016; Doyle et al., 2017; Crawford et al., 2019; Small et al., 2019). Diversity may be higher where transmission rates are higher (Blouin et al., 1992; Blouin et al., 1995), because population sizes are expected to be larger where there is more active transmission, and larger population sizes result in greater retention of genetic diversity over time. Studies in Papua New Guinea (PNG; Small et al., 2013), Ghana (de Souza et al., 2014), and India (Thangadurai et al., 2006; Hoti et al., 2008b; Mahakalkar et al., 2017) have found high levels of genetic variation among and within W. bancrofti populations, and suggested higher similarities between geographically proximate villages. Consistent with expectations that higher transmission rates (and consequently, larger parasite population size) may increase genetic diversity, Hoti et al. (2008b) detected the highest genetic variability in densely populated, urban areas.
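
The two summaries defined above can be sketched in a few lines. The example counts polymorphic sites and unique haplotypes in a set of aligned sequences of equal length, and also reports Nei's haplotype diversity, a common normalized alternative that is not mentioned in the text. The sequences and function names are invented for illustration.

```python
from collections import Counter


def polymorphic_sites(sequences):
    """Number of alignment columns with more than one allele."""
    return sum(1 for column in zip(*sequences) if len(set(column)) > 1)


def haplotype_summary(sequences):
    """Return (number of unique haplotypes, Nei's haplotype diversity).

    Assumes at least two aligned sequences of equal length.
    """
    counts = Counter(sequences)
    n = len(sequences)
    # Nei's estimator: probability that two sampled haplotypes differ
    diversity = (n / (n - 1)) * (1 - sum((c / n) ** 2 for c in counts.values()))
    return len(counts), diversity


# Hypothetical aligned fragments from four worms
worms = ["ACGT", "ACGA", "ACGT", "TCGA"]
print(polymorphic_sites(worms), haplotype_summary(worms))
```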

Genetic differentiation has been associated with geographic distance and other barriers to movement of people and vectors in Ghana (de Souza et al., 2014), India (Kumar et al., 2002; Thangadurai et al., 2006; Hoti et al., 2007; Patra et al., 2007), and Nepal (Adhikari et al., 2015). For example, genetic fingerprinting detected divergence among strains on either side of the Western Ghats mountain range in India, likely driven by the geographic barrier (Thangadurai et al., 2006). Genetic differentiation between parasites from different vectors was identified in the Andaman and Nicobar Islands (Dhamodharan et al., 2008), and a nocturnally sub-periodic strain from Thailand was genetically distinguishable from a nocturnally periodic Myanmar strain (Nuchprayoon et al., 2007). These genetic differences could be driven by geographic distance reducing gene flow or a consequence of divergent phenotypic adaptation to enhance transmission by different vectors. A significant challenge with these studies is that many of them used RAPD assays, yielding data which are difficult to compare between studies and have uncertain reproducibility (Penner et al., 1993; Davin-Regli et al., 1995; Pérez et al., 1998).

Genomic sequencing of W. bancrofti within populations has been limited to 20 individual worms from PNG (Small et al., 2016; Small et al., 2019) and 27 from three geographically widespread communities (Haiti, Mali, and Kenya; Small et al., 2019). Small et al. (2019) analyzed all 47 genomes and found genetic differentiation among all countries sampled, suggesting migration is low or non-existent at this spatial scale. Further analyses supported ancestral movement of parasites between Africa and Haiti, likely driven by the forced imprisonment and transport of people from Western and Central Africa to the Americas (Small et al., 2019). These efforts indicate sufficient genetic variation in W. bancrofti to detect gene flow. Advances in whole-genome sequencing of W. bancrofti, such as those made by Small et al. (2016; 2019), using either infective larvae dissected from laboratory-reared mosquitoes or using selective whole-genome amplification of single microfilaria isolated from blood samples, may increase the feasibility and accuracy of using molecular sequence data to identify and track transmission of LF. In particular, advances in laboratory methods that could enable sequencing from materials commonly collected during TASs (e.g., from rapid diagnostic tests: Rao et al., 2006; microfilariae slides: Bisht et al., 2006; and dried blood spots: Kumar et al., 2019) would in turn increase availability of sequence data in terms of the number of individual parasites per community and the number of communities, allowing accurate delineation of transmission zones and identification of immigrant parasites.

### PARASITE POPULATION GENOMICS FOR UNDERSTANDING GEOGRAPHIC AND TEMPORAL VARIABILITY IN DRUG RESPONSE

#### Conceptual Approach

One aim of population genomics is to identify genome variation associated with phenotypic variation in traits of interest. Because neither O. volvulus nor W. bancrofti can be cultured in the laboratory, this work requires correlations to be determined from worms collected from infected people and phenotyped for a trait (e.g., disease manifestation, variation in drug response) and then subsequently genotyped. For example, onchocerciasis disease severity and manifestations vary widely across Africa, with onchocerciasis-associated blindness more common in savannah regions (Remme et al., 1989; Dadzie et al., 1990), and severe onchodermatitis (Sowda, hyperreactive onchocerciasis) prevalent in Yemen and in one focus in Sudan (Mukhtar et al., 1998), but rare in other endemic areas (Al-Kubati et al., 2018). Similarly, the extent to which the level of skin microfilariae decreases after ivermectin treatment and the timing and extent of subsequent increases in skin microfilariae levels varies between individuals even in ivermectin-naive areas (Awadzi et al., 2014; Opoku et al., 2018).

The challenge is determining whether there is a correlation between a well-identified phenotype (e.g., early resumption of female worms' fertility after drug treatment) and a particular genotype (Figure 6). Identifying parasite population structure is a critical first step: it allows identification of genetic similarities and differences that arise because of degrees of relatedness between worms (interbreeding) and not because of genetic association with the parasites' phenotype. Such genetic variants should be excluded from analyses of correlations between phenotypic and genotypic variation. Equally important, and perhaps more challenging, is that the phenotype needs to be well defined. For example, when comparing differences in pathology, manifestations might depend on whether the person was infected as a child or as an adult, on variation in the longevity of microfilariae or adult worms in different geographic regions (e.g., forest and savannah worms; see Duke, 1993), or on the genetic background of the human host driving specific immunologic reactions (e.g., Sowda; see Hoerauf et al., 2002).

#### Defining Phenotypic Variation in Drug Response

The potential for the evolution of resistance to either macrocyclic lactones (such as ivermectin) or benzimidazoles (such as albendazole) in filarial nematodes infecting humans has been a concern from the start of MDA programs (e.g., WHO Expert Committee on Onchocerciasis Control 1993 and WHO, 1995), because of demonstrated evolution of resistance in parasites of domestic animals (Kaplan, 2004; Wolstenholme et al., 2004; Wolstenholme et al., 2015). However, identifying drug resistance in human-infecting filarial parasites that cannot be cultured in the laboratory is challenging, and quantifying genetic association with a phenotype is completely dependent on identifying the correct phenotype.

In O. volvulus, the phenomenon of "suboptimal response" to ivermectin (SOR) was identified during research into reasons for higher-than-expected prevalence of infection and skin microfilariae levels in areas under long-term MDAi in Ghana. SOR is defined as the resumption of microfilariae production by adult female worms following ivermectin treatment earlier than considered typical, resulting in detectable levels of skin microfilariae 80–90 days post-treatment and the presence of viable embryos and microfilariae in the uteri of female worms obtained through nodulectomy (Awadzi et al., 2004a; Awadzi et al., 2004b). Since reasons such as low ivermectin blood concentrations were excluded, Awadzi et al. attributed SOR to low susceptibility to the so-called "embryostatic" effect of ivermectin, resulting in developmental stages of microfilariae in the uteri 90 days after treatment (see also Grant, 1994). The progeny of these worms would be available for transmission earlier and for longer between rounds of MDAi than progeny of susceptible worms releasing microfilariae later (Figure 6). If low susceptibility to the "embryostatic" effect is heritable, the prevalence of SOR parasites may increase over time. While increasing prevalence of SOR worms is not expected to impact effectiveness of MDAi for control of onchocerciasis as a public health problem (because SOR does not change post-treatment disappearance of microfilariae from the skin, the "microfilaricidal effect"), it would negatively impact elimination, which depends on progressively lower levels of skin microfilariae available for transmission. Since the initial description of SOR, further investigations of SOR in Ghanaian and Cameroonian areas with different ivermectin treatment histories (e.g., Osei-Atweneboana et al., 2007; Osei-Atweneboana et al., 2011; Pion et al., 2013; Nana-Djeunga et al., 2014) have shown that SOR parasites are detectable even where MDAi occurs biannually (Frempong et al., 2016). Modeling showed large inter-individual variability in skin microfilariae repopulation rates among individuals from Ghana, Liberia, Sierra Leone, and Guatemala treated one to five times with ivermectin (Churcher et al., 2009). None of the individuals from Ghanaian communities who received 10–19 rounds of ivermectin treatment (Osei-Atweneboana et al., 2007) had a repopulation rate higher than the highest observed in the control data set (skin microfilariae repopulation rate after the first ivermectin treatment among individuals from Ghana, Liberia, Sierra Leone, and Guatemala), but analysis by village showed a significantly higher repopulation rate for some villages and for one previously untreated village in the same area (Churcher et al., 2009). This supports the conclusion that SOR parasites are present in ivermectin-naive communities and may increase via selection by ivermectin treatment. Furthermore, variability of response to ivermectin measured by skin microfilariae levels over time after a single dose of ivermectin was found among individuals from ivermectin-naive areas in Ghana, Liberia, and the Democratic Republic of Congo: the skin microfilariae kinetics in some of these individuals met the SOR criteria used in other studies (Awadzi et al., 2004a; Awadzi et al., 2004b; Opoku et al., 2018). It is unknown whether these ivermectin-naive areas belong to the same transmission zones as other nearby areas with ongoing MDAi. Together, these data support the conclusion that O. volvulus populations consist of worms with variable susceptibility to the embryostatic effect of ivermectin.
Based on these data, the phenotype for genomic analysis is the reproductive status of individual female worms collected from people 80–90 days post-ivermectin treatment: female worms that respond well to ivermectin should not have post-fertilization embryonic stages, while those considered SOR have early resumption of fecundity, and thus already have microfilariae in uteri (Osei-Atweneboana et al., 2011; Nana-Djeunga et al., 2014; Doyle et al., 2017).

In W. bancrofti, the reasons underlying persistence of transmission are challenging to determine. So far, persistence has not been associated with heritable variability in parasite drug response. Density-dependent effects on vector response to infection and vector competence play a role in continuous, low-level transmission, depending on the vector species: iL3 development is proportionally higher at lower microfilariae densities in some vector species, while in other vectors, fewer iL3 develop when many microfilariae are ingested (Pichon, 2002; reviewed in Graves et al., 2016). These differences change transmission dynamics during MDA in ways highly dependent on specific vector species that may favor greater vector efficiency when microfilaremia is low.

#### Defining Correlations Between Phenotypic and Genotypic Variation for Drug Response

Several point mutations in the target of albendazole, β-tubulin, lead to drug resistance in nematodes (Kwa et al., 1994; Grant and Mascord, 1996; Silvestre and Cabaret, 2002; Silvestre and Humbert, 2002), and at least one of these has been identified in W. bancrofti (Schwab et al., 2005). Increases in alleles associated with benzimidazole resistance have been detected after MDA (Schwab et al., 2005), and modeling suggests that further MDA treatments would lead to selection via an increase in the frequency of these alleles (Schwab et al., 2006; Schwab et al., 2007). However, analysis of natural variation in response to benzimidazole anthelmintics in the free-living nematode Caenorhabditis elegans has shown that many other loci contribute to and modify variation at the β-tubulin locus in this species (Hahnel et al., 2018), and it is clear from analysis of benzimidazole resistance in veterinary helminths that β-tubulin-mediated resistance is also subject to modification by as yet unidentified loci (Doyle and Cotton, 2019).

With respect to DEC, this drug is not 100% effective against LF (Eberhard et al., 1988; Eberhard et al., 1991), suggesting that variation in DEC susceptibility exists and could lead to the development and spread of drug resistance if poor susceptibility is heritable and if mating and recombination allow for assortment among alleles (Grant, 1994; McCarthy, 2005; Cobo, 2016).

The view that has emerged from single-candidate gene association studies in helminths is that response to macrocyclic lactones such as ivermectin (a glutamate-gated chloride channel agonist) is a complex and highly variable phenotype, affected by multiple genes across the genome (e.g., Dent et al., 2000; Ardelli et al., 2009; Yan et al., 2012; Ardelli and Prichard, 2013; Menez et al., 2016), with poor repeatability between studies, strains, and species. A single-candidate gene approach has been applied to ivermectin SOR in O. volvulus by sequencing genes that might contribute to ivermectin resistance in other nematodes, and comparing differences between communities that have never received ivermectin and those that have, or between microfilarial pools collected before and after treatment (Ardelli and Prichard, 2004; Ardelli et al., 2005; Eng and Prichard, 2005; Eng et al., 2006; Ardelli and Prichard, 2007; Osei-Atweneboana et al., 2012). If a gene contributes to ivermectin response, the expectation is that alleles that decrease sensitivity to ivermectin would be more common in worms from areas that have had many rounds of MDAi than in worms from ivermectin-naive areas, or that they would be more common in microfilariae collected shortly after ivermectin treatment than before treatment, or would be more common in adult females that have resumed reproduction within 80–90 days of treatment (the SOR phenotype). Genes identified using this approach include β-tubulin (Eng and Prichard, 2005; Bourguinat et al., 2006; Eng et al., 2006; Bourguinat et al., 2007; Osei-Atweneboana et al., 2012) and members of the ATP-binding cassette transporter family (Ardelli and Prichard, 2004; Ardelli et al., 2005; Eng and Prichard, 2005; Ardelli et al., 2006a; Ardelli et al., 2006b; Ardelli and Prichard, 2007).

However, this approach faces one of the biggest challenges in detecting selection: the frequency of an allele can increase or decrease due to random chance rather than selection. In particular, if a population decreases in size, as might be expected as transmission decreases during successive rounds of MDA, the chance that changes in allele frequencies are stochastic rather than deterministic increases: the genetic diversity overall decreases (e.g., Schistosoma mansoni: Coeli et al., 2013), but the fate of individual alleles is unpredictable, and some may increase due to genetic drift caused by population decline. Another challenge is that, prior to treatment, parasite populations can be quite large (i.e., they have a high effective population size; Blouin et al., 1995; Blouin et al., 1999; Zhou et al., 2013; Rodelsperger et al., 2014; Jan et al., 2016), which means they have ample genetic variation, so genetic variation contributing to a poor response phenotype may be present at different frequencies and at different loci in different parasite populations (e.g., Haemonchus contortus, Gilleard and Redman, 2016). Furthermore, once MDA begins, mating and recombination could create new combinations of alleles from different genes, which may also decrease response to ivermectin. In other words, good responders may be carriers of alleles that, in the right genomic background, could contribute to poor response in drug naive populations, and vice versa. Selection then acts on "standing" genetic variation that was present in the population before the selection pressure began. 
This mechanism for selection has been called a "soft sweep" (e.g., Hermisson and Pennings, 2005; Przeworski et al., 2005; Pritchard and Di Rienzo, 2010; Ferrer-Admetlla et al., 2014; see also Doyle et al., 2017) and is notoriously difficult to detect because (1) each variant alone could have a weak effect on phenotype, and thus be difficult to distinguish statistically from background noise (random variation), and (2) many techniques for detecting selection rely on correlated, linked variants being physically near to the causative mutation, but in a soft sweep, this linkage is often weak (Pennings and Hermisson, 2006). The technical and conceptual weaknesses of candidate gene approaches, specifically in the context of anthelmintic response, were reviewed recently (Doyle and Cotton, 2019).
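
The point that allele frequencies can shift by chance alone, and more so as populations shrink, can be illustrated with a minimal Wright-Fisher simulation. The sketch below includes no selection at all. Population sizes, starting frequency, and function names are arbitrary assumptions chosen for illustration, not estimates for any filarial parasite.

```python
import random
import statistics


def wright_fisher(start_freq, n_diploid, generations, rng):
    """Follow one neutral allele through binomial sampling each generation."""
    freq = start_freq
    copies = 2 * n_diploid
    for _ in range(generations):
        # Each gene copy in the next generation is drawn from the current pool
        count = sum(1 for _ in range(copies) if rng.random() < freq)
        freq = count / copies
    return freq


def drift_spread(start_freq, n_diploid, generations, replicates=100, seed=1):
    """Standard deviation of final allele frequency across replicate populations."""
    rng = random.Random(seed)
    finals = [wright_fisher(start_freq, n_diploid, generations, rng)
              for _ in range(replicates)]
    return statistics.pstdev(finals)


# Same neutral allele, same number of generations, different population sizes
small_population = drift_spread(0.2, n_diploid=25, generations=10)
large_population = drift_spread(0.2, n_diploid=1000, generations=10)
print(small_population, large_population)
```

In the small population the final frequencies scatter widely around the starting value even though no allele is favored, which is why a rise in frequency after MDA cannot by itself be read as evidence of selection.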

Genome-wide approaches are, therefore, required. Doyle et al. (2017) examined female O. volvulus macrofilariae from Ghana and Cameroon that were "good responders" and "suboptimal responders" to the embryostatic effect of ivermectin, determined by the absence or presence of stretched microfilariae in the uteri. Females were pooled by response phenotype and sequenced, and genetic variants correlated with phenotype were identified. Genetic variants clustered in 31 quantitative trait loci (QTLs) with genes for molecular pathways involved in neurotransmission, development, and stress response. None of these QTLs contained genes previously proposed by single-candidate gene association studies. In addition, some loci associated with SOR to ivermectin in worms from Ghana were different from loci associated with SOR in worms from Cameroon. They concluded that ivermectin response is a polygenic, quantitative trait, subject to soft sweeps of pre-existing QTLs, rather than "hard selection" on rare, resistance-conferring mutations. Since populations possess different standing variation, selection may act on different loci in different populations (Doyle et al., 2017). Genome-wide studies of veterinary nematodes generally support the "soft sweep QTL" model for response to macrocyclic lactones (Bourguinat et al., 2015; Redman et al., 2015; Choi et al., 2017; Doyle et al., 2019), and likewise have not identified any of the candidate genes described above.
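
As a rough illustration of how pooled sequencing data can be contrasted between phenotype groups, the sketch below compares the alternate-allele frequency at a single variant between a "good responder" pool and a "suboptimal responder" pool, using read counts and a 2x2 chi-square statistic. This is not the analysis pipeline of Doyle et al. (2017), which is not described here. The function name and read counts are hypothetical, and because reads from a pool are not independent samples of worms, the statistic is best treated as a ranking score rather than a formal test.

```python
def pool_contrast(good_ref, good_alt, sor_ref, sor_alt):
    """Return (alt-allele frequency difference, 2x2 chi-square statistic).

    Inputs are hypothetical reference/alternate read counts at one variant
    in the good-responder pool and the suboptimal-responder (SOR) pool.
    """
    good_total = good_ref + good_alt
    sor_total = sor_ref + sor_alt
    if good_total == 0 or sor_total == 0:
        raise ValueError("both pools need non-zero read depth")
    freq_diff = sor_alt / sor_total - good_alt / good_total
    n = good_total + sor_total
    ref_total = good_ref + sor_ref
    alt_total = good_alt + sor_alt
    denom = good_total * sor_total * ref_total * alt_total
    # Standard shortcut formula for a 2x2 contingency table
    cross = good_ref * sor_alt - good_alt * sor_ref
    chi2 = n * cross ** 2 / denom if denom else 0.0
    return freq_diff, chi2


# Toy counts: the alternate allele is enriched in the SOR pool
print(pool_contrast(90, 10, 50, 50))
```

Applied across the genome, a score of this kind would be expected to produce clusters of high values around associated regions, which is the sense in which variants "clustered" into QTLs above.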

This "soft sweep QTL" model that has emerged from genome-wide association studies of ivermectin response leads to the following predictions for control and elimination of human filariases:


### DEVELOPMENT OF POPULATION GENOMICS–BASED TOOLS TO INFORM ELIMINATION PROGRAM DECISIONS

A wide range of DNA-based molecular tools have been developed for detecting the presence of parasites in vectors and the human host using skin, blood, plasma, and even urine (e.g., Zimmerman et al., 1993; Rao et al., 2006; Rodriguez-Pérez et al., 2006; Hoti et al., 2008a; Gass et al., 2012; Alhassan et al., 2015; Owusu et al., 2015; Lau et al., 2016). The addition of information on transmission gained from genomic analyses of parasites/vectors to data on prevalence and intensity could significantly improve the basis for elimination program decisions on whether to stop or continue interventions, determine causes for recrudescence post-intervention and thus appropriate subsequent interventions, and determine optimal post-intervention surveillance frequencies commensurate with the risk of resurgence based on a cost–risk evaluation.

Methods to obtain and analyze genome sequences require training and equipment not available to many laboratories in endemic countries. However, once the required basic genome research has been done, fast and relatively inexpensive methods can be developed suitable for use in at least one laboratory in an endemic country, possibly supported by one or more "reference laboratories."

#### Tools for Defining Transmission Zones and Identification of Migrants

The "genetic signatures" of an interbreeding parasite population constitute markers that allow delineation of transmission zones as "the natural ecological and epidemiological unit for interventions" for onchocerciasis or the implementation and evaluation units for LF. In conjunction with transmission models that allow modeling the impact of immigrant parasites on long-term infection prevalence, the size of these zones can inform cost–risk analysis for stop-treatment decisions as well as when and where to conduct entomological evaluations or TAS, dependent on the risk of resurgence that programs are willing to accept.

The advantage of DNA-based transmission zone markers is that they can be designed to be highly specific. The first step is identifying variant sites informative for population structure, i.e., sites that are polymorphic and more common or fixed (and thus have different frequency-based likelihoods) in genomes of parasites from one area compared to another. For example, Crawford et al. (2019) identified mitochondrial polymorphic sites that could differentiate O. volvulus collected in Ghana from countries farther west with high statistical likelihood. Once such variant sites have been identified for specific geographic areas, parasite genomes from other areas need to be analyzed to determine whether the same variants are informative, or whether additional sites are necessary to infer the extent of interbreeding. Once analysis is completed for parasites from across their distribution, a minimum set of polymorphic sites can be defined to predict the transmission zone of origin for any given worm.
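
The idea of frequency-based likelihoods can be sketched as a simple assignment rule: given the alternate-allele frequency at each marker site in each candidate zone, compute the log-likelihood of a worm's haploid (e.g., mitochondrial) genotype under each zone and pick the highest. The zone names, frequencies, and the assumption of independent sites are illustrative only and do not reproduce the method of Crawford et al. (2019).

```python
import math


def log_likelihood(genotype, alt_freqs, floor=1e-3):
    """Log-likelihood of a haploid genotype (0 = ref, 1 = alt) given zone frequencies.

    Sites are treated as independent, which is a simplifying assumption.
    """
    total = 0.0
    for allele, freq in zip(genotype, alt_freqs):
        # Keep frequencies away from 0 and 1 so unseen alleles are not impossible
        freq = min(max(freq, floor), 1 - floor)
        total += math.log(freq if allele == 1 else 1 - freq)
    return total


def assign_zone(genotype, zone_freqs):
    """Return the best-supported zone and the per-zone log-likelihoods."""
    scores = {zone: log_likelihood(genotype, freqs)
              for zone, freqs in zone_freqs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical alternate-allele frequencies at three marker sites
zones = {"zone_A": [0.9, 0.1, 0.8], "zone_B": [0.1, 0.9, 0.2]}
print(assign_zone([1, 0, 1], zones))
```

The difference between the top two log-likelihoods gives a sense of how confidently a worm can be assigned, and adding informative sites widens that gap, which is one way to think about choosing a minimum marker set.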

This process may sound daunting, but methodology developed over the past years is efficient and amenable to further optimization. Skin snips or infected/infective Simulium or O. volvulus adults from many areas in Africa are available in the ESPEN laboratory in Ouagadougou, national laboratories, and research centers. LF parasites, primarily on slides of blood films and in dried blood spots, are available from endemic areas in national laboratories and research centers.

Developing inexpensive methodologies for routine use in each endemic country is the next challenge. High resolution melt (HRM) analysis has been proposed as a species identification tool for Onchocerca spp. collected from blackflies (Doyle et al., 2016) and could potentially be extended as a genotyping tool. The advantage to using vectors is that, in the case of Simulium spp., vector collection is already an integral part of the process for stop-treatment decisions, invasive sampling of people would not be required, and (if sampling strategies are well designed) vectors would include a representative sample of parasites from many different human hosts.

#### Tools for Monitoring Drug Susceptibility

Ideally, control/elimination programs would have reference data on pre-MDA/early-MDA variability of parasite drug response and an easy-to-use, cost-effective, and non-invasive tool for monitoring changes in response phenotype frequency. Detecting changes while the frequency of poor responder alleles is low would allow alternative control/elimination strategies to be put in place (Grant, 2000) and enable evidence-based decisions on monitoring strategies. Some moves toward these ends have been initiated (e.g., Ardelli and Prichard, 2004; Eng and Prichard, 2005; Bourguinat et al., 2006; Bourguinat et al., 2008; Hoti et al., 2009) but have yet to result in practical application. Research on methods for monitoring potential emergence of resistance to ivermectin in O. volvulus was initiated by the UNDP/World Bank/WHO Special Programme for Research and Training in Tropical Diseases (TDR) and OCP in the early 1990s (WHO Expert Committee on Onchocerciasis Control 1993 and WHO, 1995; UNDP/World Bank/WHO Special Programme for Research & Training in Tropical Diseases, 1997). TDR also initiated characterization of parasite drug response variability before or early during MDA (Walker et al., 2016; Halder et al., 2017), research which continues with Wellcome Trust funding leveraged with this initial TDR investment. Alternative strategies for onchocerciasis elimination include increased MDAi frequency, complementary vector control, test-and-treat strategies, and new drugs (African Programme for Onchocerciasis Control, 2015; Boussinesq et al., 2018; Verver et al., 2018; Horstick and Runge-Ranzinger, 2018).

Currently, screening communities for O. volvulus with low ivermectin susceptibility requires either collection of skin snips before and after ivermectin treatment to determine skin microfilariae levels, or post-treatment nodulectomies to determine the reproductive status of individual female worms. Genetic analysis of microfilariae collected via skin snips at a single post-treatment time point, or larvae in vectors, could significantly improve monitoring capacity. While the molecular mechanisms that contribute to variation in drug response are of significant interest, identifying anonymous genetic variants associated with drug response phenotype is sufficient for development of tools for monitoring, even while research into the mechanisms is ongoing.

Genetic variation associated with ivermectin response in O. volvulus has been identified in parasites from Ghana and Cameroon (Doyle et al., 2017). These results have not yet been validated with single-worm genomic analysis, and whether the response-associated genetic variants identified are shared across populations from other areas in Africa is untested. Compared to development of transmission zone markers, development of ivermectin response markers is much more challenging, since large numbers of parasites with known response phenotype from different geographic areas are needed to identify, validate, and test the predictive value of genetic variants associated with response phenotype.

Identification of a panel of relatively few variants (e.g., < 50) would pave the way for developing a simple PCR-based or loop-mediated isothermal amplification (LAMP) surveillance tool for use with parasites in vectors or skin snips by endemic country laboratories. This core set of variants would define the QTLs that are both necessary and sufficient for SOR to develop in O. volvulus and may, therefore, shed light on the as yet unexplained mechanism by which ivermectin exerts its embryostatic effect.

MDA for W. bancrofti uses albendazole, alone or in combination with DEC and/or ivermectin. A PCR-based assay for one single nucleotide polymorphism that leads to resistance to benzimidazoles has been developed, so that monitoring for allele changes in treated communities can occur (Hoti et al., 2009). However, while variation in response to drug therapies for LF has been reported (e.g., Eberhard et al., 1988; Eberhard et al., 1991), to date a clear definition of an LF SOR has not been reported (Irvine et al., 2017). A clear, and preferably quantitative, definition of poor response is an essential prerequisite for development of response genetic markers in LF.

#### Models Incorporating Population Genetics to Estimate Risk of Recrudescence From Migration or Poor Drug Response

There is a long history of the use of onchocerciasis and LF transmission models such as EPIONCHO and ONCHOSIM (reviewed in Stolk et al., 2015; Basáñez et al., 2016) and EPIFIL, LYMFASIM, and TRANSFIL (reviewed in Smith et al., 2017) to inform control program decisions. These models differ in whether they are deterministic or stochastic and whether they are population- or individual-based, and can be parameterized to match the characteristics of a particular community, including endemicity, variation in biting rates, or treatment coverage, such that effects of different treatment strategies on transmission can be explored. However, these models currently cannot represent movement of parasites from one geographic area to another (via people or vectors), spatial heterogeneity of endemicity or elimination program implementation, or transmission of parasites with variable drug susceptibility. Thus, although these models have been useful in estimating declines of parasite infection prevalence and intensity over a control campaign, they do not realistically model variation in parasite drug susceptibility, human/vector movement among areas, or long-term population processes that could impede reaching elimination thresholds or contribute to medium- to long-term recrudescence after MDA discontinuation.

To address these limitations, TDR funded extension of transmission models for preventive chemotherapy-controlled diseases. Coffeng et al. (2017) incorporated evolution of changes in drug susceptibility in O. volvulus (and other parasites) into a stochastic, individual-based model, and McCulloch et al. (2017) adapted an infection-intensity model, EPIONCHO, into a patch framework for onchocerciasis, incorporating differences in endemicity and treatment between two adjacent geographic areas, or "patches," and people and vector migration between patches. When simulating migration of seasonal workers from a patch where transmission is ongoing to a patch where MDAi had interrupted transmission, modest levels of migration (10% of the adult population move for between 1 and 12 months) were sufficient to re-establish transmission with a lag time of 5–10 years, and pre-MDA prevalence was reached within 30–50 years in the absence of renewed MDA.
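
The qualitative behavior of a patch framework can be conveyed with a deliberately simple toy model. The sketch below is not EPIONCHO or the model of McCulloch et al. (2017): it tracks only infection prevalence in a cleared patch using an SIS-style equation, holds prevalence in the source patch constant, and uses arbitrary parameter values. All names and numbers are assumptions for illustration, and the time scales it produces should not be compared with the published results.

```python
def simulate_cleared_patch(migration, beta=0.5, recovery=0.1, source_prev=0.6,
                           years=50, steps_per_year=100):
    """Yearly prevalence in a patch that starts infection-free.

    migration: fraction of exposure in the cleared patch that originates
               from the source patch (hypothetical coupling parameter)
    beta, recovery, source_prev: arbitrary illustrative values
    """
    dt = 1.0 / steps_per_year
    prevalence = 0.0
    yearly = [prevalence]
    for _ in range(years):
        for _ in range(steps_per_year):
            # Exposure mixes local infection with infection from the source patch
            exposure = (1 - migration) * prevalence + migration * source_prev
            change = beta * (1 - prevalence) * exposure - recovery * prevalence
            prevalence += dt * change  # forward Euler step
        yearly.append(prevalence)
    return yearly


isolated = simulate_cleared_patch(migration=0.0)
connected = simulate_cleared_patch(migration=0.1)
print(isolated[-1], connected[-1])
```

Without migration the cleared patch stays at zero, while any coupling to the source patch reintroduces infection that then grows locally. This is the basic reason stop-treatment decisions depend on where the transmission zone boundaries lie.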

Data from Burkina Faso showed a significant delay between cessation of interventions and recrudescence (Koala et al., 2017; Koala et al., 2019), suggesting that the predicted time frames for recrudescence modeled by McCulloch et al. (2017) are realistic. Such delays are not unexpected considering that parasite invasion may initiate transmission at a low level, and transmission takes time to become detectable. A model incorporating population genetic structure data, allowing simulation of vector and human movement–mediated parasite transmission between different geographic areas, could provide estimates of the risk and time frame of resurgence in one area where treatment cessation is planned while transmission is above stop-treatment thresholds elsewhere. National programs could obtain a measure of risk for stopping interventions given migration and an estimated time frame for resurgence, decide whether to intensify interventions in areas that are potential source areas of invasion, and plan cost-effective, post-MDA surveillance strategies.

A model parameterized for a specific region can be informative for elimination programs by exploring how variation in treatment coverage, changes in human demographic and travel patterns, vector control and migration patterns, or other factors will affect progress toward and post-intervention sustainability of elimination within different areas in a region. Such a spatially explicit, agent-based modeling framework was developed to explore LF transmission in the Samoan Islands (GEOFIL; Xu et al., 2018a). This type of model incorporates detailed information about individuals within the community and their connectivity, and simulates changes in characteristics such as age, physical location, and infection status. GEOFIL predicts variation in LF infection prevalence across American Samoa over space and time. The model can contribute toward understanding transmission dynamics in areas of high infection prevalence (hot spots), which can help determine effective surveillance strategies for identifying hot spots and how best to manage them. Incorporation of information from genomic analyses could enhance the utility of such models for elimination decisions. An agent-based framework does require significant data on individual people, information not available for many (or most) communities, and it will be valuable to see how lessons from a well-characterized system can be applied more broadly to areas where LF has reemerged post-MDA.

#### CONCLUSIONS AND RECOMMENDATIONS

Population genomic analyses of O. volvulus and W. bancrofti show high levels of genetic variation consistent with historically large population sizes (> 10<sup>5</sup>). Genetic variation is spatially structured; parasite populations are genetically distinct, indicating that they are not interbreeding. For O. volvulus, genomic data from West Africa indicate that the geographic areas across which parasites interbreed (transmission zones) can cover hundreds of kilometers and extend beyond the boundaries of a "river basin, or a major section of a river basin" (African Programme for Onchocerciasis Control/World Health Organization, 2010). This large-scale structure is reflected in the pattern of onchocerciasis endemicity in Africa, with a mosaic of hypo-, meso-, and hyperendemic areas (Figure 3). For W. bancrofti, transmission zones are larger than expected based on the focal nature of many mosquito vector populations, consistent with the increasingly recognized role that movement of infected people plays. In epidemiological terms, genetic data indicate that significant transmission must have occurred (and is likely still occurring) over tens to hundreds of kilometers.

We propose that long-range transmission via infected people and/or infective vectors has epidemiological significance for elimination programs that it does not have for control programs. The latter plan to continue treatment indefinitely, and consequently the effect of occasional or continual long-range transmission is suppressed by continuing MDA. In contrast, elimination programs stop treatment once relevant criteria have been met within a defined area, assuming transmission has been interrupted permanently. Without MDA, occasional or low but continual parasite transmission from outside an intervention area may result in recrudescence. Population genetic analyses can predict whether the potential for gene flow is epidemiologically significant. Consequently, elimination programs need to define the boundaries of the entire extended geographic area over which parasite transmission occurs and continue treatment until transmission is interrupted over the entire area. From an operational perspective, population genomic statistics do not differentiate between long-range transmission due to movement of vectors and that due to movement of people, and are thus informative independent of the cause of parasite migration.
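
As a rough illustration of how a measure of genetic differentiation relates to gene flow, the sketch below applies the classical approximation from Wright's island model, F_ST ≈ 1 / (4·N_e·m + 1), to convert an F_ST value into an effective number of migrants per generation. This is a textbook relationship and not the analysis used in the studies reviewed here. It assumes equilibrium, equally sized populations, and symmetric migration, conditions that parasite populations under MDA are unlikely to meet. The function name is hypothetical.

```python
def migrants_per_generation(fst):
    """Effective number of migrants per generation (N_e * m) implied by F_ST
    under Wright's island model: F_ST ~ 1 / (4 * N_e * m + 1).

    Illustrative only: the island model assumes equilibrium, equal
    population sizes, and symmetric migration.
    """
    if not 0.0 < fst < 1.0:
        raise ValueError("F_ST must lie strictly between 0 and 1")
    return (1.0 / fst - 1.0) / 4.0


if __name__ == "__main__":
    # Hypothetical F_ST values, not estimates from any parasite data set
    for fst in (0.01, 0.05, 0.2):
        print(f"F_ST={fst}: N_e*m ~ {migrants_per_generation(fst):.2f}")
```

Lower F_ST values imply more migrants per generation under these assumptions. Whether a given level of gene flow matters epidemiologically depends on additional entomological and epidemiological information.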

What research and development are required to allow elimination programs to take advantage of population genomics? One clear need is gathering more (and better quality) sequence data across a greater geographical range, based on improved methods which allow sequencing from parasite material already available or routinely collected (blackflies or skin snips for onchocerciasis, and mosquitoes or blood samples for LF). Testing and development of analytical methods for quantifying parasite migration based on genetic, entomological, and epidemiological data, and their optimal application to transmission zone delineation, are critical. Tools not requiring specialized laboratory equipment must be developed for use by national programs to include in MDA and stop-MDA evaluations. These tools would allow investigation of the source of infections/transmission identified during post-treatment surveillance, facilitate monitoring for changes in the frequency of parasites with SOR to MDA, and promote planning cost-effective solutions. Comparable population genomic data for vectors would be a useful adjunct to differentiate between vector and human movement as sources of long-range transmission to inform appropriate interventions. Current transmission models need to be enhanced to allow modeling of areas with different endemicity, treatment history, progress toward elimination, heritable drug resistance, and parasite gene flow, to better estimate the risk and time frame of recrudescence when treatment is to be stopped in one area while transmission is continuing in another. Finally, the research and development outputs can improve procedures and criteria for cessation of treatment, including estimating risks of recrudescence to inform cost–risk assessment-based stop-treatment decisions and appropriate post-treatment monitoring strategies.

#### AUTHOR CONTRIBUTIONS

SH, KC, AK, PG, and WG wrote the first draft. SH, AK, KC, PG, MB, CL, DB, and WG critically reviewed, edited, and approved the final version of the manuscript. The authors alone are responsible for the views expressed, which do not necessarily represent the views, decisions, or policies of the institutions with which the authors are affiliated.

#### FUNDING

TDR, the Unicef/UNDP/World Bank/WHO Special Programme for Research and Training in Tropical Diseases, provided the funds for open access of this review and support for SMH (B80149 and B80153). KEC was supported by an Australian Government Research Training Program (RTP) Scholarship.

#### REFERENCES

despite more than 15 years of mass treatment. Parasit. Vectors 9 (1), 581. doi: 10.1186/s13071-016-1868-8


Conflict of Interest: AK works for TDR, the Unicef/UNDP/World Bank/WHO Special Programme for Research and Training in Tropical Diseases.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The handling editor declared a past co-authorship with one of the authors WG.

Copyright © 2020 World Health Organization; Licensee Frontiers Media SA. This is an Open Access article published under the CC BY 3.0 IGO license which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. In any use of this article, there should be no suggestion that WHO endorses any specific organization, products, or services. The use of the WHO logo is not permitted. This notice should be preserved along with the article's original URL.
