Full Genome Nobecovirus Sequences From Malagasy Fruit Bats Define a Unique Evolutionary History for This Coronavirus Clade

Bats are natural reservoirs for both Alpha- and Betacoronaviruses and the hypothesized original hosts of five of seven known zoonotic coronaviruses. To date, the vast majority of bat coronavirus research has been concentrated in Asia, though coronaviruses are globally distributed; indeed, SARS-CoV and SARS-CoV-2-related Betacoronaviruses in the subgenus Sarbecovirus have been identified circulating in Rhinolophid bats in both Africa and Europe, despite the relative dearth of surveillance in these regions. As part of a long-term study examining the dynamics of potentially zoonotic viruses in three species of endemic Madagascar fruit bat (Pteropus rufus, Eidolon dupreanum, Rousettus madagascariensis), we carried out metagenomic Next Generation Sequencing (mNGS) on urine, throat, and fecal samples obtained from wild-caught individuals. We report detection of RNA derived from Betacoronavirus subgenus Nobecovirus in fecal samples from all three species and describe full genome sequences of novel Nobecoviruses in P. rufus and R. madagascariensis. Phylogenetic analysis indicates the existence of five distinct Nobecovirus clades, one of which is defined by the highly divergent ancestral sequence reported here from P. rufus bats. Madagascar Nobecoviruses derived from P. rufus and R. madagascariensis demonstrate, respectively, Asian and African phylogeographic origins, mirroring those of their fruit bat hosts. Bootscan recombination analysis indicates significant selection has taken place in the spike, nucleocapsid, and NS7 accessory protein regions of the genome for viruses derived from both bat hosts. Madagascar offers a unique phylogeographic nexus of bats and viruses with both Asian and African phylogeographic origins, providing opportunities for unprecedented mixing of viral groups and, potentially, recombination. As fruit bats are handled and consumed widely across Madagascar for subsistence, understanding the landscape of potentially zoonotic coronavirus circulation is essential for mitigation of future zoonotic threats.

Bats are natural reservoirs for both Alpha-and Betacoronaviruses and the hypothesized original hosts of five of seven known zoonotic coronaviruses. To date, the vast majority of bat coronavirus research has been concentrated in Asia, though coronaviruses are globally distributed; indeed, SARS-CoV and SARS-CoV-2-related Betacoronaviruses in the subgenus Sarbecovirus have been identified circulating in Rhinolophid bats in both Africa and Europe, despite the relative dearth of surveillance in these regions. As part of a long-term study examining the dynamics of potentially zoonotic viruses in three species of endemic Madagascar fruit bat (Pteropus rufus, Eidolon dupreanum, Rousettus madagascariensis), we carried out metagenomic Next Generation Sequencing (mNGS) on urine, throat, and fecal samples obtained from wild-caught individuals. We report detection of RNA derived from Betacoronavirus subgenus Nobecovirus in fecal samples from all three species and describe full genome sequences of novel Nobecoviruses in P. rufus and R. madagascariensis. Phylogenetic analysis indicates the existence of five distinct Nobecovirus clades, one of which is defined by the highly divergent ancestral sequence reported here from P. rufus bats. Madagascar Nobecoviruses derived from P. rufus and R. madagascariensis demonstrate, respectively, Asian and African phylogeographic origins, mirroring those of their fruit bat hosts. Bootscan recombination analysis indicates significant selection has taken place in the spike, nucleocapsid, and NS7 accessory protein regions of the genome for viruses derived from both bat hosts. Madagascar offers a unique phylogeographic nexus of bats and viruses with both Asian and African phylogeographic origins, providing opportunities for unprecedented mixing of viral groups and, potentially, recombination. As fruit bats are handled and consumed widely across Madagascar for subsistence, understanding the landscape of potentially zoonotic coronavirus circulation is essential for mitigation of future zoonotic threats.
The family Coronaviridae is considered one of the most likely viral taxa to switch host species (34,35) in part because many CoVs utilize well-conserved cell surface receptors presented on a wide variety of mammalian host cells. The zoonotic Sarbecoviruses, SARS-CoV and SARS-CoV-2, for example, use the human cell surface receptor Angiotensinconverting enzyme 2 (ACE2) to gain entry into human cells (36,37), while many Merbecoviruses interact with the wellconserved vertebrate host cell receptor dipeptidyl peptidase 4 (DPP4) (38). MERS-CoV has an evolutionary history with bat origins, and closely-related viruses have been previously identified in bats from the families Vespertillionidae, Nycteridae, Emballonuridae and Molossidae throughout Africa, Asia, and Europe. More recently, human cases of MERS-CoV have arisen in the human and camel populations through recent host switching events (39)(40)(41)(42). As only a fraction of the Alpha-and Betacoronavirus diversity projected to circulate in wild bat hosts has been already described (41), it is possible that many CoVs capable of zoonotic emergence remain uncharacterized. Because CoVs are known to recombine with other CoVs, or more rarely, with other viral groups (43)(44)(45)(46)(47)(48), there is additional concern that naturally-circulating CoVs presently unable to infect humans may acquire this ability in the future. Several factors, which have been reviewed at length elsewhere (4,34,49), contribute to the propensity for CoV recombination, including the large CoV genome size supported by a unique proofreading mechanism in the RNAdependent RNA polymerase (RdRp) (50)(51)(52)(53), as well as a 'copy choice' template switching mechanism of RNA replication whereby RdRp physically detaches from one RNA template during replication and reattaches to an adjacent template, thus facilitating recombination in cases where multiple viruses may be coinfecting the same cell (54).
Madagascar is an island country in southeastern Sub-Saharan Africa, located in the Indian Ocean, ∼400 km off the coast from Mozambique. Madagascar has been isolated from the African continent for over 170 million years and all surrounding landmasses for over 80 million years, allowing for the evolution of a unique and highly endemic floral and faunal assemblage (55). The country is home to 51 species of bat (56), two-thirds of which are endemic and boast long evolutionary divergence times with sister species on both the African and Asian continents (57)(58)(59). A growing body of work has characterized the landscape of potentially zoonotic viruses in Madagascar bats, identifying evidence of circulating infection (through RNA detection or serology) with henipaviruses, filoviruses, lyssaviruses, and coronaviruses (12,33,60,61). Previously, coronavirus surveillance efforts have identified Alphacoronavirus RNA in the Malagasy insectivorous bat, Mormopterus jugularis, and Betacoronavirus RNA in the subgenus Nobecovirus (12,33) in all three endemic Malagasy fruit bat species: Pteropus rufus, Eidolon dupreanum, and Rousettus madagascariensis (12,33). In addition to Madagascar, Nobecoviruses have been previously characterized from Pteropodidae fruit bats across Asia and in both East and West Africa (27,30,(62)(63)(64)(65). Though Nobecoviruses are not known to be zoonotic, previous research has described widespread circulation throughout Asia of a recombinant Nobecovirus which carries an orthoreovirus p10 gene insertion (27,65,66), highlighting the capacity for this viral subgenus to undertake rapid shifts in genomic organization which could lead to expanded host range. As both E. dupreanum and R. madagascariensis are known to co-roost with each other, and with several species of insectivorous bat (67), CoV recombination is a distinct concern in the Madagascar system. Though no Rhinolophus spp. bats, the typical host for ACE2using Sarbecoviruses, inhabit Madagascar, the island is home to four species of Hipposiderid bat (56), which host the Sarbecovirus-adjacent and understudied Hibecoviruses, as well as several species of Vespertilionid bat, the most common hosts for the zoonotic Merbecoviruses.
Human-bat contact rates are high in some regions of Madagascar-including the region in which we conducted our field research-where bats are consumed for subsistence and frequently roost in close proximity to human settlements or natural tourist attractions (68)(69)(70)(71)(72). Moreover, in addition to the natural CoV diversity described in Malagasy bats, several human coronaviruses are known to circulate widely in the human population in Madagascar, including the common coldcausing Embecoviruses, HCoV-OC43 and HCoV-HKU1, and, more recently, the zoonotic Sarbecovirus, SARS-CoV-2 (73)(74)(75). As spillback of SARS-CoV-2 into wildlife hosts and possible recombination with wildlife viruses remains a global concern (16), characterization of the genetic diversity of bat-borne coronaviruses in Madagascar and elsewhere in Africa is a critical public health priority. Here, we contribute and characterize three full genome sequences of two novel Nobecoviruses, derived from R. madagascariensis and P. rufus hosts. We define five distinct Nobecovirus clades in global circulation across Asia and Africa and assess these new Nobecoviruses for their past and future capacity for recombination.

Bat Sampling
As part of a long-term study characterizing the seasonal dynamics of potentially zoonotic viruses in wild fruit bats in Madagascar, monthly captures of Malagasy pteropodid bats were carried out at species-specific roost sites in the Districts of Moramanga 48.4525 E). In brief, bats were captured in nets hung in the tree canopy (P. rufus) or over cave mouths (E. dupreanum, R. madagascariensis) at dusk (17:00-22:00) and dawn (03:00-07:00), removed from nets, and processed under manual restraint following methods that have been previously described (61,76,77). All animals were identified to species, sex, and age class (juvenile vs. adult), and fecal, throat, and urine swabs were taken from each individual, collected into viral transport medium, and frozen on site in liquid nitrogen. Post-sampling, swabs were transported to −80 • C freezers for long-term storage in the Virology Unit at Institut Pasteur de Madagascar. In total, 2156 bats (P. rufus: 167 females, 184 males; E. dupreanum: 495 females, 421 males; R. madagascariensis: 416 females, 473 males) were captured across the course of our long-term field study, though only fecal samples from a subset of individuals (see 'Results') were analyzed in part with the coronavirus research outlined here. Of those captures, 84 P. rufus were juveniles and 267 adults, 108 E. dupreanum were juveniles and 810 adults, and 126 R. madagascariensis were juveniles and 767 adults. Additional details relevant to our time series are included in the results.
This study was carried out in strict accordance with research permits obtained from the Madagascar Ministry of Forest and the Environment (permit numbers 019/18, 170/18, 007/19) and under guidelines posted by the American Veterinary Medical Association. All field protocols employed were pre-approved by the UC Berkeley Animal Care and Use Committee (ACUC Protocol # AUP-2017-10-10393), and every effort was made to minimize discomfort to animals.

RNA Extraction
RNA was extracted from a randomly selected subset of fecal, throat, and urine swab samples in the Virology Unit at the Institut Pasteur de Madagascar, with each sample corresponding to a unique individual from the field dataset. Samples undergoing mNGS corresponded to individuals captured in Feb-Apr, Jul-Sep and December 2018 or in January 2019. Water controls were extracted in conjunction with samples on each unique extraction day. Extractions were conducted using the Zymo Quick DNA/RNA Microprep Plus kit (Zymo Research, Irvine, CA, USA), according to the manufacturer's instructions and including the step for DNAse digestion. Postextraction, RNA quality was checked on a nanodrop to ensure that all samples demonstrated 260/280 ratios exceeding 2 and revealed quantifiable concentrations. Resulting extractions were stored in freezers at −80 • C, then transported on dry ice to the Chan Zuckerberg Biohub (San Francisco, CA, USA) for library preparation and metagenomic Next Generation Sequencing (mNGS).

Library Preparation and mNGS
Four randomly selected samples from each of three bat species underwent additional quantification using an Invitrogen Qubit 3.0 Fluorometer and the Qubit RNA HS Assay Kit (ThermoFisher Scientific, Carlsbad, CA, USA). After quantification, all total RNA samples, along with water samples from Madagascar extractions, were manually arrayed into 96 well plates to automate high throughput mNGS library preparation. Based on the initial quantitation, a 2 µL aliquot from each plated sample was diluted 1:9 on a Bravo liquid handling platform (Agilent, Santa Clara, CA, USA). A 5 µL aliquot from each diluted sample was arrayed into a 384 well plate for input into the mNGS library prep. Samples derived from fecal, throat, and urine swab samples were arrayed on distinct 384 well plates for separate sequencing runs. Additional unrelated total RNA samples (a dilution series of total RNA isolated from cultured HeLa cells) and a set of local lab water samples were included on each 384 well plate to serve as library preparation controls. Input RNA samples in the 384 well plate were transferred to a GeneVac EV-2 (SP Industries, Warminster, PA, USA) to evaporate the samples to enable miniaturized mNGS library preparation with the NEBNext Ultra II RNA Library Prep Kit (New England BioLabs, Beverly, MA, USA). Library preparation was performed per the manufacturer's instructions, with the following modifications: 25 pg of External RNA Controls Consortium Spike-in mix (ERCCS, Thermo-Fisher) was added to each sample prior to RNA fragmentation; the input RNA mixture was fragmented for 8 min at 94 • C prior to reverse transcription; and a total of 14 cycles of PCR with dual-indexed TruSeq adapters was applied to amplify the resulting individual libraries. An initial equivolume library pool was generated, and the quality and quantity of that pool was assessed via electrophoresis (High-Sensitivity DNA Kit and Agilent Bioanalyzer; Agilent Technologies, Santa Clara, CA, USA), real-time quantitative polymerase chain reaction (qPCR) (KAPA Library Quantification Kit; Kapa Biosystems, Wilmington, MA, USA), and small-scale sequencing (2 x146 bp) on an iSeq platform (Illumina, San Diego, CA, US). Subsequent equimolar pooling of individual libraries from each plate was performed prior to performing large-scale paired-end sequencing (2 × 146 bp) run on the Illumina NovaSeq sequencing system (Illumina, San Diego, CA, USA). The pipeline used to separate the sequencing output of the individual libraries into FASTQ files of 146bp paired-end reads is available on GitHub at https:// github.com/czbiohub/utilities.

CZID
Raw reads from Illumina sequencing were host-filtered, qualityfiltered, and assembled on the CZID (v3.10, NR/NT 2019-12-01) platform, a cloud-based, open-source bioinformatics platform designed for microbe detection from metagenomic data (78), using a host background model of "bat" compiled from all publicly available full-length bat genomes in GenBank at the time of sequencing. Samples were deemed "positive" for coronavirus infection if CZID successfully assembled at least two contigs with an average read depth >2 reads/nucleotide that showed significant nucleotide or protein BLAST alignment(s) (alignment length >100 nt/aa and E-value < 0.00001 for nucleotide BLAST/ bit score >100 for protein BLAST) to any CoV reference present in NCBI NR/NT database (version 12-01-2019). To verify that no positives were missed from CZID, all non-host contigs assembled in CZID underwent directed, offline BLASTn and BLASTx (79) against a reference database constructed from all available fulllength nucleotide and protein reference sequences for Alpha-and Betacoronavirus available in NCBI Virus (last access: August 15, 2021).
Step-by-step instructions for our offline BLAST protocol can be accessed in our publicly available GitHub repository at: https://github.com/brooklabteam/Mada-Bat-CoV/.

Genome Annotation and BLAST
Three full genome-length Nobecovirus contigs returned from CZID (two from R. madagascariensis and one from P. rufus) were aligned with Nobecovirus homologs from NCBI (see 'Phylogenetic Analysis') and annotated in the program Geneious Prime (2020.0.5). We then used NCBI BLAST and BLASTx to query identity of our full length recovered genomes and their respective translated proteins to publicly available sequences in NCBI (79). We queried identity to reference sequences for four previously described Nobecovirus strains (accession numbers: MG762674 (Rousettus bat coronavirus HKU9), NC_030886 (Rousettus bat coronavirus RoBat-CoV GCCDC1), MK211379 (Rhinolophus affinis coronavirus BtRt-BetaCoV/GX2018), and NC_048212 (Eidolon helvum bat coronavirus), as well as to the top BLAST hit overall. Finally, we aligned representative sequences from each major Nobecovirus clade and visually examined the region of p10 orthoreovirus insertion from the RoBat-CoV GCCDC1 lineage in the newly described sequences from Madagascar.

Phylogenetic Analysis
Contigs returned from CZID were combined with publicly available coronavirus sequences in NCBI to perform phylogenetic analysis. We carried out four major phylogenetic analyses, building (a) a full-genome Betacoronavirus maximum likelihood (ML) phylogeny, (b) a Betacoronavirus ML phylogeny corresponding to a conserved 259 bp fragment of the RNAdependent RNA polymerase (RdRp) gene encapsulated in the CoV Orf1b, (c) four amino acid ML phylogenies derived from translated nucleotides corresponding to the spike (S), envelope (E), matrix (M), and nucleocapsid (N) proteins from a subset of full length genomes, and (d) a time-resolved Bayesian phylogeny corresponding to all available full genome Nobecovirus sequences in NCBI virus, which we computed in BEAST2 (80). Detailed methods for the construction of each phylogeny are available at https://github.com/brooklabteam/Mada-Bat-CoV/.
Briefly, our full genome ML phylogeny was comprised of 122 unique NCBI records, corresponding to all available full genome sequences with bat hosts under NCBI taxon IDs, Betacoronavirus (694002), unclassified Betacoronavirus (696098), Betacoronavirus sp. (1928434), unclassified Coronaviridae (1986197), or unclassified Coronavirinae (2664420) (107 records), in addition to all full genome Betacoronavirus (694002) reference sequences with a non-bat host (14 records), plus one Gammacoronavirus outgroup (accession number NC_010800). The full genome phylogeny additionally included three full length Madagascar Nobecovirus sequences returned from CZID (two from R. madagascariensis and one from P. rufus), which are described in this paper for the first time, for a total of 125 unique sequences.
Our Betacoronavirus RdRp ML phylogeny consisted of an overlapping subset of a 259 bp fragment in the center of the RdRp gene that has been previously described in Madagascar fruit bats (12) (7 records), in addition to the same RdRp fragment extracted from near-full length Nobecovirus sequences on NCBI Virus (17 records) and full length reference sequences for other Betacoronavirus subgenera available in NCBI Virus (17 records). This phylogeny also included two NCBI Virus RdRp Nobecovirus fragments, in addition to seven Madagascar Nobecovirus sequences encompassing the RdRp fragment of interest, which were returned from the assembly in CZID (four from R. madagascariensis, two from P. rufus, and one from E. dupreanum). Finally, we included the RdRp fragment of our Gammacoronavirus outgroup, for a total of 51 unique sequences.
Our amino acid phylogenies consisted of S, E, M, and N gene extractions from a subset of the same representative set of near-full genome length sequences used in the RdRp analysis: the same 17 full-length Betacoronavirus reference sequences, 17 near full-length Nobecovirus sequences, and the one Gammacoronavirus outgroup, in addition to our three full genome Madagascar sequences derived from R. madagascariensis and P. rufus. Gene extractions were carried out using annotation tracks reported with each accession number in NCBI or, in cases where annotations were unavailable, genes were manually annotated and extracted in Geneious Prime based on alignment to homologs. After nucleotide extraction, genes were translated prior to alignment.
Finally, our Bayesian time-resolved Nobecovirus phylogeny consisted of all available full genome Nobecovirus sequences in NCBI virus; because recombination events can muddle molecular clock estimation in phylogenetics, we constructed two different versions of this timetree, one including 18 Nobecoviruses sequences only for which no past history of recombination has been described, and a second which added two additional sequences from the Nobecovirus GCCDC1 clade known for its p10 orthoreovirus insertion.
After compiling sequences for each phylogenetic analysis, sequence subsets for the full-length, RdRp, and four amino acid phylogenies were aligned in MAFFT v.7 (81, 82) using default parameter values. Alignments were checked manually for quality in Geneious Prime, and the RdRp alignment was trimmed a fragment (259 bp) conserved across all sequences in the subset. All sequence subsets and alignment files are available for public access in our GitHub repository: https://github.com/ brooklabteam/Mada-Bat-CoV/.
After quality control, alignments were sent to Modeltest-NG (83) to assess the best fit nucleotide or amino acid substitution model appropriate for the data. All alignments for ML analysis (full genome, 259 bp RdRp fragment, and amino acid protein sequences) were then sent to RAxML-NG (84) to construct the corresponding phylogenetic trees. Following best practices outlined in the RAxML-NG manual, twenty ML inferences were made on each original alignment and bootstrap replicate trees were inferred using Felsenstein's method (85), with the MREbased bootstopping test applied after every 50 replicates (86). Bootstrapping was terminated once diagnostic statistics dropped below the threshold value and support values were drawn on the best-scoring tree.
We constructed the Bayesian timetree using the Bayesian Skyline Coalescent (87) model in BEAST2 (80), assuming a constant population prior, and the best fit nucleotide substitution model as indicated by ModelTest-NG. Sampling dates corresponded to collection date as reported in NCBI Virus; in cases where only year was reported, we assumed a collection date of 15-July for the corresponding year. We tested trees using both an uncorrelated exponentially distributed relaxed molecular clock (UCED) and a strict clock but ultimately reported results from the strict clock assumptions, as similar results were inferred from both. Markov chain Monte Carlo (MCMC) sample chains were run for 600 million iterations, convergence was checked using TRACER v1.7 (88), and trees were averaged after 10% burnin using TreeAnnotater v2.6.3 (89) to visualize mean posterior densities at each node.

Recombination Analysis
Full length Nobecovirus sequences derived from CZID were analyzed for any signature of past recombination. First, the ORF1a, ORF1b, S, NS3, E, M, N, and NS7 genes from the P. rufus Nobecovirus sequence, the longest R. madagascariensis Nobecovirus sequence (OK067320), and two full genome representative sequences from the HKU9 (NC_009021) and E. helvum African lineages (NC_048212) were extracted, translated, and concatenated. Concatenated, translated sequences were then aligned in MAFFT v.7 (81, 82) using default parameter values. Nobecovirus sequences corresponding to the RoBat-CoV GCCDC1 (27,65) and BtRt-BetaCoV/GX2018/BtCoV92 (62, 63) genotypes were omitted from recombination analyses because inserted genes and/or genetic material upstream from the nucleocapsid in the corresponding genomes interfered with the alignment.
After alignment, genomes were analyzed for amino acid similarity in the program pySimplot (91), using the P. rufus and, subsequently, the R. madagascariensis genome as query sequences, the HKU9 and Eidolon helvum African Nobecovirus clades as references, and the corresponding Madagascar sequence as the alternative. Analyses were carried out using a window size of 100 aa and a step size of 20 aa.
After alignment, genomes were analyzed for recombination in the program SimPlot (v.3.5.1). Nucleotide similarity plots were generated using the P. rufus and, subsequently, the R. madagascariensis genomes as query sequences, the HKU9 and Eidolon helvum African Nobecovirus clades as references, and the corresponding Madagascar sequence as the alternative. Bootscan analyses were conducted on the same alignment, using the same query and reference inputs. Both nucleotide similarity and Bootscan analyses were carried out using a window size of 200 bp and a step size of 20 bp.

Nucleotide Sequence Accession Numbers
All three annotated full-length genome sequences (two from R. madagascariensis, one from P. rufus), plus four additional RdRp gene fragment sequences (two from R. madagascariensis, one from P. rufus, and one from E. dupreanum) were submitted to NCBI and assigned accession numbers OK020086-OK020089 (RdRp fragments) and OK067319-OK067321 (full genomes).

Prevalence of CoV Sequence Detection in Field Specimens
RNA from 285 fecal, 143 throat, and 196 urine swab samples was prepped into libraries and submitted for Illumina sequencing. In 28/285 (9.82%) fecal specimens and in 2/196 (1.00%) urine specimens, at least two contigs with an average read depth >2 reads/nucleotide, and nucleotide or protein-BLAST alignments to any CoV reference sequence in NCBI were identified via CZID analysis. Because the prevalence detected in the urine samples was low, it is likely attributable to field contamination with fecal excrement upon urine swab collection, as bats often excrete both substances simultaneously under manual restraint. None of the 143 throat swabs assayed demonstrated evidence of CoV infection.

Genome Annotation and BLAST
Three full or near-full CoV genome length contigs were recovered from CZID for Nobecoviruses derived from R. madagascariensis (two genomes: 28,980 and 28,926 bps in length) and P. rufus (one genome: 29,122 bps in length). In all three genomes, we successfully identified ORF1ab (including RdRp) and structural proteins S (spike), E (envelope), M (matrix), and N (nucleocapsid), in addition to accessory genes NS3, NS7a, and NS7b (Figure 2A). In keeping with convention outlined in (65), the accessory genes, NS7a and NS7b, were so named based on nucleotide alignment and amino acid identity to homologous proteins in previously described Nobecoviruses.
BLAST analysis of the full genome indicated that the P. In general, BLASTx queries of NS7 accessory proteins in both R. madagascarienis and P. rufus Nobecovirus demonstrated ∼40-91% amino acid identity to already-characterized Nobecovirus proteins (Supplementary Table 1). Genomes derived from R. madagascariensis appeared slightly more complex than those derived from P. rufus, allowing for annotation of one additional accessory gene, NS7c, which has been characterized in recombinant Nobecovirus sequences of the RoBat-CoV GCCDC1 lineage (27,65). Curiously, BLASTx query of the NS7a accessory protein in the P. rufus genome showed no identity to any previously described Nobecovirus protein; rather, the highest scoring protein alignment (31.25% identity, 1e-06 E-value) of the NS7a translation encompassed 40% of the query (query coverage was located at the 3' end of the query length) and corresponded to an arachnid Low-Density Lipoprotein Receptor-Related Protein 1 (LRP-1) (Supplementary Table 2). As LRP-1 is involved in the mammalian innate immune response (92), we hypothesized that this putative novel ORF could be a viral gene involved in immune antagonism. To check the integrity of our de novo assembly in NS7a, we mapped the deduplicated raw reads from mNGS to the full genome P. rufus Nobecovirus contig generated by CZID (78) (Supplementary Figure 1). We confirmed >200x read coverage across the region corresponding to the putative NS7a accessory protein, with good representation of both forward and reversefacing reads across the length of the protein, as well as the intergenic regions preceding and succeeding it. We were also able to identify a putative Transcription Regulatory Sequence (TRS) preceding this gene ( Table 1), further validating our confidence that P. rufus Nobecovirus NS7a represents a real though highly divergent protein.
The p10 orthoreovirus insertion within the RoBat-CoV GCCDC1 Nobecovirus lineage was not observed in either Nobecovirus genomes from R. madagascariensis or P. rufus. Nonetheless, examination of the multiple sequence alignment of representative sequences of all Nobecovirus clades in this region demonstrated the presence of some variable genetic material downstream from the N gene and upstream from the NS7a gene in the divergent P. rufus Nobecovirus genome ( Figure 2B). Nobecoviruses clustering in the BtRt-BetaCoV/GX2018 -BtCoV92 lineage also carry a unique coding sequence in this region, highlighting the dynamic nature of the 3' end of the CoV genome (63).
In addition to the identification of both canonical and novel ORFs described above, we also observed noncoding TRS elements preceding all the major proteins in all three Nobecovirus genomes ( Table 1). Many of these correspond to the 5'-ACGAAC-3' six bp core motif common to many Betacoronaviruses, including SARS-CoV and previously described in Nobecoviruses of the GCCDC1 and GX2018/BtCoV92 lineages (62,65,93). For most genes, these TRS elements were located a short distance upstream from the corresponding gene (Table 1). Elements identified in the two R. madagascariensis genomes were largely comparable, suggesting that these two sequences could represent slight variations in the same virus lineage. Some putative TRS elements, including that preceding P. rufus NS7a, showed variation from the 5'-ACGAAC-3' core motif, with some recapitulating the 5'-AAGAA-3' motif common to SARS-CoV-2 (94). TRS variations may be indicative of variation in gene expression across individual bats and/or species.

Phylogenetic Analysis
Phylogenetic analysis of full length Betacoronavirus genomes confirmed that both P. rufus and R. madagascariensis genomes cluster in the Nobecovirus subgenus of the Betacoronaviruses, with the divergent P. rufus forming its own distinct clade and both R. madagascariensis genomes grouping with the previously described E. helvum reference sequence from Cameroon (64) (Figure 3A). We observed distinct groupings of five main Nobecovirus lineages in our phylogeny: (a) the largely Asianderived HKU9 sequences, (b) the African E. helvum-derived sequences (now including new R. madagascariensis Nobecovirus genomes), (c) the recombinant GCCDC1 genomes, (d) the BtRt-BetaCoV/GX2018 and BtCoV92 genomes described respectively from China and Singapore, and (e) the divergent P. rufus genome contributed here from Madagascar. Intriguingly, the P. rufus genome groups ancestral to all other Nobecoviruses, followed by the E. helvum/R. madagascariensis African lineage, with the Asian genotypes forming three distinct (and more recent) clades corresponding to genotypes HKU9, GCCDC1, and GX2018 -BtCoV92. This finding suggests that the evolutionary origins of the Nobecovirus clade likely predate the recent divergence of P. rufus from sister Pteropus spp. taxa (though see results for Bayesian timetree below), which are primarily Asian in their distribution. To date, no other Nobecovirus sequences have yet been described from any Pteropus spp. bats; additional sampling across other Pteropus lineages will be needed to confirm our hypothesis of a Pteropus genus origin for this Betacoronavirus clade. Further phylogenetic analysis of a 259 bp fragment of the RdRp gene reconfirmed these groupings and suggested the presence of at least two distinct genetic variants within the P. rufus lineage (Figure 3B). One RdRp fragment derived from feces of the third Malagasy fruit bat, E. dupreanum, grouped within the E. helvum -R. madagascariensis African Nobecovirus lineage, consistent with previous reporting (12). Characterization of the full length genome of this virus will be needed to clarify whether it represents a genetic variant of or a distinct genotype from the R. madagascariensis virus. Phylogenetic analysis of the RdRp fragment allowed for inclusion of one partial Nobecovirus sequence derived from E. helvum bats in Kenya (HQ728482), which also grouped within the E. helvum -R. madagascariensis African clade, confirming the distribution of this genotype across West and East Africa and into the South-Western Indian Ocean Islands. Notably, one partial Cameroonian E. helvum sequence (MG693170) clustered with HKU9 sequences from Asia, rather than within the E. helvum -R. madagascariensis African clade. These findings suggest that both "African" and "Asian" Nobecovirus lineages are likely broadly geographically distributed. Amino acid phylogenies computed from translated protein alignments of the S, E, M, and N Betacoronavirus structural genes (Supplementary Figure 2) further confirmed evolutionary relationships suggested in Figure 3. S, M, and N gene phylogenies demonstrated distinct groupings of five main Nobecovirus lineages outlined above, while in the E gene phylogeny, the P. rufus sequence grouped adjacent to the single Cameroonian-derived E. helvum sequence within the HKU9 clade. However, we note that bootstrap values were extremely low in this E gene phylogeny, suggesting that additional sampling is needed to confirm these evolutionary relationships.
Results from our Bayesian timetree analysis (Figure 4;  Supplementary Figure 3) recapitulated ML support for the five distinct Nobecovirus subclades but indicated a time to MRCA for the entire Nobecovirus clade of ∼300 years ago (1695; 95% HPD: 1643-1748) (Figure 4), with the African Eidolon lineage (which included our R. madagascariensis sequences) branching off ∼200 years ago (1827; 95% HPD: 1797-1857). Surprisingly, both of these viral divergence times post-date the estimated radiation of Madagascar host bats from sister species (58,59), suggesting that Malagasy bat populations may support substantial viral exchange with bats from surrounding islands and the African continent. Additional sampling of Nobecovirus lineages in Asian-distributed Pteropus spp. bats will be crucial to ascertaining whether the evolutionary origins of the Nobecovirus subgenus could date back any further than that suggested by samples highlighted here. Evolutionary relationships were preserved in phylogenies which included GCCDC1 orthoreovirus recombinant sequences, though inclusion of these sequences diminished certainty surrounding evolutionary divergence times, resulting in broad HPD distributions (Supplementary Figure 3).

Recombination Analysis
SimPlot analysis confirmed the evolutionary distinctiveness of the P. rufus Nobecovirus genome, which showed <70% amino acid similarity and <50% nucleotide similarity to HKU9, E. helvum, and R. madagascariensis genotypes across the majority of its genome length (Figures 5A,B). Consistent with BLAST results, the P. rufus Nobecovirus genome demonstrated the highest similarity to previously described sequences in the Orf1b region, which includes RdRp. The R. madagascariensis Nobecoviruses, by contrast, showed >90% amino acid and nucleotide similarity to the E. helvum African lineage throughout Orf1ab, but both P. rufus and R. madagascariensis sequences diverged from all other reference genomes in the first half of the spike protein, which corresponds to the S1 subunit and includes the receptor binding domain that mediates viral entry into host cells (95). Further divergence for both P. rufus and R. madagascariensis Nobecoviruses was observed in the N structural protein and in the NS7 accessory genes. Bootscan analysis further confirmed these findings, showing that the P. rufus Nobecovirus clusters with HKU9 lineages across Orf1ab, NS3, E, and M genes but demonstrates evidence of recombination with E. helvum -R. madagascariensis African lineages in the S (particularly S1), N, and NS7 genes (Figure 5C). Similarly, bootscanning demonstrated that R. madagascariensis Nobecoviruses group with the E. helvum lineage across Orf1ab, NS3, E, and M but show evidence of recombination with HKU9 and P. rufus Nobecovirus in S (again, particularly S1), N, and NS7 genes (Figure 5C), thus highlighting the dynamic nature of these regions of the Nobecovirus genome.

DISCUSSION
Here, we contribute three full-length genome sequences and four RdRp fragments to public NCBI repositories; these sequences correspond to at least two novel Nobecoviruses derived from wild Malagasy fruit bats, Pteropus rufus and Rousettus madagascariensis, with evidence of additional genetic variants circulating in Eidolon dupreanum, as well. Phylogenetic analyses suggest that previously described Nobecoviruses can be grouped into five general clades: (a) the HKU9 lineage of largely Asian origins, (b) the mostly African-distributed lineage derived from E. helvum bats (which contains our R. madagascariensis and E. dupreanum sequence contributions), (c) the recombinant GCCDC1 lineage, which has been previously reported from China and Singapore (27,65), (d) the BtRt-BetaCoV/GX2018 -BtCoV92 lineage, also known from China and Singapore (62,63), and (e) a novel, divergent clade corresponding to the newlydescribed P. rufus genome. This new P. rufus Nobecovirus appears to be ancestral to all previously-described Nobecovirus sequences with a time to MRCA dating back ∼300 years and likely postdating the late Pleistocene divergence of the P. rufus host from other Southwest Indian Ocean Island pteropodids (58). The relatively recent branching of the ancestral P. rufus Nobecovirus sequence and the nested grouping of the R. madagascariensis sequences within the African Eidolon subclade suggest that substantial viral genetic exchange takes place between fruit bats in Madagascar, in the surrounding Southwest Indian Ocean Islands, and on the African continent. Thirty-eight of Madagascar's 49 bat species are endemic, while two species exhibit ranges that include the Comoros and Réunion islands (respectively, Scotophilus borbonicus and Miniopterus aellini), and nine species can be found broadly distributed across Africa and, in some cases, parts of Asia and Europe (e.g., Pipistrellus kuhlii, Mops midas) (67). Additional coronavirus sampling in some of these more cosmopolitan insectivorous bat species will shed light on their role in diversification and distribution of Nobecoviruses more broadly. Notably, all three Malagasy fruit bat species in the family Pteropodidae-from which Nobecoviruses are thought to be sourced-are endemic to Madagascar and not known to disperse outside the island.
Importantly, though largely characterized in Asia, HKU9 Nobecovirus genotypes have been identified in West Africa (Cameroon) (64), and E. helvum lineages have been characterized across both West (Cameroon) (64) and East (Kenya) (30,96) Africa, as well as on one Indian Ocean island (Madagascar). These findings suggest that different Nobecovirus clades may be more broadly geographically distributed than has been previously recognized. To our knowledge, no Nobecoviruses have been identified from the southern extension of the pteropodid fruit bat range in Australia, or from any Pteropus spp. bat in Asia; characterization of any CoVs infecting these bats, which are known to host important, zoonotic henipaviruses (97) and lyssaviruses (93), would do much to enhance our understanding of the phylogeography of the Nobecovirus clade. Madagascar represents a unique phylogeographic melting pot, with flora and fauna-and corresponding viruses-of both African and Asian descent (55), offering opportunities for mixing of largely disparate viral groups. This mixing is important in light of the CoV penchant for recombination, which can allow viruses from one clade to gain function through acquisition of genetic material from another, thus facilitating rapid changes in host range (43)(44)(45)(46)(47)(48). Indeed, recombination events have been implicated in the cross-species emergence of most zoonotic coronaviruses (44)(45)(46)(47)(48)98), and viral acquisition of an ACE2-compatible receptor binding domain-often mediated through recombination-has been a critical step in the expansion of bat Sarbecovirus ranges to include human hosts (98)(99)(100)(101). As bats can host multiple coronaviruses at once and often roost together, there are ample opportunities for recombination events to occur (7,102).
Nobecoviruses are not known to be zoonotic and have been thus far described exclusively infecting fruit bats hosts of the Old World bat family, Pteropodidae. Nonetheless, the Nobecovirus subgenus demonstrates a CoV-characteristic tendency to recombine, as evidenced by circulation of the widespread GCCDC1 lineage in Asia, which carries a p10 gene insertion derived from an orthoreovirus between the N structural protein and NS7a accessory protein toward the 3' end of the genome (65). This finding highlights CoVs' capacity to undergo both homologous (within-clade) and heterologous (out-of-clade) recombination. Since bats are known to maintain co-infections with many viruses simultaneously (103,104), this recombinatory potential could allow CoVs to rapidly acquire new genetic material from coinfections, including receptor binding or host immune evasion capabilities that may expand host range. The orthoreovirus insertion within the GCCDC1 virus genome was not detected among the CoVs in our dataset, though, anecdotally, mNGS of fecal and throat samples collected in our sampling did identify evidence of orthoreovirus infection in several throat swabs derived from R. madagascariensis bats, highlighting the potential for recombination opportunities between these two viral groups. Notably, recombination analyses suggested substantial selection has taken place in this region of both R. madagascariensis and P. rufus-derived Nobecoviruses. Selection at the 3' end of the CoV genome may modulate viral replication ability, since several regulatory sequences and accessory genes (e.g., NS7) are defined in this region (94). Viral replication ability may be further impacted by variation in TRS motifs, which regulate expression of corresponding genes. We identified putative TRS sequences corresponding to all structural and non-structural genes identified in all three contributed Nobecovirus genomes; while the majority of these TRS motifs recapitulated the well-conserved 5'-ACGAAC-3' Betacoronavirus core sequence (62,65), variation in a subset of genes across species and individuals (e.g., differing motifs between two R. madagascariensis-derived genomes) may correspond to variation in gene expression.
Recombination potential is a particular cause for concern in cases where viruses that lack the ability to infect human cells may acquire this zoonotic capacity through genetic exchange with other viruses coinfecting the same host. Indeed, the original SARS-CoV is believed to have acquired its capacity to bind human ACE2 through a recombination event with ACE2using Sarbecoviruses in the disparate SARS-CoV-2 clade (98). Sarbecoviruses, in particular, are known to recombine frequently, giving rise to new genetic variants, in regions where different species of Rhinolophid bat hosts co-roost and share viruses (7). Cave-resident Malagasy fruit bats, E. dupreanum and R. madagascariensis, are known to co-roost with each other and with several species of insectivorous bat (67), which could facilitate Nobecovirus recombination. The observed similarity in Nobecovirus sequences derived from E. dupreanum and R. madagascariensis (which cluster in the same lineage), as compared with disparate sequences derived from tree-roosting P. rufus, suggest that some CoV genetic exchange may have already taken place between bats with overlapping habitats. To date, zoonotic potential has not been demonstrated for any previously described Nobecoviruses, and Rhinolophid bats associated with ACE2 usage are not resident in Madagascar. Nonetheless, bats in family Vespertilionidae, the family most commonly associated with zoonotic Merbecoviruses (8-10), are widespread in Madagascar, and Mormopterus jugularis, a known Molossidae bat host for Alphacoronaviruses of undetermined zoonotic potential (33), has been described co-roosting with R. madagascariensis (105). Bootscan analyses identified signatures of recombination in the S1 subunit of both P. rufus and R. madagascariensis Nobecovirus spike proteins, suggesting that this region of the genome, which modulates host range through cell surface receptor binding, may be under selective pressure. In addition to facilitating direct bat-to-human spillover, recombination can also play an important role in facilitating cross-species emergence via intermediary bridge hosts: both SARS-CoV-1 and MERS-CoV demonstrated a critical role for intermediate hosts (respectively, palm civets and camels) in their evolutionary history of zoonotic emergence (106)(107)(108). Given the high endemicity and biodiversity characteristic of Madagascar's mammalian fauna, the island abounds with opportunities for CoV recombination with unique viruses in unique hosts.
In addition to posing risk for future zoonoses, Nobecoviruses derived from wild Madagascar fruit bats could provide unprecedented genetic material for recombination to existing human coronaviruses already in circulation across the islandmost notably SARS-CoV-2 (75). At the time of this writing, SARS-CoV-2 infections remain widespread and vaccination limited across Madagascar (109). Previous work has assessed the risk of reverse zoonosis, or 'spillback' of SARS-CoV-2 from human to bat populations in the United States (16), concluding that high human caseloads and frequent human-bat contact rates in research settings pose both conservation risks to naïve bat populations presented with a novel pathogen, as well as human health risks presented by the possible establishment of secondary wildlife reservoirs for SARS-CoV-2 capable of sourcing future epidemics or the generation of unique viral variants through human-wildlife virus recombination (16). Bat-human contact rates are higher, on average, in Madagascar than in the US, as bats are consumed across the island for subsistence and frequently found roosting in human establishments or human-adjacent habitats (68)(69)(70)(71)(72). SARS-CoV-2 has already demonstrated its capacity for successful reverse zoonosis and adaptation to nonhuman hosts, in the case of farmer-sourced infections of mink in Finland (110), underscoring the legitimacy of these concerns. Notably, spillback is less likely to be an issue in regions where animals are killed upon capture for consumption (vs. transported live), as is often the case in Madagascar (72).
Prevalence of coronavirus RNA by sequence detection in fecal samples averaged around 10% across all three Malagasy fruit bat species examined in our study, consistent with CoV prevalence reported in wild bat species elsewhere (12,33). One previous study of CoV circulation in Madagascar fruit bats reported much lower prevalence of infection in E. dupreanum and R. madagascariensis-derived fecal specimens, respectively 1/88 (1.1%) and 0/141 (0%), as compared with a 13/88 (14.8%) prevalence in P. rufus-derived feces (12). As in our study, this previous work found no positive infections in throat swabs, supporting a gastrointestinal tropism for CoVs in this fruit bat system, in contrast to the respiratory infections more commonly observed in humans (though humans do shed SARS-CoV-2 gastrointestinally, as well). Currently, very little is known concerning the cell receptors of Nobecoviruses and their tissue tropism. One previous study demonstrated how bat coronavirus HKU9 (along with MERS-CoV) gained cell entry via the ER chaperone activity cell membrane receptor GRP78, but the study did not describe the distribution of this receptor beyond demonstrating co-expression with DPP4 (111). Future cosampling of both throat and fecal swabs for bat coronaviruses will shed light on their tropism and shedding across different tissues.
One previous study in the West Indian Ocean provided more information about CoV prevalence in Madagascar bats, with 6/45 (13.3%) R. madagascariensis fecal specimens testing CoV positive, as compared to 10/63 (15.9%) M. jugularis specimens, 4/44 (9.1%) Triaenops menamena specimens, and 2/21 (9.5%) Mops midas specimens (33). Consistent with previous findings (32,41,112,113), we observed the highest prevalence of CoV infection in P. rufus and E. dupreanum juveniles. We hypothesize that the absence of juvenile infection identified in R. madagascariensis bats in our study could be due to the staggered nature of the birth pulse for these three species: Madagascar fruit bats birth in three successive birth pulse waves, led by P. rufus in October, and followed by E. dupreanum in November and R. madagascariensis in December and January (114). As the bulk of juvenile R. madagascariensis bats sampled in our study were captured in February, it is possible that most were still too young to be CoV-positive [perhaps under protection from inherited maternal immunity (61)]. By the time of the second R. madagascariensis sampling in April, juveniles would have been large enough to be erroneously classed as adults, as size range variation is more limited in small R. madagascariensis bats as compared with the two other Malagasy fruit bat species (67). Our observations are consistent with previous records indicating that juvenile and subadult bats show enhanced CoV shedding after weaning and the loss in maternal immunity in other systems (115,116).
Our work emphasizes the importance of longitudinal ecological studies in identifying viral shedding events in transiently-infected wildlife hosts across multiple age and reproductive classes. Enhanced future surveillance efforts will be useful in pinpointing the exact seasonality of peak CoV shedding events, and mitigation efforts for both zoonotic and reverse zoonotic risks should be focused on limiting humanbat contact (in particular, the government-sanctioned hunting seasons) during these periods. Our study highlights the enhanced evolutionary and functional virological inference that can be derived from full genome sequences, detected by unbiased metagenomic sequencing. Characterization of these genomes provides the basis for basic virology experiments to follow, such as pseudovirus or reverse genetics experiments aimed at understanding host receptor utilization. More thorough studies documenting the seasonal dynamics of bat-borne CoVs, which elucidate genetic variation within and between species that share habitats in wild populations will be essential to understanding CoV recombination, host shifting, and zoonotic potential. Replication of such studies across the global range of both coronaviruses and their bat hosts, in particular in understudied regions of Africa, is needed to assess the landscape of future zoonotic risks and present opportunities for intervention and mitigation.

DATA AVAILABILITY STATEMENT
The sequences presented in the study are deposited in NCBI, accession numbers OK020086-OK020089 and OK067319-OK067321.

ETHICS STATEMENT
The animal study was reviewed and approved by UC Berkeley Animal Care and Use Committee and Madagascar Ministry of Forest and the Environment under guidelines posted by the American Veterinary Medical Association.

AUTHOR CONTRIBUTIONS
CB conceived of the project and acquired the funding, in collaboration with J-MH, PD, JD, and CT. Field samples were collected and RNA extracted by AA, SA, AG, HR, TR, NR, and CB. AK led the mNGS, with support from VA, HR, TR, and CB. GK and CB analyzed the resulting data and co-wrote the original draft of the manuscript, which all authors edited and approved.