De novo Genome Assembly of the Raccoon Dog (Nyctereutes procyonoides)

1 LOEWE-Centre for Translational Biodiversity Genomics (LOEWE-TBG), Senckenberg Nature Research Society, Frankfurt am Main, Germany, Department of Zoology and Animal Cell Biology, University of the Basque Country (UPV-EHU), Vitoria-Gasteiz, Spain, 3 Senckenberg Biodiversity and Climate Research Centre (SBiK-F), Frankfurt am Main, Germany, 4 Institute for Ecology, Evolution and Diversity, Goethe University, Frankfurt am Main, Germany, 5 Institute of Organismic and


INTRODUCTION
The raccoon dog, Nyctereutes procyonoides (NCBI Taxonomy ID: 34880, Figure 1a) belongs to the family Canidae, with foxes (genus Vulpes) being their closest relatives (Lindblad-Toh et al., 2005;Sun et al., 2019). Its original distribution in East Asia ranges from south-eastern Siberia to northern Vietnam and the Japanese islands. In the early 20th century, the raccoon dog was introduced into Western Russia for fur breeding and hunting purposes, which led to its widespread establishment in many European countries, Figure 1b. Together with the raccoon (Procyon lotor), it is now listed in Europe as an invasive species of Union concern (Regulation (EU) No. 1143/2014) and member states are required to control pathways of introductions and manage established populations.
The raccoon dog is a host and vector for a variety of pathogens, including rabies and canine distemper virus. Whether, it is involved in the transmission of coronaviruses to humans is inconclusive (Guan, 2003;Chan and Chan, 2013), but experimental studies have demonstrated that raccoon dogs are susceptible to SARS-CoV-2 infection and its transmission to contact animals (Freuling et al., 2020). However, a recent study using predictions by sequence alignment suggests that the mammalian ACE2 receptor of N. procyonoides binds less effectively to the S-protein of SARS-CoV and SARS-CoV-2 than those of other species like cows and rodents (Luan et al., 2020a,b).
Several subpopulations have been recognized in their current range of distribution in Europe and East Asia based on mtDNA (Kim et al., 2013;Paulauskas et al., 2016), microsatellite (Drygala et al., 2016;Hong et al., 2018), and SNP markers (Nørgaard et al., 2017). Interestingly, continental populations from Asia and Europe seem to have a higher number of chromosomes (2n = 54) than those from Japanese islands (2n = 38) (Wada and Imai, 1991;Nie et al., 2003). Moreover, the raccoon dog is also known to be one of the few Carnivora species which presents B chromosomes (Bs) in its karyotype (Duke Becker et al., 2011;Makunin et al., 2018). Several mitochondrial genome sequences of wild and bred raccoon dogs are known (Sun et al., 2019), however, a complete nuclear genome is not still available. Apart from its potential role as disease vector, N. procyonoides is of interest because it is the only extant species in the genus Nyctereutes and the only canid known to hibernate.
Here, a first draft genome of a raccoon dog sampled in Germany is presented, which will provide a basis for deeper understanding of its phylogenetic relationships, the evolution and function of B chromosomes in mammals, give insights in the evolution of hibernation, provide markers for future studies on invasive population structures in Europe and serve as a resource for studying gene-disease associations.

Sample Collection, Library Construction, Sequencing
One adult female individual of raccoon dog, Nyctereutes procyonoides (Figure 1a), was bagged in February 2020 in Germany (52 • 06 ′ 51.2 ′′ N 12 • 03 ′ 03.6 ′′ E) according to hunting regulations. Blood samples as well as various types of tissue were immediately stored on dry ice or in RNAlater and kept at −80 • C until further processing (Figure 1c).
A SMRTbell library was constructed following the instructions of the SMRTbell Express Prep kit v2.0 with Low DNA Input Protocol (Pacific Biosciences, Menlo Park, CA). Blood (5 ml) was used for high molecular weight DNA extraction using Genomic-tip 100/G (QIAGEN) according to the manufacturers' instructions. One SMRT cell sequencing run was performed in CLR mode on the Sequel System II with the Sequel II Sequencing Kit 2.0. For chromosome-level genome information, genomic DNA was isolated from ear tissue (62 mg) following the OMNI-C Proximity Ligation Assay (Version 1.1) with some modifications. The library was sequenced on the NovaSeq 6000 platform using a 150 paired-end sequencing strategy at Novogene (UK). The fragment size distribution and concentration of each of the final libraries was assessed using the TapeStation (Agilent Technologies) and the Qubit Fluorometer and Qubit dsDNA HS reagents Assay kit (Thermo Fisher Scientific, Waltham, MA), respectively. For more information on the different protocols see Supplementary Information.
A mix of different tissues (liver, heart, gonads, brain, kidney, muscle) was ground into small pieces using steel balls and a Retsch Mill. A total of 120 mg of the tissue was shipped on dry ice to Novogene (UK) for Illumina paired-end 150 RNA-seq of a 250-300 bp insert cDNA library.

Genome Size Estimation
Genome size was estimated following a flow cytometry protocol with propidium iodide-stained nuclei described in Hare and Johnston (2012). Ear tissue of one frozen (−80 • C) adult sample of N. procyonoides and neural tissue of the internal reference standard Acheta domesticus (female, 1C = 2 Gb) was mixed and chopped with a razor blade in a petri dish containing 2 ml of ice-cold Galbraith buffer. The suspension was filtered through a 42-µm nylon mesh and stained with the intercalating fluorochrome propidium iodide (PI, Thermo Fisher Scientific) and treated with RNase II A (Sigma-Aldrich), each with a final concentration of 25 µg/ml. The mean red PI fluorescence signal of stained nuclei was quantified using a Beckman-Coulter CytoFLEX flow cytometer with a solid-state laser emitting at 488 nm. Fluorescence intensities of 5,000 nuclei per sample were recorded. We used the software CytExpert 2.3 for histogram analyses. The total quantity of DNA in the sample was calculated as the ratio of the mean red fluorescence signal of the 2C peak of the stained nuclei of the raccoon dog sample divided by the mean fluorescence signal of the 2C peak of the reference standard times the 1C amount of DNA in the standard reference. Six replicates were measured on 6 different days to minimize possible random instrumental errors. Furthermore, we estimated the genome size by coverage from mapping reads used for genome assembly back to the assembly itself using backmap 0.3 (https://github.com/ schellt/backmap; Schell et al., 2017). In brief, the method divides the number of mapped nucleotides by the mode of the coverage distribution. By doing so, the length of collapsed regions with many fold increased coverage is taken into account.

Genome Assembly Workflow
SMRT reads longer than 7 kb were assembled under two different approaches (wtdbg v2.5; Ruan and Li, 2020 and Flye v2.7.1;Kolmogorov et al., 2019). The resulting assemblies were compared in terms of contiguity using Quast v5.0.2 (Gurevich et al., 2013), and evaluated for completeness by BUSCO v3.0.2 (Simão et al., 2015) (under short mode) against the laurasiatheria_odb9 data set (Supplementary Table 1). The assembled genome obtained with Flye presented the highest contiguity and completeness of both approaches and was therefore selected for downstream analyses.

Scaffolding and Gap Closing
To further improve the assembly, we applied two rounds of scaffolding and gap closing to the selected genome assembly. The genome was first scaffolded with the SMRT reads by SSPACElongread v1.1 (Boetzer and Pirovano, 2014) and then with ONT reads by SLR (Luo et al., 2019). TGS gapcloser v1.0.1 (Xu et al., 2019) was run after each scaffolding step. Subsequently, Omni-C reads were employed to further scaffold the draft genome following the HiRise pipeline (Putnam et al., 2016) operated by the Dovetail Genomics TM team. The assembly was screened for contamination using BlobTools v1.1.1 (Kumar et al., 2013;Laetsch and Blaxter, 2017) by evaluating coverage, GC content and sequence similarity against the NCBI nt database of each sequence (Figure 1d).

Gene Prediction and Functional Annotation
After the repeat sequences were masked, genes were predicted using the homology-based gene prediction tool GeMoMa v1.7.1 (Keilwagen et al., 2016(Keilwagen et al., , 2018 Venter et al., 2001). First, from the mapped RNA-seq reads, introns were extracted and filtered by the GeMoMa modules ERE and DenoiseIntrons. Then, we independently ran the module GeMoMa pipeline for each reference species using MMseqs2 (Steinegger and Söding, 2017) as alignment tools and including the mapped RNAseq data. Finally, the 11 gene annotations were combined into a final annotation by using the GeMoMa modules GAF and AnnotationFinalizer.
Predicted genes were annotated by BLAST search against the Swiss-Prot database with an e-value cutoff of 10 −6 . InterProScan v5.39.77 (Quevillon et al., 2005) was used to predict motifs and domains, as well as Gene ontology (GO) terms.

Genome Size Validation
The calculated DNA content through flow cytometry experiments was 3.10 Gb, similar to previous flow cytometry studies (3.19 Gb;Wurster-Hill et al., 1988). The genome size estimation by read coverage resulted in 3.23 Gb. Although our draft genome assembly was smaller than the values obtained by flow cytometry and coverage, the assembly length obtained of 2.39 Gb was in the range of other Carnivora genomes ( Table 1,  Supplementary Table 2) and showed good completeness with 92.9% completely recovered BUSCOs. The difference regarding assembly vs. estimated genome size could be explained by the complex chromosome structure of the raccoon dog which presents large chromatin proximal regions and a fluctuating number of B chromosomes (Duke Becker et al., 2011;Makunin et al., 2018). Both uncommon structures in carnivores are mostly compound by repetitive elements that were most likely not properly resolved and collapsed. 1 | A. Genome assembly and annotation statistics for raccoon dog (Nyctereutes procyonoides) and comparison with related species. B. Repeat statistics: De novo and homology based repeat annotations as reported by RepeatMasker and RepeatModeler; Families of repeats included here are long interspersed nuclear elements (LINEs), short interspersed nuclear elements (SINEs), long tandem repeats (LTR), DNA repeats (DNA), unclassified (unknown) repeat families, small RNA repeats (SmRNA), and others (consisting of small, but classified repeat groups). The total is the total percentage of base pairs made up of repeats in each genome assembly, respectively. C. Number and percentage of functional annotated predicted protein-coding genes.

Vulpes vulpes Canis lupus familiaris
A. GENOME STATISTICS  Table 2). We also compared synteny between raccoon dog and dog genome assemblies by running Jupiterplot v1.0 (https://github.com/JustinChu/JupiterPlot). The Jupiterplot displays the largest 58 raccoon dog scaffolds, which covered more than 99% of the dog genome (Figure 1e). The colored bands represent synteny between both genome assemblies. The plot shows high synteny between both genomes with several genomic rearrangements and break points, some of them previously identified (Duke Becker et al., 2011). All these results makes the N. procyonoides genome the best genome recovered so far for the Vulpini tribe.

SARS-CoV-2
Animal cell infection by SARS-CoV-2 is determined by specificity between the receptor-binding domain (RBD) spike protein (Sprotein) of SARS-CoV-2 and the membrane proteins ACE2 (peptidase domain of angiostensin I converting enzyme 2) and TMPRSS2 (transmembrane serine protease) (Lam et al., 2020). We identified both proteins in the raccoon dog genome annotation, showing high similarity with dog and fox orthologues. ACE2 protein alignments between dog and raccoon dog showed 99.3% of similarity, with only 6 of 894 different amino acids (Supplementary Figure 1). Moreover, the affinity in the binding process between S-protein from SARS-CoV-2 and ACE2 have been found to be smaller for groups like canids (Canis, Vulpes), chiroptera (Rhinophus, Pteropus) and pangolins (Manis) among others due to the matching of 14 of the 20 key amino acids in human ACE2 protein (Luan et al., 2020a). However, the reported infections of SARS-CoV-2 in domestic dogs and ferrets (Elbe and Buckland-Merrett, 2017;Shu and McCauley, 2017;Shi et al., 2020) indicated that the raccoon dog can be considered as a potential host and vector for this virus along its natural distribution range in East Asia and also in its introduced populations within Europe.

ETHICS STATEMENT
Ethical review and approval was not required for the animal study because the animal was culled in full accordance to German hunting laws (waidgerecht), which means that unnecessary suffering was avoided. Moreover, the individual was not killed for the study. We used one that was killed anyway in accordance to the Convention on Biological Diversity CBD (in § 8h), that stipulates precaution, control and eradication of invasive species as a goal and task of nature conservation under international law. In 2000, the states committed themselves to developing national strategies in Decision V/8(6).

AUTHOR CONTRIBUTIONS
SK, JK, and MP conceived this study. JK and CG prepared the samples. CG conducted lab work. LC performed bioinformatic analyses and data statistics with support of TS. LC, JK, AJ, TS, and MP discussed and interpreted the data. LC wrote the manuscript and all authors commented and revised the manuscript.

FUNDING
The present study is a result of the Centre for Translational Biodiversity Genomics (LOEWE-TBG) and was supported through the program LOEWE-Landes-Offensive zur Entwicklung Wissenschaftlich-ökonomischer Exzellenz of