<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="brief-report" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">761791</article-id>
<article-id pub-id-type="doi">10.3389/fgene.2021.761791</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Brief Research Report</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Evaluation of Germline Structural Variant Calling Methods for Nanopore Sequencing Data</article-title>
<alt-title alt-title-type="left-running-head">Bolognini and Magi</alt-title>
<alt-title alt-title-type="right-running-head">Nanopore Sequencing Germline Variant Calling</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Bolognini</surname>
<given-names>Davide</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1419970/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Magi</surname>
<given-names>Alberto</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff1">
<label>
<sup>1</sup>
</label>Unit of Medical Genetics, Meyer Children&#x2019;s Hospital, <addr-line>Florence</addr-line>, <country>Italy</country>
</aff>
<aff id="aff2">
<label>
<sup>2</sup>
</label>Department of Information Engineering, University of Florence, <addr-line>Florence</addr-line>, <country>Italy</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/142023/overview">Ka-Chun Wong</ext-link>, City University of Hong Kong, China</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/39966/overview">Paola Bonizzoni</ext-link>, University of Milano-Bicocca, Italy</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/526441/overview">Jean-St&#xe9;phane Varr&#xe9;</ext-link>, Lille University of Science and Technology, France</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Davide Bolognini, <email>davidebolognini7@gmail.com</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Computational Genomics, a section of the journal Frontiers in Genetics</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>18</day>
<month>11</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>12</volume>
<elocation-id>761791</elocation-id>
<history>
<date date-type="received">
<day>20</day>
<month>08</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>11</day>
<month>10</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2021 Bolognini and Magi.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Bolognini and Magi</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these&#x20;terms.</p>
</license>
</permissions>
<abstract>
<p>Structural variants (SVs) are genomic rearrangements that involve at least 50 nucleotides and are known to have a serious impact on human health. While prior short-read sequencing technologies have often proved inadequate for a comprehensive assessment of structural variation, more recent long reads from Oxford Nanopore Technologies have already proved invaluable for the discovery of large SVs and hold the potential to facilitate the resolution of the full SV spectrum. With many long-read sequencing studies to follow, it is crucial to assess factors affecting current SV calling pipelines for nanopore sequencing data. In this brief research report, we evaluate and compare the performance of five long-read SV callers across four long-read aligners using both real and synthetic nanopore datasets. In particular, we focus on the effects of read alignment, sequencing coverage, and variant allele depth on the detection and genotyping of SVs of different types and size ranges and provide insights into precision and recall of SV callsets generated by integrating the various long-read aligners and SV callers. The computational pipeline we propose is publicly available at <ext-link ext-link-type="uri" xlink:href="https://github.com/davidebolo1993/EViNCe">https://github.com/davidebolo1993/EViNCe</ext-link> and can be adjusted to further evaluate future nanopore sequencing datasets.</p>
</abstract>
<kwd-group>
<kwd>bioinformatics</kwd>
<kwd>nanopore sequencing</kwd>
<kwd>genomics</kwd>
<kwd>structural variation</kwd>
<kwd>long reads</kwd>
</kwd-group>
<contract-sponsor id="cn001">Ministero della Salute<named-content content-type="fundref-id">10.13039/501100003196</named-content>
</contract-sponsor>
<contract-sponsor id="cn002">Associazione Italiana per la Ricerca sul Cancro<named-content content-type="fundref-id">10.13039/501100005010</named-content>
</contract-sponsor>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>Structural variants (SVs) are defined as DNA rearrangements &#x2265;50&#xa0;bp and include copy number variants (CNVs; deletions and duplications) as well as insertions, inversions, translocations, and more complex combinations of these events (<xref ref-type="bibr" rid="B2">Alkan et&#x20;al., 2011</xref>; <xref ref-type="bibr" rid="B39">Sudmant et&#x20;al., 2015</xref>). Although single nucleotide variants (SNVs) were initially thought to contribute the majority of genomic variation in humans (<xref ref-type="bibr" rid="B33">Sachidanandam et&#x20;al., 2001</xref>; <xref ref-type="bibr" rid="B46">Zou et&#x20;al., 2020</xref>), SVs can extend to well over megabases of sequence, accounting for more varying base pairs than any other class of sequence variants (<xref ref-type="bibr" rid="B16">Ho et&#x20;al., 2020</xref>).</p>
<p>Several studies have implicated SVs in human health, with associated phenotypes ranging from cognitive neurological disorders (<xref ref-type="bibr" rid="B32">Rovelet-Lecrux et&#x20;al., 2006</xref>; <xref ref-type="bibr" rid="B29">Pytte et&#x20;al., 2020</xref>) to obesity (<xref ref-type="bibr" rid="B41">Walters et&#x20;al., 2013</xref>) and cancer (<xref ref-type="bibr" rid="B24">Li et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B1">Aganezov et&#x20;al., 2020</xref>), among others (<xref ref-type="bibr" rid="B42">Weischenfeldt et&#x20;al., 2013</xref>).</p>
<p>Despite their importance, SVs have been largely understudied compared to SNVs because the dominant short-read sequencing technologies hinder their identification, especially in low-complexity regions, which are known to be SV hotspots (<xref ref-type="bibr" rid="B27">Mills et&#x20;al., 2011</xref>). Indeed, it has been shown that, from a computational perspective, repeats create ambiguities in short-read alignment and assembly, which in turn introduce errors in calling genetic variants (<xref ref-type="bibr" rid="B40">Treangen and Salzberg, 2011</xref>; <xref ref-type="bibr" rid="B26">Mantere et&#x20;al., 2019</xref>).</p>
<p>Long-read sequencing from Pacific Biosciences and Oxford Nanopore Technologies (ONT) has emerged in recent years (<xref ref-type="bibr" rid="B7">Chaisson et&#x20;al., 2015</xref>; <xref ref-type="bibr" rid="B18">Jain et&#x20;al., 2016</xref>) and has proved invaluable in identifying previously intractable DNA sequences (<xref ref-type="bibr" rid="B23">Li and Freudenberg, 2014</xref>; <xref ref-type="bibr" rid="B5">Bolognini et&#x20;al., 2020</xref>), closing gaps in human genome assemblies, and unraveling otherwise undetected SVs at population scale (<xref ref-type="bibr" rid="B4">Beyter et&#x20;al., 2020</xref>; <xref ref-type="bibr" rid="B43">Wu et&#x20;al., 2021</xref>).</p>
<p>The idea of sequencing DNA fragments using a protein nanopore dates back to the 1980s and culminated in the ONT MinION device being released in June 2014 (<xref ref-type="bibr" rid="B13">Deamer et&#x20;al., 2016</xref>). A single MinION flowcell has 512 sensors collecting measurements from 2048 nanopores and currently allows us to sequence a full human genome at 3-4X coverage with read lengths up to &#x223c;800&#xa0;kbp (<xref ref-type="bibr" rid="B17">Jain et&#x20;al., 2018</xref>). While low-coverage data can be used to detect CNVs at array resolution (<xref ref-type="bibr" rid="B25">Magi et&#x20;al., 2019</xref>), higher throughput facilitates the resolution of the full SV spectrum with base-pair resolution and can be achieved by combining multiple MinION runs (<xref ref-type="bibr" rid="B9">Cretu Stancu et&#x20;al., 2017</xref>) or by sequencing through the high-performance PromethION platform (<xref ref-type="bibr" rid="B10">De Coster et&#x20;al., 2019</xref>).</p>
<p>Thanks to the efforts of the Human Genome Structural Variation (<xref ref-type="bibr" rid="B8">Chaisson et&#x20;al., 2019</xref>) and Genome in a Bottle (GIAB) (<xref ref-type="bibr" rid="B45">Zook et&#x20;al., 2020</xref>) consortia, high-coverage nanopore sequencing data have been released to the research community together with high-quality SV callsets that enable an accurate estimation of precision and recall of SV calling methods. Moreover, with many other studies to follow, such as the All of Us research program and the Human Pangenome project (<xref ref-type="bibr" rid="B12">De Coster et&#x20;al., 2021</xref>), a thorough benchmark of available strategies for the identification and characterization of SVs from nanopore data is greatly needed.</p>
<p>In this article, we present an evaluation of current long-read SV calling pipelines applied to nanopore sequencing data. Specifically, we focus on germline SVs identified by read alignment&#x2013;based approaches and evaluate each SV caller&#x2019;s ability to detect genomic breakpoints of different SV types and size ranges and the effects of read alignment, sequencing coverage, variant allele depth, and integration of multiple call sets on SV detection and genotyping. A scalable workflow for SV calling based on the popular workflow language Snakemake (<xref ref-type="bibr" rid="B21">K&#xf6;ster and Rahmann, 2012</xref>) is available at <ext-link ext-link-type="uri" xlink:href="https://github.com/davidebolo1993/EViNCe">https://github.com/davidebolo1993/EViNCe</ext-link> and can be used to reproduce findings described in this article and adapted to future nanopore sequencing datasets.</p>
</sec>
<sec id="s2">
<title>2 Methods</title>
<p>The evaluation workflow used in this work is outlined briefly below. Additional details are provided in the accompanying <xref ref-type="sec" rid="s10">Supplementary Material</xref>.</p>
<p>We benchmarked 5 SV calling methods, namely, Sniffles (<xref ref-type="bibr" rid="B35">Sedlazeck et&#x20;al., 2018</xref>), SVIM (<xref ref-type="bibr" rid="B15">Heller and Vingron, 2019</xref>), cuteSV (<xref ref-type="bibr" rid="B20">Jiang et&#x20;al., 2020</xref>), npInv (<xref ref-type="bibr" rid="B37">Shao et&#x20;al., 2018</xref>), and pbsv (<ext-link ext-link-type="uri" xlink:href="https://github.com/PacificBiosciences/pbsv">https://github.com/PacificBiosciences/pbsv</ext-link>), using real ONT PromethION data released by the GIAB consortium for the NA24385 Ashkenazim individual (<xref ref-type="bibr" rid="B36">Shafin et&#x20;al., 2019</xref>) and synthetic ONT data generated using the SV simulator VISOR (<xref ref-type="bibr" rid="B6">Bolognini et&#x20;al., 2019</xref>), aligned to the GRCh37 and GRCh38 versions of the human reference genome, respectively, using the long-read aligners minimap2 (<xref ref-type="bibr" rid="B22">Li, 2018</xref>), NGMLR (<xref ref-type="bibr" rid="B35">Sedlazeck et&#x20;al., 2018</xref>), lra (<xref ref-type="bibr" rid="B30">Ren and Chaisson, 2021</xref>), and pbmm2 (<ext-link ext-link-type="uri" xlink:href="https://github.com/PacificBiosciences/pbmm2">https://github.com/PacificBiosciences/pbmm2</ext-link>) (<xref ref-type="sec" rid="s10">Supplementary Datasheet&#x20;S1</xref>).</p>
<p>By randomly down-sampling the original alignments and filtering the generated SV callsets on different numbers of reads supporting a reported SV, we evaluated the influence of various depths of coverage and variant allele depths on SV callers&#x2019; ability to detect genomic breakpoints and identify their genotype.</p>
<p>Precision and recall of the SV callsets generated by combining the different long-read aligners and SV callers were calculated using truvari (<ext-link ext-link-type="uri" xlink:href="https://github.com/spiralgenetics/truvari">https://github.com/spiralgenetics/truvari</ext-link>) against the truth SV callsets from GIAB and VISOR. In accordance with similar studies (<xref ref-type="bibr" rid="B14">Gong et&#x20;al., 2020</xref>), the following criteria were used to pick out true-positive calls: 1) the genomic position of the breakpoints identified for a candidate SV must be within a predefined reference distance (500&#xa0;bp) from at least one SV in the truth callset, 2) the SV type reported for the candidate SV must match the SV type of the SV in the truth callset, and 3) for genotyping, the genotype of the candidate SV must match the genotype of the SV in the truth callset. Candidate SVs absent (and, for genotyping, also those not having a matching genotype) from the truth callset were considered false positives, and <italic>vice versa</italic> for false negatives.</p>
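<p>The matching criteria above can be illustrated with a minimal Python sketch. It is not the truvari implementation used in this work: it assumes a simplified greedy one-to-one matching, and the names <italic>SV</italic>, <italic>is_match</italic>, and <italic>benchmark</italic> are hypothetical and introduced only for illustration.</p>
<preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SV:
    """Hypothetical, simplified SV record (not a full VCF entry)."""
    chrom: str
    start: int
    end: int
    svtype: str
    genotype: Optional[str] = None


def is_match(call: SV, truth: SV, refdist: int = 500,
             check_genotype: bool = False) -> bool:
    """Apply criteria 1-3: breakpoint distance, SV type, optional genotype."""
    if call.chrom != truth.chrom or call.svtype != truth.svtype:
        return False
    if abs(call.start - truth.start) > refdist:
        return False
    if abs(call.end - truth.end) > refdist:
        return False
    if check_genotype and call.genotype != truth.genotype:
        return False
    return True


def benchmark(calls: List[SV], truths: List[SV], refdist: int = 500,
              check_genotype: bool = False) -> dict:
    """Greedy one-to-one matching (an assumption of this sketch)."""
    used = set()  # indices of truth SVs already matched
    tp = 0
    for call in calls:
        for i, truth in enumerate(truths):
            if i not in used and is_match(call, truth, refdist, check_genotype):
                used.add(i)
                tp += 1
                break
    fp = len(calls) - tp          # candidate SVs without a truth match
    fn = len(truths) - len(used)  # truth SVs never matched
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    fscore = 2 * precision * recall / denom if denom else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "fscore": fscore}
```
]]></preformat>
<p>Calling <italic>benchmark</italic> with <italic>check_genotype</italic> set to true mirrors the stricter genotyping evaluation, in which a candidate SV with a matching position and type but a discordant genotype counts as a false positive.</p>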
</sec>
<sec id="s3">
<title>3 Results</title>
<sec id="s3-1">
<title>3.1 Nanopore Sequencing Datasets and Truth SV Callsets</title>
<p>For benchmarking, we used the ultra-long ONT data released by the GIAB consortium for the NA24385 individual. These data were generated by running three flow cells in parallel on the ONT PromethION sequencing platform and yielded &#x223c;157&#xa0;Gbp of throughput. A high-quality callset of insertions and deletions derived from short-, long-, and linked-read sequencing and optical mapping is available for the same individual on the human GRCh37 reference genome and was used as the truth callset. The NA24385 truth SV callset contains 12,745 SVs (with the FILTER &#x201c;PASS&#x201d;), divided into 7,281 insertions and 5,464 deletions with the size ranges reported in <xref ref-type="sec" rid="s10">Supplementary Table&#x20;S1</xref>.</p>
<p>Since not all the SV types are included in the NA24385 truth callset, we additionally generated synthetic ONT data (&#x223c;154&#xa0;Gbp throughput), which we refer to as SI00001 from now on, harboring deletions and insertions as well as inversions, duplications, and translocations using the SV simulator VISOR (average length and standard deviation of reads were &#x223c;15,000&#x20;bp and 12,000 bp, respectively, and minimum and maximum identity of sequences were set to &#x223c;88% and &#x223c;98%). In greater detail, we inserted 10,676 randomly generated heterozygous SVs in chromosomes 1 to 22, X and Y of the human GRCh38 reference genome, divided into 5,027 deletions, 5,027 insertions, 300 duplications, 300 inversions, and 22&#x20;cut-paste translocations (<xref ref-type="sec" rid="s10">Supplementary Table&#x20;S1</xref>).</p>
<p>Further references to the data mentioned above are available in the Data Availability Statement section.</p>
</sec>
<sec id="s3-2">
<title>3.2&#x20;Long-Read Aligners and SV Callers</title>
<p>NA24385 and SI00001 ONT reads were aligned to the GRCh37 and GRCh38 decoy versions of the human reference genome, respectively, using the long-read aligners minimap2, NGMLR, lra, and pbmm2 (<xref ref-type="sec" rid="s10">Supplementary Table S2</xref>). Read depth of the resulting alignments was calculated using mosdepth (<xref ref-type="bibr" rid="B28">Pedersen and Quinlan, 2017</xref>) and additional alignment statistics using NanoPack (<xref ref-type="bibr" rid="B11">De Coster et&#x20;al., 2018</xref>).</p>
<p>As shown in <xref ref-type="sec" rid="s10">Supplementary Figure S1</xref>, minimap2 produced the highest coverage alignments (&#x223c;138&#xa0;Gbp aligned in the NA24385 dataset and &#x223c;142&#xa0;Gbp in the SI00001 dataset) and lra the lowest (&#x223c;116&#xa0;Gbp aligned in the NA24385 dataset and &#x223c;135&#xa0;Gbp in the SI00001 dataset), with NGMLR and pbmm2 performing intermediately (&#x223c;127&#xa0;Gbp and &#x223c;128&#xa0;Gbp aligned in the NA24385 dataset; &#x223c;137&#xa0;Gbp and &#x223c;139&#xa0;Gbp aligned in the SI00001 dataset). Alignments from NGMLR and pbmm2 reached the highest N50 (&#x223c;63&#xa0;kbp for the NA24385 dataset and &#x223c;22&#xa0;kbp for the SI00001 dataset), minimap2 the lowest (&#x223c;49&#xa0;kbp for the NA24385 dataset and &#x223c;21&#xa0;kbp for the SI00001 dataset), and lra fell in between (&#x223c;55&#xa0;kbp for the NA24385 dataset and &#x223c;21&#xa0;kbp for the SI00001 dataset). Among the tested aligners, minimap2 was the fastest (&#x223c;26&#xa0;min to align 100,000 reads randomly sampled from the NA24385 dataset, using a single core on our SUSE Linux Enterprise Server; average of three consecutive measurements) and NGMLR the slowest (&#x223c;190&#xa0;min, &#x223c;7&#x20;times slower than minimap2), with lra and pbmm2 performing similarly (&#x223c;30 and &#x223c;34&#xa0;min, respectively).</p>
<p>We used the generated NA24385 and SI00001 alignments to benchmark 5 SV calling methods, namely, Sniffles, SVIM, cuteSV, npInv, and pbsv. We tuned the settings of the different SV callers to report only variants &#x2265;50 bp, having at least 2 reads supporting the identified SVs (<xref ref-type="sec" rid="s10">Supplementary Table&#x20;S2</xref>).</p>
<p>While Sniffles, SVIM, cuteSV, and pbsv can detect all SV types, npInv was developed specifically to identify inversions. Because the recommended aligner for pbsv is pbmm2, pbmm2 was not tested with other SV callers, and pbsv was not tested with other long-read aligners. Furthermore, because pbmm2 wraps minimap2 but uses lower gap penalties for SV discovery, results from pbsv are reported together with those obtained after minimap2 alignment in all figures and tables of this article.</p>
<p>
<xref ref-type="sec" rid="s10">Supplementary Table S3</xref> summarizes by SV type the SVs identified by the different combinations of long-read aligners and SV callers in the NA24385 and SI00001 datasets, before and after filtering for high-quality SVs (i.e.,&#x20;SVs with the FILTER &#x201c;PASS&#x201d; that fall in assembled chromosomes only and are supported by &#x2265;10 reads). The size distribution of the high-quality SVs is shown in <xref ref-type="sec" rid="s10">Supplementary Figure S2</xref> (NA24385) and <xref ref-type="sec" rid="s10">Supplementary Figure S3</xref> (SI00001). SVIM following minimap2 alignment detected more deletions (9,566) and insertions (12,818) than the other aligner&#x2013;SV caller combinations in the NA24385 dataset, pbsv more duplications (1,941), and cuteSV more inversions (156) and translocation breakpoints (37), following NGMLR and minimap2 alignment, respectively. For the SI00001 dataset, cuteSV following minimap2 alignment detected more deletions (4,763) and insertions (4,320), SVIM after minimap2 more duplications (358), cuteSV after NGMLR more inversions (590), and pbsv more translocation breakpoints (i.e.,&#x20;BNDs,&#x20;39).</p>
<p>For each aligner, we also calculated the number of SVs overlapping between the high-quality SV callsets and the corresponding truth callset using SURVIVOR (<xref ref-type="bibr" rid="B19">Jeffares et&#x20;al., 2017</xref>). SVs were considered to be shared among the callsets if their distance was &#x2264;500 bp, as measured pairwise between breakpoints, and their type was concordant. As shown in the resulting upset plots for minimap2 (<xref ref-type="sec" rid="s10">Supplementary Figure S4</xref>), the largest overlap is shared between the truth callset and the different SV callers (with the obvious exception of npInv) and mostly contains deletions (4,022 for the NA24385 dataset and 3,368 for SI00001) and insertions (4,054 for the NA24385 dataset and 3,101 for SI00001). Comparable results are obtained with the NGMLR (<xref ref-type="sec" rid="s10">Supplementary Figure S5</xref>) and lra (<xref ref-type="sec" rid="s10">Supplementary Figure S6</xref>) alignments.</p>
</sec>
<sec id="s3-3">
<title>3.3 Structural Variant Callers&#x2019; Performance</title>
<p>We calculated precision, recall, and F-score (i.e.,&#x20;the harmonic mean of precision and recall) of the generated high-quality SV callsets using truvari (see Methods and <xref ref-type="sec" rid="s10">Supplementary Datasheet S2</xref>). <xref ref-type="fig" rid="F1">Figure&#x20;1</xref> and <xref ref-type="sec" rid="s10">Supplementary Table S4</xref> show these findings for the datasets tested. CuteSV following NGMLR alignment reached the highest F-score at SV calling and genotyping (&#x223c;0.93 and &#x223c;0.91, respectively) in the NA24385 dataset; for the SI00001 dataset, SVIM after minimap2 reached the highest F-score at SV calling (&#x223c;0.93) while cuteSV after NGMLR reached the highest F-score at SV genotyping (&#x223c;0.92). Overall, cuteSV, SVIM, and pbsv performed similarly well at SV calling, with F-score values &#x223c;0.90. With the obvious exception of npInv, which is specifically tailored to identify inversions, Sniffles achieved the lowest recall, especially in the SI00001 dataset after lra alignment.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Precision (y axis), recall (x axis) and F-score (dashed lines) of the high-quality SV callsets from Sniffles, SVIM, cuteSV, npInv and pbsv (hue palette) after minimap2 (top panels), NGMLR (mid panels) and lra (bottom panels) alignments. Results for both SV calling (left panels) and genotyping (right panels) in the NA24385 (circle symbol) and SI00001 (triangle symbol) datasets are shown.</p>
</caption>
<graphic xlink:href="fgene-12-761791-g001.tif"/>
</fig>
<p>
<xref ref-type="sec" rid="s10">Supplementary Figure S7</xref> illustrates precision and recall of the high-quality SV callsets when resolved by SV type. CuteSV, SVIM, and pbsv compared favorably to Sniffles for the detection of deletions in both NA24385 and SI00001 datasets (F-score <inline-formula id="inf1">
<mml:math id="m1">
<mml:mo>&#x3e;</mml:mo>
</mml:math>
</inline-formula>0.90 vs. F-score <inline-formula id="inf2">
<mml:math id="m2">
<mml:mo>&#x3c;</mml:mo>
</mml:math>
</inline-formula>0.90), while for insertions, only cuteSV, after NGMLR in the NA24385 dataset and after NGMLR/minimap2 in SI00001, hit F-score <inline-formula id="inf3">
<mml:math id="m3">
<mml:mo>&#x3e;</mml:mo>
</mml:math>
</inline-formula>0.90. With respect to duplications, SVIM and cuteSV following NGMLR alignment outperformed the other combinations, and for inversions, SVIM after minimap2, Sniffles after minimap2 and NGMLR, npInv after minimap2, and pbsv reached F-score <inline-formula id="inf4">
<mml:math id="m4">
<mml:mo>&#x3e;</mml:mo>
</mml:math>
</inline-formula>0.90. Last, pbsv and SVIM following minimap2 alignment had the highest F-score for the detection of translocations (F &#x223c;0.90). Notably, no high-quality duplications or translocations were reported by any SV callers when tested on alignments from&#x20;lra.</p>
<p>The number of true-positive, false-positive, and false-negative SV calls relative to their length is reported in <xref ref-type="sec" rid="s10">Supplementary Figure S8</xref>. The peaks in the NA24385 dataset at &#x223c;300&#xa0;bp and &#x223c;6,000&#xa0;bp correspond to SVs involving Alu and L1 elements, respectively (<xref ref-type="bibr" rid="B3">Audano et&#x20;al., 2019</xref>), while those in the SI00001 dataset at &#x223c;1,000&#xa0;bp and &#x223c;10,000&#xa0;bp correspond to the average size of simulated&#x20;SVs.</p>
<p>By randomly down-sampling the original NA24385 and SI00001 alignments to various depths of coverage (i.e.,&#x20;5X, 10X, 15X, 20X, 25X, and 35X), we evaluated the influence of genome coverage on SV callers&#x2019; precision and recall. <xref ref-type="fig" rid="F2">Figure&#x20;2</xref> (NA24385), <xref ref-type="sec" rid="s10">Supplementary Figure S9</xref> (SI00001), and <xref ref-type="sec" rid="s10">Supplementary Table S5</xref> show these findings. While recall for both SV calling and genotyping increased substantially when moving from low (i.e.,&#x20;5X) to mid-range (i.e.,&#x20;15X-20X) coverage, this effect was less marked for higher depths of coverage and came at the cost of a reduction in precision for Sniffles (NA24385 dataset) and SVIM (NA24385 and SI00001 datasets). For low-coverage NA24385 data, cuteSV after NGMLR alignment hit the highest F-score (&#x223c;0.80 for SV calling and &#x223c;0.72 for genotyping) and Sniffles after lra the lowest (&#x223c;0.60 for SV calling and &#x223c;0.28 for SV genotyping). For low-coverage SI00001 data, cuteSV after NGMLR alignment hit the highest F-score at SV calling (&#x223c;0.70) and pbsv the lowest (&#x223c;0.43), while Sniffles after NGMLR reached the highest F-score at SV genotyping (&#x223c;0.61) and SVIM after lra the lowest (&#x223c;0.32).</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Precision (y axis), recall (x axis) and F-score (dashed lines) of the SV callers Sniffles (square symbol), SVIM (cross symbol), cuteSV (circle symbol) and pbsv (triangle symbol) after minimap2 (top panels), NGMLR (mid panels) and lra (bottom panels) alignments. Results for both SV calling (left panels) and genotyping (right panels) are reported. The plot shows the influence of average genome coverage after down-sampling NA24385 alignments to different fractions (5X, 10X, 15X, 20X, 25X, 35X&#x2013;hue palette) of the original coverage (total) on SV callers&#x0027; performances.</p>
</caption>
<graphic xlink:href="fgene-12-761791-g002.tif"/>
</fig>
<p>We furthermore examined how filtering on the minimum number of reads supporting the variant alleles affects precision and recall of SV callers. As expected, for all the combinations tested in the NA24385 (<xref ref-type="fig" rid="F3">Figure&#x20;3</xref>) and SI00001 (<xref ref-type="sec" rid="s10">Supplementary Figure S10</xref>) datasets (see also <xref ref-type="sec" rid="s10">Supplementary Table S6</xref>), recall was highest (and precision lowest) when the minimum support required for a candidate SV was lowest (i.e.,&#x20;2 reads) and decreased (while precision increased) when higher support was required (i.e.,&#x20;up to 50 supporting reads for the NA24385 dataset and up to 25 for SI00001). In our tests, a good trade-off between precision and recall was achieved when SVs were supported by a minimum of 5&#x2013;10 reads (see above for details on the F-score when filtering on a minimum of 10 supporting reads).</p>
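<p>The trade-off described above can be reproduced on toy data with the following Python sketch. It is an illustration only, under the assumption that each candidate SV has already been labeled as present in or absent from the truth callset and that each true call corresponds to a distinct truth SV; the function name <italic>sweep_min_support</italic> and the record keys are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, Iterable, List, Tuple


def sweep_min_support(calls: List[Dict], n_truth: int,
                      thresholds: Iterable[int] = (2, 5, 10)
                      ) -> List[Tuple[int, float, float]]:
    """Return (min_support, precision, recall) for each threshold.

    calls   -- hypothetical records with keys "support" (number of reads
               supporting the SV) and "in_truth" (True if the SV matches
               the truth callset)
    n_truth -- total number of SVs in the truth callset
    """
    rows = []
    for min_support in thresholds:
        # Keep only candidate SVs with enough supporting reads.
        kept = [c for c in calls if c["support"] >= min_support]
        tp = sum(1 for c in kept if c["in_truth"])
        precision = tp / len(kept) if kept else 0.0
        recall = tp / n_truth if n_truth else 0.0
        rows.append((min_support, precision, recall))
    return rows
```
]]></preformat>
<p>Raising the threshold can only remove calls, so recall never increases, whereas precision typically rises because weakly supported calls are enriched in false positives.</p>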
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Precision (y axis), recall (x axis) and F-score (dashed lines) of the SV callers Sniffles (square symbol), SVIM (cross symbol), cuteSV (circle symbol) and pbsv (triangle symbol) after minimap2 (top panels), NGMLR (mid panels) and lra (bottom panels) alignments. Results for both SV calling (left panels) and genotyping (right panels) are reported. The plot shows the influence of the number of reads minimally supporting a SV (2, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50&#x2013;hue palette) on SV callers&#x0027; performances for the NA24385 dataset.</p>
</caption>
<graphic xlink:href="fgene-12-761791-g003.tif"/>
</fig>
<p>Last, we evaluated how much we could reduce the false-positive rate of individual SV callsets by integrating multiple SV callers for the same sample. For each aligner tested, we calculated precision and recall of all the combinations of the SV callers tested, as shown in <xref ref-type="fig" rid="F4">Figure&#x20;4</xref> for the NA24385 dataset and <xref ref-type="sec" rid="s10">Supplementary Figure S11</xref> for the SI00001 dataset. An additional consensus callset including SVs supported by at least 4 (minimap2) or 3 (NGMLR and lra) SV callers and at least 2&#x20;long-read aligners was also produced. Results are documented in <xref ref-type="sec" rid="s10">Supplementary Table S7</xref> as well. The different combinations of SV callsets were generated using SURVIVOR, following the strategy described in the previous section. For the NA24385 dataset, combining high-quality SV calls from cuteSV, Sniffles, and SVIM led to a &#x223c;2% increase in precision at SV calling and a &#x223c;3% increase at SV genotyping with respect to the corresponding highest precision values reached by Sniffles (&#x223c;0.96) and cuteSV (&#x223c;0.92), respectively, after NGMLR alignment. For SV calling, the consensus callset reached comparable precision (&#x223c;0.96) but improved on recall (&#x223c;0.89) with respect to the other combinations tested. For the SI00001 dataset, combining multiple SV callsets did not significantly improve on the precision of single SV callsets. For instance, most of the combinations tested hit &#x223c;1.00 precision at SV calling, but Sniffles alone after NGMLR already reached precision <inline-formula id="inf5">
<mml:math id="m5">
<mml:mo>&#x3e;</mml:mo>
</mml:math>
</inline-formula>0.99.</p>
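<p>The idea of keeping only SVs supported by a minimum number of SV callers can be sketched in Python as follows. This is a simplified greedy clustering on a single breakpoint position, not the SURVIVOR algorithm used in this work, and it does not model the additional requirement of support from at least 2 long-read aligners; the function name <italic>consensus</italic> and the tuple layout are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, List, Tuple

Call = Tuple[str, int, str]  # hypothetical layout: (chromosome, position, SV type)


def consensus(callsets: Dict[str, List[Call]], min_callers: int,
              max_dist: int = 500) -> List[Call]:
    """Keep one representative per cluster seen by >= min_callers callers.

    Two calls join the same cluster when chromosome and SV type agree and
    their positions differ by at most max_dist bp from the cluster's
    first (representative) call.
    """
    clusters = []  # each item: {"rep": Call, "callers": set of caller names}
    for caller, calls in callsets.items():
        for chrom, pos, svtype in calls:
            for cluster in clusters:
                r_chrom, r_pos, r_type = cluster["rep"]
                if (r_chrom == chrom and r_type == svtype
                        and abs(r_pos - pos) <= max_dist):
                    cluster["callers"].add(caller)
                    break
            else:
                # No compatible cluster found: start a new one.
                clusters.append({"rep": (chrom, pos, svtype),
                                 "callers": {caller}})
    return [c["rep"] for c in clusters if len(c["callers"]) >= min_callers]
```
]]></preformat>
<p>Requiring more callers per cluster trades recall for precision, which is the behavior examined in the combinations reported here.</p>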
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Precision (y axis), recall (x axis) and F-score (dashed lines) of the combination of the SV callers Sniffles, SVIM, cuteSV and pbsv (hue palette) after minimap2, NGMLR and lra alignments as well as after consensus generation (top-to-bottom panels). Results for both SV calling (left panels) and genotyping (right panels) are reported. The plot shows the influence of the integration of multiple high-quality callsets on reducing false positive calls in the NA24385 dataset.</p>
</caption>
<graphic xlink:href="fgene-12-761791-g004.tif"/>
</fig>
</sec>
</sec>
<sec id="s4">
<title>4 Discussion</title>
<p>While short-read sequencing has been considered the gold standard for the majority of sequencing projects for years (<xref ref-type="bibr" rid="B31">Roberts et&#x20;al., 2021</xref>), such data have biases in whole-genome sequencing studies due to the uneven coverage of regions with high/low GC and difficulty of mapping short reads in low-complexity regions. Long-read sequencing has already proved invaluable in overcoming these limitations, improving on short reads for the resolution of SVs in comparative and clinical studies (<xref ref-type="bibr" rid="B34">Sanchis-Juan et&#x20;al., 2018</xref>).</p>
<p>In this article, we provided a succinct yet comprehensive evaluation of long-read SV calling pipelines applied to ONT data. In particular, we focused on germline SVs, and as such, our findings are likely not reproducible in different contexts, such as somatic variant calling, for which alternative strategies exist (<xref ref-type="bibr" rid="B38">Shiraishi et&#x20;al., 2020</xref>).</p>
<p>We tested four general-purpose SV callers (Sniffles, SVIM, cuteSV, and pbsv) and a tool tailored specifically to inversions (npInv) across four long-read aligners (minimap2, NGMLR, lra, and pbmm2) using both real and simulated ONT data. In particular, we used the ultra-long ONT reads released by the GIAB consortium for the NA24385 Ashkenazim individual, for which a truth set of deletions and insertions based on the integration of multiple technologies is available, and synthetic long reads generated using the SV simulator VISOR (SI00001) to complement SVs missing in the real dataset (inversions, duplications, and translocations). The NA24385 truth set from GIAB is assumed to be sufficiently complete, an assumption supported by the fact that the majority of the SVs identified by the different SV calling pipelines are shared with the ground truth. Nevertheless, a considerable number of deletions and insertions identified by multiple SV callers are absent from the truth callset, suggesting that at least some of them could have been missed in the ground&#x20;truth.</p>
<p>We first calculated the precision, recall, and F-score of the different SV calling pipelines after filtering for high-quality variants (&#x201c;PASS&#x201d; SVs not falling in decoy contigs and supported by at least 10 reads&#x2014;which is the default for SV callers like Sniffles and cuteSV) and evaluated the impact of each SV type and various SV sizes on the SV callers&#x2019; performance. In accordance with prior evaluations (<xref ref-type="bibr" rid="B10">De Coster et&#x20;al., 2019</xref>; <xref ref-type="bibr" rid="B44">Zhou et&#x20;al., 2019</xref>), we observed the highest precision at SV calling with Sniffles following NGMLR alignment in both the NA24385 (&#x223c;0.96) and SI00001 (&#x223c;0.99) datasets but at the cost of low recall, with most of the false-negative SVs being shorter than 500&#x20;bp in the real dataset. However, cuteSV, SVIM, and pbsv all performed better than Sniffles in terms of the F-score across the different aligners, and Sniffles also hit the lowest F-score values at SV genotyping in both datasets. When taking into account the individual SV types, cuteSV, SVIM, and pbsv had the best performance for the detection of deletions, and cuteSV after NGMLR (NA24385 and SI00001) and minimap2 (SI00001) hit the highest F-score for the detection of insertions and duplications together with SVIM after NGMLR. For inversions, SVIM (after minimap2), Sniffles, npInv (after minimap2 or NGMLR), and pbsv all hit an F-score of &#x223c;0.9 or higher, while SVIM after minimap2 and pbsv performed better than the other aligner&#x2013;SV caller combinations for the detection of translocation breakpoints. Notably, none of the SV callers tested were able to identify high-quality duplications or translocations after lra alignment, especially in the SI00001 dataset where they are known to occur. Manual investigation of the variant files generated by the different SV callers before filtering revealed that most duplications and translocations had few supporting reads (<inline-formula id="inf6">
<mml:math id="m6">
<mml:mo>&#x3c;</mml:mo>
</mml:math>
</inline-formula>3 in most cases) and were not flagged as &#x201c;PASS.&#x201d; As a consequence, being more permissive with the filters used could improve the detection of these SV categories in datasets aligned with&#x20;lra.</p>
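<p>The high-quality filter described above (a &#x201c;PASS&#x201d; flag, no decoy contig, and at least 10 supporting reads) can be sketched as follows. This is a simplified illustration rather than the code used in the study. The INFO keys assumed to hold the read support (<monospace>RE</monospace> and <monospace>SUPPORT</monospace>) and the decoy naming convention are assumptions that should be checked against the VCF header of each SV caller and against the reference genome in use; callers that report read support elsewhere in the record would need a different accessor. All function names are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
# Illustrative sketch only; key names and decoy naming are assumptions.
MIN_SUPPORT = 10                  # minimum supporting reads (from the text)
SUPPORT_KEYS = ("RE", "SUPPORT")  # assumed INFO keys holding read support


def is_decoy(contig):
    """Assumed naming convention for decoy contigs."""
    return contig == "hs37d5" or contig.endswith("_decoy")


def read_support(info):
    """Extract the read support from a VCF INFO string, or None if absent."""
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key in SUPPORT_KEYS and value.isdigit():
            return int(value)
    return None


def is_high_quality(vcf_line):
    """Apply the three filters to one tab-separated VCF record."""
    fields = vcf_line.rstrip("\n").split("\t")
    contig, filt, info = fields[0], fields[6], fields[7]
    support = read_support(info)
    return (
        filt == "PASS"
        and not is_decoy(contig)
        and support is not None
        and support >= MIN_SUPPORT
    )


# Hypothetical record: a PASS deletion on chr1 supported by 12 reads.
record = "chr1\t10000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-300;RE=12"
print(is_high_quality(record))
```
]]></preformat>
<p>Lowering <monospace>MIN_SUPPORT</monospace> or relaxing the &#x201c;PASS&#x201d; requirement in such a filter corresponds to the more permissive setting discussed above for duplications and translocations after lra alignment.</p>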
<p>When further evaluating the influence of genome coverage on the SV callers&#x2019; performance, we found that increasing coverage beyond 15X&#x2013;20X produced only a small gain in sensitivity but was associated with a marked decrease in precision for Sniffles and SVIM. On the other hand, a slight increase in precision could be reached by combining multiple callsets, and the consensus from cuteSV, Sniffles, and SVIM after NGMLR hit the highest precision values in our tests. Last, we highlighted that requiring at least 5&#x2013;10 reads supporting a called SV represents a good trade-off between precision and recall.</p>
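<p>For reference, the precision, recall, and F-score values discussed in this section can be derived from the counts of true positive, false positive, and false negative calls. The sketch below assumes that the F-score is the harmonic mean of precision and recall (the F1 score); the counts in the example are hypothetical and are not taken from our results.</p>
<preformat preformat-type="code"><![CDATA[
```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from call counts.

    tp: calls matching the truth set, fp: calls absent from the truth set,
    fn: truth set SVs that were not called. Ratios with an empty
    denominator are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only.
p, r, f = precision_recall_f1(tp=90, fp=10, fn=30)
print(round(p, 2), round(r, 2), round(f, 2))
```
]]></preformat>
<p>With these hypothetical counts, precision is 0.90 and recall is 0.75, which shows how a callset can be precise while still missing a quarter of the true SVs.</p>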
<p>Given the results presented in this article, we recommend using cuteSV for the initial assessment of the data as it achieves substantial precision and recall at both SV calling and genotyping, even when analyzing low-coverage data. Because the choice of pipeline depends on whether users need SV calls with high precision or high recall, we conclude that Sniffles should be preferred when looking for high precision, while cuteSV or SVIM should be preferred when high recall is required. However, due to low F-score values at SV genotyping, we do not recommend using Sniffles for an accurate estimation of the zygosity of SV calls, as it often misclassified or missed heterozygous variants in our tests. Combining SV calls from cuteSV, Sniffles, and SVIM can be helpful in further reducing the final false positive&#x20;rate.</p>
</sec>
</body>
<back>
<sec id="s5">
<title>Data Availability Statement</title>
<p>The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/<xref ref-type="sec" rid="s10">Supplementary Material</xref>.</p>
</sec>
<sec id="s6">
<title>Author Contributions</title>
<p>DB conceived the workflow, benchmarked the pipelines, and wrote the manuscript draft. AM contributed to the interpretation of the results, provided critical feedback, and helped to write the manuscript. All the authors read and approved the manuscript.</p>
</sec>
<sec id="s7">
<title>Funding</title>
<p>DB is supported by the Italian Ministry of Health grant SG-2019&#x2013;12369962. AM is supported by AIRC grant 20307. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</p>
</sec>
<sec sec-type="COI-statement" id="s8">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s9">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ack>
<p>The authors thank the GIAB consortium for data access.</p>
</ack>
<sec id="s10">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fgene.2021.761791/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fgene.2021.761791/full&#x23;supplementary-material</ext-link>.</p>
<supplementary-material xlink:href="DataSheet1.PDF" id="SM1" mimetype="application/PDF" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aganezov</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Goodwin</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Sherman</surname>
<given-names>R. M.</given-names>
</name>
<name>
<surname>Sedlazeck</surname>
<given-names>F. J.</given-names>
</name>
<name>
<surname>Arun</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Bhatia</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Comprehensive Analysis of Structural Variants in Breast Cancer Genomes Using Single-Molecule Sequencing</article-title>. <source>Genome Res.</source> <volume>30</volume>, <fpage>1258</fpage>&#x2013;<lpage>1273</lpage>. <pub-id pub-id-type="doi">10.1101/gr.260497.119</pub-id> </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Alkan</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Coe</surname>
<given-names>B. P.</given-names>
</name>
<name>
<surname>Eichler</surname>
<given-names>E. E.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Genome Structural Variation Discovery and Genotyping</article-title>. <source>Nat. Rev. Genet.</source> <volume>12</volume>, <fpage>363</fpage>&#x2013;<lpage>376</lpage>. <pub-id pub-id-type="doi">10.1038/nrg2958</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Audano</surname>
<given-names>P. A.</given-names>
</name>
<name>
<surname>Sulovari</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Graves-Lindsay</surname>
<given-names>T. A.</given-names>
</name>
<name>
<surname>Cantsilieris</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Sorensen</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Welch</surname>
<given-names>A. E.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Characterizing the Major Structural Variant Alleles of the Human Genome</article-title>. <source>Cell</source> <volume>176</volume>, <fpage>663</fpage>&#x2013;<lpage>675</lpage>. <pub-id pub-id-type="doi">10.1016/j.cell.2018.12.019</pub-id> </citation>
</ref>
<ref id="B4">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Beyter</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Ingimundardottir</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Oddsson</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Eggertsson</surname>
<given-names>H. P.</given-names>
</name>
<name>
<surname>Bjornsson</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Jonsson</surname>
<given-names>H.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <source>Long Read Sequencing of 3,622 Icelanders Provides Insight into the Role of Structural Variants in Human Diseases and Other Traits</source>. <publisher-name>Cold Spring Harbor Laboratory</publisher-name>. <comment>bioRxiv Available at: <ext-link ext-link-type="uri" xlink:href="https://www.biorxiv.org/content/early/2020/12/14/848366.full.pdf">https://www.biorxiv.org/content/early/2020/12/14/848366.full.pdf</ext-link>
</comment>. </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bolognini</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Magi</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Benes</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Korbel</surname>
<given-names>J.&#x20;O.</given-names>
</name>
<name>
<surname>Rausch</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>TRiCoLOR: Tandem Repeat Profiling Using Whole-Genome Long-Read Sequencing Data</article-title>. <source>GigaScience</source> <volume>9</volume>, <fpage>giaa101</fpage>. <pub-id pub-id-type="doi">10.1093/gigascience/giaa101</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bolognini</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Sanders</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Korbel</surname>
<given-names>J.&#x20;O.</given-names>
</name>
<name>
<surname>Magi</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Benes</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Rausch</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>VISOR: a Versatile Haplotype-Aware Structural Variant Simulator for Short- and Long-Read Sequencing</article-title>. <source>Bioinformatics</source> <volume>36</volume>, <fpage>1267</fpage>&#x2013;<lpage>1269</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btz719</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chaisson</surname>
<given-names>M. J.&#x20;P.</given-names>
</name>
<name>
<surname>Huddleston</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Dennis</surname>
<given-names>M. Y.</given-names>
</name>
<name>
<surname>Sudmant</surname>
<given-names>P. H.</given-names>
</name>
<name>
<surname>Malig</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Hormozdiari</surname>
<given-names>F.</given-names>
</name>
<etal/>
</person-group> (<year>2015</year>). <article-title>Resolving the Complexity of the Human Genome Using Single-Molecule Sequencing</article-title>. <source>Nature</source> <volume>517</volume>, <fpage>608</fpage>&#x2013;<lpage>611</lpage>. <pub-id pub-id-type="doi">10.1038/nature13907</pub-id> </citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chaisson</surname>
<given-names>M. J.&#x20;P.</given-names>
</name>
<name>
<surname>Sanders</surname>
<given-names>A. D.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Malhotra</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Porubsky</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Rausch</surname>
<given-names>T.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Multi-platform Discovery of Haplotype-Resolved Structural Variation in Human Genomes</article-title>. <source>Nat. Commun.</source> <volume>10</volume>, <fpage>1784</fpage>. <pub-id pub-id-type="doi">10.1038/s41467-018-08148-z</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cretu Stancu</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>van Roosmalen</surname>
<given-names>M. J.</given-names>
</name>
<name>
<surname>Renkens</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Nieboer</surname>
<given-names>M. M.</given-names>
</name>
<name>
<surname>Middelkamp</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>de Ligt</surname>
<given-names>J.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <article-title>Mapping and Phasing of Structural Variation in Patient Genomes Using Nanopore Sequencing</article-title>. <source>Nat. Commun.</source> <volume>8</volume>, <fpage>1326</fpage>. <pub-id pub-id-type="doi">10.1038/s41467-017-01343-4</pub-id> </citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>De Coster</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>De Rijk</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>De Roeck</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>De Pooter</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>D&#x2019;Hert</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Strazisar</surname>
<given-names>M.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Structural Variants Identified by Oxford Nanopore PromethION Sequencing of the Human Genome</article-title>. <source>Genome Res.</source> <volume>29</volume>, <fpage>1178</fpage>. <pub-id pub-id-type="doi">10.1101/gr.244939.118</pub-id> </citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>De Coster</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>D&#x2019;Hert</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Schultz</surname>
<given-names>D. T.</given-names>
</name>
<name>
<surname>Cruts</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Van Broeckhoven</surname>
<given-names>C.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>NanoPack: Visualizing and Processing Long-Read Sequencing Data</article-title>. <source>Bioinformatics</source> <volume>34</volume>, <fpage>2666</fpage>&#x2013;<lpage>2669</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bty149</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>De Coster</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Weissensteiner</surname>
<given-names>M. H.</given-names>
</name>
<name>
<surname>Sedlazeck</surname>
<given-names>F. J.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Towards Population-Scale Long-Read Sequencing</article-title>. <source>Nat. Rev. Genet.</source> <volume>22</volume>, <fpage>527</fpage>. <pub-id pub-id-type="doi">10.1038/s41576-021-00367-3</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Deamer</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Akeson</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Branton</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>Three Decades of Nanopore Sequencing</article-title>. <source>Nat. Biotechnol.</source> <volume>34</volume>, <fpage>518</fpage>&#x2013;<lpage>524</lpage>. <pub-id pub-id-type="doi">10.1038/nbt.3423</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gong</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Hayes</surname>
<given-names>V. M.</given-names>
</name>
<name>
<surname>Chan</surname>
<given-names>E. K. F.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Detection of Somatic Structural Variants from Short-Read Next-Generation Sequencing Data</article-title>. <source>Brief. Bioinform.</source> <volume>22</volume>, <fpage>bbaa056</fpage>. <pub-id pub-id-type="doi">10.1093/bib/bbaa056</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Heller</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Vingron</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>SVIM: Structural Variant Identification Using Mapped Long Reads</article-title>. <source>Bioinformatics</source> <volume>35</volume>, <fpage>2907</fpage>&#x2013;<lpage>2915</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btz041</pub-id> </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ho</surname>
<given-names>S. S.</given-names>
</name>
<name>
<surname>Urban</surname>
<given-names>A. E.</given-names>
</name>
<name>
<surname>Mills</surname>
<given-names>R. E.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Structural Variation in the Sequencing Era</article-title>. <source>Nat. Rev. Genet.</source> <volume>21</volume>, <fpage>171</fpage>&#x2013;<lpage>189</lpage>. <pub-id pub-id-type="doi">10.1038/s41576-019-0180-9</pub-id> </citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jain</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Koren</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Miga</surname>
<given-names>K. H.</given-names>
</name>
<name>
<surname>Quick</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Rand</surname>
<given-names>A. C.</given-names>
</name>
<name>
<surname>Sasani</surname>
<given-names>T. A.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>Nanopore Sequencing and Assembly of a Human Genome with Ultra-long Reads</article-title>. <source>Nat. Biotechnol.</source> <volume>36</volume>, <fpage>338</fpage>&#x2013;<lpage>345</lpage>. <pub-id pub-id-type="doi">10.1038/nbt.4060</pub-id> </citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jain</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Olsen</surname>
<given-names>H. E.</given-names>
</name>
<name>
<surname>Paten</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Akeson</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2016</year>). <article-title>The Oxford Nanopore MinION: Delivery of Nanopore Sequencing to the Genomics Community</article-title>. <source>Genome Biol.</source> <volume>17</volume>, <fpage>239</fpage>. <pub-id pub-id-type="doi">10.1186/s13059-016-1103-0</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jeffares</surname>
<given-names>D. C.</given-names>
</name>
<name>
<surname>Jolly</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Hoti</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Speed</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Shaw</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Rallis</surname>
<given-names>C.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <article-title>Transient Structural Variations Have Strong Effects on Quantitative Traits and Reproductive Isolation in Fission Yeast</article-title>. <source>Nat. Commun.</source> <volume>8</volume>, <fpage>14061</fpage>. <pub-id pub-id-type="doi">10.1038/ncomms14061</pub-id> </citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jiang</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Cui</surname>
<given-names>Z.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Long-read-based Human Genomic Structural Variation Detection with cuteSV</article-title>. <source>Genome Biol.</source> <volume>21</volume>, <fpage>189</fpage>. <pub-id pub-id-type="doi">10.1186/s13059-020-02107-y</pub-id> </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>K&#xf6;ster</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Rahmann</surname>
<given-names>S.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Snakemake&#x2014;a Scalable Bioinformatics Workflow Engine</article-title>. <source>Bioinformatics</source> <volume>28</volume>, <fpage>2520</fpage>&#x2013;<lpage>2522</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bts480</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Minimap2: Pairwise Alignment for Nucleotide Sequences</article-title>. <source>Bioinformatics</source> <volume>34</volume>, <fpage>3094</fpage>&#x2013;<lpage>3100</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bty191</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Freudenberg</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Mappability and Read Length</article-title>. <source>Front. Genet.</source> <volume>5</volume>, <fpage>381</fpage>. <pub-id pub-id-type="doi">10.3389/fgene.2014.00381</pub-id> </citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Roberts</surname>
<given-names>N. D.</given-names>
</name>
<name>
<surname>Wala</surname>
<given-names>J.&#x20;A.</given-names>
</name>
<name>
<surname>Shapira</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Schumacher</surname>
<given-names>S. E.</given-names>
</name>
<name>
<surname>Kumar</surname>
<given-names>K.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Patterns of Somatic Structural Variation in Human Cancer Genomes</article-title>. <source>Nature</source> <volume>578</volume>, <fpage>112</fpage>&#x2013;<lpage>121</lpage>. <pub-id pub-id-type="doi">10.1038/s41586-019-1913-9</pub-id> </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Magi</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Bolognini</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Bartalucci</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Mingrino</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Semeraro</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Giovannini</surname>
<given-names>L.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <article-title>Nano-GLADIATOR: Real-Time Detection of Copy Number Alterations from Nanopore Sequencing Data</article-title>. <source>Bioinformatics</source> <volume>35</volume>, <fpage>4213</fpage>&#x2013;<lpage>4221</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btz241</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mantere</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Kersten</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Hoischen</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Long-Read Sequencing Emerging in Medical Genetics</article-title>. <source>Front. Genet.</source> <volume>10</volume>, <fpage>426</fpage>. <pub-id pub-id-type="doi">10.3389/fgene.2019.00426</pub-id> </citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mills</surname>
<given-names>R. E.</given-names>
</name>
<name>
<surname>Walter</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Stewart</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Handsaker</surname>
<given-names>R. E.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Alkan</surname>
<given-names>C.</given-names>
</name>
<etal/>
</person-group> (<year>2011</year>). <article-title>Mapping Copy Number Variation by Population-Scale Genome Sequencing</article-title>. <source>Nature</source> <volume>470</volume>, <fpage>59</fpage>&#x2013;<lpage>65</lpage>. <pub-id pub-id-type="doi">10.1038/nature09708</pub-id> </citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pedersen</surname>
<given-names>B. S.</given-names>
</name>
<name>
<surname>Quinlan</surname>
<given-names>A. R.</given-names>
</name>
</person-group> (<year>2017</year>). <article-title>Mosdepth: Quick Coverage Calculation for Genomes and Exomes</article-title>. <source>Bioinformatics</source> <volume>34</volume>, <fpage>867</fpage>&#x2013;<lpage>868</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btx699</pub-id> </citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pytte</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Flynn</surname>
<given-names>L. L.</given-names>
</name>
<name>
<surname>Anderton</surname>
<given-names>R. S.</given-names>
</name>
<name>
<surname>Mastaglia</surname>
<given-names>F. L.</given-names>
</name>
<name>
<surname>Theunissen</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>James</surname>
<given-names>I.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>Disease-modifying Effects of an SCAF4 Structural Variant in a Predominantly SOD1 ALS Cohort</article-title>. <source>Neurol. Genet.</source> <volume>6</volume>, <fpage>e470</fpage>. <pub-id pub-id-type="doi">10.1212/NXG.0000000000000470</pub-id> </citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ren</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Chaisson</surname>
<given-names>M. J.&#x20;P.</given-names>
</name>
</person-group> (<year>2021</year>). <article-title>Lra: A Long Read Aligner for Sequences and Contigs</article-title>. <source>PLOS Comput. Biol.</source> <volume>17</volume>, <fpage>1</fpage>&#x2013;<lpage>23</lpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1009078</pub-id> </citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Roberts</surname>
<given-names>H. E.</given-names>
</name>
<name>
<surname>Lopopolo</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Pagnamenta</surname>
<given-names>A. T.</given-names>
</name>
<name>
<surname>Sharma</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Parkes</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Lonie</surname>
<given-names>L.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Short and Long-Read Genome Sequencing Methodologies for Somatic Variant Detection; Genomic Analysis of a Patient with Diffuse Large B-Cell Lymphoma</article-title>. <source>Scientific Rep.</source> <volume>11</volume>, <fpage>6408</fpage>. <pub-id pub-id-type="doi">10.1038/s41598-021-85354-8</pub-id> </citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rovelet-Lecrux</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Hannequin</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Raux</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Meur</surname>
<given-names>N. L.</given-names>
</name>
<name>
<surname>Laquerri&#xe8;re</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Vital</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2006</year>). <article-title>APP Locus Duplication Causes Autosomal Dominant Early-Onset Alzheimer Disease with Cerebral Amyloid Angiopathy</article-title>. <source>Nat. Genet.</source> <volume>38</volume>, <fpage>24</fpage>&#x2013;<lpage>26</lpage>. <pub-id pub-id-type="doi">10.1038/ng1718</pub-id> </citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sachidanandam</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Weissman</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Schmidt</surname>
<given-names>S. C.</given-names>
</name>
<name>
<surname>Kakol</surname>
<given-names>J.&#x20;M.</given-names>
</name>
<name>
<surname>Stein</surname>
<given-names>L. D.</given-names>
</name>
<name>
<surname>Marth</surname>
<given-names>G.</given-names>
</name>
<etal/>
</person-group> (<year>2001</year>). <article-title>A Map of Human Genome Sequence Variation Containing 1.42 Million Single Nucleotide Polymorphisms</article-title>. <source>Nature</source> <volume>409</volume>, <fpage>928</fpage>&#x2013;<lpage>933</lpage>. <pub-id pub-id-type="doi">10.1038/35057149</pub-id> </citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sanchis-Juan</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Stephens</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>French</surname>
<given-names>C. E.</given-names>
</name>
<name>
<surname>Gleadall</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>M&#xe9;gy</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Penkett</surname>
<given-names>C.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>Complex Structural Variants in Mendelian Disorders: Identification and Breakpoint Resolution Using Short- and Long-Read Genome Sequencing</article-title>. <source>Genome Med.</source> <volume>10</volume>, <fpage>95</fpage>. <pub-id pub-id-type="doi">10.1186/s13073-018-0606-6</pub-id> </citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sedlazeck</surname>
<given-names>F. J.</given-names>
</name>
<name>
<surname>Rescheneder</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Smolka</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Fang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Nattestad</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>von Haeseler</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2018</year>). <article-title>Accurate Detection of Complex Structural Variations Using Single-Molecule Sequencing</article-title>. <source>Nat. Methods</source> <volume>15</volume>, <fpage>461</fpage>&#x2013;<lpage>468</lpage>. <pub-id pub-id-type="doi">10.1038/s41592-018-0001-7</pub-id> </citation>
</ref>
<ref id="B36">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Shafin</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Pesout</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Lorig-Roach</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Haukness</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Olsen</surname>
<given-names>H. E.</given-names>
</name>
<name>
<surname>Bosworth</surname>
<given-names>C.</given-names>
</name>
<etal/>
</person-group> (<year>2019</year>). <source>Efficient De Novo Assembly of Eleven Human Genomes Using PromethION Sequencing and a Novel Nanopore Toolkit</source>. <publisher-name>Cold Spring Harbor Laboratory</publisher-name>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://www.biorxiv.org/content/early/2019/07/26/715722.full.pdf">https://www.biorxiv.org/content/early/2019/07/26/715722.full.pdf</ext-link>
</comment>. </citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shao</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Ganesamoorthy</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Duarte</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Cao</surname>
<given-names>M. D.</given-names>
</name>
<name>
<surname>Hoggart</surname>
<given-names>C. J.</given-names>
</name>
<name>
<surname>Coin</surname>
<given-names>L. J. M.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>npInv: Accurate Detection and Genotyping of Inversions Using Long Read Sub-alignment</article-title>. <source>BMC Bioinformatics</source> <volume>19</volume>, <fpage>261</fpage>. <pub-id pub-id-type="doi">10.1186/s12859-018-2252-9</pub-id> </citation>
</ref>
<ref id="B38">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Shiraishi</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Koya</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Chiba</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Saito</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Okada</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Kataoka</surname>
<given-names>K.</given-names>
</name>
</person-group> (<year>2020</year>). <source>Precise Characterization of Somatic Structural Variations and Mobile Element Insertions from Paired Long-Read Sequencing Data with Nanomonsv</source>. <publisher-name>Cold Spring Harbor Laboratory</publisher-name>. <comment>bioRxiv. Available at: <ext-link ext-link-type="uri" xlink:href="https://www.biorxiv.org/content/early/2020/07/23/2020.07.22.214262.full.pdf">https://www.biorxiv.org/content/early/2020/07/23/2020.07.22.214262.full.pdf</ext-link>
</comment>. </citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sudmant</surname>
<given-names>P. H.</given-names>
</name>
<name>
<surname>Rausch</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Gardner</surname>
<given-names>E. J.</given-names>
</name>
<name>
<surname>Handsaker</surname>
<given-names>R. E.</given-names>
</name>
<name>
<surname>Abyzov</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Huddleston</surname>
<given-names>J.</given-names>
</name>
<etal/>
</person-group> (<year>2015</year>). <article-title>An Integrated Map of Structural Variation in 2,504 Human Genomes</article-title>. <source>Nature</source> <volume>526</volume>, <fpage>75</fpage>&#x2013;<lpage>81</lpage>. <pub-id pub-id-type="doi">10.1038/nature15394</pub-id> </citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Treangen</surname>
<given-names>T. J.</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>S. L.</given-names>
</name>
</person-group> (<year>2011</year>). <article-title>Repetitive DNA and Next-Generation Sequencing: Computational Challenges and Solutions</article-title>. <source>Nat. Rev. Genet.</source> <volume>13</volume>, <fpage>36</fpage>&#x2013;<lpage>46</lpage>. <pub-id pub-id-type="doi">10.1038/nrg3117</pub-id> </citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Walters</surname>
<given-names>R. G.</given-names>
</name>
<name>
<surname>Coin</surname>
<given-names>L. J. M.</given-names>
</name>
<name>
<surname>Ruokonen</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>de Smith</surname>
<given-names>A. J.</given-names>
</name>
<name>
<surname>El-Sayed Moustafa</surname>
<given-names>J. S.</given-names>
</name>
<name>
<surname>Jacquemont</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2013</year>). <article-title>Rare Genomic Structural Variants in Complex Disease: Lessons from the Replication of Associations with Obesity</article-title>. <source>PLOS ONE</source> <volume>8</volume>, <fpage>e58048</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0058048</pub-id> </citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Weischenfeldt</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Symmons</surname>
<given-names>O.</given-names>
</name>
<name>
<surname>Spitz</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Korbel</surname>
<given-names>J. O.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Phenotypic Impact of Genomic Structural Variation: Insights from and for Human Disease</article-title>. <source>Nat. Rev. Genet.</source> <volume>14</volume>, <fpage>125</fpage>&#x2013;<lpage>138</lpage>. <pub-id pub-id-type="doi">10.1038/nrg3373</pub-id> </citation>
</ref>
<ref id="B43">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>J.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <source>Structural Variants in Chinese Population and Their Impact on Phenotypes, Diseases and Population Adaptation</source>. <publisher-name>Cold Spring Harbor Laboratory</publisher-name>. <comment>Available at: <ext-link ext-link-type="uri" xlink:href="https://www.biorxiv.org/content/early/2021/02/10/2021.02.09.430378.full.pdf">https://www.biorxiv.org/content/early/2021/02/10/2021.02.09.430378.full.pdf</ext-link>
</comment>. </citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Xing</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Evaluating Nanopore Sequencing Data Processing Pipelines for Structural Variation Identification</article-title>. <source>Genome Biol.</source> <volume>20</volume>, <fpage>237</fpage>. <pub-id pub-id-type="doi">10.1186/s13059-019-1858-1</pub-id> </citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zook</surname>
<given-names>J. M.</given-names>
</name>
<name>
<surname>Hansen</surname>
<given-names>N. F.</given-names>
</name>
<name>
<surname>Olson</surname>
<given-names>N. D.</given-names>
</name>
<name>
<surname>Chapman</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Mullikin</surname>
<given-names>J. C.</given-names>
</name>
<name>
<surname>Xiao</surname>
<given-names>C.</given-names>
</name>
<etal/>
</person-group> (<year>2020</year>). <article-title>A Robust Benchmark for Detection of Germline Large Deletions and Insertions</article-title>. <source>Nat. Biotechnol.</source> <volume>38</volume>, <fpage>1347</fpage>&#x2013;<lpage>1355</lpage>. <pub-id pub-id-type="doi">10.1038/s41587-020-0538-8</pub-id> </citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zou</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>L.-X.</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Shang</surname>
<given-names>F.-F.</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>H.-H.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Significance of Single-Nucleotide Variants in Long Intergenic Non-protein Coding RNAs</article-title>. <source>Front. Cell Dev. Biol.</source> <volume>8</volume>, <fpage>347</fpage>. <pub-id pub-id-type="doi">10.3389/fcell.2020.00347</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>