Long-Read Sequencing to Unravel Complex Structural Variants of CEP78 Leading to Cone-Rod Dystrophy and Hearing Loss

Inactivating variants as well as a missense variant in the centrosomal CEP78 gene have been identified in autosomal recessive cone-rod dystrophy with hearing loss (CRDHL), a rare syndromic inherited retinal disease distinct from Usher syndrome. Apart from this, a complex structural variant (SV) implicating CEP78 has been reported in CRDHL. Here we aimed to expand the genetic architecture of typical CRDHL by the identification of complex SVs of the CEP78 region and characterization of their underlying mechanisms. Approaches used for the identification of the SVs were shallow whole-genome sequencing (sWGS) combined with quantitative polymerase chain reaction (PCR) and long-range PCR, or ExomeDepth analysis on whole-exome sequencing (WES) data. Targeted or whole-genome nanopore long-read sequencing (LRS) was used to delineate breakpoint junctions at the nucleotide level. For all SV cases, the effect of the SVs on CEP78 expression was assessed using quantitative PCR on patient-derived RNA. Apart from two novel canonical CEP78 splice variants and a frameshifting single-nucleotide variant (SNV), two SVs affecting CEP78 were identified in three unrelated individuals with CRDHL: a heterozygous total gene deletion of 235 kb and a partial gene deletion of 15 kb in a heterozygous and homozygous state, respectively. Assessment of the molecular consequences of the SVs on patient-derived material showed a loss-of-function effect. Delineation and characterization of the 15-kb deletion using targeted LRS revealed the previously described complex CEP78 SV, suggestive of a recurrent genomic rearrangement. A founder haplotype was demonstrated for the latter SV in cases of Belgian and British origin. The novel 235-kb deletion was delineated using whole-genome LRS. Breakpoint analysis showed microhomology and pointed to a replication-based underlying mechanism.
Moreover, data mining of bulk and single-cell human and mouse transcriptional datasets, together with CEP78 immunostaining on human retina, linked the CEP78 expression domain with its phenotypic manifestations. Overall, this study supports that the CEP78 locus is prone to distinct SVs and that SV analysis should be considered in a genetic workup of CRDHL. Finally, it demonstrated the power of sWGS and both targeted and whole-genome LRS in identifying and characterizing complex SVs in patients with ocular diseases.

INTRODUCTION
During the last years, next-generation sequencing (NGS) techniques mostly relying on short-read sequencing (SRS) have accelerated molecular diagnoses in individuals with inherited retinal diseases (IRDs) (Jespersgaard et al., 2019). IRD is characterized by a tremendous genetic heterogeneity with variants identified in more than 270 genes [RetNet (Retinal Information Network) 1].
Different types of variants can give rise to IRD, both single-nucleotide variants (SNVs) as well as structural variants (SVs), of which copy number variants (CNVs) have been most frequently reported (Ellingford et al., 2018;Van Schil et al., 2018;Daiger et al., 2019;Zampaglione et al., 2020). The latter are estimated to contribute to at least 7%-10% of pathogenic alleles in IRD (Ellingford et al., 2018;Daiger et al., 2019;Zampaglione et al., 2020). Interestingly, genomic features such as gene size have been shown to correlate with CNV occurrence in IRD genes (Van Schil et al., 2018).
Structural variant detection based on standard molecular karyotyping and on sequencing depth algorithms is capable of detecting only large CNVs, such as deletions and duplications greater than 50 kb in size, whereas cryptic SVs, such as smaller CNVs, copy neutral, and complex rearrangements, may be missed (Lindstrand et al., 2019). Nowadays, it is possible to detect approximately 27,000 SVs (>50 bp) per human genome, but only by using a combination of technologies; the majority of these SVs are located in the non-coding part of the genome and are missed by exome-based approaches (Belkadi et al., 2015;Chaisson et al., 2019;Mahmoud et al., 2019). Whole-genome sequencing (WGS) proved to be particularly powerful to detect SVs, variants in GC-rich regions, and variants in non-coding regulatory regions (Nishiguchi et al., 2013;Ellingford et al., 2016;Carss et al., 2017;de Bruijn et al., 2020). While predictions indicate that at least 48% of deletions and 83% of insertions are routinely missed by short-read-calling algorithms (Eichler, 2019), long-read sequencing (LRS) is particularly valuable for detecting SVs, as the long reads provide the necessary context to call and resolve SVs, regardless of their sequence composition (De Coster and Van Broeckhoven, 2019).
1 https://sph.uth.edu/retnet
One of the more recently identified disease genes that contribute to the genetic heterogeneity of IRD is the centrosomal gene CEP78 (MIM# 617110), in which several types of variants, inactivating sequence variants as well as a unique missense variant, have been found in autosomal recessive cone-rod dystrophy with hearing loss (CRDHL; MIM# 617236), a recognizable phenotype distinct from Usher syndrome (Fu et al., 2016;Namburi et al., 2016;Nikopoulos et al., 2016;Ascari et al., 2020). Apart from CRDHL, sperm abnormalities causing infertility have been reported in two unrelated affected males (Ascari et al., 2020). Functional studies pointed to a loss-of-function effect with decreased amounts of protein, normal subcellular localization, and elongated primary cilia in patients' cells (Namburi et al., 2016;Nikopoulos et al., 2016;Ascari et al., 2020). Interestingly, a complex SV implicating CEP78 has been reported in one individual with CRDHL, being a homozygous deletion-inversion-deletion overlapping CEP78 (Sanchis-Juan et al., 2018). CEP78 localizes to the mature centrioles (Brunk et al., 2016), which are the main components of the centrosomes, key microtubule-organizing hubs in eukaryotic cells, with the mother centriole acting as the basal body during cilia formation (Gönczy and Hatzopoulos, 2019). Centrioles duplicate once per cell cycle, and irregularities in their structure, or number, are associated with several diseases including cancer or ciliopathies (Nigg and Raff, 2009;Gönczy, 2015).
Here, we report CEP78 SVs and SNVs in four unrelated CRDHL families of Belgian, Danish, and Turkish origin. Interestingly, two distinct SVs were identified in three of the four CRDHL families using a combination of NGS technologies. Targeted or whole-genome nanopore LRS was used to delineate breakpoint junctions at the nucleotide level. For the smallest SV, a founder effect was shown. Overall, this study supports that the CEP78 locus is prone to distinct SVs and emphasizes the importance of SV analysis in the genetic workup of CRDHL, leveraging the power of shallow whole-genome sequencing (sWGS) and both targeted and whole-genome LRS. Furthermore, data mining of bulk and single-cell (sc) transcriptional datasets, in combination with CEP78 immunostaining, displayed a CEP78 expression domain in agreement with the phenotypic manifestations of CRDHL.

Ethics Statement
This study followed the tenets of the Declaration of Helsinki, and ethical approval was given by the local ethics committee (Ghent University Hospital, EC UZG 2017/1540). All individuals involved gave their informed consent prior to inclusion in this study.

Phenotypic Evaluation
Affected individuals were subjected to ophthalmologic evaluation including best-corrected visual acuity measurement, funduscopy, visual field assessment, infrared and blue light reflectance and autofluorescence imaging, spectral-domain optical coherence tomography, and electroretinography, when possible. In addition, an audiological assessment was performed before or after the molecular diagnosis.
For F2, II:1, gene panel sequencing of 13 genes [ABHD12, ADGRV1 (GPR98), CDH23, CIB2, CLRN1, DFNB31 (WHRN), HARS1, MYO7A, PCDH15, PDZD7, USH1C, USH1G, and USH2A] associated with Usher syndrome was performed using a HaloPlex custom design created by use of Agilent SureDesign and included exons and 50 bp of the intron at each intron-exon boundary. The enrichment library from patient DNA was sequenced on a MiSeq sequencer (Illumina). FASTQ files were analyzed using SureCall v.3.0.1.4 (Agilent Technologies) using default settings. Whole-exome trio sequencing was performed for F2, II:1 and her parents. Library preparation was done using the Ion AmpliSeq exome kit (Thermo Fisher), and libraries were sequenced using the IonProton system (Thermo Fisher). Base calling, read alignment, and variant calling were performed using the Torrent Suite including the Torrent Variant Caller (Thermo Fisher). VarSeq (GoldenHelix) was used for annotation and filtering of the variants. Confirmation of the identified CEP78 variant was performed via Sanger sequencing using NM_001098802.2 as reference sequence.

Shallow WGS
Shallow whole-genome sequencing was performed using the HiSeq 3000 (Illumina), starting from 200 ng of gDNA. For library construction, the NEXTflex Rapid DNA Sequencing kit (Bioo Scientific) was used. Pipetting steps were automated using a Hamilton Star robot (Hamilton). Library concentrations were measured by the Qubit High-Sensitivity kit (Thermo Fisher Scientific), and equimolar concentrations were pooled before sequencing. The minimal number of mapped reads was set at 50 million. Copy number analysis was performed with WisecondorX and further visualized with the ViVar platform (Sante et al., 2014;Raman et al., 2019).

Quantitative and Long-Range Polymerase Chain Reaction for Deletion Confirmation and Delineation
Copy number variants were confirmed by quantitative polymerase chain reaction (qPCR), and primers were designed in coding exons or intronic regions as previously described (D'haene et al., 2010). Assays were prepared using SsoAdvanced Universal SYBR Green Supermix (Bio-Rad Laboratories) and run on LightCycler 480 System (Roche). Data were analyzed with qbase+ software (Biogazelle). ZNF80 and GPR15 were used as reference genes. Subsequent to iterative qPCRs, long-range PCR was performed to obtain junction deletion products using Phusion High-Fidelity PCR kit (New England Biolabs) and visualized on 1% UltraPure Agarose (Thermo Fisher Scientific) gels. Primer design was done using Primer3Plus. Primer sequences are listed in Supplementary Table 1.
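The logic of a qPCR copy number readout can be illustrated with a minimal sketch. It assumes 100% amplification efficiency and a plain delta-delta-Cq calculation against a two-copy calibrator sample, with the arithmetic mean Cq of the two reference genes used for normalization. This is not the efficiency-corrected model applied by qbase+, and all Cq values below are hypothetical.

```python
from statistics import mean

def relative_copy_number(target_cq, ref_cqs, calib_target_cq, calib_ref_cqs):
    """Copy number of the target relative to a two-copy calibrator (diploid = 2.0).

    Assumes perfect doubling per cycle (efficiency = 100%), which is a simplification.
    """
    # Normalize the target Cq to the mean Cq of the reference genes
    # (averaging Cq values corresponds to a geometric mean of quantities).
    delta_sample = target_cq - mean(ref_cqs)
    delta_calib = calib_target_cq - mean(calib_ref_cqs)
    ddcq = delta_sample - delta_calib
    return 2.0 * 2.0 ** (-ddcq)

# Hypothetical Cq values: reference genes listed as [ZNF80, GPR15]
calibrator = {"target": 25.0, "refs": [24.0, 24.4]}
control = relative_copy_number(25.0, [24.0, 24.4], calibrator["target"], calibrator["refs"])
het_del = relative_copy_number(26.0, [24.0, 24.4], calibrator["target"], calibrator["refs"])
```

In this toy example a one-cycle delay of the target amplicon halves the estimate, which is the pattern expected for a heterozygous deletion; a homozygous deletion would give no amplification at all.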

Targeted LRS
Amplified (long-range) PCR product was quality checked using DropSense (Trinean), Qubit (ThermoFisher), and Fragment Analyzer (Agilent), using DNF-492 Large Fragment analysis kit (Agilent). A library was constructed according to the Ligation sequencing protocol (SQK-LSK109, Oxford Nanopore Technologies, ONT; GDE_9063_v109_revU_14Aug2019) with minor adaptations. DNA repair and end-prep using ONT consumables (SQK-LSK109) and NEBNext FFPE DNA repair mix and NEBNext Ultra II End repair/dA tailing Module (M6630, E7546 both from New England Biolabs) started with 100 fmol of the PCR amplicon, with extended enzymatic incubation times, 30 min at 20°C and 30 min at 65°C. Repaired and end-prepped amplicon was Ampure XP (Beckman Coulter) cleaned up for increased adaptor ligation efficiency, using a ratio of 1:1 (vol/vol) and extended incubation (10 min on Hulamixer). After bead cleanup, the pellet was eluted in 32 µL Nuclease Free Water, of which 30 µL was used for adapter ligation. Adapter ligation using SQK-LSK109, ONT, and NEBNext Quick Ligation Module (E6056, New England Biolabs) was performed according to the protocol with extension of the ligation incubation to 30 min at room temperature. The ligation reaction was cleaned up using Ampure XP (Beckman Coulter) in vol/vol ratio of 0.4 with extended incubation (10 min on Hulamixer). A purified pellet was eluted in 10 µL of MilliQ water with the final yield of library calculated using concentration and size information (F1, II:1 yield = 508.2 ng/72.6 fmol and F3, II:1 yield = 462 ng/61 fmol). Ten fmol (64 ng; F1, II:1) and 20 fmol (151.5 ng; F3, II:1) of the library were loaded and sequenced on MinION using Flongle Flowcell (both Oxford Nanopore Technologies), with 85 (F1, II:1) and 47 (F3, II:1) pores sequencing after loading of the library, respectively.
Sequencing was complete in 24 h and generated in total 1.08 Gb (F1, II:1) and 827.9 Mb (F3, II:1) of data, equaling 172.75 K (F1, II:1) and 140.99 K (F3, II:1) reads with an estimated N50 of 10.55 kb (F1, II:1) and 10.7 kb (F3, II:1).
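The library yields above pair a mass with a molar amount, and the two are linked through the fragment length. The following sketch shows that conversion, together with the mean read length implied by the reported totals. The value of 660 g/mol per base pair is a common approximation for double-stranded DNA and is an assumption here; the fragment size actually used by the authors for the calculation is not stated.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (approximation; an assumption)

def ng_to_fmol(mass_ng, length_bp):
    """Convert a dsDNA mass (ng) to a molar amount (fmol) for a given fragment length."""
    # ng -> g is a factor 1e-9 and mol -> fmol a factor 1e15, so the net factor is 1e6
    return mass_ng * 1e6 / (length_bp * AVG_BP_MASS)

def implied_length_bp(mass_ng, amount_fmol):
    """Fragment length (bp) implied by a reported mass/molar amount pair."""
    return mass_ng * 1e6 / (amount_fmol * AVG_BP_MASS)

def mean_read_length(total_bases, n_reads):
    """Average read length in bases."""
    return total_bases / n_reads

# Values reported in the text
f1_len = implied_length_bp(508.2, 72.6)   # F1, II:1 library yield
f3_len = implied_length_bp(462.0, 61.0)   # F3, II:1 library yield
f1_mean_read = mean_read_length(1.08e9, 172_750)
f3_mean_read = mean_read_length(827.9e6, 140_990)
```

Under this assumption the implied fragment lengths come out at roughly 10.6 kb and 11.5 kb, in the same range as the reported N50 values, while the mean read lengths are near 6 kb, as expected when N50 exceeds the mean.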

Whole-Genome LRS
Extracted DNA was checked for concentration, purity, and integrity using DropSense (Trinean), Qubit (ThermoFisher), and Fragment Analyzer (Agilent), using 464 High Sensitivity Large Fragment 50-Kb kit (Agilent). The sample was sheared using Mega3 (Diagenode) to the final average size of the peak 21,015 bp (smear analysis 25,230 bp). Short fragments were eliminated using SRE XS (Circulomics). Sheared and size-selected DNA sample was used in library prep following the protocol gDNA by Ligation (SQK-LSK109, GDE_9063_v109_revU_14Aug2019, Oxford Nanopore Technologies) with minor adaptations. DNA repair and end-prep using ONT consumables (SQK-LSK109) and NEBNext FFPE DNA repair mix and NEBNext Ultra II End repair/dA tailing Module (M6630, E7546 both New England Biolabs) started with 182 fmol of the PCR amplicon, with extended enzymatic incubation times, 30 min at 20°C and 30 min at 65°C. Repaired and end-prepped amplicon was Ampure XP (Beckman Coulter) cleaned up for increased adaptor ligation efficiency, using a ratio of 1:1 (vol/vol) and extended incubation (10 min on Hulamixer). After bead cleanup, the pellet was eluted in 63 µL nuclease-free water, of which 60 µL was used for adapter ligation. Adapter ligation using SQK-LSK109, ONT, and NEBNext Quick Ligation Module (E6056, New England Biolabs) was performed according to the protocol with extension of the ligation incubation to 30 min at room temperature. The ligation reaction was cleaned up using Ampure XP (Beckman Coulter) in vol/vol ratio of 0.4 with extended incubation (10 min on Hulamixer). A purified pellet was eluted in 40 µL of MilliQ water with final yield of library calculated using concentration and size information (yield = 1,627.4 ng/105.6 fmol). Thirty femtomoles of the final library prep was loaded onto PromethION Flow cells (Oxford Nanopore Technologies). In order to generate enough data, the sample was loaded on three flow cells (FCs).

Expression Analysis on Patient's Material and Splicing Assessment
For quantitative reverse transcription-PCR (RT-PCR), total RNA was extracted from short-term cultured lymphocytes or fibroblasts using MagCore according to the manufacturer's guidelines. cDNA was synthesized with the iScript cDNA Synthesis Kit (Bio-Rad Laboratories). For each cDNA sample, assays were prepared using SsoAdvanced Universal SYBR Green Supermix (Bio-Rad Laboratories) and run on LightCycler 480 System (Roche). Data were analyzed with qbase+ and normalized to the YWHAZ and HMBS or SDHA genes. Primers were designed using Primer3Plus, and sequences are listed in Supplementary Table 1. For non-quantitative RT-PCR, cDNA was synthesized using SuperScript IV Reverse Transcriptase kit (Thermo Fisher), amplified by standard PCR, loaded on a 2% UltraPure Agarose gel, and Sanger sequenced.

Haplotype Analysis
In total, 18 single-nucleotide polymorphisms (SNPs) (dbSNP, build 151) were selected for haplotype reconstruction in the upstream and downstream areas flanking CEP78. SNPs were Sanger sequenced according to the standard procedures (see above). PCR primers were designed with Primer3Plus, and sequences can be found in Supplementary Table 2.
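The idea behind comparing haplotypes across carriers can be sketched as a search for the longest run of consecutive SNPs at which all carrier haplotypes have the same allele. The function below is an illustrative sketch only: it assumes phased haplotypes are already available (for example from a homozygous proband or from segregation in a trio), and the positions and alleles are hypothetical, not the 18 SNPs genotyped in this study.

```python
def longest_shared_run(positions, haplotypes):
    """Find the longest run of consecutive SNPs with identical alleles in all carriers.

    positions:  sorted SNP coordinates
    haplotypes: one allele string per carrier haplotype, one character per SNP
    Returns (first_position, last_position, n_snps), or None if nothing is shared.
    """
    best = None
    run_start = None
    for i in range(len(positions)):
        shared = len({hap[i] for hap in haplotypes}) == 1
        if shared:
            if run_start is None:
                run_start = i
            n_snps = i - run_start + 1
            if best is None or n_snps > best[2]:
                best = (positions[run_start], positions[i], n_snps)
        else:
            run_start = None  # a discordant SNP ends the current run
    return best

# Hypothetical example: three carrier haplotypes typed at six SNPs
positions = [100, 250, 400, 900, 1500, 2100]
haplotypes = ["ACGTTA", "GCGTTC", "ACGTTA"]
shared = longest_shared_run(positions, haplotypes)
```

The distance between the first and last shared SNP gives a lower bound on the size of the shared segment, since the true boundaries lie somewhere between the last concordant and the first discordant marker on each side.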

Data Mining in Single-Cell Retinal and Cochlear Transcriptional Datasets
Human adult retinal and murine P1 cochlear sc transcriptional datasets were mined for evaluating CEP78 expression at the sc level. Expression matrices derived from pooling three donor neural retinal (Cowan et al., 2020) and four cochlear samples (Kolla et al., 2020) were retrieved and processed separately using SCANPY (v1.4.6) (Wolf et al., 2018). Preprocessing and quality control were conducted to remove outlier cells. Briefly, we considered only genes with counts in at least three cells and filtered out cells that had unique feature counts <200 or >2,500 and/or that expressed >5% mitochondrial counts. The data were then total-count normalized, logarithmized, filtered for highly variable features, and scaled to unit variance. After quality control preprocessing, a total of 19,768 retinal and 11,332 cochlear cells were kept for subsequent dimensionality reduction, embedding, and clustering. Markers associated with major neural retina and cochlear cell populations were used to assess CEP78 expression at the sc level.
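The quality control thresholds described above can be made concrete with a small, dependency-free sketch. It is not the SCANPY code used in the study: the function name, the "MT-" prefix used to recognize mitochondrial genes, and the choice to count detected genes per cell before gene filtering are assumptions, and the toy matrix uses scaled-down thresholds so that every filter is exercised.

```python
def filter_cells_and_genes(counts, gene_names, min_cells=3, min_genes=200,
                           max_genes=2500, max_mito_frac=0.05, mito_prefix="MT-"):
    """Apply gene- and cell-level QC filters to a cells x genes count matrix.

    counts: list of per-cell lists of raw counts.
    Returns (kept_cell_indices, kept_gene_indices).
    """
    n_genes = len(gene_names)
    # Keep genes detected (count > 0) in at least `min_cells` cells
    kept_genes = [g for g in range(n_genes)
                  if sum(1 for cell in counts if cell[g] > 0) >= min_cells]
    mito = [g for g in range(n_genes) if gene_names[g].startswith(mito_prefix)]
    kept_cells = []
    for i, cell in enumerate(counts):
        n_detected = sum(1 for c in cell if c > 0)   # unique features per cell
        total = sum(cell)
        mito_frac = sum(cell[g] for g in mito) / total if total else 1.0
        if min_genes <= n_detected <= max_genes and mito_frac <= max_mito_frac:
            kept_cells.append(i)
    return kept_cells, kept_genes

# Toy matrix (hypothetical counts); thresholds scaled down for illustration
gene_names = ["CEP78", "RHO", "ARR3", "MT-CO1"]
counts = [
    [3, 5, 0, 0],  # passes all filters
    [1, 0, 4, 0],  # passes all filters
    [0, 9, 0, 0],  # too few detected genes
    [2, 2, 2, 6],  # too many detected genes (and high mitochondrial fraction)
    [4, 6, 0, 1],  # mitochondrial fraction above 5%
]
kept_cells, kept_genes = filter_cells_and_genes(
    counts, gene_names, min_cells=3, min_genes=2, max_genes=3)
```

With the default arguments the function applies the thresholds quoted in the text (three cells per gene, 200 to 2,500 detected genes per cell, at most 5% mitochondrial counts).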

Data Mining in Bulk Retinal and Cochlear Transcriptional Datasets
Retina | Expression levels by transcripts per million (TPMs) were retrieved from postmortem retina samples characterized in Ratnapriya et al. (2019). From 453 samples that passed quality control, we only considered donor retinas that showed no age-related macular degeneration progression (n = 102) to avoid confounding variables in downstream analyses. To remove potential noise, 20% of genes with the lowest mean expression across all samples were filtered out. TPM values were then filtered for a set of candidate genes, which included all genes reported to cause IRD (RetNet) and ciliary genes (SCGSv1) (van Dam et al., 2013). A total of 519 genes were eventually considered. Before evaluating correlations in the expression of the candidate genes, the set was subjected to a variance-stabilizing transformation to correct for mean-variance dependency (Zwiener et al., 2014). We then examined the expression of CEP78 and several ciliary genes listed in the top 10 of the Human Gene Connectome (HGC) (Itan et al., 2013) (SCLT1, MKS1, CEP57, CEP76, CEP135, CEP152, CEP63, CEP164, OFD1, and CEP250). Spearman correlations were computed along with pairwise p values adjusted for multiple comparisons (Holm method). A correlogram was then generated for visualization.
Cochlea | We retrieved paired-end FASTQ files (GSE111348) derived from adult (P28-P32) mouse inner (IHC) and outer (OHC) cochlear hair cells (∼1,000 cells per sample; n = 4 and 6 for IHC and OHC, respectively). Transcripts were quantified through pseudoalignment by Kallisto (v.0.46.1), for which default parameters were used for both index build and transcript quantification (Bray et al., 2016). Additionally, data generated by Schrauwen et al. (2016) were used to retrieve expression values of CEP78 in adult human cochlea and components of the vestibular labyrinth.
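The two statistical steps named for the retinal data, Spearman correlation and Holm adjustment, can be sketched without external libraries as follows. This is an illustration of the techniques, not the authors' analysis code; the input vectors and raw p values are hypothetical, and the computation of the raw p values themselves is left out.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def holm_adjust(pvalues):
    """Holm step-down adjustment; returns adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * pvalues[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted

# Hypothetical expression vectors and raw p values
rho_up = spearman_rho([1, 2, 3, 4, 5], [2, 4, 5, 9, 20])
rho_down = spearman_rho([1, 2, 3, 4], [4, 3, 2, 1])
adjusted = holm_adjust([0.01, 0.04, 0.03])
```

Because Spearman's coefficient works on ranks, any monotonic relationship gives a coefficient of magnitude 1 regardless of its shape, which makes it robust to the skewed distributions typical of expression data.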

CEP78 Immunostaining on Human Retina
Human retina used for immunohistochemistry was fixed in 10% neutral buffered formaldehyde and embedded in paraffin. Staining for CEP78 was performed on 3-µm-thick sections using an automatic immunostainer (BenchMark Ultra, Ventana Medical Systems, Tucson, AZ, United States). The rabbit polyclonal antibody anti-CEP78 (1:100; LN2004459, LabNed) was used, and visualization was achieved with the OptiView Amplification Kit (Ventana Medical Systems). Heat-induced epitope retrieval was performed using Cell Conditioning 2 (Ventana Medical Systems).

RESULTS
The CRDHL families included in this study are of Belgian (F1 and F3), Danish (F2), and Turkish origin (F4). Consanguinity was reported for F4. Pedigrees are represented in Figure 1.

CNV Analysis on WES Data, sWGS, Long-Range PCR, and Targeted LRS in F1
For F1, II:1, CNV analysis on WES data using ExomeDepth revealed a potential homozygous deletion overlapping CEP78 [initial coordinates (hg38) chr9:g.78236351-78243636]. Subsequent sWGS confirmed the homozygous deletion spanning exons 1-5 of CEP78 [new coordinates (hg38) chr9:g.78230001-78245000], covering a region of 15-20 kb. The sWGS output is available in Supplementary Table 3. The deletion could be refined up to ∼12 kb using iterative qPCRs, and a junction product could be obtained via long-range PCR (Supplementary Figure 1). Targeted LRS on the long-range PCR amplicon allowed final delineation, identifying a complex deletion-inversion-deletion (Figure 2). The left breakpoint is located at (hg38) chr9:78228782, whereas the right breakpoint is located at chr9:78244762. The inverted segment spans chr9:78234546-78234844 [near an L1ME3Cz repetitive element (chr9:78234521-78234902)]. The SV overlaps with the one previously described in a British patient, which was delineated using microarray and Sanger sequencing (Sanchis-Juan et al., 2018). A smaller deletion, overlapping exons 2 and 3 in CEP78, is reported in gnomAD SV (DEL_9_103019; gnomAD SVs v2.1). Segregation analysis was performed via qPCR and confirmed both parents as heterozygous carriers of the deletion. Expression analysis of CEP78 mRNA on available lymphocytes (F1, II:1) and three controls showed complete loss of CEP78 expression due to the homozygous deletion (Supplementary Figure 2).
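The arithmetic behind the deletion-inversion-deletion can be worked through from the coordinates given above. The sketch below assumes 1-based inclusive coordinates and that the two breakpoint positions are the last retained bases on either side; these conventions are assumptions, so the segment sizes may be off by a base or two relative to the authors' annotation.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive (assumed convention)

    def length(self) -> int:
        return self.end - self.start + 1

    def contains(self, other: "Interval") -> bool:
        return self.start <= other.start and other.end <= self.end

# hg38 chr9 coordinates as reported in the text
LEFT_BREAKPOINT = 78228782
RIGHT_BREAKPOINT = 78244762
inverted = Interval(78234546, 78234844)
l1me3cz = Interval(78234521, 78234902)

# Sequence lost on either side of the retained, inverted segment
proximal_loss = Interval(LEFT_BREAKPOINT + 1, inverted.start - 1)
distal_loss = Interval(inverted.end + 1, RIGHT_BREAKPOINT - 1)

span = RIGHT_BREAKPOINT - LEFT_BREAKPOINT
summary = {
    "span_bp": span,
    "proximal_loss_bp": proximal_loss.length(),
    "inverted_bp": inverted.length(),
    "distal_loss_bp": distal_loss.length(),
    "inverted_within_l1me3cz": l1me3cz.contains(inverted),
}
```

Under these conventions the rearrangement spans just under 16 kb, with about 5.8 kb and 9.9 kb lost on either side of a 299-bp inverted segment, and the quoted coordinates place the inverted segment inside the interval given for the L1ME3Cz element.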

Segregation Analysis, sWGS, and Whole-Genome LRS in F2
For F2, II:1, initial targeted gene panel sequencing was negative, as it was carried out before the CEP78 gene had been associated with CRDHL. Thus, trio WES was performed for the proband. The sWGS output is available in Supplementary Table 3. The final delineation was obtained via whole-genome LRS allowing the identification of a heterozygous (13 reference sequence reads, 21 deletion reads) deletion spanning the region chr9:78096930-78331887 and covering 235 kb (Figure 2). Breakpoint junction analysis highlighted microhomology (Supplementary Figure 5). Expression analysis of CEP78 mRNA on available fibroblasts (F2, II:1) and two controls showed loss of CEP78 expression due to the CEP78 genotype (Supplementary Figure 2).
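The reported read counts and coordinates lend themselves to a short worked check. The sketch below computes the deletion size and the fraction of reads supporting the deletion, and adds an exact two-sided binomial test against the 50% expected for a heterozygous call. The binomial test is an illustration added here, not a step described by the authors, and inclusive coordinates are assumed.

```python
from math import comb

def deletion_size(start, end):
    """Size in bp of a deletion given inclusive start and end coordinates (assumed)."""
    return end - start + 1

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p value: total probability of outcomes
    no more likely than the observed count k."""
    probs = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))

# Values reported in the text
size_bp = deletion_size(78096930, 78331887)
ref_reads, del_reads = 13, 21
del_fraction = del_reads / (ref_reads + del_reads)
p_het = binomial_two_sided_p(del_reads, ref_reads + del_reads)
```

A deletion-supporting fraction of about 0.62 from 34 reads is not significantly different from 0.5 in this test, which is compatible with a heterozygous deletion at this modest coverage.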

CNV Analysis on WES Data, Long-Range PCR, and Targeted LRS in F3
For F3, II:1, CNV analysis on WES data using ExomeDepth revealed a heterozygous deletion overlapping CEP78 with the same coordinates as the deletion in F1, II:1 (Figure 3) in combination with c.1449dup [p.(Arg484Thrfs*4)]. The latter is a novel 1-bp duplication in exon 12, creating a frameshift starting at codon Arg484. The new reading frame ends in a stop codon at position 4 (Supplementary Table 4). Segregation analysis confirmed the presence of c.1449dup on the maternal allele, while the deletion has paternal origin (Supplementary Figure 6). A junction product could be obtained via long-range PCR using the same primers used for F1, II:1. Targeted LRS on the long-range PCR amplicon allowed a final delineation, identifying once more the same complex deletion-inversion-deletion as found in F1, II:1 (Figure 3). Expression analysis of CEP78 mRNA on available lymphocytes (F3 I:1, I:2, and II:1) and three controls showed extremely reduced CEP78 expression due to the variants in the index case (Supplementary Figure 2).

Haplotype Reconstruction for Recurrent Deletion-Inversion-Deletion of CEP78 Identified in F1 and F3
The same deletion-inversion-deletion, originally described by Sanchis-Juan et al. (2018), has been identified here in F1 (homozygous) and F3 (heterozygous, with paternal inheritance); therefore, haplotype reconstruction using genotyping of 18 SNPs was performed in the proband of F1 (II:1) and in the proband and parents of F3 (II:1, I:1, and I:2). In addition, the haplotype of the homozygous CEP78 SV reported by Sanchis-Juan et al. (2018) was reconstructed here on the basis of available WGS data. A common shared haplotype of at least 1.9 Mb was identified (Supplementary Table 5).
In agreement with previous studies, the IRD phenotype displays a more pronounced cone dysfunction at the onset. The age at onset of the IRD ranged from the second to third decade and, of the hearing loss, from congenital to the first decade of life. More detailed clinical features are provided in Figure 4, Table 1, and Supplementary Clinical Data 1, 2. Affected individuals from F1 to F3 are females, whereas the affected individual from F4 is a male, without signs of subfertility/infertility. Apart from CRDHL, no additional clinical features were reported except for a balance disorder in F1. Overall, no genotype-phenotype correlation could be observed for the IRD phenotype in probands with biallelic CEP78 SVs, SNVs, or a combination of both.

CEP78 Expression in Human and Mouse Tissues Affected in CRDHL
In line with the main systems affected in CRDHL, i.e., neural retina and the inner ear, we examined CEP78 expression in sc transcriptional datasets of human neural retina together with mouse and human cochlear or inner ear cells. This showed that CEP78 is predominantly expressed in cone photoreceptor and in hair cell clusters, respectively (Figure 5 and Supplementary Figures 7, 8). Next, the expression patterns of CEP78 and other ciliary genes listed in the top 10 of the HGC (Itan et al., 2013), SCLT1, MKS1, CEP57, CEP76, CEP135, CEP152, CEP63, CEP164, OFD1, and CEP250, in human adult bulk retinal transcriptional datasets were assessed for coordinated correlation, reasoning that this could offer insight into possible regulatory interactions. Interestingly, a matrix correlation plot (Supplementary Figure 9) showed CEP78 expression to be correlated with SCLT1 (ρ = 0.63, p < 0.001) and to be anticorrelated with CEP250 (ρ = −0.53, p = 0.0007), both of them already associated with IRD (Khateb et al., 2014;Tonda et al., 2016;Kubota et al., 2018). Immunohistochemistry analysis of human retina shows predominant cytoplasmic expression of CEP78 in both cones and rods, with strong staining at the base of the inner segments, concordant with previous findings (Figure 5; Namburi et al., 2016;Nikopoulos et al., 2016).

DISCUSSION
In this study, we focused on CEP78-associated IRD, for which several studies have linked CEP78 variants with a presumed loss-of-function effect to CRDHL. Interestingly, a homozygous complex SV affecting CEP78, i.e., a deletion-inversion-deletion, was recently described in a case of CRDHL (Sanchis-Juan et al., 2018). We assessed the role of SVs in CRDHL cases and found two distinct SVs affecting the CEP78 region, emphasizing their role in CEP78 disease. We showed the efficacy of CNV analysis of WES data to identify subtle CNVs; more specifically, we identified a deletion spanning exons 1-5 of CEP78, in both homozygous and heterozygous states, leading to loss of function. Furthermore, we showed the power of combined sWGS, qPCR, and targeted or whole-genome LRS to delineate and characterize SVs at the nucleotide level. This approach revealed a complex 15-kb deletion-inversion-deletion, which proved to be the same as the previously reported CEP78 SV that was found and characterized by combined molecular karyotyping and Sanger sequencing. As this deletion was identified in the two families of Belgian origin and in a previously published case of a British patient, haplotype reconstruction was performed and suggested a founder effect. Apart from this, a distinct larger SV, being a 235-kb heterozygous deletion, was identified using molecular karyotyping and characterized using whole-genome LRS. This deletion encompasses the CEP78 and PSAT1 (MIM #610936) genes, the latter of which has so far only been implicated in recessive phosphoserine aminotransferase deficiency (MIM # 610992) and Neu-Laxova syndrome 2 (MIM # 616038) (Hart et al., 2007;Acuna-Hidalgo et al., 2014). Overall, for all cases with CEP78 SVs identified by us, loss of CEP78 expression was confirmed on patient-derived material. The latter is in line with the molecular effects of the previously reported CEP78 sequence variants.
Sanchis-Juan et al. (2018) reported that the sequences surrounding all the breakpoints of the described SV in CEP78 present high similarity to long interspersed nuclear elements (LINEs). It was hypothesized that repetitive elements enable replication-based SV formation, providing the necessary microhomology islands or increasing the vulnerability of the region to the formation of secondary DNA structures, which can lead to replication fork collapse (Sanchis-Juan et al., 2018). We performed a similar extensive in silico analysis of all breakpoints and junctions of the SVs identified in this study (Supplementary Figures 5, 10 and Supplementary Tables 6, 7). Interestingly, all breakpoint junctions, including the 235-kb CEP78-PSAT1 deletion, showed microhomology, supporting repetitive element-mediated replication-based SV formation underlying CEP78 SVs/complex SVs. Our findings underscore that SVs contribute to the genetic diversity of the human genome and are of high relevance for the molecular pathogenesis of rare diseases. Very recently, SVs were mapped and characterized in 17,795 deeply sequenced human genomes (Abel et al., 2020). It is estimated that SVs account for 17.2% of rare alleles genome-wide and that approximately 90% of such SVs are non-coding deletions. The number of complex SVs recognized as a cause of genetic diseases is growing. This is illustrated by de novo interspersed repeat insertions found in 124 cases with a genetic disease, 76 of which are caused by Alu short interspersed elements (SINEs) and 30 of which can be attributed to LINE-1 insertions (Payer et al., 2017). Two remarkable examples of complex SVs as a cause of IRD are SVA retrotransposon insertions in MFSD8- and BBS1-associated disease, of which the first SV served as a target for antisense oligonucleotide treatment, and the second appears to be a frequent cause of Bardet-Biedl syndrome (Kim et al., 2019;Delvallée et al., 2020).
Another recent example is the identification of 33 pathogenic SVs in a cohort of 722 patients with autosomal dominant retinitis pigmentosa (adRP). Indeed, eight distinct complex non-coding SVs were found as the underlying mechanism of RP17-linked adRP in 22 affected families with >300 affected individuals, clearly emphasizing the importance of the non-coding genome as target for SVs in IRD (de Bruijn et al., 2020). Given that CNVs contribute ∼7%-10% of the pathogenic alleles in IRD, that the majority of the SVs in our genome are non-coding, and that a growing number of non-coding or complex SVs are found to cause IRD (Abel et al., 2020;Zampaglione et al., 2020), it can be expected that SVs represent an important part of the missing heritability of IRD. Hence, there is a need to assess the impact and frequency of SVs in IRD cohorts more systematically. Here, we showed how CNV calling on WES data, sWGS, and targeted or whole-genome LRS led to a confirmed molecular diagnosis in CEP78-associated IRD cases. CNV assessment using NGS-based algorithms has been already described as a reliable method to enhance the diagnostic rate of IRD (Ellingford et al., 2018). Furthermore, this study supports the superiority of LRS for fast characterization of complex SVs. Massively parallel sequencing is currently dominated by second-generation sequencing technology, mainly relying on SRS, which has limitations for the identification of cryptic SVs, sequencing repetitive regions, phasing of alleles, and distinguishing highly homologous genomic regions, mainly due to its short read lengths. Third-generation or LRS technologies offer improvements in the characterization of genetic variation and regions that are difficult to assess with the current SRS approaches. The main advantage of LRS compared to SRS technologies comes from the use of long reads (>10 kb on average), originating from single-DNA molecules.
Apart from this, the sequencing occurs in real time without the need for PCR amplification and is therefore mostly free from PCR-related bias. The power of LRS approaches to overcome specific limitations of second-generation-based analyses will become clearer once LRS is implemented in routine genetic testing workflows.
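The breakpoint microhomology discussed above can be illustrated with a minimal Python sketch. It is not the in silico pipeline used in this study; the function name `junction_microhomology`, the 0-based half-open coordinates, and the toy reference sequence are all assumptions made for illustration. For a simple deletion, the sketch reports the bases that are identical at the proximal and distal breakpoints, i.e., the stretch over which the exact breakpoint position is ambiguous.

```python
def junction_microhomology(ref: str, start: int, end: int) -> str:
    """Return the microhomology tract at the junction of a simple deletion.

    ref   : reference sequence (hypothetical toy sequence in the example)
    start : 0-based index of the first deleted base
    end   : 0-based index of the first retained base after the deletion
    """
    if not 0 <= start < end <= len(ref):
        raise ValueError("expected 0 <= start < end <= len(ref)")
    del_len = end - start

    # Bases at the start of the deleted segment that also occur
    # immediately downstream of the deletion.
    fwd = 0
    while (fwd < del_len and end + fwd < len(ref)
           and ref[start + fwd] == ref[end + fwd]):
        fwd += 1

    # Bases immediately upstream of the deletion that also occur
    # at the end of the deleted segment.
    bwd = 0
    while (bwd < del_len - fwd and start - 1 - bwd >= 0
           and ref[start - 1 - bwd] == ref[end - 1 - bwd]):
        bwd += 1

    # The tract is shared by both breakpoints, so the deletion can be
    # shifted anywhere within it without changing the resulting allele.
    return ref[start - bwd:start + fwd]


# Toy example: a 3-bp motif (GAT) is present at both breakpoints.
toy_ref = "TTAGAGATCCCCCGATACGTT"
print(junction_microhomology(toy_ref, 5, 13))  # GAT
```

Because the tract is computed in both directions, equivalent representations of the same deletion (for example, positions 7 to 15 in the toy sequence) yield the same microhomology, whereas a blunt junction returns an empty string.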
Apart from the CEP78 SVs, the two novel splice variants found here expand the spectrum of CEP78 (likely) pathogenic SNVs: a canonical acceptor and a canonical donor splice variant of intron 9, namely c.1209-2A>C (p.?), predicted to lead to skipping of exon 10, and c.1208+2T>A (p.?), predicted to cause skipping of exon 9. Interestingly, skipping of the 46-bp exon 10 was previously reported for c.1254+5G>A (CEP78, NM_032171) (Fu et al., 2016). When comparing the phenotypes found in CRDHL cases with SNVs and SVs, no apparent genotype-phenotype correlation could be demonstrated that would allow discriminating between the two classes of variants. This emphasizes the need for a systematic SV assessment in the genetic workup of CRDHL cases in which a CEP78 genotype is suspected, or in "atypical" Usher syndrome cases.
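The size of exon 10 quoted above can be cross-checked against the variant nomenclature with a short Python sketch. It assumes that exon 10 spans c.1209 to c.1254, as implied by the intron 9 acceptor variant (c.1209-2) and the exon 10 donor-site variant (c.1254+5), and that both positions refer to the same transcript (NM_032171). The helper names are hypothetical, and the actual effect of exon skipping on the transcript has to be established on RNA.

```python
def exon_length(first_cds_pos: int, last_cds_pos: int) -> int:
    """Length of an exon given its first and last coding (c.) positions."""
    return last_cds_pos - first_cds_pos + 1


def skip_is_in_frame(length: int) -> bool:
    """A skipped coding exon keeps the reading frame only if its length
    is a multiple of three."""
    return length % 3 == 0


# Assumed boundaries of exon 10, read off the HGVS positions in the text.
exon10_len = exon_length(1209, 1254)
print(exon10_len, skip_is_in_frame(exon10_len))  # 46 False
```

Under these assumptions the calculation reproduces the reported 46 bp, and since 46 is not a multiple of three, skipping of exon 10 would be expected to shift the reading frame.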
Using data mining of bulk or single-cell (sc) transcriptional datasets from human retina, mouse cochlea, and human inner ear, we showed CEP78 expression in cone photoreceptors and in hair cell clusters. These expression domains are in agreement with the main systems affected in CRDHL, i.e., the neural retina and the inner ear. Apart from this, a correlation study of CEP78 expression strengthened the importance of CEP78 in the ciliary machinery.
To conclude, this study supports that the CEP78 locus is prone to microhomology-mediated, replication-based SV formation and that (complex) SV analysis should be included in molecular genetic testing of CRDHL or "atypical" Usher syndrome. Finally, it demonstrates the power of WES-based CNV assessment, sWGS, and whole-genome or targeted LRS in identifying and characterizing suspected complex SVs in patients with CEP78-associated IRD. Systematic SV assessment in IRD will certainly close a diagnostic gap and will contribute to precision medicine in IRD.

DATA AVAILABILITY STATEMENT
The variants presented in this study can be found in online repositories. The name of the repository and accession numbers can be found below: Leiden Open Variation Database (LOVD), https://databases.lovd.nl/shared/individuals/00332412; https://databases.lovd.nl/shared/individuals/00332413; and https://databases.lovd.nl/shared/individuals/00332414.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Ethics Committee Ghent University Hospital, Ghent University Hospital, Ghent. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin. Written informed consent was obtained from the individual(s), and minor(s)' legal guardian/next of kin, for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
GAs and EDB designed the study and wrote the overall manuscript. GAs, NR, MDB, MVH, DC, JVD, TVL, TDP, AR, and EDB performed the research. JDZ, MVL, JJ, MBe, and LT clinically evaluated and described the families included in the study. GAs, NR, MDB, MBa, MVH, GAr, DC, JVD, TR, TDP, PDR, WDC, BM, AR, MS, and EDB analyzed and described the data. All authors reviewed the manuscript.

ACKNOWLEDGMENTS
Lone Sandbjerg Hindbaek is thanked for excellent technical help with RT-PCR studies. Sarah De Jaegere and Valerie Baumont are thanked for their assistance. We are also grateful to the families for their participation in the study.