Synthesis and cloning of long repeat sequences using single-stranded circular DNA

Non-coding repeat expansions cause several neurodegenerative diseases, such as fragile X syndrome, amyotrophic lateral sclerosis/frontotemporal dementia, and spinocerebellar ataxia type 31 (SCA31). Novel approaches are needed to investigate such repetitive sequences in order to understand disease mechanisms and prevent these diseases. However, synthesizing repeat sequences from synthetic oligonucleotides is challenging because they are unstable, lack unique sequences, and tend to form secondary structures. Synthesizing long repeat sequences by polymerase chain reaction is likewise often difficult because of the lack of unique sequences. Here, we employed a rolling circle amplification technique to obtain seamless long repeat sequences using small synthetic single-stranded circular DNA as a template. We obtained 2.5–3 kbp uninterrupted TGGAA repeats, as observed in SCA31, and confirmed them by restriction digestion and by Sanger and Nanopore sequencing. This cell-free, in vitro cloning method may be applicable to other repeat expansion diseases and may be used to produce animal and cell culture models for studying repeat expansion diseases in vivo and in vitro.


Introduction
The discovery that simple tandem repeats, or microsatellites, can cause neurological diseases was revolutionary in the field of neurodegenerative disorders. Nearly 50 neurological diseases have been identified so far, of which 26 are related to repeat expansion in coding, non-coding, intronic, and 5′ and 3′ UTR regions (Rohilla and Gagnon, 2017; Paulson, 2018; Chintalaphani et al., 2021). Among the repeat expansion-related diseases, spinocerebellar ataxia type 31 (SCA31) is caused by the expansion of a 2.5 to 3.8 kbp pentanucleotide TGGAA repeat, in which a pure (TGGAA)n stretch extends for at least 110 repeats in an intronic region of the BEAN gene (Sato et al., 2009), whereas benign adult familial myoclonic epilepsy (BAFME1) is associated with the expansion of 105 to 3,680 units of TTTCA (Cen et al., 2018; Ishiura et al., 2018). Such diseases can be classified by the length of the repeat unit (trinucleotide, tetranucleotide, pentanucleotide, or hexanucleotide). To develop transgenic models of these diseases, the disease sequences must be obtained from patients, except in some special cases (Mizielinska et al., 2014; Swinnen et al., 2018). However, this is often difficult because of ethical regulations and the unavailability of patient-derived genomic DNA. In addition, the sizes of the disease sequences are limited to those found in patients. Yet longer DNAs are desirable for developing disease models because of the anticipation observed in repeat diseases (Wells, 1996). In the case of non-repeat mutation disorders, disease genes can be synthesized and studied without patient sources. For repeat disorders, however, methods to obtain uninterrupted long repeats are lacking owing to the technical obstacles associated with amplification and cloning. Recombinant vectors containing such synthetic repeats can have numerous applications in biological, medical, and bioengineering research.
These vectors can be used for studying repeat-associated non-AUG (RAN) translation and the resulting RAN polypeptides, as well as the formation of RNA foci and their interaction with RNA-binding proteins (Lee et al., 2013; Malik et al., 2021). Several methods have been described for trinucleotide repeat synthesis using synthetic oligonucleotides. For example, a 20 bp trinucleotide repeat has been used as both template and primer for conventional polymerase chain reaction (PCR) and cloned into a vector (Ordway and Detloff, 1996). Synthesis of long iterative polynucleotide (SLIP), a non-template PCR method for trinucleotide repeat synthesis, was developed based on the theory that filling gaps leads to repeat expansion (Takahashi et al., 1999). Another, ligation-based method required iterative ligation reactions to obtain expanded repeats (Kim et al., 2005). Alternatively, concatenated DNA was obtained by random insertion of restriction sites using ligation and PCR (Jiang et al., 1996). PCR has also been used to obtain repeat sequences from DNA sources for repeat expansion diseases such as Huntington's disease and spinocerebellar ataxia type 10 (Laccone et al., 1999; Peters and Ross, 1999; Michalik et al., 2001; Matsuura and Ashizawa, 2002). The amplification of dimerized expanded repeats (ADER) method was developed to obtain 2,000 CTG repeats using phi29 DNA polymerase in a cell-free system (Osborne and Thornton, 2008). In rolling circle amplification or RCA (also known as hyperbranched amplification), small single-stranded circular DNA is used as a template for amplification (Fire and Xu, 1995). This method offers a low error rate, strong strand displacement activity, and high processivity, and amplifies circular DNA isothermally (Hafner et al., 2001). Here we describe a cell-free synthetic method, taking SCA31 and BAFME1 as examples, for the synthesis and cloning of long repeat sequences using rolling circle amplification.
Because we were able to reproducibly obtain long repeat sequences in a cell-free manner, we regard this method as a kind of "in vitro cloning".

Materials and methods
Table 1 lists all oligonucleotide sequences used in the experiments. Repeat single-stranded DNA (ssDNA) with a phosphorylated 5′ end (80 nt, (TGGAA)16, HPLC purified) was obtained from Shanghai Generay Biotech Co., Ltd. (Generay, Shanghai, China). As an RCA control, M13 DNA (90 nt) without a repeat sequence was obtained from Hokkaido System Science Co., Ltd. (Hokkaido, Japan). The ssDNA (20 pmol) was incubated overnight at 60°C with CircLigase II (Lucigen CL9021K) according to the manufacturer's instructions to obtain single-stranded circular DNA. The ligase was inactivated by heating at 80°C for 10 min. To remove excess linear ssDNA, the reaction was treated with exonuclease T (NEB M0265) at 25°C for 30 min, and the enzyme was inactivated at 65°C for 20 min. Alternatively, exonuclease I (NEB M0293) can be used (see Results & discussion). The cleaned circular DNA (2 pmol) was used as a template for RCA, which was performed using Bst DNA polymerase, large fragment (NEB M0275) with 5′-phosphorylated forward and reverse primers (Sca31_RCA_F, Sca31_RCA_R) at 60°C for 12 h. The amplified products were treated with 5 U mung bean nuclease (Takara Bio, 2420) and 0.5 U nuclease P1 (Wako 145-08221) at 37°C for 10 min. After digestion, the DNA was run on a 1.5% agarose gel, and DNA of 2-3 kbp was excised from the gel and purified with the FastGene Gel/PCR Extraction Kit (Nippon Genetics Co., FG-91302). The pIRES2 DsRed-Express2 vector (Takara Bio, 632540) was digested with AfeI (NEB R0652) and treated with Shrimp Alkaline Phosphatase (rSAP) (NEB M0371) to prepare a blunt-ended vector. The pIRES vector and the purified 2-3 kbp DNA insert were ligated for 2 h using T4 DNA Ligase (NEB M0202), and the ligation product was electroporated into Stable Competent E. coli (NEB C3040) cells. After a 30 min recovery, the cells were plated on kanamycin-supplemented culture plates and incubated at 30°C overnight.
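The logic of the amplification step can be illustrated with a toy model in Python. This is a minimal sketch under simplifying assumptions, not the authors' analysis code: the function names and the `start` priming position are hypothetical, strand displacement and branching are ignored, and the polymerase is assumed to copy the circle without error. It shows why a circle made only of repeat units yields a seamless concatemer regardless of where synthesis begins, and how many pentanucleotide units fit in the excised 2-3 kbp window.

```python
# Toy model of rolling circle amplification (RCA) on a repeat-only circle.
# All names are illustrative; copying is assumed to be error-free.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def rolling_circle_product(circle: str, product_len: int, start: int = 0) -> str:
    """Idealized first-strand RCA product, written 5'->3'.

    The polymerase travels around the circular template again and again,
    so the product is a concatemer of the template's reverse complement.
    `start` is a hypothetical priming position on the circle.
    """
    rc = reverse_complement(circle)
    start %= len(rc)
    rotated = rc[start:] + rc[:start]
    copies = product_len // len(rotated) + 1
    return (rotated * copies)[:product_len]


def is_pure_repeat(seq: str, unit: str) -> bool:
    """True if seq is an uninterrupted tandem repeat of unit in any phase."""
    k = len(unit)
    padded = unit * (len(seq) // k + 2)
    return any(seq == padded[p:p + len(seq)] for p in range(k))


template = "TGGAA" * 16                      # 80 nt circle, as in the Methods
product = rolling_circle_product(template, 2500, start=7)
second_strand = reverse_complement(product)  # strand made from the first product
units_in_window = (2000 // 5, 3000 // 5)     # units in a 2-3 kbp gel slice
```

In this model the first strand is a pure TTCCA repeat and its complement is a pure TGGAA repeat, and a 2-3 kbp gel slice corresponds to 400-600 pentanucleotide units.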
The insert lengths of selected clones were checked by PCR amplification using Phusion High-Fidelity DNA Polymerase (NEB M0530) and vector primers (pIRES_F, pIRES_R), by restriction digestion with NheI-HF and EcoRI-HF, whose sites are located close to the cloning site, and by Sanger sequencing. The same process was applied to another repeat, TTTCA (BAFME1) (80 nt, (TTTCA)16, HPLC purified), to check the versatility of the method. The whole length of one SCA31 plasmid was sequenced using Oxford Nanopore Technology (ONT) (Stevanovski et al., 2022). Library preparation with the ligation sequencing kit (SQK-LSK110) was followed by sequencing on a Flongle Flow Cell (FLO-FLG001). Multiple alignment using fast Fourier transform (MAFFT) on the raw sequence data and consensus sequence polishing with the Medaka tool (Lee et al., 2021) were used to construct a complete plasmid map.
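Before alignment, long reads are typically summarized by their N50 and restricted to a length window close to the expected plasmid size (the Results report reads of 6,000-8,000 bp). The following Python sketch shows both operations on toy data. It is an illustration only: the function names are hypothetical, the reads are placeholders, and it is not part of MAFFT, Medaka, or the authors' pipeline.

```python
# Read-length summary and filtering for long-read data (illustrative only).

def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no read lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def filter_by_length(reads, low, high):
    """Keep reads whose length lies in the inclusive window [low, high]."""
    return [read for read in reads if low <= len(read) <= high]


# Placeholder reads: only their lengths matter for this example.
toy_reads = ["A" * n for n in (5000, 6500, 7900, 9000)]
near_full_length = filter_by_length(toy_reads, 6000, 8000)
toy_n50 = n50([len(read) for read in toy_reads])
```

Restricting the input to near-full-length reads means that each retained read should span the entire repeat together with flanking vector sequence, which helps anchor the alignment of an otherwise low-complexity insert.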

Results & discussion
An initial oligonucleotide of 80 nucleotides was selected for our experiment because circularized templates of this size exhibit maximum amplification efficiency (Joffroy et al., 2018). The formation of circular DNA was confirmed using a 15% acrylamide/8 M urea denaturing gel (Figure 1A; Supplementary Figure S1A, B). The size discrepancy between the marker and the ssDNA (Figure 1A; Supplementary Figure S1A) may be due to the biased base composition of the ssDNA, which causes differential mobility. The circularization reaction mixture was treated with a single-strand-specific nuclease, exonuclease T. The unreacted linear ssDNA seen in lane 3 of Supplementary Figure S1B was digested and was absent from lane 5, showing that only circular DNA remained. Alternatively, exonuclease I may be more effective for ssDNA containing C nucleotide(s). After removal of the linear ssDNA, RCA was performed for 1 h on the SCA31 (repeat unit TGGAA) and M13_90mer (control) circular DNA. A ladder-like pattern (Kuhn and Frank-Kamenetskii, 2005) was observed for the control (M13_90mer), and a smear was observed for the concatenated repeat sequence, showing that RCA was successful (Figure 1B). To produce a longer repeat sequence, the RCA reaction was extended to 12 h; consequently, extremely large RCA products were obtained, which were concatemers of the repeat DNA (Figure 1C). DNA ranging from 2 to 3 kbp was cut out from the randomly elongated RCA products and cloned into the pIRES vector (Figure 1D). The Sanger sequencing result for one clone confirmed at least 90 repeat units of TGGAA from the 5′ end (Figure 2A) and 180 repeat units of TTCCA, the reverse complement of TGGAA, from the 3′ end (Figure 2B). The same procedure was applied to the BAFME1 repeat (repeat unit TTTCA). The white boxes in Figure 1E show that inserts of about 600 bp (lanes 4-5) and about 900 bp (lanes 8-9) were cloned into the pIRES vector. We confirmed the sequence of one clone and found that the insert contained at least 80 repeat units of TTTCA (Figure 2C).
These results show the reproducibility of this method; repeats of different unit sequences and various repeat lengths could be generated by RCA and cloned into plasmid vectors.
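Reads taken from opposite ends of an insert report the repeat on opposite strands, which is why TGGAA from the 5′ end appears as TTCCA from the 3′ end. The Python sketch below makes that relationship explicit and counts the uninterrupted units at the start of a read. It is a minimal illustration: the helper names and the 100-unit toy insert are assumptions, not the sequenced clones.

```python
# Relating Sanger reads from both ends of a repeat insert (illustrative only).

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def same_repeat_family(unit_a: str, unit_b: str) -> bool:
    """True if unit_b is a rotation of unit_a or of its reverse complement."""
    if len(unit_a) != len(unit_b):
        return False
    return unit_b in unit_a * 2 or unit_b in reverse_complement(unit_a) * 2


def count_leading_units(read: str, unit: str) -> int:
    """Number of consecutive exact copies of unit at the start of read."""
    if not unit:
        raise ValueError("unit must not be empty")
    k = len(unit)
    n = 0
    while read[n * k:(n + 1) * k] == unit:
        n += 1
    return n


toy_insert = "TGGAA" * 100                           # hypothetical 500 bp insert
forward_read = toy_insert[:450]                      # read from the 5' end
reverse_read = reverse_complement(toy_insert)[:300]  # read from the 3' end
```

Here `count_leading_units(forward_read, "TGGAA")` gives 90 and `count_leading_units(reverse_read, "TTCCA")` gives 60, while `same_repeat_family("TGGAA", "TTCCA")` confirms that both reads describe the same repeat.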
To characterize the whole sequence composition and the repeat length, Oxford Nanopore Technology (ONT) sequencing was performed on an SCA31 plasmid containing a 2.5 kbp repeat. From 8,000 reads (N50: 6,833 bp), 3,831 reads with lengths of 6,000-8,000 bp (almost the full length of the plasmid) were extracted and assembled to form the plasmid containing the repeat and the vector (Supplementary Figure S2A). Supplementary Figure S2B shows 50 random reads aligned onto the consensus sequence. After polishing with the Medaka tool, the full-length sequence of the plasmid was obtained (Supplementary Figure S2C), containing 445 TGGAA repeats that included four mutated sequences: TGTAA, TTAAA, TGGAGG, and TGGAT. Although these point mutations were found in the sequence, the longest stretch of perfect TGGAA repeats was 1,960 bp (392 TGGAA units), which covered the pathogenic size. In summary, our method successfully produced uninterrupted long repeat sequences using Bst DNA polymerase. However, our trial with phi29 polymerase was not successful despite testing various concentrations and reaction conditions (Supplementary Figure S3). The RCA products did not migrate from the loading well into the gel, even after treatment with various concentrations of mung bean nuclease. As a result, the DNA could not be extracted from the gel. Our method can be used for generating long tandem repeats of tailored size in a cell-free manner and can be combined with the ADER method (Osborne and Thornton, 2008) to stabilize the repeat units in E. coli. Moreover, this in vitro cloning method may be applicable to other types of repeat expansion, including those for neurodegenerative diseases, undiscovered disease-causative repeats, or artificial repeats, and may be used to produce animal and cell culture models to study repeat expansion diseases in vivo and in vitro.
Lastly, we believe that this method can also be used for studying the function of repeats in the genome, such as alpha-satellite DNA, centromeres, and telomeres.
Frontiers in Bioengineering and Biotechnology frontiersin.org
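The longest perfect stretch reported above (392 units, 1,960 bp) is the kind of figure that can be read directly from a polished consensus sequence. The Python sketch below finds the longest run of exact repeat units in a sequence. It is a hedged illustration: the function name is hypothetical, and the toy insert places two of the reported variant units (TGTAA and TGGAT) at arbitrary positions rather than at their real locations in the plasmid.

```python
# Longest uninterrupted tandem-repeat run in a sequence (illustrative only).
import re


def longest_perfect_run(seq: str, unit: str):
    """Return (start, n_units) of the longest run of exact copies of unit.

    Runs are found left to right without overlap, which is adequate for
    units such as TGGAA that cannot overlap themselves in another phase.
    Returns (-1, 0) when the unit does not occur at all.
    """
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = list(pattern.finditer(seq))
    if not runs:
        return -1, 0
    best = max(runs, key=lambda m: m.end() - m.start())
    return best.start(), (best.end() - best.start()) // len(unit)


# Hypothetical layout: the positions of the variant units are assumptions.
toy_insert = ("TGGAA" * 30 + "TGTAA" + "TGGAA" * 392
              + "TGGAT" + "TGGAA" * 20)
run_start, run_units = longest_perfect_run(toy_insert, "TGGAA")
run_bp = run_units * len("TGGAA")
```

For this toy insert the longest run starts at position 155 and spans 392 units, that is 1,960 bp, matching the arithmetic in the text (392 × 5 = 1,960).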

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author/s.

Author contributions
SA and AB contributed to conception and design of the study. AB performed all experiments and analysis. AB wrote the first draft of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.

Funding
This work was supported by the Japan Society for the Promotion of Science (grant no. JP20H00429).