De novo Assembly of the Brain Coral Platygyra sinensis Genome

BACKGROUND
Coral reefs are one of the most productive ecosystems on the planet, supporting the productivity of ∼25% of marine fisheries (Moberg and Folke, 1999). Scleractinian corals constitute the primary framework of coral reef ecosystems. Unfortunately, they are highly sensitive to changes in water temperature (Hoegh-Guldberg and Bruno, 2010; Mumby Peter and van Woesik, 2014). Ocean warming has caused coral bleaching and poses a global threat to coral health and survival. Reef corals have suffered major declines as the frequency and severity of periodic ocean warming events have increased (Loya et al., 2001). Coral reefs in Southeast Asia, especially in the Andaman Sea and the Gulf of Thailand, were severely affected during the 2010 mass bleaching event (Tun et al., 2010). The scleractinian coral Platygyra sinensis is one of the dominant reef-builders commonly found in the Gulf of Thailand and the Andaman Sea. Previous studies have shown that Platygyra corals in Southeast Asia were extensively bleached during the major thermal anomaly in May 2010 (McClanahan, 2004; Guest et al., 2012). However, little is known about how the brain coral genus Platygyra responds to heat stress at the molecular level, owing to the lack of genomic and transcriptomic resources. Genomic and transcriptomic data from a number of other coral species have made it possible to probe the molecular responses of these organisms to biotic and abiotic stress conditions (Shinzato et al., 2011; Traylor-Knowles et al., 2011; Kenkel et al., 2013; Kitchen et al., 2015; Davies et al., 2016; Buitrago-López et al., 2020). Here, we report the first reference genome assembly for the brain coral P. sinensis, generated with Pacific Biosciences long-read sequencing technology. We hope that this genomic resource will help the coral research community gain insights into the genetic factors driving responses to thermal stress and other biotic and abiotic stresses, and help promote the conservation of this species.

Sample Collection
Gamete bundles from Platygyra sinensis colonies were collected in situ from the inshore reef in Sattahip district (Samaesarn subdistrict), located in the Gulf of Thailand (12°35′556″ N, 100°57′508″ E), following the guidelines of Edwards et al. (2010). Prior to the collection date, we conducted several night dives to assess how close the colonies were to spawning. Once we were able to predict the onset of spawning, we placed collecting devices over three mature P. sinensis colonies that were not in the vicinity of other coral species. Each collecting device consisted of a funnel made of 100-µm plankton mesh with a 500-mL transparent plastic container attached at the mouth of the net. The devices were attached to the surrounding substrata using stainless steel nails, and drawstrings were used to tighten the base of the net around the colony so that the net enclosed the entire colony. When spawning was finished, we closed and collected the containers underwater and immediately transported the gamete bundles to the lab station (The Marine Science Camp and Conservation, Samaesarn, Chonburi).
Upon returning to the lab, the eggs and sperm in the gamete bundles were separated using Pasteur pipettes and plankton mesh sieves, and the eggs were transferred to 2-mL screw-capped tubes, immediately frozen, and stored in liquid nitrogen until use. Although we collected samples from three individual colonies, we chose to sequence the colony from which we retrieved the largest number of eggs, to ensure that we could isolate a sufficient amount of genomic DNA for sequencing. Small coral fragments from that same colony were also collected for RNA sequencing and placed in sterile disposable 15-mL tubes submerged in seawater. The seawater was removed upon returning to the lab station, and the samples were frozen and stored in liquid nitrogen until use. Access to the field site was authorized, and the permit for coral sample collection was issued, by the Department of Marine and Coastal Resources, Ministry of Natural Resources and Environment (Thailand), following the Nagoya Protocol (permit number 20210304).

DNA/RNA EXTRACTION AND SEQUENCING
High molecular weight genomic DNA was isolated from the eggs using the MagAttract HMW DNA kit (Qiagen, Hilden, Germany) following the manufacturer's instructions. The DNA sample was quantified using a Qubit fluorometer (Thermo Fisher Scientific, Waltham, USA), and its integrity was assessed using the Pippin Pulse Electrophoresis System (Sage Science, Beverly, USA). SMRTbell libraries with an insert size of 10,000 nt were constructed for the Pacific Biosciences (PacBio) Sequel sequencing system. Sequencing was performed with the Sequel Binding Kit 2.0 using a 20-h movie collection time according to the manufacturer's protocols (Pacific Biosciences, Menlo Park, USA).
To isolate RNA for library preparation and sequencing, frozen samples were homogenized in liquid nitrogen with sterile mortars and pestles, and CTAB buffer (2% CTAB, 1.4 M NaCl, 2% PVP, 20 mM EDTA pH 8.0, 100 mM Tris-HCl pH 8.0, 0.4% SDS) was added. The aqueous phase was extracted twice with chloroform:isoamyl alcohol (24:1), and RNA was precipitated overnight with 1/3 volume of 8 M LiCl. RNA pellets were washed with 70% ethanol, air-dried, and resuspended in RNase-free water. Poly(A) mRNAs were enriched from total RNA samples using the Dynabeads mRNA purification kit (Thermo Fisher Scientific, Waltham, USA). RNA integrity was assessed with a Fragment Analyzer System (Agilent, Santa Clara, CA, USA). The RNA library was prepared using

GENOME ASSEMBLY
A total of 11,092,922 PacBio raw reads totaling 93.32 Gb were subjected to read correction and trimming by MECAT2  Figure S1) (Bushnell, 2014). De novo assembly was performed using Flye version 2.8.3 (Kolmogorov et al., 2019) with the following parameter settings: genomeSize = 1g, iterations = 3, and -m 3000. Polishing was carried out using the GenomicConsensus package in the SMRT Analysis software suite version 2.3 with --algorithm=arrow (https://github.com/PacificBiosciences/GenomicConsensus). The draft genome assembly was 1.1 Gb, with an N50 length of 32.4 kb, and contained 56,352 contigs larger than 1 kb (Table 1). Given the high coverage of the long-read PacBio data, the contiguity of the assembly was lower than expected; this is likely due to the repetitive nature of the genome.
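The contiguity figures quoted above (N50 and the number of contigs larger than 1 kb) follow standard definitions. The minimal Python sketch below shows how they can be computed from a list of contig lengths; the example lengths are made up, and `n50`, `count_longer_than` and `fold_coverage` are hypothetical helper names, not functions of MECAT2, Flye or SMRT Analysis. The depth helper simply divides total read bases by assembly size, which is only a rough proxy for coverage.

```python
from typing import Iterable, List


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    ordered: List[int] = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contigs supplied")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for non-negative lengths; kept as a guard


def count_longer_than(lengths: Iterable[int], min_len: int = 1_000) -> int:
    """Number of contigs strictly longer than min_len (1 kb by default)."""
    return sum(1 for length in lengths if length > min_len)


def fold_coverage(read_bases: float, assembly_bases: float) -> float:
    """Rough sequencing depth: total read bases divided by assembly size."""
    return read_bases / assembly_bases


if __name__ == "__main__":
    toy_contigs = [50_000, 40_000, 30_000, 20_000, 5_000, 800]  # illustrative only
    print(n50(toy_contigs))                         # 40000
    print(count_longer_than(toy_contigs))           # 5
    print(round(fold_coverage(93.32e9, 1.1e9), 1))  # 84.8
```

With the numbers from the text, 93.32 Gb of raw reads over a 1.1 Gb draft corresponds to a depth of roughly 85-fold; this is a back-of-the-envelope figure, not a value reported by the assembler.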

IDENTIFICATION OF REPETITIVE SEQUENCES
To analyze the repetitive sequences in P. sinensis, we used RepeatModeler version 2.0.1 to predict transposable elements and generate a de novo repeat library, together with RECON version 1.08 and RepeatScout version 1.0.5 to identify the boundaries of repetitive elements and build consensus models of interspersed repeats. We found that 50.03% of the genome assembly consisted of repetitive elements, most of which were unclassified repeats (29.28%) and DNA elements (10.79%). The proportion of repetitive sequences in the brain coral genome was higher than that reported for A. millepora (34.55%) (Ying et al., 2019) and P. verrucosa (41.22%) (Buitrago-López et al., 2020), but comparable to that reported for Pachyseris speciosa (52.5%) (Bongaerts et al., 2021).
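The repeat content reported above is the share of assembly bases covered by at least one repeat annotation. A minimal Python sketch of that calculation is shown below, assuming repeat hits have already been parsed into (contig, start, end) tuples in 0-based, half-open coordinates; RepeatMasker's `.out` table uses 1-based inclusive coordinates, so a real parser would subtract 1 from each start. Overlapping hits are merged first so that no base is counted twice. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def merged_length(intervals: Iterable[Tuple[int, int]]) -> int:
    """Bases covered by possibly overlapping half-open intervals on one contig."""
    covered = 0
    cur_start, cur_end = None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered


def repeat_percentage(
    repeats: Iterable[Tuple[str, int, int]], contig_lengths: Dict[str, int]
) -> float:
    """Percentage of assembly bases overlapped by at least one repeat hit."""
    by_contig: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for contig, start, end in repeats:
        by_contig[contig].append((start, end))
    masked = sum(merged_length(iv) for iv in by_contig.values())
    return 100.0 * masked / sum(contig_lengths.values())


if __name__ == "__main__":
    contigs = {"ctg1": 1_000, "ctg2": 1_000}
    hits = [("ctg1", 0, 300), ("ctg1", 200, 500), ("ctg2", 100, 200)]
    print(repeat_percentage(hits, contigs))  # 30.0
```

Per-class percentages (for example, unclassified repeats versus DNA elements) can be obtained the same way by grouping hits by repeat class before merging.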

GENOME QUALITY ASSESSMENT
To evaluate the quality of the assembly, we aligned short-read RNA-seq sequences from this study to the genome using BLASTN with an e-value cutoff of 10⁻¹⁰. Prior to the alignment, any short RNA-seq read that mapped to the Symbiodiniaceae database with an e-value < 10⁻⁵ and a bit score > 50 was discarded. We found that 87.92% of the P. sinensis RNA-seq reads could be mapped back to the genome assembly. To further assess the completeness of the final assembly, we employed Benchmarking Universal Single-Copy Orthologs (BUSCO) version 3 with the Metazoa OrthoDB release 10 dataset (Simão et al., 2015). Of the 954 highly conserved orthologs in the metazoan gene set, 66.8% were identified as complete and single-copy, 12.1% as complete and duplicated, 7.7% as fragmented, and 13.4% as missing. We also ran HaploMerger2 (Huang et al., 2017) to resolve allelic relations in the assembly. The size of the resulting assembly (1.04 Gb) was very similar to that of the original assembly, with an N50 contig length of 33,443 nt. When we evaluated the gene-space completeness of this merged assembly, 65.8, 11.6, 8.1, and 14.5% of the highly conserved metazoan orthologs were identified as complete and single-copy, complete and duplicated, fragmented, and missing, respectively. We chose to present the original unmerged assembly in this study because it appeared to be slightly more complete with regard to the gene space assessed by BUSCO.
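The pre-filtering step above, which discards reads with a Symbiodiniaceae hit at e-value < 10⁻⁵ and bit score > 50, can be sketched in Python as follows. The sketch assumes the reads were searched with BLAST and the hits written in the default 12-column tabular format (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), so the e-value and bit score are the 11th and 12th columns. The function names and example hits are hypothetical.

```python
import io
from typing import Iterable, List, Set, TextIO

EVALUE_MAX = 1e-5    # a hit must have an e-value below this ...
BITSCORE_MIN = 50.0  # ... and a bit score above this to flag the read


def symbiont_read_ids(blast_tab: TextIO) -> Set[str]:
    """Read IDs with at least one qualifying hit in BLAST -outfmt 6 output."""
    flagged: Set[str] = set()
    for line in blast_tab:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        fields = line.rstrip("\n").split("\t")
        read_id = fields[0]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue < EVALUE_MAX and bitscore > BITSCORE_MIN:
            flagged.add(read_id)
    return flagged


def reads_to_keep(read_ids: Iterable[str], flagged: Set[str]) -> List[str]:
    """Preserve input order, dropping reads flagged as symbiont-derived."""
    return [r for r in read_ids if r not in flagged]


if __name__ == "__main__":
    hits = (
        "read1\tsym_ctg9\t98.0\t150\t3\t0\t1\t150\t200\t349\t1e-60\t250\n"
        "read2\tsym_ctg4\t80.0\t40\t8\t0\t1\t40\t10\t49\t0.001\t45\n"
    )
    flagged = symbiont_read_ids(io.StringIO(hits))
    print(reads_to_keep(["read1", "read2", "read3"], flagged))  # ['read2', 'read3']
```

Here read2 is retained because its hit fails both thresholds; reads with no hit at all (read3) are always kept.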

GENOME ANNOTATION
Evidence from transcriptome-based, ab initio, and homology-based gene prediction was combined to predict protein-coding sequences in the unmasked P. sinensis genome using EvidenceModeler (EVM) version 1.1.1 r2015-07-03 (Haas et al., 2008). Short-read RNA-seq data associated with Symbiodiniaceae were filtered out as described above.
The remaining P. sinensis RNA-seq data were mapped to the assembly during the initial step of annotation using the PASA2 pipeline version 2.0.1 (Haas et al., 2008). Protein sequences from Acropora digitifera, Acropora millepora, Montipora capitata, Orbicella faveolata, Pocillopora damicornis, Pocillopora verrucosa, and Stylophora pistillata obtained from public databases were aligned to the unmasked genome using AAT version 1.52 (Huang et al., 1997). An ab initio gene predictor was also run on the unmasked assembly: protein-coding gene predictions were obtained with Augustus version 3.2.1 (Stanke et al., 2004) trained with A. digitifera, A. millepora, M. capitata, O. faveolata, P. damicornis, P. verrucosa, and S. pistillata, using PASA transcriptome alignment files of P. sinensis RNA-seq data (Chen et al., 2015) as inputs. All gene predictions were integrated by EVM to generate consensus gene models using the following weights for each evidence type: PASA2 = 1, AAT = 0.3, and Augustus = 0.3. The positions of annotated genes were cross-checked against those of known repeats, and any gene with more than 20% of its sequence overlapping repetitive elements was excluded from the list of annotated genes. In addition, RNA-seq data were mapped to the predicted gene set using HISAT2 version 2.2.0 (Kim et al., 2019), and any predicted gene with no mapped reads was excluded from the annotation. The genome annotation contained 62,638 predicted gene models, of which 60,174 were protein-coding genes (Table 1; Supplementary Table S1). There were on average 4.59 exons per gene, and the mean transcript length was 4,374 bp. The GC contents of the coding sequences and the introns were 44.6 and 37.8%, respectively. Predicted protein-coding genes were assigned functions based on their best BLASTP matches in the NCBI GenBank nr protein database, and the BLASTP results were used to map and retrieve gene ontology (GO) annotations (Figure 1).
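Two steps in the annotation workflow lend themselves to a short illustration: the EVM weights and the post-hoc filtering of gene models. The Python sketch below renders the weights quoted in the text as the tab-separated, three-column file (evidence class, source, weight) that EVM reads; the source names `pasa_assembly`, `AAT` and `Augustus` are hypothetical and would have to match column 2 of the actual evidence GFF3 files. It then drops genes with more than 20% of their span covered by repeats or with no mapped RNA-seq reads, assuming repeat intervals are already merged (non-overlapping, half-open) and per-gene read counts (for example, derived from HISAT2 alignments) are precomputed.

```python
from typing import Dict, List, Tuple

# (evidence class, source name [hypothetical], weight); weights as given in the text
EVM_WEIGHTS = [
    ("TRANSCRIPT", "pasa_assembly", 1),
    ("PROTEIN", "AAT", 0.3),
    ("ABINITIO_PREDICTION", "Augustus", 0.3),
]


def weights_file_text(weights=EVM_WEIGHTS) -> str:
    """Tab-separated lines: evidence class, source, weight."""
    return "".join(f"{cls}\t{src}\t{w}\n" for cls, src, w in weights)


def repeat_overlap_fraction(start: int, end: int, repeats: List[Tuple[int, int]]) -> float:
    """Fraction of a half-open gene span covered by merged (non-overlapping) repeats."""
    covered = sum(max(0, min(end, r_end) - max(start, r_start)) for r_start, r_end in repeats)
    return covered / (end - start)


def filter_genes(
    genes: List[Tuple[str, str, int, int]],  # (gene_id, contig, start, end)
    repeats_by_contig: Dict[str, List[Tuple[int, int]]],
    read_counts: Dict[str, int],
    max_repeat_fraction: float = 0.20,
) -> List[str]:
    """Return IDs of genes that pass both the repeat-overlap and RNA-seq support filters."""
    kept = []
    for gene_id, contig, start, end in genes:
        frac = repeat_overlap_fraction(start, end, repeats_by_contig.get(contig, []))
        if frac > max_repeat_fraction:
            continue  # more than 20% of the gene overlaps repeats
        if read_counts.get(gene_id, 0) == 0:
            continue  # no RNA-seq reads mapped to this gene
        kept.append(gene_id)
    return kept


if __name__ == "__main__":
    print(weights_file_text(), end="")
    genes = [("g1", "ctg1", 0, 1_000), ("g2", "ctg1", 2_000, 3_000), ("g3", "ctg1", 5_000, 6_000)]
    repeats = {"ctg1": [(100, 200), (2_000, 2_500)]}
    counts = {"g1": 12, "g2": 40, "g3": 0}
    print(filter_genes(genes, repeats, counts))  # ['g1']
```

In the toy example, g2 is removed because half of its span is repeat-covered, and g3 because it has no read support.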
Among genes annotated with cellular component terms, the largest category was integral component of membrane, followed by membrane and nucleus. The most prevalent GO terms associated with molecular function were ATP binding, metal ion binding, and nucleic acid binding. DNA integration, proteolysis, and protein phosphorylation were among the largest categories associated with biological process. We compared the top 10 most prevalent GO terms identified in P. sinensis, S. pistillata, and A. digitifera and found that phosphorylation was among the largest categories associated with biological process in all three species (Supplementary Table S2). Nucleus, cytoskeleton, and endoplasmic reticulum membrane were top 10 cellular component annotations common to P. sinensis, S. pistillata, and A. digitifera, while DNA binding and nucleotide binding were among the most prevalent GO terms in the molecular function category.
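The cross-species comparison of the ten most prevalent GO terms amounts to a count, rank, and intersect procedure. The Python sketch below assumes the GO annotations for one category (for example, cellular component) are available as (gene ID, GO term) pairs per species; it counts each gene at most once per term, breaks ties alphabetically so that the ranking is reproducible, and intersects the per-species top-n sets. The data in the example are invented and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def top_terms(annotations: Iterable[Tuple[str, str]], n: int = 10) -> List[str]:
    """Top-n GO terms by number of distinct genes; ties broken alphabetically."""
    counts = Counter(term for _gene, term in set(annotations))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _count in ranked[:n]]


def shared_top_terms(per_species: Dict[str, Iterable[Tuple[str, str]]], n: int = 10) -> Set[str]:
    """GO terms that appear in the top-n list of every species."""
    tops = [set(top_terms(ann, n)) for ann in per_species.values()]
    return set.intersection(*tops) if tops else set()


if __name__ == "__main__":
    def genes(term: str, k: int, prefix: str) -> List[Tuple[str, str]]:
        # k made-up genes (prefix0, prefix1, ...) all annotated with `term`
        return [(f"{prefix}{i}", term) for i in range(k)]

    data = {
        "P. sinensis": genes("nucleus", 3, "ps") + genes("cytoskeleton", 2, "ps") + genes("membrane", 1, "ps"),
        "S. pistillata": genes("nucleus", 2, "sp") + genes("cytoskeleton", 3, "sp") + genes("ER membrane", 1, "sp"),
        "A. digitifera": genes("nucleus", 3, "ad") + genes("ER membrane", 2, "ad") + genes("cytoskeleton", 1, "ad"),
    }
    print(shared_top_terms(data, n=2))  # {'nucleus'}
```

Counting distinct genes rather than raw annotation lines keeps a gene with duplicated GO assignments from inflating a term's rank.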

RE-USE POTENTIAL
We report the first draft assembly of the P. sinensis genome, generated with PacBio long-read single-molecule real-time sequencing technology. The availability of this genome assembly and annotation will enable the coral research community to study thermal stress responses and gain a better understanding of how P. sinensis copes with elevated ocean temperature. This knowledge may be useful for future reef conservation and restoration programs.

FIGURE 1 | Gene Ontology (GO) annotation of P. sinensis genes in the genome assembly. Results are summarized in three main categories: cellular components, molecular functions, and biological processes. A total of 23,240 genes have been assigned GO terms.

DATA AVAILABILITY STATEMENT
P. sinensis genome assembly and RNA-seq data have been submitted to NCBI GenBank under BioProject number PRJNA736579 (NCBI accession number JAHPZR000000000 for the P. sinensis genome assembly). The annotation files (gff, cds, and protein) are available at https://www.nstda.or.th/noc/research-development/80-about-us/102our-genome-assemblies.html.

AUTHOR CONTRIBUTIONS
WP and ST conceived and designed the experiment. NK and LP collected coral samples. TY and DS performed DNA and RNA extraction and sequencing. CS and CN carried out bioinformatics analyses. WP wrote and revised the manuscript. All authors have read and approved the final manuscript.

FUNDING
This study was funded by the National Science and Technology Development Agency (NSTDA), Thailand (grant number: 1000221) and the L'Oréal-UNESCO for Women in Science Fellowship (awarded to WP).

ACKNOWLEDGMENTS
We would like to thank the Marine Science Camp and Conservation (Samaesarn, Chonburi) for sharing their lab facilities. We would also like to thank Bawornnan Jitphong, Piyasak Sangpaiboon, Phumiphat Jaronvannaying, and Chiraphan Raengdi for their assistance with sample collection.