Single-cell transcriptome sequencing atlas of cassava tuberous root

Introduction Single-cell transcriptome sequencing (ScRNA-seq) has emerged as an effective method for examining cell differentiation and development. In non-model plants, it hasn't been employed very much, especially in sink organs that are abundant in secondary metabolites. Results In this study, we sequenced the single-cell transcriptomes at two developmental phases of cassava tuberous roots using the technology known as 10x Genomics (S1, S2). In total, 14,566 cells were grouped into 15 different cell types, primarily based on the marker genes of model plants known to exist. In the pseudotime study, the cell differentiation trajectory was defined, and the difference in gene expression between the two stages on the pseudotime axis was compared. The differentiation process of the vascular tissue and cerebral tissue was identified by the trajectory. We discovered the rare cell type known as the casparian strip via the use of up-regulated genes and pseudotime analysis, and we explained how it differentiates from endodermis. The successful creation of a protoplast isolation technique for organs rich in starch was also described in our study. Discussion Together, we created the first high-resolution single-cell transcriptome atlas of cassava tuberous roots, which made significant advancements in our understanding of how these roots differentiate and develop.


Introduction
Cassava (Manihot esculenta Crantz) is the third most important source of calorie in Africa, Asia, and Latin America (after rice and maize) (Oliveira et al., 2014). Its starchy roots provide essential industrial materials and new sources of energy in addition to serving as a staple diet for millions of people in tropical and subtropical regions of the world (Li et al., 2017;Fathima et al., 2022). Cassava breeding is primarily focused on producing storage roots that are high yielding and of superior quality. Despite the importance of storage roots for the cassava, the origin of their development is poorly understood. Previous studies have described the anatomical process of differentiation of fibrous roots into storage roots by the section observation. (Indira and Kurian, 1977;Lowe et al., 1982;Wheatley et al., 2003). However, rootspecific genes that can effectively distinguish emerging fibrous roots and storage roots have been identified by transcriptome analysis of roots collected before, during, and after root expansion, and it has been shown that storage roots contain internal channel structures that are continuous with stem secondary xylem, indicating that storage roots have a distinct rhizogenic process compared with fibrous roots (Carluccio et al., 2022). The anatomical structure of cassava tuberous roots is well established by 90 days after planting (Figueiredo et al., 2015). The mature tuberous root has five tissue layers visible on the transverse section. The epidermis, cambium, secondary phloem, parenchymatic cells, secondary xylem, and primary xylem are located from the outside in (Carvalho et al., 2002). Secondary growth is marked by the formation of vascular cambium, which produces secondary phloem tissue externally and secondary xylem tissue internally. After that, the secondary xylem differentiated into parenchyma cells, which expanded and accumulated starch (Lowe et al., 1982;Carvalho et al., 2018).
The occurrence of cassava tuberous root is a complicated process involving related endogenous and environmental signals, which requires various cell types to carry out different biological functions (Zierer et al., 2021). Currently, transcriptome research on the initiation and development of the cassava tuberous root mainly focused on the analysis of several metabolic pathways, including glycolysis/gluconeogenesis, amino acid metabolism, lipid metabolism (Sojikul et al., 2015;Rüscher et al., 2021). A significant advantage of a multi-omics analysis that integrates proteomics, transcriptomics, and metabolomics is the ability to identify highly dynamic and specific changes in how proteins or genes are expressed as roots develop (Ding et al., 2020;Utsumi et al., 2020). However, early transcriptome analysis treated the sample as a homogeneous material and averaged differences across hundreds or thousands of cells (Shaw et al., 2021). The patterns of gene expression at the cell level cannot be clearly revealed by a bulk gene expression profile from one tissue (Iqbal et al., 2020).
Single-cell transcriptome sequencing (scRNA-seq) has assisted researchers nowadays acquire critical new insights into cell-by-cell transcriptional analysis (Shaw et al., 2021). Since the first description of scRNA-seq (Tang et al., 2010), rapid advancements in single-cell transcriptome technology have made it an indispensable tool for research in many fields, including human, animal, and plant studies (Jovic et al., 2022). Since the preparation of plant single-cell suspension is tricker than that of animal due to the need to digest cell wall, the majority of current studies on plant single-cell primarily focus on model plants, such as Arabidopsis thaliana and Oryza sativa L. (rice) (Denyer et al., 2019;Ryu et al., 2019;Zhang et al., 2019;Liu et al., 2021b). The first scRNA-seq atlas was accomplished in Arabidopsis roots, demonstrating the viability of this technology in plants (Ryu et al., 2019). The single cell transcriptome technique can actually characterize common cell types and cell states, uncover rare cell types, and reveal, in particular, the relationship between cell differentiation and development among various cell types (Bawa et al., 2022). Since its invention, the single cell suspension technique has been used to study the genetics of maize, the regulation of tomato root initiation, and the differentiation trajectory of peanut leaf (Nelms and Walbot, 2019;Liu et al., 2021a;Xu et al., 2021;Omary et al., 2022). However, little is known about its application in tissues rich in secondary metabolites.
Early development is crucial in determining final tuber yields (Lowe et al., 1982). In the study, scRNA-seq was employed to explore the cell-by-cell heterogeneity of the cassava tuberous root at an early developmental stage, to track the differentiation trajectory of vascular tissue and cortical tissue and to examine functional enrichment among various tissues. It is projected that a high-resolution single-cell transcriptome atlas of the cassava tuberous root will be established, which will overcome the challenge of single-cell research in tissues rich in secondary metabolites and offer new insights into the tuberization and development of the root.

Plant materials and growth conditions
The tuberous roots of cassava 'South China 8' (SC8) were used in this study. The materials were planted in Hainan University, Haikou City, Hainan Province, China (20°3΄47˝N, 110°19΄46˝E), at an elevation of 11 m, under field conditions. The climate is classified as tropical monsoon climate. The materials were planted on 2 and 3 April 2021 and collected on 2 July of the same year.

Preparation of tuberous root samples for ScRNA-seq
After collection, the tuberous roots were brushed and rinsed with water to move surface impurities and part of brown periderm. Fresh tuberous roots with the diameter of 0.5 cm (S1) and 0.75 cm (S2) were rapidly cut into small pieces and immersed in enzyme solution (10 mM MES, pH 5.7, 0.5 M mannitol, 2 mM KCl, 1.5% (w/v) Cellulose R10, 0.75% (w/v) Macerozyme, 0.1% (w/v) Pectinase, 1% (w/v) Cellulose RS, 10 mM CaCl 2 , 0.5% (w/v) BSA) at vacuum pump at room temperature for 10 min. Then the mixture was placed for 2-3 hours at 30°C and 65 rpm in dark and beat upon every 30 min to make the cell wall completely hydrolyzed and fully release the cells. The released protoplasts were filtered through 70 mm cell strainer, and rinsed with W5 (154 mM NaCl, 125 mM CaCl 2 , 5 mM KCl, 5 mM glucose, 1.5 mM NaCl, pH 5.6). The filtrate was centrifuged at 100 g at 4°C for 10 min. The precipitate was resuspended by adding WB (0.4 M mannitol, 0.5% BSA) and then centrifuged at 150 g at 4°C for 4 min. The precipitate was filtered again through 40 mm cell strainer. The process of centrifugation and resuspension mentioned above were repeated for protoplast solution. The oversized protoplasts and impurities were successfully filtered out, and a pure protoplast suspension was obtained. The protoplast viability was determined by trypan blue staining. The ratio of viable cells to total cells for S1 and S2 was >85% to meet the sequencing requirements. The concentration of protoplasts in each sample was counted by Countess ® II Automated Cell Counter (Thermofisher) and adjusted to 1000~2000 cells/mL.

ScRNA-seq library construction and sequencing
The single-cell suspensions of tuberous roots were loaded on a 10× Genomics GemCode Single-cell instrument to generate single-cell gel beads in emulsion (GEMs). The scRNA-seq library was prepared using the Chromium Next GEM Single Cell 3' Reagent Kits v3.1 (10× Genomics, PN-1000121). The library was sequenced by Illumina sequencer NovaSeq 6000 using 150 bp paired-end kit. The raw scRNA-seq dataset was comprised of Read1, Read2, and i7 index read.

Cell clustering and identification of marker genes
The cells by gene matrices for each sample were individually imported to Seurat (version 3.1.1) for downstream analysis. Cells with unusually high number of UMIs (≥8000) or mitochondrial gene percent (≥10%) were filtered out. We also excluded cells with less than 540 or more than 5900 genes detected. Each individual dataset was scaled and normalized by the NormalizeData functions. The top 2000 highly variable genes were used for Principal Component Analysis (PCA) dimensionality reduction. The top 20 Principal Components (PCs) were selected according to the bending map of PCA, and the resolution parameter 0.3 was used for clustering (Liu et al., 2021b). Uniform Manifold Approximation and Projection (UMAP) (Becht et al., 2019) and t-Distributed Stochastic Neighbor Embedding (t-SNE) (Van der Maaten and Hinton, 2008) made clustering results visualized and easy to explore.
To determine the cell types of cassava tuberous root, marker genes were selected to identify 19 cell clusters. By homologous alignment of the genes of Arabidopsis thaliana and Oryza sativa, cell-specific marker genes were obtained. These marker genes were used for cell type annotation.

Pseudo time analysis
To observe the differentiation relationships among different cell cluster, we used Monocle (version 2.0) to construct its single cell pseudotime differentiation trajectory. To understand the key metabolic pathways related to the development and differentiation of vascular tissue, we performed pseudo-timeaxis differential gene analysis of xylem, phloem and procambium. The genes with the top 100 P values in the branch of pseudo-time-axis were preserved and analyzed for functional enrichment, respectively. Subsequently, we drawn heat mapped of genes that are included in the significantly enriched pathways.

ScRNA-seq and identification of cassava tuberous root cell clusters
We performed high-throughput scRNA-seq for plant tissues from cassava tuberous roots with diameters of about 0.5 cm (S1) and 0.75 cm (S2), respectively. After preparing protoplast suspension and capturing single cells with 10×Genomics, we successfully detected 7511 cells at S1 and 7055 cells at S2. The average number of genes detected per cell was 1274 and 1209, respectively (Supplemental Table 1). PCA was used to project the data into linear dimensions. To effectively convert the highdimensional data into the two-dimensional image, t-SNE or UMAP was used to further reduce the dimension of the cells. These unsupervised clustering analyses divided these cells into 19 cell clusters. The cell number of each cluster accounted for 0.3%-32.02% of the total cell number ( Figure 1A; Supplemental Table 2).
The combination of up-regulated genes analysis and pseudotime analysis was a good way to identify the clusters that were difficult to distinguish. LINAMARIN SYNTHASE 1 (UGT85K4), the up-regulated gene of cluster 8, was highly expressed in cortical tissues (Kannangara et al., 2011) ( Figure 1B). The results of pseudotime analysis showed that cluster 8 was in the same branch with endodermis (Supplemental Figures 1A, B). The Manes_01G055100 (Avr9/ Cf-9 rapidly elicited protein) is homologous to marker gene AT1G52140 (Avr9/Cf-9 rapidly elicited protein) of Arabidopsis phloem (Wendrich et al., 2020), is highly expressed in cluster 5 ( Figure 1B). By pseudotime analysis we found that cluster 5, in line with xylem, was differentiated from procambium (Supplement Figures 1C, D). It was consistent with the previous results on the differentiation relationship between phloem and procambium (Lucas et al., 2013). Therefore, we defined cluster 8 as endodermis and cluster 5 as phloem (Supplemental Table 4).

Differentiation trajectory of xylem and phloem
We observed that xylem and phloem were differentiated from procambium, which is in keeping with the results of previous studies (Lucas et al., 2013). After a symbolic bifurcation point which marks developmental state transition, procambium cells were divided into two differentiated branches: one was xylem (branch 1), and the other was phloem (branch 2) ( Figure 2A). From the cell trajectory diagram of differentiated fate, PIP1-2, the xylem marker gene, was highly expressed in the branch 1, while Manes_01G055100 that is the phloem marker gene was highly expressed in the branch 2. The pseudotime score gradually increased along the branches ( Figures 2C, D). In addition, the number of xylem cells at S2 was more than that at S1, and the number of phloem cells at S2 was less than that at S1 ( Figure 2B). It is consistent with the fact that the cell number of xylem increase with cassava tuberous root enlargement (Figueiredo et al., 2015).
In order to explore the key pathways involved in the regulation of the vascular tissues of cassava tuberous root, we performed functional enrichment analysis of differential genes in xylem and phloem on the pseudotime axis. The results showed that xylem differential genes were mainly related to organic metabolism and response to external stimuli, suggesting that they may be involved in the process of starch accumulation and water transportation in xylem (Spicer and Groover, 2010). The phloem differential genes were mainly related to the regulation of macromolecule metabolism, which indicated that they may be involved in the transportation of products of photosynthesis in phloem (Fukuda and Ohashi-Ito, 2019). The differential genes of procambium were enriched in small molecular metabolic pathways, suggesting that it may be sensitive to plant hormones such as auxin ( Figure 2E; Supplemental Table 6).

Differentiation trajectory of the cortical tissue
In the plant root, the cortical tissue is the basic organization between epidermis and vascular bundle and plays an important role in the storage of carbohydrates and temporary metabolites. We observed that there was a differentiation relationship between cortical tissue and meristem of the cassava tuberous roots. In the cell trajectory diagram of the pseudotime analysis, meristem cells served as the starting branch of differentiation and then gradually differentiated into two branches: endodermal cells were in branch 1, exodermis cells were in branch 2, and cortex cells were scattered in branch 1 and branch 2 ( Figure 3A).
The number of cells of cortical tissue at S2 was significantly less than S1 ( Figure 3B). It is consistent with the results that the cortical tissue became thinner with the tuberous root development (Figueiredo et al., 2015).
To provide insights into the biological functions of the cortical tissue and the meristem, we conducted functional enrichment analysis of them. We found that genes related to plant disease-resistant microorganisms and antioxidants were enriched in the cortical tissue, suggesting that it might be the tissue that provided biological protection for the tuberous root. The exodermis was mainly enriched with genes interacting with plant-pathogen interactions. Among these genes, we found several homologous genes of cyclic nucleotide-gated ion channels (CNGCs) (Leng et al., 1999) from animals, which penetrated K + and Ca 2+ in a cyclic nucleotide-dependent manner and had a strong selectivity for Na + (Clough et al., 2000). At the same time, we also observed that many calcium-binding proteins and calcium-dependent proteins were enriched in this pathway. It indicated that they played an important role in resisting pathogen invasion ( Figure 3C; Supplemental Table 7). The endodermis highly expressed with the genes related to flavonoid biosynthesis and ascorbate and aldarate metabolism, suggesting that these substances might be synthesized in the endodermis ( Figure 3D; Supplemental Table 8). The genes that highly expressed in the meristem associated with the ribosomal activity and oxidative phosphorylation. The result was consistent with the fact that the meristem had strong mitotic activity ( Figure 3E; Supplemental Table 9).
Interestingly, through the analysis of up-regulated genes, we found that cluster 8 and endodermis shared some significantly up-regulated genes, such as MLP-like protein 28 (MLP28), Manes_04G078500, Dormancy-associated protein 1 (DRM1) and so on. By comparing the significantly up-regulated genes in cluster 8, we found that the significantly up-regulated genes of cluster 8, such as 36.4 kDa proline-rich protein (TPRP-F1), were also highly expressed in endodermis. And some significantly upregulated genes of cluster 8, such as ADP-ribosylation factor 1 (Os01g0813400), were expressed in xylem ( Figure 4A). Therefore, we speculated that cluster 8 might be a type of cells in between endodermis and vascular tissue. As shown in the Figures 4B, C, cluster 8 is mainly distributed in branch 2. Endodermis is mainly distributed in branch 2 and xylem is mainly distributed in branch 3. In plants, except for cells facing stem axis and the surface of root and stem, the endodermal cell wall is surrounded by lignified and suberized hydrophobic structure called casparian strip. Endodermis with casparian strip can regulate the water flow and mineral ionic equilibrium between external tissue and vascular bundle (Nagahashi et al., 1974). However, because the cell number of casparian strip is scarce, we divided cluster 8 into four sub-clusters ( Figure 4D). As a rare cell type, the casparian strip was not readily detected by up-regulation of genes expression. Casparian strip membrane domain protein (CASP) is a specifically expressed gene in the casparian strip and plays an important role in the formation of the casparian strip (Roppolo et al., 2011). We found that CASP was relatively highly expressed in subcluster 3 ( Figure 4E). Therefore, we defined subcluster 3 as the casparian strip.

Discussion
Preparation of plant single-cell suspensions is trickier than that of animal cells due to the need to digest cell wall. New challenges and new possibilities might emerge when applying this technology in starch-rich tissues, such as the influence of free starch on suspension preparation, the interference of free starch Differentiation Trajectory of Cartical Tissue. (A) Single-cell transcriptome data revealing the differentiation trajectory of cortical tissue and meristem from S1 and S2 analyzed by Monocle 2. Each dot indicates a single cell. Different color on the dots indicates the pseudotime scores. (B) Cell types labeled on the differentiation trajectory for S1 and S2, respectively. (C-E) Scatter plots of GO and KEGG enrichment analysis of up-regulated genes in exodermis (C), endodermis (D) and meristem (E) cell types after the aggregation of S1 and S2.
on sequencing results, and the acquisition of transcriptome information. Our results might provide some reliefs for these concerns. Firstly, most of the free starch particles could be removed by multiple washing in the preparation stage of protoplast suspension. Secondly, enough cells combined with enzymes and gel beads containing barcode information were wrapped into oil droplets to form gel beads in-emulsions (GEMs). If a small amount of free starch granules were wrapped in GEMs as cells, the obtained data would also be filtered out by Cell Ranger software and Seurat software in the data analysis stage. Thirdly, the number of filtered cells is still sufficient to obtain efficient data sets as most of the cells in cassava tuberous roots are less than 40 microns in diameter, even starch-rich cells can be detected by their pore size. Finally, the amplified products meet the sequencing standards, and the transcriptional information is normal, indicating that starch does not affect the reverse transcription results at the sequencing stage. Therefore, the application of single cell transcriptome in starch-rich plant tissues is feasible.
The cortex which is a ground tissue and lies between the epidermis and vascular tissue, provide support, and perform metabolic processes in plant root (Soukup et al., 2005). The cortex of Arabidopsis, which consists of a single layer of cells, is derived from cortex initial cells (Petricka et al., 2012). However, according to Figueiredo et al. (2015), cassava tuberous roots are the result of secondary growth, and the cortex only exists in the organs of primary growth or the early stages of secondary growth. Therefore, the term cortex should not be used to describe cassava tuberous roots. However, in this study, we identified a small number of cells of cortex at S1 and S2 of tuberous roots, accounting for 0.97% and 0.48% of the total cells, respectively. It fully demonstrated the advantages of single-cell transcriptome technology compared to section technology in observing clusters with rare number of cells.
The pseudotime analysis is not only a way to study cell differentiation trajectories, but also an approach to mine rare cell types. Since single-cell research relied on the number of captured cells, the identification of the cell clustering with a little number of cells existed obstacles. The casparian strip is deposited on the radial parts of cell wall of the endodermal cells (Chen et al., 2011) and acts as a barrier for the diffusion of material through the cell wall from cortical tissues to vascular tissues (Nagahashi et al., 1974). We observed that some up-regulated genes in cluster 8 were expressed in both endodermis and xylem, suggesting that cluster 8 might be associated with endodermis and xylem. Through pseudotime analysis, we further confirmed subcluster 3 of cluster 8 as the casparian strip. According to the pseudotime analysis in our results, the casparian strip was differentiated from endodermis, which was consistent with previous studies (Bonnett, 1969;Robards and Robb, 1972). Although pseudotime analysis did not reveal a differentiated relationship between the casparian strip and xylem, there were up-regulated genes related to lignification in both parts. This result is consistent with previous results that both the casparian strip and xylem were highly lignified tissues, containing a large amount of lignin and similar cell wall components (Zeier et al., 1999;Spicer and Groover, 2010).
In conclusion, we successfully mapped a single cell transcriptome atlas of starch-rich cassava tuberous root, observed the differentiation relationships among the cell clusters by pseudotime analysis, and identified the rare cell type, the casparian strip. This study laid a foundation for mining the development and genetic improvement of cassava tuberous root.

Data availability statement
The original contributions presented in the study are publicly available. This data can be found here: NCBI, PRJNA895163, https://www.ncbi.nlm.nih.gov/bioproject/PRJNA895163.

Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

SUPPLEMENTARY FIGURE 1
Identification of Cluster 5. (A) Single-cell Transcriptome data of endodermis, cluster 8 and phloem from S1 and S2 analyzed by Monocle 2, revealing the differentiation trajectory of them. Each dot indicates a single cell. Different color on the dots denotes the pseudotime score. (B) Cell types labeled on the differentiation trajectory for S1 and S2, respectively. (C) Single-cell Transcriptome data of xylem, cluster 5 and procambium from S1 and S2 analyzed by Monocle 2, revealing the differentiation trajectory of them. Each dot indicates a single cell. Color on the dots indicates the pseudotime score. (D) Cell types labeled on the differentiation trajectory of S1 and S2, respectively.

SUPPLEMENTARY FIGURE 2
Identification of Cluster 10. (A-C) Scatter plots of GO enrichment analysis of up-regulated genes for cluster 6 (A), cluster 7 (B) and cluster 10 (C) cell types after the aggregation of S1 and S2. (D) Scatter plots of GO enrichment analysis of 56 up-regulated genes specific to cluster 10 cell types after the aggregation of S1 and S2.

SUPPLEMENTARY FIGURE 3
The Umap visualization of each cluster in the cassava tuberous root of S1 and S2, respectively. (A) Umap visualization of each cluster in the cassava tuberous root of S1 stage. Each dot indicates a single cell. Each color indicates a cell cluster. (B) Umap visualization of each cluster in the cassava tuberous root of S2 stage. Each dot indicates a single cell. Each color indicates a cell cluster. (C) The cross section of the Cassava tuberous root.