A Pipeline for Reconstructing Somatic Copy Number Alternation’s Subclonal Population-Based Next-Generation Sequencing Data

State-of-the-art next-generation sequencing (NGS)-based subclonal reconstruction methods perform poorly on somatic copy number alternations (SCNAs), not only because they must simultaneously estimate the subclonal population frequency and the absolute copy number of each SCNA, but also because the tumor and its paired normal sequencing data contain complex bias and noise. Both existing NGS-based SCNA detection methods and tools for inferring SCNA's subclonal population frequency use the read count ratio (RCR) of the tumor to its paired normal as the key feature of tumor sequencing data; however, sequencing error and bias strongly affect the RCR, which produces a large number of redundant SCNA segments that make the subsequent inference of SCNA's subclonal population frequency and the subclonal reconstruction time-consuming and inaccurate. We perform a mathematical analysis of the number of solutions for SCNA's subclonal frequency, and based on it we propose a computational algorithm to reduce the impact of false breakpoints. We construct a new probability model that incorporates the RCR bias correction algorithm, and by chaining it with the false breakpoint filtering algorithm, we construct a complete SCNA subclonal population reconstruction pipeline. The experimental results show that our pipeline outperforms the existing subclonal reconstruction programs on both simulated data and TCGA data. Source code is publicly available as a Python package at https://github.com/dustincys/msphy-SCNAClonal.


INTRODUCTION
Tumor heterogeneity introduces challenges in cancer tissue diagnosis and subsequent treatment (Nowell, 1976). Tumor heterogeneity cannot be inferred from the properties of biomolecules through ontology or pathway analysis (Cheng et al., 2017;Cheng et al., 2018c), but it could be inferred by measuring the quantity of biomolecules (Cheng et al., 2018b;Cheng et al., 2018d;Cheng et al., 2019). To decipher the cell composition of bulk cells, somatic copy number alternations (SCNAs), which are most commonly found in tumor cells (Beroukhim et al., 2010), are used as representative markers to determine tumor subclonal populations in a tumor-normal paired manner (Oesper et al., 2013;Li and Xie, 2015).
The benefit of using SCNAs to conduct subclonal reconstruction is that the whole genome sequencing (WGS) data does not have to be deeply sequenced (Li and Xie, 2015), because an SCNA affects large, multi-kilobase-sized or megabase-sized regions of the genome, which allows the average copy number of these regions to be accurately estimated with WGS (Deshwar et al., 2015).
SCNA's subclonal reconstruction algorithms attempt to infer the population structure of heterogeneous tumors based on the subclonal population frequency of SCNAs (Deshwar et al., 2015). However, the cellular prevalence and the absolute copy number are intertwined, and NGS-based subclonal reconstruction needs to simultaneously estimate the population frequency and the absolute copy number of each SCNA. The solution space of the subclonal frequency of SCNAs remains poorly understood, and some SCNAs might have multiple solutions for their subclonal frequency (Oesper et al., 2013), which invalidates the infinite sites assumption (ISA) (Kimura, 1969;Hudson, 1983;Jiao et al., 2014). ISA is a commonly accepted and powerful assumption, which posits that each mutation occurs only once in the evolutionary history of the tumor.
To infer SCNA's subclonal population frequency from NGS data, the locations of SCNAs in the genome must be obtained first. The SCNA breakpoints are detected through multiple bin-merging processes, during which the RCR of the tumor to its paired normal is used as a key feature (Xi et al., 2010). However, sequencing error and bias strongly affect the RCR, which leads to false positive breakpoints and incorrect subclonal reconstruction (please refer to Figures S2 and S3 and Tables S2 and S3 in the Supplementary). The higher the sensitivity of an SCNA detection tool, the more prone it is to sequencing error. For example, BIC-seq (Xi et al., 2010) first splits the whole genome into small bins, then uses the Bayesian Information Criterion as the bin-merging and stopping criterion to detect SCNA breakpoints. When the sensitivity parameter λ of BIC-seq is very high, the true positive rate and the false discovery rate will decrease simultaneously (Xi et al., 2010), which means the SCNA regions will be separated into small fragments by the false positive breakpoints (Xi et al., 2010). The choice of parameter λ is equivalent to setting the type I error; in other words, during the window-merging loop, two neighboring windows that should be merged are left separate. Since the subclone reconstruction algorithm relies on the subclonal population proportion of somatic mutations to define each mutation set and its subpopulation (Deshwar et al., 2015) (please refer to Figure S4 for the definition of subpopulation and subclonal population), we need to choose a smaller λ to ensure a high true positive rate of breakpoints, so as to estimate the subclonal population frequency of every SCNA fragment more accurately.
However, the false positive breakpoints split the SCNA regions into many small SCNA fragments, which violates ISA, produces a large amount of redundant input data, and makes the subclone reconstruction process extremely slow.
Existing NGS-based subclonal reconstruction methods, such as THetA (Oesper et al., 2013) and MixClone (Li and Xie, 2015), use expectation maximization (EM) or the maximum likelihood method (MLM) to infer the subclonal frequency and the absolute copy number of every input segment. To reduce the search space, MixClone assumes that the number of subclonal populations is less than 3, and this number (1 or 2) must be predefined. During the maximization step of the EM process, MixClone further reduces the search space by restricting the subclonal frequencies of all subclonal populations to a few combinations of discrete values. Thus, MixClone trades accuracy for computational speed. THetA (Oesper et al., 2013), on the other hand, makes no compromise on the search space. As a result, THetA is extremely time-consuming when searching for the optimal subclonal frequency in (0,1) for every input segment, which makes it unable to perform subclonal reconstruction for more than three subclonal populations.
With the ever-increasing volume of biotechnology data comes the chance to develop computational toolkits (Cheng et al., 2016;Cheng et al., 2018a;Cheng et al., 2019) to uncover the pathogeny of diseases; in this article, we provide a pipeline for reconstructing SCNA's subclonal population based on NGS data. We first perform a mathematical analysis of the number of solutions of SCNA's subclonal frequency, propose and prove a theorem on this number, and present a method to filter out false SCNA breakpoints based on it. Then we propose a probability model that incorporates the RCR bias correction algorithm we previously developed, and we construct an SCNA subclonal population reconstruction pipeline by chaining it with the false breakpoint filtering algorithm. We model the read depth of the tumor sample as a Poisson distribution with the expected tumor read count proportional to the absolute copy number and the subclonal frequency. We use the tree-structured stick breaking Dirichlet process (Prescott Adams et al., 2010) to generate the tree structure of the tumor's evolutionary history, and use Markov Chain Monte Carlo (MCMC) to obtain the result of the subclonal reconstruction. The experimental results show that our pipeline outperforms the existing subclonal reconstruction programs on both simulated data and TCGA data.

Solution Space of SCNA's Subclonal Population Frequency
The RCR and the B-allele frequency (BAF) of the heterozygous single nucleotide polymorphism (SNP) loci in the SCNA segment are commonly used as input by sequencing data-based tools for inferring SCNA's copy number and subclonal frequency (Wang et al., 2007;Oesper et al., 2013;Li and Xie, 2015). Since the number of reads mapped to a genome region is proportional to the copy number of this region, the RCR is set to be proportional to $C_j/2$ by existing tools (Oesper et al., 2013;Li and Xie, 2015), where $C_j$ denotes the average copy number of the $j$th SCNA segment. Let $\phi_j$ denote the subclonal population cellular prevalence of the $j$th SCNA segment; $C^T_j$ denote its absolute copy number; $\mu^T_{jk}$ represent the BAF of the $k$th heterozygous SNP locus in the $j$th SCNA segment; and $\mu_{jk}$ represent the average BAF of the $k$th heterozygous SNP locus in the $j$th SCNA segment. These quantities are related by equation set (1), where $K_j$ is the total number of heterozygous SNP loci in the $j$th SCNA segment. Since the B allele is located on either the paternal or the maternal haploid, both $\mu^T_{jk}$ and $(1 - \mu^T_{jk})$ could be the BAF value in the same SCNA fragment, and both $\mu_{jk}$ and $(1 - \mu_{jk})$ could be the average BAF value in the same SCNA fragment. To reduce the complexity, we use $\hat{\mu}^T_{jk}$ to denote the smaller one of $\mu^T_{jk}$ and $(1 - \mu^T_{jk})$, and $\hat{\mu}_{jk}$ to denote the smaller one of $\mu_{jk}$ and $(1 - \mu_{jk})$. Here we give a theorem that characterizes the solution space of equation set 1, and we prove it in the Supporting Information.
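To make the relationship between these quantities concrete, the sketch below computes the average copy number and the average BAF of a segment from an assumed two-component mixture of tumor and normal cells, in the spirit of Li and Xie (2015). The function name `mixture_signal`, the parameter `b_copies` (the number of B-allele copies in the tumor genotype), and the exact mixture formula are assumptions made for illustration; they are not taken from equation set (1) itself.

```python
def mixture_signal(phi, tumor_cn, b_copies, normal_cn=2):
    """Average copy number and average BAF of one segment (illustrative).

    Assumed model: a fraction `phi` of cells carries the SCNA with absolute
    copy number `tumor_cn` and `b_copies` copies of the B allele; the
    remaining cells are diploid and heterozygous (BAF = 1/2).
    """
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [0, 1]")
    if not 0 <= b_copies <= tumor_cn:
        raise ValueError("b_copies must lie in [0, tumor_cn]")
    # Average copy number: weighted mean of tumor and normal copy numbers.
    avg_cn = phi * tumor_cn + (1.0 - phi) * normal_cn
    if avg_cn == 0:
        raise ValueError("average copy number is zero; BAF is undefined")
    # Average BAF: B-allele copies divided by all copies in the mixture.
    avg_baf = (phi * b_copies + (1.0 - phi) * normal_cn / 2.0) / avg_cn
    return avg_cn, avg_baf


if __name__ == "__main__":
    # Two different (phi, copy number) pairs that give the same observables.
    print(mixture_signal(1.0, 3, 1))
    print(mixture_signal(0.5, 4, 1))
```

Under this assumed model, the two calls in the demo give the same average copy number (3) and the same average BAF (1/3), which is the kind of ambiguity the theorem below addresses.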
THEOREM 1. Given $C_j$ and $\{\hat{\mu}_{jk}\}_{k=1}^{K_j}$, and let $x = \dots$ The multiple-solution area would be $C_j \in (2, \min(C_{j'}, C_{j''}))$ and $\hat{\mu}_{jk} \in (\min(\mu^T_{j'k}, \mu^T_{j''k}), \frac{1}{2})$.
As shown in Figure 1, given the observed values $C_j$ and $\hat{\mu}_{jk}$ and the maximum copy number $C_{max} = 15$, only 7/43 of the curves of the family of functions $\hat{\mu}$ … (see Table S1 for detailed information on the multi-solution range).
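One hedged way to explore the solution space numerically is brute-force enumeration. The sketch below assumes the simple tumor/normal mixture model $C_j = \phi_j C^T_j + 2(1 - \phi_j)$ with a heterozygous diploid normal component, and it allows every minor-allele count from 0 to $\lfloor C^T_j/2 \rfloor$. Both the model and this genotype set are assumptions for illustration (the function name `enumerate_solutions` is hypothetical), so the counts it produces need not match the 7/43 figure quoted from Figure 1.

```python
import math


def enumerate_solutions(avg_cn, folded_baf, c_max=15, tol=1e-6):
    """List (tumor_cn, b_copies, phi) triples consistent with the observations.

    avg_cn     -- observed average copy number of the segment
    folded_baf -- observed average BAF folded into [0, 1/2]
    """
    if avg_cn <= 0:
        raise ValueError("avg_cn must be positive")
    solutions = []
    for tumor_cn in range(c_max + 1):
        if tumor_cn == 2:
            continue  # copy number 2 is not an SCNA under this model
        # Solve avg_cn = phi * tumor_cn + 2 * (1 - phi) for phi.
        phi = (avg_cn - 2.0) / (tumor_cn - 2.0)
        if not 0.0 < phi <= 1.0:
            continue
        # Only minor-allele counts are needed because the BAF is folded.
        for b_copies in range(tumor_cn // 2 + 1):
            baf = (phi * b_copies + (1.0 - phi)) / avg_cn
            if math.isclose(baf, folded_baf, abs_tol=tol):
                solutions.append((tumor_cn, b_copies, phi))
    return solutions


if __name__ == "__main__":
    for tumor_cn, b_copies, phi in enumerate_solutions(3.0, 1.0 / 3.0, c_max=5):
        print(tumor_cn, b_copies, round(phi, 3))
```

Under these assumptions, the observation pair (3, 1/3) is matched by several copy numbers, whereas the pair (2.5, 0.2) is matched only by copy number 3 with cellular prevalence 0.5.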

The Algorithm of Filtering Out False Positive SCNA Breakpoints
According to Theorem 1, we assume that no two adjacent SCNAs present the same $C_j$ and $\hat{\mu}_{jk}$ while having different $\phi_j$ and $C^T_j$. We use the same method described in Li and Xie (2015) to model the read count ratio of the tumor to its paired normal. Based on the Lander-Waterman model (Lander and Waterman, 1988), the probability of sampling a read from a given segment depends on three main factors: 1) its copy number, 2) its total genomic length, and 3) its mappability, which depends on factors such as repetitive sequence and GC content (Li and Xie, 2015). For each segment $j$, we associate a coefficient $\theta_j$ to account for the effect of its mappability and genomic length. Thus, the expected tumor read count mapped to segment $j$, denoted as $\lambda_j$, is proportional to $C_j \theta_j$. For example, for segment $x$ and segment $y$, we have

$$\frac{\lambda_x}{\lambda_y} = \frac{C_x \theta_x}{C_y \theta_y} \quad (2)$$

Because the mappability coefficients matter only in a relative sense, we take $\theta_x/\theta_y = D^N_x/D^N_y$, as these segments should have the same sequence properties in the normal and tumor samples. Thus, Equation 2 is transformed into

$$\frac{\lambda_x}{\lambda_y} = \frac{C_x D^N_x}{C_y D^N_y} \quad (3)$$

However, our previous study (Chu et al., 2017a) has shown that the RCR of the tumor to its paired normal presents a log-linear GC content bias, and has described a bias correction software, "Pre-SCNAClonal" (Chu et al., 2017a), to correct this bias specifically.
Let $\widehat{D^S_i/D^N_i}$ denote the corrected read count ratio of the tumor sample to its paired normal, and let $F()$ denote the bias correction process. Then we have $\widehat{D^S_i/D^N_i} = F(D^S_i/D^N_i)$ (4). We then use the following steps to filter out false positive SCNA breakpoints.
1. First, BIC-seq with a small λ is used to detect SCNA breakpoints. The whole genome is then separated into SCNA fragments by these breakpoints. We use $\{s_j\}_{j=1}^{J}$ to denote this SCNA fragment set.
2. Next, Pre-SCNAClonal (Chu et al., 2017a) is used to correct the bias of the RCR.
3. Next, the hierarchical clustering algorithm is used to cluster the corrected RCR of every segment, with the maximum number of clusters predefined as $C_{max} \cdot t$, where $t$ is the number of subclonal populations. Suppose that $N$ clusters are obtained in this step. We denote the $n$th cluster as $S_n$, where $n = 1, 2, \dots, N$. For convenience, we call this step the aggregation step.
4. Next, the MeanShift algorithm is used to cluster the BAF values $\hat{\mu}_{jk}$ of the heterozygous SNP loci, with an index representing each BAF cluster. Then for every $s_j \in S_n$ we define the BAF cluster of $s_j$ to be the BAF cluster that contains the largest number of $\{\hat{\mu}_{jk}\}_{k=1}^{K_j}$. Each $S_n$ is then split into subclusters $\{S_{n,m}\}_{m=1}^{M_n}$ based on the BAF cluster of each $s_j$. For convenience, we call this step the decomposition step.
5. For each $S_{n,m}$, $n = 1, 2, \dots, N$, $m = 1, 2, \dots, M_n$, we merge two adjacent SCNA fragments that are on the same chromosome and whose distance is less than a predefined threshold $r$.
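The final merging step can be sketched as follows. This is a minimal illustration that assumes each fragment already carries a cluster label (for example the pair (n, m) from the aggregation and decomposition steps) and that "adjacent" means consecutive along the chromosome; the `Fragment` record, the function name `merge_adjacent`, and the parameter `max_gap` (standing in for the threshold r) are hypothetical names, not part of the published package.

```python
from collections import namedtuple

# Hypothetical fragment record: chromosome, start, end, and a cluster label
# such as the (n, m) pair from the aggregation and decomposition steps.
Fragment = namedtuple("Fragment", "chrom start end cluster")


def merge_adjacent(fragments, max_gap):
    """Merge consecutive fragments that share a chromosome and a cluster
    label and whose gap is smaller than max_gap."""
    merged = []
    for frag in sorted(fragments, key=lambda f: (f.chrom, f.start)):
        prev = merged[-1] if merged else None
        if (prev is not None
                and prev.chrom == frag.chrom
                and prev.cluster == frag.cluster
                and frag.start - prev.end < max_gap):
            # The breakpoint between prev and frag is treated as false.
            merged[-1] = prev._replace(end=max(prev.end, frag.end))
        else:
            merged.append(frag)
    return merged


if __name__ == "__main__":
    fragments = [
        Fragment("chr21", 0, 100, (0, 0)),
        Fragment("chr21", 120, 200, (0, 0)),
        Fragment("chr21", 210, 300, (1, 0)),
        Fragment("chr21", 310, 400, (0, 0)),
    ]
    for frag in merge_adjacent(fragments, max_gap=50):
        print(frag)
```

In the demo, the first two fragments are merged because they share a cluster label and lie 20 bases apart, while the last fragment stays separate because a fragment from another cluster lies between it and the merged one.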

Normal Segments Detection Method
The task of normal segment detection is to find all the segments with $C_j = 2$, since the copy number $C^N_j$ of $s_j$ in the normal sample normally equals 2. A cancer genome differs from the reference genome by gains and losses of segments, or intervals, of the reference genome (Oesper et al., 2013).
However, because the tumor and its paired normal go through two different sequencing processes and their coverage may not be exactly the same, $\widehat{D^S_j/D^N_j}$ does not always equal 1 for the normal segments (Li and Xie, 2015). In this paper, we use the same normal segment detection method described in our previous work (Chu et al., 2017a), which utilizes BAF information to detect normal segments.
Equation set 1 implies the conclusion stated in Equation 5. We detect the normal segments $N^t_m$ from $S^t_m$ according to Equation 5 in the following two steps. First, we filter out all the segments $s_j \in S^t_m$ with $\mu^T_{jk} \neq \frac{1}{2}$ for $k = 1, \dots, K_{s_j}$. In the remaining segments, the possible $C^T_j$ could be any value in {0, 2, 4, …}, since the possible genotype $G^T_{jk}$ of the allele at the $k$th site for $\mu^T_{jk} = \frac{1}{2}$ could be any one in {∅, PM, PPMM, …}. Next, we obtain all the normal segments $N^t_m$ from these segments by selecting the segments whose read depth $d^S_{jk}$ at the $k$th heterozygous SNP site equals the coverage of the aligned WGS data of the tumor sample.

The Probability Model of Subclonal Population Frequency
Figure 2 shows the probabilistic graphical model of SCNA's subclonal population frequency. In this figure, $S$ denotes the set of all the SCNA segments and $N$ denotes the set of segments that contain no SCNA. We use the same method described in Li and Xie (2015) to model the BAF with a binomial distribution, where $b^S_{jk}$ denotes the number of tumor reads that contain the B allele at the $k$th heterozygous SNP locus and $d^S_{jk}$ denotes the total number of tumor reads mapped at this locus. In this figure, $G^T_{jk}$ denotes the allele's genotype at the $k$th heterozygous SNP locus in segment $s_j$.
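As a hedged illustration of the binomial BAF model, the sketch below evaluates the log-likelihood of the B-allele read counts of one segment. It assumes that the success probability is the average BAF of the tumor-normal mixture, as in Li and Xie (2015); the function names `binomial_logpmf` and `segment_baf_loglik` are hypothetical and only the Python standard library is used.

```python
import math


def binomial_logpmf(b, d, p):
    """Log-probability of observing b B-allele reads among d reads."""
    if not 0 <= b <= d:
        raise ValueError("need 0 <= b <= d")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # log of the binomial coefficient via log-gamma for numerical stability
    log_choose = (math.lgamma(d + 1) - math.lgamma(b + 1)
                  - math.lgamma(d - b + 1))
    return log_choose + b * math.log(p) + (d - b) * math.log(1.0 - p)


def segment_baf_loglik(b_counts, d_counts, avg_baf):
    """Sum of per-locus log-likelihoods over the heterozygous SNP loci of a
    segment, assuming the loci are independent given avg_baf."""
    return sum(binomial_logpmf(b, d, avg_baf)
               for b, d in zip(b_counts, d_counts))


if __name__ == "__main__":
    # Three SNP loci with roughly one third of the reads carrying the B allele.
    print(segment_baf_loglik([10, 12, 9], [30, 33, 31], 1.0 / 3.0))
    print(segment_baf_loglik([10, 12, 9], [30, 33, 31], 0.5))
```

Comparing the two printed values shows how the likelihood favors the candidate average BAF that is closer to the observed B-allele fraction.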
According to Equation 4, we obtain the expected tumor read count mapped to segment $j$, where $F^{-1}()$ denotes the reverse process of bias correction. Let $|N|$ denote the number of baseline segments (Li and Xie, 2015) (in which the absolute copy number $C^T_j = 2$). We use the average of the log read count ratio of all the baseline segments to calculate the expectation of the tumor read count, and model the tumor read count as a Poisson distribution. It can be deduced from the first equation in equation set 1 that $C_j > 2 \Leftrightarrow C^T_j > 2$. Therefore, we may conclude that $\widehat{D^S_j/D^N_j} > \vartheta \Leftrightarrow C^T_j > 2$, since $C_i$ must equal 2 if $s_i$ contains no SCNA. We let $C^T_j$ obey a categorical distribution whose range is given by the function $\varsigma(\vartheta)$.
The subclonal population frequency of a given mutation equals the sum of all its subpopulation frequencies (for details, refer to Figure S1 in the Supplementary), and all the subpopulation frequencies in the tumor sample sum to 1. Therefore, all the subpopulation frequencies in the tumor sample obey the Dirichlet distribution, and this Dirichlet distribution obeys the tree-structured Dirichlet process (DP) (Prescott Adams et al., 2010). Suppose there are $P$ subpopulations in a tumor sample; let $x_1, \dots, x_P$ denote all the subpopulation frequencies, which follow the Dirichlet distribution in Equation 10, where $\alpha_1, \dots, \alpha_P$ are the concentration parameters. In this paper, we set $\alpha_1 = \dots = \alpha_P = 1$, so Equation 10 becomes a uniform distribution over the $(P - 1)$-dimensional simplex. Therefore, the prior probability of the subclonal frequency $\phi_j$ equals the probability of the tree structure. In Figure 2, $G$ denotes the tree-structured DP; $H$ denotes the base distribution; $\alpha$ and $\gamma$ are the scaling parameters of $G$.
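The read count part of the model can be illustrated with a small sketch. It assumes that $\vartheta$ is the geometric mean of the tumor-to-normal read count ratio over the baseline segments and that the expected tumor read count of a segment is its normal read count scaled by $\vartheta \cdot C_j / 2$; the bias correction $F$ and its inverse are deliberately left out, and all function names are hypothetical.

```python
import math


def baseline_ratio(tumor_reads, normal_reads):
    """Geometric mean of the tumor/normal read count ratio over the baseline
    segments (those with absolute copy number 2)."""
    logs = [math.log(t / n) for t, n in zip(tumor_reads, normal_reads)]
    if not logs:
        raise ValueError("at least one baseline segment is required")
    return math.exp(sum(logs) / len(logs))


def expected_tumor_reads(normal_reads, avg_cn, vartheta):
    """Expected tumor read count of a segment with average copy number
    avg_cn, ignoring the bias correction step (an assumption)."""
    return normal_reads * vartheta * avg_cn / 2.0


def poisson_logpmf(count, lam):
    """Log-probability of count under a Poisson distribution with mean lam."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return count * math.log(lam) - lam - math.lgamma(count + 1)


if __name__ == "__main__":
    vartheta = baseline_ratio([980, 1010, 1005], [1000, 1000, 1000])
    lam = expected_tumor_reads(normal_reads=1000, avg_cn=3.0, vartheta=vartheta)
    print(poisson_logpmf(1500, lam))
```

A candidate pair of absolute copy number and subclonal frequency determines `avg_cn`, so this log-probability can be compared across candidates in the same way as the BAF likelihood.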
We use MCMC to obtain the prior distribution of $\phi_j$, since the probability of the tree-structured DP cannot be expressed explicitly. We use the slice sampling method described in Prescott Adams et al. (2010) to generate the tree structure. The complete posterior probability of the subclonal population frequencies of all the SCNA segments is given by Equation 11, where $T$ denotes the tree structure and $N$ denotes a node in $T$. We select the tree structure $T_{max}$ with the maximum posterior probability (Equation 12), where $T^{(i)}$ and $\{\phi_j\}^{(i)}_{s_j \in S \setminus N}$ denote the tree structure and the subclonal population frequencies of the $i$th sampling process. The absolute copy number of the $i$th sampling process is given by Equation 13, where $\{C^T_j\}^{(i)}_{s_j \in S \setminus N}$ are the absolute copy numbers with the maximum posterior probability if the $i'$-th sampling process is the solution of Equation 12.

FIGURE 2 | Bayesian network model for subclonal population frequency. In this figure, $G$ denotes the tree-structured Dirichlet process; $H$ denotes the base distribution; $\alpha$ and $\gamma$ are the scaling parameters of $G$; $\phi_j$ denotes the subclonal frequency of the SCNA in segment $s_j$; $D^S_j$ denotes the number of tumor reads mapped in segment $s_j$, while $D^N_j$ denotes the number of normal reads mapped in segment $s_j$; $C^T_j$ denotes the absolute copy number of the SCNA in segment $s_j$; $\vartheta$ denotes the geometric mean of the read count ratio of all the baseline segments $N$; $C_{max}$ is the predefined maximum absolute copy number; $G^T_{jk}$ denotes the tumor genotype of the $k$th heterozygous SNP locus in the $j$th segment $s_j$; $\mu^T_{jk}$ denotes the tumor BAF of the $k$th heterozygous SNP locus in the $j$th segment $s_j$; $b^S_{jk}$ and $d^S_{jk}$ denote the number of B-allele reads and the total number of reads at the $k$th heterozygous SNP locus in the $j$th segment $s_j$.

Chu et al. SCNA's Subclonal Reconstruction Pipeline
Frontiers in Genetics | www.frontiersin.org February 2020 | Volume 10 | Article 1374

The Pipeline for Reconstructing SCNA's Subclonal Population-Based NGS Data
As shown in Figure 3, the pipeline consists of five modules. The aligned sequencing data of the tumor and its paired normal, in BAM format, are used as the input of the pipeline. The SCNA segments are detected by BIC-seq (Xi et al., 2010), then the bias of the read count ratio is corrected by the correction model (Chu et al., 2017a) we previously proposed. We filter out the false positive breakpoints with the algorithm proposed in this paper, then we use the probability model of subclonal population frequency proposed in this paper to infer the subclonal frequency of each SCNA segment. Finally, we use the tree structure learning algorithm (Prescott Adams et al., 2010) to reconstruct the SCNA's subclonal population.

RESULTS
In this section, we evaluate the performance of the probabilistic model on both simulated and real datasets and compare it with existing tools. Existing tools such as MixClone (Li and Xie, 2015) and THetA (Oesper et al., 2013) cannot calculate the subclonal frequencies of more than three subclonal populations. Therefore, we use simulated data that contain more than three subclonal populations, together with TCGA benchmark data, to evaluate our model.

Results From Simulated Data
We use Pysubsim-tree (Chu et al., 2017b) to simulate a tumor's NGS read alignment data from chromosome 21 with the evolution history configuration shown in Figure 4 and the acquired SCNA configuration listed in Table 1. In Figure 4, each circle represents a subpopulation; the squares with characters a, b, c, d, e, and f represent the SCNAs; the number next to the circle is the frequency of the subpopulation.
We set the first 50 cycles of the MCMC sampling process as burn-in and use the results of the following 300 cycles to calculate the probability of the evolutionary relationship between subpopulations. We set $\alpha = 1.0$, $\gamma = 1.0$, and $H$ to be the uniform distribution. Figures 5A, B are dot plots of the distribution of the output of the subclonal population frequency model. Figure 5C shows the partial order plot (Jiao et al., 2014) of the evolutionary relationships obtained by the model proposed in this paper. The arrows in this figure denote the direct evolutionary relationship between two subpopulations. The width of an arrow denotes the probability that this evolutionary relationship is present in the 300 cycles of the MCMC process. Suppose $\{T_i\}_{i=1}^{I}$ denotes all the trees obtained in all the cycles of the MCMC process and $\overrightarrow{ab}$ denotes the evolutionary relationship from subpopulation $a$ to $b$. Then the probability of this evolutionary relationship is the fraction of these trees that contain $\overrightarrow{ab}$ (Equation 14). According to Theorem 1, a and e have only one solution of $\phi_j$, while the others have multiple solutions. The distribution of absolute copy numbers shown in Figure 5A is consistent with Theorem 1. The distribution of e's subclonal frequency is quite scattered in Figure 5B because the small subclonal frequency and the near-normal absolute copy number of e cause the coverage to decrease by only 5%, which is almost the same as the noise. The subclonal frequencies of the other SCNAs are concentrated at the subclonal frequencies listed in Table 1. Each SCNA's absolute copy number and subclonal frequency with the maximum posterior probability are listed in Table 2. The subclonal frequencies of b and c are not correct because they have multiple solutions of subclonal frequency according to Theorem 1, while the others are correct. The distribution of absolute copy number and subclonal frequency in Figure 5 and the results listed in Table 2 show that our SCNA probability model can correctly calculate the subclonal frequency of an SCNA whose solution is unique.
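The arrow probabilities in the partial order plot can be estimated from the sampled trees as the fraction of post-burn-in samples that contain a given parent-to-child edge. The sketch below assumes that each sampled tree is represented as a set of (parent, child) pairs; this representation and the function name `edge_probability` are illustrative assumptions rather than the data structures used by the published package.

```python
def edge_probability(trees, parent, child, burn_in=0):
    """Fraction of sampled trees (after burn-in) that contain the direct
    evolutionary relationship parent -> child.

    trees -- list of sets of (parent, child) tuples, one set per MCMC cycle
    """
    kept = trees[burn_in:]
    if not kept:
        raise ValueError("no samples left after burn-in")
    hits = sum(1 for edges in kept if (parent, child) in edges)
    return hits / len(kept)


if __name__ == "__main__":
    sampled = [
        {("a", "b")},                # discarded as burn-in below
        {("a", "b"), ("b", "c")},
        {("a", "c")},
        {("a", "b")},
    ]
    print(edge_probability(sampled, "a", "b", burn_in=1))
    print(edge_probability(sampled, "b", "c", burn_in=1))
```

With the settings described above, `burn_in` would be 50 and the list would hold 350 sampled trees, so each probability is an average over the remaining 300 cycles.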

Results From Breast Cancer Sequencing Data
We use the NGS data "HCC1954-spiked1-n25t35s40" and "HCC1954-spiked1-n25t55s20" (denoted as "n25t35s40" and "n25t55s20" for convenience) from The Cancer Genome Atlas (TCGA) Benchmark 4 dataset, which is publicly available at the National Cancer Institute GDC Data Portal (https://gdc.cancer.gov/resources-tcga-users/tcga-mutation-calling-benchmark-4-files), to further validate the subclonal frequency model proposed in this paper. HCC1954 is an immortal cell line derived from an invasive ductal carcinoma of the breast diagnosed in a 61-year-old woman (Bignell et al., 2007). "G15512.HCC1954.1" is the NGS data of this cell line, which contains one subclonal population with purity 0.99; however, this data has no ground truth for the absolute copy number of the SCNA regions. "HCC1954-spiked1-n25t35s40" is generated by merging 35% of "G15512.HCC1954.1" with 25% of its paired normal NGS data and 40% of "G15512.HCC1954.1" with some SCNAs randomly spiked into it. Therefore, there are two subclonal populations in the tumor sample "HCC1954-spiked1-n25t35s40," and their subclonal frequencies are 75% and 40%, respectively. The ISA is invalid since each subclonal population contains multiple SCNAs; thus, we set the prior probability of the tree structure to obey a uniform distribution, and Equation 11 can be rewritten as Equation 15. Figure 6 shows the subclonal frequencies obtained by the model proposed in this paper. In this figure, "P" denotes the parent subclonal population (subclonal frequency 75%) and "C" denotes the child subclonal population (subclonal frequency 40%). As shown in Figure 6, the subclonal frequencies of these two populations obtained by the model proposed in this paper are 72% and 42% for sample "n25t35s40" and 77% and 25% for sample "n25t55s20," which are the closest to the ground truth in comparison with MixClone and THetA.
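The ground-truth frequencies quoted for "n25t35s40" follow directly from the mixing proportions: SCNAs of the original cell line are present in both tumor-derived portions, while the spiked-in SCNAs are present only in the spiked portion. The short sketch below spells out this arithmetic; the function name is hypothetical, and reading the proportions off the sample name (n = normal, t = original tumor, s = spiked tumor) is an assumption for any sample other than the one described in the text.

```python
def expected_subclonal_frequencies(normal, tumor, spiked):
    """Expected frequencies of the parent and child subclonal populations
    in a normal / tumor / spiked-tumor mixture."""
    total = normal + tumor + spiked
    if total <= 0:
        raise ValueError("mixing proportions must sum to a positive value")
    # SCNAs of the original cell line occur in both tumor-derived portions.
    parent = (tumor + spiked) / total
    # Spiked-in SCNAs occur only in the spiked portion.
    child = spiked / total
    return parent, child


if __name__ == "__main__":
    # n25t35s40: 25% normal, 35% original tumor, 40% spiked tumor.
    print(expected_subclonal_frequencies(0.25, 0.35, 0.40))
```

For "n25t35s40" this reproduces the 75% and 40% stated above.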

DISCUSSION
Generally, SCNAs with a larger subclonal population frequency can be located more precisely. However, because the tumor and its paired normal go through two separate sequencing procedures, the read information of the genomic regions with the same copy number in the tumor sample is not exactly the same as in its paired normal. Moreover, the lower read coverage of NGS makes noise or error more likely to be mistaken for an SCNA. As shown in Figure 7, the number of SCNA breakpoints obtained by the SCNA detection tool is proportional to the subclonal population frequency. If there is a large proportion of false negative breakpoints, the read counts in the segments can no longer reveal the copy number property, which affects all read count-based SCNA analysis tools. On the other hand, if there is a large proportion of false positive breakpoints, the segment clustering step of the false positive breakpoint filter can reduce the data size and make the read count information more robust to noise by merging the SCNA segments with the same absolute copy number and subclonal population frequency. As shown in Theorem 1, SCNA segments with the same RCR and average B-allele frequency are indistinguishable to NGS-based SCNA analysis tools. Merging two non-adjacent SCNA segments with the same NGS properties does not affect the result of the NGS-based SCNA analysis tools.
The Tree-Structured Stick Breaking (TSSB) process (Prescott Adams et al., 2010) can learn the tree structure of hierarchical data. A tree structure space can be generated by intertwining two DPs; then, as described in Prescott Adams et al. (2010), one can imagine throwing a dart (data) at the tree space and considering which node the dart hits. If we know the subclonal number L in advance, then we can generate the tree structure in two steps. Step 1: generate a tree using all the data. Step 2: sort the nodes by the total size of the genome regions that hit them, find the top L nodes, and throw the rest of the darts (data not in the L nodes) into these L nodes randomly. Figure 7 shows that the subclonal frequency affects the number of breakpoints; thus, there might be false positive or false negative breakpoints in the result of the SCNA detection tool. The false positive breakpoints can be filtered out by the algorithm in this paper. Even if false breakpoints exist, the redundant data that contain the same SCNA might hit the same node in the tree space generated by the TSSB process. Thus, the redundant data affects the time

FIGURE 4 | The evolution process of the subclonal populations in the simulation data. In this figure, each circle denotes a subpopulation; the number on the left is its frequency; each square inside the circle denotes an SCNA; each arrow points to an offspring subpopulation.

FIGURE 6 | The subclonal proportion of SCNAs in HCC1954 data. In this figure, SCNAModel is the subclonal frequency inferring model proposed in this paper.
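The two-step procedure for a known subclonal number L described in the Discussion can be sketched as follows. The sketch assumes that Step 1 has already produced a node assignment for every segment, and it implements only Step 2: ranking the nodes by the total genomic size assigned to them, keeping the top L nodes, and reassigning the remaining segments uniformly at random. The dictionary-based representation and the function name `restrict_to_top_nodes` are hypothetical.

```python
import random


def restrict_to_top_nodes(assignments, sizes, num_subclones, seed=None):
    """Keep the num_subclones nodes that cover the most genome and move all
    other segments into one of them at random.

    assignments -- dict: segment id -> node id (result of Step 1)
    sizes       -- dict: segment id -> genomic length of the segment
    """
    if num_subclones < 1:
        raise ValueError("num_subclones must be at least 1")
    rng = random.Random(seed)
    # Total genomic size that hits each node.
    totals = {}
    for seg, node in assignments.items():
        totals[node] = totals.get(node, 0) + sizes[seg]
    top = sorted(totals, key=totals.get, reverse=True)[:num_subclones]
    top_set = set(top)
    # Segments outside the top nodes are thrown into the top nodes randomly.
    return {seg: (node if node in top_set else rng.choice(top))
            for seg, node in assignments.items()}


if __name__ == "__main__":
    assignments = {"s1": "A", "s2": "A", "s3": "B", "s4": "C"}
    sizes = {"s1": 100, "s2": 100, "s3": 150, "s4": 10}
    print(restrict_to_top_nodes(assignments, sizes, num_subclones=2, seed=0))
```

In the demo, nodes A and B cover the most genome, so the single segment in node C is moved into one of them while all other assignments stay unchanged.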

CONCLUSION
In this paper, we first perform a mathematical analysis of the solution space of SCNA's subclonal frequency. Then, based on this analysis, we propose an algorithm to filter out false breakpoints and construct a new probability model to reconstruct SCNA's subclonal population, which incorporates the RCR bias correction algorithm we previously proposed. We use the tree-structured stick breaking DP (Prescott Adams et al., 2010) to generate the tree structure space of the tumor's evolutionary history. In the probability model, the BAF of the heterozygous SNP loci in the SCNA segment is modeled as a binomial distribution, and the read depth of the tumor sample is modeled as a Poisson distribution that accounts for the potential bias in the RCR. We generate the distribution of subclonal frequency from the distribution of subpopulation frequency, which is drawn from the tree structure space. By chaining the model with the false breakpoint filtering algorithm, we construct a complete SCNA subclonal population reconstruction pipeline, which is capable of inferring an SCNA's absolute copy number, its subclonal population frequency, and its evolutionary process even when there are many false positive SCNA breakpoints and the RCR is biased. The results show that the model proposed in this paper estimates the absolute copy number of SCNA segments and their subclonal population frequencies more accurately than existing methods on both simulated data and TCGA data.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: https://gdc.cancer.gov/resources-tcga-users/tcga-mutation-calling-benchmark-4-files.

AUTHOR CONTRIBUTIONS
YC: developed the theories and all the mathematical equations in this paper, implemented the initial version of P-SCNAClonal, and wrote the initial version of this paper. CN: debugged the initial version of P-SCNAClonal, performed the experiments and collected the results, and completed the results section of this paper. YW: provided the basic idea and funding support.