Challenges and opportunities in estimating viral genetic diversity from next-generation sequencing data

Beerenwinkel, Niko; Günthard, Huldrych F.; Roth, Volker; Metzner, Karin J.

doi:10.3389/fmicb.2012.00329

REVIEW article

Front. Microbiol., 11 September 2012

Sec. Virology

Volume 3 - 2012 | https://doi.org/10.3389/fmicb.2012.00329


Niko Beerenwinkel ¹,²*
Huldrych F. Günthard ³
Volker Roth ⁴
Karin J. Metzner ³

1. Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
2. Swiss Institute of Bioinformatics, Basel, Switzerland
3. Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
4. Department of Mathematics and Computer Science, University of Basel, Basel, Switzerland

Abstract

Many viruses, including the clinically relevant RNA viruses HIV (human immunodeficiency virus) and HCV (hepatitis C virus), exist in large populations and display high genetic heterogeneity within and between infected hosts. Assessing intra-patient viral genetic diversity is essential for understanding the evolutionary dynamics of viruses, for designing effective vaccines, and for the success of antiviral therapy. Next-generation sequencing (NGS) technologies allow the rapid and cost-effective acquisition of thousands to millions of short DNA sequences from a single sample. However, this approach entails several challenges in experimental design and computational data analysis. Here, we review the entire process of inferring viral diversity from sample collection to computing measures of genetic diversity. We discuss sample preparation, including reverse transcription and amplification, and the effect of experimental conditions on diversity estimates due to in vitro base substitutions, insertions, deletions, and recombination. The use of different NGS platforms and their sequencing error profiles are compared in the context of various applications of diversity estimation, ranging from the detection of single nucleotide variants (SNVs) to the reconstruction of whole-genome haplotypes. We describe the statistical and computational challenges arising from these technical artifacts, and we review existing approaches, including available software, for their solution. Finally, we discuss open problems, and highlight successful biomedical applications and potential future clinical use of NGS to estimate viral diversity.

Introduction

Many viruses, in particular RNA or single-stranded DNA viruses, exhibit extreme evolutionary dynamics. They have very high mutation rates, up to six orders of magnitude higher than in humans, short generation times, and large population sizes (Duffy et al., 2008). Under these conditions, genetic variants are produced constantly, and in each infected host, the virus population displays a high degree of genetic diversity. Rapidly evolving viruses are not only ideal systems for studying evolutionary mechanisms (Drummond et al., 2003), but many of them are also significant pathogens of vital medical interest, including HIV, HCV, and influenza virus (WHO, 2012).

Because of their diversity, intra-host virus populations are often referred to as mutant clouds, swarms, or viral quasispecies. The latter term was originally introduced in the context of self-replicating macromolecules (Eigen, 1971; Eigen and Schuster, 1977) and has a precise mathematical meaning. A quasispecies is the equilibrium distribution of mutants in a mathematical model that accounts for mutation and selection (Eigen et al., 1988, 1989). In the framework of classical population genetics, it can be regarded as a coupled mutation-selection balance (Wilke, 2005). The main prediction of the quasispecies model is that selection acts on the population as a whole; hence, the population dynamics cannot be understood from the fittest strain alone (Van Nimwegen et al., 1999; Wilke et al., 2001). The quasispecies model was later applied to RNA viruses (Nowak, 1992; Domingo and Holland, 1997), hence the term viral quasispecies. The impact of the quasispecies model is due not only to its mathematical tractability, but also to its conceptual focus on the population as the target of natural selection (Burch and Chao, 2000).

The diversity of virus populations has repeatedly been shown to provide a selective advantage. For example, decreasing the mutation rate of poliovirus artificially, while maintaining its replication rate, resulted in reduced genomic diversity and in failure to adapt to adverse growth conditions (Vignuzzi et al., 2006). Similarly, pre-existing minority drug-resistant variants of HIV-1 have been shown to facilitate rapid viral adaptation leading to failure of antiretroviral therapy (Metzner et al., 2009; Li et al., 2011). In general, viral diversity is advantageous when the virus faces different selection pressures that need to be overcome by evolutionary escape (Iwasa et al., 2003, 2004). Changing selection pressures are common in the life of viruses, for example, after infecting a new host with a different immune response (Pybus and Rambaut, 2009), when infecting different cell types, while being exposed to different chemical agents, or due to changing multiplicity of infection (Ojosnegros et al., 2010). Understanding and modeling the escape dynamics of these processes is of direct relevance for clinical and public health decisions.

With the introduction of next-generation sequencing (NGS) technologies, the experimental analysis of viral genetic diversity has changed dramatically. Rather than using labor-intensive limiting dilution and individual cloning of viruses followed by traditional Sanger sequencing, NGS now allows for sampling the virus population in a highly parallel fashion in a single experiment. However, the novel high-throughput approach has several pitfalls associated with both the experimental protocol and the statistical analysis of the data. We address both aspects in this review and discuss several successful applications of NGS to viral diversity studies, including drug resistance, immune escape, and epidemiology.

Sample preparation

The usefulness of NGS for viral diversity estimation depends crucially on the quality of the sample and on the procedure used to prepare it. NGS reads reflect the errors accumulated along the entire workflow, some of them preventable and others unavoidable. To minimize the error rate, each step requires careful handling, from the retrieval and storage of the biological sample through the final steps of the NGS procedure itself (Figure 1).

Figure 1

Viral genomes are usually protected by the viral capsid and, in some viruses such as HIV and HCV, additionally by an envelope. Nevertheless, retrieval and storage conditions of biological specimens are especially important when studying RNA viruses because of the fragility of RNA (Holodniy et al., 1995; Jose et al., 2005): degraded RNA jeopardizes all further steps of the analysis. Before starting the extraction of viral genomes, the viral load of the specimen should be considered, because the final number of genome copies sequenced provides the basis for assessing viral diversity from the sequence reads (Metzner et al., 2003; Casbon et al., 2011). Low viral loads might require a concentration step, for instance, ultracentrifugation of plasma.

The choice of protocols used for genome extraction and for the elimination of contaminating RNA and DNA from other sources, such as host cells, depends on the intended downstream procedures. Numerous kits are available to extract viral DNA or RNA; their pros and cons will not be discussed here. A more critical point is the enrichment of viral genomes relative to the complexity of the sample. Three scenarios can be envisioned. (1) The virus is known and an amplicon approach is chosen for NGS. Here, the specificity of the primers might allow for amplifying the viral genome without any upstream enrichment. Nevertheless, it is often beneficial to eliminate contaminating DNA or RNA by DNase or RNase treatment. For instance, investigating HIV RNA genomes requires the elimination of proviral DNA genomes (Fischer et al., 2002). (2) The virus is known, but a random approach is chosen for NGS. Because some viruses are highly heterogeneous, using virus-specific primers for amplification might be disadvantageous owing to potential primer bias or even complete failure of amplification (Metzner et al., 2003). In contrast, no random approach, whether amplification using degenerate or random primers or non-specific adaptor ligation followed by amplification with adaptor-specific primers, can differentiate between the viral genome and any other nucleic acid (Reyes and Kim, 1991; Chang et al., 1992). Thus, the elimination of contaminating nucleic acids is mandatory when high coverage of viral genomes is required, as for studying diversity, since viral genomes represent only a low-abundance fraction of the nucleic acids in almost all biological specimens (Daly et al., 2011). DNase and RNase treatment, filtration, density gradient centrifugation, and their combinations are commonly used procedures. Enrichment strategies based on hybridization capture might also be suitable (Turner et al., 2009; Althaus et al., 2012), and freeze-thaw nuclease digestion protocols may also help to minimize contaminating RNA or DNA (Fischer et al., 2002). (3) The virus is unknown; therefore, random approaches have to be applied. The enrichment of viral genomes is an even greater challenge in this set-up. In this review, we focus on estimating viral diversity from NGS data, a second step after virus discovery (Lipkin, 2010).

After viral genome extraction, an amplification procedure has to be performed, because current NGS technologies require a large amount of input DNA, whereas the amount of viral genome is several orders of magnitude lower. Furthermore, RNA genomes have to be reverse transcribed prior to PCR. Every amplification process introduces errors. Reverse transcriptases (RTs) are error-prone enzymes because they lack proofreading activity (Preston et al., 1988; Roberts et al., 1988). Some RTs are less error-prone than others, but, in general, RT errors are unavoidable and very difficult to distinguish from real mutations, since they are introduced in the first step of amplification. Another important but often ignored problem with reverse transcription is that short, incomplete cDNA fragments can act as primers in subsequent PCRs and lead to in vitro recombination. This phenomenon has been considered only for RT-PCRs amplifying fragments several kilobases (kb) long (Fang et al., 1998). We have recently shown that this effect also occurs very frequently when amplifying short cDNA fragments of only 0.6 kb and that it can be minimized by using an RNaseH-negative RT (Di Giallonardo et al., submitted).

Four main types of errors can occur during PCR and are relevant for NGS data: (i) biased amplification due to primer mismatches, (ii) in vitro recombination due to premature termination of strand elongation and subsequent false hybridization of short DNA fragments acting as primers or, less frequently, due to template switching, (iii) nucleotide misincorporation due to the inaccuracy of DNA polymerases, and (iv) resampling due to, for instance, too few input DNA copies (Eckert and Kunkel, 1991; Liu et al., 1996; Kanagawa, 2003). Several precautions can be taken to minimize these errors. Primer mismatches can be diminished by choosing primer binding sites in conserved regions of the viral genome or by using degenerate primers. Chimera formation can be reduced by adjusting the PCR conditions, for example, by increasing the elongation time, decreasing the number of cycles, and omitting the final extension step (Meyerhans et al., 1990; Judo et al., 1998). Nucleotide misincorporation can be lowered by using high-fidelity DNA polymerases, and resampling can be reduced, for instance, by optimizing the input copy number. Even when all these precautions are applied, it is currently not possible to avoid PCR errors completely. Furthermore, discriminating between artificial and real viral variants can be very difficult, if not impossible. One possibility is to perform several independent PCRs, assuming that most errors occur randomly with respect to both the sequence position and the timing of the error, i.e., the PCR cycle in which the error occurs; such errors then appear as different variants at different frequencies in the replicates. A recently described method uses primer identifiers (IDs) to uniquely label each cDNA molecule (Jabara et al., 2011). This is an elegant procedure to reduce or even eliminate PCR errors, although errors introduced during reverse transcription cannot be addressed in this manner. In addition, the method is only applicable to amplicon-based approaches, and a large number of sequence reads is required to obtain a sufficient number of consensus sequences, each of which has to be derived from at least three reads with the same primer ID. Thus, all reads whose primer ID occurs only once or twice, which represent the majority of sequence reads, cannot be considered in the analysis.
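The primer ID strategy can be sketched as a grouping-and-voting step. The snippet below is a minimal illustration under stated assumptions, not the pipeline of Jabara et al. (2011): it assumes each read has already been split into its primer ID and an aligned, equal-length template sequence, and `primer_id_consensus` and its `min_reads` parameter are hypothetical names.

```python
from collections import Counter, defaultdict


def primer_id_consensus(reads, min_reads=3):
    """Collapse reads that share a primer ID into one majority-rule consensus.

    reads: iterable of (primer_id, sequence) pairs; the sequences of one
    template are assumed to be aligned and of equal length.
    """
    groups = defaultdict(list)
    for primer_id, seq in reads:
        groups[primer_id].append(seq)

    consensus = {}
    for primer_id, seqs in groups.items():
        if len(seqs) < min_reads:
            # Too few copies of this template to outvote errors: discard.
            continue
        # Most frequent base in each alignment column of this template's reads.
        consensus[primer_id] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


reads = [
    ("AAGT", "ACGTAC"), ("AAGT", "ACGTAC"), ("AAGT", "ACTTAC"),  # one error
    ("CCTA", "ACGAAC"), ("CCTA", "ACGAAC"),                     # only two reads
]
print(primer_id_consensus(reads))  # {'AAGT': 'ACGTAC'}
```

The single mismatch in the third `AAGT` read is outvoted, while the `CCTA` template is dropped entirely, which is exactly the loss of once- or twice-observed primer IDs described above.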

Overall, sample preparation is a critical issue in the process of NGS. If unrecognized, errors during sample preparation can lead to an artificially increased diversity of the investigated virus population. To avoid such misinterpretation, the pitfalls of sample preparation need to be identified and properly addressed.

Next-generation sequencing

In the last decade, many NGS technologies have been developed and several are commercially available today or about to become available in the near future (Mardis, 2008b; Metzker, 2010). Due to its massively parallel approach, NGS allows for generating much larger volumes of sequencing data in a cost-effective manner as compared to conventional sequencing methods. The increase in throughput has been so far-reaching that NGS is considered revolutionary, because it facilitates many new sequencing applications that had been out of reach (Mardis, 2008a; Schuster, 2008). One of these novel applications is the inference of viral genetic diversity from a single deep-coverage NGS experiment.

All NGS technologies involve the steps of template preparation, sequencing, and imaging, followed by data analysis, but they differ in how each step is realized. 454/Roche pyrosequencing was the first commercially available NGS method, and to date it is the technology most commonly used for the analysis of viruses (Margulies et al., 2005). For pyrosequencing, DNA is isolated, amplified and/or fragmented, adaptor-annealed, and amplified on beads in a micro-droplet emulsion PCR. DNA and beads have to be used at a ratio that allows at most one DNA molecule to hybridize to each bead; consequently, the majority of beads do not carry any DNA molecule. On each DNA-hybridized bead, the single template then gives rise to several thousand copies. These beads are separated from the empty beads and loaded into the 1.6 million wells of a picotiter plate, one bead per well, and the enzymes for pyrophosphate sequencing are added. Sequencing by synthesis proceeds by adding the four bases in a cyclic order. In each cycle, the light emitted upon base incorporation is detected and the remaining chemicals are washed out. The intensity of the light signal is approximately proportional to the number of nucleotides that have been incorporated. All generated signals are recorded as a series of peaks, called a flowgram, from which DNA bases are eventually called (Margulies et al., 2005).
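To make the flowgram concept concrete, the sketch below converts a sequence into an idealized flowgram and back. It assumes the cyclic flow order TACG and ignores all noise sources; rounding each signal to the nearest integer is a deliberately naive base caller, used here only to show why homopolymer lengths are where pyrosequencing is most fragile.

```python
def ideal_flowgram(seq, flow_order="TACG"):
    """Noise-free flowgram: number of bases incorporated in each flow."""
    if set(seq) - set(flow_order):
        raise ValueError("sequence contains bases absent from the flow order")
    signals, i = [], 0
    while i < len(seq):
        for base in flow_order:  # one full cycle of nucleotide flows
            start = i
            while i < len(seq) and seq[i] == base:
                i += 1  # a whole homopolymer run is incorporated in one flow
            signals.append(i - start)
    return signals


def call_bases(signals, flow_order="TACG"):
    """Naive base calling: round each signal to the nearest homopolymer length."""
    return "".join(
        flow_order[k % len(flow_order)] * max(0, round(s))
        for k, s in enumerate(signals)
    )


print(ideal_flowgram("TTAG"))            # [2, 1, 0, 1]
print(call_bases([2.4, 0.9, 0.1, 1.6]))  # TTAGG: the noisy 1.6 G signal adds a base
```

Because a homopolymer run produces one signal whose height encodes its length, a small amplitude error translates directly into an inserted or deleted base rather than a substitution.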

The Illumina Genome Analyzer and HiSeq systems currently dominate the NGS market (Bentley et al., 2008). Rather than emulsion PCR, Illumina relies on solid-phase amplification, which consists of initial priming and extension of single-stranded templates, followed by bridge amplification of each immobilized template with adjacent primers. In multiple cycles of annealing, extension, and denaturation, around 200 million molecular clusters are formed. For sequencing, all four nucleotides are added simultaneously. Each nucleotide is labeled with a different fluorescent dye and is modified to terminate DNA synthesis after incorporation. Color imaging is used to detect the incorporated nucleotide. In a cleavage step, the fluorescent dye is removed and termination is reversed by regenerating the 3′-OH group. Bases are called from the resulting four-color images.

We focus here on the 454/Roche and Illumina platforms, because the vast majority of reported virus sequencing applications have used these systems, but several other technologies can be, and are likely to be, used as well, including ABI SOLiD, Ion Torrent, PacBio RS, and Polonator. The technical details in which platforms differ can have important consequences for their applicability to viral sequencing studies. Among other aspects, NGS platforms differ in throughput, runtime, cost, read length, and error patterns (Metzker, 2010). Currently, the most powerful 454/Roche sequencer, the GS FLX Titanium XL+, can produce up to 1 million reads per run with an average length of 700 bp, while Illumina's largest machine, the HiSeq 2500, can generate up to 1.2 billion paired-end reads of 2 × 150 bp. Both companies also offer smaller benchtop versions of their platforms that may be preferable in certain diagnostic and clinical settings. The 454/Roche GS Junior produces up to 100,000 reads with an average length of 400 bp in a single 10-h run, and the Illumina MiSeq generates up to 30 million paired-end reads of 2 × 150 bp in 24 h. Thus, longer reads can be produced with the 454/Roche technology, but ultra-deep coverage is easier to obtain with Illumina (Loman et al., 2012).

In addition to the various errors that can occur during sample preparation, as discussed in “Sample Preparation”, all NGS platforms introduce sequencing errors. With 454/Roche pyrosequencing, insertions and deletions (indels) are the most common type of error. They occur predominantly in homopolymeric regions of the target sequence, where the linear relationship between signal intensity and the number of incorporated nucleotides starts to fail. Nucleotides remaining after washing can give rise to insertions, so-called carry-forward errors, whereas deletion errors can result from incomplete extension (Margulies et al., 2005; Balzer et al., 2011). The error rate has been shown to increase with read length and to depend on several other biological and technical factors, including the organism and genomic region to be analyzed and the position on the picotiter plate relative to the flow of chemicals and to the camera (Gilles et al., 2011).

Illumina reads are less susceptible to indel errors in homopolymeric regions, but artificial indels outside these regions and substitutions occur at similar frequencies (Archer et al., 2012). The Illumina mismatch rate also increases with read length, and it further depends on the sequence context and the substitution type (Dohm et al., 2008; Kircher et al., 2009; Nakamura et al., 2011). Illumina reads are generated in forward and reverse directions, and errors occur predominantly on one of the two strands (Chapman et al., 2011; Varela et al., 2011). All NGS platforms report quality scores together with the called bases, defined as Q = −10 log₁₀ p, where p is the error probability (Ewing and Green, 1998), but the calibration of these scores is challenging (Brockman et al., 2008; Kircher et al., 2009) and there is no consensus on how to compare scores across platforms.
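The quality score definition above can be applied directly. The sketch below converts between Q and p and decodes quality strings; the Phred+33 ASCII encoding assumed by the hypothetical helper `decode_quality_string` is an assumption about the file format, since platforms and software versions have used different offsets.

```python
import math


def phred_to_prob(q):
    """Error probability p from a Phred score, inverting Q = -10 log10(p)."""
    return 10 ** (-q / 10)


def prob_to_phred(p):
    """Phred score Q = -10 log10(p) for an error probability p."""
    return -10 * math.log10(p)


def decode_quality_string(qual, offset=33):
    """Decode an ASCII quality string, assuming the common Phred+33 encoding."""
    return [ord(ch) - offset for ch in qual]


for q in decode_quality_string("I5+"):
    print(q, phred_to_prob(q))  # Q40 -> p = 0.0001, Q20 -> p = 0.01, Q10 -> p = 0.1
```

Each 10-unit increase in Q corresponds to a tenfold decrease in the reported error probability; whether the reported p matches the empirical error rate is exactly the calibration problem noted above.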

Besides errors, the distribution of reads along the genome is critical for diversity estimation, especially if phasing of genetic variants is the goal. However, uniform coverage is difficult to achieve and, in practice, read coverage often varies by orders of magnitude. The reasons for this variation are poorly understood, but for Illumina, the GC content of the target sequence is an important factor (Dohm et al., 2008). Uniform coverage is feasible within short segments by using a single amplicon. However, increasing the number of amplicons to cover longer segments can impair this uniformity, and shotgun approaches introduce even more variation. Coverage and error patterns are only weakly correlated among the 454/Roche, Illumina, and ABI SOLiD platforms (Harismendy et al., 2009). Thus, for viral diversity estimation, where uniform coverage and error correction are critical, complementary sequencing strategies involving more than one platform may be more efficient than increasing the coverage on a single platform.
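Coverage variation of this kind is easy to quantify once reads have been mapped. As a minimal sketch, assuming each mapped read is summarized by hypothetical half-open reference coordinates (start, end), the per-position depth and a simple max/min ratio can be computed as follows; the ratio is only one of many possible uniformity summaries.

```python
def coverage_profile(read_intervals, genome_length):
    """Read depth at each position from half-open [start, end) intervals.

    A difference array keeps the cost linear in reads plus positions.
    """
    diff = [0] * (genome_length + 1)
    for start, end in read_intervals:
        diff[max(0, start)] += 1
        diff[min(end, genome_length)] -= 1
    depth, running = [], 0
    for pos in range(genome_length):
        running += diff[pos]
        depth.append(running)
    return depth


def max_min_ratio(depth):
    """Ratio of highest to lowest depth over covered positions (crude uniformity measure)."""
    covered = [d for d in depth if d > 0]
    return max(covered) / min(covered) if covered else float("inf")


depth = coverage_profile([(0, 5), (2, 8), (2, 8), (6, 10)], 10)
print(depth)                 # [1, 1, 3, 3, 3, 2, 3, 3, 1, 1]
print(max_min_ratio(depth))  # 3.0
```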

The large amounts of viral sequencing data obtained by NGS place substantial demands on information technology and computational data analysis in terms of storage, quality control, mapping, error correction, single nucleotide variant (SNV) calling, haplotype reconstruction, diversity estimation, and data integration (Pop and Salzberg, 2008; Vrancken et al., 2010; Barzon et al., 2011; Beerenwinkel and Zagordi, 2011). Data analysis usually starts by removing reads of exceptionally low quality. The rationale for this initial filtering step is that low-quality reads contribute disproportionately to the overall error rate, i.e., most errors occur on a few reads (Huse et al., 2007). Filtering can be based on quality scores or on properties of the read or the target sequence known to affect error rates, as discussed above. Optimized filtering has been shown to reduce the error rate in detecting genomic variation by up to 300-fold (Reumers et al., 2011).
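As an illustration of quality-based filtering, the sketch below discards short reads, reads with ambiguous base calls, and reads whose expected number of errors, derived from their Phred scores, exceeds a threshold. The criteria and the thresholds `max_expected_errors` and `min_length` are illustrative assumptions, not the settings of the cited studies.

```python
def expected_errors(quals):
    """Expected number of errors in a read: the sum of per-base error probabilities."""
    return sum(10 ** (-q / 10) for q in quals)


def filter_reads(reads, max_expected_errors=1.0, min_length=50):
    """Keep reads that are long enough, contain no ambiguous 'N' calls,
    and have a low expected number of errors.

    reads: list of (sequence, list_of_phred_scores) pairs.
    """
    kept = []
    for seq, quals in reads:
        if len(seq) < min_length or "N" in seq:
            continue
        if expected_errors(quals) > max_expected_errors:
            continue
        kept.append((seq, quals))
    return kept


good = ("ACGT" * 15, [30] * 60)        # about 0.06 expected errors
noisy = ("ACGT" * 15, [10] * 60)       # about 6 expected errors
short = ("ACGT", [40] * 4)
ambiguous = ("ACGT" * 14 + "ACGN", [30] * 60)
print(filter_reads([good, noisy, short, ambiguous]) == [good])  # True
```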

After filtering, the next step is to align the remaining reads. In re-sequencing studies of known viruses, this is typically done by mapping reads individually to a reference sequence and then aggregating the pairwise alignments into a multiple sequence alignment (MSA). For read mapping, local alignment using dynamic programming may be applied (Wang et al., 2007; Zagordi et al., 2011), but larger data sets require efficient short-read mappers. Several mapping algorithms based on indexing techniques are available, some of which can handle gaps, account for quality scores, and offer a paired-end option (Trapnell and Salzberg, 2009; Wikipedia, 2012). In coding regions, a major goal of the alignment step is to identify indels that cause frameshifts. These alterations are likely to be sequencing errors, which are frequently observed with the 454/Roche platform. Hence, they are usually removed, but this bears the risk of losing virus variants that harbor real indels. Correcting indel errors requires a high-quality alignment, but in mixed samples, the use of a reference sequence can be suboptimal if reads originating from some subpopulations align poorly to it. To address this concern, an MSA may be computed directly, for example, by using a progressive MSA strategy that takes into account the approximate location of reads on the genome (Saeed et al., 2009). Similarly, for the HIV env gene, a multi-step procedure has been proposed in which reads are located efficiently on a reference sequence by k-mer matching and MSAs are built locally in windows of 70 nucleotides along the genome. From all local MSAs, in-frame consensus sequences are generated and concatenated. Finally, the reads are re-aligned to the global consensus sequence and all indels causing frameshifts are removed. Using the consensus rather than a reference sequence was shown to improve the alignment quality, especially when the reference is highly divergent from the sampled virus population (Archer et al., 2010).
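The dynamic-programming approach to read mapping mentioned above is Smith-Waterman local alignment. A minimal score-only sketch follows; the scoring parameters (`match`, `mismatch`, `gap`) are arbitrary illustrative values, a linear gap penalty is assumed, and no traceback is performed.

```python
def smith_waterman(read, ref, match=2, mismatch=-1, gap=-2):
    """Local alignment score by dynamic programming (linear gap penalty).

    Returns the best score and the (read, reference) coordinates at which the
    best-scoring local alignment ends, as 1-based matrix indices.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    H = [[0] * cols for _ in range(rows)]  # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            # Scores are floored at 0 so that an alignment can start anywhere.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return best, best_pos


print(smith_waterman("ACGT", "TTACGTTT"))  # (8, (4, 6)): the read ends at reference position 6
```

Filling the full matrix costs time proportional to the product of read and reference lengths for every read, which is why index-based mappers are preferred for large data sets.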

Local diversity estimation

From the aligned reads, one wants to reconstruct the original virus population in the sample, that is, the composition and relative frequencies of all individual viral genomes, also referred to as strains or haplotypes. Even after filtering and removal of frameshift-causing indels, many reads are still erroneous. Therefore, in mixed samples, error correction and haplotype inference are intrinsically tied to each other and, in fact, are addressed jointly by most computational methods. This is in contrast to the simpler task of error correction in clonal samples, where implausible variants can easily be discarded using methods based on k-mers, suffix trees or arrays, or MSAs (Yang et al., 2012).

The haplotype inference problem occurs at different spatial scales depending on the length of the genomic region to be analyzed for diversity (Figure 2). When only a single genomic site is considered, diversity estimation amounts to detecting SNVs. Local haplotype inference refers to analyzing windows in the MSA that are covered entirely by reads. Finally, global haplotype inference, also called quasispecies assembly, involves assembling local fragments, like the pieces of a jigsaw puzzle, into multiple haplotype sequences that span the entire genomic region of interest.

Figure 2

SNV calling is based on the observed nucleotide counts at a single sequence position. The simplest statistical model for separating errors from true variation assumes that, at each genomic site, the number of errors follows the same Poisson distribution, and calls SNVs that occur more often than expected by chance for a given error rate (Wang et al., 2007). This approach has been extended to account for site-specific error rates (Macalalad et al., 2012). The power and accuracy of SNV calling can be increased substantially by a control experiment, in which the same genomic region is sequenced from a clonal sample under conditions as similar as possible to those for the mixed sample. The rationale for this comparative sequencing approach is that the control experiment allows for estimating the specific error patterns of the experiment and hence for improved separation of biological signal from technical noise. In this setting, SNV detection is based on comparing nucleotide counts between the two experiments, for example, using Fisher's exact test (Koboldt et al., 2012). Assuming independent Poisson distributions, another test is based on the difference in the numbers of observed nucleotides (Altmann et al., 2011). Count data from NGS experiments have repeatedly been shown to display more variation across sites than is captured by a binomial distribution, and the beta-binomial distribution is a popular choice for such overdispersed data (Flaherty et al., 2012; Gerstung et al., 2012). Based on this model and accounting for the strand bias of sequencing errors, variants at frequencies as low as 1/10,000 have been detected at a coverage of around 10⁵ (Gerstung et al., 2012).
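The simple Poisson model can be written down in a few lines. The sketch below assumes a single, known per-base error rate (which, as noted above, could instead be estimated from a clonal control) and a Bonferroni correction across sites; the function names and the significance level `alpha` are illustrative assumptions rather than the exact procedure of Wang et al. (2007).

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), lam > 0, computed in log space."""
    if k <= 0:
        return 1.0

    def log_pmf(i):
        return -lam + i * math.log(lam) - math.lgamma(i + 1)

    if k <= lam:
        # Large upper tail: complement of the short lower tail.
        return 1.0 - sum(math.exp(log_pmf(i)) for i in range(k))
    # Small upper tail: terms decrease for i > lam, so sum until negligible.
    total, i = 0.0, k
    while True:
        term = math.exp(log_pmf(i))
        total += term
        if term == 0.0 or term < 1e-16 * total:
            return total
        i += 1


def call_snvs(alt_counts, coverage, error_rate, alpha=0.01):
    """Positions whose non-reference count is too high to be explained by errors.

    Errors at a site are modeled as Poisson(coverage * error_rate); a
    Bonferroni correction is applied over the tested positions.
    """
    threshold = alpha / len(alt_counts)
    return [
        pos
        for pos, k in sorted(alt_counts.items())
        if poisson_sf(k, coverage[pos] * error_rate) < threshold
    ]


coverage = {0: 10_000, 1: 10_000, 2: 10_000}
print(call_snvs({0: 12, 1: 60, 2: 9}, coverage, error_rate=0.001))  # [1]
```

With an expected 10 errors per site, 12 or 9 variant reads are unremarkable, whereas 60 are not. Replacing the Poisson tail with a beta-binomial tail would account for the overdispersion discussed above.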

By dropping the assumption of independence among sites, SNV calling can be further improved. Considering the number of joint sequencing errors at two positions has been shown to significantly decrease the minimal frequency at which a variant is detectable (Macalalad et al., 2012). This phasing of two SNVs is possible only at a distance smaller than the maximal read length. For small distances, the SNV pair will be covered by many reads, but for larger distances the benefit of phasing will be undone by the loss of joint coverage. In fact, for deep coverage, pairs are more informative than single sites only if their distance is not larger than the average read length (Macalalad et al., 2012).
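Phasing two sites amounts to counting, over the reads that cover both positions, how often each pair of bases co-occurs. A minimal sketch, assuming reads are given as hypothetical (start, sequence) pairs already aligned to the reference without gaps:

```python
from collections import Counter


def joint_counts(reads, pos_a, pos_b):
    """Tally the base pairs observed at two reference positions.

    reads: list of (start, sequence), gap-free alignment to the reference
    assumed. Only reads covering both positions contribute.
    """
    pairs = Counter()
    for start, seq in reads:
        end = start + len(seq)
        if start <= pos_a < end and start <= pos_b < end:
            pairs[(seq[pos_a - start], seq[pos_b - start])] += 1
    return pairs


reads = [(0, "ACGTA"), (0, "ACTTA"), (2, "TTAC"), (3, "GG")]
print(joint_counts(reads, 2, 4))  # Counter({('T', 'A'): 2, ('G', 'A'): 1})
```

As the distance between `pos_a` and `pos_b` approaches the read length, fewer reads satisfy the coverage condition, which is the loss of joint coverage described above.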

The idea of phasing SNVs is further extended by comparing entire reads within a sequence window that they overlap. The size of the window is subject to the same trade-off as the distance between two SNVs discussed above: small windows contain many reads but too few segregating sites for robust pairwise comparisons of reads, whereas large windows contain more segregating sites but fewer reads. Local haplotype inference is based on clustering reads within a given window (Figure 3). The rationale for clustering is that reads originating from the same haplotype should be more similar to each other than to reads from other haplotypes. This assumption is only valid if the error rate is low relative to the diversity of the population, and the ability to identify haplotype clusters increases with coverage (Eriksson et al., 2008).
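Extracting a window and counting what it contains makes the trade-off explicit. The sketch below, assuming hypothetical gap-free (start, sequence) alignments, keeps only reads that span the whole window and reports the number of segregating sites in it:

```python
def window_reads(reads, w_start, w_end):
    """Slice out window [w_start, w_end) from every read that covers it entirely.

    reads: list of (start, sequence), gap-free alignment assumed.
    """
    return [
        seq[w_start - start:w_end - start]
        for start, seq in reads
        if start <= w_start and start + len(seq) >= w_end
    ]


def segregating_sites(window_seqs):
    """Number of window columns showing more than one distinct base."""
    return sum(1 for column in zip(*window_seqs) if len(set(column)) > 1)


reads = [(0, "ACGTACGT"), (0, "ACGTTCGT"), (2, "GTACGA"), (4, "ACGT")]
for window in [(2, 6), (2, 8), (0, 8)]:
    seqs = window_reads(reads, *window)
    print(window, len(seqs), "reads,", segregating_sites(seqs), "segregating sites")
```

Widening the window from (2, 8) to (0, 8) drops the read starting at position 2, illustrating how longer windows lose reads.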

Figure 3

Clustering was initially performed using the classical k-means algorithm (Jain and Dubes, 1981) and later formulated probabilistically and solved in a Bayesian fashion (Eriksson et al., 2008; Zagordi et al., 2010a). In particular, the latter approach allows for estimating the error rate and the number of clusters from the data, a notoriously difficult problem with any clustering method. The cluster centers are the predicted haplotypes and the cluster sizes are interpreted as the haplotype frequencies in the population. Given a local clustering of the reads, error correction replaces all bases of each read with those of its cluster center (Figure 3). This method has been shown to reduce the per-base error rate, to increase the sensitivity and specificity of local haplotype calling, and to improve the estimation of haplotype frequencies as compared to simple read counting or k-means clustering (Zagordi et al., 2010b). For the 454/Roche platform, a similar clustering approach called AmpliconNoise can be applied to the flowgrams before base calling (Quince et al., 2009, 2011). Here, the observed flowgrams are modeled as the ideal flowgrams of the read sequences perturbed by measurement noise. Whether clustering is based on sequences or on flowgrams, the distance measure between reads should reflect the pattern of experimental noise.
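To illustrate clustering-based correction in its simplest form, the sketch below runs a k-means-style loop on equal-length window reads, using Hamming distance and per-column majority consensus as centers (a variant sometimes called k-modes). It is an assumption-laden stand-in for the classical approach, not the Bayesian model of Zagordi et al.: the number of clusters `k` must be supplied, there is no error model, and the farthest-point initialization is an arbitrary choice that keeps the example deterministic.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def consensus(seqs):
    """Per-column majority base; returns '' for an empty cluster."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def assign(reads, centers):
    """Index of the nearest center (Hamming distance) for each read."""
    return [min(range(len(centers)), key=lambda c: hamming(r, centers[c])) for r in reads]


def cluster_reads(reads, k, n_iter=20):
    """k-means-style clustering of equal-length window reads.

    Initial centers: the most frequent read, then repeatedly the read
    farthest from all centers chosen so far.
    """
    centers = [Counter(reads).most_common(1)[0][0]]
    while len(centers) < k:
        centers.append(max(reads, key=lambda r: min(hamming(r, c) for c in centers)))
    for _ in range(n_iter):
        labels = assign(reads, centers)
        # Move each center to the consensus of its members (keep it if empty).
        new_centers = [
            consensus([r for r, lab in zip(reads, labels) if lab == c]) or centers[c]
            for c in range(k)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(reads, centers)


reads = ["AAAAAAAA"] * 4 + ["AAAAAAAT"] + ["CCCCCCCC"] * 3 + ["CCCCCCGC"]
centers, labels = cluster_reads(reads, k=2)
corrected = [centers[lab] for lab in labels]  # error correction by cluster center
print(centers, Counter(labels))
```

Replacing each read by its center, as in `corrected`, removes the two single-base errors, and the cluster sizes (5 and 4 of 9 reads) serve as haplotype frequency estimates.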

As an alternative to clustering, k-mer-based error correction, implemented in the program KEC, has been proposed for viral amplicon sequencing (Skums et al., 2012). This approach extends the EDAR error correction algorithm (Zhao et al., 2010) and initially does not require a read alignment. It consists of a number of heuristic steps that locate error regions in reads by considering rare k-mers and then remove the errors in these regions. In a final step, which may involve MSAs of the corrected reads, local haplotypes are reconstructed.
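A rare-k-mer heuristic in the spirit of KEC and EDAR can be sketched as follows; it is not either published algorithm. It assumes that error-free k-mers are frequent across reads, flags a position only when every k-mer overlapping it falls below a hypothetical count threshold `min_count`, and tries single-base substitutions at flagged positions. The choices of `k` and `min_count` are illustrative.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def overlapping_starts(read_len, p, k):
    """Start indices of the k-mers in a read that contain position p."""
    return range(max(0, p - k + 1), min(p, read_len - k) + 1)


def error_positions(read, counts, k, min_count=2):
    """Positions at which every overlapping k-mer is rare (count < min_count)."""
    flagged = []
    for p in range(len(read)):
        starts = overlapping_starts(len(read), p, k)
        if len(starts) > 0 and all(counts[read[s:s + k]] < min_count for s in starts):
            flagged.append(p)
    return flagged


def correct_read(read, counts, k, min_count=2):
    """Substitute each flagged base with one that makes all overlapping k-mers frequent."""
    for p in error_positions(read, counts, k, min_count):
        starts = overlapping_starts(len(read), p, k)
        for base in "ACGT":
            candidate = read[:p] + base + read[p + 1:]
            if all(counts[candidate[s:s + k]] >= min_count for s in starts):
                read = candidate
                break
    return read


reads = ["ACGTACGTAC"] * 5 + ["ACGTTCGTAC"]
counts = kmer_counts(reads, k=3)
print(error_positions("ACGTTCGTAC", counts, k=3))  # [4]
print(correct_read("ACGTTCGTAC", counts, k=3))     # ACGTACGTAC
```

Requiring all overlapping k-mers to be rare localizes an isolated error to a single position, because the k-mers of neighboring positions include at least one error-free k-mer.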

Global diversity estimation

The local methods discussed in the previous section focus on reconstructing haplotypes in a local window, the maximum size of which is effectively restricted to the average length of the reads. The global reconstruction problem, on the other hand, is defined as the genome-wide assembly of quasispecies, irrespective of machine-specific parameters like the average read length. The various approaches to solving this jigsaw puzzle described in the literature can be roughly divided into three groups: (1) graph-based methods that first aggregate the reads in a read graph and then search for a minimum set of paths through this graph, (2) probabilistic clustering models based on mixture models, and (3) de novo assembly methods which do not rely on the availability of a reference sequence.

Read graph-based global haplotype reconstruction consists of aggregating the reads in a read graph and subsequently identifying haplotypes as paths in this graph. The concept of a read graph was introduced independently by Eriksson et al. (2008) and Westbrooks et al. (2008). The nodes of the read graph are the reads, possibly pre-processed, for instance, locally error-corrected. Directed edges connect two nodes when the reads agree on their non-empty overlap (Figure 4). The direction of an edge reflects the order of the reads' starting positions on the reference sequence. The set of nodes is restricted to all irredundant reads, where a read is considered redundant if another read overlaps it completely and both reads agree on this overlap. Similarly, the set of edges is restricted to those edges without which there would be no path between the corresponding nodes. The latter restriction is called transitive reduction (Westbrooks et al., 2008), and it has been shown that this reduction can be computed efficiently. Finally, a source and a sink node are added to the graph, along with edges connecting the source to all reads starting at the first position and all reads ending at the last position to the sink (Figure 4).
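A direct, quadratic-time sketch of the read graph construction follows. It assumes reads are (start, sequence) pairs aligned to the reference without gaps, treats position 0 and `genome_length` as the first and last positions, and uses plain reachability checks for the transitive reduction; it is meant to mirror the definitions above, not the efficient algorithms of the cited work, and all names are hypothetical.

```python
def agree(a, b):
    """True if reads a and b, given as (start, sequence), agree on their overlap."""
    (sa, qa), (sb, qb) = a, b
    lo, hi = max(sa, sb), min(sa + len(qa), sb + len(qb))
    return all(qa[p - sa] == qb[p - sb] for p in range(lo, hi))


def contains(b, a):
    """True if read b spans every reference position of read a."""
    return b[0] <= a[0] and a[0] + len(a[1]) <= b[0] + len(b[1])


def build_read_graph(reads, genome_length):
    """Read graph with redundant reads removed and edges transitively reduced.

    Returns the node list and a set of edges (i, j) between node indices,
    plus edges from "source" and to "sink".
    """
    reads = sorted(set(reads))
    # Nodes: drop reads contained in, and consistent with, another read.
    nodes = [a for a in reads
             if not any(b != a and contains(b, a) and agree(a, b) for b in reads)]
    # Edges: a -> b if b starts later but inside a, and the two reads agree.
    succ = {i: set() for i in range(len(nodes))}
    for i, a in enumerate(nodes):
        for j, b in enumerate(nodes):
            if a[0] < b[0] < a[0] + len(a[1]) and agree(a, b):
                succ[i].add(j)

    def reachable(src, dst):
        stack, seen = [src], set()
        while stack:
            v = stack.pop()
            if v == dst:
                return True
            if v not in seen:
                seen.add(v)
                stack.extend(succ[v])
        return False

    # Transitive reduction: drop i -> j if j is reachable via another successor.
    edges = {(i, j) for i in succ for j in succ[i]
             if not any(reachable(k, j) for k in succ[i] if k != j)}
    for i, (start, seq) in enumerate(nodes):
        if start == 0:
            edges.add(("source", i))
        if start + len(seq) == genome_length:
            edges.add((i, "sink"))
    return nodes, edges


reads = [(0, "ACGTA"), (0, "ACGTT"), (1, "CGT"), (2, "GTAC"), (3, "TACGT"), (3, "TTCGT")]
nodes, edges = build_read_graph(reads, genome_length=8)
```

Here the contained read (1, "CGT") is removed, the shortcut edge from (0, "ACGTA") to (3, "TACGT") is dropped, and the two source-sink paths spell the haplotypes ACGTACGT and ACGTTCGT.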

Figure 4

Every path in the read graph connecting source and sink is a potential haplotype, and the problem of estimating the haplotypes present in a sample can be restated as finding a set of such source-sink paths that explains the reads well. Different formalizations of this problem lead to different optimization problems. One example is the search for the minimum set of paths that covers all reads, as implemented in ShoRAH (Eriksson et al., 2008; Zagordi et al., 2011). The same problem has been approached differently as a network flow problem (Westbrooks et al., 2008). A variant of the network flow formulation searches for a set of haplotypes that covers all reads at minimum cost (Westbrooks et al., 2008); a slightly different formulation that relaxes the requirement of a complete read cover is implemented in ViSpA (Astrovskaya et al., 2011). The combinatorial reconstruction is followed by frequency estimation using an Expectation Maximization (EM) algorithm (Eriksson et al., 2008; Westbrooks et al., 2008; Astrovskaya et al., 2011).
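Once a set of candidate haplotypes (for example, source-sink paths) has been selected, their frequencies can be estimated by EM. The sketch below is illustrative only and assumes gapless aligned reads as `(start, sequence)` tuples, full-length candidate haplotypes on the same coordinates, and a simple likelihood in which each base is miscalled independently with an assumed error rate `eps`; the cited tools use their own, more detailed likelihoods. The function names are hypothetical.

```python
def read_likelihood(read, haplotype, eps):
    """P(read | haplotype) under independent per-base substitution errors (rate eps)."""
    start, seq = read
    window = haplotype[start:start + len(seq)]
    mismatches = sum(x != y for x, y in zip(seq, window))
    return eps ** mismatches * (1 - eps) ** (len(seq) - mismatches)

def em_frequencies(reads, haplotypes, eps=0.01, max_iter=500, tol=1e-10):
    """Maximum-likelihood frequencies of a fixed set of candidate haplotypes via EM."""
    k = len(haplotypes)
    lik = [[read_likelihood(r, h, eps) for h in haplotypes] for r in reads]
    freqs = [1.0 / k] * k
    for _ in range(max_iter):
        # E-step: expected number of reads generated by each haplotype.
        counts = [0.0] * k
        for row in lik:
            weights = [f * l for f, l in zip(freqs, row)]
            total = sum(weights)
            for j in range(k):
                counts[j] += weights[j] / total
        # M-step: new frequencies are the normalized expected counts.
        new = [cnt / len(reads) for cnt in counts]
        converged = max(abs(x - y) for x, y in zip(new, freqs)) < tol
        freqs = new
        if converged:
            break
    return freqs

haps = ["AAAAAAAA", "AAAACAAA"]
reads = [(0, "AAAAA")] * 3 + [(2, "AACAA")] + [(0, "AAA")] * 4
freqs = em_frequencies(reads, haps)
```

Here three reads support the first haplotype at position 4, one supports the second, and four reads are uninformative. The estimate for the first haplotype converges to about 0.755 rather than exactly 0.75, because `eps` leaves a small probability that each informative read was miscalled.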

In a related approach termed QuRe, the same read graph idea is used to find a set of consistent quasispecies explaining the reads (Prosperi et al., 2011; Prosperi and Salemi, 2012). QuRe differs from the methods above in the optimization procedure used to find the quasispecies: the objective is to minimize the number of in silico recombinants rather than to find a path cover explaining the reads. Both optimization strategies are nevertheless similar in nature, since avoiding in silico recombinants can be regarded as avoiding redundant paths in the read graph. In addition, QuRe explicitly accounts in its statistical analysis for the blockwise structure of the reads that results from amplicon-based sequencing (Prosperi et al., 2011; Prosperi and Salemi, 2012).

Haplotype assembly based on amplicon sequencing is also addressed by the BIOA software (Mancuso et al., 2011). Here, a read graph-based framework is proposed that includes balancing of haplotype frequencies between neighboring amplicons followed by quasispecies reconstruction using a maximum bandwidth approach or a greedy algorithm. In the assembly step, the parsimony criterion of explaining the observed reads with a minimal number of haplotypes is relaxed to finding a quasispecies of minimal entropy explaining the reads. This strategy was shown to outperform shotgun-based quasispecies assembly using ViSpA.
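To make the minimum-entropy criterion concrete, the snippet below only computes the Shannon entropy (in bits) of the frequency distributions of hypothetical candidate quasispecies, which are assumed to explain the reads equally well. It does not reproduce BIOA's search procedure; the function name and candidates are illustrative.

```python
import math

def shannon_entropy(freqs):
    """Shannon entropy in bits of a (possibly unnormalized) frequency vector."""
    total = sum(freqs)
    return -sum(f / total * math.log2(f / total) for f in freqs if f > 0)

# Hypothetical candidate quasispecies, assumed to explain the reads equally well.
candidates = {
    "two haplotypes, balanced": [0.5, 0.5],
    "three haplotypes, balanced": [0.5, 0.25, 0.25],
    "three haplotypes, skewed": [0.9, 0.05, 0.05],
}
entropies = {name: shannon_entropy(f) for name, f in candidates.items()}
best = min(entropies, key=entropies.get)
```

The skewed three-haplotype candidate (about 0.57 bits) beats the balanced two-haplotype candidate (1 bit), which illustrates how minimizing entropy relaxes strict parsimony in the number of haplotypes.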

QColors is another method that relies on the read graph as the main source of information for assembling reads into haplotypes, but it additionally uses a conflict graph, whose edges connect reads that overlap but disagree on the overlap (Huang et al., 2011). The reconstruction problem then becomes finding a partition of the reads into a minimal number of non-conflicting subsets, which is a graph vertex coloring problem, hence the name QColors. A potential problem with this approach is the sensitivity of the conflict graph to sequencing errors and to uncertainty in the placement of alignment gaps, neither of which is explicitly dealt with.
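A conflict graph and a coloring of it can be sketched as follows, again assuming gapless aligned reads as hypothetical `(start, sequence)` tuples. Finding a coloring with the minimum number of colors is NP-hard in general, so this sketch swaps in a simple greedy heuristic (smallest available color, reads taken in order of start position), which only yields an upper bound; it is not the algorithm used by QColors. Each color class is a set of pairwise non-conflicting reads.

```python
from itertools import combinations

def conflicts(r1, r2):
    """True if reads r1, r2 = (start, seq) overlap but disagree on the overlap."""
    (s1, q1), (s2, q2) = r1, r2
    lo, hi = max(s1, s2), min(s1 + len(q1), s2 + len(q2))
    if lo >= hi:
        return False  # no overlap, hence no conflict
    return q1[lo - s1:hi - s1] != q2[lo - s2:hi - s2]

def conflict_graph(reads):
    """Adjacency sets of the conflict graph."""
    adj = {r: set() for r in reads}
    for x, y in combinations(reads, 2):
        if conflicts(x, y):
            adj[x].add(y)
            adj[y].add(x)
    return adj

def greedy_coloring(reads):
    """Assign each read the smallest color not used by an already colored conflicting read."""
    adj = conflict_graph(reads)
    color = {}
    for r in sorted(reads):
        used = {color[n] for n in adj[r] if n in color}
        c = 0
        while c in used:
            c += 1
        color[r] = c
    return color

a, b, c, d = (0, "ACGT"), (2, "GTAA"), (2, "GCAA"), (4, "AAC")
colors = greedy_coloring([a, b, c, d])
```

Reads `b` and `c` differ at a single position, so one miscalled base is enough to force an extra color, which reflects the sensitivity to sequencing errors noted above.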

Another method that uses the read graph approach is called Hapler (O'Neil and Emrich, 2012). This method is specifically designed for situations characterized by low haplotype diversity and low read coverage (<25×), which, for instance, occur in the context of population-level de novo transcriptome assemblies or ecological studies. The minimum path cover problem is generalized and reformulated as a weighted bipartite graph matching problem, such that erroneous reads can be identified. Since, in general, the resulting path covers are again not unique, the analysis is equipped with a randomization step in which samples are drawn from the set of path covers, although this process seems to lack a clear probabilistic interpretation. Experiments under low-coverage conditions indicate that this method is successful in reconstructing local haplotypes over a region that is roughly determined by the average read length, which in our terminology would be classified as local reconstruction. Nevertheless, longer haplotype assemblies are possible with Hapler and specific care is taken in reconstructing consensus sequences with a minimal number of chimeric points.
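The classical link between path covers and bipartite matching can be sketched as follows: in a directed acyclic graph with n nodes, a minimum cover by vertex-disjoint paths consists of n - m paths, where m is the size of a maximum matching between the out-copies and the in-copies of the nodes. The sketch below uses unweighted matching with augmenting paths (Kuhn's algorithm) on a small hypothetical graph; Hapler's weighted formulation and its identification of erroneous reads are not reproduced, and the function name is illustrative.

```python
def min_disjoint_path_cover(nodes, edges):
    """Minimum vertex-disjoint path cover of a DAG via maximum bipartite matching."""
    succ = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
    match_to = {}  # in-copy v -> out-copy u currently matched to it

    def augment(u, visited):
        # Kuhn's algorithm: try to match out-copy u, re-routing earlier matches.
        for v in succ[u]:
            if v in visited:
                continue
            visited.add(v)
            if v not in match_to or augment(match_to[v], visited):
                match_to[v] = u
                return True
        return False

    for u in nodes:
        augment(u, set())

    # Each matched pair (u, v) means "v follows u on the same path".
    nxt = {u: v for v, u in match_to.items()}
    paths = []
    for u in nodes:
        if u not in match_to:  # no predecessor: a path starts here
            path = [u]
            while path[-1] in nxt:
                path.append(nxt[path[-1]])
            paths.append(path)
    return paths

nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
cover = min_disjoint_path_cover(nodes, edges)
```

Running the same matching on the transitive closure of the graph instead yields a minimum cover in which paths may share nodes, which is closer to a read cover where one read can support several haplotypes.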

A common property of all read graph-based approaches is that the haplotype reconstruction problem itself becomes deterministic in nature, while the unavoidable noise component present in observed reads is dealt with in a pre-processing error correction step—if at all.

Removing all stochasticity from the observed reads by local error correction prior to global haplotype reconstruction has the limitation that corrections cannot be revised in the global context, so that miscorrections propagate through subsequent steps. A probabilistic hierarchical model that circumvents this problem was introduced by Jojic et al. (2008). The main idea is to model the stochastic process by which the reads are generated. Parameters and hidden variables in this model include the parental haplotype of each read, its starting position, and the parameters of the error transformation. Inference is carried out by maximizing the likelihood using the EM algorithm. A potential drawback of this approach is that the user has to fix the number of haplotypes to be reconstructed in advance, and no well-defined procedure for estimating this number is provided.
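The generative view can be illustrated by simulating reads forward. This is a simplified sketch, not the model of Jojic et al. (2008): it assumes uniformly distributed start positions, a fixed read length, and uniform substitution errors at an assumed rate `eps`, and it returns the hidden parental haplotype index and start position alongside each read. Inference in a hierarchical model of this kind amounts to recovering these hidden variables and the parameters from the read sequences alone. The function name is hypothetical.

```python
import random

def simulate_reads(haplotypes, freqs, n_reads, read_len, eps, seed=0):
    """Sample reads as (haplotype index, start, sequence) from a simple generative model."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        k = rng.choices(range(len(haplotypes)), weights=freqs)[0]  # parental haplotype
        hap = haplotypes[k]
        start = rng.randrange(len(hap) - read_len + 1)  # uniform start position
        seq = []
        for base in hap[start:start + read_len]:
            if rng.random() < eps:  # substitution error to one of the other bases
                seq.append(rng.choice([x for x in "ACGT" if x != base]))
            else:
                seq.append(base)
        reads.append((k, start, "".join(seq)))
    return reads

haps = ["ACGTACGTAA", "ACGTTCGTAA"]
reads = simulate_reads(haps, [0.7, 0.3], n_reads=50, read_len=4, eps=0.0, seed=1)
noisy = simulate_reads(haps, [0.7, 0.3], n_reads=20, read_len=4, eps=1.0, seed=2)
```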

Probabilistic approaches are a second methodology for global haplotype reconstruction. PredictHaplo is one such approach, and in addition it determines the number of haplotypes automatically (Prabhakaran et al., 2010). In this model, a haplotype is represented as a set of position-specific probability tables over the four nucleotides, which can be augmented with a fifth character representing alignment gaps (Figure 5). The underlying generative model assumes that reads are sampled from a mixture model, in which each mixture component is interpreted as a haplotype and the associated mixing proportion estimates the haplotype frequency. To avoid specifying the number of mixture components a priori, an infinite mixture model is employed (Ewens, 1972; Ferguson, 1973; Rasmussen, 2000), and for computational reasons a truncated approximation of this stochastic process is used.
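Two ingredients of such a model are sketched below: the likelihood of a read under position-specific probability tables over `ACGT-`, and a truncated stick-breaking construction of mixture weights, which is one standard way to approximate an infinite (Dirichlet process) mixture by a finite one. The profiles are given rather than learned, the helper `profile_from_sequence` and the other function names are hypothetical, and nothing here reproduces PredictHaplo's actual inference procedure.

```python
import math
import random

ALPHABET = "ACGT-"  # four nucleotides plus a gap character

def profile_from_sequence(seq, eps):
    """Hypothetical helper: position-specific tables peaked at the characters of `seq`."""
    other = eps / (len(ALPHABET) - 1)
    return [{ch: (1.0 - eps if ch == s else other) for ch in ALPHABET} for s in seq]

def read_loglik(read, profile):
    """log P(read | haplotype) with independent positions; read = (start, seq)."""
    start, seq = read
    return sum(math.log(profile[start + i][ch]) for i, ch in enumerate(seq))

def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking weights; the last component takes the remaining mass."""
    weights, remaining = [], 1.0
    for k in range(truncation):
        v = 1.0 if k == truncation - 1 else rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights

def responsibilities(read, profiles, weights):
    """Posterior probability that `read` was sampled from each mixture component."""
    logs = [math.log(w) + read_loglik(read, p) if w > 0 else -math.inf
            for w, p in zip(weights, profiles)]
    top = max(logs)
    unnorm = [math.exp(x - top) for x in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

weights = stick_breaking_weights(alpha=1.0, truncation=5, rng=random.Random(1))
profiles = [profile_from_sequence("ACGT-A", 0.01), profile_from_sequence("ACCT-A", 0.01)]
post = responsibilities((1, "CG"), profiles, [0.5, 0.5])
```

The read `(1, "CG")` matches the first profile at both positions but has a mismatch against the second, so almost all of its posterior mass goes to the first component.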

Figure 5

A further refinement of probabilistic haplotype reconstruction has been implemented in the program QuasiRecomb (Zagordi et al., 2012). Here, haplotypes are not reconstructed individually; rather, their distribution is estimated by a hidden Markov model. The model assumes that all haplotypes are generated from a small set of sequences by mutation and recombination. This takes into account that in some RNA viruses, such as HIV, recombination is very frequent and hence an important source of genetic diversity.
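The recombination idea can be illustrated with a simplified jumping hidden Markov model: a haplotype is generated by copying from one of K fixed generator sequences, switching to another generator with an assumed probability `rho` per position and substituting bases with an assumed probability `eps`. The forward algorithm below computes the log-likelihood of a full-length sequence under this model. In QuasiRecomb the generators and parameters are estimated from the reads, which this sketch does not attempt; the function name and parameters are illustrative.

```python
import math
from itertools import product

def forward_loglik(seq, generators, rho, eps):
    """log P(seq) under a jumping HMM over fixed generator sequences.

    Hidden state at position i = index of the generator being copied. The model stays
    with probability 1 - rho and jumps to each other generator with probability
    rho / (K - 1); it emits the generator's base with probability 1 - eps and each
    other base with probability eps / 3.
    """
    k = len(generators)
    stay = 1.0 - rho if k > 1 else 1.0
    jump = rho / (k - 1) if k > 1 else 0.0

    def emit(j, i):
        return 1.0 - eps if generators[j][i] == seq[i] else eps / 3.0

    alpha = [emit(j, 0) / k for j in range(k)]  # uniform initial state
    loglik = 0.0
    for i in range(len(seq)):
        if i > 0:
            # alpha is normalized, so the mass in all other states is 1 - a.
            alpha = [(a * stay + (1.0 - a) * jump) * emit(j, i)
                     for j, a in enumerate(alpha)]
        total = sum(alpha)
        loglik += math.log(total)
        alpha = [a / total for a in alpha]  # rescale to avoid underflow
    return loglik

gens = ["AAAAAAAA", "CCCCCCCC"]
scores = [forward_loglik(s, gens, 0.05, 0.01) for s in ("AAAAAAAA", "AAAACCCC", "ACACACAC")]

# Sanity check: probabilities of all length-3 sequences sum to one.
total = sum(math.exp(forward_loglik("".join(s), ["ACG", "ATG"], 0.1, 0.02))
            for s in product("ACGT", repeat=3))
```

A sequence that copies one generator and then switches once scores below a non-recombinant sequence but far above one that switches at every position.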

All approaches described so far make use of a known reference genome that serves as a fixed spatial coordinate system after read alignment. By contrast, de novo assembly methods are more general in nature since they do not require such reference genomes. Several assemblers specifically designed for certain NGS platforms like 454/Roche have been proposed in recent years (Finotello et al., 2012). The original goal of de novo assembly is reconstructing a single target genome sequence, rather than an ensemble of different genomes. Hence, the currently available genome assemblers are not designed to solve the whole-genome quasispecies assembly problem, but the different contigs they reconstruct may serve as a starting point for this jigsaw puzzle (Ramakrishnan et al., 2009).

Large-scale simulation studies show that all global reconstruction methods rely on the availability of relatively long reads. Coverage is also important for detecting low-abundance mutants, but even an arbitrarily high coverage cannot compensate for insufficient overlaps due to short reads. Given the typical diversity of virus populations, it appears that global haplotype reconstruction is currently only realistic for sequencing platforms producing long reads on the order of at least 300–500 bp. Accordingly, successful reconstructions have been reported predominantly for the 454/Roche sequencing platform.
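One way to see why coverage cannot replace read length: if two consecutive variant sites are at least one read length apart, no single read spans both, and because all haplotypes agree on the stretch between them, overlap-based assembly cannot tell which alleles belong together. The hypothetical helper below lists such pairs of consecutive variant sites for a given read length, assuming single-end reads (paired-end information is ignored) and 0-based positions.

```python
def unphaseable_gaps(variant_positions, read_len):
    """Pairs of consecutive variant sites that no single read can cover together.

    A read starting at s covers positions s .. s + read_len - 1, so it can span
    sites p < q only if q - p <= read_len - 1.
    """
    sites = sorted(set(variant_positions))
    return [(p, q) for p, q in zip(sites, sites[1:]) if q - p > read_len - 1]

print(unphaseable_gaps([10, 200, 900], read_len=400))  # [(200, 900)]
```

Adding more reads of the same length leaves this list unchanged; only longer reads (or additional linking information) shrink it.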

Regarding the different computational approaches described above, it is generally difficult to conduct informative comparative simulation experiments, but two general trends have become evident. First, local read error correction has the tendency to under-correct the reads, which can lead to a large number of false positive global haplotypes, in particular, when combined with read graph approaches requiring a complete coverage of all reads. Quasispecies assembly methods that relax this coverage requirement (Astrovskaya et al., 2011; O'Neil and Emrich, 2012) or probabilistic approaches avoiding the read-graph construction (Jojic et al., 2008; Prabhakaran et al., 2010) are successful in decreasing the false positive rate. Second, the most problematic step in genome-wide reconstruction is the usually unavoidable (RT-)PCR pre-processing which can introduce significant artifacts. These artifacts might have a much stronger effect on the final quality of the haplotype reconstruction than the actual choice of the computational reconstruction method.

Computational methods for local and global haplotype reconstruction are summarized in Table 1. All of these tools have been developed in research environments, and most are subject to continuous enhancement. Their usability and performance also depend on the rapidly changing characteristics of the sequencing machines. In the future, comparative studies using simulated data, mixed control samples, or Sanger-sequenced gold standard samples will be required to assess the performance of these tools under different conditions. In addition, software tools are available for NGS read data management and visualization. For example, Segminator II has been specifically designed to display sequence variability of temporally sampled virus populations (Archer et al., 2012).

Table 1

Program	Method	URL	References
QuRe	Read graph	http://sourceforge.net/projects/qure/	Prosperi and Salemi, 2012
ShoRAH	Read graph	http://www.cbg.ethz.ch/software/shorah	Zagordi et al., 2011
ViSpA	Read graph	http://alla.cs.gsu.edu/~software/VISPA/vispa.html	Astrovskaya et al., 2011
BIOA	Read graph	https://bitbucket.org/nmancuso/bioa/	Mancuso et al., 2011
Hapler	Read graph	http://nd.edu/~biocmp/hapler/	O'Neil and Emrich, 2012
AmpliconNoise	Probabilistic	http://code.google.com/p/ampliconnoise	Quince et al., 2011
PredictHaplo	Probabilistic	http://www.cs.unibas.ch/personen/roth_volker/HivHaploTyper	Prabhakaran et al., 2010
QuasiRecomb	Probabilistic	http://www.cbg.ethz.ch/software/quasirecomb	Zagordi et al., 2012

Available software tools for viral quasispecies inference.

Applications

NGS is widely applied to study viral diversity, mainly in the context of drug resistance in clinically relevant viruses such as HIV, HCV, and HBV (Table 2). Most studies focus on pre-existing minority drug-resistant variants in treatment-naïve individuals and their impact on the success of antiviral therapy; on epidemiological surveillance; and on virus population dynamics during virological failure. The pathways of drug resistance development are of particular clinical importance, since knowledge of them can guide the design of new drugs or new therapeutic strategies, for instance, to avoid cross-resistance or the rapid selection of resistant viruses (Beerenwinkel et al., 2003). Furthermore, epidemiological studies of a wide variety of human pathogenic viruses have been performed using NGS technologies, including cytomegalovirus (CMV), Epstein Barr virus (EBV), HCV, influenza virus, norovirus, rhinovirus, rotavirus, and varicella zoster virus (VZV) (Table 2).

Table 2

Virus	Study	NGS platform	NGS approach	Basis of analysis	References
CMV	Epidemiology	454/Roche	Amplicon-based	Reads	Gorzer et al., 2010
CMV	Epidemiology	454/Roche	Shotgun	Consensus sequence	Jung et al., 2011
EBV	Epidemiology	Illumina	Shotgun	SNV, consensus sequence	Liu et al., 2011
EBV	Epidemiology	Illumina	Shotgun (amplicons)	SNV	Kwok et al., 2012
HBV	Drug resistance	454/Roche	Amplicon-based	Reads, SNV	Solmone et al., 2009; Homs et al., 2011; Rodriguez-Frías et al., 2012
HBV	Drug resistance	454/Roche	Amplicon-based	SNV	Margeridon-Thermet et al., 2009; Ko et al., 2012; Sede et al., 2012
HBV	Drug resistance	Illumina	Shotgun	SNV	Nishijima et al., 2012
HCV	Drug resistance	454/Roche	Amplicon-based	Reads	Bolcic et al., 2012; Fonseca-Coronado et al., 2012
HCV	Drug resistance	Illumina	Shotgun (cDNA)	SNV	Hiraga et al., 2011
HCV	Drug resistance	454/Roche	Shotgun (amplicons)	SNV, consensus sequences	Lauck et al., 2012
HCV	Drug resistance	Illumina	Paired-end (amplicons)	SNV	Nasu et al., 2011
HCV	Drug resistance	454/Roche	Amplicon-based	SNV	Powdrill et al., 2011
HCV	Epidemiology	454/Roche	Amplicon-based	Reads	Escobar-Gutiérrez et al., 2012; Forbi et al., 2012
HCV	Epidemiology	Illumina	Shotgun (cDNA)	SNV, consensus sequences	Ninomiya et al., 2012
HIV	Drug resistance	454/Roche	Amplicon-based	SNV	Hoffmann et al., 2007; Wang et al., 2007; Mitsuya et al., 2008; Le et al., 2009; Simen et al., 2009; Varghese et al., 2009; Lataillade et al., 2010, 2012; Alteri et al., 2011; D'Aquila et al., 2011; Delobel et al., 2011; Gianella et al., 2011; Ji et al., 2011; Kozal et al., 2011; Moorthy et al., 2011; Stelzl et al., 2011; Fisher et al., 2012; Messiaen et al., 2012
HIV	Drug resistance	454/Roche	Amplicon-based	Reads, SNV	Hedskog et al., 2010; Ji et al., 2010; Mild et al., 2011; Mukherjee et al., 2011; Armenia et al., 2012
HIV	Epidemiology	454/Roche	Shotgun (amplicons)	Consensus sequence	Bruselles et al., 2009
HIV	Epidemiology	454/Roche	Amplicon-based	Consensus sequence	Eshleman et al., 2011
HIV	Epidemiology	454/Roche	Amplicon-based	Reads	Redd et al., 2012
HIV	Tropism	454/Roche	Amplicon-based	Reads	Archer et al., 2009; Rozera et al., 2009; Abbate et al., 2011; Swenson et al., 2010; Vandenbroucke et al., 2010; Baatz et al., 2011; Bunnik et al., 2011; Raymond et al., 2011; Saliou et al., 2011; Svicher et al., 2011; Swenson et al., 2011a,b; Vandekerckhove et al., 2011
Influenza A virus	Epidemiology	Illumina	Shotgun (amplicons)	SNV	Kuroda et al., 2010; Kampmann et al., 2011
Influenza A virus	Epidemiology	454/Roche	Shotgun (amplicons)	SNV	Bartolini et al., 2011
Influenza A virus	Epidemiology	454/Roche	Shotgun	Reads	Lorusso et al., 2011
norovirus	Epidemiology	454/Roche	Shotgun (amplicons)	SNV, haplotype reconstruction	Bull et al., 2012
rhinovirus	Epidemiology	Illumina	Shotgun (amplicons)	SNV, consensus sequences	Tapparel et al., 2011
rotavirus	Epidemiology	454/Roche	Shotgun (cDNA)	Consensus sequences	Jere et al., 2011
VZV	Epidemiology	454/Roche	Shotgun (amplicons)	Consensus sequences	Zell et al., 2012

Applications of 454/Roche pyrosequencing and Illumina NGS technologies in clinical virology.

BAL, bronchoalveolar lavage; CMV, cytomegalovirus; EBV, Epstein Barr virus; HBV, hepatitis B virus; HCV, hepatitis C virus; HIV, human immunodeficiency virus; SNV, single nucleotide variant; VZV, varicella zoster virus.

NGS is also increasingly used in more basic research areas, such as the characterization of transmitted HIV (Fischer et al., 2010) and HCV (Wang et al., 2010; Bull et al., 2011), the estimation of infection dates (Poon et al., 2011), evolution during the course of infection with HIV (Rozera et al., 2009; Poon et al., 2010; Wu et al., 2011), HCV (Bull et al., 2011), and rhinovirus (Cordey et al., 2010), and hypermutation patterns (Reuman et al., 2010; Knoepfel et al., 2011). Recently, NGS technologies have been applied to sequence whole HIV genomes at a coverage that allows quasispecies analysis beyond consensus sequences, for instance, to study patterns of immune escape (Bimber et al., 2010; Willerth et al., 2010; Henn et al., 2012).

All these applications demonstrate the growing importance of NGS for studying viral diversity. With this technology, we will gain further insights into transmission traits, viral evolution, and its association with pathogenesis. Worldwide surveillance of viral diversity will be important for vaccine design and vaccination strategies. Currently, genetic diversity is studied mainly through the detection and analysis of SNVs, rather than the reconstruction of linked mutations, because of the challenges in local and global haplotype reconstruction discussed above. It will be a major step forward when haplotype reconstruction in heterogeneous virus populations matures into a routine procedure based on standardized experimental protocols and validated, automated data analysis pipelines.

Outlook and conclusions

NGS opens up new avenues for studying viral diversity. It will greatly increase our knowledge of virus evolution, fitness, selection pathways, and pathogenesis. Combined with host genomics, viral diversity data will provide insights into complex virus-host interactions. Full-length viral sequences may ultimately define truly conserved regions in viral genomes, which might also be relevant for vaccine and drug design. Clinically, the first application we foresee is a single assay in which all drug targets relevant for antiviral treatment are sequenced, including information on minority drug-resistant variants. For all applications, protocols have to be chosen that minimize errors during sample preparation and sequencing. Several challenges in data analysis remain, especially with regard to alignment and global diversity estimation. In the future, some of these challenges might be alleviated by upcoming third- and fourth-generation sequencing technologies, such as single-molecule or direct RNA sequencing.

Another challenge that has not yet been addressed will be making sense of the large amounts of genomic data generated by NGS. For instance, clinical cut-offs need to be defined for minority drug-resistant variants, the clinical importance of new virus subtypes or even new viruses needs to be determined, and pathogenesis factors need to be confirmed in clinical settings. Downstream analyses therefore have to include large sets of well-documented patients, results from other experimental set-ups, etc. These are challenges, but also opportunities to answer important research questions that could not be addressed with conventional sequencing techniques.

Conflict of interest statement

Karin J. Metzner has received travel grants and honoraria from Gilead, Roche Diagnostics, GlaxoSmithKline, Bristol-Myers Squibb, Tibotec, and Abbott, and has received research grants from Abbott, Gilead, and Roche Diagnostics. Huldrych F. Günthard has been an adviser and/or consultant for the following companies: GlaxoSmithKline, Abbott, Novartis, Gilead, Boehringer Ingelheim, Roche, Tibotec and Bristol-Myers Squibb, and has received unrestricted research and educational grants from Roche, Abbott, Bristol-Myers Squibb, GlaxoSmithKline, Gilead, Tibotec and Merck Sharp & Dohme (all money went to institution). The other authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Statements

Acknowledgments

This work was supported by the Swiss National Science Foundation under grant number CR32I2_127017.


References

1
Abbate, I., Vlassi, C., Rozera, G., Bruselles, A., Bartolini, B., Giombini, E., Corpolongo, A., D'Offizi, G., Narciso, P., Desideri, A., Ippolito, G., Capobianchi, M. R. (2011). Detection of quasispecies variants predicted to use CXCR4 by ultra-deep pyrosequencing during early HIV infection. AIDS 25, 611–617. doi: 10.1097/QAD.0b013e328343489e
2
Alteri, C., Santoro, M. M., Abbate, I., Rozera, G., Bruselles, A., Bartolini, B., Gori, C., Forbici, F., Orchi, N., Tozzi, V., Palamara, G., Antinori, A., Narciso, P., Girardi, E., Svicher, V., Ceccherini-Silberstein, F., Capobianchi, M. R., Perno, C. F. (2011). ‘Sentinel’ mutations in standard population sequencing can predict the presence of HIV-1 reverse transcriptase major mutations detectable only by ultra-deep pyrosequencing. J. Antimicrob. Chemother. 66, 2615–2623. doi: 10.1093/jac/dkr354
3
Althaus, C. F., Vongrad, V., Niederost, B., Joos, B., Di Giallonardo, F., Rieder, P., Pavlovic, J., Trkola, A., Gunthard, H. F., Metzner, K. J., Fischer, M. (2012). Tailored enrichment strategy detects low abundant small noncoding RNAs in HIV-1 infected cells. Retrovirology 9, 27. doi: 10.1186/1742-4690-9-27
4
Altmann, A., Weber, P., Quast, C., Rex-Haffner, M., Binder, E. B., Müller-Myhsok, B. (2011). vipR: variant identification in pooled DNA using R. Bioinformatics 27, i77–i84. doi: 10.1093/bioinformatics/btr205
5
Archer, J., Baillie, G., Watson, S. J., Kellam, P., Rambaut, A., Robertson, D. L. (2012). Analysis of high-depth sequence data for studying viral diversity: a comparison of next generation sequencing platforms using Segminator II. BMC Bioinformatics 13, 47. doi: 10.1186/1471-2105-13-47
6
Archer, J., Braverman, M. S., Taillon, B. E., Desany, B., James, I., Harrigan, P. R., Lewis, M., Robertson, D. L. (2009). Detection of low-frequency pretherapy chemokine (CXC motif) receptor 4 (CXCR4)-using HIV-1 with ultra-deep pyrosequencing. AIDS 23, 1209–1218. doi: 10.1097/QAD.0b013e32832b4399
7
Archer, J., Rambaut, A., Taillon, B. E., Harrigan, P. R., Lewis, M., Robertson, D. L. (2010). The evolutionary analysis of emerging low frequency HIV-1 CXCR4 using variants through time–an ultra-deep approach. PLoS Comput. Biol. 6:e1001022. doi: 10.1371/journal.pcbi.1001022
8
Armenia, D., Vandenbroucke, I., Fabeni, L., Van Marck, H., Cento, V., D'Arrigo, R., Van Wesenbeeck, L., Scopelliti, F., Micheli, V., Bruzzone, B., Lo Caputo, S., Aerssens, J., Rizzardini, G., Tozzi, V., Narciso, P., Antinori, A., Stuyver, L., Perno, C. F., Ceccherini-Silberstein, F. (2012). Study of genotypic and phenotypic HIV-1 dynamics of integrase mutations during raltegravir treatment: a refined analysis by ultra-deep 454 pyrosequencing. J. Infect. Dis. 205, 557–567. doi: 10.1093/infdis/jir821
9
Astrovskaya, I., Tork, B., Mangul, S., Westbrooks, K., Măndoiu, I., Balfe, P., Zelikovsky, A. (2011). Inferring viral quasispecies spectra from 454 pyrosequencing reads. BMC Bioinformatics 12(Suppl. 6), S1. doi: 10.1186/1471-2105-12-S6-S1
10
Baatz, F., Struck, D., Lemaire, M., De Landtsheer, S., Servais, J. Y., Arendt, V., Schmit, J. C., Perez Bercoff, D. (2011). Rescue of HIV-1 long-time archived X4 strains to escape maraviroc. Antiviral Res. 92, 488–492. doi: 10.1016/j.antiviral.2011.10.003
11
Balzer, S., Malde, K., Jonassen, I. (2011). Systematic exploration of error sources in pyrosequencing flowgram data. Bioinformatics 27, i304–i309. doi: 10.1093/bioinformatics/btr251
12
Bartolini, B., Chillemi, G., Abbate, I., Bruselles, A., Rozera, G., Castrignano, T., Paoletti, D., Picardi, E., Desideri, A., Pesole, G., Capobianchi, M. R. (2011). Assembly and characterization of pandemic influenza A H1N1 genome in nasopharyngeal swabs using high-throughput pyrosequencing. New Microbiol. 34, 391–397.
13
Barzon, L., Lavezzo, E., Militello, V., Toppo, S., Palù, G. (2011). Applications of next-generation sequencing technologies to diagnostic virology. Int. J. Mol. Sci. 12, 7861–7884. doi: 10.3390/ijms12117861
14
Beerenwinkel, N., Lengauer, T., Däumer, M., Kaiser, R., Walter, H., Korn, K., Hoffmann, D., Selbig, J. (2003). Methods for optimizing antiviral combination therapies. Bioinformatics 19(Suppl. 1), i16–i25. doi: 10.1093/bioinformatics/btg1001
15
Beerenwinkel, N., Zagordi, O. (2011). Ultra-deep sequencing for the analysis of viral populations. Curr. Opin. Virol. 1, 413–418. doi: 10.1016/j.coviro.2011.07.008
16
Bentley, D. R., Balasubramanian, S., Swerdlow, H. P., Smith, G. P., Milton, J., Brown, C. G., Hall, K. P., Evers, D. J., Barnes, C. L., Bignell, H. R., Boutell, J. M., Bryant, J., Carter, R. J., Cheetham, R. K., Cox, A. J., Ellis, D. J., Flatbush, M. R., Gormley, N. A., Humphray, S. J., Irving, L. J., Karbelashvili, M. S., Kirk, S. M., Li, H., Liu, X., Maisinger, K. S., Murray, L. J., Obradovic, B., Ost, T., Parkinson, M. L., Pratt, M. R., Rasolonjatovo, I. M., Reed, M. T., Rigatti, R., Rodighiero, C., Ross, M. T., Sabot, A., Sankar, S. V., Scally, A., Schroth, G. P., Smith, M. E., Smith, V. P., Spiridou, A., Torrance, P. E., Tzonev, S. S., Vermaas, E. H., Walter, K., Wu, X., Zhang, L., Alam, M. D., Anastasi, C., Aniebo, I. C., Bailey, D. M. D., Bancarz, I. R., Banerjee, S., Barbour, S. G., Baybayan, P. A., Benoit, V. A., Benson, K. F., Bevis, C., Black, P. J., Boodhun, A., Brennan, J. S., Bridgham, J. A., Brown, R. C., Brown, A. A., Buermann, D. H., Bundu, A. A., Burrows, J. C., Carter, N. P., Castillo, N., Catenazzi, M. C. E., Chang, S., Cooley, R. N., Crake, N. R., Dada, O. O., Diakoumakos, K. D., Dominguez-Fernandez, B., Earnshaw, D. J., Egbujor, U. C., Elmore, D. W., Etchin, S. S., Ewan, M. R., Fedurco, M., Fraser, L. J., Fajardo, K. V. F., Furey, W. S., George, D., Gietzen, K. J., Goddard, C. P., Golda, G. S., Granieri, P. A., Green, D. E., Gustafson, D. L., Hansen, N. F., Harnish, K., Haudenschild, C. D., Heyer, N. I., Hims, M. M., Ho, J. T., Horgan, A. M., Hoschler, K., Hurwitz, S., Ivanov, D. V., Johnson, M. Q., James, T., Huw Jones, T. A., Kang, G. D., Kerelska, T. H., Kersey, A. D., Khrebtukova, I., Kindwall, A. P., Kingsbury, Z., Kokko-Gonzales, P. I., Kumar, A., Laurent, M. A., Lawley, C. T., Lee, S. E., Lee, X., Liao, A. K., Loch, J. A., Lok, M., Luo, S., Mammen, R. M., Martin, J. W., McCauley, P. G., McNitt, P., Mehta, P., Moon, K. W., Mullens, J. W., Newington, T., Ning, Z., Ling Ng, B., Novo, S. M., O'Neill, M. J., Osborne, M. A., Osnowski, A., Ostadan, O., Paraschos, L. L., Pickering, L., Pike, A. C., Chris Pinkard, D., Pliskin, D. P., Podhasky, J., Quijano, V. J., Raczy, C., Rae, V. H., Rawlings, S. R., Chiva Rodriguez, A., Roe, P. M., Rogers, J., Rogert Bacigalupo, M. C., Romanov, N., Romieu, A., Roth, R. K., Rourke, N. J., Ruediger, S. T., Rusman, E., Sanches-Kuiper, R. M., Schenker, M. R., Seoane, J. M., Shaw, R. J., Shiver, M. K., Short, S. W., Sizto, N. L., Sluis, J. P., Smith, M. A., Ernest Sohna Sohna, J., Spence, E. J., Stevens, K., Sutton, N., Szajkowski, L., Tregidgo, C. L., Turcatti, G., Vandevondele, S., Verhovsky, Y., Virk, S. M., Wakelin, S., Walcott, G. C., Wang, J., Worsley, G. J., Yan, J., Yau, L., Zuerlein, M., Mullikin, J. C., Hurles, M. E., McCooke, N. J., West, J. S., Oaks, F. L., Lundberg, P. L., Klenerman, D., Durbin, R., Smith, A. J. (2008). Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59. doi: 10.1038/nature07517
17
Bimber, B. N., Dudley, D. M., Lauck, M., Becker, E. A., Chin, E. N., Lank, S. M., Grunenwald, H. L., Caruccio, N. C., Maffitt, M., Wilson, N. A., Reed, J. S., Sosman, J. M., Tarosso, L. F., Sanabani, S., Kallas, E. G., Hughes, A. L., O'Connor, D. H. (2010). Whole-genome characterization of human and simian immunodeficiency virus intrahost diversity by ultradeep pyrosequencing. J. Virol. 84, 12087–12092. doi: 10.1128/JVI.01378-10
18
Bolcic, F., Sede, M., Moretti, F., Westergaard, G., Vazquez, M., Laufer, N., Quarleri, J. (2012). Analysis of the PKR-eIF2alpha phosphorylation homology domain (PePHD) of hepatitis C virus genotype 1 in HIV-coinfected patients by ultra-deep pyrosequencing and its relationship to responses to pegylated interferon-ribavirin treatment. Arch. Virol. 157, 703–711. doi: 10.1007/s00705-012-1230-1
19
Brockman, W., Alvarez, P., Young, S., Garber, M., Giannoukos, G., Lee, W. L., Russ, C., Lander, E. S., Nusbaum, C., Jaffe, D. B. (2008). Quality scores and SNP detection in sequencing-by-synthesis systems. Genome Res. 18, 763–770. doi: 10.1101/gr.070227.107
20
Bruselles, A., Rozera, G., Bartolini, B., Prosperi, M., Del Nonno, F., Narciso, P., Capobianchi, M. R., Abbate, I. (2009). Use of massive parallel pyrosequencing for near full-length characterization of a unique HIV Type 1 BF recombinant associated with a fatal primary infection. AIDS Res. Hum. Retroviruses 25, 937–942. doi: 10.1089/aid.2009.0083
21
Bull, R. A., Eden, J.-S., Luciani, F., McElroy, K., Rawlinson, W. D., White, P. A. (2012). Contribution of intra- and interhost dynamics to norovirus evolution. J. Virol. 86, 3219–3229. doi: 10.1128/JVI.06712-11
22
Bull, R. A., Luciani, F., McElroy, K., Gaudieri, S., Pham, S. T., Chopra, A., Cameron, B., Maher, L., Dore, G. J., White, P. A., Lloyd, A. R. (2011). Sequential bottlenecks drive viral evolution in early acute hepatitis C virus infection. PLoS Pathog. 7:e1002243. doi: 10.1371/journal.ppat.1002243
23
Bunnik, E. M., Swenson, L. C., Edo-Matas, D., Huang, W., Dong, W., Frantzell, A., Petropoulos, C. J., Coakley, E., Schuitemaker, H., Harrigan, P. R., van 't Wout, A. B. (2011). Detection of inferred CCR5- and CXCR4-using HIV-1 variants and evolutionary intermediates using ultra-deep pyrosequencing. PLoS Pathog. 7:e1002106. doi: 10.1371/journal.ppat.1002106
24
Burch, C. L., Chao, L. (2000). Evolvability of an RNA virus is determined by its mutational neighbourhood. Nature 406, 625–628. doi: 10.1038/35020564
25
Casbon, J. A., Osborne, R. J., Brenner, S., Lichtenstein, C. P. (2011). A method for counting PCR template molecules with application to next-generation sequencing. Nucleic Acids Res. 39, e81. doi: 10.1093/nar/gkr217
26
Chang, K. S., Vyas, R. C., Deaven, L. L., Trujillo, J. M., Stass, S. A., Hittelman, W. N. (1992). PCR amplification of chromosome-specific DNA isolated from flow cytometry-sorted chromosomes. Genomics 12, 307–312. doi: 10.1016/0888-7543(92)90378-6
27
Chapman, M. A., Lawrence, M. S., Keats, J. J., Cibulskis, K., Sougnez, C., Schinzel, A. C., Harview, C. L., Brunet, J.-P., Ahmann, G. J., Adli, M., Anderson, K. C., Ardlie, K. G., Auclair, D., Baker, A., Bergsagel, P. L., Bernstein, B. E., Drier, Y., Fonseca, R., Gabriel, S. B., Hofmeister, C. C., Jagannath, S., Jakubowiak, A. J., Krishnan, A., Levy, J., Liefeld, T., Lonial, S., Mahan, S., Mfuko, B., Monti, S., Perkins, L. M., Onofrio, R., Pugh, T. J., Rajkumar, S. V., Ramos, A. H., Siegel, D. S., Sivachenko, A., Stewart, A. K., Trudel, S., Vij, R., Voet, D., Winckler, W., Zimmerman, T., Carpten, J., Trent, J., Hahn, W. C., Garraway, L. A., Meyerson, M., Lander, E. S., Getz, G., Golub, T. R. (2011). Initial genome sequencing and analysis of multiple myeloma. Nature 471, 467–472. doi: 10.1038/nature09837
28
Cordey, S., Junier, T., Gerlach, D., Gobbini, F., Farinelli, L., Zdobnov, E. M., Winther, B., Tapparel, C., Kaiser, L. (2010). Rhinovirus genome evolution during experimental human infection. PLoS ONE 5:e10588. doi: 10.1371/journal.pone.0010588
29
D'Aquila, R. T., Geretti, A. M., Horton, J. H., Rouse, E., Kheshti, A., Raffanti, S., Oie, K., Pappa, K., Ross, L. L. (2011). Tenofovir (TDF)-selected or abacavir (ABC)-selected low-frequency HIV type 1 subpopulations during failure with persistent viremia as detected by ultradeep pyrosequencing. AIDS Res. Hum. Retroviruses 27, 201–209. doi: 10.1089/aid.2010.0077
30
Daly, G. M., Bexfield, N., Heaney, J., Stubbs, S., Mayer, A. P., Palser, A., Kellam, P., Drou, N., Caccamo, M., Tiley, L., Alexander, G. J., Bernal, W., Heeney, J. L. (2011). A viral discovery methodology for clinical biopsy samples utilising massively parallel next generation sequencing. PLoS ONE 6:e28879. doi: 10.1371/journal.pone.0028879
31
Delobel, P., Saliou, A., Nicot, F., Dubois, M., Trancart, S., Tangre, P., Aboulker, J. P., Taburet, A. M., Molina, J. M., Massip, P., Marchou, B., Izopet, J. (2011). Minor HIV-1 variants with the K103N resistance mutation during intermittent Efavirenz-containing antiretroviral therapy and virological failure. PLoS ONE 6:e21655. doi: 10.1371/journal.pone.0021655
32
Dohm, J. C., Lottaz, C., Borodina, T., Himmelbauer, H. (2008). Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res. 36, e105. doi: 10.1093/nar/gkn425
33
Domingo, E., Holland, J. J. (1997). RNA virus mutations and fitness for survival. Annu. Rev. Microbiol. 51, 151–178. doi: 10.1146/annurev.micro.51.1.151
34
Drummond, A. J., Pybus, O. G., Rambaut, A., Forsberg, R., Rodrigo, A. G. (2003). Measurably evolving populations. Trends Ecol. Evol. 18, 481–488.
35
Duffy, S., Shackelton, L. A., Holmes, E. C. (2008). Rates of evolutionary change in viruses: patterns and determinants. Nat. Rev. Genet. 9, 267–276. doi: 10.1038/nrg2323
36
Eckert, K. A., Kunkel, T. A. (1991). DNA polymerase fidelity and the polymerase chain reaction. PCR Methods Appl. 1, 17–24. doi: 10.1101/gr.1.1.17
37
Eigen, M. (1971). Selforganization of matter and the evolution of biological macromolecules. Naturwissenschaften 58, 465–523.
38
Eigen, M., McCaskill, J., Schuster, P. (1988). Molecular quasi-species. J. Phys. Chem. 92, 6881–6891.
39
Eigen, M., McCaskill, J., Schuster, P. (1989). The molecular quasi-species. Adv. Chem. Phys. 75, 149–263.
40
Eigen, M., Schuster, P. (1977). The hypercycle. A principle of natural self-organization. Part A: emergence of the hypercycle. Naturwissenschaften 64, 541–565.
41
Eriksson, N., Pachter, L., Mitsuya, Y., Rhee, S.-Y., Wang, C., Gharizadeh, B., Ronaghi, M., Shafer, R. W., Beerenwinkel, N. (2008). Viral population estimation using pyrosequencing. PLoS Comput. Biol. 4:e1000074. doi: 10.1371/journal.pcbi.1000074
42
Escobar-Gutiérrez, A., Vazquez-Pichardo, M., Cruz-Rivera, M., Rivera-Osorio, P., Carpio-Pedroza, J. C., Ruíz-Pacheco, J. A., Ruiz-Tovar, K., Vaughan, G. (2012). Identification of hepatitis C virus transmission using a next-generation sequencing approach. J. Clin. Microbiol. 50, 1461–1463. doi: 10.1128/JCM.00005-12
43
Eshleman, S. H., Hudelson, S. E., Redd, A. D., Wang, L., Debes, R., Chen, Y. Q., Martens, C. A., Ricklefs, S. M., Selig, E. J., Porcella, S. F., Munshaw, S., Ray, S. C., Piwowar-Manning, E., McCauley, M., Hosseinipour, M. C., Kumwenda, J., Hakim, J. G., Chariyalertsak, S., De Bruyn, G., Grinsztejn, B., Kumarasamy, N., Makhema, J., Mayer, K. H., Pilotto, J., Santos, B. R., Quinn, T. C., Cohen, M. S., Hughes, J. P. (2011). Analysis of genetic linkage of HIV from couples enrolled in the HIV prevention trials network 052 trial. J. Infect. Dis. 204, 1918–1926. doi: 10.1093/infdis/jir651
44
Ewens, W. J. (1972). The sampling theory of selectively neutral alleles. Theor. Popul. Biol. 3, 87–112.
45
Ewing, B., and Green, P. (1998). Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194. doi: 10.1101/gr.8.3.186
46
Fang, G., Zhu, G., Burger, H., Keithly, J. S., and Weiser, B. (1998). Minimizing DNA recombination during long RT-PCR. J. Virol. Methods 76, 139–148.
47
Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Stat. 1, 209–230.
48
Finotello, F., Lavezzo, E., Fontana, P., Peruzzo, D., Albiero, A., Barzon, L., Falda, M., Camillo, B. D., and Toppo, S. (2012). Comparative analysis of algorithms for whole-genome assembly of pyrosequencing data. Brief Bioinform. 13, 269–280. doi: 10.1093/bib/bbr063
49
Fischer, M., Wong, J. K., Russenberger, D., Joos, B., Opravil, M., Hirschel, B., Trkola, A., Kuster, H., Weber, R., and Gunthard, H. F. (2002). Residual cell-associated unspliced HIV-1 RNA in peripheral blood of patients on potent antiretroviral therapy represents intracellular transcripts. Antivir. Ther. 7, 91–103.
50
Fischer, W., Ganusov, V. V., Giorgi, E. E., Hraber, P. T., Keele, B. F., Leitner, T., Han, C. S., Gleasner, C. D., Green, L., Lo, C. C., Nag, A., Wallstrom, T. C., Wang, S., McMichael, A. J., Haynes, B. F., Hahn, B. H., Perelson, A. S., Borrow, P., Shaw, G. M., Bhattacharya, T., and Korber, B. T. (2010). Transmission of single HIV-1 genomes and dynamics of early immune escape revealed by ultra-deep sequencing. PLoS ONE 5:e12303. doi: 10.1371/journal.pone.0012303
51
Fisher, R., Van Zyl, G. U., Travers, S. A., Pond, S. L. K., Engelbrech, S., Murrell, B., Scheffler, K., and Smith, D. (2012). Deep sequencing reveals minor protease resistance mutations in patients failing a protease inhibitor regimen. J. Virol. 86, 6231–6237. doi: 10.1128/JVI.06541-11
52
Flaherty, P., Natsoulis, G., Muralidharan, O., Winters, M., Buenrostro, J., Bell, J., Brown, S., Holodniy, M., Zhang, N., and Ji, H. P. (2012). Ultrasensitive detection of rare mutations using next-generation targeted resequencing. Nucleic Acids Res. 40, e2. doi: 10.1093/nar/gkr861
53
Fonseca-Coronado, S., Escobar-Gutierrez, A., Ruiz-Tovar, K., Cruz-Rivera, M. Y., Rivera-Osorio, P., Vazquez-Pichardo, M., Carpio-Pedroza, J. C., Ruiz-Pacheco, J. A., Cazares, F., and Vaughan, G. (2012). Specific detection of naturally occurring hepatitis C virus mutants with resistance to telaprevir and boceprevir (protease inhibitors) among treatment-naive infected individuals. J. Clin. Microbiol. 50, 281–287. doi: 10.1128/JCM.05842-11
54
Forbi, J. C., Purdy, M. A., Campo, D. S., Vaughan, G., Dimitrova, Z. E., Ganova-Raeva, L. M., Xia, G. L., and Khudyakov, Y. E. (2012). Epidemic history of hepatitis C virus infection in two remote communities in Nigeria, West Africa. J. Gen. Virol. 93, 1410–1421. doi: 10.1099/vir.0.042184-0
55
Gerstung, M., Beisel, C., Rechsteiner, M., Wild, P., Schraml, P., Moch, H., and Beerenwinkel, N. (2012). Reliable detection of subclonal single-nucleotide variants in tumor cell populations. Nat. Commun. 3, 811. doi: 10.1038/ncomms1814
56
Gianella, S., Delport, W., Pacold, M. E., Young, J. A., Choi, J. Y., Little, S. J., Richman, D. D., Pond, S. L. K., and Smith, D. M. (2011). Detection of minority resistance during early HIV-1 infection: natural variation and spurious detection rather than transmission and evolution of multiple viral variants. J. Virol. 85, 8359–8367. doi: 10.1128/JVI.02582-10
57
Gilles, A., Meglécz, E., Pech, N., Ferreira, S., Malausa, T., and Martin, J.-F. (2011). Accuracy and quality assessment of 454 GS-FLX Titanium pyrosequencing. BMC Genomics 12, 245. doi: 10.1186/1471-2164-12-245
58
Gorzer, I., Guelly, C., Trajanoski, S., and Puchhammer-Stockl, E. (2010). Deep sequencing reveals highly complex dynamics of human cytomegalovirus genotypes in transplant patients over time. J. Virol. 84, 7195–7203. doi: 10.1128/JVI.00475-10
59
Harismendy, O., Ng, P. C., Strausberg, R. L., Wang, X., Stockwell, T. B., Beeson, K. Y., Schork, N. J., Murray, S. S., Topol, E. J., Levy, S., and Frazer, K. A. (2009). Evaluation of next generation sequencing platforms for population targeted sequencing studies. Genome Biol. 10, R32. doi: 10.1186/gb-2009-10-3-r32
60
Hedskog, C., Mild, M., Jernberg, J., Sherwood, E., Bratt, G., Leitner, T., Lundeberg, J., Andersson, B., and Albert, J. (2010). Dynamics of HIV-1 quasispecies during antiviral treatment dissected using ultra-deep pyrosequencing. PLoS ONE 5:e11345. doi: 10.1371/journal.pone.0011345
61
Henn, M. R., Boutwell, C. L., Charlebois, P., Lennon, N. J., Power, K. A., Macalalad, A. R., Berlin, A. M., Malboeuf, C. M., Ryan, E. M., Gnerre, S., Zody, M. C., Erlich, R. L., Green, L. M., Berical, A., Wang, Y., Casali, M., Streeck, H., Bloom, A. K., Dudek, T., Tully, D., Newman, R., Axten, K. L., Gladden, A. D., Battis, L., Kemper, M., Zeng, Q., Shea, T. P., Gujja, S., Zedlack, C., Gasser, O., Brander, C., Hess, C., Gunthard, H. F., Brumme, Z. L., Brumme, C. J., Bazner, S., Rychert, J., Tinsley, J. P., Mayer, K. H., Rosenberg, E., Pereyra, F., Levin, J. Z., Young, S. K., Jessen, H., Altfeld, M., Birren, B. W., Walker, B. D., and Allen, T. M. (2012). Whole genome deep sequencing of HIV-1 reveals the impact of early minor variants upon immune recognition during acute infection. PLoS Pathog. 8:e1002529. doi: 10.1371/journal.ppat.1002529
62
Hiraga, N., Imamura, M., Abe, H., Hayes, C. N., Kono, T., Onishi, M., Tsuge, M., Takahashi, S., Ochi, H., Iwao, E., Kamiya, N., Yamada, I., Tateno, C., Yoshizato, K., Matsui, H., Kanai, A., Inaba, T., Tanaka, S., and Chayama, K. (2011). Rapid emergence of telaprevir resistant hepatitis C virus strain from wildtype clone in vivo. Hepatology 54, 781–788. doi: 10.1002/hep.24460
63
Hoffmann, C., Minkah, N., Leipzig, J., Wang, G., Arens, M. Q., Tebas, P., and Bushman, F. D. (2007). DNA bar coding and pyrosequencing to identify rare HIV drug resistance mutations. Nucleic Acids Res. 35, e91. doi: 10.1093/nar/gkm435
64
Holodniy, M., Mole, L., Yen-Lieberman, B., Margolis, D., Starkey, C., Carroll, R., Spahlinger, T., Todd, J., and Jackson, J. B. (1995). Comparative stabilities of quantitative human immunodeficiency virus RNA in plasma from samples collected in VACUTAINER CPT, VACUTAINER PPT, and standard VACUTAINER tubes. J. Clin. Microbiol. 33, 1562–1566.
65
Homs, M., Buti, M., Quer, J., Jardi, R., Schaper, M., Tabernero, D., Ortega, I., Sanchez, A., Esteban, R., and Rodriguez-Frias, F. (2011). Ultra-deep pyrosequencing analysis of the hepatitis B virus preCore region and main catalytic motif of the viral polymerase in the same viral genome. Nucleic Acids Res. 39, 8457–8471. doi: 10.1093/nar/gkr451
66
Huang, A., Kantor, R., Delong, A., Schreier, L., and Istrail, S. (2011). QColors: an algorithm for conservative viral quasispecies reconstruction from short and non-contiguous next generation sequencing reads, in IEEE International Conference on Bioinformatics and Biomedicine Workshops (Institute of Electrical and Electronics Engineers), 130–136.
67
Huse, S., Huber, J., Morrison, H., Sogin, M., and Welch, D. (2007). Accuracy and quality of massively parallel DNA pyrosequencing. Genome Biol. 8, R143. doi: 10.1186/gb-2007-8-7-r143
68
Iwasa, Y., Michor, F., and Nowak, M. A. (2003). Evolutionary dynamics of escape from biomedical intervention. Proc. Biol. Sci. 270, 2573–2578. doi: 10.1098/rspb.2003.2539
69
Iwasa, Y., Michor, F., and Nowak, M. A. (2004). Evolutionary dynamics of invasion and escape. J. Theor. Biol. 226, 205–214. doi: 10.1016/j.jtbi.2003.08.014
70
Jabara, C. B., Jones, C. D., Roach, J., Anderson, J. A., and Swanstrom, R. (2011). Accurate sampling and deep sequencing of the HIV-1 protease gene using a Primer ID. Proc. Natl. Acad. Sci. U.S.A. 108, 20166–20171. doi: 10.1073/pnas.1110064108
71
Jain, A. K., and Dubes, R. C. (1981). Algorithms for Clustering Data. Upper Saddle River, NJ: Prentice-Hall.
72
Jere, K. C., Mlera, L., Page, N. A., Van Dijk, A. A., and O'Neill, H. G. (2011). Whole genome analysis of multiple rotavirus strains from a single stool specimen using sequence-independent amplification and 454® pyrosequencing reveals evidence of intergenotype genome segment recombination. Infect. Genet. Evol. 11, 2072–2082. doi: 10.1016/j.meegid.2011.09.023
73
Ji, H., Li, Y., Graham, M., Liang, B. B., Pilon, R., Tyson, S., Peters, G., Tyler, S., Merks, H., Bertagnolio, S., Soto-Ramirez, L., Sandstrom, P., and Brooks, J. (2011). Next-generation sequencing of dried blood spot specimens: a novel approach to HIV drug-resistance surveillance. Antivir. Ther. 16, 871–878. doi: 10.3851/IMP1839
74
Ji, H., Masse, N., Tyler, S., Liang, B., Li, Y., Merks, H., Graham, M., Sandstrom, P., and Brooks, J. (2010). HIV drug resistance surveillance using pooled pyrosequencing. PLoS ONE 5:e9263. doi: 10.1371/journal.pone.0009263
75
Jojic, V., Hertz, T., and Jojic, N. (2008). Population sequencing using short reads: HIV as a case study, in Pacific Symposium on Biocomputing, eds R. B. Altman, A. K. Dunker, L. Hunter, T. Murray, and T. E. Klein (World Scientific), 114–125. ISBN 978-981-277-608-2.
76
Jose, M., Gajardo, R., and Jorquera, J. I. (2005). Stability of HCV, HIV-1 and HBV nucleic acids in plasma samples under long-term storage. Biologicals 33, 9–16. doi: 10.1016/j.biologicals.2004.10.003
77
Judo, M. S., Wedel, A. B., and Wilson, C. (1998). Stimulation and suppression of PCR-mediated recombination. Nucleic Acids Res. 26, 1819–1825. doi: 10.1093/nar/26.7.1819
78
Jung, G. S., Kim, Y. Y., Kim, J. I., Ji, G. Y., Jeon, J. S., Yoon, H. W., Lee, G. C., Ahn, J. H., Lee, K. M., and Lee, C. H. (2011). Full genome sequencing and analysis of human cytomegalovirus strain JHC isolated from a Korean patient. Virus Res. 156, 113–120. doi: 10.1016/j.virusres.2011.01.005
79
Kampmann, M. L., Fordyce, S. L., Avila-Arcos, M. C., Rasmussen, M., Willerslev, E., Nielsen, L. P., and Gilbert, M. T. (2011). A simple method for the parallel deep sequencing of full influenza A genomes. J. Virol. Methods 178, 243–248. doi: 10.1016/j.jviromet.2011.09.001
80
Kanagawa, T. (2003). Bias and artifacts in multitemplate polymerase chain reactions (PCR). J. Biosci. Bioeng. 96, 317–323. doi: 10.1016/S1389-1723(03)90130-7
81
Kircher, M., Stenzel, U., and Kelso, J. (2009). Improved base calling for the Illumina Genome Analyzer using machine learning strategies. Genome Biol. 10, R83. doi: 10.1186/gb-2009-10-8-r83
82
Knoepfel, S. A., Di Giallonardo, F., Daumer, M., Thielen, A., and Metzner, K. J. (2011). In-depth analysis of G-to-A hypermutation rate in HIV-1 env DNA induced by endogenous APOBEC3 proteins using massively parallel sequencing. J. Virol. Methods 171, 329–338. doi: 10.1016/j.jviromet.2010.11.016
83
Ko, S.-Y., Oh, H.-B., Park, C.-W., Lee, H. C., and Lee, J.-E. (2012). Analysis of hepatitis B virus drug-resistant mutant haplotypes by ultra-deep pyrosequencing. Clin. Microbiol. Infect. [Epub ahead of print]. doi: 10.1111/j.1469-0691.2012.03951.x
84
Koboldt, D. C., Zhang, Q., Larson, D. E., Shen, D., McLellan, M. D., Lin, L., Miller, C. A., Mardis, E. R., Ding, L., and Wilson, R. K. (2012). VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568–576. doi: 10.1101/gr.129684.111
85
Kozal, M. J., Chiarella, J., St. John, E. P., Moreno, E. A., Simen, B. B., Arnold, T. E., and Lataillade, M. (2011). Prevalence of low-level HIV-1 variants with reverse transcriptase mutation K65R and the effect of antiretroviral drug exposure on variant levels. Antivir. Ther. 16, 925–929. doi: 10.3851/IMP1851
86
Kuroda, M., Katano, H., Nakajima, N., Tobiume, M., Ainai, A., Sekizuka, T., Hasegawa, H., Tashiro, M., Sasaki, Y., Arakawa, Y., Hata, S., Watanabe, M., and Sata, T. (2010). Characterization of quasispecies of pandemic 2009 influenza A virus (A/H1N1/2009) by de novo sequencing using a next-generation DNA sequencer. PLoS ONE 5:e10256. doi: 10.1371/journal.pone.0010256
87
Kwok, H., Tong, A. H. Y., Lin, C. H., Lok, S., Farrell, P. J., Kwong, D. L. W., and Chiang, A. K. S. (2012). Genomic sequencing and comparative analysis of Epstein-Barr virus genome isolated from primary nasopharyngeal carcinoma biopsy. PLoS ONE 7:e36939. doi: 10.1371/journal.pone.0036939
88
Lataillade, M., Chiarella, J., Yang, R., Degrosky, M., Uy, J., Seekins, D., Simen, B., St. John, E., Moreno, E., and Kozal, M. (2012). Virologic failures on initial boosted-PI regimen infrequently possess low-level variants with major PI resistance mutations by ultra-deep sequencing. PLoS ONE 7:e30118. doi: 10.1371/journal.pone.0030118
89
Lataillade, M., Chiarella, J., Yang, R., Schnittman, S., Wirtz, V., Uy, J., Seekins, D., Krystal, M., Mancini, M., McGrath, D., Simen, B., Egholm, M., and Kozal, M. (2010). Prevalence and clinical significance of HIV drug resistance mutations by ultra-deep sequencing in antiretroviral-naive subjects in the CASTLE study. PLoS ONE 5:e10952. doi: 10.1371/journal.pone.0010952
90
Lauck, M., Alvarado-Mora, M. V., Becker, E. A., Bhattacharya, D., Striker, R., Hughes, A. L., Carrilho, F. J., O'Connor, D. H., and Pinho, J. R. (2012). Analysis of hepatitis C virus intrahost diversity across the coding region by ultradeep pyrosequencing. J. Virol. 86, 3952–3960. doi: 10.1128/JVI.06627-11
91
Le, T., Chiarella, J., Simen, B. B., Hanczaruk, B., Egholm, M., Landry, M. L., Dieckhaus, K., Rosen, M. I., and Kozal, M. J. (2009). Low-abundance HIV drug-resistant viral variants in treatment-experienced persons correlate with historical antiretroviral use. PLoS ONE 4:e6079. doi: 10.1371/journal.pone.0006079
92
Li, J. Z., Paredes, R., Ribaudo, H. J., Svarovskaia, E. S., Metzner, K. J., Kozal, M. J., Hullsiek, K. H., Balduin, M., Jakobsen, M. R., Geretti, A. M., Thiebaut, R., Ostergaard, L., Masquelier, B., Johnson, J. A., Miller, M. D., and Kuritzkes, D. R. (2011). Low-frequency HIV-1 drug resistance mutations and risk of NNRTI-based antiretroviral treatment failure: a systematic review and pooled analysis. JAMA 305, 1327–1335. doi: 10.1001/jama.2011.375
93
Lipkin, W. I. (2010). Microbe hunting. Microbiol. Mol. Biol. Rev. 74, 363–377. doi: 10.1128/MMBR.00007-10
94
Liu, P., Fang, X., Feng, Z., Guo, Y.-M., Peng, R.-J., Liu, T., Huang, Z., Feng, Y., Sun, X., Xiong, Z., Guo, X., Pang, S.-S., Wang, B., Lv, X., Feng, F.-T., Li, D.-J., Chen, L.-Z., Feng, Q.-S., Huang, W.-L., Zeng, M.-S., Bei, J.-X., Zhang, Y., and Zeng, Y.-X. (2011). Direct sequencing and characterization of a clinical isolate of Epstein-Barr virus from nasopharyngeal carcinoma tissue by using next-generation sequencing technology. J. Virol. 85, 11291–11299. doi: 10.1128/JVI.00823-11
95
Liu, S. L., Rodrigo, A. G., Shankarappa, R., Learn, G. H., Hsu, L., Davidov, O., Zhao, L. P., and Mullins, J. I. (1996). HIV quasispecies and resampling. Science 273, 415–416. doi: 10.1126/science.273.5274.415
96
Loman, N. J., Misra, R. V., Dallman, T. J., Constantinidou, C., Gharbia, S. E., Wain, J., and Pallen, M. J. (2012). Performance comparison of benchtop high-throughput sequencing platforms. Nat. Biotechnol. 30, 434–439.
97
Lorusso, A., Vincent, A. L., Harland, M. L., Alt, D., Bayles, D. O., Swenson, S. L., Gramer, M. R., Russell, C. A., Smith, D. J., Lager, K. M., and Lewis, N. S. (2011). Genetic and antigenic characterization of H1 influenza viruses from United States swine from 2008. J. Gen. Virol. 92, 919–930. doi: 10.1099/vir.0.027557-0
98
Macalalad, A. R., Zody, M. C., Charlebois, P., Lennon, N. J., Newman, R. M., Malboeuf, C. M., Ryan, E. M., Boutwell, C. L., Power, K. A., Brackney, D. E., Pesko, K. N., Levin, J. Z., Ebel, G. D., Allen, T. M., Birren, B. W., and Henn, M. R. (2012). Highly sensitive and specific detection of rare variants in mixed viral populations from massively parallel sequence data. PLoS Comput. Biol. 8:e1002417. doi: 10.1371/journal.pcbi.1002417
99
Mancuso, N., Tork, B., Mandoiu, I. I., Skums, P., and Zelikovsky, A. (2011). Viral quasispecies reconstruction from amplicon 454 pyrosequencing reads, in Proceedings of the 1st Workshop on Computational Advances in Molecular Epidemiology (IEEE), 94–101. ISBN: 978-1-4577-1612-6.
100
Mardis, E. R. (2008a). The impact of next-generation sequencing technology on genetics. Trends Genet. 24, 133–141. doi: 10.1016/j.tig.2007.12.007
101
Mardis, E. R. (2008b). Next-generation DNA sequencing methods. Annu. Rev. Genomics Hum. Genet. 9, 387–402. doi: 10.1146/annurev.genom.9.081307.164359
102
Margeridon-Thermet, S., Shulman, N. S., Ahmed, A., Shahriar, R., Liu, T., Wang, C., Holmes, S. P., Babrzadeh, F., Gharizadeh, B., Hanczaruk, B., Simen, B. B., Egholm, M., and Shafer, R. W. (2009). Ultra-deep pyrosequencing of hepatitis B virus quasispecies from nucleoside and nucleotide reverse-transcriptase inhibitor (NRTI)-treated patients and NRTI-naive patients. J. Infect. Dis. 199, 1275–1285. doi: 10.1086/597808
103
Margulies, M., Egholm, M., Altman, W. E., Attiya, S., Bader, J. S., Bemben, L. A., Berka, J., Braverman, M. S., Chen, Y.-J., Chen, Z., Dewell, S. B., Du, L., Fierro, J. M., Gomes, X. V., Godwin, B. C., He, W., Helgesen, S., Ho, C. H., Irzyk, G. P., Jando, S. C., Alenquer, M. L. I., Jarvie, T. P., Jirage, K. B., Kim, J.-B., Knight, J. R., Lanza, J. R., Leamon, J. H., Lefkowitz, S. M., Lei, M., Li, J., Lohman, K. L., Lu, H., Makhijani, V. B., McDade, K. E., McKenna, M. P., Myers, E. W., Nickerson, E., Nobile, J. R., Plant, R., Puc, B. P., Ronan, M. T., Roth, G. T., Sarkis, G. J., Simons, J. F., Simpson, J. W., Srinivasan, M., Tartaro, K. R., Tomasz, A., Vogt, K. A., Volkmer, G. A., Wang, S. H., Wang, Y., Weiner, M. P., Yu, P., Begley, R. F., and Rothberg, J. M. (2005). Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376–380. doi: 10.1038/nature03959
104
Messiaen, P., Verhofstede, C., Vandenbroucke, I., Dinakis, S., Van Eygen, V., Thys, K., Winters, B., Aerssens, J., Vogelaers, D., Stuyver, L. J., and Vandekerckhove, L. (2012). Ultra-deep sequencing of HIV-1 reverse transcriptase before start of an NNRTI-based regimen in treatment-naive patients. Virology 426, 7–11. doi: 10.1016/j.virol.2012.01.002
105
Metzker, M. L. (2010). Sequencing technologies - the next generation. Nat. Rev. Genet. 11, 31–46. doi: 10.1038/nrg2626
106
Metzner, K. J., Bonhoeffer, S., Fischer, M., Karanicolas, R., Allers, K., Joos, B., Weber, R., Hirschel, B., Kostrikis, L. G., Günthard, H. F., and the Swiss HIV Cohort Study (2003). Emergence of minor populations of human immunodeficiency virus type 1 carrying the M184V and L90M mutations in subjects undergoing structured treatment interruptions. J. Infect. Dis. 188, 1433–1443. doi: 10.1086/379215
107
Metzner, K. J., Giulieri, S. G., Knoepfel, S. A., Rauch, P., Burgisser, P., Yerly, S., Gunthard, H. F., and Cavassini, M. (2009). Minority quasispecies of drug-resistant HIV-1 that lead to early therapy failure in treatment-naive and -adherent patients. Clin. Infect. Dis. 48, 239–247. doi: 10.1086/595703
108
Meyerhans, A., Vartanian, J. P., and Wain-Hobson, S. (1990). DNA recombination during PCR. Nucleic Acids Res. 18, 1687–1691. doi: 10.1093/nar/18.7.1687
109
Mild, M., Hedskog, C., Jernberg, J., and Albert, J. (2011). Performance of ultra-deep pyrosequencing in analysis of HIV-1 pol gene variation. PLoS ONE 6:e22741. doi: 10.1371/journal.pone.0022741
110
Mitsuya, Y., Varghese, V., Wang, C., Liu, T. F., Holmes, S. P., Jayakumar, P., Gharizadeh, B., Ronaghi, M., Klein, D., Fessel, W. J., and Shafer, R. W. (2008). Minority human immunodeficiency virus type 1 variants in antiretroviral-naive persons with reverse transcriptase codon 215 revertant mutations. J. Virol. 82, 10747–10755. doi: 10.1128/JVI.01827-07
111
Moorthy, A., Kuhn, L., Coovadia, A., Meyers, T., Strehlau, R., Sherman, G., Tsai, W. Y., Chen, Y. H., Abrams, E. J., and Persaud, D. (2011). Induction therapy with protease-inhibitors modifies the effect of nevirapine resistance on virologic response to nevirapine-based HAART in children. Clin. Infect. Dis. 52, 514–521. doi: 10.1093/cid/ciq161
112
Mukherjee, R., Jensen, S. T., Male, F., Bittinger, K., Hodinka, R. L., Miller, M. D., and Bushman, F. D. (2011). Switching between raltegravir resistance pathways analyzed by deep sequencing. AIDS 25, 1951–1959. doi: 10.1097/QAD.0b013e32834b34de
113
Nakamura, K., Oshima, T., Morimoto, T., Ikeda, S., Yoshikawa, H., Shiwa, Y., Ishikawa, S., Linak, M. C., Hirai, A., Takahashi, H., Ul-Amin, M. A., Ogasawara, N., and Kanaya, S. (2011). Sequence-specific error profile of Illumina sequencers. Nucleic Acids Res. 39, e90. doi: 10.1093/nar/gkr344
114
Nasu, A., Marusawa, H., Ueda, Y., Nishijima, N., Takahashi, K., Osaki, Y., Yamashita, Y., Inokuma, T., Tamada, T., Fujiwara, T., Sato, F., Shimizu, K., and Chiba, T. (2011). Genetic heterogeneity of hepatitis C virus in association with antiviral therapy determined by ultra-deep sequencing. PLoS ONE 6:e24907. doi: 10.1371/journal.pone.0024907
115
Ninomiya, M., Ueno, Y., Funayama, R., Nagashima, T., Nishida, Y., Kondo, Y., Inoue, J., Kakazu, E., Kimura, O., Nakayama, K., and Shimosegawa, T. (2012). Use of Illumina deep sequencing technology to differentiate hepatitis C virus variants. J. Clin. Microbiol. 50, 857–866. doi: 10.1128/JCM.05715-11
116
Nishijima, N., Marusawa, H., Ueda, Y., Takahashi, K., Nasu, A., Osaki, Y., Kou, T., Yazumi, S., Fujiwara, T., Tsuchiya, S., Shimizu, K., Uemoto, S., and Chiba, T. (2012). Dynamics of hepatitis B virus quasispecies in association with nucleos(t)ide analogue treatment determined by ultra-deep sequencing. PLoS ONE 7:e35052. doi: 10.1371/journal.pone.0035052
117
Nowak, M. A. (1992). What is a quasispecies? Trends Ecol. Evol. 7, 118–121. doi: 10.1016/0169-5347(92)90145-2
118
O'Neil, S. T., and Emrich, S. J. (2012). Haplotype and minimum-chimerism consensus determination using short sequence data. BMC Genomics 13, S4.
119
Ojosnegros, S., Beerenwinkel, N., Antal, T., Nowak, M. A., Escarmís, C., and Domingo, E. (2010). Competition-colonization dynamics in an RNA virus. Proc. Natl. Acad. Sci. U.S.A. 107, 2108–2112. doi: 10.1073/pnas.0909787107
120
Poon, A. F., McGovern, R. A., Mo, T., Knapp, D. J., Brenner, B., Routy, J. P., Wainberg, M. A., and Harrigan, P. R. (2011). Dates of HIV infection can be estimated for seroprevalent patients by coalescent analysis of serial next-generation sequencing data. AIDS 25, 2019–2026. doi: 10.1097/QAD.0b013e32834b643c
121
Poon, A. F., Swenson, L. C., Dong, W. W., Deng, W., Kosakovsky Pond, S. L., Brumme, Z. L., Mullins, J. I., Richman, D. D., Harrigan, P. R., and Frost, S. D. (2010). Phylogenetic analysis of population-based and deep sequencing data to identify coevolving sites in the nef gene of HIV-1. Mol. Biol. Evol. 27, 819–832. doi: 10.1093/molbev/msp289
122
Pop, M., and Salzberg, S. L. (2008). Bioinformatics challenges of new sequencing technology. Trends Genet. 24, 142–149. doi: 10.1016/j.tig.2007.12.006
123
Powdrill, M. H., Tchesnokov, E. P., Kozak, R. A., Russell, R. S., Martin, R., Svarovskaia, E. S., Mo, H., Kouyos, R. D., and Gotte, M. (2011). Contribution of a mutational bias in hepatitis C virus replication to the genetic barrier in the development of drug resistance. Proc. Natl. Acad. Sci. U.S.A. 108, 20509–20513. doi: 10.1073/pnas.1105797108
124
Prabhakaran, S., Rey, M., Zagordi, O., Beerenwinkel, N., and Roth, V. (2010). HIV haplotype inference using a constraint-based Dirichlet process mixture model, in NIPS Workshop on Machine Learning in Computational Biology.
125
Preston, B. D., Poiesz, B. J., and Loeb, L. A. (1988). Fidelity of HIV-1 reverse transcriptase. Science 242, 1168–1171. doi: 10.1126/science.2460924
126
Prosperi, M. C. F., Prosperi, L., Bruselles, A., Abbate, I., Rozera, G., Vincenti, D., Solmone, M. C., Capobianchi, M. R., and Ulivi, G. (2011). Combinatorial analysis and algorithms for quasispecies reconstruction using next-generation sequencing. BMC Bioinformatics 12, 5. doi: 10.1186/1471-2105-12-5
127
Prosperi, M. C. F., and Salemi, M. (2012). QuRe: software for viral quasispecies reconstruction from next-generation sequencing data. Bioinformatics 28, 132–133. doi: 10.1093/bioinformatics/btr627
128
Pybus, O. G., and Rambaut, A. (2009). Evolutionary analysis of the dynamics of viral infectious disease. Nat. Rev. Genet. 10, 540–550. doi: 10.1038/nrg2583
129
Quince, C., Lanzén, A., Curtis, T. P., Davenport, R. J., Hall, N., Head, I. M., Read, L. F., and Sloan, W. T. (2009). Accurate determination of microbial diversity from 454 pyrosequencing data. Nat. Methods 6, 639–641. doi: 10.1038/nmeth.1361
130
Quince, C., Lanzen, A., Davenport, R. J., and Turnbaugh, P. J. (2011). Removing noise from pyrosequenced amplicons. BMC Bioinformatics 12, 38. doi: 10.1186/1471-2105-12-38
131
Ramakrishnan, M. A., Tu, Z. J., Singh, S., Chockalingam, A. K., Gramer, M. R., Wang, P., Goyal, S. M., Yang, M., Halvorson, D. A., and Sreevatsan, S. (2009). The feasibility of using high resolution genome sequencing of influenza A viruses to detect mixed infections and quasispecies. PLoS ONE 4:e7105. doi: 10.1371/journal.pone.0007105
132
Rasmussen, C. E. (2000). The infinite Gaussian mixture model, in NIPS, eds S. A. Solla, T. K. Leen, and K.-R. Müller (The MIT Press), 554–560.
133
Raymond, S., Saliou, A., Nicot, F., Delobel, P., Dubois, M., Cazabat, M., Sandres-Saune, K., Marchou, B., Massip, P., and Izopet, J. (2011). Frequency of CXCR4-using viruses in primary HIV-1 infections using ultra-deep pyrosequencing. AIDS 25, 1668–1670. doi: 10.1097/QAD.0b013e3283498305
134
Redd, A. D., Mullis, C. E., Serwadda, D., Kong, X., Martens, C., Ricklefs, S. M., Tobian, A. A., Xiao, C., Grabowski, M. K., Nalugoda, F., Kigozi, G., Laeyendecker, O., Kagaayi, J., Sewankambo, N., Gray, R. H., Porcella, S. F., Wawer, M. J., and Quinn, T. C. (2012). The rates of HIV superinfection and primary HIV incidence in a general population in Rakai, Uganda. J. Infect. Dis. 206, 267–274. doi: 10.1093/infdis/jis325
135
Reuman, E. C., Margeridon-Thermet, S., Caudill, H. B., Liu, T., Borroto-Esoda, K., Svarovskaia, E. S., Holmes, S. P., and Shafer, R. W. (2010). A classification model for G-to-A hypermutation in hepatitis B virus ultra-deep pyrosequencing reads. Bioinformatics 26, 2929–2932. doi: 10.1093/bioinformatics/btq570
136
Reumers, J., Rijk, P. D., Zhao, H., Liekens, A., Smeets, D., Cleary, J., Loo, P. V., Bossche, M. V. D., Catthoor, K., Sabbe, B., Despierre, E., Vergote, I., Hilbush, B., Lambrechts, D., and Del-Favero, J. (2011). Optimized filtering reduces the error rate in detecting genomic variants by short-read sequencing. Nat. Biotechnol. 30, 61–68. doi: 10.1038/nbt.2053
137
Reyes, G. R., and Kim, J. P. (1991). Sequence-independent, single-primer amplification (SISPA) of complex DNA populations. Mol. Cell. Probes 5, 473–481.
138
Roberts, J. D., Bebenek, K., and Kunkel, T. A. (1988). The accuracy of reverse transcriptase from HIV-1. Science 242, 1171–1173. doi: 10.1126/science.2460925
139
Rodriguez-Frías, F., Tabernero, D., Quer, J., Esteban, J. I., Ortega, I., Domingo, E., Cubero, M., Camós, S., Ferrer-Costa, C., Sánchez, A., Jardí, R., Schaper, M., Homs, M., Garcia-Cehic, D., Guardia, J., Esteban, R., and Buti, M. (2012). Ultra-deep pyrosequencing detects conserved genomic sites and quantifies linkage of drug-resistant amino acid changes in the hepatitis B virus genome. PLoS ONE 7:e37874. doi: 10.1371/journal.pone.0037874
140
Rozera, G., Abbate, I., Bruselles, A., Vlassi, C., D'Offizi, G., Narciso, P., Chillemi, G., Prosperi, M., Ippolito, G., and Capobianchi, M. R. (2009). Massively parallel pyrosequencing highlights minority variants in the HIV-1 env quasispecies deriving from lymphomonocyte sub-populations. Retrovirology 6, 15. doi: 10.1186/1742-4690-6-15
141
Saeed, F., Khokhar, A., Zagordi, O., and Beerenwinkel, N. (2009). Multiple sequence alignment system for pyrosequencing reads, in BICoB 2009, LNBI 5462, ed S. Rajasekaran (Berlin Heidelberg: Springer-Verlag), 362–375.
142
Saliou, A., Delobel, P., Dubois, M., Nicot, F., Raymond, S., Calvez, V., Masquelier, B., and Izopet, J. (2011). Concordance between two phenotypic assays and ultradeep pyrosequencing for determining HIV-1 tropism. Antimicrob. Agents Chemother. 55, 2831–2836. doi: 10.1128/AAC.00091-11
143
Schuster, S. C. (2008). Next-generation sequencing transforms today's biology. Nat. Methods 5, 16–18. doi: 10.1038/nmeth1156
144
Sede, M., Ojeda, D., Cassino, L., Westergaard, G., Vazquez, M., Benetti, S., Fay, F., Tanno, H., and Quarleri, J. (2012). Long-term monitoring drug resistance by ultra-deep pyrosequencing in a chronic hepatitis B virus (HBV)-infected patient exposed to several unsuccessful therapy schemes. Antiviral Res. 94, 184–187. doi: 10.1016/j.antiviral.2012.03.003
145
Simen, B. B., Simons, J. F., Hullsiek, K. H., Novak, R. M., MacArthur, R. D., Baxter, J. D., Huang, C., Lubeski, C., Turenchalk, G. S., Braverman, M. S., Desany, B., Rothberg, J. M., Egholm, M., and Kozal, M. J. (2009). Low-abundance drug-resistant viral variants in chronically HIV-infected, antiretroviral treatment-naive patients significantly impact treatment outcomes. J. Infect. Dis. 199, 693–701. doi: 10.1086/596736
146
Skums, P., Dimitrova, Z., Campo, D. S., Vaughan, G., Rossi, L., Forbi, J. C., Yokosawa, J., Zelikovsky, A., and Khudyakov, Y. (2012). Efficient error correction for next-generation sequencing of viral amplicons. BMC Bioinformatics 13(Suppl. 10), S6. doi: 10.1186/1471-2105-13-S10-S6
147
Solmone, M., Vincenti, D., Prosperi, M. C., Bruselles, A., Ippolito, G., and Capobianchi, M. R. (2009). Use of massively parallel ultradeep pyrosequencing to characterize the genetic diversity of hepatitis B virus in drug-resistant and drug-naive patients and to detect minor variants in reverse transcriptase and hepatitis B S antigen. J. Virol. 83, 1718–1726. doi: 10.1128/JVI.02011-08
148
Stelzl, E., Proll, J., Bizon, B., Niklas, N., Danzer, M., Hackl, C., Stabentheiner, S., Gabriel, C., and Kessler, H. H. (2011). Human immunodeficiency virus type 1 drug resistance testing: evaluation of a new ultra-deep sequencing-based protocol and comparison with the TRUGENE HIV-1 genotyping kit. J. Virol. Methods 178, 94–97. doi: 10.1016/j.jviromet.2011.08.020
149
Svicher, V., Balestra, E., Cento, V., Sarmati, L., Dori, L., Vandenbroucke, I., D'Arrigo, R., Buonomini, A. R., Marck, H. V., Surdo, M., Saccomandi, P., Mostmans, W., Aerssens, J., Aquaro, S., Stuyver, L. J., Andreoni, M., Ceccherini-Silberstein, F., and Perno, C. F. (2011). HIV-1 dual/mixed tropic isolates show different genetic and phenotypic characteristics and response to maraviroc in vitro. Antiviral Res. 90, 42–53. doi: 10.1016/j.antiviral.2011.02.005
150
Swenson, L. C., Mo, T., Dong, W. W., Zhong, X., Woods, C. K., Jensen, M. A., Thielen, A., Chapman, D., Lewis, M., James, I., Heera, J., Valdez, H., and Harrigan, P. R. (2011a). Deep sequencing to infer HIV-1 co-receptor usage: application to three clinical trials of maraviroc in treatment-experienced patients. J. Infect. Dis. 203, 237–245. doi: 10.1093/infdis/jiq030
151
Swenson, L. C., Mo, T., Dong, W. W. Y., Zhong, X., Woods, C. K., Thielen, A., Jensen, M. A., Knapp, D. J. H. F., Chapman, D., Portsmouth, S., Lewis, M., James, I., Heera, J., Valdez, H., and Harrigan, P. R. (2011b). Deep V3 sequencing for HIV type 1 tropism in treatment-naive patients: a reanalysis of the MERIT trial of maraviroc. Clin. Infect. Dis. 53, 732–742. doi: 10.1093/cid/cir493
152
Swenson, L. C., Moores, A., Low, A. J., Thielen, A., Dong, W., Woods, C., Jensen, M. A., Wynhoven, B., Chan, D., Glascock, C., and Harrigan, P. R. (2010). Improved detection of CXCR4-using HIV by V3 genotyping: application of population-based and “deep” sequencing to plasma RNA and proviral DNA. J. Acquir. Immune Defic. Syndr. 54, 506–510. doi: 10.1097/QAI.0b013e3181d0558f
153
Tapparel, C., Cordey, S., Junier, T., Farinelli, L., Van Belle, S., Soccal, P. M., Aubert, J. D., Zdobnov, E., and Kaiser, L. (2011). Rhinovirus genome variation during chronic upper and lower respiratory tract infections. PLoS ONE 6:e21163. doi: 10.1371/journal.pone.0021163
154
Trapnell, C., and Salzberg, S. L. (2009). How to map billions of short reads onto genomes. Nat. Biotechnol. 27, 455–457. doi: 10.1038/nbt0509-455
155
Turner, E. H., Ng, S. B., Nickerson, D. A., and Shendure, J. (2009). Methods for genomic partitioning. Annu. Rev. Genomics Hum. Genet. 10, 263–284. doi: 10.1146/annurev-genom-082908-150112
156
Van Nimwegen, E., Crutchfield, J. P., and Huynen, M. (1999). Neutral evolution of mutational robustness. Proc. Natl. Acad. Sci. U.S.A. 96, 9716–9720. doi: 10.1073/pnas.96.17.9716
157
Vandekerckhove, L., Verhofstede, C., Demecheleer, E., De Wit, S., Florence, E., Fransen, K., Moutschen, M., Mostmans, W., Kabeya, K., Mackie, N., Plum, J., Vaira, D., Van Baelen, K., Vandenbroucke, I., Van Eygen, V., Van Marck, H., Vogelaers, D., Geretti, A. M., and Stuyver, L. J. (2011). Comparison of phenotypic and genotypic tropism determination in triple-class-experienced HIV patients eligible for maraviroc treatment. J. Antimicrob. Chemother. 66, 265–272. doi: 10.1093/jac/dkq458
158
Vandenbroucke, I., Van Marck, H., Mostmans, W., Van Eygen, V., Rondelez, E., Thys, K., Van Baelen, K., Fransen, K., Vaira, D., Kabeya, K., De Wit, S., Florence, E., Moutschen, M., Vandekerckhove, L., Verhofstede, C., and Stuyver, L. J. (2010). HIV-1 V3 envelope deep sequencing for clinical plasma specimens failing in phenotypic tropism assays. AIDS Res. Ther. 7, 4. doi: 10.1186/1742-6405-7-4
159
Varela, I., Tarpey, P., Raine, K., Huang, D., Ong, C. K., Stephens, P., Davies, H., Jones, D., Lin, M.-L., Teague, J., Bignell, G., Butler, A., Cho, J., Dalgliesh, G. L., Galappaththige, D., Greenman, C., Hardy, C., Jia, M., Latimer, C., Lau, K. W., Marshall, J., Mclaren, S., Menzies, A., Mudie, L., Stebbings, L., Largaespada, D. A., Wessels, L. F. A., Richard, S., Kahnoski, R. J., Anema, J., Tuveson, D. A., Perez-Mancera, P. A., Mustonen, V., Fischer, A., Adams, D. J., Rust, A., On, W. C., Subimerb, C., Dykema, K., Furge, K., Campbell, P. J., Teh, B. T., Stratton, M. R., and Futreal, P. A. (2011). Exome sequencing identifies frequent mutation of the SWI/SNF complex gene PBRM1 in renal carcinoma. Nature 469, 539–542. doi: 10.1038/nature09639
160
Varghese, V., Shahriar, R., Rhee, S. Y., Liu, T., Simen, B. B., Egholm, M., Hanczaruk, B., Blake, L. A., Gharizadeh, B., Babrzadeh, F., Bachmann, M. H., Fessel, W. J., and Shafer, R. W. (2009). Minority variants associated with transmitted and acquired HIV-1 nonnucleoside reverse transcriptase inhibitor resistance: implications for the use of second-generation nonnucleoside reverse transcriptase inhibitors. J. Acquir. Immune Defic. Syndr. 52, 309–315. doi: 10.1097/QAI.0b013e3181bca669
161
Vignuzzi, M., Stone, J. K., Arnold, J. J., Cameron, C. E., and Andino, R. (2006). Quasispecies diversity determines pathogenesis through cooperative interactions in a viral population. Nature 439, 344–348. doi: 10.1038/nature04388
162
Vrancken, B., Lequime, S., Theys, K., and Lemey, P. (2010). Covering all bases in HIV research: unveiling a hidden world of viral evolution. AIDS Rev. 12, 89–102.
163
Wang, C., Mitsuya, Y., Gharizadeh, B., Ronaghi, M., and Shafer, R. W. (2007). Characterization of mutation spectra with ultra-deep pyrosequencing: application to HIV-1 drug resistance. Genome Res. 17, 1195–1201. doi: 10.1101/gr.6468307
164
Wang, G. P., Sherrill-Mix, S. A., Chang, K. M., Quince, C., and Bushman, F. D. (2010). Hepatitis C virus transmission bottlenecks analyzed by deep sequencing. J. Virol. 84, 6218–6228. doi: 10.1128/JVI.02271-09
165
Westbrooks, K., Astrovskaya, I., Campo, D., Khudyakov, Y., Berman, P., and Zelikovsky, A. (2008). HCV quasispecies assembly using network flows, in ISBRA 2008, LNBI 4983, eds I. Măndoiu, R. Sunderraman, and A. Zelikovsky (Berlin Heidelberg: Springer-Verlag), 159–170.
166
WHO (2012). World Health Organization [Online]. Available online at: www.who.int [Accessed 1 May 2012].
167
Wikipedia (2012). List of sequence alignment software [Online]. Available online at: http://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment [Accessed 1 May 2012].
168
Wilke, C. O. (2005). Quasispecies theory in the context of population genetics. BMC Evol. Biol. 5, 44. doi: 10.1186/1471-2148-5-44
169
Wilke, C. O., Wang, J. L., Ofria, C., Lenski, R. E., and Adami, C. (2001). Evolution of digital organisms at high mutation rates leads to survival of the flattest. Nature 412, 331–333. doi: 10.1038/35085569
170
Willerth, S. M., Pedro, H. A., Pachter, L., Humeau, L. M., Arkin, A. P., and Schaffer, D. V. (2010). Development of a low bias method for characterizing viral populations using next generation sequencing technology. PLoS ONE 5:e13564. doi: 10.1371/journal.pone.0013564
Wu X., Zhou T., Zhu J., Zhang B., Georgiev I., Wang C., Chen X., Longo N. S., Louder M., McKee K., O'Dell S., Perfetto S., Schmidt S. D., Shi W., Wu L., Yang Y., Yang Z. Y., Yang Z., Zhang Z., Bonsignori M., Crump J. A., Kapiga S. H., Sam N. E., Haynes B. F., Simek M., Burton D. R., Koff W. C., Doria-Rose N. A., Connors M., Mullikin J. C., Nabel G. J., Roederer M., Shapiro L., Kwong P. D., Mascola J. R. (2011). Focused evolution of HIV-1 neutralizing antibodies revealed by structures and deep sequencing. Science 333, 1593–1602. 10.1126/science.1207532
Yang X., Chockalingam S. P., Aluru S. (2012). A survey of error-correction methods for next-generation sequencing. Brief. Bioinform. [Epub ahead of print] 10.1093/bib/bbs015
Zagordi O., Bhattacharya A., Eriksson N., Beerenwinkel N. (2011). ShoRAH: estimating the genetic diversity of a mixed sample from next-generation sequencing data. BMC Bioinformatics 12, 119. 10.1186/1471-2105-12-119
Zagordi O., Geyrhofer L., Roth V., Beerenwinkel N. (2010a). Deep sequencing of a genetically heterogeneous sample: local haplotype reconstruction and read error correction. J. Comput. Biol. 17, 417–428. 10.1089/cmb.2009.0164
Zagordi O., Klein R., Däumer M., Beerenwinkel N. (2010b). Error correction of next-generation sequencing data and reliable estimation of HIV quasispecies. Nucleic Acids Res. 38, 7400–7409. 10.1093/nar/gkq655
Zagordi O., Töpfer A., Prabhakaran S., Roth V., Halperin E., Beerenwinkel N. (2012). Probabilistic inference of viral quasispecies subject to recombination, in RECOMB 2012, LNBI 7262, ed Chor B. (Berlin Heidelberg: Springer-Verlag), 342–354.
Zell R., Taudien S., Pfaff F., Wutzler P., Platzer M., Sauerbrei A. (2012). Sequencing of 21 varicella-zoster virus genomes reveals two novel genotypes and evidence of recombination. J. Virol. 86, 1608–1622. 10.1128/JVI.06233-11
Zhao X., Palmer L. E., Bolanos R., Mircean C., Fasulo D., Wittenberg G. M. (2010). EDAR: an efficient error detection and removal algorithm for next generation sequencing data. J. Comput. Biol. 17, 1549–1560. 10.1089/cmb.2010.0127

Summary

Keywords

next-generation sequencing, viral diversity, viral quasispecies, statistics, bioinformatics, haplotype inference, error correction, quasispecies assembly

Citation

Beerenwinkel N, Günthard HF, Roth V and Metzner KJ (2012) Challenges and opportunities in estimating viral genetic diversity from next-generation sequencing data. Front. Microbio. 3:329. doi: 10.3389/fmicb.2012.00329

Received

15 June 2012

Accepted

24 August 2012

Published

11 September 2012

Volume

3 - 2012

Edited by

Masaru Yokoyama, National Institute of Infectious Diseases, Japan

Reviewed by

Masaru Yokoyama, National Institute of Infectious Diseases, Japan; Fabio Luciani, University of New South Wales, Australia

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.

*Correspondence: Niko Beerenwinkel, Department of Biosystems Science and Engineering, ETH Zurich, WRO-1058 8.40, Mattenstrasse 26, 4058 Basel, Switzerland. e-mail: niko.beerenwinkel@bsse.ethz.ch

This article was submitted to Frontiers in Virology, a specialty of Frontiers in Microbiology.

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

