Composition and Dynamics of H1N1 and H7N9 Influenza A Virus Quasispecies in a Co-infected Patient Analyzed by Single Molecule Sequencing Technology

A human co-infected with H1N1 and H7N9 subtypes influenza A virus (IAV) causes a complex infectious disease. The identification of molecular-level variations in composition and dynamics of IAV quasispecies will help to understand the pathogenesis and provide guidance for precision medicine treatment. In this study, using single-molecule real-time sequencing (SMRT) technology, we successfully acquired full-length IAV genomic sequences and quantified their genotypes abundance in serial samples from an 81-year-old male co-infected with H1N1 and H7N9 subtypes IAV. A total of 26 high diversity nucleotide loci was detected, in which the A-G base transversion was the most abundant substitution type (67 and 64%, in H1N1 and H7N9, respectively). Seven significant amino acid variations were detected, such as NA:H275Y and HA: R222K in H1N1 as well as PB2:E627K and NA: K432E in H7N9, which are related to viral drug-resistance or mammalian adaptation. Furtherly, we retrieved 25 H1N1 and 22 H7N9 genomic segment haplotypes from the eight samples based on combining high-diversity nucleotide loci, which provided a more concise overview of viral quasispecies composition and dynamics. Our approach promotes the popularization of viral quasispecies analysis in a complex infectious disease, which will boost the understanding of viral infections, pathogenesis, evolution, and precision medicine.

Influenza A viruses in a host exist as a population including thousands of virions containing closely related (but nonidentical) genomes, also called quasispecies (Lauring and Andino, 2010;Martínez et al., 2012;Watanabe et al., 2018;Bonomo et al., 2019;Domingo and Perales, 2019). These closely related genomes result from error-prone replication and frequent reassortment of influenza virus genomes (Steel and Lowen, 2014;Pauly et al., 2017). The complicated interactions (cooperativity or interference) among genomes and their productions collectively determine the biological or medical implications of a viral population such as fitness, virulence, pathogenesis, immune escape, or drug resistance (Vignuzzi et al., 2006;Sanz-Ramos et al., 2008;Aragri et al., 2016;Schuster, 2016;Perales, 2020). Therefore, it is primary to reveal the composition and dynamic of viral quasispecies to better understand viral infection, adaptation, and evolution at the level of population (Xue et al., 2017;Donohue et al., 2019;Jary et al., 2020).
Achieving thousands of full-length genomes in a viral population is decisive for quasispecies composition. Previous short-read massively parallel sequencing (MPS) projects have collected abundant consensus genomic sequences (CGSs) and single nucleotide variants (SNVs) to explore influenza virus quasispecies (Van den Hoecke et al., 2015;Ali et al., 2016;McGinnis et al., 2016). Nevertheless, the quasispecies composition is still unclear because of degenerated CGSs and scattered SNVs rooted in short reads from MPS (Schadt et al., 2010;Beerenwinkel et al., 2012;Chen et al., 2018b). The long-read single-molecule real-time sequencing (SMRT) provides an access to full-length influenza virus genomes even with a low frequency in a viral population (Ardui et al., 2018;Lui et al., 2019). The circular consensus sequencing (CCS) reads (average length 13.5 kb) produced by SMRT are 5-15 times as long as the genomic RNAs of influenza A virus, avoiding the fragmentation and assembly of genomes before and after sequencing (Wenger et al., 2019a;Van Poelvoorde et al., 2020).
The sample co-infected with two IAV subtypes is a very good opportunity to embody the advantage of SMRT in distinguishing different-subtype IAV genomic sequences and to quantify their abundances. The co-infection in avian hosts is common (29.59% in the live poultry market during 2016-2019 in China) (Bi et al., 2020). However, to our knowledge, there were only two human cases co-infected with two IAV subtypes reported in China since 2013. One case was a 15-year-old male co-infected with H7N9 and H3N2 in Jiangsu Province in April, 2013 and the other was a 58-year-old male with H7N9 and H1N1 in Zhejiang Province in January, 2014 (Zhu et al., 2013;Li et al., 2014). In this study, an 81-year-old male was diagnosed with H1N1 and H7N9 IAV by RT-PCR in Zhejiang Province in January 2016. Furtherly, the composition and dynamics of H1N1 and H7N9 IAV quasispecies in eight serial samples from this patient were revealed by SMRT, which provided a window to observe the viral quasispecies changes during the patient's hospitalization treated with antiviral drug oseltamivir.

Patient, Symptoms and Therapies
An 81-year-old male had a slight cough and chest distress on 1/ 12/2016 at his home in Xihu District, Hangzhou City, Zhejiang Province, China. On the morning of 1/15/2016, the symptoms worsened with a nasty cough, chest distress and fever. That afternoon, the man went to the local community hospital, where the temperature was 38.5 C, and then he was sent to the Hangzhou First People's Hospital for medical treatment. The examination showed that the white blood cell count (WBC) was 13.4 × 10 9 /L, the percentage of neutrophils (N%) was 92.2%, and the C-reactive protein (CRP) was 110 mg/L. The chest radiographs showed an infection of the right lower lung. The mezlocillin sodium and sulbactam sodium were given for intravenous injection as an anti-infective therapy. The patient was sent to the respiratory department for hospital treatment and pneumonia was confirmed on 1/16/2016. The next day, a PCR result from a throat swab was positive on influenza A virus and the patient was sent to infection ward for further treatment which included oseltamivir (75mg/bid) and meropenem drugs. On 1/19/2016, patient's symptoms worsened further and chest radiographs confirmed that infections spread on both lungs. The patient received endotracheal intubation and then admitted to intensive care unit (ICU)The treatment was continued, the dose of oseltamivir was doubled (150mg/bid) and meropenem was changed to imipenem. On the same day, the patient's RT-PCR test taken from the throat was positive on the M gene, H7 gene, N9 gene and H1 gene of influenza A virus. Next day, the patient was transferred to Hangzhou Xixi Hospital for treatment in isolation where the anti-viral oseltamivir (150mg/ bid) continued to be given until 2/20/2016. Although, the symptomatic treatments such as diuresis, analgesia, vasodilation and nutritional support has been given in the Xixi hospital, the patient did not show signs of improvement, and passed away on 2/28/2016.
In the patient's anamnesis it is stated that the patient went to local live poultry market and bought a live duck at a merchant's site about a week before the first symptoms appeared on 1/12/ 2016. The live duck was slaughtered, depilated, and bellied by the merchant at his site. After returning home, the patient salted the duck. The patient had a history of hypertension and denied the history of diabetes, viral hepatitis, tuberculosis, and other diseases. There is no history of trauma, surgery, or blood transfusion. Denying any history of drug or food allergies.

Samples and RT-PCR
Ten serial samples were collected from this patient from 1/19/ 2016 to 2/19/2016, including seven throat swabs and three sputa. The swabs and sputa were placed into 1 ml viral transport medium, transported to the laboratory within 24 h at 4°C, and then frozen at −80°C. Viral RNAs were extracted from samples using a RNeasy Mini Kit (QIAGEN, Germany). Identification of influenza A virus was achieved by RT-PCR using specific primers targeting the M, H7, N9, and H1 gene according to the protocol provided by WHO manual (Organization, W.H, 2002). This study was approved by the Institutional Review Board of BGI (NO.BGI-IRB 16008).

Single-Molecule Real-Time Sequencing
Top eight samples were taken to perform single-molecule real-time sequencing (SMRT). The cDNAs were synthesized from viral RNAs by reverse transcription using Uni12 and Uni13 primers (Bi et al., 2016). The PCR was performed using a Phusion High-Fidelity PCR Kit (New England Biolabs) utilizing the barcoded influenza A virus general primers (Supplementary Table S1) (Mei et al., 2016). The concentration of PCR product was quantified by the Agilent Technologies 2,100 bioanalyzer. The two corresponding volumes of PCR products (containing equal mass of dsDNA) were mixed into one sample and quantified in the bioanalyzer again. About 2-3 μg mixed sample was used to SMRTbell library construction following the 2 kb template preparation protocol (Roberts et al., 2013b). The sequencing was performed on a PacBio RS II instrument (Pacific Biosciences, USA) with one SMRT Cell used for each library, using P6/C4 chemistry with a 4 h movie (Bull et al., 2016). SMRTbell adapter sequences were removed and circular consensus sequence (CCS) reads were achieved with SMRT Analysis v2.3 (Roberts et al., 2013a).

Sequence Quality Control
The raw CCS reads were filtered by removing low quality reads (length<800bp, passes<5 or estimated accuracy<99.9%). The 800 bp length near the lower limit of influenza A virus genomic RNAs was used to exclude non-full-length genomic sequences. The other two criteria ensured reads with at least 99.9% estimated accuracy and necessary passes (Korlach, 2015). The sequencing error bases (frequency<0.3%) were additionally corrected to improve sequence reliability as follows: First, the remaining sequences after filtration were split into corresponding samples by 100% base match with barcodes. Then, the sequences of one sample were grouped by subtypes and genomic segments according to the sequence annotation result against an influenza virus genomes database downloaded from NCBI (https://ftp.ncbi. nih.gov/genomes/INFLUENZA) using BLASR (v5.1 with options: -bestn 1) (Chaisson and Tesler, 2012). All full-length genomic sequences of one group were aligned end to end using MUSCLE (v3.8.31 with default options) (Edgar, 2004). Following, the number and percentage of base A/C/T/G in the same nucleotide locus of genomic sequences were stated. Finally, the very low frequency base which percentage was less than 0.3% in the number of four type bases of the same nucleotide locus, was replaced with the dominant type of base with the largest proportion in this nucleotide locus.

Diversity Index of Genomic Sequences
The diversity index (Shannon entropy) of one group sequences is calculated by the formula (Crooks and Brenner, 2004): In which S is the Shannon entropy and P i is the ration of the number of one type of sequence to the number of total types of sequence in one group.

Nucleotide Loci With High-Diversity Base Composition
In order to screen out nucleotide loci with high-diversity base composition, we stated the number and percentage of base A/C/ T/G in one nucleotide locus and screened out the loci in which the percentages of at least two types of bases were more than 10%.

Sequences Quality Control
The clinical symptoms and therapeutic schedule of this patient were recorded in Table 1. Ten samples were collected from this patient on different days, including seven throat swabs (S1-4, S7-10) and three sputa (S5-7). The collected date, sample types and Ct value of RT-PCR for H1N1 and H7N9 of each sample were listed in Table 2. The top eight samples (S1-8) were performed using SMRT with four SMRT cells (S9 and S10 were RT-PCR negative for influenza A virus). A total of 142,496 CCS reads (221.72 Mb) were generated from four SMRT cells, of which 82,471 high quality reads (≥99.9% estimated accuracy) were selected for further analysis according to strict filtering criteria. The IAV mutation rate was about 0.018-0.025%, which means that each replicated influenza genome (∼13 kb) contained an average of 2-3 mutations (Pauly et al., 2017). However, the estimated base sequencing error rate of high-quality CCS reads (∼0.1%) in this study, was notable higher than the normal replication mutation rate (0.018-0.025%) of IAV genomes. Therefore, it was necessary to correct sequencing error bases in the prevention of taking them for real mutations. Finally, 69 group sequences were clustered from four SMRT cells according to samples, subtypes, and genomic segments ( Figure 1A). Among them, 58 group sequences were with a satisfactory abundance (the sequences number ≥20). In case that the base type (A/C/G/ T) in one nucleotide locus of one group sequences was less than 0.3%, it was considered as sequencing error base. The base sequencing error rates of 58 group sequences ranged from 0.13 to 0.21%, approximate to the estimated sequencing error rate of 0.1%, which are composed of 78.37% mismatches, 13.78% deletions and 7.16% insertions ( Figures 1B,C and Supplementary Table S2). Thus, we set 0.3% as the cutoff value to screen out sequencing error base types in nucleotide loci of each group sequences and correct them with dominant base types in these loci.

Monitoring the Composition and Dynamics of H1N1 and H7N9 Sequences
All the influenza virus genomic sequences produced by SMRT were clustered into 69 groups by eight samples, two subtypes, and eight genomic segments ( Figure 1A). To monitor the composition and dynamic of H1N1 and H7N9 genomic sequences, we calculated the number of sequence reads, the number of sequence types, and the diversity index of sequence types in each group (Figure 2). The 16 groups (eight from H1N1 and eight from H7N9) in the first sample S1 confirmed that this patient was coinfected with H1N1 and H7N9 IAV. Interestingly, in sample S1, although the number of H1N1 and H7N9 sequence reads were almost equal, the diversity index of H7N9 sequence types was obviously higher than that of H1N1 ( Figure 2). The reasons for the diversity difference of sequence types between H1N1 and H7N9 were unclear. One possible explanation was that in the upper respiratory tract (URT) environment with the dominant α-2,6-SA receptors preferentially recognized by H1N1 virus; the H7N9 virus had to generate more various genomic sequences to adapt to the relative hostile environment (Chen et al., 2013).
Furtherly, there was a sharp decrease of H7N9 viral load in the URT samples from S1 to S4. But the H7N9 viral load was still relative abundant in the subsequent LRT samples from S5 to S7 (Table 2 and Figure 2B). This might indicate that the H7N9 virus had transferred to LRT environment from URT . Meanwhile, there was an obvious increase of H1N1 viral load in the URT samples from S1 to S2 (Figure 2A). The reason for this increase might be due to the transfer of H7N9 from URT to LRT that contributes freeing up more cellular resources for H1N1 growth in URT environment.

High Diversity Nucleotide Loci in H1N1 and H7N9 Genomes
Fifteen and eleven high diversity nucleotide loci were detected in H1N1 and H7N9 genomes, respectively, in which the A-G transversion was the most abundant substitution type (67% in H1N1 and 64% in H7N9) ( Table 3). In H1N1, another two substitutions were C-T and A-C transversion (20 and 13% respectively). IN H7N9, the C-T, A-C and G-T took the same 0.09% proportion respectively (Table 3). In terms of amino acid, the percentage of non-synonymous mutation was greater than synonymous mutation, especially the percentage of nonsynonymous mutation was up to 91% (10/11) in H7N9, comparing with the non-synonymous mutation of 66% (10/15) in H1N1 (Table 3). In H1N1, all five synonymous mutations are in the internal genes of IAV, including two in NS gene (R78R and L69L), one in PB2 gene (T25T), one in NP gene (E64E) and one in M gene (R134R). In H7N9, the only one synonymous mutation was R753R in internal PB2 gene ( Table 3). The H7N9 with a high frequency of nonsynonymous mutation might indicate that H7N9 as avian-origin influenza virus was subjected to higher selection pressures in the human host (Wei et al., 2012;Xu et al., 2019). It is noteworthy to mention that among of these high-diversity loci, four mutations in H1N1 were involved in the evolution or viral drug-resistance, such as the R222K in the HA protein and the H275Y, A204T and K207R in NA protein (Table 3). In H7N9, three mutations were related to host adaption or viral drug-resistance, including the G685R and E627K in PB2 protein, and K432E in NA protein ( Table 3).

The Haplotype Analysis of H1N1 and H7N9 Virus Quasispecies
We achieved 25 and 22 haplotypes for H1N1 and H7N9 genomic segments in all eight samples based on combining high diversity nucleotide loci in the same genomic sequences, respectively. For  instance, we obtained seven haplotypes of NA in H1N1 based on four high diversity nucleotide loci ( Table 4). The bases on high diversity nucleotide loci of each haplotype and its abundance in each sample were listed in Supplementary Table S4. Then the composition and dynamics of viral quasispecies in this patient coinfected with H1N1 and H7N9 IAV were displayed along the clinical treatment process (Figure 3). In NA gene, the haplotype Hap_2 replaced Hap_1 as the dominant haplotype with more than half proportion (53.11%,1,119/2,107) in sample S2, which contain H275Y drug-resistant mutation. However, Hap_2 in NA failed to be continuously dominant and Hap_1 return the dominant haplotype in following samples (S3 to S6) ( Figure 3A). Similarly, we found Hap_3 of HA with antigenic drift mutation R222K transiently become dominant in sample S6 (56.19%, 127/ 226) ( Figure 3A). Compared with the genomic segment integrity of H1N1 viral quasispecies among almost all samples, The H7N9 quasispecies had a poor integrity except for the first sample S1 ( Figure 3B). In conclusion, the haplotype provides a more concise overview of viral quasispecies composition and dynamics than whole genomic sequences (Figure 2 and Figure 3).  Supplementary Table S2.
Frontiers in Genetics | www.frontiersin.org November 2021 | Volume 12 | Article 754445 FIGURE 2 | The abundance and diversity of H1N1 and H7N9 genomic sequences. In H1N1 (A) and H7N9 (B), from left to right, three subgraphs respectively indicate the number of sequence reads, the number of sequence types and the sequence diversity index in each group. In each subgraph, eight colors represent eight genomic segments of influenza A virus. One sequence type is defined that there are one or more different bases from any other sequence. The sequence diversity index is calculated as the Shannon entropy formula described in the "Materials and Methods". The detailed values about abundance and diversity of genomic sequences are listed in Supplementary Table S3.

DISCUSSION
Four mutations in H1N1 and three in H7N9 were related to viral drug-resistance, host adaption or evolution. The H275Y in NA protein of H1N1 was the widely investigated drug-resistant mutation against oseltamivir as the commonly used first-line drug for the treatment or prophylaxis of influenza (Hurt et al., 2012;Vidaña et al., 2020). Another two mutations A204T and K207R in NA protein of H1N1 were recently reported to have effects on drug-resistance and vaccine efficacy (Nandhini and Sistla, 2020;Skowronski et al., 2020). The antigenic drift mutation R222K in the HA protein was believed to play a role in virus evolution by altering receptor binding specificity (Al Khatib et al., 2019). For H7N9, the mutation E627K on PB2 is a wellcharacterized host adaption mutation from the avian signature Glu (E) to the mammalian-adapted signature Lys (K), which have been associated with enhanced polymerase activity, high virus replication and pathogenicity in humans (Baccam et al., 2001). Besides, the mutation G685R on PB2 also help to promotes the mammalian adaptation of avian influenza virus (Baccam et al., 2003;Capitán et al., 2011). Interestingly, the mutation K432E on NA, alone or together with mutation H275Y on NA, had a significant impact on the binding pattern and affinity of oseltamivir for neuraminidase, rendering neuraminidase less susceptible (Aguirre and Manrubia, 2008). The viral quasispecies as a viral population plays a very important role in the process of viral infection, adaption, and evolution through complex cooperative or competitive interactions (Domingo et al., 2012). A population of viruses can be partitioned into subpopulations by the genetic similarities (Baccam et al., 2001;Baccam et al., 2003). The spatial interactions of subpopulations have effects on host cell availability and defense responses (Aguirre and Manrubia, 2008;Capitán et al., 2011). A specific cooperative interaction is that mixed populations of D151 and G151 variants in H3N2 viruses grow better than pure populations of either variant, in which one subpopulation is good at entering new cells, while the other is better at exiting cells to spread the infection (Xue et al., 2016). In our case, the co-existent viral population of H1N1 and H7N9 might help to H7N9 subpopulation migration from URT to LRT and growth in LRT. The H1N1 subpopulation can grow in both URT and LRT of this patient, while H7N9 grow better in LRT than URT, which is related to the different distribution of α-2,3-SA and α-2,6-SA receptors in URT and LRT, as well as the preferential recognition of H1N1 and H7N9 with two receptors (de Graaf and Fouchier, 2014;Byrd-Leotis et al., 2017). The H7N9 subpopulation was easier to transfer to the LRT with the assistance of H1N1 subpopulation in the co-existence of H1N1 and H7N9 than that only in H7N9.
Besides, the competitive interactions among viral quasispecies are also reported (Andino and Domingo, 2015). An example is that in co-infected cells with wild-type polioviruses at a high multiplicity of infection (MOI) and drug-resistant virus at a much lower MOI, the yield of drug-resistant virus was significantly reduced to 3-7% of the output from a single infection due to the interference of chimeric capsid formation (Crowder and Kirkegaard, 2005). We detected the drug-resistant mutation H275Y in NA protein and antigenic drift mutation R222K in HA protein. But both failed to be continuously dominant in the subsequent viral quasispecies composition. A possible explanation is that the forming of the viral particles containing mutations were interfered by normal strains with a similar mechanism illustrated in above poliovirus.
The composition, complexity and dynamic of a viral quasispecies determined its biological or medical implications, such as host range, pathogenesis, and coping with selection pressure (Roedig et al., 2011;Gregori et al., 2016;Barbezange et al., 2018). In this work, the composition and dynamics of viral quasispecies in a patient co-infected with H1N1 and H7N9 influenza A virus were clearly revealed along his treatment process using the single-molecule real-time sequencing (SMRT). Compared with the flaws of consensus genomic sequences (CGSs) and single nucleotide variants (SNVs) by short-read massively parallel sequencing (MPS), the SMRT embody the obvious advantage in investigating the complex haplotype distribution of an IAV population, especially a population coexisting with two subtype IAV. Because of a human co-infected with two subtype IAV is very rare, the single patient in this study restricted the conclusions. Fortunately, the co-existence of two or more subtype IAV in wildfowls is not rare. These works studying the composition and dynamics of viral quasispecies in wildfowls co-infected with multi subtype IAV will be conducted in future. One scarcity of SMRT is the limit of detection (LOD) not good enough to detect the low abundant IAV sequences. For example, when the Ct values of RT-PCR for IAV in samples of this study were less than 30, SMRT was hard to generate IAV genomic sequences. This is also the reason why several types of sequences were not detected in some samples and only 69 rather than 128 groups to be conducted in the study. Besides, the expensive sequencing costs is another limitation. With the optimization and upgrade of SMRT in terms of limit of detection and sequencing cost, using SRMT to reveal the composition and dynamic of influenza A virus quasispecies will become a necessary method for studying viral biological behaviors and medical implications, which will boost the understanding of viral infections, pathogenesis, evolution, and precision medicine (Nakano et al., 2017;Wenger et al., 2019b;Beaulaurier et al., 2020).

Accession Number
The data that support the findings of this study have been deposited into CNGB Sequence Archive of CNGBdb with accession number CNP0001131.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: https://db.cngb.org/search/project/CNP0001131/.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Institutional Review Board of BGI. The patients/ Frontiers in Genetics | www.frontiersin.org November 2021 | Volume 12 | Article 754445 8 participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
XY and ZS collected samples and performed experimental detection. PL, TJ, and LL performed single molecule real-time sequencing. PL, LL and GL wrote bioinformatics pipeline. PL, TJ, JP, and GF designed study and analyzed data. PL, GL, ZY, DJ, JP, and GF contributed to writing this study. All authors had full access to the final version of the manuscript and agreed to its submission.

FUNDING
This study was supported by Shenzhen Science and Technology Research and Development Project (No. JCYJ20151029151932602).
The funders had no role in the study design, data collection and interpretation, or the decision to submit the study for publication.