Comparative analysis of PB2 residue 627E/K/V in H5 subtypes of avian influenza viruses isolated from birds and mammals

Avian influenza viruses (AIVs) are naturally found in wild birds, primarily in migratory waterfowl. Although species barriers exist, many AIVs have demonstrated the ability to jump from bird species to mammalian species. A key contributor to this jump is the adaption of the viral RNA polymerase complex to a new host for efficient replication of its RNA genome. The AIV PB2 gene appears to be essential in this conversion, as key residues have been discovered at amino acid position 627 that interact with the host cellular protein, acidic nuclear phosphoprotein 32 family member A (ANP32A). In particular, the conversion of glutamic acid (E) to lysine (K) is frequently observed at this position following isolation in mammals. The focus of this report was to compare the distribution of PB2 627 residues from different lineages and origins of H5 AIV, determine the prevalence between historical and contemporary sequences, and investigate the ratio of amino acids in avian vs. mammalian AIV sequences. Results demonstrate a low prevalence of E627K in H5 non-Goose/Guangdong/1996-lineage (Gs/GD) AIV samples, with a low number of mammalian sequences in general. In contrast, the H5-Gs/GD lineage sequences had an increased prevalence of the E627K mutation and contained more mammalian sequences. Approximately 40% of human H5 AIV sequences carried K rather than E at this position, suggesting that the E-to-K conversion is not an exclusive requirement. Taken together, these results expand our understanding of the distribution of these residues within different subtypes of AIV and aid in our knowledge of PB2 mutations in different species.


Introduction
Highly pathogenic (HP) avian influenza virus (AIV) outbreaks in domesticated poultry were rare prior to the 1990s (1,2). However, in 1996, an HPAIV H5N1 (A/Goose/Guangdong/1/1996) was detected in a domesticated goose; the virus crossed species barriers and was detected in humans in 1997 (3,4). This virus founded what we now know as the H5-Gs/GD lineage, which is responsible for mortality in wild birds, poultry, mammals, and humans. The H5-Gs/GD lineage viruses have become adapted and distributed across the world via migratory waterfowl. This lineage has evolved into 10 genetically distinct clades (0-9) (1). Clade 2 versions of this lineage have become the most successful in terms of viral fitness and global distribution (1,5,6). The subclade 2.3.4 was first detected in 2008 in China and has continued to evolve into the current 2.3.4.4a-h viruses (1). The United States (U.S.) saw its first incursion of H5-Gs/GD lineage clade 2.3.4.4c viruses in 2014, which resulted in the deaths of over 47 million poultry and an estimated loss of 3.3 billion dollars (1,7). Many countries, including the U.S., are experiencing large-scale outbreaks from clade 2.3.4.4b viruses, which appear very well adapted to migratory waterfowl based on their global spread over the past few years (8, 9). Of particular interest is the adaption of this subclade to other species (10-14). A well-described avian-to-mammalian genetic adaptation is an amino acid change in the PB2 protein at residue 627. Traditionally, avian sequences contain glutamate (E) at this position, while mammalian/human sequences contain a lysine (K) (15). It was originally thought that this residue played a role in the ability of PB2 to replicate at lower temperatures and provided an explanation for the inefficient replication of AIV in non-avian hosts (16). While the temperature may still play a role, more recent structural studies have determined that PB2 interacts with host protein acidic nuclear phosphoprotein 32 family member A (ANP32A) at residue 627, and this interaction is the driving force behind the E627K mutation (17,18). The interaction is critical for the stabilization of the AIV polymerase complex (vPol), and the amino acid composition of ANP32A that surrounds residue 627 plays a major role in supporting AIV replication (18). However, other PB2 mutations have been found to support AIV replication in the absence of E627K, namely at positions 271, 590, and 591 (19-21).
Most avian ANP32A proteins encode an additional 33 amino acids (ANP32A 33 ) in between the two domains that are critical for AIV polymerase activity (22). The first four amino acids of the insert are a SUMO interacting motif (SIM) site that has been shown to increase the binding efficiency of ANP32A 33 with the vPol and is located directly above the PB2 627 residue (18,23). The SIM site contains a mixture of acidic and basic residues, which stabilizes the complex and likely allows for replication with either a PB2 627E or K. The other 27 additional amino acids duplicated from exon 4 (amino acid residues 149-175) are believed to strengthen the interaction between vPol and ANP32A. Humans, other mammals, and ratite species lack the 33 amino acid insertion (ANP32A ), which results in a weaker interaction between ANP32A and the vPol, leading to lower polymerase activity (17,18,23). The E627K mutation appears to compensate for the weaker interaction and helps to restore the vPol activity of AIV in hosts lacking ANP32A 33 (18,23).
Avian ANP32A contains natural variations in splicing patterns that result in three major isoforms of the protein, whereas humans only carry the ANP32A isotype (22,24). Chicken, turkey, and duck produce all three transcripts of ANP32A (ANP32A 33 , ANP32 29 (lacking SIM), and ANP32A ), but the ANP32A 33 isotype is the preferentially expressed isoform (approximately 65%). However, some species of wild waterfowl and other migratory birds express higher proportions of the human-like ANP32A (ANP32A ) and the partial insertion that lacks the SIM site (ANP32 29 ), indicating that ANP32A expression is strongly associated with host range (22-24).
In this study, we investigate the number of submitted sequences containing PB2 627E, K, or V from H5 AIV. We compare the prevalence of sequences with 627K between non-Gs/GD and Gs/GD lineages and examine the ratio of sequences within the Gs/GD lineage. Finally, we examine residue specificity within the broad host range of clade 2 AIV, including current global 2.3.4.4 viruses.

Sequence analysis
All sequences were obtained from the Global Initiative on Sharing All Influenza Data (GISAID EpiFlu™) database (25). The search parameters on GISAID always included type A, H5, and required a complete segment of PB2. Other parameters, such as host(s) and clade(s), were chosen as needed. Downloaded datasets were aligned in Geneious Prime (Boston, MA) using a MUSCLE alignment. Only complete and correct PB2 protein sequences with coverage over residue 627 were used for analysis. Sequences that did not meet this criterion were discarded. Tables were created based on the number of PB2 sequences that matched the criteria set, such as host species, total number, amino acid residue, and clade. Totals were calculated by adding up each group in the table, and in some cases, the total does not represent the entire dataset. Very few sequences had 627 residues that were not glutamate (E), lysine (K), or valine (V), so they were not included in the analysis total. The percentage of K residues was obtained by dividing the number of K residues by the total of that group and then multiplying by 100. When a clade or subclade was chosen in the "clade" search panel, all subclades were included in the analysis, unless otherwise noted. Tables that include a species section use four host groups: chicken/turkey only, all other avian species (in addition to chicken/turkey), mammals (not including humans), and humans. It is important to note that all timespans listed in this report are based on the dataset used for analysis. They do not represent the exact circulation of those virus clades. The findings of this study are based on data from 125,996 PB2 sequences available on GISAID as of March 2023.
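The tallying step described above can be sketched in Python. This is only an illustration under stated assumptions, not the workflow used in this study (which relied on Geneious Prime): the function names, the toy alignment, and the use of a reference row to locate the residue of interest are all hypothetical. The sketch finds the alignment column that holds a given reference residue, counts E, K, and V in that column while skipping every other character, and computes the percentage of K as the number of K residues divided by the group total, multiplied by 100.

```python
from collections import Counter


def column_for_residue(aligned_ref, residue_number):
    """Return the 0-based alignment column of a 1-based reference residue.

    Gap characters ('-') in the aligned reference are not counted.
    """
    seen = 0
    for col, char in enumerate(aligned_ref):
        if char != "-":
            seen += 1
            if seen == residue_number:
                return col
    raise ValueError("reference is shorter than the requested residue number")


def tally_residue(alignment, ref_id, residue_number=627, keep="EKV"):
    """Count the residues listed in `keep` at one reference position.

    `alignment` maps sequence IDs to aligned protein strings of equal length.
    Gaps, ambiguity codes, and any residue outside `keep` are left out of
    the total, mirroring the exclusion rule described in the text.
    """
    col = column_for_residue(alignment[ref_id], residue_number)
    counts = Counter()
    for seq in alignment.values():
        aa = seq[col].upper() if col < len(seq) else "-"
        if aa in keep:
            counts[aa] += 1
    return counts


def percent(counts, residue):
    """Percentage of `residue` among all counted residues."""
    total = sum(counts.values())
    return 100.0 * counts[residue] / total if total else 0.0


# Toy alignment (hypothetical data); residue 3 stands in for residue 627.
toy = {
    "ref": "MA-EK",
    "s1": "MA-KK",
    "s2": "M--EK",
    "s3": "MAGXK",  # ambiguous 'X' at the position of interest is skipped
    "s4": "MA-VK",
}
counts = tally_residue(toy, "ref", residue_number=3)
print(dict(counts), percent(counts, "K"))
```

With real PB2 protein alignments, `residue_number` would be left at 627 and the reference row would be any complete PB2 sequence in the alignment; the reference itself is included in the tally in this sketch.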

Results
Prevalence of 627K in non-A/Gs/GD/96 lineage H5Nx viruses
Using the GISAID "clade" search panel, we examined sequences that were not classified as part of the Gs/GD lineage (25). The American-non-Gs/GD/96 (Am_non-GsGD) lineage had 789 complete PB2 sequences with coverage of residue 627, and the earliest viruses in the dataset were from Wisconsin, USA, in 1975. The subtypes in 1975 included H5N2, H5N6, and H5N1. The species in this lineage consisted of chicken/turkey (220), all other avian (567), mammalian (2), and human (0). Of the 789 sequences examined, only two had PB2 627K; they came from a rhea and an emu (both from Texas, USA, 1993, H5N2). The percentage of PB2 627K sequences for the Am_non-GsGD lineage was 0.25%. Unexpectedly, there were 17 chicken sequences with a V at position 627, which accounted for 2.15% of the total. They originated from an H5N2 isolation in Mexico in 2019 (Table 1, top).
For the Eurasian-non-Gs/GD/96 (Ea_non-GsGD) lineage, 390 sequences were examined. The earliest sequences were from Scotland in 1959 (H5N1). In this lineage, there were 46 chicken/turkey, 342 other avian, 2 mammalian, and 0 human sequences. Of the 390 sequences, four had PB2 627K. All four sequences came from ostriches in South Africa in 2011 and 2015. The percentage of PB2 627K sequences from the Ea_non-GsGD lineage was approximately 1.03% (4 of 390) (Table 1). Based on the available data, the American and Eurasian lineages appear to have few mammalian/human spillover events and a low percentage of PB2 627K adaptations.
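The percentages for the two non-Gs/GD lineages can be re-derived from the counts given above. The short Python check below is only an illustrative sketch: the variable and function names are hypothetical, and rounding to two decimal places is an assumption chosen to match the precision of the reported values.

```python
def percent_of_group(count, total):
    """Percentage of a group, rounded to two decimals (rounding is an assumption)."""
    return round(100.0 * count / total, 2)


# Host-group counts quoted in the text:
# chicken/turkey, all other avian, mammalian, human.
am_total = 220 + 567 + 2 + 0  # American non-Gs/GD lineage
ea_total = 46 + 342 + 2 + 0   # Eurasian non-Gs/GD lineage

am_k = percent_of_group(2, am_total)   # rhea and emu sequences with 627K
am_v = percent_of_group(17, am_total)  # chicken sequences with 627V
ea_k = percent_of_group(4, ea_total)   # ostrich sequences with 627K

print(am_total, am_k, am_v)  # 789 0.25 2.15
print(ea_total, ea_k)        # 390 1.03
```

The host-group counts sum to the stated lineage totals (789 and 390), and the American-lineage values reproduce the reported 0.25% and 2.15%.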

Prevalence of PB2 627E/K/V in human sequences from different hemagglutinin subtypes
Finally, we examined the prevalence of PB2 627K in human AIV sequences with more common HA subtypes (26). First, the percentage of PB2 627K in human sequences of subtypes H1-H3 was examined. Interestingly, 95% of the H1 sequences contained the avian-like PB2 627E. Only 4.7% of all H1 sequences carried the PB2 627K residue. However, the majority of the 39,418 sequences examined were from the pandemic H1N1 (pdm09) lineage, which began in 2009 and was known to contain a PB2 segment from an avian North American virus (Table 4) (27). As expected, both the H2 and H3 sequences contained higher percentages of PB2 627K residues, with prevalence rates of 91.9% and 99.4%, respectively (Table 4). Next, we examined the proportion of PB2 627K in avian-adapted AIV subtypes (H5, H7, and H9). H5 viruses obtained from human samples contained 39.1% PB2 627K sequences (Table 4). Of the 1,207 human sequences classified as subtype H7, 851, or 70.4%, contained a K at position 627 (Table 4). Finally, the H9 subtype contained only two human sequences with PB2 627K (Table 4). Interestingly, the H7 and H9 viruses contained a higher proportion of 627V residues, 2.7% and 36%, respectively. Apart from the pdm09 H1N1 viruses, the data illustrate that human-adapted viruses typically contain PB2 627K.

Discussion
This research aimed to compare the distribution of PB2 627 residues between avian and mammalian sequences in H5 from non-Gs/GD and Gs/GD lineages and determine the prevalence between previous subclades and current ones. The current clade 2.3.4.4b H5Nx AIVs have a global distribution and raise concerns about mammalian adaptations, as recent isolations have occurred in domestic and peridomestic mammals (12,28). In this study, we focused on one well-known avian-to-mammalian adaption residue, E627K/V, to determine if the 2.3.4.4 viruses have an increased propensity to mutate in that direction.
All sequences were obtained from GISAID, so the number of sequences was limited to what was available in the database. We chose GISAID because it contained the most sequences of current (2021-present) AIV, but it is possible that older strains (prior to 2008) were missed in the analysis because they were not added retroactively (25). AIV surveillance and reporting are not standard practice in all countries; consequently, samples are limited to countries that do report, and this may contribute to the low dataset numbers and sample bias in some clades. Obtaining samples from only dead birds may also result in sampling bias, but it is not possible to tell whether samples were taken during active or passive surveillance.
In the non-Gs/GD lineage, the majority of sequences contained the 627E residue. We observed only six avian sequences that contained PB2 627K, and they belonged to the Ratite family (Ostrich, Rhea, Emu, Cassowary, and Kiwi) of birds (Table 1). Ratites were the only species of birds that contained PB2 627K residues in this group. Phylogenetic analysis of the ANP32A gene demonstrates that the Ratite family lacks the 33 amino acid insertion in exon 5 that most other avian species have, which presumably makes it more mammalian-like and may explain why viruses isolated from these species select for 627K during replication (14,22). Of further note is that both rhea and emu sequences from the Am_non-GsGD lineage contained multiple polybasic residues at the HA cleavage site, suggesting that they were HPAI viruses. In the Ea_non-GsGD lineage, the two ostrich sequences from 2011 contained an HPAI cleavage sequence, whereas the two ostrich sequences from 2015 contained an LPAI sequence.
Clade 2 is the most evolutionarily successful clade from the Gs/GD lineage of AIV based on the sample size and the number of subclades (Tables 2, 3) (1, 29). Within clade 2, the proportion of avian and mammalian samples in clade 2.1 was unexpected. The largest portion of samples was human sequences from Indonesia, ranging from 2003 to 2015 (Table 3). Of note is the observation that most of the sequences in clade 2.2 containing 627K were of avian origin (Table 3). Clade 2.2 viruses were shown to be shed via the respiratory route in waterfowl rather than the typical cloacal route; therefore, it was proposed that clade 2.2 viruses maintained the 627K to compensate for the cooler temperature of the respiratory tract (20,30,31). Recently, a study using RNA-seq showed that some waterfowl, land fowl, and pelicans preferentially express a human-like ANP32A, which could also have contributed to a larger proportion of avian species containing 627K in clade 2.2, as there was no selective pressure to maintain 627E (22). Additionally, Long et al. found that the PB2 627K mutation did not affect pathogenesis or transmission in ducks, suggesting that mammalian adaption could be maintained in an avian species, which also supports the notion that there is little selection pressure to revert from K to E (K627E) (20).
The clade 2.3 dataset comprised 17 subclades, including the current 2.3.4.4b viruses. Interestingly, 98.7% (9,672) of the 2.3 sequences contained a 627E residue despite only 92.4% (9,052) being of avian origin, demonstrating that non-avian species also maintained the 627E residue. This observation went both ways in that, of the 105 clade 2.3 sequences demonstrating 627K, only 71.4% (75) were mammalian (Table 3). Within clade 2.3.4, more than half of the PB2 627K sequences were in the clade 2.3.4.4b subclade (Tables 2, 3). There were 16 chicken/turkey sequences and 7 other avian sequences that contained PB2 627K in clade 2.3.4.4b; this may indicate that the virus is spilling over from mammals into avians and maintaining the residue at the time of isolation. Several accounts of mammalian spillover events have occurred since the 2.3.4.4b viruses have become predominant, and the diversity of species being infected is unprecedented (Table 3) (9, 12, 13, 32). Nevertheless, based on our data, the number of 2.3.4.4b mammalian sequences with PB2 627K is still lower than the number with 627E. Interestingly, we found that five sequences from sea mammals (four seals and one otter) contained PB2 627K, whereas a recent study examined dolphin and sea lion sequences from Peru and found they contained a 627E. The authors of that study, Leguia et al., also proposed that transmission between sea lions off the coast of Peru could be occurring rather than independent avian spillover events because of the massive die-offs being observed (28). More investigation is required to determine if mammal-to-mammal and mammal-to-avian transmission are occurring, and which residues are allowing for efficient replication of the virus.
Although residue PB2 627 is almost exclusively a K or E in all influenza A viruses, a small proportion of sequences contains a valine (V). A study by Chin et al. found that inducing random mutations at position 627 allowed for the 627V mutation when purified in a mammalian system but not in an avian system (33). This may indicate that the 627V mutations observed in all avian species were transmitted from a mammalian host (Tables 1-4). The study observed a slight reduction in replication compared to the 627K mutation using the culture-adapted A/Puerto Rico/8/34 (H1N1) virus (33). However, a more recent study used the polymerase genes from an avian H5N1 (A/Muscovy duck/Vietnam/TY93/2007; clade 2.3.4) to rescue a virus containing the PB2 627V mutation. Taft et al. found that the 627V mutation significantly increased viral replication in mammalian cell culture and virulence in mice, to a level comparable to the 627K virus (34). Additionally, Luk et al. found that an H7N9 virus containing PB2 627V was extremely fit and transmissible between chickens and mammals (35). The number of V residues in H5 sequences remains low; however, the H7 and H9 human cases had a considerable number of sequences containing a 627V, many of which are from recent years (Table 4).
The dependency of viral polymerases on host ANP32A is one factor in crossing host barriers, as exemplified by the propensity for AIV sequences to mutate to 627K in mammalian sequences (Tables 1-4) (23). The Gs/GD lineage viruses are commonly found in wild waterfowl, which appear to express higher proportions of the human-like ANP32A, and this may account for shifts in 627 residue specificity (23,32). While PB2 627K is an established marker for mammalian adaption, it is not solely responsible for it (15,27,33). The pdm09 H1N1 viruses contained a PB2 627E due to the avian origin of the segment, but had other compensatory mutations that allowed for efficient replication in mammals and humans (19,20,36). As more mammalian species become infected by 2.3.4.4b AIV, additional residues may yet be identified. It is known that efficient replication in a certain host requires adaption to the machinery within, and that adaption from one host species to another requires mutation and selection. In this study, we performed a differential analysis of PB2 residue 627 and demonstrated a non-exclusive requirement for conversion.
laboratories for generating the genetic sequence and metadata and sharing via the GISAID Initiative, on which this research is based.
TABLE Comparative prevalence of PB2 627E/K/V in Gs/GD/96 H5 lineage clades .-. (Top) and distribution within various species in subclade . . . (Bottom).

TABLE Prevalence
b Recent H7N9 cases predominate among the sequences in this subtype.