Evolution of Indian Influenza A (H1N1) Hemagglutinin Strains: A Comparative Analysis of the Pandemic Californian HA Strain

The emergence of epidemic and pandemic viral infections has made vaccine and inhibitor design a pressing need, and the recent outbreak of the influenza A (H1N1) virus is one such example. From 2009 to 2018, India suffered severe fatalities due to outbreaks of the influenza A (H1N1) virus. In this study, the features of reported Indian H1N1 strains are analyzed in comparison with their evolutionarily closest pandemic strain, A/California/04/2009. The focus is on one of its surface proteins, hemagglutinin (HA), which plays a significant role in attachment to the host cell surface and in viral entry. The extensive analysis, performed in comparison with the A/California/04/2009 strain, revealed significant point mutations in all Indian strains reported from 2009 to 2018. Due to these mutations, all Indian strains showed altered features at the sequence and structural levels, which are presumed to be associated with their functional diversity as well. The mutations observed in the 2018 HA sequence, such as S91R, S181T, S200P, I312V, K319T, I419M, and E523D, might improve the fitness of the virus in a new host and environment. The higher fitness and decreased sequence similarity of mutated strains may compromise therapeutic efficacy. In particular, the commonly observed mutations, such as serine-to-threonine, alanine-to-threonine, and lysine-to-glutamine at various regions, alter the physico-chemical features of receptor-binding domains, N-glycosylation sites, and epitope-binding sites when compared with the reference strain. Such mutations generate diversity among all Indian strains, which makes the structural and functional characterization of these strains essential. In this study, we observed that mutational drift results in the alteration of the receptor-binding domain, the generation of new variant N-glycosylation sites along with novel epitope-binding sites, and modifications at the structural level.
Finally, the pressing need to develop distinct next-generation therapeutic inhibitors against the HA strains of the Indian influenza A (H1N1) virus is also highlighted here.


Introduction
Influenza is a global viral threat that can lead to severe or fatal disease. It affects every class of individuals, including pregnant women and immunocompromised people (Cox and Subbarao, 2000;Rambaut et al., 2008;Makau et al., 2017). According to the World Health Organization (WHO), there have been approximately 3-5 million cases of influenza each year since 2009, with over 650,000 deaths (Iuliano et al., 2017;Jones et al., 2019). Influenza epidemics are most commonly reported in the winter season of the temperate zone. Influenza not only affects individuals but also causes significant economic losses due to several factors, including workplace absenteeism and treatment costs (Simonsen, 1999;Gatherer, 2009). Of notable concern is the virulence of the influenza A viruses that cause global pandemics. The pandemic outbreak of influenza A (H1N1) in 2009 is the latest episode reported in the last decade (Garten et al., 2009;Intelli-et al., 2009;Mishra et al., 2010). In 2009, during the pandemic outbreak of the influenza A (H1N1) pdm09 strain, India reported about 27,236 laboratory-confirmed cases of influenza A (H1N1) with 981 fatalities (https://www.ncdc.gov.in/dashboard.php). The WHO documented that the pandemic virus would continue to circulate as a seasonal influenza virus (WHO, 2010). The Ministry of Health and Family Welfare reported on 20 October 2020 that in the post-pandemic period (i.e., since 2010), the influenza A (H1N1) pdm09 strain caused nearly 185,578 laboratory-confirmed cases with more than 12,000 deaths in India. The maximum number of cases was reported from states such as Rajasthan, Gujarat, Delhi, Jammu and Kashmir, Maharashtra, Madhya Pradesh, Telangana, Karnataka, and Tamil Nadu (Dashboard: National Centre for Disease Control (NCDC)). The periodic outbreaks of influenza pose critical challenges to public health.
In particular, the flu viruses, belonging to the Orthomyxoviridae family, are classified as influenza A, B, C, and D types. Influenza A viruses are reported to infect multiple hosts, including avian and mammalian species, while influenza B infection is restricted to humans (Paules and Subbarao, 2017;Ghaffari et al., 2019;Ravina et al., 2020). Influenza C causes a mild infection in humans but is neither epidemic nor pandemic in nature. Type D influenza is mainly reported in cattle and pigs but not in humans (Odagiri et al., 2015;Zhai et al., 2017). The genomes of influenza viruses A and B contain eight negative-sense single-stranded RNA (-ssRNA) segments, whereas those of influenza viruses C and D contain only seven -ssRNA segments due to the absence of one of the envelope glycoproteins (Wang and Veit, 2016;Su et al., 2017). The negative-polarity RNA segments of influenza A and B viruses encode about 10 proteins, namely, 1) two surface glycoproteins (hemagglutinin (HA) and neuraminidase (NA)), 2) one nucleoprotein (NP), 3) three polymerase proteins (PA, PB1, and PB2), 4) two matrix proteins (M1 and M2), and 5) two nonstructural proteins (NS1 and NS2) (Dandagi and Byahatti, 2011;Murhekar and Mehendale, 2016;Lazniewski et al., 2018;Chua et al., 2019). The RNA segments of influenza C and D viruses code for nine proteins due to the lack of envelope glycoproteins (Ferguson et al., 2016;Asha and Kumar, 2019). Among these proteins, HA and NA have 18 and 11 known subtypes, respectively. These surface proteins play a crucial role in the naming of influenza A virus subtypes, such as H1N1 (Webster and Govorkova, 2014;Chua et al., 2019).
HA is a central factor in the initialization of the infection and responsible for binding of the virus to the host cell receptor (sialic acid) surface. HA promotes the fusion of the virus membrane with the host endosomal membranes to facilitate viral entry into the host cell (Saxena et al., 2018). Another surface glycoprotein NA intercepts the newly synthesized virion concentration by breaking the alpha-ketosidic linkage between sialic acid and the proximate sugar residue in order to stop 1) virion aggregation and 2) the virus binding back to the dying host cell via HA. This allows for the efficient release of viral progeny, which then spreads to new target cells. This results in the disruption of the identification of the HA receptor-binding site and facilitates the spread of viral particles beyond the infected site and promotes severe infection (McAuley et al., 2019). Previous studies suggest the phenotypic variation is guided by a series of mutations that change the antigenic properties of the strain (McDonald et al., 2007;Sriwilaijaroen and Suzuki, 2012). The majority of the antigenic drift in the influenza virus is thought to be guided by the mutations in the HA1 region of the HA protein (Wiley et al., 1981;Nelson and Holmes, 2007).
The evolution of influenza strains is mainly driven by antigenic drift due to frequent and continuous mutations. With such dynamic antigenic changes, the virus continuously and steadily multiplies and accumulates in the cell/organism (Lin et al., 2009;Neumann et al., 2009;Shi et al., 2010). Variations generated by the mutations mainly affect the affinity or specificity of both antigenic and receptor-binding sites (Gerhard et al., 1981;Yokoyama et al., 2017) and also mediate conformational changes in the receptor-binding pocket (Sriwilaijaroen and Suzuki, 2012). With all these possible mutations, the virus becomes insensitive to inhibitors that were designed specifically for the native strains. Viruses with such significant variability pose a severe challenge to society, especially in the diagnosis, medication, and control of viral infection in humans (Sriwilaijaroen and Suzuki, 2012;Hütter et al., 2013;Alonso et al., 2015;Guillebaud et al., 2017;Sharma et al., 2019). Hence, it is important to study the mutational and phylogenetic evolution of the HA surface protein from different strains of the influenza virus, especially by characterizing recognition sites such as the receptor-binding site, the N-glycosylation sites, and the antigenic sites.
The current study uses comparative sequence analyses to characterize and establish the evolutionary relationships of Indian isolates with the pandemic Californian strain reported in 2009 (the closest member to these Indian strains), which serves as a reference to describe the changes reported in the swine (H1N1) virus during 2009-2018. In silico analysis is performed by comparing the HA protein sequences of the Indian influenza A (H1N1) virus with the reference pandemic strain (A/California/04/2009), with special emphasis on the characterization of various recognition sites, including receptor-binding sites, antigenic binding sites, and glycosylation sites, accounting for the variants reported since 2009.
Hereafter, the isolates of HA protein sequences of the influenza A (H1N1) virus infecting humans from California and India will be referred to as HA Cal and HA Ind , respectively, throughout the article. For the structural analysis, two representative structures, namely, 1) HA Cal (reported in 2009, the reference strain) and 2) the HA protein

HA sequence retrieval and multiple sequence analysis (MSA)
To understand the mutational and evolutionary drift among the HA Ind protein sequences that circulated during the aforementioned period, MSA was carried out using ClustalW (Thompson et al., 1994). The evolutionarily closest sequence was identified using the pairwise distance matrix method. The resulting MSA was used to find the evolutionarily conserved regions in the examined sequences.
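As a rough illustration of the pairwise distance matrix idea, the sketch below computes a simple p-distance (the fraction of differing residues over gap-free alignment columns) and picks the sequence closest to a reference. This is an assumption made for illustration: ClustalW derives its own distances from pairwise alignment scores, and the sequence names and toy sequences here are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def distance_matrix(aligned: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise p-distances for every unordered pair of aligned sequences."""
    return {
        (m, n): p_distance(aligned[m], aligned[n])
        for m, n in combinations(aligned, 2)
    }


def closest_to(reference: str, aligned: Dict[str, str]) -> str:
    """Name of the sequence with the smallest p-distance to the reference."""
    others = [name for name in aligned if name != reference]
    return min(others, key=lambda n: p_distance(aligned[reference], aligned[n]))


# Hypothetical toy alignment (not real HA data).
toy_alignment = {
    "HA_Cal": "MKAILVV-LY",
    "HA_Ind_a": "MKAILVVMLY",
    "HA_Ind_b": "MKTILVVMLF",
}

if __name__ == "__main__":
    for pair, dist in distance_matrix(toy_alignment).items():
        print(pair, round(dist, 3))
    print("closest to HA_Cal:", closest_to("HA_Cal", toy_alignment))
```

The p-distance ignores substitution type; a score-based or model-corrected distance would weight conservative and radical changes differently.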

Evaluation of the phylogenetic relation
Following MSA, phylogenetic analysis was performed on HA Ind proteins reported during 2009-2018. In order to understand the evolutionary relationship of these HA Ind proteins of H1N1 strains from India with the reported pandemic Californian strain (A/California/04/2009), a phylogenetic tree was constructed using the maximum parsimony method. Parsimony analysis was performed in PAUP (v.4.0) using a heuristic search approach with the following settings: 1) characters unordered with equal weight, 2) random taxon addition, and 3) branch swapping with the tree bisection-reconnection (TBR) algorithm. Bootstrap resampling was performed with 1,000 replicates to check the reliability of the results (Felsenstein, 1985;Tamura et al., 2004;Victoria Martínez et al., 2008). The selected HA Ind sequences revealed a close relationship with the HA Cal protein of the pandemic strain (A/California/04/2009) reported in 2009. Hence, the HA Ind sequences that evolved as the closest members to the HA Cal protein were clustered as one clade and used for further studies (Table 2).
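The bootstrap step can be sketched as resampling alignment columns with replacement, so that each replicate has the same length as the original alignment. This is only the resampling part, written as an illustration; the tree search on each replicate (done in PAUP in the study) is not reproduced, and the function names and seed are hypothetical.

```python
import random
from typing import Dict, List


def bootstrap_alignment(aligned: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Draw alignment columns with replacement; all taxa share the same drawn columns."""
    names = list(aligned)
    length = len(aligned[names[0]])
    if any(len(aligned[n]) != length for n in names):
        raise ValueError("aligned sequences must have equal length")
    columns = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(aligned[n][c] for c in columns) for n in names}


def bootstrap_replicates(
    aligned: Dict[str, str], n_replicates: int = 1000, seed: int = 0
) -> List[Dict[str, str]]:
    """Generate pseudo-replicate alignments for bootstrap support estimation."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [bootstrap_alignment(aligned, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    toy = {"HA_Cal": "MKAILVVLLY", "HA_Ind": "MKTILVVMLF"}  # hypothetical sequences
    replicates = bootstrap_replicates(toy, n_replicates=1000)
    print(len(replicates), "replicates; first:", replicates[0])
```

Bootstrap support for a clade is then the fraction of replicate trees in which that clade appears.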

Mutational analysis
Mutational analysis was carried out using the ClustalW alignment tool of BioEdit (version 7.2.5) (Hall, 1999) with a bootstrap value of 1,000 to generate a global alignment of the selected HA Ind proteins against the HA Cal protein (Table 2). The aim was to investigate whether there is any prevalence of phenotypic variation in the reported HA Ind protein sequences. The algorithm computed a distance matrix between each pair of sequences based on pairwise sequence alignment scores.
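Once a query sequence is aligned to the reference, point mutations can be read off column by column and written in the usual notation (reference residue, reference position, query residue, e.g., S91R). The sketch below is a minimal illustration of that bookkeeping, assuming gaps are written as "-" and that numbering follows the ungapped reference; it is not the BioEdit procedure itself.

```python
from typing import List


def call_mutations(ref_aligned: str, query_aligned: str, start: int = 1) -> List[str]:
    """List substitutions as '<ref><position><query>' using reference numbering."""
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have equal length")
    mutations = []
    position = start - 1
    for ref_res, query_res in zip(ref_aligned, query_aligned):
        if ref_res == "-":
            continue  # insertion in the query: no reference position to report
        position += 1
        if query_res != "-" and query_res != ref_res:
            mutations.append(f"{ref_res}{position}{query_res}")
    return mutations


if __name__ == "__main__":
    # Hypothetical toy alignment; the gap in the reference marks a query insertion.
    print(call_mutations("MKAS-TL", "MKTSQTV"))
```

Deletions and insertions are deliberately skipped here, since the analysis in the text concerns point substitutions.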

Ab initio structural modeling of the HA protein
In addition to sequence comparison, the effects of mutations on the structure of HA were also investigated. For the HA structure comparison analysis, the following were selected: 1) one of the Indian HA proteins (Acc No: QCP70896), reported in 2018 (hereafter referred to as HA Ind-2018) and 2) the reference HA Cal protein. The complete crystallographic structure of the reference HA Cal protein (with 566 amino acids) is not available in the PDB, and the reported HA Cal structure (PDB ID: 3LZG) has only 506 amino acids. Hence, the complete HA Cal structure was predicted using the ab initio modeling strategy implemented in Robetta (Raman et al., 2009;Song et al., 2013). The sequences of HA Cal and HA Ind-2018 (GQ280797 and QCP70896, respectively, in FASTA format) were retrieved from the NCBI databank (http://www.ncbi.nlm.nih.gov/) and used for structure modeling. The crystal structure and the modeled structures were superimposed (Pettersen et al., 2004). The quality of the generated models was assessed with the RAMPAGE server (Ramachandran map).

Receptor-binding site (RBS) analysis
Receptor-binding site analysis was performed on the HA Ind protein to examine the mutation-mediated variation that emerged at the binding sites when compared to the HA Cal protein. Hu reported the highly conserved receptor-binding domains of the HA protein of the influenza A (H1N1) virus (Hu, 2010), and this information was used while characterizing both HA Ind and HA Cal strains.

Epitope-binding site (EBS) analysis
The analysis of the EBS gains importance as it provides the hotspot for membrane fusion between the host and pathogens. Analysis of the conserved EBS is essential to understand its dominance in antibody recognition. Apart from the canonical/native epitope sites, the identification of new mutation-derived epitope sites is also essential to explain the exact viral-host interaction during the membrane fusion mechanism. Epitope sites of both HA Ind and HA Cal proteins of the influenza A (H1N1) virus were analyzed in the SVMTriP web server (http://sysbio.unl.edu/SVMTriP) using default parameters to identify both the conserved EBS and the newly emerging EBS due to mutation. The potential antigenic sites in the HA sequence were examined using a string kernel-based support vector machine (SVM), SVMTriP (Yao et al., 2012). This SVM model calculates similarity using the BLOSUM62 matrix for the tripeptides (trimers) of the input sequences given in FASTA format. Finally, the predicted epitopes, within the default limit of 20 sites, are ranked according to their scores.

N-glycosylation site analysis
One of the most influential post-translational modifications is N-glycosylation, which affects antigenicity, biological activity, cell-cell interactions, protein solubility, protein folding, localization, and trafficking. Here, the N-glycosylation sites across the functional domains of the HA protein are mapped to locate both known and new mutation-derived sites. The NetNGlyc 1.0 server (https://services.healthtech.dtu.dk/service.php?NetNGlyc-1.0) was used with default parameters to analyze the N-glycosylation sites that are conserved among the Indian isolates of influenza A (H1N1) viruses. The NetNGlyc 1.0 server examines all occurrences of the sequence pattern "N-X-S/T" (where X is any amino acid except P) within HA protein sequences as potential N-glycosylation sites, based on an artificial neural network approach. The most probable N-X-S/T patterns are filtered using a cutoff value of 0.5. The locations of the predicted N-glycosylation sites in the monomer of the HA Ind protein are numbered according to the full-length HA Cal protein sequence.
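The N-X-S/T pattern itself can be located with a short motif scan. The sketch below is only the sequon search, with a lookahead so that overlapping sequons are both reported; it does not reproduce the neural network score or the 0.5 cutoff of NetNGlyc, and the `offset` parameter and four-residue reporting window are choices made here for illustration.

```python
import re
from typing import List, Tuple

# N, then any residue except P, then S or T. The lookahead keeps matches
# zero-width so that overlapping sequons (e.g., in "NNST") are all found.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence: str, offset: int = 1) -> List[Tuple[int, str]]:
    """Return (position of N, four-residue window) for each N-X-S/T sequon."""
    sites = []
    for match in SEQUON.finditer(sequence):
        start = match.start()
        window = sequence[start:start + 4]  # shorter at the end of the sequence
        sites.append((start + offset, window))
    return sites


if __name__ == "__main__":
    # Short fragment with two overlapping sequons; numbering starts at 27.
    for position, window in find_sequons("NNSTD", offset=27):
        print(position, window)
```

A sequon is necessary but not sufficient for glycosylation, which is why a predictor with a score threshold is used on top of the motif.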

Amino acid composition analysis
The ProtParam tool (https://web.expasy.org/protparam/) implemented in the ExPASy server computes various physicochemical properties from the sequence, such as molecular weight, pI, amino acid composition, atomic composition, extinction coefficient, estimated half-life, instability index, aliphatic index, and grand average of hydropathicity (GRAVY). Here, variation in the amino acid composition of HA Ind proteins reported during 2009-2018 is analyzed in comparison with the HA Cal protein using ProtParam with default parameters to understand the genetic susceptibility and evolution of Indian isolates compared with the pandemic strain (A/California/04/2009).
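Two of the listed properties are simple enough to sketch directly: the amino acid composition (percentage of each residue) and GRAVY, which is the mean Kyte-Doolittle hydropathy over all residues. The code below is an illustrative re-implementation for the 20 standard residues, not the ProtParam code, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict

# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(sequence: str) -> Dict[str, float]:
    """Percentage of each residue type present in the sequence."""
    if not sequence:
        raise ValueError("empty sequence")
    counts = Counter(sequence)
    return {aa: 100.0 * n / len(sequence) for aa, n in counts.items()}


def gravy(sequence: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    if not sequence:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


if __name__ == "__main__":
    fragment = "VGSSRYSKKFKPEIAI"  # RBS5 segment of HA Cal quoted in the text
    mutated = "VGTSRYSKKFKPEIAI"   # same segment carrying S220T
    print(round(gravy(fragment), 3), round(gravy(mutated), 3))
    print(composition(fragment)["S"], composition(mutated)["S"])
```

Comparing the two fragments shows how a single S-to-T change shifts both the serine percentage and the average hydropathy slightly.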

Secondary structure prediction
A high degree of conformational plasticity may present a barrier to the development of beneficial antibodies. The GOR IV web server (https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_gor4.html) was used with default parameters to 1) assess the degree of conformational plasticity by analyzing the secondary structure (alpha helix, extended strand, and random coil) and 2) illustrate the variable and invariable structural changes in HA Ind proteins reported during 2009-2018 compared with the selected HA Cal protein.
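The secondary structure comparison reduces to the fraction of residues assigned to each state. The sketch below assumes the prediction is available as a string with one letter per residue (H for helix, E for extended strand, C for coil, in either case); that input format is an assumption for illustration and the parsing of the actual server output is not shown.

```python
from collections import Counter
from typing import Dict


def secondary_structure_fractions(states: str) -> Dict[str, float]:
    """Percentage of residues in helix (H), extended strand (E), and coil (C)."""
    normalized = states.upper()
    if not normalized:
        raise ValueError("empty prediction string")
    unknown = set(normalized) - set("HEC")
    if unknown:
        raise ValueError(f"unexpected state letters: {sorted(unknown)}")
    counts = Counter(normalized)
    return {s: 100.0 * counts.get(s, 0) / len(normalized) for s in "HEC"}


if __name__ == "__main__":
    # Hypothetical eight-residue prediction string.
    print(secondary_structure_fractions("hhcceecc"))
```

Running this per strain and per year gives the kind of percentages that can be plotted as grouped bars for helix, strand, and coil.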

Electrostatic potential (ESP) analysis
Electrostatic interactions (EIs) play a vital role in determining biomolecular functions. In particular, the EIs that govern biomolecular sensing are highly regulated by the nature of the electrostatic potential. Hence, analysis of effective biomolecular sensing requires a thorough characterization of the distribution of ESP over the biomolecular surface boundaries. Here, the electrostatic charge distribution over the surface of both HA Ind and HA Cal proteins is calculated at an ionic strength of 0.15 M and visualized/analyzed using the Adaptive Poisson-Boltzmann Solver (APBS) plugin integrated in VMD software (version 1.9.3). The ESP, represented as an isoelectrostatic potential map, depicts red
Frontiers in Molecular Biosciences frontiersin.org 05
(Xu et al., 2010;Corti et al., 2011;Xuan et al., 2011;Xu et al., 2012;Hong et al., 2013;Zhang et al., 2013;Zhang et al., 2013;Joyce et al., 2016;Lang et al., 2017;Cheung et al., 2020;Wu et al., 2020). Due to the lack of a complete structure of the HA protein, we modeled the complete structure of both HA Cal and HA Ind-2018 using the full-length sequence (566 aa) and compared it with the reported crystal structure (PDB ID: 3LZG). The superimposition of the crystal and modeled structures reveals a similar architecture, as shown in Supplementary File S5. Hereafter, the three-dimensional modeled structures of the entire sequences of HA Cal and HA Ind-2018 are used for further comparative structural analyses.

Evolutionary relationship analysis
Knowledge on the extent of genetic reassortment, antigenic shifts, and drifts in HA surface proteins of the H1N1 influenza virus isolates reported in India has become an indispensable concept as it discloses the most important factors related to its virulence. By examining the evolution of sequences, we tried to highlight how the selective pressure on the viral protein changes over time, leading to alterations in antigenicity, which further discloses variation in host specificity toward their receptor. A total of 512 HA protein sequences reported from the Indian strains of H1N1 viruses circulated during 2009-2018 (HA Ind ) were retrieved from the NCBI flu database (Supplementary File S1). At first, an exhaustive MSA was performed on these selected HA Ind and HA Cal sequences using ClustalW. In ClustalW, the sequences expressing variations due to mutation during the evolution of virus strains are aligned in accordance with the evolutionary distance and are further analyzed for phylogenetic relationships in a year-wise manner. A comparison of all HA Ind proteins with reference to HA Cal revealed the presence of new mutations as well. The MSA and phylogenetic tree (constructed using PAUP) revealed that HA Ind proteins (evolved from 2009 to 2018, as given in Table 2) share a close evolutionary relationship with the HA Cal protein (Supplementary File S2) and were chosen for further investigation.

Mutational analysis on HA Cal and HA Ind protein sequences
Mutational analysis of the selected 10 HA Ind sequences, performed in comparison with HA Cal, disclosed additional phenotypic variations at 84 positions, of which 16 mutations gain importance as they are more conserved. Table 3 lists all the observed mutations in the selected HA Ind proteins. In HA Ind, positions 114, 180, 273, 300, 516, 202, 468, 214, 220, 100, 338, and 391 (the grey highlighted positions in Table 3) have a greater probability of mutation relative to HA Cal. For a better understanding, the frequency (cumulative occurrence) of each mutation, in comparison with the reference HA Cal, is calculated and depicted in the frequency plot (Figure 1). The residues showing more than 50% mutational occurrence are in red font. Similar colors in the plot depict similar mutations, but the direction of mutation from one residue to another is differentiated with the "*" symbol. For instance, the mutation of proline to serine is observed 10 times in HA Ind proteins, whereas the mutation of serine to proline is observed only four times.
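The frequency calculation can be sketched as two tallies over per-strain mutation lists: a direction-aware count of substitution types (so that P to S and S to P are kept apart) and the fraction of strains carrying each individual mutation. The strain names and the toy input below are hypothetical and only use mutation labels in the notation of the text; they are not the data behind Figure 1.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")  # e.g., "P100S"


def substitution_counts(by_strain: Dict[str, List[str]]) -> Counter:
    """Count (from_residue, to_residue) pairs across all strains, keeping direction."""
    counts: Counter = Counter()
    for mutations in by_strain.values():
        for label in mutations:
            match = MUTATION.match(label)
            if match is None:
                raise ValueError(f"unrecognized mutation label: {label}")
            counts[(match.group(1), match.group(3))] += 1
    return counts


def site_frequency(by_strain: Dict[str, List[str]]) -> Dict[str, float]:
    """Fraction of strains that carry each mutation label."""
    carriers = Counter(label for muts in by_strain.values() for label in set(muts))
    return {label: n / len(by_strain) for label, n in carriers.items()}


if __name__ == "__main__":
    toy = {  # hypothetical strains with illustrative mutation lists
        "strain_2009": ["P100S", "T214A", "I338V"],
        "strain_2010": ["P100S", "T214A", "I338V", "S220T", "E391K"],
    }
    print(substitution_counts(toy).most_common())
    frequent = {m for m, f in site_frequency(toy).items() if f > 0.5}
    print(sorted(frequent))
```

The 50% threshold in the usage example mirrors the cutoff used in the text to highlight frequently mutated residues.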
A year-wise analysis of the mutations reported among the HA Ind proteins from 2009 reveals that 1) all selected HA Ind proteins possess the mutations P100S, T214A, and I338V; 2) the mutations S220T and E391K are reported from 2010 onward; 3) the residues D114, K180, A273, K300, and E516 of HA Cal remain conserved among the selected HA Ind sequences reported until 2013, and the same sites show the mutations D114N, K180Q, A273T, K300E, and E516K from 2014 onward; 4) the mutations A13T, S101N, S179N, and I233T are reported since 2016; and 5) the mutations S202T and S468N are reported since 2013 and 2012, respectively. All these observations attest to the accumulation of additional mutations over successive years. It should be noted that the substitutions T → A, S → N, D → N, K → Q, K → E, P → S, S → T, I → V, A → T, and E → K play a significant role in the emerging diversity of HA Ind proteins.
(Hu, 2010) are given in Table 4. The RBS of the HA Ind strain (2009)
RBS1, RBS2, and RBS3 reveal mutations such as V33G (in 2012), S91R (in 2018), and S160G (in 2012), respectively. In RBS4, the mutation S202T that emerged in 2013 was conserved among the strains reported in successive years. RBS4 also acquired an additional mutation (S200P) in the 2018 strain. In RBS5, all strains have inherited the S220T mutation along with a few more, such as 1) I233T in 2016, 2) A232G and I233T in 2017, and 3) I233T in 2018. In RBS6 and RBS7, the mutations A273T and K300E were identified between 2014 and 2018. Specifically, the emergence of the 2018 strain with seven mutations at the RBS (Figure 2) may imply that HA Ind is more prone to mutation than HA Cal.

Receptor
In general, except for the 2009 strain, the RBS analysis reveals that the HA Ind strains circulated during 2010-2018 were significantly conserved except for the few aforementioned mutations. Despite the observed significant conservation at the sequence level, the emerging single mutation posed a challenge to the inhibitors in sensing the receptor-binding sites and hence prompted the scientific community to design sequence-specific receptor-binding agents for further inhibition.

Epitope-binding site (EBS) in HA Cal and HA Ind protein sequences
Epitope mapping is critical in the development of vaccines or therapeutic monoclonal antibodies as it offers information on the mechanism of action. In the current study, the epitope-binding domains were analyzed using the SVMTriP web server, and the predicted epitope segments in the HA Ind protein sequences are given in Supplementary Table S3 (Supplementary File S3) along with their rank and score. The analysis reveals about 10 antigenic sites when compared to the reference HA Cal protein. Of these 10 antigenic sites, C-EBS1, C-EBS2, C-EBS8, and C-EBS10 (amino acid positions from
In addition to the reported 10 C-EBSs, SVMTriP also identified 10 potential EBS (at residue positions 110-129, 146-165, 242-261, 317-336, 366-385, 386-405, 407-426, 408-427, and 515-534; hereafter referred to as I-EBS) exclusively in HA Ind proteins (Table 5; Supplementary File S3). The results suggest that mutational events give rise to more antigenic sites. The newly identified antigenic sites such as I-EBS7, I-EBS8, and I-EBS10 (Figure 3) are anticipated to provide more interacting sites in the target, which would eventually fine-tune the process of therapeutic drug/vaccine design.

Prediction of N-glycosylation sites in HA Cal and HA Ind protein sequences
The attachment and release of viruses from their host cells exploit the phenomenon of glycosylation. For example, the N-glycosylation of the HA surface protein allows the pathogen to escape from the host's defense mechanism through co-evolving with the host protein and eventually identifying the host receptor for further fusion. Hence, N-glycosylation sites are crucial in determining the H1N1 host binding and release factors, which subsequently determine the fate of virus infection in the host. In line with this importance, N-glycosylation sites were predicted in HA Ind strains reported during 2009-2018 using NetNGlyc 1.0 and are shown in Figure 4. The HA Cal protein possesses eight N-glycosylation sites, namely, 27NNST30, 28NSTD31, 40NVTV43, 104NGTC107, 293NNTC296, 304NTSL307, 498NGTY501, and 557NGSL560 (here, each N-glycosylation site is referred to with the starting and ending positions of its amino acids).
Like HA Cal, the selected HA Ind strains possess all these N-glycosylation sites, except for the Indian A/Blore/NIV1196/2009 strain (Table 6), which lacks the 293NNTC296 and 304NTSL307 N-glycosylation sites. Along with these reported nine N-glycosylation sites, the HA Ind strains reported during 2016-2018 showed an additional N-glycosylation site, 179NQSY182. The amino acid segment 179SKSY182 of the HA Cal protein was conserved in the HA Ind strains reported from 2009 to 2013. With the mutation K180Q (reported in 2014-2015), the segment became 179SQSY182, a precursor to the 179NQSY182 N-glycosylation site (in which S is mutated into N, one of the site-forming residues) identified in the HA Ind sequences reported in the subsequent years (2016-2018).
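The stepwise emergence of this site can be checked against the N-X-S/T rule with a few lines of code. The sketch below tests only the motif at the start of each four-residue segment quoted above; the period labels are taken from the text, and the helper name is hypothetical.

```python
def starts_with_sequon(segment: str) -> bool:
    """True if the segment begins with N-X-S/T, where X is any residue except P."""
    return (
        len(segment) >= 3
        and segment[0] == "N"
        and segment[1] != "P"
        and segment[2] in "ST"
    )


# Segment at positions 179-182 in each period, as described in the text.
STEPS = [
    ("2009-2013", "SKSY"),
    ("2014-2015", "SQSY"),  # after K180Q
    ("2016-2018", "NQSY"),  # after S179N
]

if __name__ == "__main__":
    for period, segment in STEPS:
        print(period, segment, starts_with_sequon(segment))
```

Only the last segment satisfies the motif, which is consistent with K180Q being an intermediate step and S179N being the change that creates the sequon.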
It is also vital to ensure the structurally stable evolution of Indian strains by retaining the characteristic hydrophobic/hydrophilic interactions despite the encountered mutations. Therefore, a detailed study about the 3D structure of viral proteins along with physico-chemical characterization would be useful to understand

Amino acid composition of HA Cal and HA Ind protein sequences
The amino acid compositional variation of HA Ind proteins reported during 2009-2018 was compared with that of HA Cal using the ProtParam server (Figure 5) to understand the impact of the mutations on amino acid composition and, in turn, on the conformational stability of HA proteins (Supplementary File S4). A comparison of the statistical occurrence of each amino acid in HA Ind with HA Cal revealed a few interesting observations. For example, about 32 serine residues of the HA Cal strain mutated into threonine and asparagine in HA Ind, with a statistical occurrence of 18 and 14, respectively. This shows that the mutation of serine into threonine and asparagine is prevalent in Indian strains. The observation of S mutating into T (a crucial amino acid in forming the active site of an enzyme) and N (one of the critical factors reported to regulate viral replication) (Lee et al., 2019) has biological significance. In particular, the observation of the new N-glycosylation site 179NQTY182 (which also forms a part of EBS4 observed at 160 (G) SFYKNLIWLVKKGNSYPKLS (N) 179; Supplementary File S3), reported in the Indian strains from 2016 to 2018, is one such example. It is also important to note the intermediate stages of mutation (S to N) from 179SKSY182 to 179SQSY182 and, finally, to 179NQSY182 over the studied period (Table 6). Further examples of S mutating to T, in RBS4 (195GIHHPSTSADQQSLYQNA212 to 195GIHHPSTTADQQSLYQNA212) and RBS5 (218VGSSRYSKKFKPEIAI233 to 218VGTSRYSKKFKPEIAI233), are shown in Table 4.

Predicted secondary structure of HA Cal and HA Ind proteins
Secondary structures of HA Cal and HA Ind sequences, pertaining to their structural stability, were analyzed using the GOR IV web server. Figure 6 depicts the compactness of the 3D structure of HA strains in terms of the fraction of residues forming the structural elements, particularly helix, sheet, and random coil. Analysis of the composition of the secondary structure in all HA Ind sequences revealed a high proportion (50%) of random coils when compared to helices and extended sheets (25% each). It should be noted that the equal contribution of helices and extended sheets is retained in the HA Ind strains until 2015. In the HA Ind strains reported from 2016 to 2018, the overall helical content is reduced by 2%, and accordingly, the occurrence of both extended sheets and random coils is increased. This observation is reflected in the transformation of a few helices into extended sheets and random coils (for example, the amino acid segments 9-12, 230-234, and 236-241). Overall, the present analysis indicates the high occurrence of random coils in all selected HA Ind strains as one of the potentially unique characteristics of HA strains, which lowers the structural compactness (along with additional contributions from the conversion of helices to extended sheets and random coils). Such increased random coil segments enhance the structural flexibility, thereby promoting an effective interaction with other essential components of the host.

Electrostatic potential (ESP) of HA Cal and HA Ind proteins
The host cell defense mechanism is highly sensitive to the physicochemical nature of the interacting viral particle, and the emerging mutations perturb its sensing mechanism. At the molecular level, the EIs take a lead role in establishing strong complex formation. The electrostatic potential surfaces (ESPSs) of the HA proteins from both Indian (2018) and California strains are compared and contrasted to better understand the potential of the Indian strain (Figure 7). Electropositive and electronegative potential sites are shown as blue and red surfaces, respectively, along with the near-neutral residues as white surfaces. The mutated residues are labeled and indicated using yellow arrows. The ESP map, depicting the distribution of both positive and negative ESPSs of HA proteins, was generated using the Adaptive Poisson-Boltzmann Solver (APBS) to compare and contrast the electrostatic features of HA Ind-2018 and HA Cal proteins. Along with the ESPS, the effect of mutations on the solvent accessibility of both HA Ind-2018 and HA Cal proteins was also analyzed (Figure 7). Table 7 lists all mutated RBD residues in both HA Ind and HA Cal proteins. From the examination of the ESPS of both strains, it could be speculated that the HA Ind protein could attach to the receptor more efficiently due to the emergence of potential electrostatic interactions. Some of the mutations observed at the RBD of the HA Ind protein are predicted to affect the antibody neutralization mechanism either by introducing local conformational changes in the HA protein due to the S91R, S200P, S202T, S220T, I233T, A273T, and K300E mutations (Amin et al., 2020;Gan et al., 2022;Jawad et al., 2022) or by altering its surface charge distribution due to the D114N, K180Q, K300E, K319T, and E391K mutations. Such significant redistribution of the ESPS may promote increased resistance against known therapeutics when compared to the HA Cal strain.

Discussions
The HA protein of the influenza A (H1N1) virus is known to play a significant role in the entry of viruses into the host and their pathogenicity as well. An "effective HA target-based vaccine/drug" has become a pressing need for society. The complexity in designing HA inhibitors arose due to several factors, including the higher rate of missense/point mutations. The H1N1 strain, A/California/04/2009, is the closest neighbor of all strains reported in India during the 2009 pandemic. A methodical analysis of the HA proteins of Indian strains from 2009 to 2018 was performed and compared with that of the A/California/04/2009 strain. The HA Ind strains, reported with more specific mutations at a higher rate, emerged with enhanced virulence (Tharakaraman and Sasisekharan, 2015) and also became resistant to antiviral drugs such as oseltamivir, zanamivir, and peramivir (Parida et al., 2016;Tandel et al., 2018). These viruses with frequent reassortment at the sequence level evolved as more virulent than the previous seasonal H1N1 viruses (Baillie and Digard, 2013;Su et al., 2015;Luo et al., 2018a) and acquired better abilities to infect humans, which caused worse outbreaks (Luo et al., 2018b).

Evolutionary relation between HA proteins
Variations in the composition of secondary structures in both HA Cal and HA Ind strains. The percentage of alpha helices (blue bars), extended strands (red bars), and random coils (green bars) is depicted (please also refer to Table 2).

FIGURE 7
Electrostatic potential surface [...]

Here, we present a systematic analysis of the HA proteins of H1N1 to understand the adaptation and divergence among Indian [...] (Table 3). Analyses reveal that among the 16 mutations, seven were found in the receptor-binding sites (Table 4), four were in antigenic sites (Table 5), and three were involved in the formation of N-glycosylation sites (Table 6). The HA Ind strains are characterized by the mutations P100S, T214A, S220T, I338V, and E391K, i.e., possibly beneficial mutations that became fixed in the strains reported during 2009-2018 (Table 3). The literature suggests that the T214A substitution in HA genes decreases the binding affinity with the host receptor (de Vries et al., 2013). We observed six new amino acid mutations (S91R, S138N, S200P, K319T, I421M, and E523D) in HA Ind-2018. The mutations S91R and S200P were found to be unique to HA Ind-2018, and these substitutions were abundant in the complete HA population (in 2018) compared to the pandemic HA Cal. The substitutions A13T, S101N, D114N, I312V, S468N, and E516K were also observed in HA Ind and are also reflected in recent studies (Biswas et al., 2019;Prasad et al., 2020;Siddiqui et al., 2020). In accordance with our results, another research group carried out a mutational examination of H1N1 with random samples and observed that viruses circulating during 2017 have 18 detected substitutions in HA Ind (Jones et al., 2019). They also reported I233T, S179N, S181T, and I312V as new substitutions, among which S181T and I312V were presented as unique mutations in HA Ind isolates (Jones et al., 2019). Interestingly, we did not find I312V in 2017. The observed amino acid substitutions (S91, S200, S202, A214, and I233) have been found in receptor-binding sites envisaged to vary during the adaptation process to α2-6-linked sialic acid receptors in humans (Maines et al., 2009). The I223T amino acid substitution is linked with increased binding affinity to human α2-6-linked sialic acid receptors (Al Khatib et al., 2019).
Substitutions S200P and S202T are responsible for enhanced receptor-binding avidity by altering the receptor-binding affinity, whereas the A214T substitution is linked to decreased binding avidity (de Vries et al., 2013). A previous study suggested that S202T is one of the substitutions responsible for increased mortality and morbidity (Adam et al., 2019). Studies also support our observations that HA mutations such as P100S, T214A, S220T, I338V, and E391K are conserved mutations specific to the dominant variant(s) of influenza A (H1N1) viruses during post-pandemic circulation in India (Morlighem et al., 2011;Jones et al., 2019;Siddiqui et al., 2020). It is also evident from research that the substitutions S181T and I312V in HA could lead to altered glycan specificity (Jones et al., 2019). The substitution K180Q triggers conformational variation in ligand binding, which might also lead to the loss of specific ligand-binding properties (Jones et al., 2019). The mutation S179N, associated with glycosylation, is responsible for the increased pathogenicity of the viral particle by shielding antigenic sites from immune recognition (Al Khatib et al., 2019).
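The comparison described above, listing point substitutions of each yearly strain against the reference and then asking which ones persist across years, can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the sequences are short hypothetical fragments that are already aligned, and the function names `list_substitutions` and `fixed_substitutions` are invented for this example.

```python
def list_substitutions(reference: str, variant: str, offset: int = 1) -> list[str]:
    """List point substitutions (e.g. 'S6T') between two pre-aligned sequences.

    Gap columns ('-') are skipped; `offset` is the residue number of the
    first alignment column.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = []
    for index, (ref_aa, var_aa) in enumerate(zip(reference, variant)):
        if ref_aa != var_aa and "-" not in (ref_aa, var_aa):
            substitutions.append(f"{ref_aa}{index + offset}{var_aa}")
    return substitutions


def fixed_substitutions(per_year: dict[int, list[str]]) -> set[str]:
    """Return the substitutions present in every year of the survey."""
    year_sets = [set(subs) for subs in per_year.values()]
    return set.intersection(*year_sets) if year_sets else set()


# Hypothetical toy fragments, NOT real HA sequences.
reference = "MKAILSVLP"
variants = {2016: "MKTILSVLP", 2017: "MKTILTVLP"}

per_year = {year: list_substitutions(reference, seq) for year, seq in variants.items()}
conserved = fixed_substitutions(per_year)

print(per_year)
print(sorted(conserved))
```

Substitutions that appear in every surveyed year correspond to what the text calls conserved or fixed mutations, while those seen in a single year are candidates for strain-specific changes.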

Sequence and functional analysis of EBS, RBS, and N-glycosylation sites
Out of 84 mutational sites, about 12 of the most probable conserved mutational sites, at amino acid positions 100, 114, 180, 202, 214, 220, 273, 300, 338, 391, 468, and 516, have been observed in the last five consecutive years (Table 3). The HA Cal protein possesses seven characteristic receptor-binding sites (Hu, 2010), which were compared against those of all HA Ind strains. Indian strains mainly expressed serine-to-threonine, alanine-to-threonine, and lysine-to-glutamine mutations at various binding sites (RBS 4-7) over a period of time, i.e., mutations S220T (at RBS 5), S202T (at RBS 4), A273T (at RBS 6), and K300E (at RBS 7) were reported since 2010, 2014). The results suggest that these mutations may alter the RBD and confer resistance to available therapeutic options. In comparison, all HA Ind proteins emerged with more point mutations than the selected HA Cal, and these mutations play a significant role in evading the known immune defense mechanism and may further become life-threatening. Epitope mapping is of prime importance in the design of therapeutic monoclonal antibodies or vaccines, and any sequence-level mutations at the antigenic epitope sites hinder or delay the design of effective novel vaccines. In line with this, the studied Indian strains, which revealed significant mutations in the epitope-binding domains of HA proteins (Supplementary File S4), also delayed the successful identification of a vaccine for all Indian strains. Hence, faster evolution of the epitope-binding domain adds complexity to the eradication of the influenza A (H1N1) virus. Similar to epitope-binding sites, variations at N-glycosylation sites also increase complexity in the design of inhibitors (Zhang et al., 2004;Wei et al., 2010). Mutations in the glycosylation sites (Table 6) help the HA protein co-evolve with the host protein for the successful initiation of further infections.
The glycosylation of the influenza strain can disturb its host specificity, virulence, and contagious nature either directly, by changing the biological activity of surface proteins (Schulze, 1997), or indirectly, by 1) attenuating receptor-binding sites (Gao et al., 2009), 2) masking antigenic sections of the protein (Abe et al., 2004), 3) obstructing the cleavage of the HA protein precursor into the disulfide-linked subunits HA1 and HA2 (Ohuchi et al., 1989), and 4) regulating the catalytic activity or preventing proteolytic cleavage of the stalk of NA (Matsuoka et al., 2009). A report revealing the destabilization of the coiled coil of the HA protein due to the buried hydrophilic residue, Thr59, also supports the sequence- and structure-level distortions caused by the serine-to-threonine mutations observed in this study (Lin et al., 2018). Hence, the examined mutation-mediated structural diversification of the HA protein gains importance.
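To make the notion of a gained N-glycosylation site concrete, the sketch below scans a sequence for the canonical N-linked sequon, N-X-S/T with X being any residue except proline, and reports sequons that appear in a variant but not in the reference. The two fragments are hypothetical and do not reproduce the sequence context of any mutation reported here, the function name `find_sequons` is invented for this example, and the prediction tool used in the study may apply additional criteria.

```python
import re

# Canonical N-linked glycosylation sequon: Asn, any residue except Pro,
# then Ser or Thr. The lookahead lets overlapping sequons be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence: str, offset: int = 1) -> list[tuple[int, str]]:
    """Return (position of Asn, tripeptide) for every sequon in `sequence`."""
    return [(match.start() + offset, match.group(1)) for match in SEQUON.finditer(sequence)]


# Hypothetical toy fragments, NOT real HA sequences: the variant carries a
# serine-to-asparagine change at position 2.
reference_fragment = "KSSTGNPSA"
variant_fragment = "KNSTGNPSA"

reference_sites = find_sequons(reference_fragment)
variant_sites = find_sequons(variant_fragment)
gained_sites = [site for site in variant_sites if site not in reference_sites]

print("reference:", reference_sites)
print("variant:  ", variant_sites)
print("gained:   ", gained_sites)
```

In the toy fragments, the N-P-S tripeptide at position 6 is ignored in both sequences because of the proline, whereas the serine-to-asparagine change creates a new N-S-T sequon at position 2. A sequon marks only a potential site; whether it is actually glycosylated depends on structural context.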

Frontiers in Molecular Biosciences frontiersin.org
The ESPS characterized for the Indian isolate reported in 2018 revealed significant changes in the electrostatic surface, which are also presumed to strengthen the binding of HA proteins to the host receptors. The specific mutations observed in HA Ind-2018 (for example, S91R, S181T, S200P, I312V, K319T, I421M, and E523D) may increase the fitness of the virus in a new environment and host, which may reduce the efficacy of the available treatments. The mutation-mediated adaptability and efficacy of HA Ind proteins of the influenza A (H1N1) virus need to be studied critically.

Conclusion
In essence, our data emphasize the evolutionary relationship of the H1N1 strains that circulated in India during the post-pandemic period, 2009-2018, with the A/H1N1pdm09 pandemic reference strain. The present study clearly depicts the presence of frequent mutations in HA Ind proteins of the influenza A virus, which drifted significantly from the reference HA Cal strain A/California/04/2009. In addition, the mutational, structural, and functional characterization of the circulating influenza A strains indicates that the regionally reported mutations in all HA Ind proteins may be associated with their ability to persist locally and transmit efficiently among humans. In India, during the last few decades, recurrent episodes of influenza A virus infection have been reported among humans, which reflects, among several factors, 1) better detection technologies and 2) the need for constant surveillance to monitor any changes in the genomic content of the influenza A viruses that could initiate potential transmission and pronounced virulence among humans. The findings presented here offer better insight into the development of distinct next-generation therapeutic inhibitors by accounting for all observed mutations in the reported isolates.
In this study, the observed mutational drift results in the 1) alteration of receptor-binding domains, 2) generation of new-variant N-glycosylation and epitope-binding sites, and 3) even modifications at the structural level. Molecular investigations, however, are warranted to confirm the binding and antigenic potential of such residue changes at this point and their associated impact on morbidity and mortality as well. Hence, continued surveillance at a national level is required for the early detection of such genetic changes in viruses and the associated secondary emergence of antiviral resistance. Overall, the present work highlights additional information required for the design of more specific inhibitors with increased selectivity against Indian influenza A (H1N1) viruses.

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

Author contributions
AR and SP contributed to the conception and design of the study. SP performed and analyzed the in silico studies under the supervision of AR. MR and MS supported the additional analysis part. All authors approved the final version of the manuscript.