Structural analysis and molecular dynamics simulation studies of HIV-1 antisense protein predict its potential role in HIV replication and pathogenesis

The functional significance of the HIV-1 Antisense Protein (ASP) has been a paradox since its discovery. The expression of this protein in HIV-1-infected cells and its involvement in autophagy, transcriptional regulation, and viral latency have been reported sporadically in various studies. Yet, the definite role of this protein in HIV-1 infection remains unclear. Deciphering the 3D structure of HIV-1 ASP would throw light on its potential role in the HIV lifecycle and host-virus interaction. Hence, using extensive molecular modeling and dynamics simulation for 200 ns, we predicted plausible 3D structures of ASP from two reference strains of HIV-1, namely Indie-C1 (subtype C) and NL4-3 (subtype B), so as to derive its functional implications through structural domain analysis. In spite of sequence and structural differences between subtype B and subtype C ASP, both structures appear to share common domains such as the Von Willebrand Factor Domain-A (VWFA), Integrin subunit alpha-X (ITGSX), and the ETV6 transcriptional repressor, thereby reiterating the potential role of HIV-1 ASP in transcriptional repression and autophagy, as reported in earlier studies. Gromos-based cluster analysis and comparison of the centroid structures also supported the accuracy of the prediction. This is the first study to elucidate a highly plausible structure for HIV-1 ASP, which could serve as a starting point for further experimental validation studies.


Introduction
A negative-strand open reading frame (ORF) spanning the Human Immunodeficiency Virus-1 (HIV-1) genome was identified in 1988 and termed the Antisense Protein (ASP; Miller, 1988; Dimonte, 2017). ASP is strongly conserved among HIV-1 isolates, especially in group M (Berger et al., 2015; Cassan et al., 2016; Liu et al., 2018), but is absent in HIV-2, SIV, and non-group M HIV-1 strains. Since its discovery, a number of attempts have been made to characterize the nature and function of HIV-1 ASP, but with limited success, as the protein has no known homologs and its function remains a matter of debate. Not much is known about the structure of the protein either, except that it is ~189 amino acids long, hydrophobic, and has two transmembrane (TM) domains, two conserved cysteine triplets (CCC), and a PxxPxxP motif (Savoret et al., 2020). The ASP ORF overlaps with the env, rev, and tat genes of HIV-1, and specific mutations in this gene have been linked to different gp120 tropic signatures, suggesting a possible association with viral tropism (Dimonte, 2017). Ludwig et al. first reported the presence of a potential transcription initiator element situated in the antisense strand of the HIV-1 genome that generated antisense transcripts from the Transactivation Response element (TAR) region in the 3′ HIV-1 LTR (Ludwig et al., 2006), using the protocol of Haist et al. for the detection of antigenomic RNA (Haist et al., 2015). Subsequent studies provided evidence for the expression of the protein in HIV-1-infected cells and demonstrated its localization on the plasma membrane (Briquet and Vaquero, 2002; Clerc et al., 2011; Torresilla et al., 2013; Affram et al., 2019). The antibodies used for detection of HIV-1 ASP were raised in mice against ASP residues 47-61 (Torresilla et al., 2013) and residues 97-110 (Affram et al., 2019). Antibodies against ASP were first detected in HIV-1 patients' sera as early as 1995 (Vanhée-Brossollet et al., 1995).
More recently, cytotoxic CD8+ T-cell responses directed against various domains of ASP have been detected in HIV-1 patients, confirming ASP expression during HIV infection (Bet et al., 2015;Savoret et al., 2020).
Although there is emerging evidence to suggest that ASP has a role in the HIV life cycle and pathogenesis, strong experimental evidence to support its cellular function is lacking. The present study aimed to model a stable tertiary structure of ASP through extensive molecular dynamics simulation and to use it to predict the potential functional role of this protein in HIV-1 infection and pathogenesis. We modeled the ASP of two HIV-1 group M viruses, Indie-C1 (subtype C) and NL4-3 (subtype B), using I-TASSER and stringently constrained the secondary structural elements using MODELLER (v.10.1). The modeled structures were further refined iteratively until a high plausibility was achieved. Subsequently, DALI (Distance-matrix ALIgnment)-based structural alignment of the predicted structures was performed against all the proteins in the Protein Data Bank to predict the overlapping structural domains and suggest the probable functions of this unique protein.

Retrieval of ASP open reading frame
Due to the high degree of variability in HIV-1 sequences, we restricted our analysis to the ASP of two well-characterized HIV-1 reference strains, namely, Indie-C1 (subtype C) and NL4-3 (subtype B). The full-length sequences of these viruses were downloaded from the HIV Los Alamos database (https://www.hiv.lanl.gov/content/index). A customized PERL program was scripted to generate the reverse complement of the retrieved sequences, followed by prediction of long ORFs (100 bp or longer) using "ATGCCC" as the start string and "TAA," "TAG," and "TGA" as the stop strings. The retrieved ORFs were translated into amino acids using the ExPASy (Expert Protein Analysis System) translate tool (https://web.expasy.org/translate/; Gasteiger et al., 2003). Clustal Omega (ClustalO; https://www.ebi.ac.uk/Tools/msa/clustalo/; Madeira et al., 2022) was used to align the amino acid sequences to check for sequence conservation. Since the ASP ORF overlaps with the envelope ORF, we scanned the amino acid sequence of ASP for disordered regions/residues using the DISOPRED server (http://bioinf.cs.ucl.ac.uk/psipred/?disopred=1; Jones and Cozzetto, 2015).
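The ORF retrieval step can be illustrated with a short sketch. The Python code below is not the authors' PERL program; it is a minimal illustration under stated assumptions: the function names are hypothetical, each occurrence of the start string is scanned in its own reading frame up to the first in-frame stop codon, and the input is a toy sequence rather than a real HIV-1 genome.

```python
# Minimal sketch (not the authors' PERL script): scan the antisense strand
# for ORFs that begin with a given start string and end at the first
# in-frame stop codon. All names here are hypothetical.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, start: str = "ATGCCC", min_len: int = 100) -> list[str]:
    """Return ORFs (start string through stop codon) of at least min_len bp."""
    seq = seq.upper()
    orfs = []
    pos = seq.find(start)
    while pos != -1:
        # Walk codon by codon in the frame set by the start string.
        for i in range(pos, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                orf = seq[pos:i + 3]
                if len(orf) >= min_len:
                    orfs.append(orf)
                break
        pos = seq.find(start, pos + 1)
    return orfs


# Toy example (assumed data): a sense-strand fragment whose reverse
# complement contains a single 129 bp ORF.
toy_orf = "ATGCCC" + "GCT" * 40 + "TAA"
sense_strand = reverse_complement("GG" + toy_orf + "CC")
antisense = reverse_complement(sense_strand)
print(find_orfs(antisense))
```

In this sketch the 100 bp threshold and the "ATGCCC" start string mirror the values stated above; everything else, including how overlapping candidate ORFs are handled, is an illustrative choice.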

Structure modeling and refinement
In order to identify a suitable structural template for homology modeling of HIV-1 ASP, a BlastP search (Altschul et al., 1990) was performed against the PDB database (Berman et al., 2000). Since no suitable template was identified, the ASP structure was modeled using the threading method with the I-TASSER server (Zhang, 2008; Roy et al., 2010; Yang et al., 2015). Sequence-based secondary structure prediction was performed using the PSIPRED server (McGuffin et al., 2000). The predicted structures were refined iteratively for proper assignment of secondary structure elements (SSE) through the constrained assignment module of Modeller (v.10.1; Martí-Renom et al., 2000). The resulting model was loop refined and geometry optimized using Modeller and Modrefiner (Xu and Zhang, 2011), respectively. Homology modeling was performed using Modeller (v.10.1) by generating 50 models, and the model with the lowest DOPE score (Shen and Sali, 2006) was chosen for MD simulation for 200 ns using the Desmond package (Bowers et al., 2006).

Molecular dynamics simulation of the refined structures
The modeled structures of ASP_Indie-C1 and ASP_NL4-3 were further geometry optimized using the academic version of Maestro (Schrödinger Release 2022-3: Maestro, 2021). Bond orders were assigned, and the structures were protonated using the PROPKA module (Madhavi Sastry et al., 2013) with the pH set at 7.0. Subsequently, the structures were energy minimized using the OPLS-2005 force field (Shivakumar et al., 2012) with a Root Mean Square Deviation (RMSD) convergence cut-off of 0.3 Å. The geometry-optimized structures were subjected to MD simulation using Desmond-v4.5 with OPLS-2005 set as the force field. The proteins were solvated in a cubic box using the SPC water model. The system was then neutralized by adding counter ions and energy minimized with a maximum of 2,000 iterations and a convergence threshold of 1.0 kcal/mol/Å using the steepest descent and Limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithms (Liu and Nocedal, 1989). The bond lengths and bond angles of the molecules and the geometry of the water molecules were restrained using the SHAKE algorithm (Kräutler et al., 2001). Appropriate periodic boundary conditions were implemented, and Particle Mesh Ewald was applied for long-range electrostatic calculations (Bulatov et al., 2011). The system was equilibrated in an NPT ensemble at a temperature of 300 K and a pressure of 1 bar using Martyna-Tobias-Klein (Martyna et al., 1994) pressure coupling and Nose-Hoover temperature coupling (Nosé, 1984; Hoover, 1985). The well-equilibrated systems were then subjected to production runs of 200 ns. Subsequently, the Root Mean Square Deviation (RMSD), Root Mean Square Fluctuation (RMSF), radius of gyration, and secondary structure of the MD trajectories were analyzed.
Frontiers in Microbiology 03 frontiersin.org

Cluster analysis of the molecular dynamics simulation trajectories
In order to determine the conformational sampling of both ASP_Indie-C1 and ASP_NL4-3 during the simulation, a cluster analysis was performed. For this, the Desmond trajectory files were converted to a GROMACS-compatible format using Visual Molecular Dynamics (VMD; http://www.ks.uiuc.edu/Research/vmd/; Humphrey et al., 1996). The converted trajectories were used for backbone RMSD-based clustering with the g_cluster module of GROMACS, implementing the gromos method (Daura et al., 1999).
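The gromos method can be summarized as follows: for every frame, count the frames lying within the RMSD cutoff; take the frame with the most neighbours as the centroid of a cluster; remove it and its neighbours from the pool; and repeat until no frames remain. The Python sketch below illustrates this logic on a toy pairwise RMSD matrix. It is not the g_cluster implementation; the function name, the tie-breaking rule (lowest frame index wins), and the toy data are assumptions made for illustration.

```python
import numpy as np


def gromos_cluster(rmsd: np.ndarray, cutoff: float) -> list[tuple[int, list[int]]]:
    """Cluster frames from a symmetric pairwise RMSD matrix.

    Returns (centroid_index, member_indices) pairs in the order found,
    i.e., largest remaining neighbourhood first.
    """
    n = rmsd.shape[0]
    remaining = np.ones(n, dtype=bool)
    neighbours = rmsd <= cutoff  # each frame is its own neighbour
    clusters = []
    while remaining.any():
        # Count neighbours among the frames still in the pool.
        counts = (neighbours & remaining).sum(axis=1)
        counts[~remaining] = -1  # frames already assigned cannot be centroids
        centre = int(counts.argmax())
        members = np.flatnonzero(neighbours[centre] & remaining)
        clusters.append((centre, members.tolist()))
        remaining[members] = False
    return clusters


# Toy example (assumed data): five "frames" placed on a line, with the
# absolute difference standing in for the pairwise backbone RMSD in nm.
positions = np.array([0.0, 0.15, 0.3, 1.0, 1.05])
rmsd_matrix = np.abs(positions[:, None] - positions[None, :])
clusters = gromos_cluster(rmsd_matrix, cutoff=0.2)
for centre, members in clusters:
    print(f"centroid frame {centre}: members {members}")
```

With a real trajectory, the matrix would hold pairwise backbone RMSD values after superposition, and the centroid of each cluster would be the representative structure taken forward for analysis.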

Prediction of structural homologs
The final frame conformation of both the models obtained from the MD simulation was subjected to structural comparison with all the experimentally determined structures in PDB using the DALI server (http://ekhidna2.biocenter.helsinki.fi/dali/; Holm and Rosenström, 2010). The top-scoring PDB structures based on Z-score (Holm et al., 2008) in terms of structural homology were further probed for overlapping domains in order to assign probable functional significance to the modeled structures.

Retrieval of ASP ORF and amino acid sequence
The antisense ORF was retrieved from the Indie-C1 and NL4-3 whole-genome sequences using an in-house PERL script with "ATGCCC" as the start string and the three stop codons as stop strings (Figure 1). The ORFs were translated, and Clustal Omega was used to align the sequences. A high degree of similarity was observed in the first 100 amino acid residues between the two sequences. We identified confined repeats of two cysteine triplets (CCC) followed by a PxxPxxP motif within the first 60 residues. The overall identity between the two sequences was 75.4%; the conservation gradually diminished toward the C-terminus.
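Motifs such as the cysteine triplet and PxxPxxP can be located in a translated sequence with simple pattern matching. The following Python sketch is illustrative only: the function name is hypothetical, and the peptide is a made-up toy string, not the ASP sequence of either strain.

```python
import re

# Lookahead patterns so that overlapping occurrences are also reported.
MOTIFS = {
    "CCC": re.compile(r"(?=(CCC))"),
    "PxxPxxP": re.compile(r"(?=(P..P..P))"),
}


def find_motifs(protein: str) -> dict[str, list[int]]:
    """Return 1-based start positions of each motif in the protein sequence."""
    return {
        name: [m.start() + 1 for m in pattern.finditer(protein)]
        for name, pattern in MOTIFS.items()
    }


# Hypothetical toy peptide, NOT the real ASP sequence.
toy = "MACCCLACCCGPAAPLLPS"
print(find_motifs(toy))
```

Here "x" in PxxPxxP is treated as any single residue, which is the usual reading of the motif notation.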

Secondary structure prediction
Sequence-based secondary structure prediction was attempted for the Indie-C1 and NL4-3 ASP (Figure 2) using the PSIPRED 4 server (http://bioinf.cs.ucl.ac.uk/psipred/; McGuffin et al., 2000), and compared with the secondary structure elements in the 3D structure derived using STRIDE (http://webclu.bio.wzw.tum.de/cgi-bin/stride/stridecgi.py; Heinig and Frishman, 2004). The sequence-based secondary structure prediction was used as a constraint in the predicted models. The lack of homology of the antisense protein to any single PDB structure led us to implement the constrained strategy, as PSIPRED 4 has 84.2% prediction accuracy (Buchan and Jones, 2019). The secondary structures predicted by the two servers had discrepancies at the C-terminus. This difference can be attributed to the ~25% dissimilarity found across the amino acid sequences when aligned with Clustal Omega (Supplementary Figure 1) and the high dissimilarity found in the C-terminal region. The amino acid sequences were analyzed using DISOPRED to identify residues/regions contributing to structural disorder. It was found that only the terminal residues [residue number 1 (in both NL4-3 and Indie-C1) and 191 (in Indie-C1)] contributed to structural disorder, and this contribution was negligible (Supplementary Figures 2A,B).

Molecular modeling of ASP_Indie-C1
The ASP_Indie-C1 sequence was submitted to the I-TASSER server for molecular modeling. The modeled structure gave a significant C-score of −3.46. The structural compliance of sequence-based secondary structure elements (SSEs) in the predicted model was checked, and proper SSEs were assigned using the restrained modeling module of MODELLER (v.10.1). The conformation with the lowest DOPE score (−20331.8) was taken forward for refinement using Modrefiner. The quality of the refined model was checked using the Ramachandran plot (Figure 3), and outliers were corrected by sequential refinement and loop modeling. The final loop-refined structure had 90% of the residues in the most favored region, with no residues in the disallowed or generously allowed regions. The structure also showed an ERRAT score of 68.9 (Supplementary Figure 4) and a PROSA score of −3.79 (Supplementary Figures 3A-C), indicating the good quality of the modeled structure.

Molecular dynamics simulation of ASP_Indie-C1
The modeled ASP_Indie-C1 structure was subjected to molecular dynamics simulation for 200 ns, and the resulting trajectory was analyzed for various parameters. The RMSD plot showed that the backbone deviated by up to 6 Å, settled at ~5.6 Å from 80 ns onward, and remained at that level for the rest of the simulation (Figure 4A). The RMSF plot showed a few residues in the N-terminal region fluctuating up to ~4.5 Å, while the C-terminal residues fluctuated up to ~9 Å (Figure 4B). The structural compactness during the simulation period was assessed using the radius of gyration plot (Figure 4C), and the structure was found to be highly compact, as the gyration values varied within ~1 Å. The protein secondary structure exhibited perturbations during the simulation period. It was observed that 24.68% of the residues formed secondary structure elements in total; 11.96% of the residues were in helical form and 12.72% were present as strands during the simulation period (Figure 4D). The lowest potential energy conformation of −86110.98 kcal/mol was observed at 36.08 ns, while the final frame structure at the 200th ns showed a potential energy of −85521.29 kcal/mol (Figure 4E).

FIGURE 2
Sequence-based secondary structure prediction. (A) Sequence-based secondary structure assignment for ASP_Indie-C1 using PSIPRED; (B) Secondary structure of the modeled ASP_Indie-C1 predicted using STRIDE; (C) Sequence-based secondary structure assignment for ASP_NL4-3 using PSIPRED; and (D) Secondary structure of the modeled ASP_NL4-3 predicted using STRIDE.

Molecular modeling of ASP_NL4-3
The final frame structure of ASP_Indie-C1 (Figure 4E) was chosen as the template to model the ASP_NL4-3 structure using Modeller (v.10.1), and the conformation with the lowest DOPE score (−19670.75) was taken up for subsequent analysis. As in the case of ASP_Indie-C1, the secondary structure elements of ASP_NL4-3 were used to constrain its 3D model so as to closely comply with the PSIPRED prediction (Figures 2C,D). The structure was further refined using Modrefiner. The quality of the refined structure was checked using the Ramachandran plot. Here, a few loop-forming residues were found in the disallowed region, and therefore, loop refinement was carried out. This resulted in 92% of the residues falling within the most favored region, and no residues in the disallowed or generously allowed regions (Figure 5). This structure gave an ERRAT score of 57.78 (Supplementary Figure 6) and a PROSA score of −3.78 (Supplementary Figures 5A-C), which supports the plausibility and quality of the modeled structure.

Molecular dynamics simulation of ASP_NL4-3
The refined structure of ASP_NL4-3 was geometry optimized using the protein preparation wizard of the Schrödinger Suite, and the model was simulated for 200 ns. From the RMSD plot, a deviation of 4 Å was observed in the initial period; the RMSD stabilized around 180 ns at ~4.5 Å and remained at that level for the rest of the simulation (Figure 6A). The RMSF plot showed that a few residues in the N-terminus reached a fluctuation of ~4 Å, while the C-terminal residues showed fluctuations of up to ~6 Å (Figure 6B). The compactness of the protein structure during the simulation period was assessed using the radius of gyration plot (Figure 6C). The gyration values varied within a range of ~1 Å, indicating the structural compactness of the modeled protein. Changes in the secondary structure of the protein during the simulation period were analyzed, and it was observed that 17.8% of the residues formed secondary structure elements in total. The structure had 12.48% of the residues in helical form and 5.32% of the residues in strand form throughout the MD simulation (Figure 6D). However, the secondary structure was lost in a few regions during the simulation. The lowest potential energy of −99946.31 kcal/mol was observed at 74.78 ns, while the final frame structure at the 200th ns showed a potential energy of −99275.09 kcal/mol (Figure 6E).
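For readers unfamiliar with the two trajectory metrics used above, the following Python sketch shows how a per-frame radius of gyration and a superposition-based RMSD are commonly defined. It is not the Desmond analysis code used in this study; the function names, the mass-weighted form of the radius of gyration, the use of the Kabsch algorithm for superposition, and the toy coordinates are all assumptions made for illustration.

```python
import numpy as np


def radius_of_gyration(coords: np.ndarray, masses: np.ndarray) -> float:
    """Mass-weighted radius of gyration of one frame (N x 3 coordinates)."""
    centre = np.average(coords, axis=0, weights=masses)
    sq_dist = ((coords - centre) ** 2).sum(axis=1)
    return float(np.sqrt(np.average(sq_dist, weights=masses)))


def kabsch_rmsd(mobile: np.ndarray, reference: np.ndarray) -> float:
    """RMSD after optimal rigid-body superposition (Kabsch algorithm)."""
    # Remove translation by centring both coordinate sets.
    p = mobile - mobile.mean(axis=0)
    q = reference - reference.mean(axis=0)
    # The optimal rotation follows from the SVD of the covariance matrix.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Correct for a possible reflection so the result is a proper rotation.
    d = np.sign(np.linalg.det(u @ vt))
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    diff = p @ rot - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))


# Toy example (assumed data): four atoms, then the same atoms rotated by
# 90 degrees about the z axis and translated.
reference = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                      [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
moved = reference @ rot_z.T + np.array([1.0, 2.0, 3.0])

print(kabsch_rmsd(moved, reference))
print(radius_of_gyration(np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]),
                         np.array([1.0, 1.0])))
```

Because the second coordinate set is only a rotated and translated copy of the first, the RMSD after superposition should be numerically close to zero; a trajectory frame that has genuinely changed shape would give a larger value.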

Cluster analysis of the molecular dynamics simulation trajectories
The conformational stability of both ASP_Indie-C1 and ASP_NL4-3 throughout the simulation was assessed by cluster analysis. Based on the RMSD distribution, it was observed that both trajectories had a deviation of ~0.2 nm (Figure 7). Therefore, clustering was performed with a cutoff value of 0.2 nm for both trajectories using the gromos method. This resulted in four and eight prominent clusters for ASP_Indie-C1 and ASP_NL4-3, respectively. The representative centroid structure for each cluster was obtained at the timeframes shown in Figures 8, 9, respectively. The average potential energy attained was −85451.782 ± 90.820 and −80487.372 ± 118.227 for ASP_Indie-C1 and ASP_NL4-3, respectively. The structures with the lowest potential energy (at the 55th and 75th ns for ASP_Indie-C1 and ASP_NL4-3, respectively) were found to exhibit stable conformations (Supplementary Figures 7A-D).

Comparative analysis of structural homologs
The final frame structures of ASP_Indie-C1 and ASP_NL4-3 resulting from the MD trajectories were superimposed on each other (Figure 10) and compared. Subsequently, the final frame structures were submitted to the DALI server for structural alignment with all the experimentally determined protein structures in the PDB database. The top 10 hits for both structures were compared and analyzed (Table 1; Figures 11A,B). The hits had an RMSD of less than 4 Å and a Z-score (significant similarity score) greater than 2. From the results, it was found that ASP_Indie-C1 and ASP_NL4-3 aligned with the VWFA domain (visualized using Pfam in DALI) present in all the top-ranking protein structures. We also submitted a representative frame structure from each centroid cluster of both ASP_Indie-C1 and ASP_NL4-3 to the DALI server and obtained similar hits (Supplementary Figures 8, 9).

Discussion
The role of HIV-1 Antisense Protein continues to remain an enigma even after three decades of its initial discovery. Though several Absence of a known homolog and lack of a three-dimensional structure are major challenges to defining the precise function of this protein.
In the present study, we modeled a stable three-dimensional structure for the protein and used it to uncover clues regarding its potential role in HIV infection.
Due to high C-terminus variability in the ORFs of ASP, we could not derive a full-length consensus sequence for the protein. We therefore used the ASP sequences of Indie-C1 and NL4-3, two well-characterized reference strains for HIV-1 subtype C and subtype B, respectively, for structure prediction. Using an in-house PERL program, the ASP ORFs were retrieved from the antisense strand of the Indie-C1 and NL4-3 genomes. While the C-terminal region showed a significant level of sequence variability, the N-terminal region was largely conserved. Both sequences had two conserved CCC motifs and an SH3-binding (PxxPxxP) motif in the N-terminal region. This feature has been reported in earlier studies which analyzed the expression and localization of ASP in NL4-3-infected Jurkat cells. Overlapping ORFs could contain residues that favor structural disorder in proteins, resulting in the requirement of a binding partner for stabilization (Pavesi et al., 2018). Hence, we analyzed the ASP sequence for the presence of potential residues that might contribute to structural disorder using the DISOPRED server but did not find any significant core residues contributing to disorder. Secondary structure prediction by PSIPRED and STRIDE identified two major transmembrane (TM) helices in Indie-C1 and NL4-3 ASP. Clerc et al. also demonstrated sub-cellular localization of the protein using the wild-type ASP and a mutant, ASPmut66, which lacked the transmembrane domain. The mutant protein did not localize on the plasma membrane and showed a different staining pattern as compared to the wild-type protein (Clerc et al., 2011), providing experimental support for the presence of TM domains in ASP and suggesting its prominent association with the plasma membrane.
We first modeled the tertiary structure of ASP using the Indie-C1 sequence, and subsequently used it as the template for modeling the ASP_NL4-3 structure. The best model of ASP_Indie-C1 gave a significant C-score of −3.46 in I-TASSER. We also used AlphaFold 2 (https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb; Jumper et al., 2021) to predict the 3D structure (data not shown), but the model did not conform to the secondary structure elements predicted by PSIPRED and STRIDE; hence, we chose the model predicted by I-TASSER, which showed better agreement with the SSE prediction. This model was refined to assign appropriate secondary structure elements using Modeller (v.10.1). The model with the lowest DOPE score was refined using Modrefiner. The quality of the refined structure was evaluated using the Ramachandran plot, PROSA, and SAVES. The final model was prepared using the protein preparation wizard of the Schrödinger Suite, and molecular dynamics (MD) simulation was carried out using Desmond for a time scale of 200 ns. As seen from the RMSD plot of ASP_Indie-C1, deviations were observed until the simulation reached 80 ns, after which the structure achieved stability. A gradual increase in RMSF was observed at the carboxyl end, indicating inconsistency at the C-terminus of the protein. The protein exhibited compactness after 80 ns of simulation. The Rg was found to be ~16.7 Å from the radius of gyration plot. We observed four helices (~10-30, ~60-70, ~80-95, and ~150-160) and four strands (~35-40, ~105-110, ~130-135, and ~160-163) in the SSE plot, which correlates with the predicted secondary structure (Figure 2A). Taking into account all the above parameters, we postulate that the final frame structure at 200 ns could represent a stable structure for ASP.
The tertiary structure of ASP_NL4-3 was generated using ASP_Indie-C1 as the template, and the structure was modeled using Modeller (v.10.1). The model with the lowest DOPE score was further refined, and the model quality was assessed using the Ramachandran plot, PROSA, and SAVES. The RMSD and Rg plots indicate that the ASP_NL4-3 structure attained stability only toward the end of the simulation period, at 180 and 160 ns, respectively, with an Rg value of ~16.7 Å. Though the RMSF showed high fluctuation, SSE analysis revealed the presence of two helices at the carboxyl end, of which one was diminished after 30 ns of simulation. The ASP_NL4-3 structure also had four helices and strands like the ASP_Indie-C1 structure. Though the minor strands predicted by PSIPRED were not observed, the major helices and strands were present in both the predicted secondary structures. The final frame models of both ASPs were superimposed to check for structural resemblance, which gave a backbone RMSD of 6.637 Å. Although ~75% sequence similarity was found between the two ASPs, the models underwent large conformational changes during MD simulation, with the conformational differences attributable to the lower sequence homology at the C-terminus. The final frames were submitted to the DALI server for structural comparison with the PDB entries. The top 10 hits were selected for each model, and the common hits were identified (Table 1). The functional significance of the identified homologs was analyzed to predict the probable biological role of ASP in HIV infection.

FIGURE 8
The centroid structures of each cluster obtained from the clustering of the ASP_Indie-C1 trajectory: (A) Cluster 1 (C1) 1,000th ps, (B) Cluster 2 (C2) 81,100th ps, (C) Cluster 3 (C3) 29,800th ps, and (D) Cluster 4 (C4) 50,500th ps.

One of the shared structures predicted in the topmost hits was the Von Willebrand Factor domain-A (VWFA), which is present across the family of integrins, complement factors, collagens, and numerous other extracellular proteins. These proteins function as multimeric protein complexes and play a role in key biological events such as cell-cell adhesion, cell migration, and signal transduction. We found that around 40% of the amino acid residues across ASP aligned with the VWFA domain, suggesting that this domain could likely be conformational rather than linear (Supplementary Figures 10A,B). ASP was also predicted to share homology with ETV6, which is a transcriptional repressor. A histone acetyl transferase protein called TIP60 serves as a co-repressor along with ETV6 (Nordentoft and Jørgensen, 2003). An earlier study showed that HIV-1 tat interacts with TIP60, leading to inhibition of its acetyl transferase activity (Creaven et al., 1999). Col et al. also reported that tat-mediated TIP60 inhibition affects important regulatory events such as DNA repair and apoptosis due to DNA damage. Tat intercepts TIP60-facilitated apoptosis and provides sufficient time for proviral DNA transcription to increase virion production (Col et al., 2005). HIV-1 antisense transcription was found to be more active in monocyte-derived cells than in activated T cells, and unaltered in the absence of tat (Laverdure et al., 2012). Tat-mediated downregulation of ASP has also been reported (Michael et al., 1994), and this negative correlation is thought to be due to 5′ and 3′ LTR transcription. The Integrator Complex Subunit 13 (INTS13) was another hit predicted by DALI. Interestingly, using HIV-1 promoter and genome-wide analysis, Stadelmayer et al. showed that the target genes of the Integrator Complex (INSTcom) were enriched in the HIV-1 transactivation response (TAR) element/negative elongation factor (NELF)-binding element.
They demonstrated that RNAPII pause-release was mediated by NELF at/from coding genes that was controlled by INSTcom, thereby affecting their processivity (Stadelmayer et al., 2014). Here, we identified a link between HIV-1 ASP and INSTcom, indicating a possible role for the protein in transcriptional repression and gene regulation. Zapata et al. went on to show that the ASP RNA transcripts interact with and recruit polycomb repressor complex 2 (PRC2) to the HIV-1 5′ LTR, leading to reduced binding of RNAPII and mRNA processivity (Zapata et al., 2017). They hypothesize that epigenetic silencing of the HIV-1 5'LTR through the PRC2 pathway could contribute to proviral latency.
The Integrin subunit alpha-X (ITGSX) came up as another hit near the C-terminus of the protein in the DALI prediction. Although we observed a reduced number of residues in subtype B ASP as compared to subtype C, particularly at the C-terminus, the two functional domains predicted at the C-terminus of the protein, i.e., Voltage-Gated Calcium Channel alpha 2 and Integrin alpha 2, were conserved in both the models. ITGSX is known to form a complex with β2 integrin (CD18) and aggregate in the intracellular compartment as well as the plasma membrane of monocyte-derived macrophages (MDM; Pelchen-Matthews et al., 2012). ASP was found to translocate to the cytoplasm from the nucleus and localize on the cell membrane when U1C8 cells were stimulated with PMA (Affram et al., 2019). This observation supports the probable membrane localization property of ASP and suggests that it could be an integral cell surface protein. Multiple hypotheses suggest that ASP has a direct or indirect involvement in HIV-1 replication by inducing autophagy in monocytes (Vanhée-Brossollet et al., 1995; Torresilla et al., 2013). ASP has also been reported to induce autophagy in monocytic cells through its interaction with the autophagy markers LC3b-II and Beclin 1 (Torresilla et al., 2013). Another study also demonstrated the autophagy-inducing property of ASP in different clades of HIV-1 and its co-localization with p62 and LC3-II in autophagosome-like structures (Liu et al., 2018). Thus, it appears that ASP within the nucleus can cause transcriptional repression, and could also shuttle from the cytoplasm to localize on the plasma membrane and induce autophagy by interacting with LC3b-II and Beclin 1 for successful virion release. The presence of a cysteine-rich region in HIV-1 ASP suggests strong agglomeration/multimerization of the protein, which also links it with the autophagic pathway. In order to determine the consistency of domain occurrence across the MD trajectory, one representative centroid structure per gromos-based cluster was subjected to DALI analysis. This revealed that VWFA and ITGSX were consistently represented in all the centroid structures of the clusters, thereby supporting the predictive accuracy of the structural bioinformatics methods implemented.

FIGURE 9
The centroid structure of each cluster obtained from the clustering of the ASP_NL4-3 trajectory: (A) Cluster 1 (C1) 5,940th ps, (B) Cluster 2 (C2) 11,600th ps, (C) Cluster 3 (C3) 13,640th ps, (D) Cluster 4 (C4) 1,200th ps, (E) Cluster 5 (C5) 33,500th ps, (F) Cluster 6 (C6) 13,750th ps, (G) Cluster 7 (C7) 31,500th ps, and (H) Cluster 8 (C8) 98,400th ps.

Table 1 abbreviations: Z, optimized similarity score; rmsd, root mean square deviation; lali, length of alignment; nres, total number of residues; and %ID, percent identity.

Conclusion
This study is the first attempt to predict a plausible tertiary structure for HIV-1 ASP. Through in-depth bioinformatics analysis, we identified a number of structural homologs that provide valuable clues to the potential role of this protein in viral replication and disease pathogenesis. The most plausible functions of this protein as predicted from the hits appear to be transcriptional repression, autophagy, and viral latency. Prospective studies aimed at confirming the function of ASP in experimental studies would provide deeper insights into its role in HIV infection, and unravel clues for potential therapeutic interventions to cure HIV-1 infection.

FIGURE 11
Pfam alignment from DALI server: (A) Alignment of ASP_NL4-3 with top 10 ranking matches from DALI. (B) Alignment of ASP_Indie-C1 with top 10 ranking matches from DALI. Dark green shading marks structural equivalence with the query structure, light green shading means the domain is located in an unaligned region. Structurally equivalent domain families are shown as vertically aligned dark green blobs.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding authors.