Comparative analysis of codon usage patterns of Plasmodium helical interspersed subtelomeric (PHIST) proteins

Background Plasmodium falciparum is a protozoan parasite that causes the most severe form of malaria in humans worldwide, which is predominantly found in sub-Saharan Africa, where it is responsible for the majority of malaria-related deaths. Plasmodium helical interspersed subtelomeric (PHIST) proteins are a family of proteins, with a conserved PHIST domain, which are typically located at the subtelomeric regions of the Plasmodium falciparum chromosomes and play crucial roles in the interaction between the parasite and its human host, such as cytoadherence, immune evasion, and host cell remodeling. However, the specific utilization of synonymous codons by PHIST proteins in Plasmodium falciparum is still unknown. Methods Codon usage bias (CUB) refers to the unequal usage of synonymous codons during translation, resulting in over- or underrepresentation of certain nucleotide patterns. This imbalance in CUB can impact various cellular processes, including protein expression levels and genetic variation. To investigate this, the CUB of 88 PHIST protein coding sequences (CDSs) from 5 subgroups were analyzed in this study. Results The results showed that both codon base composition and relative synonymous codon usage (RSCU) analysis identified a higher occurrence of AT-ended codons (AGA and UUA) in PHIST proteins of Plasmodium falciparum. The average effective number of codons (ENC) for these PHIST proteins was 36.69, indicating a weak codon preference among them, as it was greater than 35. Additionally, the correlation analysis among codon base composition (GC1, GC2, GC3, GCs), codon adaptation index (CAI), codon bias index (CBI), frequency of optimal codons (FOP), ENC, general average hydropathicity (GRAVY), aromaticity (AROMO), length of synonymous codons (L_sym), and length of amino acids (L_aa) revealed the influence of base composition and codon usage indices on codon usage bias, with GC1 having a significant impact in this study. 
Furthermore, the neutrality plot analysis, PR2-bias plot analysis, and ENC-GC3 plot analysis provided additional evidence that natural selection plays a crucial role in determining codon bias in PHIST proteins. Conclusion In conclusion, this study has enhanced our understanding of the characteristics of codon usage and genetic evolution in PHIST proteins, thereby providing data foundation for further research on antimalarial drugs or vaccines.


Introduction
Proteins are the primary bearers of life activities and are encoded by codons, nucleotide triplets that are translated into amino acids. Therefore, the final function of a protein can be influenced by its codon usage. Proteins are primarily composed of 20 standard amino acids, and different species use various synonymous codons to encode the 18 amino acids other than methionine (Met) and tryptophan (Trp) (Parvathy et al., 2022; Jiang et al., 2023). Although the genetic code has evolved, it remains highly conserved, allowing for the use of different codons to encode the same amino acid (Chaney and Clark, 2015; Bailey et al., 2021). The usage of synonymous codons is not uniform or predictable across organisms, genes, or even within the same gene in different species. Certain codons are often favored for encoding specific amino acids (Pakrashi et al., 2023; Wang et al., 2023; Zhao et al., 2023). Codon usage bias (CUB) refers to the phenomenon where synonymous codons are used with varying frequencies (Iriarte et al., 2021; Alqahtani et al., 2022; Chen and Yang, 2022; Khandia et al., 2022). Synonymous mutations, also known as "silent mutations," do not alter the original protein sequence or structure. However, variations in synonymous codons among organisms can significantly contribute to genome evolution. Previous studies have identified several factors that influence CUB in different organisms, with natural selection (e.g., translation, gene length, and gene function) and mutation pressure (e.g., GC content and mutation position) considered the fundamental factors (Bhattacharyya et al., 2019; Shen G. et al., 2022; Shen X. et al., 2022; Hu et al., 2023; Matsushita and Kano-Sueoka, 2023). CUB has a significant impact on various cellular processes, including mRNA stability, transcription, translation efficiency and accuracy, and protein structure, folding, expression, and function. Understanding CUB has practical applications in heterologous gene expression, species identification, primer design, predicting gene expression levels and functions, and designing synthetic genes for biotechnological purposes (Gorlov et al., 2018; Chassalevris et al., 2020; Wang and Blenner, 2022; Hernandez-Alias et al., 2023; Vaz et al., 2023). However, most studies on CUB have focused on bacteria, fungi, viruses, and mycoplasma (Dilucca et al., 2020; Hou, 2020; Wu et al., 2021; Li et al., 2022). Currently, there is still a lack of comprehensive understanding regarding the genetic features of codon bias in parasites, particularly in Plasmodium falciparum.
The World Malaria Report 2022 indicates that there were 247 million cases of malaria globally in 2021, with an estimated 619,000 deaths. Cases are mainly concentrated in certain countries and regions in Africa, Asia, and the Americas, so global malaria control still faces many challenges. Over the past 20 years, the development and large-scale promotion of rapid diagnostic tests (RDTs), artemisinin-based combination therapy (ACT), and insecticide-treated nets (ITN) have been key to the successful global response to malaria (WHO, 2022). Plasmodium falciparum goes through various stages in its lifecycle, beginning with the injection of sporozoites into the bloodstream by mosquitoes. These sporozoites invade liver cells and multiply, producing merozoites. The merozoites are released into the bloodstream and infect red blood cells. Inside the red blood cells, the parasites multiply and eventually cause the cells to rupture, releasing more merozoites to infect other red blood cells. Multiple studies have identified various invasion-related protein molecules mediating the invasion process of Plasmodium falciparum, such as Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1), Plasmodium falciparum merozoite surface protein 1 (PfMSP1), Plasmodium falciparum apical membrane antigen 1 (PfAMA1), and Plasmodium falciparum rhoptry neck protein 4 (PfRON4), with PfEMP1 being particularly important (Lee et al., 2023; Leonard et al., 2023; Pulido-Quevedo et al., 2023; Wiser, 2023). In recent years, the involvement of Plasmodium helical interspersed subtelomeric (PHIST) proteins in the invasion process has also been investigated. PHIST is a unique protein family of Plasmodium falciparum, and the PHIST proteins are characterized by a conserved domain of approximately 150 amino acids that is predicted to form four consecutive alpha helices. Some members of this family consist of an export signal sequence, the PEXEL motif, and the PHIST domain, while others also contain a DnaJ domain and tryptophan residue (Fierro et al., 2023; Hasan et al., 2023). In addition, the PHIST family includes different subgroups, such as PHISTa, PHISTa-like/PHIST, PHISTb, PHISTb-DnaJ, and PHISTc, which play crucial roles in the parasite's biology, including host cell invasion, immune evasion, and modulation of the host immune response (Warncke et al., 2016; Kumar et al., 2019; Mutisya et al., 2020; Yang et al., 2020). Although the functional domains of PHIST proteins are conserved, the coding sequences vary significantly, and there is limited research on their codon usage. To gain insights into the genetics and evolution of Plasmodium falciparum, as well as to predict the function and regulation mechanisms of related genes, we conducted a systematic analysis and comparison of CUB among 88 sequences of PHIST proteins from five subgroups. We constructed a phylogenetic tree based on the relative synonymous codon usage and compared it with the tree constructed using the CDSs of PHIST proteins. This analysis of CUB provides further understanding of the genetic characteristics and evolution of Plasmodium falciparum.

Sequences
A total of 88 PHIST protein CDSs from Plasmodium falciparum were retrieved from PlasmoDB for CUB analysis. The 88 PHIST proteins were divided into 5 subgroups (PHISTa, PHISTa-like/PHIST, PHISTb, PHISTb-DnaJ, PHISTc), and detailed information about these proteins is listed in Table 1.

Analysis of codon base composition
The codon base composition of the 88 PHIST proteins was determined with CodonW software. Only 59 synonymous codons were analyzed in this study; the AUG codon (Met), the UGG codon (Trp), and the three termination codons (UAG, UAA, and UGA) were excluded. The nucleotide frequencies at the third codon position (C3%, T3%, G3%, and A3%), the GC contents at each of the three codon positions (GC1%, GC2%, and GC3%), and the total GC and AT contents (GCs% and ATs%) were measured (Boissinot, 2022).
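The position-specific GC measures described above can be illustrated with a minimal sketch. The function name `codon_position_gc` is hypothetical, and the sketch counts every codon in the sequence; CodonW applies its own rules (for example, restricting some measures to synonymous codons), so the numbers are not guaranteed to match its output.

```python
def codon_position_gc(cds):
    """Return GC1, GC2, GC3, GC12 and overall GC (in percent) for an in-frame CDS.

    Simplifying assumption: all codons are counted, including Met, Trp and
    stop codons, which CodonW may treat differently.
    """
    seq = cds.upper().replace("U", "T")
    # Drop a trailing incomplete codon, if any.
    seq = seq[: len(seq) - len(seq) % 3]
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if not codons:
        raise ValueError("sequence is shorter than one codon")

    result = {}
    for pos in range(3):
        gc_count = sum(1 for codon in codons if codon[pos] in "GC")
        result["GC%d" % (pos + 1)] = 100.0 * gc_count / len(codons)
    # GC12 is the mean of GC1 and GC2, as used in the neutrality plot.
    result["GC12"] = (result["GC1"] + result["GC2"]) / 2
    result["GC"] = 100.0 * sum(1 for base in seq if base in "GC") / len(seq)
    return result
```

For the toy sequence `GGAGCT` (codons GGA and GCT), this gives GC1 = GC2 = 100%, GC3 = 0%, and an overall GC content of about 66.7%.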

Analysis of codon usage indices
Multiple factors can have an impact on CUB; thus, many statistical methods have been proposed to analyze codon usage indices. The codon adaptation index (CAI) quantifies the similarity between the codon usage of a gene or sequence and that of a reference set, typically a set of highly expressed genes in the same organism. It is calculated by comparing the frequency of each codon in the gene of interest with the frequency expected from the codon usage observed in the reference set. The CAI value ranges from zero to one; the larger the value, the stronger the codon usage bias (Zhou et al., 2023). The codon bias index (CBI) reflects the proportion of preferred codons from highly expressed genes in a specific gene, and a CBI value near zero indicates that codons are used randomly (Masłowska-Górnicz et al., 2022). The frequency of optimal codons (FOP) ranges from 0.36 (weak codon usage bias) to 1 (strong codon usage bias) and is calculated as the ratio of the number of optimal codons to the total number of synonymous codons in a specific gene (Li et al., 2023). The effective number of codons (ENC) considers the number of synonymous codons used for each amino acid and the frequency with which they are used in the sequence. The ENC value ranges from 20 to 61; a value below 35 indicates strong codon usage bias, whereas a value above 35 indicates weak bias approaching random codon usage (Tyagi and Nagar, 2022). The general average hydropathicity (GRAVY) values range from −2 to 2; positive and negative values represent hydrophobic and hydrophilic proteins, respectively (Munjal et al., 2020). The aromaticity (AROMO) value represents the frequency of aromatic amino acids (Phe, Tyr, and Trp) in a specific gene (Khandia et al., 2019). In addition, the length of synonymous codons (L_sym) and the length of amino acids (L_aa) represent the number of synonymous codons and the number of translatable codons, respectively (Yang et al., 2021).
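Two of the protein-level indices above, GRAVY and aromaticity, are simple averages over the amino acid sequence and can be sketched as follows. The function names are hypothetical, and the hydropathy values are the standard Kyte-Doolittle scale, which is assumed here to be the scale behind the GRAVY values reported by CodonW.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AROMATIC = set("FYW")  # Phe, Tyr, Trp


def _standard_residues(protein):
    """Keep only the 20 standard residues (drops '*', 'X', gaps, etc.)."""
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    return residues


def gravy(protein):
    """Mean hydropathy; positive suggests hydrophobic, negative hydrophilic."""
    residues = _standard_residues(protein)
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def aromaticity(protein):
    """Fraction of residues that are Phe, Tyr or Trp."""
    residues = _standard_residues(protein)
    return sum(1 for aa in residues if aa in AROMATIC) / len(residues)
```

A lysine- and aspartate-rich peptide such as `KKDD` yields a negative GRAVY (hydrophilic), while `FWYA` has an aromaticity of 0.75.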

Analysis of relative synonymous codon usage
The relative synonymous codon usage (RSCU) value is an index used to analyze codon usage bias in genomic or transcriptomic sequences. It compares the observed frequency of each synonymous codon (a codon that codes for the same amino acid) with the frequency expected if all synonymous codons for that amino acid were used equally. RSCU values range from 0 to the number of synonymous codons for the amino acid (at most 6) and can reveal information about evolutionarily conserved regions, gene expression, and nucleotide composition biases in a genome or transcriptome. An RSCU value >1 indicates a positive codon bias (an RSCU value >1.6 indicates a strong positive codon bias), an RSCU value <1 indicates a negative codon bias, and an RSCU value =1 indicates random codon usage (Beelagi et al., 2021).
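The RSCU definition above (observed count divided by the count expected under equal usage of all synonymous codons) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the CodonW code: the function name `rscu` is hypothetical, the standard genetic code is assumed, and amino acids absent from the sequence are simply skipped.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group codons by amino acid; skip stops and single-codon amino acids
# (Met, Trp), which leaves the 59 synonymous codons.
SYNONYMS = {}
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)
SYNONYMS = {aa: cs for aa, cs in SYNONYMS.items() if aa != "*" and len(cs) > 1}


def rscu(cds):
    """Return {codon: RSCU} for every synonymous family present in the CDS."""
    seq = cds.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))
    values = {}
    for codons in SYNONYMS.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not used; RSCU is undefined
        for c in codons:
            # observed / expected, where expected = total / family size
            values[c] = counts[c] * len(codons) / total
    return values
```

In a toy sequence with three AGA codons and one CGT codon, AGA receives an RSCU of 4.5, CGT 1.5, and the unused arginine codons 0. A sequence that encodes arginine exclusively with AGA reaches the maximum value of 6.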

Neutrality plot analysis
The neutrality plot can explain the balance between mutation pressure and natural selection in specific genes. When the slope of the regression line between GC3 and GC12 (the average GC content of GC1 and GC2) is close to 1, mutation pressure is the major factor affecting CUB. In contrast, if there is no correlation between GC12 and GC3 and the slope is close to 0, the main driving force acting on the tested gene is natural selection (Patil et al., 2021).
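The regression behind a neutrality plot is an ordinary least-squares fit of GC12 on GC3. A minimal sketch is given below; the function name `neutrality_fit` is hypothetical and the input values in the usage note are made up for illustration.

```python
def neutrality_fit(gc3, gc12):
    """Least-squares fit of GC12 (y) on GC3 (x).

    Returns (slope, intercept, r_squared). A slope near 1 is read as
    mutation pressure dominating; a slope near 0 as natural selection.
    """
    n = len(gc3)
    if n != len(gc12) or n < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    syy = sum((y - mean_y) ** 2 for y in gc12)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    if sxx == 0:
        raise ValueError("GC3 values are all identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = 0.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared
```

With the made-up inputs `gc3 = [10, 20, 30]` and `gc12 = [25, 30, 35]`, the fit returns a slope of 0.5, an intercept of 20, and an R² of 1.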

PR2-bias plot analysis
Parity Rule 2 bias (PR2-bias) plot analysis was performed based on A3/(A3 + U3) vs. G3/(G3 + C3). If codon usage shows no bias (A = T and C = G), the value lies at the center point of the plot (0.5, 0.5). Otherwise, the vector from the center point to the plotted value indicates the degree and direction of the bias of the gene (Huang et al., 2022).
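The two PR2 coordinates can be computed directly from third-position base counts, as in the sketch below. The function name `pr2_point` is hypothetical, and the sketch uses the third position of every codon; PR2 analyses are often restricted to four-fold degenerate codons, and the text does not state which convention was used.

```python
def pr2_point(cds):
    """Return (G3/(G3+C3), A3/(A3+T3)) for an in-frame CDS.

    Assumption: third positions of all codons are counted. The point
    (0.5, 0.5) corresponds to A = T and G = C, i.e. no PR2 bias.
    """
    seq = cds.upper().replace("U", "T")
    third = [seq[i + 2] for i in range(0, len(seq) - len(seq) % 3, 3)]
    a3, t3 = third.count("A"), third.count("T")
    g3, c3 = third.count("G"), third.count("C")
    if a3 + t3 == 0 or g3 + c3 == 0:
        raise ValueError("ratio undefined: no A/T or no G/C at third positions")
    return g3 / (g3 + c3), a3 / (a3 + t3)
```

For the toy sequence `GCAGCAGCTGCG` (third positions A, A, T, G), the point is (1.0, 0.667), which lies in the upper-right quadrant and corresponds to a preference for G over C and A over T.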

ENC-GC3 plot analysis
The ENC-GC3 plot (ENC vs. GC3) is usually used to analyze the factors influencing CUB in a specific gene, and the standard curve shows the expected functional relation between ENC and GC3 when codon usage is constrained only by base composition. If the corresponding points are distributed around or on the standard curve, we can conclude that mutation pressure is the main force shaping CUB. In contrast, if the points deviate from the standard curve, natural selection may play a key role in the formation of codon bias (Kumar et al., 2021).
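The standard curve in an ENC-GC3 plot is commonly taken from Wright's formula, ENC = 2 + s + 29 / (s² + (1 − s)²), where s is the GC3 fraction. The text does not state which formula was used, so the sketch below assumes this one; the function name `expected_enc` is hypothetical.

```python
def expected_enc(gc3):
    """Expected ENC under base-composition constraint alone (Wright's curve).

    gc3 is the third-position GC content as a fraction between 0 and 1.
    """
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("gc3 must be a fraction between 0 and 1")
    return 2.0 + gc3 + 29.0 / (gc3 ** 2 + (1.0 - gc3) ** 2)


def enc_curve(steps=100):
    """Return (gc3, expected ENC) pairs for drawing the standard curve."""
    return [(i / steps, expected_enc(i / steps)) for i in range(steps + 1)]
```

The curve peaks at 60.5 for a GC3 of 0.5. Under this assumed formula, a GC3 of 16.43% (the mean reported in this study) corresponds to an expected ENC of about 42.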

Correlation analysis
Correlation analysis was performed to examine the relationships among codon base composition (GC1, GC2, GC3, GCs), CAI, CBI, FOP, ENC, GRAVY, AROMO, L_sym, and L_aa of PHIST proteins. The Pearson correlation method was applied, and the analysis was carried out using the R corrplot package (Liu H. et al., 2020).
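The pairwise Pearson correlations that such a plot visualizes can be sketched in plain Python as follows. This stands in for the R workflow named above rather than reproducing it: the function names are hypothetical, the sketch returns correlation coefficients only (no p-values), and the values in the usage note are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Map every ordered pair of index names to its Pearson coefficient.

    columns: dict of index name -> list of per-gene values.
    """
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}
```

For example, `correlation_matrix({"GC3": [10, 15, 20, 25], "ENC": [30, 35, 40, 45]})` returns a coefficient of 1 for the pair `("GC3", "ENC")`, because the made-up values are perfectly linearly related.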

Phylogenetic analysis
The evolutionary relationships of the 88 PHIST proteins were analyzed using RSCU results and amino acid sequences, respectively. The phylogenetic tree was constructed using the neighbor-joining method in MEGA 11.0, and a cluster heat map was generated with Hemi 1.0 software.

Software used
All indices of CUB were calculated with CodonW software 1.4.2. Clustering and correlation analyses were conducted using the statistical software SPSS 18.0, and statistical significance was defined as p < 0.05. Graphs were generated in GraphPad Prism 6.01.

Results of codon base composition in PHIST proteins
CUB can be significantly influenced by the general base composition of genomes. We chose 88 PHIST proteins from Plasmodium falciparum for analysis of codon usage (Supplementary Table S1). The length of the coding region of these PHIST proteins ranged from 255 bp to 3,657 bp; PF3D7_0801000, which contains a PRESAN (Plasmodium Ring-infected erythrocyte surface antigen N-terminal) domain, was the longest, and PF3D7_1149700, which has no conserved domain, was the shortest (Table 1). We further calculated the base composition of the 88 PHIST proteins, and our results showed that all 88 are rich in A3, T3, and ATs (Figure 1 and Supplementary Table S1). The A3% content was highest in PF3D7_0401800 (77.08%) and lowest in PF3D7_1477300 (46.49%), and the T3% content was highest in PF3D7_0831300 (67.81%). Although the C3% content of PF3D7_1477300 (21.08%) and the G3% content of PF3D7_0219800 (18.97%) were the highest among these proteins, they were still much lower than the A3% and T3% contents (Figure 1A and Supplementary Table S1). In addition, analysis of nucleotide content at the different codon positions showed that GC2% ranged from 15.38 to 54.01% (mean: 24.52%), while GC3% ranged from 9.34 to 26.25% (mean: 16.43%). GC1% ranged from 25.89 to 49.72%, with an average value (33.60%) higher than those of GC2% and GC3%. Furthermore, the ATs% content (66.61 to 80.65%, mean: 75.15%) was two to three times that of GCs% (Figure 1B and Supplementary Table S1).

Results of codon usage index analysis in PHIST proteins
We analyzed 88 PHIST proteins belonging to 5 subgroups and determined their CAI values, which ranged from 0.125 to 0.274 (Table 1). Notably, PF3D7_1201000 exhibited the highest CAI value, while PF3D7_1478500 had the lowest, suggesting a comparatively strong codon bias for PF3D7_1201000. Among the subgroups, the PHISTb-DnaJ subgroup showed the highest average CAI value (0.183), followed by the PHISTb subgroup (0.180). The PHISTa-like/PHIST subgroup had the lowest mean CAI value (0.159), which was similar to that of the PHISTa subgroup (0.160), indicating relatively weaker codon bias for both the PHISTa-like/PHIST and PHISTa subgroups. The CBI values of the 88 PHIST proteins ranged from −0.412 to −0.033 (Table 1). PF3D7_0401800 had the lowest CBI value, while PF3D7_0102200 had the highest, indicating a comparatively strong codon bias for PF3D7_0102200. The FOP values ranged from 0.225 to 0.416 (Table 1). PF3D7_1149700 had the lowest FOP value, while PF3D7_0102200 had the highest, again indicating comparatively strong codon bias. GRAVY values were calculated for the 88 PHIST proteins, and 87 of them had negative values, suggesting that they are likely hydrophilic proteins (Table 1). Only PF3D7_1477400 from the PHISTa-like/PHIST subgroup was considered hydrophobic. The aromaticity (AROMO) values ranged from 0.061 to 0.176 (Table 1). PF3D7_1149700 had the highest AROMO value, while PF3D7_0402000 had the lowest. The AROMO values varied considerably among the PHIST proteins, with an average of 0.111. The ENC values of the 88 PHIST proteins ranged from 29.78 (PF3D7_1016600) to 49.72 (PF3D7_0219800), with an average of 36.69. Among the 88 proteins, 26 had ENC values below 35, a large portion of which belonged to the PHISTa subgroup. The remaining 62 proteins had ENC values above 35, indicating a weaker codon usage preference (Table 1). Additionally, Table 1 provides data on L_sym (ranging from 80 to 1,182) and L_aa (ranging from 85 to 1,219).

Defining codon usage patterns in PHIST proteins
RSCU analysis was used to characterize the codon usage patterns of the PHIST proteins. CUB was observed in these proteins: 87 of the 88 PHIST proteins had more than 22 codons with positive bias (RSCU ≥ 1), the exception being PF3D7_1301300, which had 21 (Figure 2 and Supplementary Table S2). Additionally, more than 11 high-frequency codons (RSCU ≥ 1.6) were found in members of the PHISTa subgroup (PF3D7_0601700, PF3D7_1000700, and PF3D7_1100600) and the PHISTc subgroup (PF3D7_1016500 and PF3D7_1016700), indicating a stronger positive codon bias, while exactly 11 high-frequency codons were present in PF3D7_0424000. Furthermore, the RSCU analysis revealed that the most frequently used codons in the 88 PHIST proteins are AGA (Arg) and UUA (Leu), whereas CGG (Arg) is rarely used and is even absent in the PHISTa-like/PHIST and PHISTb subgroups. Among the optimal codons, AGA in PF3D7_1016600 has the highest value (Arg, RSCU = 6), followed by UCA in PF3D7_0401800 (Ser, RSCU = 4.85) and UUA in PF3D7_0831500 (Leu, RSCU = 4.85), indicating the strongest positive codon bias. On the other hand, AAC in PF3D7_0532400 has the lowest value (Asn, RSCU = 0.03) among the 59 synonymous codons. Furthermore, AGA (Arg) serves as the optimal codon in the PHISTa, PHISTb, PHISTb-DnaJ, and PHISTc subgroups, whereas UUA (Leu) is the optimal codon in the PHISTa-like/PHIST subgroup.

Results of neutrality plot analysis in PHIST proteins
To determine the impact of mutation pressure and natural selection on CUB in the 88 PHIST proteins, a neutrality plot was generated to analyze the relationship between GC12 and GC3 composition. The GC12 content ranged from 22.23 to 44.65%, while the GC3 content varied from 9.34 to 26.25% (Supplementary Table S1). To examine this association further, we generated separate neutrality plots for the five subgroups: (A) PHISTa, (B) PHISTa-like/PHIST, (C) PHISTb, (D) PHISTb-DnaJ, and (E) PHISTc. The slopes of the regression lines ranged from −0.6485 (PHISTb subgroup) to 0.1106 (PHISTa subgroup), indicating a weak association between GC12 and GC3 content in the PHIST proteins (Figure 3). Furthermore, the R² values of the regression lines varied from 0.0092 (PHISTa-like/PHIST subgroup) to 0.1368 (PHISTb subgroup). The statistical analysis showed no significant correlation (p > 0.05) between the GC12 and GC3 values, suggesting that natural selection might be a driving force in the evolution of PHIST proteins in Plasmodium falciparum, consistent with previous studies.

Results of PR2-bias plot analysis in PHIST proteins
To investigate potential biases in PHIST proteins of Plasmodium falciparum, we conducted a Parity Rule 2 (PR2) plot analysis (Figure 4). The plot was divided into four quadrants, with both axes centered on 0.5. The first quadrant (upper right) corresponds to a preference for A and G at the third codon position, while the third quadrant corresponds to a preference for T and C. The majority of PHIST proteins were found in the first quadrant, with a minority exhibiting a preference for A over T and C over G (located in the second quadrant). Very few proteins were observed in the third quadrant. These findings indicate that factors other than mutation pressure, such as natural selection, play a significant role in shaping the codon usage patterns of PHIST proteins in Plasmodium falciparum.

Results of ENC-GC3 plot analysis in PHIST proteins
To validate the impact of GC3 on the codon bias of PHIST proteins in Plasmodium falciparum, we generated an ENC-GC3 plot (Figure 5). ENC values were plotted against the corresponding GC3 values together with the standard curve, which describes the relationship between ENC and GC3 expected under mutation pressure rather than natural selection. If the codon usage of a gene reflects only mutational pressure, its point lies on or near the expected curve, indicating random codon usage. Conversely, if natural selection exerts pressure on the gene, its point falls below the expected curve. In our data, most points lay close to but below the expected curve, only one point (PF3D7_0425250) lay above it, and only one point (PF3D7_0219800) fell directly on it. These findings suggest that, while mutation pressure may contribute to codon bias, natural selection also plays a pivotal role.

Results of correlation analysis in PHIST proteins
To identify the key factors contributing to codon bias, we calculated correlations among the 12 indices and presented them visually (Figure 6). In the PHISTa subgroup, the GC2 value exhibited a significant correlation only with GCs (p < 0.001), whereas GC1 showed correlations with nearly all the indices (Figure 6A). Conversely, in the PHISTb-DnaJ subgroup, neither the GC2 nor GC3 values demonstrated correlations with other indices (Figure 6D). Additionally, except for the PHISTc subgroup, we did not observe a significant correlation between GC1 and GC2 or GC3 in the PHISTa, PHISTa-like/PHIST, PHISTb, and PHISTb-DnaJ subgroups (Figure 6). On the other hand, FOP values showed a significant correlation with the CBI among these PHIST proteins (p < 0.001), and there was a notable correlation between ENC and GC3 contents in the PHISTa, PHISTa-like/PHIST, PHISTb, and PHISTc subgroups, suggesting a possible influence of natural selection on the usage of synonymous codons. Moreover, only a few indices showed correlations in the PHISTa-like/PHIST subgroup (Figure 6B), whereas nearly all indices correlated in the PHISTa and PHISTb subgroups (Figures 6A,C), indicating that both mutation pressure and natural selection play key roles in the formation of codon bias.
FIGURE 1 Codon base composition in PHIST proteins. (A) The percentages of nucleotides at the third codon position (C3%, T3%, G3%, and A3%) in PHIST proteins. (B) The GC% content at all three codon positions (GC1%, GC2%, and GC3%) and the total GCs% and ATs% content in PHIST proteins. The X-axis represents the different PHIST protein subgroups, while the Y-axis represents relative percentages.

Results of phylogenetic analysis in PHIST proteins
To evaluate how evolutionary processes affect the codon usage pattern of PHIST proteins in Plasmodium falciparum, we employed RSCU values of 88 PHIST proteins for cluster analysis (Figure 7A).
The findings revealed that the proteins were grouped into multiple clusters based on evolutionary distance. Most PHIST proteins in the PHISTa and PHISTb subgroups were assigned to separate clusters, whereas the PHISTa-like/PHIST, PHISTb-DnaJ, and PHISTc subgroups occupied distinct clusters. Unexpectedly, PF3D7_0831750 and PF3D7_1372000 of the PHISTa subgroup were placed in the same cluster as PF3D7_0201600 and PF3D7_0702100 of the PHISTb subgroup. For comparison with the RSCU-based relationships, the neighbor-joining method was used for phylogenetic analysis based on CDSs (Figure 7B). In the CDS-based phylogenetic analysis, PF3D7_0831750 of the PHISTa subgroup formed a separate clade within the same cluster as PF3D7_0425300 and PF3D7_1253100 of the PHISTa subgroup, indicating a closer evolutionary relationship.

Discussion
Over the course of long-term evolution, organisms eventually develop a specific set of codon usages that maintain the transmission of genetic information between nucleotides and amino acids (Hershberg and Petrov, 2008; Chakraborty et al., 2020). However, different genes within the same species or across different species exhibit different preferences for codon usage (Liu X. Y. et al., 2020). As a result, the analysis of CUB provides valuable insights into the regulatory mechanisms of translation processes and enables the prediction and optimization of exogenous genes for enhanced expression levels through industrial modification, even helping identify key functional sites (Yu et al., 2021; Wu et al., 2022). Currently, a complete understanding of codon usage characteristics for PHIST proteins in Plasmodium falciparum is lacking. Plasmodium falciparum is a protozoan parasite responsible for causing the most severe type of malaria in humans. Artemisinin is effective against this disease; however, in some areas, such as Rwanda and Uganda, Plasmodium falciparum has developed partial resistance to artemisinin and even artemisinin-based combination drugs, resulting in the emergence of artemisinin-resistant Kelch13 mutant strains of Plasmodium falciparum (Balikagala et al., 2021; Dhorda et al., 2021). Therefore, new antimalarial drugs or vaccines are urgently needed, and it is necessary to search for new drug targets. In previous studies, a protein (PF3D7_1372300) in the PHIST family was found to interact with PfEMP1; thus, PHIST proteins may become potential new targets (Yang et al., 2020). PHIST proteins have also been found to interact with host proteins, such as cytoskeletal components and immune signaling molecules, to aid in the invasion and survival of the parasite within the host organism (Warncke et al., 2020; Shakya et al., 2022; Tripathi et al., 2022). Understanding the function and role of these PHIST proteins is crucial for developing effective strategies against malaria.
FIGURE 3 The neutrality plot analysis in PHIST proteins. The correlations between GC12 and GC3 in PHIST proteins were analyzed, and the regression line and R² were calculated. The X-axis represents GC3%, while the Y-axis represents GC12%.
Research on PHIST proteins has revealed their diverse functions and involvement in various stages of the Plasmodium life cycle. PHIST proteins in different subgroups display significant variations in length and codon base composition. The average amino acid length of the PHISTa subgroup is shorter than that of the other subgroups, and PF3D7_1149700 is the shortest protein. Although the PHISTb-DnaJ subgroup has the longest average amino acid length, PF3D7_0801000 of the PHISTc subgroup is the longest among these PHIST proteins, indicating differentiation within the Plasmodium falciparum species (Table 1). Notably, the differences in synonymous codons primarily lie in the third codon position. Interestingly, our study found that codons in all 88 identified PHIST proteins tend to end with A/T, which is consistent with previous research on Mycoplasma capricolum and Onchocerca volvulus, where enrichment of A and T at the end of genes was observed (Mazumder et al., 2018). The average values of A% and T% in the five subgroups were 62.50 to 65.06% and 54.90 to 58.13%, respectively, while the maximum average value of G/C% was only 13.47%, in the PHISTa-like/PHIST subgroup. The T3% content of PF3D7_0401800 (33.48%) in the PHISTb subgroup is the lowest among these proteins, but it is still much higher than the G3% of PF3D7_1477300 (21.08%) from the PHISTa-like/PHIST subgroup (Figure 1 and Supplementary Table S1). This suggests that a specific gene can show diverse codon usage bias in the same or different species, and the results are consistent with the features of apicomplexan protozoa codon usage in other genes (Lamolle et al., 2022; Benisty et al., 2023). According to the RSCU analysis, the most frequent codons in PHIST proteins are AGA (Arg) and UUA (Leu), indicating the strongest positive codon bias. However, CGG (Arg) is seldom used and never appears in the coding sequences of the PHISTa-like/PHIST and PHISTb subgroup proteins, which shows the same tendency in third-position base usage in Plasmodium falciparum (Figure 2 and Supplementary Table S2).
FIGURE 5 The ENC-GC3 plot analysis in PHIST proteins. The correlations between the effective number of codons (ENC) and the nucleotide G/C content at the third codon position (GC3) were analyzed in PHIST proteins. The standard curve represents the functional relationship between ENC and GC3 under mutation pressure rather than natural selection. The X-axis represents GC3%, while the Y-axis represents the ENC value.
The values of CAI, CBI, FOP, and ENC were also analyzed in this paper. The CAI value of PF3D7_1201000 (0.274) in the PHISTb subgroup is the highest, and the average CAI value of the PHISTb-DnaJ subgroup (0.183) is higher than those of the other subgroups, which indicates a comparatively strong codon bias in PF3D7_1201000 and the PHISTb-DnaJ subgroup. Meanwhile, the average CAI values of the PHISTa-like/PHIST subgroup (0.159) and the PHISTa subgroup (0.160) are almost the same, indicating a weak codon bias (Table 1). Additionally, specific PHIST genes, such as PF3D7_0102200 in the PHISTb-DnaJ subgroup, show a strong bias towards certain codons, as indicated by higher values of CBI and FOP. However, the average values of CBI and FOP in the PHISTa-like/PHIST subgroup are the lowest, indicating random codon usage within the subgroup. In addition, an ENC value below 35 signifies a strong preference for certain codons, an observation supported by some of the ENC values in our study (Prabha et al., 2017; Pepe and Keersmaecker, 2020) (Table 1). Furthermore, our analysis revealed significant correlations among codon base composition (GC1, GC2, GC3, GCs), CAI, CBI, FOP, ENC, GRAVY, AROMO, L_sym, and L_aa. These correlations indicate the influence of base composition and codon usage indices on CUB, particularly with respect to GC1 (Figure 6). Mutation pressure and natural selection are the two main factors affecting CUB, and the neutrality plot analysis, PR2-bias plot analysis, and ENC-GC3 plot analysis further support the role of natural selection in shaping the codon bias of PHIST proteins in Plasmodium falciparum (Figures 3-5). Although there are some differences in codon usage indices among the five PHIST protein subgroups, the CUB of these proteins appears to be influenced by strong natural selection.
Currently, RSCU clustering and CDS phylogenetic trees are commonly used for analyzing the evolutionary relationships of genes within the same or different species (Jiang et al., 2023). These two clustering methods yield consistent results in certain subgroups, but their results may differ significantly in others. In this study, we examined the relationships among PHIST proteins in different subgroups of Plasmodium falciparum based on CDS and RSCU analyses, respectively. Interestingly, the phylogenetic relationships based on CDS analysis were deemed to be more reliable than those based on RSCU analysis. In the RSCU clustering analysis, PF3D7_0831300 (PHISTa-like/PHIST subgroup), PF3D7_0936800 (PHISTc subgroup), and PF3D7_0831000 (PHISTb subgroup) were unexpectedly grouped within the same cluster. However, the genetic relationships among some subgroups were accurately interpreted based on the RSCU values and were consistent with those obtained from the CDS phylogenetic tree. For example, the relationships between PF3D7_0115100 and PF3D7_0800600 in the PHISTa subgroup, PF3D7_0601500 and PF3D7_0631100 in the PHISTb subgroup, and PF3D7_0801000, PF3D7_1016700, and PF3D7_1200900 in the PHISTc subgroup were correctly identified. These results indicate that the phylogenetic outcomes based on RSCU analysis can serve as valuable supplementary information to those derived from sequence-based methods (Figure 7).
FIGURE 6 The correlation analysis in PHIST proteins. The correlations among codon base composition (GC1, GC2, GC3, GCs), codon adaptation index (CAI), codon bias index (CBI), frequency of optimal codons (FOP), the effective number of codons (ENC), general average hydropathicity (GRAVY), aromaticity (AROMO), length of synonymous codons (L_sym), and length of amino acids (L_aa) were analyzed in PHIST proteins. The color of the blocks changes from blue to red as the correlation increases.

Conclusion
In summary, the codon usage patterns of PHIST proteins in Plasmodium falciparum are influenced by various factors, with natural selection being the primary driving force and mutation pressure playing a relatively minor role. Exploring these proteins as targets could open up new possibilities for the development of antimalarial drugs or vaccines. Nevertheless, additional research is required to comprehensively understand the exact mechanisms of PHIST proteins and their potential as therapeutic targets.

FIGURE 7 The phylogenetic analysis in PHIST proteins. The evolutionary relationships among the 88 PHIST proteins were clustered by relative synonymous codon usage (RSCU) values (A) and coding sequences (B) by the neighbor-joining method, respectively. In (A), the color of the color block changes from blue to red, indicating that the RSCU values are increasing.

FIGURE 2 The RSCU values in PHIST proteins. The relative synonymous codon usage (RSCU) value was calculated by dividing the observed frequency of a codon by the frequency expected if all synonymous codons for the same amino acid were used equally. The color of the color block changes from blue to red, indicating that the RSCU values are increasing, with an RSCU value >1 indicating positive codon bias. The homology of codons is shown on the right side and the subgroups are shown on the top side of the figure.
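
The RSCU calculation described in this caption can be sketched as follows. This is a minimal illustration, assuming the standard genetic code and the usual definition (observed codon count multiplied by the number of synonymous codons, divided by the total count for that amino acid); the function name and the toy sequence are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for every amino acid observed in the sequence.

    Stop codons are skipped. Single-codon amino acids (Met, Trp) always
    give RSCU = 1 and are usually excluded from downstream analysis.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for aa in sorted(set(AMINO) - {"*"}):
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the sequence: RSCU undefined
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


# Toy example (hypothetical sequence): Leu and Arg codons only
example = rscu("TTATTATTACTTAGAAGAAGACGT")
print(example["TTA"], example["AGA"])
```

An RSCU value above 1 marks a codon used more often than expected under equal usage, which is how AT-ended codons such as AGA and UUA would stand out in a heat map like the one described here.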

FIGURE 4 The PR2-bias plot analysis in PHIST proteins. The correlations between A3/(A3 + U3) and G3/(G3 + C3) were analyzed in PHIST proteins, respectively. If a codon has no usage bias, the value will be in the center point of the plot. The first quadrant represents the codon preference for A/G, while the third quadrant represents the preference for T/C. The X-axis represents GC3%, while the Y-axis represents GC12%. (A) PHISTa subgroup. (B) PHISTa-like/PHIST subgroup. (C) PHISTb subgroup. (D) PHISTb-DnaJ subgroup. (E) PHISTc subgroup.
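
The two PR2 ratios named in this caption can be computed with a short sketch. It is a simplified illustration under stated assumptions: the third position of every codon is counted (PR2 analyses are often restricted to fourfold degenerate codons), and the function name and toy sequences are hypothetical. A gene with no bias gives 0.5 for both ratios, which is the center point of the plot.

```python
def pr2_ratios(cds):
    """Return (A3/(A3+U3), G3/(G3+C3)) from third codon positions.

    Assumption: all complete codons are used. Returns None when either
    ratio is undefined (no A/U or no G/C at the third position).
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    third = seq[2:usable:3]
    a3, t3, g3, c3 = (third.count(base) for base in "ATGC")
    if a3 + t3 == 0 or g3 + c3 == 0:
        return None
    return a3 / (a3 + t3), g3 / (g3 + c3)


# Toy examples (hypothetical sequences)
print(pr2_ratios("AAAGCTGGGTTC"))  # third positions A, T, G, C
print(pr2_ratios("AAAGAAGGGGGA"))  # third positions A, A, G, A
```

Both ratios above 0.5 correspond to a preference for A over U and G over C at the third position, and both below 0.5 to the opposite preference.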

FIGURE 6 The scales on the right and bottom represent the strength of the correlation, while the components on the left and top represent different indices. (A) PHISTa subgroup. (B) PHISTa-like/PHIST subgroup. (C) PHISTb subgroup. (D) PHISTb-DnaJ subgroup. (E) PHISTc subgroup. One asterisk (*) indicates a significant correlation among indices at p < 0.05; two asterisks (**) indicate a significant correlation at p < 0.01; three asterisks (***) indicate a significant correlation at p < 0.001.
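
The asterisk convention in this caption can be expressed as a small helper, shown here together with a plain correlation coefficient. This is a sketch under assumptions: Pearson correlation is used for illustration only (the type of correlation used in the figure is not stated here), the function names are hypothetical, and the p value itself is expected to come from a statistics package.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def significance_stars(p_value):
    """Map a p value to the asterisk labels used in the caption."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


# Toy example (hypothetical index values, not data from Table 1)
print(pearson_r([0.20, 0.25, 0.30], [36.0, 38.5, 41.0]), significance_stars(0.03))
```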

TABLE 1 Codon usage indices in PHIST proteins.

Overall, the average ENC value of the 88 PHIST proteins is 36.69, indicating a weak overall codon preference in Plasmodium falciparum PHIST proteins, although 26 of the 88 ENC values were lower than 35.