Structural and functional characteristics and expression profile of the 20S proteasome gene family in Sorghum under abiotic stress

The 26S proteasome is a molecular machine that degrades intracellular proteins through the catalytic activity of its core complex, the 20S proteasome. The 20S proteasome cleaves denatured, cytotoxic, damaged, and unwanted proteins by proteolysis and imparts biotic and abiotic stress tolerance in model plants. This study identified 20 genes, namely, 10 SbPA and 10 SbPB genes encoding the α- and β-subunits of the 20S proteasome in Sorghum bicolor (L.) Moench (2n = 20). These genes are distributed on chromosomes 1, 2, 3, 4, 5, 7, and 10 and are orthologous to the corresponding rice genes. Phylogenetic analysis clustered these genes into seven clades, corresponding to the seven α-subunits (α1 to α7) and the seven β-subunits (β1 to β7). In silico gene expression analysis suggested that nine genes are involved in the abiotic stress response (cold, drought, and abscisic acid [ABA]). The expression of these proteasomal genes in shoots and roots exposed to abiotic stresses (cold, drought, and ABA) was studied by quantitative real-time polymerase chain reaction (qRT-PCR). A significant increase in the relative fold expression of SbPBA1, SbPAA1, SbPBG1, SbPBE1, and SbPAG1 under ABA and drought stress suggests their involvement in the abiotic stress response. These genes were downregulated under cold stress, suggesting that they are not involved in the cold response. Further investigation of the SbPA/SbPB genes could aid in developing S. bicolor cultivars that are resilient to climate change.


Introduction
Sorghum bicolor, a coarse grain, is primarily used as food and fodder in Asia, Africa, the Americas, and Australia. Like sugarcane, the juicy stalks of sweet S. bicolor can also be used to prepare syrup and jaggery. S. bicolor grain is germinated, dried, and processed into malt, which serves as a fermentation substrate in beer production. India ranks second in the world in the cultivation (6.18 million hectares) and production (5.28 million tons) of S. bicolor. It ranks third among cereal crops after rice and wheat. For several decades, research has focused mainly on developing cultivars that can withstand biotic and abiotic stresses (Dhankher and Foyer, 2018; Kumar et al., 2019). Abiotic stresses, mainly temperature, salinity, and drought, cause crop yield losses, whereas biotic stresses include fungal, bacterial, and viral diseases and insect pests (Sun et al., 2019). In response to these stresses, physiological and molecular responses increase the proteolytic capacity of eukaryotic cells to selectively remove or degrade unnecessary and damaged proteins (Kurepa et al., 2009).
The ubiquitin-proteasome system (UPS) is an important protein degradation pathway that removes nuclear, cytosolic, and membrane proteins (Glickman and Ciechanover, 2002; Finley, 2009). The 26S proteasome holoenzyme of the UPS consists of a 19S regulatory particle and a 20S core particle; it recognizes ubiquitin signals and hydrolyzes unfolded polypeptides into short peptides (Saeki, 2017; Yu and Matouschek, 2017). The core complex is integral to the proteasome and is found in many cell types. Under stress, the 20S particle removes misfolded or damaged proteins, catalyzing protein degradation in a non-lysosomal, ATP-dependent manner. The barrel-shaped 20S proteasome forms the core of the 26S proteasome. It consists of 28 subunits arranged in four stacked rings of seven α- or seven β-subunits. The two identical central rings each consist of seven β-subunits, and the two terminal rings each consist of seven α-subunits. This gives the 20S proteasome a symmetric α1-7/β1-7/β1-7/α1-7 organization. Three β-subunits (β1, β2, and β5) in the central rings carry proteolytic active sites, each with a specific substrate preference. The α-subunits of the terminal rings are responsible for the entry and exit of peptides. Arabidopsis thaliana, like many eukaryotes, has seven α-subunits and seven β-subunits. In the standard nomenclature, these α- and β-subunits are designated PAA-PAG and PBA-PBG, respectively.
The genetics of the UPS has been characterized in species such as Arabidopsis, rice (Fu et al., 1998), wheat (Sharma et al., 2022), and rapeseed (Kumar et al., 2022). In wheat, the 20S proteasome genes were identified and characterized, revealing their roles in biological processes related to abiotic stress tolerance (Sharma et al., 2022). In rapeseed, the genes encoding various 20S proteasome subunits have been linked to biotic and abiotic stresses (Kumar et al., 2022). The 20S proteasomal genes also contribute to several other functions, such as arsenic tolerance (Sung et al., 2016), the antiviral response (Dielen et al., 2011), the immune response, and organellar stress that induces programmed cell death (L. Sun et al., 2013; Sun H.H et al., 2013; Cai et al., 2018). In Arabidopsis, AtPBE1 regulates proteasome assembly under salt stress (Han et al., 2019). In maize, the AtPBAC4 ortholog has been associated with defective/collapsed kernels (Wang et al., 2019). In wheat, TaGW2, which encodes a RING-type E3 ligase of the ubiquitin-proteasome pathway, regulates grain size (Song et al., 2007). In rice, OgTT1 (α2 subunit) is associated with heat tolerance and adaptation (Li et al., 2015).
To date, the comprehensive investigations on structural and functional aspects, and expression profiles of 20S proteasomal genes
The sequences of 20S proteasome genes for S. bicolor and rice are available in the Ensembl Plants database (https://plants.ensembl.org/index.html). The coding sequences (CDS) of 23 rice genes (OsPA/OsPB) available in this database were used to retrieve the 20S proteasome gene sequences of S. bicolor. The S. bicolor gene sequences were searched using TBLASTX (E-value ≤ 1e-5) against the available rice genome assembly. The hits were examined for the presence of the domains found in the query sequences using the Conserved Domain Database (CDD) (Lu et al., 2020) search tool at NCBI. Orthologs in Sorghum were identified using three criteria: high query coverage, sequence similarity above 60%, and the presence of all domains and motifs found in the query sequences.
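The ortholog filter described above (E-value ≤ 1e-5, similarity above 60%, high query coverage) can be sketched in Python over BLAST tabular output. This is a minimal sketch, assuming the standard `-outfmt 6` column order and using percent identity as the similarity measure; the coverage cut-off (`min_coverage`) and the gene identifiers in the example are hypothetical, and the CDD domain check is not covered.

```python
import csv
import io

# Standard BLAST tabular (-outfmt 6) column order.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def filter_hits(tabular_text, query_lengths, max_evalue=1e-5,
                min_identity=60.0, min_coverage=0.6):
    """Keep hits that pass E-value, identity, and query-coverage thresholds.

    min_coverage is a hypothetical value; the text only asks for high coverage.
    query_lengths maps query IDs to lengths in the alignment's coordinate units.
    """
    kept = []
    reader = csv.DictReader(io.StringIO(tabular_text), fieldnames=FIELDS,
                            delimiter="\t")
    for row in reader:
        evalue = float(row["evalue"])
        identity = float(row["pident"])
        qstart, qend = int(row["qstart"]), int(row["qend"])
        # Fraction of the query covered by this alignment.
        coverage = (abs(qend - qstart) + 1) / query_lengths[row["qseqid"]]
        if evalue <= max_evalue and identity > min_identity and coverage >= min_coverage:
            kept.append((row["qseqid"], row["sseqid"], identity, round(coverage, 3)))
    return kept

# Hypothetical hits: the second fails the E-value threshold.
hits = ("OsPAA1\tSb_gene1\t92.5\t240\t18\t0\t1\t720\t1\t720\t1e-120\t450\n"
        "OsPAA1\tSb_gene2\t45.0\t100\t55\t0\t1\t300\t1\t300\t1e-3\t50\n")
print(filter_hits(hits, {"OsPAA1": 750}))
```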

Promoter structure analysis and miRNA target prediction
Cis-regulatory elements were identified in the 1,500-bp region upstream of each gene using PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) (Lescot et al., 2002). Putative microRNAs (miRNAs) targeting the S. bicolor genes were predicted with the web-based service psRNATarget (https://www.zhaolab.org/psRNATarget/) (Dai et al., 2018) using default parameters, with an expectation value of 0-3 (Rensink & Buell, 2004; Lu et al., 2019).
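Extracting the 1,500-bp upstream region requires strand awareness: for a minus-strand gene, the promoter lies beyond the annotated end in forward coordinates and must be reverse-complemented. A minimal sketch, assuming 1-based inclusive GFF3-style coordinates and that "upstream" is measured from the annotated gene start; the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def upstream_region(chrom_seq, gene_start, gene_end, strand, length=1500):
    """Return up to `length` bp upstream of a gene, written 5'->3' on the gene's strand.

    gene_start/gene_end are 1-based inclusive; the region is clipped at chromosome ends.
    """
    if strand == "+":
        begin = max(0, gene_start - 1 - length)
        return chrom_seq[begin:gene_start - 1]
    if strand == "-":
        end = min(len(chrom_seq), gene_end + length)
        # On the minus strand, "upstream" lies after gene_end in forward coordinates.
        return reverse_complement(chrom_seq[gene_end:end])
    raise ValueError("strand must be '+' or '-'")

chrom = "AAAACCCCGGGGTTTT"   # toy sequence, not real sorghum data
print(upstream_region(chrom, 9, 12, "+", length=4))
print(upstream_region(chrom, 1, 4, "-", length=4))
```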

Molecular phylogeny, synteny, and collinearity
The Ensembl Plants gene tree pipeline (protein sequence alignments) (Rogozin et al., 2003; Yu et al., 2019) was used to establish evolutionary relationships among the proteasome genes, using the gene identifier of each S. bicolor gene. A gene tree of homologs across the S. bicolor and rice genomes was created using the Plant Compara option. This gene tree can be used to identify duplication and speciation events, which give rise to paralogs and orthologs, respectively. Ka/Ks ratios for seven gene pairs were calculated using TBtools. Synteny/collinearity between S. bicolor and rice genes was determined using blocks of 25 genes with the Genomicus tool v. 49.01 (https://www.genomicus.bio.ens.psl.eu/genomicus-plants-49.01/cgi-bin/search.pl) (Muffato et al., 2010).

Physicochemical properties of candidate proteins
NCBI maintains the Conserved Domain Database (CDD), which can be searched with its CD-Search tool. This tool was used to identify the major domains in the S. bicolor protein sequences. Rice sequences were manually searched for the α/β proteasome domains, which are distinctive features of the proteasome family. Physicochemical parameters, including amino acid composition, molecular weight, theoretical pI, number of positively and negatively charged residues, instability index, aliphatic index, grand average of hydropathy (GRAVY), and stability, were calculated using the ExPASy ProtParam tool (https://web.expasy.org/protparam/) (Laskowski et al., 1993). Similarly, the secondary structure characteristics of the proteins were calculated using the SOPMA tool (https://npsa-prabi.ibcp.fr/cgi-bin/npsaautomat.pl?page=/NPSA/npsasopma.html) (Geourjon and Deleage, 1995).
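ProtParam computes GRAVY as the mean Kyte-Doolittle hydropathy over all residues, and the aliphatic index with Ikai's formula from the mole percentages of Ala, Val, Ile, and Leu. The following is a minimal re-implementation of these two formulas for illustration, not ProtParam's own code; the instability index, which needs a 400-entry dipeptide weight table, is omitted, and the example peptides are arbitrary.

```python
# Kyte-Doolittle hydropathy values (the scale used for GRAVY).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(protein):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue.

    Negative values indicate a hydrophilic protein, positive values a hydrophobic one.
    """
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)

def aliphatic_index(protein):
    """Ikai's aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X is the mole percent of each residue."""
    seq = protein.upper()

    def mole_pct(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return mole_pct("A") + 2.9 * mole_pct("V") + 3.9 * (mole_pct("I") + mole_pct("L"))

print(round(gravy("MKTAYIAKQR"), 3), round(aliphatic_index("MKTAYIAKQR"), 2))
```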

Conserved motifs discovery and homology modeling
The MEME Suite (https://meme-suite.org/meme/tools/meme) was used to search for motifs, and the InterProScan database (https://www.ebi.ac.uk/interpro/search/sequence/) was used to annotate the identified motifs. The 3D structures of the predicted proteins were inferred by homology modeling, with templates identified in the SWISS-MODEL template library (https://swissmodel.expasy.org/). Geometric and energetic validation of the predicted 3D protein structures was performed using the SAVES server (https://saves.mbi.ucla.edu/). The PROCHECK option of SAVES v6.0 was used to compare the proportion of residues in favored regions with that in other regions (Li et al., 2015), and protein quality was assessed by dihedral-angle analysis of the Ramachandran plots of the predicted candidate proteins. VERIFY-3D (Eisenberg et al., 1997) was used to evaluate how well the 3D atomic model matched the amino acid sequence (Clavijo et al., 2017), and the ERRAT option of SAVES was used to examine the statistics of non-bonded interactions between different atom types (Parmentier et al., 1997).
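Ramachandran analysis rests on the backbone dihedral angles φ and ψ, each computed from four consecutive backbone atoms. The sketch below computes a dihedral angle with the standard atan2 formulation; it illustrates the underlying geometry only, not PROCHECK's implementation, and the coordinates in the example are synthetic.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (range -180 to 180) defined by four 3D points.

    Backbone phi uses C(i-1), N(i), CA(i), C(i); psi uses N(i), CA(i), C(i), N(i+1).
    """
    def sub(a, b):
        return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return [a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]]

    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)   # successive bond vectors
    n1, n2 = cross(b1, b2), cross(b2, b3)                # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Synthetic points: a cis (0 degree) and a trans (180 degree) arrangement.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))
```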

Structural characterization, subcellular localization, and gene ontology analysis
The predicted 3D structures of S. bicolor proteins were compared with those encoded by the corresponding Oryza sativa genes by aligning their representative structures with the FATCAT server (https://fatcat.godziklab.org/fatcat/fatcatpair.html). Structural similarity under globally optimized superimposition was assessed from the root mean square deviation (RMSD) of the Cα atoms of the generated structures relative to the corresponding 3D structures of the query genes. Subcellular localization and GO analyses were conducted using the BioMart tool (https://plants.ensembl.org/biomart/martview/1e86aac7e869419fd945a124d55c0405) (Smedley et al., 2009) and the ShinyGO tool (http://bioinformatics.sdstate.edu/go/) (Ge et al., 2020), respectively. Protein-protein interaction network analysis was performed using the STRING database (Szklarczyk et al., 2023) to uncover unknown functions of the proteins at the molecular level.
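The Cα RMSD used above is the square root of the mean squared distance between paired atoms. A minimal sketch, assuming the two structures have already been superimposed and their residues paired (for example by FATCAT); it does not perform the superposition itself, and the coordinates below are toy values.

```python
import math

def ca_rmsd(coords_a, coords_b):
    """RMSD (in angstroms) between two equal-length lists of (x, y, z) CA coordinates.

    Assumes the structures are already superimposed and residues are paired.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2
                  for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

print(ca_rmsd([(0, 0, 0), (1, 1, 1)], [(0, 0, 0), (1, 1, 1)]))   # identical models
```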

Sequence alignment and phylogenetic analysis
To identify conserved and coevolving amino acid residues in S. bicolor and rice, multiple sequence alignment (MSA) was performed using the MultAlin tool (http://multalin.toulouse.inra.fr/multalin/). The mutual information (MI) between pairs of positions in the MSA was calculated using the Mistic web server (http://mistic.leloir.org.ar/results.php?jobid=202112021211022296), and coevolving residues were identified from the MI values. The MI between two positions (two columns of the MSA) measures how well the amino acid at one position can be predicted from the amino acid at the other. MI therefore enables the detection of correlated and compensatory mutation sites in homologous proteins. Phylogenetic analysis of the protein sequences was carried out using MEGA (Molecular Evolutionary Genetics Analysis) version 6.0 (Xu & Xue, 2019). An unrooted tree was constructed using the neighbor-joining method with 1,000 bootstrap replicates. The Newick-format tree exported from MEGA was visualized with iTOL (https://itol.embl.de/) (Letunic and Bork, 2016).
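The MI between two MSA columns is the sum, over observed residue pairs, of p(x,y)·log[p(x,y)/(p(x)p(y))]. The sketch below computes this raw, uncorrected MI in nats; web servers such as Mistic typically apply additional corrections (for example sequence weighting), which are not reproduced here. Skipping gapped rows is an assumption of this sketch.

```python
import math
from collections import Counter

def column_mutual_information(col_i, col_j, skip_gaps=True):
    """Raw mutual information (nats) between two MSA columns given as strings."""
    pairs = [(a, b) for a, b in zip(col_i, col_j)
             if not (skip_gaps and (a == "-" or b == "-"))]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    marg_i = Counter(a for a, _ in pairs)
    marg_j = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((marg_i[a] / n) * (marg_j[b] / n)))
    return mi

# Perfectly covarying columns give log(2); independent columns give 0.
print(column_mutual_information("AAKK", "LLVV"))
print(column_mutual_information("AAKK", "LVLV"))
```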

In silico expression profiling
Expression profiling of all candidate SbPA/SbPB genes was performed at different levels using GENEVESTIGATOR (https://genevestigator.com/) (Hruz et al., 2008). First, tissue-specific expression of all 20 SbPA/SbPB genes was quantified in different sorghum tissues using the SB-mRNASeq-SORGHUM datasets and displayed as a heat map. Second, the expression of the 20 SbPA/SbPB genes was examined at 10 developmental stages. Third, the expression of these 20 genes was studied under abiotic stress (cold, drought, and ABA), and the data were presented as fold values.
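The fold values reported for the in silico stress profiles include negative numbers for downregulated genes, which is consistent with log2-transformed ratios of treated to control expression. Assuming that is how the values are expressed (an assumption, not stated in the methods), a minimal sketch of the transformation and of an up/down call follows; the threshold and expression means are hypothetical.

```python
import math

def log2_ratio(treated_mean, control_mean):
    """log2(treated / control); both means must be positive expression values."""
    if treated_mean <= 0 or control_mean <= 0:
        raise ValueError("expression values must be positive")
    return math.log2(treated_mean / control_mean)

def direction(log2_fc, threshold=0.0):
    """Label a log2 ratio as up, down, or unchanged relative to a hypothetical threshold."""
    if log2_fc > threshold:
        return "up"
    if log2_fc < -threshold:
        return "down"
    return "unchanged"

# Hypothetical expression means, for illustration only.
print(direction(log2_ratio(8.0, 4.0)))   # up
print(direction(log2_ratio(2.0, 4.0)))   # down
```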

Validation of proteasomal genes through qRT-PCR
For experimental validation of the candidate genes, seeds of the sorghum cultivar HJ 541 were grown in soil-filled pots under greenhouse conditions. Abiotic stress treatments were applied: cold (4°C), ABA, and drought (10% soil moisture remaining in the pot). Leaf and root samples were collected from 4-week-old plants after 10 days of stress treatment. Total RNA was isolated using the Maxwell RSC Plant RNA kit (Promega, United States) according to the manufacturer's protocol, and cDNA was synthesized using the RevertAid cDNA synthesis kit (Thermo Scientific, United States). Gene-specific primers for quantitative expression analysis were designed with the PrimerQuest tool (IDT), and actin was used as the endogenous control. qRT-PCR was performed on a QuantStudio 6 Flex system with three biological replicates. Relative expression was quantified using the 2^−ΔΔCt method to determine the expression pattern of the proteasomal genes under abiotic stress.
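In the 2^−ΔΔCt method, the target gene's Ct is first normalized to the reference gene (here actin) within each condition (ΔCt), the treated ΔCt is then compared with the control ΔCt (ΔΔCt), and the fold change is 2^−ΔΔCt. A minimal sketch, assuming replicate Ct values are averaged before computing ΔCt (one common choice; the study does not specify); the Ct values are hypothetical.

```python
from statistics import mean

def fold_change_ddct(target_treated, ref_treated, target_control, ref_control):
    """Relative expression by the 2^-ddCt method.

    Each argument is a list of Ct values, one per biological replicate.
    """
    dct_treated = mean(target_treated) - mean(ref_treated)   # normalize to reference gene
    dct_control = mean(target_control) - mean(ref_control)
    ddct = dct_treated - dct_control                         # treated relative to control
    return 2 ** (-ddct)

# Hypothetical Ct values for one target gene and actin in three replicates.
print(fold_change_ddct([24.0, 24.2, 23.8], [20.0, 20.1, 19.9],
                       [25.0, 25.1, 24.9], [20.0, 20.0, 20.0]))
```

A ΔΔCt of −1 (the target amplifies one cycle earlier after normalization) corresponds to a twofold increase.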

Results
The sequenced genomes of numerous crops have been used to study genes involved in development and stress tolerance (Lu et al., 2019). The sequenced genomes of the model plants A. thaliana and O. sativa provide a platform for comparative studies that benefits crops whose genomes have not been sequenced (Rensink and Buell, 2004). The workflow used to characterize the 20S proteasomal genes in this study is shown in Figure 1.

Gene structure, splice variants, and chromosomal location of 20S proteasome genes
The present investigation revealed 20 proteasomal genes in S. bicolor. These genes were classified into seven distinct α and seven distinct β types of the 20S proteasome family. Table 1 gives detailed information on these genes, including the cDNA and CDS sequences of the α-subunits (SbPAA-SbPAG) and β-subunits (SbPBA-SbPBG), as well as their homology with rice genes. All 20 Sorghum genes were named after the corresponding genes reported in rice (Fu et al., 1998; Livneh et al., 2016).
SbPA gene lengths ranged from 2,221 to 9,357 bp. Exon and intron count in SbPA genes ranged from 1 to 11, respectively. All 10 SbPA genes contain introns. SbPB gene lengths ranged from 2,401 to 6,010 bp, and their exon and intron counts ranged from 3 to 8 and from 2 to 7, respectively (Supplementary Table 1). With some exceptions, the exon-intron arrangement was comparable in most SbPA and SbPB genes.
The cDNA lengths ranged from 612 to 2,221 bp for SbPA genes and from 1,064 to 3,709 bp for SbPB genes (Supplementary Table 1), and CDS lengths varied among individual SbPA (708-813 bp) and SbPB (615-843 bp) genes. Figure 2 shows the distribution of exons (solid yellow bars), introns (black lines), upstream and downstream regions (solid blue bars), and 5' and 3' UTRs (solid green bars), as well as the proportions of intron phase 0 (56.83%), phase 1 (30.93%), and phase 2 (12.23%). SbPA and SbPB genes were unevenly distributed across the 10 chromosomes, which may result from gene duplication or gene loss (Li et al., 2015; Clavijo et al., 2017; Yu et al., 2019). Chromosome Sb10 carried the largest number of genes (five: SbPAC1, SbPBA2, SbPBB2, SbPBC1, and SbPBE1), chromosomes Sb3 and Sb5 carried one gene each, and chromosomes Sb6, Sb8, and Sb9 carried none. All genes were located at terminal or sub-terminal positions (Figure 3). Whereas earlier work by Sassa et al. (2000) reported 14 genes in rice, the phylogenomic survey found 23. More thorough phylogenomic surveys may also resolve duplications and heterogeneity in other organisms.
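Intron phase is the position of an intron relative to the reading frame: the number of coding nucleotides upstream of the intron, modulo 3 (0 = between codons, 1 = after the first codon base, 2 = after the second). A minimal sketch, assuming the input is the list of coding lengths of consecutive CDS exons (introns in UTRs have no phase); the exon lengths are hypothetical.

```python
from collections import Counter

def intron_phases(cds_exon_lengths):
    """Phase of each intron between consecutive coding exons."""
    phases, coding_so_far = [], 0
    for length in cds_exon_lengths[:-1]:   # no intron after the last exon
        coding_so_far += length
        phases.append(coding_so_far % 3)
    return phases

def phase_percentages(phases):
    """Percentage of introns in phase 0, 1, and 2."""
    counts = Counter(phases)
    return {p: 100.0 * counts[p] / len(phases) for p in (0, 1, 2)}

print(intron_phases([100, 50, 60]))   # hypothetical coding-exon lengths
```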
Using gene trees that included Sorghum genes as well as genes from O. sativa, the orthology between 10 SbPA and 10 SbPB genes was studied.Utilizing the Ensembl Plant Compara pipeline, this tree was created (Supplementary Figure 1).

Gene duplication and synteny
The orthologous and paralogous relationships of the SbPA and SbPB genes across taxa are outlined in Supplementary Table 2. The 23 rice genes (13 α and 10 β) comprised seven duplicate pairs, one triplet, and six singletons (Supplementary Table 2). The duplication pattern in rice was used to predict the corresponding pattern in S. bicolor, and the Plant Compara gene tree in the Ensembl Plants database was used to distinguish orthologs from paralogs. Seven gene pairs were identified between S. bicolor and rice peptides. The Ka/Ks ratios of all seven pairs were less than one (Supplementary Table 3), indicating purifying (stabilizing) selection: the genes are constrained to maintain their current function, and selection acts against change and favors conservation. Synteny of the 20 S. bicolor genes with rice genes was 100% (Supplementary Figure 2); however, collinearity between S. bicolor and rice genes was absent.
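The interpretation above follows the usual reading of ω = Ka/Ks: ω < 1 indicates purifying selection, ω ≈ 1 neutral evolution, and ω > 1 positive selection. A minimal sketch of that classification, assuming Ka and Ks have already been estimated (for example by TBtools); the values in the example are hypothetical, and in practice a tolerance or statistical test is used rather than an exact comparison with 1.

```python
def selection_mode(ka, ks):
    """Return (omega, label) for a gene pair, where omega = Ka/Ks."""
    if ks == 0:
        raise ValueError("Ks is zero; Ka/Ks is undefined")
    omega = ka / ks
    if omega < 1:
        return omega, "purifying"
    if omega > 1:
        return omega, "positive"
    return omega, "neutral"

print(selection_mode(0.05, 0.5))   # hypothetical Ka and Ks
```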

SSRs mining
Eleven (55%) of the 20 genes contained a total of 22 SSRs, suggesting that only a subset of the genes in the family contain SSRs. Of these, 14 SSRs were found in eight SbPA genes and eight SSRs in three SbPB genes. The number of SSRs per gene also varied: two genes (SbPAG2 and SbPBB1) had five SSRs each, three genes (SbPAE2, SbPAG1, and SbPBB2) had two SSRs each, and the remaining six genes had one SSR each (Supplementary Table 4). Hexanucleotide and trinucleotide motifs were the most common (five SSRs each), followed by mononucleotide (3), dinucleotide (3), pentanucleotide (3), tetranucleotide (2), and heptanucleotide (1) motifs.
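SSR mining amounts to finding tandem runs of a short motif repeated at least a minimum number of times. The study does not name its tool or thresholds, so the sketch below uses hypothetical MISA-like minimum repeat numbers for motifs of 1 to 6 nt; it reports only perfect (uninterrupted) repeats and skips non-primitive motifs such as "ATAT", which are already reported under their shorter unit.

```python
import re

# Hypothetical minimum repeat numbers per motif length (MISA-like settings);
# the thresholds actually used in the study are not stated.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs (0-based starts)."""
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # A k-mer followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):   # skip e.g. 'AA' runs already reported as 'A'
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

print(find_ssrs("GG" + "AT" * 7 + "CC" + "A" * 12 + "G"))   # synthetic sequence
```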

Promoter structural analysis and functional annotation of cis-regulatory elements
The promoter regions of all genes contained many cis-regulatory elements, including LAMP elements, CAAT-box, TATA-box, CCAAT-box, Sp1, CGTCA-motif, ABRE, I-box, G-Box, GC-motif, CAT-box, P-box, TC-rich repeats, TATC-box, TGACG-motif, GT1-motif, MBS, TCT-motif, ATCT-motif, AT-rich element, TCCC-motif, MRE, GARE-motif, O2-site, ARE, A-box, GCN4_motif, TCA-element, chs-CMA2a, AACA motif, AuxRR-core, Box 4, GATA-motif, LTR, GTGGC-motif, Box II, the 3-AF1 binding site, and Pc-CMA2c. Among these, the CAAT-box, TATA-box, TGACG-motif, G-Box, MRE, TCT-motif, Sp1, and AT-rich elements were present in the largest number of genes (Figure 4). The CAT-box and GCN4_motif are associated with meristem and endosperm expression, respectively (Supplementary Figure 3). The A-box, a cis-acting regulatory element associated with the P-box and L-box, is involved in inducible transcriptional activity, and A-box elements recognized by various families of transcription factors were found in the promoter regions.
FIGURE 1 Workflow used to characterize 20S proteasomal genes in S. bicolor.
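Tools such as PlantCARE locate cis-elements by matching known element sequences on both strands of the promoter. The sketch below illustrates that idea with simplified core sequences for a few elements named above; these cores are illustrative assumptions rather than PlantCARE's full definitions, and a palindromic element such as the G-box is counted once per strand.

```python
import re

# Simplified core sequences for a few elements, for illustration only;
# PlantCARE's element definitions are more detailed.
CORE_MOTIFS = {
    "TATA-box": "TATA",
    "CAAT-box": "CAAT",
    "G-box": "CACGTG",
    "ABRE": "ACGTG",
    "TGACG-motif": "TGACG",
    "CGTCA-motif": "CGTCA",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def count_elements(promoter, motifs=CORE_MOTIFS):
    """Count (possibly overlapping) core-motif matches on both strands."""
    forward = promoter.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]        # reverse complement
    counts = {}
    for name, core in motifs.items():
        finder = re.compile("(?=%s)" % re.escape(core))  # lookahead allows overlaps
        counts[name] = len(finder.findall(forward)) + len(finder.findall(reverse))
    return counts

print(count_elements("TTCACGTGTT"))   # synthetic promoter fragment
```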

miRNA target prediction
In the current investigation, we identified 21 miRNAs predicted to target the SbPA and SbPB gene sequences. Only nine genes (six SbPA and three SbPB) were predicted to contain target sites for these miRNAs. SbPBA1 had the most target sites, for six miRNAs. SbPAC1 and SbPBC2 each contained target sites for three miRNAs; SbPAB2, SbPAE1, and SbPAG2 each for two miRNAs; and SbPAD1, SbPAE2, and SbPBC1 each for one miRNA. The remaining 13 miRNAs targeted various SbPA and SbPB genes located on several Sorghum chromosomes (Table 2).
Most of the miRNAs are predicted to suppress the expression of their target genes through post-transcriptional cleavage, whereas two miRNAs (miR6225-5p and miR6230-3p) are predicted to act through translational inhibition (Table 2).

Physicochemical properties of SbPA and SbPB proteins
The 20 SbPA/SbPB proteins have an average length of 242 amino acids (aa), ranging from 204 aa (SbPAG1, SbPBC1, and SbPBC2) to 280 aa (SbPBE1). The α-subunits ranged from 204 to 280 aa and the β-subunits from 204 to 270 aa (Supplementary Table 5). Molecular weights ranged from 22,375.5 Da (SbPAG1) to 30,218 Da (SbPBE1), and the theoretical isoelectric point (pI) ranged from 4.72 to 8.26. Four proteins were predicted to be unstable (aliphatic index 79.22 to 87.26), whereas the remaining 16 proteins were predicted to be stable (aliphatic index 71.5 to 100.8); ProtParam classifies proteins with an instability index above 40 as unstable. The grand average of hydropathy (GRAVY) was used as an indicator of protein solubility. GRAVY values ranged from −0.17 to −0.355, indicating that the proteins are hydrophilic (Table 3), a property that may help them fold correctly and maintain their stability and biological activity. Negative GRAVY values indicate hydrophilic (polar) proteins, whereas positive values indicate hydrophobic (non-polar) proteins. In S. bicolor, the 20S proteasome peptides are rich in five amino acids (alanine, serine, glutamic acid, glycine, and leucine, ranging from 8.93% to 12.22%), all of which are involved in different biological functions of the 20S proteasome (Supplementary Table 6). Five amino acids (arginine, phenylalanine, tyrosine, leucine, and glutamic acid) mainly provide catalytic sites for peptidase activity (Supplementary Table 10).

Functional domains and motifs of SbPA and SbPB proteins
The logos of 10 distinct motifs and their associated amino acids identified in the SbPA and SbPB protein sequences are shown in Supplementary Figure 4, and the motif sequences, E-values, and functions are listed in Supplementary Table 7. These motifs had not previously been documented in rice and Arabidopsis (Parmentier et al., 1997; Fu et al., 1998; Sassa et al., 2000), although Sharma et al. (2022) recently reported 20 motifs in wheat. Individual motifs ranged in length from 15 aa (motif 8) to 50 aa (motifs 7 and 9). Motifs 2, 3, 4, 5, and 7 have been linked to proteolysis in cellular protein catabolic processes (https://www.ebi.ac.uk/interpro/result/InterProScan/iprscan5-R20211209-091508-0331-1308065-p2m/). The remaining five motifs appear to be novel and need to be characterized at the molecular level. SbPA and SbPB proteins are predicted to localize to the nucleus and cytoplasm, as previously reported for eukaryotes (Fu et al., 1998). One   5).
Comparable data on the individual subunit proteins of the 20S proteasome have not been reported for yeast, Arabidopsis, or rice (Groll et al., 1997; Parmentier et al., 1997; Fu et al., 1998; Sassa et al., 2000).

Multiple sequence alignment and conserved amino acids
OsPAE1 shared 98.3% similarity with SbPAE1, whereas OsPAF1 shared 89.55% with SbPAF1 (Supplementary Table 8). OsPBC2 shared 96.57% similarity with SbPBC2, whereas OsPBE1 shared 83.33% with SbPBE1 (Supplementary Table 9). All S. bicolor genes shared more than 80% similarity with their O. sativa counterparts, indicating high amino acid similarity between the α- and β-subunits of S. bicolor and rice. Five residues of the α-subunits and eight residues of the β-subunits were highly conserved between S. bicolor and rice (Supplementary Figures 5A, B), and these residues also had the highest MI (Supplementary Figures 6, 7).
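Pairwise percentages such as these depend on how the aligned columns are counted. The sketch below computes percent identity from two aligned sequences, dividing identical residues by the columns in which at least one sequence has a residue; this is one of several definitions (others divide by the shorter ungapped length), and it measures identity, whereas the similarity values above may also count conservative substitutions. The aligned strings are toy examples.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two aligned sequences of equal length.

    Gap-gap columns are ignored; identical non-gap residues are counted.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    identical = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * identical / len(columns)

print(percent_identity("MKV-LT", "MKVALS"))   # toy aligned fragments
```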

Subcellular localization and function
The SbPA and SbPB proteins are components of the proteasome complex and are found in the nucleus and cytoplasm. According to GO analysis and functional annotation (Supplementary Figure 8), these proteins may be associated with the response to zinc ions, proteasomal ubiquitin-independent protein metabolic processes, the proteasome complex, proteasome core complex, proteasome core complex alpha-subunit complex, endopeptidase complex, threonine-type endopeptidase activity, threonine-type peptidase activity, proteasome, peptidase complex, and the response to metal ions. Protein-protein interaction network analysis was performed to infer unknown functions of the proteasomal proteins. The network contained 20 peptide nodes, corresponding to the number of gene sequences (Supplementary Figure 9). The protein networks and their functional roles suggest that the retrieved peptides participate in diverse functions, including ubiquitination, proteolysis, nitrogen metabolism, catabolic and anabolic processes, and endopeptidase and threonine peptidase activity (Supplementary Table 10).
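GO enrichment tools commonly test whether a term is over-represented in a gene list with a one-sided hypergeometric test. The sketch below computes that upper-tail P value from counts; ShinyGO's exact procedure, including its false discovery rate correction, is not reproduced here, and the counts in the example are hypothetical.

```python
from math import comb

def enrichment_pvalue(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background, K: background genes annotated with the GO term,
    n: genes in the query list, k: query genes annotated with the term.
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Hypothetical counts: 2 of 2 query genes carry a term held by 5 of 10 background genes.
print(enrichment_pvalue(2, 2, 5, 10))
```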

Secondary and tertiary structures
The secondary structures of all proteins were compared. For all sequences, the α-helix was the dominant secondary structure, followed by random coils, extended β-strands, and turns. Random coils form the irregular structural regions that allow polypeptide chains to fold in a distinctive manner (Supplementary Table 11). The SbPA and SbPB proteins tend to form highly stable structures. Only six (30%) SbPA and SbPB proteins, with similarities of 29.81% to 54.98% to the matching rice templates, were selected for in silico 3D structure prediction. For these 20 proteins, the Global Model Quality Estimation (GMQE) ranged from 0.73 to 0.82, suggesting high-quality protein models. QMEAN values ranged from 0.74 ± 0.05 to 0.80 ± 0.06. The protein models and the reference (rice) proteins were similar to varying degrees (60.57%). The 3D-1D scores (VERIFY-3D) ranged from 78.22% to 97.12%, and the ERRAT quality factor ranged from 86.1878 to 97.8355 (Supplementary Table 12). The three-dimensional structures of the proteins were modeled with SWISS-MODEL, and the modeled structures are shown in Figure 5. PROCHECK analysis of the modeled proteins placed 85%-97% of residues in the most favored regions, 2.9%-14.6% in additionally allowed regions, 0.5%-1.4% in generously allowed regions, and 0.4%-1.0% in disallowed regions, indicating that the predicted models had good geometry and were acceptable for further analysis (Laskowski et al., 1993; Kumar et al., 2018; Batra et al., 2019). Of these 20 proteasomal proteins, 15 have over 90% of their amino acid residues in energetically favored regions (Supplementary Figure 10). The predicted 3D architectures of these proteins provide a starting point for deciphering their molecular functions.

Alignment and functional annotation of 3D structures
The 3D protein structures of S. bicolor were superimposed onto the matching 3D structures of the rice reference proteins using an optimized superimposition (Supplementary Figure 11). The 3D structures of six proteins (SbPAB2, SbPAG2, SbPBA1, SbPBA2, SbPBB1, and SbPAD1) exhibited 1.3% to 3.07% resemblance to the 3D structure of the matching OsPAA1 protein, with an RMSD of 0 Å (Supplementary Table 13).

Phylogenetic analysis
Phylogenetic trees were constructed separately from the amino acid sequences of the PA and PB subunits of S. bicolor and rice (Figure 6). Each tree comprised seven distinct clades (Figures 6A, B). MEME results showed that motifs 1, 2, 3, 4, 6, and 8 are the most conserved in the alpha domain, whereas motifs 4, 3, and 2 are the most conserved in the beta domain of the proteasome. The distribution of conserved motifs was consistent with these clades (Figure 2). The seven clades identified in the present study may have numerous taxonomic applications (Fürstenberg-Hägg et al., 2013). Notably, the seven clades in the phylogenetic tree were formed by orthologs of three species belonging to distinct α- and β-subunits. In the α-subunit tree, a minimum of 7 and a maximum of 10 orthologs formed clades; in the β-subunit tree, a minimum of 7 and a maximum of 13. Similar results were reported for rice, yeast, and Arabidopsis proteins (Fu et al., 1998; Sassa et al., 2000). These results revealed a high degree of similarity between the α- and β-subunits of S. bicolor and the corresponding subunits of rice, and suggest that the 20S proteasome genes of S. bicolor are orthologous to rice genes.
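The neighbor-joining method used for these trees repeatedly joins the pair of nodes that minimizes the Q-criterion, Q(i, j) = (n − 2)d(i, j) − Σd(i, ·) − Σd(j, ·), and then recomputes distances to the new node. The sketch below starts from a given distance matrix; MEGA additionally estimates distances from the alignment and runs bootstrap replicates, neither of which is reproduced here. The five-taxon matrix is a textbook-style toy example, not data from this study.

```python
def neighbor_joining(names, dist):
    """Build an unrooted neighbor-joining tree and return it as a Newick string.

    names: unique taxon labels; dist: symmetric matrix (list of lists) in the same order.
    """
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}   # row sums
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]           # NJ Q-criterion
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))    # branch length to a
        lb = d[a][b] - la                                      # branch length to b
        new = f"({a}:{la:.3f},{b}:{lb:.3f})"
        d[new] = {new: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    a, b = nodes
    return f"({a}:{d[a][b]:.3f},{b}:0.000);"

taxa = ["a", "b", "c", "d", "e"]
matrix = [[0, 5, 9, 9, 8],
          [5, 0, 10, 10, 9],
          [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3],
          [8, 9, 7, 3, 0]]
print(neighbor_joining(taxa, matrix))
```

The resulting Newick string can be loaded into a viewer such as iTOL.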

Differential gene expression pattern of 20S proteasome genes
Gene expression patterns were studied under normal and stress conditions. Under normal conditions, tissue-specific and development-specific expression was examined; under abiotic stress, responses to cold, drought, and hormone (ABA) treatments were examined.

Tissue-specific expression
Expression of the 20S proteasomal genes varied among tissues (rhizome, shoot, inflorescence, seedling, and cell culture). Expression was highest in the rhizome, followed by the inflorescence, seedling, shoot, and cell culture (Figure 7A).

Differential gene expression during plant development
Expression of the 20S proteasomal genes varied across developmental stages (germination, seedling, tillering, stem elongation, booting, heading, flowering, milk, dough, and maturity) (Figure 7B).
Based on development-specific expression, the 20 genes fell into two categories: higher and medium expression. Nineteen genes showed higher expression (four- to eightfold values) across these developmental stages, whereas only one gene (SbPBA1) showed medium expression (zero- to fourfold values) (Figure 7B).

Drought stress
Under drought stress, leaf and root samples of two genotypes (BTx642 and RTx430) were studied at the preflowering stage for expression of the 20S proteasome genes. In leaves, two genes (SbPAE1 and SbPBE1) were downregulated (0.0 to −1.50-fold) in BTx642 and one gene (SbPBD1) in RTx430, while the remaining genes were upregulated (0.0- to 1.50-fold) in both leaf samples. In roots, seven genes were upregulated in each genotype, BTx642 (SbPAA1, SbPAB1, SbPAF1, SbPBC1, SbPAB2, SbPBA1, and SbPBG1) and RTx430 (SbPAA1, SbPAB1, SbPAF1, SbPBD1, SbPAB2, SbPBA1, and SbPBG1), with expression ranging from 0.0 to 1.50, whereas all remaining genes were downregulated (0.0 to −2.0) in both root samples (Figure 7C).
FIGURE 6 Phylogenetic tree constructed using protein sequences of (A) α-subunits and (B) β-subunits of O. sativa and S. bicolor. Seven colors in the tree represent seven different clades.
qRT-PCR analysis of the candidate genes (SbPBA1, SbPAA1, SbPBG1, SbPBE1, and SbPAG1) was performed using the gene-specific primers listed in Supplementary Table 14, and the results were consistent with the in silico data. Compared with the control, SbPBA1 expression increased under drought and ABA treatments by 1.32- and 1.12-fold in leaves and by 1.58- and 1.55-fold in roots, respectively. SbPBE1 was upregulated only under ABA stress, in both leaf (1.45-fold) and root (1.6-fold) tissues, and was downregulated under cold and drought stress. SbPAG1 was upregulated under drought (1.43-fold) and ABA (1.67-fold) stress in leaves but not in roots. SbPAA1 expression increased under drought and ABA treatments by 5.17- and 4.78-fold in leaves and by 4.69- and 4.12-fold in roots, respectively. In leaves, SbPBG1 expression increased 4.02- to 4.87-fold under drought and ABA stress. All five genes were downregulated under cold treatment (Figure 8).

Discussion
The present investigation identified 20 proteasomal genes in S. bicolor, organized into seven α-type and seven β-type groups of the 20S proteasome family. Most SbPA and SbPB genes share a similar exon-intron structure, although some deviate from this pattern, possibly because of intron loss or gain during gene evolution (Rogozin et al., 2003; Yu et al., 2019). Differences in gene length largely reflect the size and number of introns in the SbPA and SbPB genes, whereas differences in cDNA length may be due to variation in the length of the UTRs at the ends of the cDNAs. Phylogenomic analysis of the 20 genes revealed seven clades each for the α (SbPAA-SbPAG) and β (SbPBA-SbPBG) subunits. The SbPA and SbPB genes were unevenly distributed across the 10 chromosomes, possibly as a result of gene duplication or gene loss (Li et al., 2015; Clavijo et al., 2017; Yu et al., 2019). Gene duplication is a frequent, random event that occurs through either tandem or block duplication, and identifying duplication events helps explain how the 20S proteasomal gene family expanded. Eleven (55%) of the 20 genes contained a total of 22 SSRs, indicating that SSRs occur in only about half of the genes in this family. The structural and functional characteristics of SSRs have been described for a large number of genes (Gupta and Rustgi, 2004; Li et al., 2004; Varshney et al., 2005), and in Sorghum, SSRs were mapped by Taramino et al. (1997). The SSRs found in the genes encoding the α- and β-subunits of the 20S proteasome could be exploited as functional markers for marker-assisted selection aimed at improving plant tolerance to biotic and abiotic stresses. The absence of transposable and retroelements in the genes under investigation suggests that transposable elements have not shaped the 20S proteasome gene family in S. bicolor.
Promoter analysis revealed a wide range of cis-regulatory elements that may mediate transcriptional regulation of the 20S proteasomal genes. The analysis used the 1,500-bp region upstream (5') of each gene as the promoter sequence. The promoters of the 20 proteasomal genes contain consensus cis-regulatory elements associated with development and stress responses in Sorghum. The CAAT-box, a typical cis-acting element in promoter and enhancer regions, was present in the promoters of all the genes analyzed. The stress-related motifs identified were those for low-temperature responsiveness (LTR), drought responsiveness (MBS), and defense and stress responsiveness (TC-rich repeats). Development-related motifs included those for light responsiveness (LAMP element, MRE, TCT-motif, TCCC-motif, I-box, Sp1, GT1-motif, GTGGC-motif, GATA-motif, Box 4, Box II, ATCT-motif, PcCMA2c, and chsCMA2a) and zein metabolism (O2 site). Hormone-responsive motifs were also identified for auxin (AuxRR-core), abscisic acid (ABRE), ethylene, GA3 (G-box, P-box, GARE-motif, and TATC-box), salicylic acid (TCA element), and MeJA (CGTCA and TGACG motifs) responsiveness.
MicroRNAs (miRNAs) are small non-coding RNAs that play regulatory roles at the post-transcriptional and translational levels, causing degradation of their target transcripts (Budak and Zhang, 2017). For example, when induced, the microRNA sbi-miR396d conferred drought tolerance in Arabidopsis by targeting growth-regulating factors that coordinate cell division and differentiation and thereby affect leaf development and orientation (Jones-Rhoades and Bartel, 2004).
Digital expression analysis of 20S proteasome genes in plants has been reported previously (Fu et al., 1998; Li et al., 2015). Genes of the 20S proteasome participate in many biological processes, including plant development and responses to various biotic and abiotic stresses (Xu and Xue, 2019). These processes are driven by signals from various signaling molecules that control plant growth under stress conditions (Livneh et al., 2016), and several lines of evidence suggest that hormone signals affect the expression of 20S proteasome genes (Kurepa et al., 2009). Studies of UPS genetics in different crops have identified a number of genes that contribute to biotic and abiotic stress tolerance, for example, the role of TaFBA1 in heat tolerance (Li et al., 2018) and of heat shock proteins in the breakdown of toxic and misfolded proteins (Awasthi and Wagner, 2005); the UPS has also been implicated in a number of human disorders (Bozaykut et al., 2020). To our knowledge, this is the first investigation of the expression of 20S proteasome genes in S. bicolor under both normal and stress conditions; a publicly accessible transcriptome database was used for the in silico part of this analysis.
FIGURE 8 qRT-PCR expression levels of SbPBA1, SbPBE1, SbPBG1, SbPAA1, and SbPAG1 genes under abiotic stresses (cold, drought, and ABA hormone) in S. bicolor.
Malik et al. 10.3389/fpls.2023.1287950 Frontiers in Plant Science frontiersin.org
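The SSR screen summarized above (22 SSRs in 11 of the 20 genes) is not described algorithmically in this section. A minimal sketch of perfect-SSR detection with regular expressions follows, assuming MISA-like minimum repeat numbers (10 for mononucleotide, 6 for dinucleotide, and 5 for tri- to hexanucleotide motifs). These thresholds, the function names, and the test sequence are assumptions for illustration, not the settings or data of the study.

```python
import re
from typing import List, Tuple

# Assumed minimum repeat numbers per motif length (MISA-like values chosen for
# illustration; the thresholds used in the study are not stated here).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_repeat_of_shorter_unit(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit, e.g. 'ATAT' = 'AT' * 2."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Report perfect SSRs as (0-based start, motif, repeat count), sorted by start.

    Each motif length is scanned greedily and without overlap; compound or imperfect
    SSRs are not merged, so this simplifies what dedicated SSR finders do.
    """
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # One k-mer unit followed by at least (min_rep - 1) identical copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            if is_repeat_of_shorter_unit(m.group(1)):
                continue  # e.g. 'ATAT' repeats are usually reported as 'AT' repeats
            hits.append((m.start(), m.group(1), len(m.group(0)) // k))
    return sorted(hits)


# Hypothetical test sequence (not from any SbPA/SbPB gene).
example = "GC" + "AT" * 6 + "GC" + "CAG" * 5 + "TT"
print(find_ssrs(example))  # prints [(2, 'AT', 6), (16, 'CAG', 5)]
```

Skipping motifs that are repeats of a shorter unit avoids reporting the same tract twice, for example an (AT)10 run as both (AT)10 and (ATAT)5.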
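The promoter analysis above treats the 1,500 bp 5' of each gene as its promoter and searches that region for known cis-elements. The sketch below is a simplified exact-match scan: it extracts the region from forward-strand genome coordinates and searches for three motifs whose sequences are given in the text (the CAAT-box core CAAT and the MeJA-responsive CGTCA and TGACG motifs). The function names, the coordinate convention, and the toy sequence are hypothetical; a real analysis would rely on a curated motif database (for example, PlantCARE) that also handles degenerate motifs.

```python
from typing import Dict, List

# Motif sequences named in the text; an illustrative subset only.
MOTIFS = {"CAAT-box": "CAAT", "CGTCA-motif": "CGTCA", "TGACG-motif": "TGACG"}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def upstream_region(genome: str, gene_start: int, gene_end: int, strand: str,
                    length: int = 1500) -> str:
    """Return up to `length` bp 5' of a gene, oriented 5'->3' relative to the gene.

    Coordinates are 0-based, half-open [gene_start, gene_end) on the forward strand.
    """
    if strand == "+":
        return genome[max(0, gene_start - length):gene_start].upper()
    # Minus-strand gene: its 5' flank lies after gene_end on the forward strand.
    return reverse_complement(genome[gene_end:gene_end + length])


def scan_motifs(promoter: str) -> Dict[str, List[int]]:
    """Exact-match start positions (0-based, overlaps allowed) of each motif."""
    hits = {}
    for name, motif in MOTIFS.items():
        positions = []
        i = promoter.find(motif)
        while i != -1:
            positions.append(i)
            i = promoter.find(motif, i + 1)
        hits[name] = positions
    return hits


# Hypothetical toy genome and gene coordinates, for illustration only.
toy_genome = "TTCAATGACGTCATT" + "ATG" + "A" * 20
promoter = upstream_region(toy_genome, gene_start=15, gene_end=38, strand="+")
print(scan_motifs(promoter))
# prints {'CAAT-box': [2], 'CGTCA-motif': [8], 'TGACG-motif': [5]}
```

Because TGACG is the reverse complement of CGTCA, a CGTCA site on the minus strand appears as TGACG on the plus strand, so scanning one strand for both motifs already covers both orientations of this element.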
Hoffmann and Rooney (2019) noted that sorghum production is affected by both abiotic stresses [drought, temperature (heat or cold), and soil fertility and/or composition (specifically soil pH, micronutrients, or fertility)] and biotic stresses {insects such as the sugarcane aphid (Melanaphis sacchari), head worm [Helicoverpa zea (Boddie)], midge [Contarinia sorghicola (Coquillett)], and greenbug [Schizaphis graminum (Rondani)], and plant pathogens such as stalk rot (Macrophomina phaseolina)}. Roozeboom and Prasad (2019) reported that sorghum is a remarkably resilient plant, able to compensate for environmental stresses, insect pests, diseases, and variable nutrient availability. Jenks et al. (1994) found that, compared with wild-type plants (bloom phenotype), leaves of a bloomless Sorghum mutant with thinner cuticles were more susceptible in the field to the fungi Exserohilum turcicum (Pass.) and Puccinia purpurea (Cooke). According to Burow et al. (2007), the bloom gene may contribute substantially to the overall drought resistance of sorghum. In the present study, the upregulation of SbPBA1, SbPAA1, SbPBG1, SbPBE1, and SbPAG1 under ABA and drought stress suggests that these genes are involved in the abiotic stress response, whereas none of them was induced under cold stress, suggesting that they are not involved in the cold response (Figure 8). Transcript levels of Glycine max UBC2 (GmUBC2) and Arabidopsis UBC32 (AtUBC32) are upregulated in response to drought and/or salt stress (Zhou et al., 2010; Wan et al., 2011; Cui et al., 2012), and transgenic Arabidopsis plants overexpressing Vigna radiata UBC1 (VrUBC1), AhUBC2, or GmUBC2 are more tolerant to drought stress (Zhou et al., 2010; Wan et al., 2011; Chung et al., 2013). In plants, UPS components contribute to ABA-dependent responses to abiotic stresses by regulating ABA biosynthesis (through XERICO and PUB44) and ABA signaling (through the ABA receptors PYL/PYR/RCAR, PP2C suppressors, and ABF/ABI transcription factors) (Xu and Xue, 2019). A recent study found that deletion of PBE1, a β5 subunit, significantly impaired proteasome assembly under salt stress, indicating that PBE1 is required for complete proteasome assembly (Han et al., 2019). PBE1 was also found to reduce accumulation of the transcription factor ABI5, thereby altering ABA-mediated salt stress signaling in plants (Han et al., 2019). The functions of the differentially expressed genes identified here merit further study and could be exploited in breeding programs to develop sorghum cultivars resistant to various stresses.
The ubiquitin-proteasome system (UPS), which responds to external stimuli, allows plants to modify their proteomes in response to their environment in order to develop and survive. Of the many genes that encode UPS components, only a few have been functionally characterized to date. Further identification of the specific substrates of proteasomal subunits and proteasome regulators will clarify the molecular regulatory mechanisms underlying plant responses to environmental stimuli at the protein level and will help provide practical strategies for increasing crop tolerance to both biotic and abiotic stresses. Genetic engineering based on homologous or heterologous gene expression can produce genotypes that perform better under environmental stress (Han et al., 2019). Overexpressing particular UPS components that act as positive regulators of stress tolerance is one possible way to boost stress tolerance in crops such as rice, maize, sorghum, and wheat.

Conclusion
This study identified 10 SbPA and 10 SbPB genes encoding the 20S proteasome in S. bicolor, which can serve as genetic tools for functional analysis of the 20S proteasome in this species. The orthology of the SbPA and SbPB genes with rice genes was inferred. Full-length 3D models were obtained for the encoded proteins, which are predicted to impart distinct proteolytic and biological functions to the 20S complex. Several SbPA and SbPB genes were expressed in various S. bicolor organs under common abiotic stresses. The 20S proteasome genes play important roles in responses to abiotic stress and in the development of many plant organs. The present study therefore provides a knowledge base that can be applied to developing climate-resilient S. bicolor cultivars.

FIGURE 2 Structure of SbPA and SbPB genes of Sorghum bicolor showing distribution of exons (yellow solid bars), introns (black lines), upstream/downstream regions (solid green bars), and intron phases marked as 0, 1 and 2. This figure also represents the conserved motifs identified in SbPA and SbPB proteins.

FIGURE 3 Distribution of the 20 SbPA and SbPB genes on chromosomes 1-10 of Sorghum bicolor. On each chromosome, gene names are given on the upper side and their physical positions in megabases (Mb) are indicated on the left.

FIGURE 4 Promoter structure prediction in S. bicolor. Different colors depict the identified cis-regulatory elements.
distinct α-type (1-7) or β-type (1-7) domain is present in each of the 20 SbPA and SbPB proteins. Ten of these SbPA/SbPB proteins have a single α/β-type domain (Supplementary Table

FIGURE 5
FIGURE 7 (A) Tissue-specific and (B) development-specific expression profiles of SbPA and SbPB genes in S. bicolor under normal conditions, and (C) under abiotic stress. The expression data are represented as fold values.

TABLE 1
Detailed information about genes, cDNA, and CDS sequences for α- and β-subunits of the 20S proteasome in S. bicolor and rice.

TABLE 2
Putative miRNAs involved in post-transcriptional regulation of 20S proteasomal genes in S. bicolor.