Insights Into the Phylogenetic Distribution, Diversity, Structural Attributes, and Substrate Specificity of Putative Cyanobacterial Orthocaspases

The functionality of caspase homologs in prokaryotic cell execution has been perceived, yet the dimensions of their metabolic pertinence are still cryptic. Here, a detailed in silico study on putative cyanobacterial caspase homologs, termed orthocaspases, in a sequenced genome of 132 strains was performed. We observed that 473 putative orthocaspases were distributed among 62% cyanobacterial strains subsumed within all the taxonomical orders. However, high diversity among these orthocaspases was also evident as the conventional histidine–cysteine (HC) dyad was present only in 72.03% of orthocaspases (wild-type), whereas the rest 28.18% were pseudo-variants having substituted the catalytic dyad. Besides, the presence of various accessory functional domains with Peptidase C14 probably suggested the multifunctionality of the orthocaspases. Moreover, the early origin and emergence of wild-type orthocaspases were conferred by their presence in Gloeobacter; however, the complex phylogeny displayed by these caspase-homologs perhaps suggested horizontal a gene transfer for their acquisition. However, morpho-physiological advancements and larger genome size favored the acquisition of orthocaspases. Moreover, the conserved caspase hemoglobinase fold not only in the wild-type but also in the pseudo-orthocaspases in Nostoc sp. PCC 7120 ascertained the least effect of catalytic motifs in the protein tertiary structure. Further, the 100-ns molecular dynamic simulation and molecular mechanics/generalized born surface area exhibited stable binding of arginylarginine dipeptide with wild-type orthocaspase of Nostoc sp. PCC 7120, displaying arginine-P1 specificity of wild-type orthocaspases. This study deciphered the distribution, diversity, domain architecture, structure, and basic substrate specificity of putative cyanobacterial orthocaspases, which may aid in functional investigations in the future.


INTRODUCTION
Caspases are specialized cysteine-aspartate proteases, recognized to mediate apoptosis in metazoans. These peptidases are synthesized as inactive proenzymes, which upon activation attain dimeric configuration, having p20 and p10 sub-domains in each monomer (Earnshaw et al., 1999;Fuentes-Prior and Salvesen, 2004). The proteolytic activity of caspases is conferred by the p20subunit that harbors a critical histidine-cysteine (HC) dyad in its active site, which cleaves target proteins after aspartate (D) residue, while the p10 subunit has a regulatory role. Although the caspases are generally harbored by the metazoans, its homologs are abundantly found in lower life forms. Pioneering studies revealed the presence of caspase homologs not only in single-celled eukaryotes (Uren et al., 2000) but also in bacteria , portraying the possibility of caspase-homolog-dependent PCD in these life forms.
Caspase homologs are a structurally diverse group of proteases, belonging to the Peptidase C14 family which also subsumed the archetype caspases. This family of proteases is characterized by the presence of a caspase-hemoglobinase fold (CHF) composed of four parallel β-sheets surrounded by three α-helices . Further, these homologs are broadly classified into paracaspases (PCAs) (Jaworski and Thome, 2016), metacaspases (MCAs) (Tsiatsiani et al., 2011;Minina et al., 2017), and OCAs (Klemenčič and Funk, 2018), based on their protein architecture. While PCAs and OCAs are devoid of p10-like subunit, the MCAs consist of both the sub-domain homologs. Although the catalytic HC dyad was conserved among caspase homologs, the aspartate P1 cleavage specificity annotated for all the Peptidase C14 members was absent in them. PCAs are arginine (R)-specific Ca 2+independent dimeric proteases (Hachmann et al., 2012), whereas MCAs are Ca 2+ -dependent monomers which specifically cleave after either R or lysine (K) residues (Hander et al., 2019). Additionally, the OCAs are speculated to have specificity for the basic residues but their detailed structural and functional aspects are yet to be unveiled (Minina et al., 2020).Owning to the significance of PCD in prokaryotes, a metagenomic study in the Baltic Sea exhibited the occurrence of Peptidase C14-containing proteins only among 4% of all the bacterial population (Asplund-Samuelsson et al., 2016). This 4% of bacteria mostly subsumed the members of actinomycetes, α-, β-, and γ-proteobacteria, and cyanobacteria Asplund-Samuelsson et al., 2012). Most of the Peptidase C14-containing proteins in cyanobacteria lacked the p10-like sub-domain (Choi and Berges, 2013) and therefore considered as the OCAs, whereas only 4% accounted for putative MCAs (Klemenčič et al., 2019). An initial study on 33 whole-genome-sequenced cyanobacterial strains depicted the uneven distribution of OCAs, as complex strains with larger genome favored to acquire them (Jiang et al., 2010). Moreover, the substrate specificity of putative OCA, MaOC1, for RR dipeptide and Ca 2+ -independent autocatalysis was also demonstrated in M. aeruginosa PCC 7806 (Klemenčič et al., 2015). However, the identification of various pseudo-variants of caspase homologs lacking the functional HC dyad increased the obscure underlining of the dimension of their functionalities. In Trypanosoma brucei, the catalytic HC dyad was substituted by the YS (tyrosine-serine) motif in MCAs TbMC1 and TbMC4, probably rendering their proteolytic activity (Proto et al., 2011). The abundance of pseudo-OCAs in cyanobacteria having substituted catalytic motifs may suggest a wider applicability of these proteases (Klemenčič et al., 2019).
This study focuses on the functionality of wild-type OCAs (OCA having an HC catalytic dyad) and their physiological congruity in cyanobacteria. Here a comprehensive bioinformatics study has been conducted to describe the distribution, diversity, abundance, domain configuration, and phylogeny of 473 putative cyanobacterial OCAs. Additionally, the structural analysis of a wild-type OCA from Nostoc sp. PCC 7120 was performed using homology modeling and compared elaborately with their pseudo-counterparts. Further, docking and molecular dynamic (MD) simulation of the protein-substrate complex was performed to ensure the substrate specificity of this OCA. In addition, the binding free energy ( G bind ) calculated using the molecular mechanics/generalized born surface area (MMGBSA) also assured the stability of the complex. This study will provide an updated and holistic view on the attributes of putative OCAs and may pave new ways to answer the existence of OCA-mediated PCD in cyanobacteria.

Retrieving the Sequences of Putative Cyanobacterial OCAs
The protein sequences of putative cyanobacterial OCAs were retrieved from the NCBI database 1 . The p20-like sub-domain (Peptidase C14)-containing protein sequences in 132 whole-genome-sequenced cyanobacterial strains were identified by performing domain enhanced lookup time accelerated (DELTA) BLAST (Boratyn et al., 2012), using 4AFR (MCA 2) of Trypanosoma brucei (Q585F3.1) as the query. Sequences with incomplete Peptidase C14, either at the N-or C-terminal, or lacking the domain, were manually eliminated. Finally, 473 putative OCA sequences from 82 cyanobacterial strains were further analyzed.
Analyzing the Diversity, Distribution, and Abundance of Putative Cyanobacterial OCAs All the protein sequences were manually analyzed and subsequently categorized into true (wild-type) and pseudo (mutated) OCAs, based on the catalytic dyad forming the active site. Only those proteins with classical HC dyad were considered as wild-type, whereas proteins having any other amino acid residues in the catalytic dyad were regarded as pseudo-OCAs. Further, the distribution and abundance of all OCA subtypes (true and pseudo OCAs) along the taxonomical order of these 82 cyanobacterial strains were compared against their genome sizes and morphological complexities.

Domain Configuration of Putative Cyanobacterial OCAs
The full-length OCA sequences were subjected to CD search 2 (Marchler-Bauer et al., 2017) against the Pfam database (threshold E value = 0.01) (Mistry et al., 2021). CD-search outputs were analyzed, and only specific hits for conserved functional domains were considered since they were top-ranked hits and conferred very high confidence, while superfamily hits were averted. Additionally, transmembrane domains (TM) were identified using TMHMM v2.0 3 (Krogh et al., 2001). The number of different accessory domains that occurred in 473 putative OCAs was determined; however, successive repeats of domains within the same protein were disregarded. The domain architecture of few OCAs with frequently appearing accessory domains was visualized by Illustrator of Biological Sequence (IBS) v1.0 (Liu et al., 2015). Domain configurations for all the OCAs are enlisted in Supplementary Table 1.

Multiple-Sequence Alignment and Phylogeny
Multiple-sequence alignment (MSA) and phylogeny were performed using only the Peptidase C14 domain of all the putative OCAs. Additional domains associated with the proteases were not considered to avoid any possible uncertainty due to their varied origin and evolutionary history. Four hundred seventy-three Peptidase C14 domains were aligned by MUSCLE (Edgar, 2004), in MEGA v7.0 using default parameters, i.e., gap open penalty = −2.9; gap extended penalty = 0; and hydrophobicity multiplier = 1.2 (Kumar et al., 2016). Subsequently, the features of these sequences were displayed as the Hidden Markov Model (HMM) logo using Skylign 4 (Wheeler et al., 2014).
A rooted phylogenetic tree comprising 473 Peptidase C14 sequences was constructed by the neighbor-joining method (Saitou and Nei, 1987) using MEGA v7.0. The MCAlike protein from Thalassiosira pseudonana CCMP1335 (XP_002295354.1) was used as the outgroup, since it is a marine diatom and possesses a closer evolutionary relationship with cyanobacteria (Jiang et al., 2010). The branch lengths were in the same units as those of the evolutionary distances. The evolutionary distances were calculated using the Jones-Taylor-Thornton (JTT) matrixbased method (Jones et al., 1992). The reliability of each branch was tested with 1,000 bootstrap replications. All positions containing gaps and missing data in the MSA file were eliminated during tree construction. Distinct clades were expanded and viewed separately for better inference (Supplementary Figures 1-4). The bootstrap value (above 50) and sequence details for each node were also provided in Supplementary Figures 1-4.

Homology Modeling and Structural Analysis
Phylogenetic analysis displayed YS and YN as the most prominent catalytic dyad substitutions, which otherwise constituted of HC in the wild-type OCAs. Therefore, the theoretical modeling was performed to delineate and compare the structures of wild-type (HC) and mutated (YS and YN) OCAs. Since Nostoc sp. PCC 7120 is a model cyanobacterium and not only harbored the wild-type (HC) OCAs but also consisted of other two abundant forms of pseudo OCAs, i.e., YS and YN types, therefore, three OCAs, i.e., WP_010999143.1 (HC), WP_010999263.1 (YS), and WP_010997820.1 (YN), of Nostoc sp. PCC 7120 were selected. Due to the lack of suitable templates, only the Peptidase C14 domains of WP_010999263.1 (YS) and WP_010997820.1 (YN) were modeled. High-quality homology models were constructed by Swiss-Model 5 using default settings (Guex et al., 2009;Benkert et al., 2011;Bertoni et al., 2017;Bienert et al., 2017;Waterhouse et al., 2018;Studer et al., 2020). Further, template-based refinement and energy minimization of models were done using ModRefiner 6 , for which ideal templates for each model with high resolution, sequence identity, and maximum query coverage were retrieved from Swiss-Model (Supplementary Table 2; Xu and Zhang, 2011). The final models were validated by Ramachandran plot using PROCHECK 7 (Laskowski et al., 1993) and ModFOLD8 8 (Maghrabi and McGuffin, 2017;Supplementary Table 3). Additionally, the folds and topologies of these OCAs were generated by PDBsum 9 (Laskowski, 2001). The properties of OCA models were analyzed and visualized by UCSF Chimera (Pettersen et al., 2004).
Analyzing the Substrate-Binding Efficacy of Wild-Type OCA Two oligopeptides, i.e., arginyl-arginine (RR) and aspartylglutamyl-valyl-aspartate (DEVD), were used to study the substrate specificity of wild-type OCA (HC; WP_010999143.1). While DEVD is a classical caspase substrate, RR was previously used as a substrate for cyanobacterial OCA, MaOC1 (Klemenčič et al., 2015). For the docking analysis, the substrate and the protease were treated as ligand and receptor, respectively. Docking was performed with Biovia Discovery Studio 2019 using the CDOCKER program . CDOCKER is a powerful grid-based molecular docking method that uses Chemistry at Harvard Macromolecular Mechanics (CHARMm). Here, the ligand is allowed to flex while the receptor is held rigid during the refinement. The docked complex was visualized using FIGURE 1 | Abundance, distribution, diversity of putative cyanobacterial orthocaspases among (A) Gloeobacteriales, Synechococcales, (B) Chroococcales, (C) Pleurocapsales, Chroococcidiopsidales, Oscillatoriales, and (D) Nostocales. Cyanobacterial strains subsumed within different taxonomical orders are also categorized into unicellular, filamentous, and heterocytous branches based on their morphology. The full forms of the abbreviated strain names are mentioned in Supplementary Table 4. UCSF Chimera 1.13.1, and LigPlot+ was used to identify the substrate-interacting residues (Laskowski and Swindells, 2011).

Simulation of the HC-RR Complex
The HC-RR complex was subjected to all-atom MD simulation using the Amber16 package (Case et al., 2016). The initial structure was prepared in the tleap module of Amber16, utilizing the ff14SB force field (Maier et al., 2015). The ligand forcefield parameters were generated using antechamber (Wang et al., 2006) and gaff (Wang et al., 2004); also, the AM1-bcc method was used to generate charges (Jakalian et al., 2002). The simulation box was solvated by the TIP3P water model with a padding distance of 12 Å (Jorgensen et al., 1983), and NaCl was used as counter-ion in neutralizing the total charge of the system. Joung and Cheatham ion parameters were used for Na + and Cl − ions (Joung and Cheatham, 2008). The electrostatic interaction was calculated by the particle mesh Ewald (PME) method (Darden et al., 1993). Further, the SHAKE algorithm was used to restrain the binds involving hydrogen (Ryckaert et al., 1977) while the Langevin dynamics  and Berendsen barostat (Berendsen et al., 1984) were utilized for temperature and pressure control, respectively.
To minimize the energy of the complex, initially, the HC-RR complex was restrained and only the solvent was minimized. Thereafter, the restraint potential was removed from the complex and the entire system was minimized. Following the minimization, gradual heating of the systems was performed at NVT with a 100-kcal restraint potential on the complex, and the temperature was increased to 300K in 300 ps. After heating, each system was subjected to equilibration for 1.8 ns in six cycles of 300 ps in which the restraint potential was subsequently reduced from 100 to 0 kcal/mol (100, 50, 20, 5, 0.5, 0 kcal/mol) at NPT. After equilibration, a production run of 100 ns was performed each at the NPT ensemble. The trajectory was visualized by VMD 1.9.3 and UCSF Chimera 1.13.1.

Calculation of Binding Free Energy
The last 90-ns trajectory was used to calculate the binding free energy of the HC-RR complex ( G bind ) using the Molecular Mechanics Generalized Born Surface Area (MMGBSA) method implemented in Amber16. The calculation was performed at the dielectric constant of 1 and 80 for solute and solvent, respectively (Ylilauri and Pentikäinen, 2013). The parameters developed by Onfriev et al. was used for GBcalculation (igb=5; α=1.0, β=0.8. γ=4.85) (Onufriev et al., 2004). Additionally, entropic contribution ( S) was not taken into account for determining G bind . Other parameters were used as default.  (Figure 1). However, at least one copy of both wild-type and pseudo-OCAs was identified in all other strains. The abundance of OCA was estimated to be highest in Nostocales and Oscillatoriales with an average occurrence of 0.0921 and 0.0723 true OCA per 100 proteins, respectively. Besides, an abundance of 0.0416 true OCA per 100 proteins was observed in Chroococcales and the lowest abundance of 0.0145 true OCA per 100 proteins was estimated in Synechococcales. The average abundance of pseudo-OCAs was comparable for Nostocales (0.0349/100 proteins), Oscillatoriales (0.0264/100 proteins), and Chroococcales (0.0261/100 proteins), while the lowest abundance was recorded again for Synechococcales (0.0027/100 proteins) (Supplementary Table 4). The actual abundance of OCAs might obscure in Gloeobacterales, Pleurocapsales, and Chroococcidiopsidales due to the lesser number of whole-genome-sequenced cyanobacteria in them, yet OCAs harboring sequenced strains within these orders were examined.
Further, domain configuration revealed that the OCAs prevalently acquired a similar architecture. The common domain configurations attained by both wild-type and pseudo OCAs were classified into nine types (Figure 2A). In general, the Peptidase C14 domain was N-terminally located to the accessory domains in multidomain proteins (Figure 2); however, in Type V, the Peptidase C14 was located toward the C-terminal to HEAT_2 domain(s). Such architectures observed were WP_168163431.1 (Calothrix sp. 336/3), BAZ37947.1 (Calothrix sp. NIES 4101), WP_015119030.1 (Rivularia sp. PCC 7116), and WP_011611472.1 (T. erythraeum IMS101). Furthermore, it was observed that the OCAs of strains belonging to Nostocales consisted of all the domain configurations ( Figure 2B).

Multiple-Sequence Alignment and Phylogeny
The sequence alignment of all OCA sequences displayed substantial diversity. The HMM logo exhibited a relatively conserved region surrounding the catalytic dyad (Figure 3). The consensus regions flanking the catalytic residues were (Y/F/H),
The phylogenetic relationship between 473 putative cyanobacterial OCAs was determined by an NJ tree constructed using their Peptidase C14 domains. Wild-type Peptidase C14 with HC dyad formed nine clades, whereas YS and YN motifs containing pseudo-variants were accumulated into two distinct clusters. Sequences with any other substituted active site motifs were scattered all along the phylogenetic tree (Figure 4). A thorough assessment of individual clusters revealed that most of the HC clades consisted of sequences from distantly related strains with variable lifestyles and morphologies (unicellular to heterocytous), except HC V and HC VII clades, harboring only multicellular strains of Nostocales and Oscillatoriales (Supplementary Figures 1-3).

Structural Analysis of Cyanobacterial OCAs
Homology modeling of WP_010999143.1 (HC), WP_010999263.1 (YS), and WP_010997820.1 (YN) from Nostoc sp. PCC 7120 displayed the presence of typical CHF ( Figure 5A). Despite having different catalytic dyads, significant structural alterations among the three proteases were not evident. The core of these proteases comprised four parallel β-sheets, i.e., one N-terminal, two central, and one C-terminal sheets arranged in a 2-1-3-4 fashion, and three α-helices (one each after strands 1, 2, and between 3 and 4) ( Figure 5B). However, only in the Peptidase C14 domain of WP_010999143.1 (HC) did the third and fourth β-sheets precede the proximal (H) and distal (C) active sites, respectively. Besides, more hydrophilic residues (blue) were present on the surface of all the three OCAs, conferring them as soluble proteins ( Figure 5C).

Analyzing the Substrate Specificity of HC
Molecular docking of HC with RR and DEVD revealed the binding of only RR with HC ( Figure 6A), whereas no docked complex was formed between HC and DEVD (data not shown). The binding pocket for RR comprised L18, A19, L21, K23, A24, G80, D132, C133, Y173, A174, and E176 along with H81 and C134 (catalytic residues) (Figures 6B,C). Further, 100ns simulation and MMGBSA illustrated the stability of the HC-RR complex. Residual fluctuation of the HC-RR complex depicted an average RMSF value of 2.45 ± 0.13 Å, among which residues between 18 and 97 and between 107 and 135 recorded lower fluctuations ( Figure 7A). Most of the binding pocket residues were subsumed among these. Additionally, for 100-ns simulation the average RMSD of RR was 2.76 ± 0.54 Å, while the HC displayed two distinct trajectory: one from 0 to 50 ns with a lower average RMSD (3.43 ± 0.46 Å) and another from 51 to 100 ns with a higher average RMSD (5.21 ± 0.51 Å) ( Figure 7B). However, upon superimposing the average structures, only an RMSD of 0.89 Å was observed, suggesting no major conformational changes in protein topology ( Figure 8A). The presence of RR in the binding cavity of HC throughout the 100-ns simulation further conferred the specificity of HC for RR ( Figure 8B). Additionally, G bind (= −16.99 ± 5.99) of the HC-RR complex also exhibited high binding efficacy ( Table 2).  The optimal tree has the sum of branch length = 89.299. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances. The evolutionary distances were computed using the JTT matrix-based method and are in the units of the number of amino acid substitutions per site. The analysis involved 474 (473 cyanobacterial Peptidase C14 domain + 1 outgroup) sequences. All positions containing gaps and missing data were eliminated. Distinct clades are assigned based on the active site motifs in majority of the Peptidase C14 sequences. Eleven distinct clades were identified where HC catalytic dyads containing Peptidase C14 sequences formed nine whereas YS and YN catalytic motifs containing Peptidase C14 sequences formed two respective clades. Peptidase sequences with other substitution at catalytic motifs are scattered along the tree.

Rich Pool of Cyanobacterial OCAs Having Considerable Diversity and Sharing a Complex Phylogenetic Relationship
The distribution of putative OCAs among cyanobacteria exhibited a clear relation with larger genome size and morphological complexities (Kharwar et al., 2021), since the Oscillatoriales and Nostocales abundantly harbored them. This assumption was also supported by the absence of OCAs in the majority of the unicellular members of Synechococcales and their presence in the filamentous ones. However, the occurrence of considerable putative OCA genes among the Chroococcales, Pleurocapsales, and Chroococcidiopsidales suggested the existence of at least one more factor apart from genome size and morphology, favoring the prevalence of these genes. For example, the probable existence of OCA activity in colonies of Microcystis may confer advantages upon stress through altruistic adaptation, thereby enhancing inclusive fitness of the population (Durand et al., 2016). Further, the early origin of OCAs among cyanobacteria was depicted by the presence of these genes in the primitive cyanobacterium, Gloeobacter (Asplund-Samuelsson et al., 2012). Therefore, the loss of OCAs largely in the monophyletic clades of unicellular marine cyanobacteria Prochlorococcus, Synechococcus, and Synechocystis might be a cumulative consequence of extensive genome streamlining and solitary lifestyle (Ran et al., 2010;Larsson et al., 2011). On the other hand, the higher abundance of OCAs in the marine cyanobacterium Trichodesmium erythraeum IMS101 not only indicated the varied lifestyle preferences among cyanobacteria despite sharing the same habitat but also suggested lesser contribution of habitat in favoring the acquisition of these proteases. The occurrence of more OCAs in the mesophilic Leptolyngbya as compared to the thermophilic counterpart (Thermoleptolyngbya sp. PKUAC-SCTA183), further strengthened the previous assumption.
The rich pool of putative OCAs in cyanobacteria also displayed significant diversity based on the active site motif. While ∼72% accounted for wild-type OCAs (having conventional HC dyad) the rest ∼28% was pseudo-OCAs. In general, these pseudo-OCAs in cyanobacteria are considered as catalytically inert due to lack of functional HC motif in the active (C) Hydrophobicity surface view of modeled orthocaspases, where blue and orange colors represented hydrophilic and hydrophobic residues, respectively. Note that for WP_010997820.1 (YN) and WP_010999263.1 (YS), only the Peptidase C14 domain has been modeled due to template unavailability, while the full-length homology model was constructed for WP_010999143.1 (HC).
site, but in Trypanosoma brucei the role of TbMC4 (MCA having YS catalytic dyad) in the cell cycle progression and virulence warranted their activity in cellular processes (Proto et al., 2011). A recent in silico study also proposed the putative role of these pseudo-enzymes in photosynthesis (Klemenčič et al., 2019). Additionally, the presence of only true OCAs in Gloeobacter possibly indicated the origin of these pseudo-OCAs from their wild-type relatives in cyanobacteria (Klemenčič et al., 2019).
Being a rich repertoire of functional domains, apart from Peptidase C14, cyanobacterial OCAs presumably display multifunctionality, since these domains possess diverse catalytic capacities like protein-protein interactions (WD40),   protein modification (FGE sulfatase), scaffold (HEAT_2), and signaling (GUN4). Mostly, these accessory domains are located C-terminally to Peptidase C14 but in some OCAs the C-terminally positioned Peptidase C14 domain was also observed, although these alterations in domain architecture may not have any functional amendments. In addition, the distribution of accessory domains was also uneven among cyanobacteria, since the OCAs from heterocytous strains with broader physiological abilities exhibited the presence of diverse accessory domains. The underlying reason is possibly the higher abundance of true OCAs in heterocytous strains. Additionally, the employment of these multidomain OCAs may benefit the physiologically robust life strategies. Furthermore, the presence of TM infers membrane localization of some OCAs, while others devoid of TM are probably cytosolic. The presence of putative cytosolic and membrane-bound true OCAs depicted the possibility of both intrinsic and extrinsic PCD in cyanobacteria (Bhattacharjee and Mishra, 2020), yet they are poorly appreciated in prokaryotes and require extensive research to validate.
MSA and phylogeny revealed significant sequence diversity and intermixing of cyanobacterial OCAs from different taxonomical orders. The wild-type OCAs were subsumed within nine distinct clades, whereas YS and YN motifs containing pseudo-variants formed two distinct clades, respectively; however, all other pseudo-OCAs were scattered throughout the tree. Mostly, the clades bracketed cyanobacterial strains with different morphological features and/or lifestyle preferences, suggesting the possibility of horizontal gene transfer (HGT) events for acquiring these genes (Bidle and Falkowski, 2004). Additionally, the appearance of several OCAs from the same strain in different clades may also indicate multiple HGT events (Bhattacharjee and Mishra, 2020). Interestingly, the YN pseudo-variant was only harbored by nitrogenfixing multicellular cyanobacteria, whereas the YS variant was harbored by unicellular to multicellular strains, thus insinuating the possible physiological advantages of the YN variant in multicellular nitrogen-fixing strains. Besides, the distinct clades for YS and YN perhaps imply some functional similarities of these pseudo-proteases, while the scattered distribution of other pseudo-OCAs presumably implied their independent acquisition in cyanobacteria (Klemenčič et al., 2019). Nonetheless, in the future with more sequences of the putative cyanobacterial OCAs, a clearer phylogenetic relationship could be unveiled.

Docking and MD Simulation Simultaneously Revealed Basic Substrate Specificity for Wild-Type OCA
The structural homology of three putative OCA variants with the HC, YS, and YN catalytic dyads is conferred by conserved CHF in their Peptidase C14 domain. Therefore, the substitution of catalytic motifs does not considerably amend the 3D conformation of these proteins (Klemenčič et al., 2019); however, the catalytic capabilities or substrate specificity will presumably alter. True OCAs preferably display substrate specificity for basic substrates due to critical negatively charged amino acid residues in their specificity pocket, as earlier observed in MaOC1 (Klemenčič et al., 2015). However, preference for basic substrates is common for all the characterized caspase homologs, till date. Expectedly, in our study, the putative wild-type OCA, WP_010999143.1 (HC), displayed high substrate affinity for the RR dipeptide, whereas no binding complex was formed with DEVD. Furthermore, the structural stability of the HC-RR complex was validated by analyzing the trajectory for the 100-ns MD simulation run. The negative binding energy calculated using MMGBSA also confirmed the binding efficacy of the RR dipeptide with the protease. The accommodation for the RR dipeptide was possibly determined by the two acidic residues, i.e., D132 and E176, in the binding pocket, thus providing Arg-P1 cleavage specificity to the protease. Similarly, in TbMC2 (MCA of T. brucei) D95 and D211 exhibited their significance in binding basic substrates (Bidle and Falkowski, 2004). Additionally, D353, E500, and D266 were previously identified as significant contributors for substrate-binding pockets in NtMC1 (MCA of Nicotiana tabacum L.), PCA MALT-1, and AtMC9 (MCA of Arabidopsis thaliana), respectively (Jong et al., 2011;Acosta-Maspons et al., 2014).

CONCLUSION
The identification of caspase homologs in prokaryotes advocated for the possibility of a protease-based cell death mechanism among these life forms. However, the involvement of all these caspase homologs in cell execution pathways is still uncertain. The present study exclusively describes the distribution, diversity, abundance, domain architectures, and phylogenetic relationship of putative cyanobacterial OCAs. Further, this work unraveled the conservation of CHF and Arg-P1 specificity of wild-type OCA in Nostoc sp. PCC 7120 through detailed structural analysis, molecular docking, and MD simulation. Although the wild-type cyanobacterial OCAs possessed conserved fold and catalytic dyad, the protease activity and substrate specificity cannot be corroborated without experimental validations. However, delineating the functionality of the OCAs in cyanobacteria, progenitor of oxygenic photosynthesis, is crucial for understanding not only the evolution and acquisition of caspase homologs in the plant kingdom but also the physiological pertinence of these proteases in PCD.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/ Supplementary Material, further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
SB conceptualized the study and performed the computational analysis. SB and SK retrieved sequences from the database and performed the curation and analysis. AKM conceived and supervised the work. SB wrote the manuscript with contribution and thorough revision from SK and AKM. All authors contributed to the article and approved the submitted version.