Characterization of host substrates of SARS-CoV-2 main protease

The main protease (Mpro) plays a crucial role in the coronavirus life cycle, as it cleaves viral polyproteins and host cellular proteins to ensure successful replication. In this review, we discuss the recognition sequence preference of Mpro based on sequence-based studies and structural information, and highlight recent advances in computational and experimental approaches that have aided in discovering novel Mpro substrates. In addition, we provide an overview of the current understanding of Mpro host substrates and their implications for viral replication and pathogenesis. As Mpro has emerged as a promising target for the development of antiviral drugs, further insight into its substrate specificity may contribute to the design of specific inhibitors.


Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of the coronavirus disease 2019 (COVID-19) pandemic, is a positive-sense single-stranded RNA virus that uses its two cysteine proteases, nsp3/papain-like protease (PLpro) and nsp5/3-chymotrypsin-like protease (3CLpro), to cleave its polyproteins into the functional viral proteins required for virus replication (Koudelka et al., 2021; Sabbah et al., 2021). Nsp3 cleaves three distinct sites of nsp1-nsp4, while nsp5 cleaves 11 distinct sites of nsp5-nsp16; nsp5 is therefore also referred to as the main protease (Mpro). Mpro is a conserved protease in the family Coronaviridae (Ullrich and Nitsche, 2020; Xiong et al., 2021). The mature Mpro is a dimeric cysteine protease whose catalytic dyad is formed by His41 and Cys145 (Ullrich and Nitsche, 2020; Hu et al., 2022). Besides viral polyproteins, viral proteases also cleave host proteins to hinder host immune responses and promote viral replication (Pablos et al., 2021). In this review, we first address the substrate specificity of Mpro and then analyze the implications of Mpro cleavage of host substrates for various biological processes.

Substrate specificity of SARS-CoV-2 Mpro
The substrate specificity of SARS-CoV Mpro has been investigated previously. Recombinant protein substrates with saturation mutagenesis at each of the P5 to P3' positions were used to profile the sequence preference of Mpro substrates (Chuck et al., 2010). In addition, the 11 autoproteolytic cleavage site sequences in SARS-CoV-2 pp1ab and those of host substrates have been used to generate sequence logos of the cleavage site. Thus far, the consensus sequence motif of Mpro substrates is recognized as (L/F/M)-Q↓(S/A/G/N), where ↓ is the cleavage site. In brief, this motif is composed of a conserved P1 residue Gln flanked by a hydrophobic residue (Leu, Phe, or Met) at P2 and a small amino acid (Ser, Ala, Gly, or Asn) at P1' (Miczi et al., 2020; Koudelka et al., 2021; Moustaqil et al., 2021; Pablos et al., 2021; Zhang et al., 2021). The P1, P2, and P1' residues are the main determinants of substrate specificity, whereas the less conserved P3, P4, and P3' residues increase the recognition and binding stability of the substrates (Hu et al., 2022). The P3 and P3' positions prefer positively charged residues over negatively charged ones (Chuck et al., 2010). Although Mpro primarily prefers Gln at P1, it has also been found to recognize non-canonical Met or His at this position (Koudelka et al., 2021; Pablos et al., 2021). The identification of new substrate sequences can aid in the design of specific inhibitors that target Mpro with higher affinity and selectivity.
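As an illustration only, the consensus motif (L/F/M)-Q↓(S/A/G/N) can be expressed as a simple pattern search over a protein sequence. The sketch below is not a published prediction method; the function name `find_candidate_sites` and the toy peptide are assumptions introduced here. It considers only P2, P1, and P1', so it ignores the P3, P4, and P3' preferences, the non-canonical Met or His at P1, and structural accessibility.

```python
import re

# Consensus motif from the text: P2 in {L, F, M}, P1 = Q, P1' in {S, A, G, N}.
# The pattern is wrapped in a lookahead so that overlapping candidates
# would all be reported rather than only the first of them.
MOTIF = re.compile(r"(?=([LFM])Q([SAGN]))")


def find_candidate_sites(sequence):
    """Return (p1_position, p2_residue, p1_prime_residue) for each motif match.

    p1_position is the 1-based index of the P1 Gln; the scissile bond lies
    between p1_position and p1_position + 1.
    """
    seq = sequence.upper()
    hits = []
    for match in MOTIF.finditer(seq):
        # match.start() is the 0-based index of P2, so P1 is one residue
        # further along; adding 2 converts that to a 1-based position.
        p1_position = match.start() + 2
        hits.append((p1_position, match.group(1), match.group(2)))
    return hits


if __name__ == "__main__":
    toy_peptide = "TSAVLQSGFRK"  # toy sequence used only for illustration
    for p1, p2, p1_prime in find_candidate_sites(toy_peptide):
        print(f"P1 Gln at position {p1}: P2={p2}, P1'={p1_prime}")
```

A scan of this kind will over-predict, because many host proteins contain the three-residue motif by chance; it is best regarded as a first filter ahead of scoring tools or experimental validation.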

Identification of host substrates
Computational and experimental methods are widely used for substrate identification. Among computational methods, NetCorona 1.0, a publicly available web server originally designed to predict putative SARS-CoV Mpro cleavage sites, has been commonly used for identifying SARS-CoV-2 Mpro substrates (with a suggested threshold of 0.5), since the sequence of SARS-CoV-2 Mpro shares 96% identity with that of SARS-CoV Mpro (Miczi et al., 2020; Zhang et al., 2021; Scott et al., 2022). Another approach is to search for short stretches of homologous human-pathogen protein sequences (SSHHPS) using BLAST analysis, which is based on the principle that the cleavage site sequences found in the viral polyprotein are identical to the cleavage sites on host cell substrates (Miczi et al., 2020). Among experimental methods, a commonly used screening procedure is liquid chromatography-mass spectrometry (LC-MS)-based terminal amine isotopic labeling of substrates (TAILS), which identifies not only substrates but also their corresponding cleavage sites (Koudelka et al., 2021; Meyer et al., 2021; Pablos et al., 2021). In addition, Moustaqil et al. (2021) screened 71 human innate immune pathway proteins (HIIPs) using the cell-free Leishmania tarentolae protein expression system, which allows direct visualization by SDS-PAGE of the target protein fused to GFP.
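The idea behind the SSHHPS search can be sketched with a plain exact-match scan. This is a simplified stand-in, not the BLAST-based procedure described above: it reports every short stretch of a viral cleavage-site peptide that also occurs verbatim in a host protein. The function names, the stretch length `k`, and the toy sequences are assumptions made for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def find_shared_stretches(viral_sites, host_proteins, k=5):
    """List exact matches between viral cleavage-site peptides and host proteins.

    viral_sites and host_proteins map a name to an amino acid sequence.
    Each hit is (host_name, 1-based start in host, shared stretch, viral_site_name).
    """
    hits = []
    for site_name, site_seq in viral_sites.items():
        for stretch in sorted(kmers(site_seq.upper(), k)):
            for host_name, host_seq in host_proteins.items():
                host = host_seq.upper()
                start = host.find(stretch)
                while start != -1:
                    hits.append((host_name, start + 1, stretch, site_name))
                    # Continue one residue later so repeated occurrences are found.
                    start = host.find(stretch, start + 1)
    return hits


if __name__ == "__main__":
    viral = {"toy_site": "AVLQSGF"}    # hypothetical cleavage-site peptide
    host = {"toy_host": "MKKVLQSGPP"}  # hypothetical host protein
    for hit in find_shared_stretches(viral, host, k=5):
        print(hit)
```

Exact matching is stricter than a BLAST search, which also tolerates substitutions, so a real analysis would recover more candidates than this scan. Any hit still needs confirmation that the shared stretch spans the scissile bond.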
Table 1 lists the host proteins that have been identified as potential substrates for SARS-CoV-2 Mpro through computational or experimental methods, and further supported by the detection of cleaved products. Among the identified substrates, five proteins have structure data available in the Protein Data Bank (PDB), while the structures of the rest were predicted by AlphaFold (Table 1). Through analysis of the structural information, we observed that the cleavage sites are commonly located in loops or in loops connected to α-helices or β-sheets (Figure 1), suggesting that most of the target sequences are accessible to Mpro. This implies that, in addition to the prediction of cleavage sequences, structural analysis is important for evaluating the accessibility of putative cleavage sites (Miczi et al., 2020; Moustaqil et al., 2021).
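One rough way to screen a putative site for the loop location noted above is to inspect the secondary-structure annotation around it. The sketch below assumes a per-residue string of DSSP-style one-letter codes is already available for the protein; the function name `loop_fraction`, the P5 to P3' window, and the use of a simple fraction are illustrative assumptions, not the analysis performed for Figure 1.

```python
# DSSP-style codes treated here as loop-like: T (turn), S (bend), and
# '-' or blank (no assigned structure). 'C' is included because some
# tools write coil that way. Helix (H, G, I) and strand (E, B) codes
# are counted as non-loop.
LOOP_CODES = {"T", "S", "-", " ", "C"}


def loop_fraction(ss_string, p1_position, n_before=4, n_after=3):
    """Fraction of residues in the P5..P3' window annotated as loop-like.

    ss_string   : one secondary-structure code per residue (assumed input)
    p1_position : 1-based position of the P1 residue
    n_before    : residues N-terminal to P1 to include (4 reaches P5)
    n_after     : residues C-terminal to P1 to include (3 reaches P3')
    """
    if not 1 <= p1_position <= len(ss_string):
        raise ValueError("p1_position lies outside the annotated sequence")
    start = max(p1_position - 1 - n_before, 0)        # 0-based index of P5
    end = min(p1_position + n_after, len(ss_string))  # exclusive end after P3'
    window = ss_string[start:end]
    return sum(code in LOOP_CODES for code in window) / len(window)


if __name__ == "__main__":
    toy_annotation = "HHHHHH--TTS-EEEE"  # hypothetical annotation string
    print(loop_fraction(toy_annotation, p1_position=9))
```

A high fraction only suggests that the site lies in a loop; solvent exposure and local flexibility would need a separate check against the three-dimensional structure.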

Biological functions of substrates
Research into the functional consequences of Mpro cleavage of host proteins is still ongoing. It is important to note that host proteins serve multiple functions, and their dysfunction may have implications for more than one biological process. The implications of Mpro cleavage, based on published information or the known biological functions of the substrates, are discussed below.

Innate immune response
The innate immune system releases inflammatory cytokines and chemokines as an immediate defense against invading pathogens. However, viruses can manipulate the innate immune response to evade the host's antiviral defenses (Diamond and Kanneganti, 2022). Mpro was discovered to cleave interleukin-1 receptor-associated kinase 1 (IRAK1), a kinase involved in the regulation of the innate immune response (Miczi et al., 2020). Several viruses, such as porcine epidemic diarrhea virus and Borna disease virus 1, target IRAK1 to block activation of the IRAK1/TRAF6/NF-κB signaling pathway, consequently reducing the expression of the IFN-III subtypes IFN-λ1 and IFN-λ3 (Zhang et al., 2019; Zheng et al., 2022). Notably, inhibition of IRAK1 using pacritinib effectively attenuated the pro-inflammatory cytokine release triggered by the GU-rich ssRNA sequence derived from the SARS-CoV-2 spike protein (Campbell et al., 2023). Similarly, SARS-CoV-2 Mpro cleavage of the TAK1 binding protein (TAB1) results in decreased TAB1 protein levels in virus-infected cells and is proposed to inhibit cytokine production by disrupting the interaction between TAB1 and the transforming growth factor-β-activated kinase 1 (TAK1), which is necessary for constitutive activation of NF-κB (Jackson-Bernitsas et al., 2007; Moustaqil et al., 2021; Pablos et al., 2021). mRNA-decapping enzyme 1A (DCP1A), one of the interferon-stimulated genes (ISGs), was recently identified as an Mpro substrate (Song et al., 2023). Cleavage of DCP1A by porcine deltacoronavirus Mpro has been demonstrated to decrease antiviral activity (Zhu et al., 2020). It is conceivable that SARS-CoV-2 Mpro cleaves IRAK1, TAB1, and DCP1A to disturb the production of pro-inflammatory cytokines and attenuate the immune defense (Miczi et al., 2020).
On the other hand, hyperinflammation, characterized by cytokine storm, is a significant contributor to severe cases of COVID-19 (Diamond and Kanneganti, 2022). SARS-CoV-2 Mpro specifically cleaved Nod-like receptor protein 12 (NLRP12), as evidenced by significant reductions of NLRP12 protein levels in SARS-CoV-2-infected cells (Moustaqil et al., 2021). Its cleavage is proposed to enhance pro-inflammatory cytokine and chemokine production via NF-κB signaling, and to perturb NLRP3 inflammasome assembly to trigger the cleavage of pro-caspase-1, thereby enhancing the release of IL-1β, all of which are associated with the hyperinflammation observed in severe COVID-19. Another ISG cleaved by Mpro is the solute carrier family 25 member 22 (SLC25A22; Zhang et al., 2021). Knockout of SLC25A22, a mitochondrial glutamate carrier, has been associated with decreased immunosuppressive function in colorectal cancer (Yoo et al., 2020; Zhou et al., 2021), implying its involvement in immune response activation.
Fas-associated factor 1 (FAF1) is a positive regulator of type I interferon (IFN) signaling and is involved in the activation of the Fas-mediated pathway of apoptosis. However, there are contrasting results on the role of FAF1 in regulating the antiviral immune response. FAF1 is suggested to reduce virus-induced type I IFN activation by inhibiting nuclear translocation of the transcription factor IRF3 (Song et al., 2016). In contrast, FAF1 is hypothesized to  immunity (Dai et al., 2018). More studies are needed to confirm the role of FAF1 cleavage in virus infection.

Transcription and translation
Viruses can affect host gene expression at the transcriptional level. In addition, since viruses lack functional ribosomes, they attempt to usurp the host's translational apparatus by competing with cellular mRNA to achieve successful replication. For instance, RNA polymerase II-associated protein 1 (RPAP1), which is crucial for bridging RNA polymerase II with gene-enhancer elements to increase transcription, and the polypyrimidine tract-binding protein (PTBP1), which is essential for pre-mRNA splicing and mRNA export, are both cleaved by Mpro. Proteolysis of PTBP1 after SARS-CoV-2 infection leads to the redistribution of PTBP1 from the nucleus to the cytoplasm (Pablos et al., 2021). In polioviruses, proteolysis of PTBP1 is speculated to switch viral translation to replication (Back et al., 2002). Thus, Mpro might target RPAP1 and PTBP1 to divert the transcription and translation machineries from host to virus.
Pinin (PNN), a multifunctional nuclear phosphoprotein involved in the regulation of transcription and alternative RNA splicing, has also been identified as a substrate of Mpro (Meyer et al., 2021). Depletion of PNN has been demonstrated to result in apoptosis in vitro and early lethality in vivo (Leu et al., 2012). Furthermore, PNN binds to the transcriptional co-repressor C-terminal binding protein 1 (CTBP1). The interaction of PNN and CTBP1 alters the silencing function of CTBP1 (Alpatov et al., 2004). The overlapping pathways enriched in PNN-KD and CTBP1-KD cells include the TNFα-induced canonical NF-κB signaling pathway and the IFN response pathway (Zhang et al., 2016). CTBP1-mutated neuronal cells were more susceptible to West Nile virus than control cells, consistent with the lower expression of IFN-response genes in CTBP1-mutated cells (Vijayalingam et al., 2020). Cleavage of PNN and CTBP1 by Mpro is suggested to alter the transcription of host antiviral response genes and induce apoptosis (Miczi et al., 2020; Meyer et al., 2021). Furthermore, Mpro cleaves histone deacetylase 2 (HDAC2), which primarily regulates gene transcription by modifying histones and is also required for ISG transcriptional elongation (Chang et al., 2004). Consequently, the cleavage of HDAC2 by Mpro impairs ISG expression (Song et al., 2023).
Yes-associated protein 1 (YAP1), a transcriptional co-activator, participates in the Hippo pathway. Because YAP1 negatively regulates the antiviral immune response by inhibiting the translocation of IRF3 to the nucleus, cleavage of YAP1 is presumed to enhance innate immunity (Wang et al., 2017). The kinase activity of mitogen-activated protein kinase kinase kinase kinase 5 (MAP4K5), another Hippo pathway regulator, can be inactivated by Mpro cleavage. cAMP response element binding protein 1 (CREB1) is a transcription factor that dimerizes with ATF1 to regulate the transcription of anti-apoptotic and cell proliferation genes. In addition, CREB1 binds YAP1, and the two proteins form a positive feedback loop (Chen et al., 2018). Mpro cleavage of YAP1, MAP4K5, and CREB1 indicates that SARS-CoV-2 can hijack the Hippo-YAP signaling pathway (Pablos et al., 2021), which mediates a variety of cellular processes, including cell proliferation, differentiation, apoptosis, and immune response.

Apoptosis and autophagy
To maintain homeostasis, cells undergo two types of programmed cell death (PCD): apoptosis and autophagy (Kennedy, 2015). Inhibition of these PCDs by SARS-CoV-2 helps the virus avoid elimination and preserves viable cells for viral replication, whereas their induction may benefit the virus by regulating the immune response and virus release (Li et al., 2020, 2021, 2022). Moreover, SARS-CoV-2 exploits autophagy to prevent virus degradation (Chen et al., 2020). Several proteins involved in apoptosis and autophagy have been identified as targets of Mpro.
Baculoviral IAP repeat-containing protein 6 (BIRC6) functions as an inhibitor of apoptosis and autophagy by ubiquitinating pro-apoptotic factors and LC3B, leading to their proteasomal degradation (Ehrmann et al., 2022). Mpro cleavage of BIRC6 may promote apoptosis and autophagy, in line with the induction of apoptosis and autophagy upon SARS-CoV-2 infection (Li et al., 2020, 2021). Transactive response DNA binding protein 43 kDa (TDP-43) is critical in RNA regulation, including the expression of viral RNA (reviewed in Rahic et al., 2023). Cleavage of TDP-43 by Mpro induced cytotoxicity in neurons, which could contribute to the pathogenicity of SARS-CoV-2 in the nervous system (Yang et al., 2023).
Galectin-8 (LGALS8) is involved in the regulation of immune responses and directly binds to Spike S1 glycans and the autophagy adaptor NDP52 (Pablos et al., 2021). LGALS8 is proposed to sense the glycosylated Spike S1 protein and activate xenophagy, a type of selective autophagy that targets invading pathogens to lysosomes, to reduce SARS-CoV-2 infection (Pablos et al., 2021). Furthermore, the autophagy adaptor protein FYVE and coiled-coil domain-containing protein 1 (FYCO1) has been identified as a candidate COVID-19 susceptibility and severity gene and is believed to be the key mediator that connects double-membrane vesicles (the main site of coronavirus replication) from the endoplasmic reticulum to the microtubule network in host cells (Reggiori et al., 2011; Parkinson et al., 2020; Lee et al., 2021; Jahanafrooz et al., 2022). The elimination of FYCO1 resulted in the accumulation of early autophagosomes (Pankiv et al., 2010). Mpro cleavage of LGALS8 and FYCO1 possibly enables SARS-CoV-2 to escape antiviral xenophagy (Pablos et al., 2021) and induce incomplete autophagy.

Cell metabolism
SARS-CoV-2 infection alters host cell metabolism (Andrade Silva et al., 2021; Mullen et al., 2021). Indeed, proteins that play roles in cell metabolism were found to be substrates of Mpro. Cleavage of Ring finger protein 20 (RNF20) destabilizes the RNF20/RNF40 complex, which is essential for their ubiquitin E3 ligase activity. This blocks the degradation of the sterol regulatory element binding protein 1 (SREBP1) and subsequently increases lipid metabolism, promoting SARS-CoV-2 replication (Zhang et al., 2021).
Phosphoribosylaminoimidazole succinocarboxamide synthetase (PAICS), a de novo purine biosynthetic enzyme, was previously identified as crucial for influenza virus replication (Karlas et al., 2010; Generous et al., 2014). PAICS is proposed to be a candidate for a noncanonical route for SARS-CoV-2 infection in human placentas (Constantino et al., 2021). SARS-CoV-2 infection has been reported to promote de novo purine synthesis through nsp9 (Qin et al., 2022). Silencing of PAICS reduced virus titers (~10-fold), suggesting that cleavage of PAICS by Mpro results in altered function of PAICS (Meyer et al., 2021), which may influence de novo purine synthesis.
Insulin receptor substrate 2 (IRS2) regulates insulin signaling and the control of glucose homeostasis. Hepatitis C virus infection downregulates IRS2 expression by upregulating the suppressor of cytokine signaling (SOCS) and by activating the mTOR/S6K1 signaling pathway, resulting in insulin resistance (Kawaguchi et al., 2004; Pazienza et al., 2007; Bose et al., 2012). Notably, new-onset hyperglycemia has been associated with SARS-CoV-2 because non-diabetic COVID-19 patients were found to have an increased risk of insulin resistance (Chen et al., 2021; Wihandani et al., 2023), which may be associated with Mpro cleavage of IRS2 (Pablos et al., 2021).

Intracellular transport and cytoskeleton
The intracellular transport system and the cytoskeleton are essential for viral infections, particularly for transporting viral components to the specific subcellular sites of translation, replication, and secretion. The Golgi apparatus is an integral component of the viral life cycle. SARS-CoV-2 remodels the Golgi structure for viral release; hence, Mpro cleavage of Golgin subfamily A member 3 (GOLGA3), which is involved in the organization of the Golgi apparatus and its associated vesicles (Meyer et al., 2021; Pablos et al., 2021), may also be linked to this remodeling (Zhang et al., 2022). Moreover, GOLGA3 has been associated with COVID-19 and has been identified to interact with nsp13 (Gordon et al., 2020; Deng et al., 2021). Mpro cleavage of GOLGA3 may play a role in reconfiguring the endoplasmic reticulum to facilitate Golgi trafficking during virus assembly.
Although RNA viruses replicate in the cytoplasm, they also exploit the nucleocytoplasmic trafficking system to inhibit the host immune response (Sajidah et al., 2021), which may explain why SARS-CoV-2 Mpro cleaves the nuclear pore complex 107 kDa subunit (NUP107) and Importin subunit alpha-4 (IMA4), both of which are important components of nuclear pore transport (Pablos et al., 2021). IMA4, also known as karyopherin subunit alpha-3 (KPNA3), has been shown to be targeted by the Japanese encephalitis virus NS5 protein to hinder the nuclear import of its cargo molecules IFN regulatory factor 3 and NF-κB, thereby inhibiting type I IFN production (Ye et al., 2017).
Septin (SEPT) is recognized as a component of the cytoskeleton (Mostowy and Cossart, 2012). Septin polymerizes into filaments at the cell cortex or in association with other cytoskeletal proteins, such as actin or microtubules. Mpro cleaves several septin proteins, including SEPT2, SEPT6, and SEPT9, to affect the septin complex, causing an unstable filament structure and inducing cilia dysfunction (Lee et al., 2023).

Discussion
With the help of computational and experimental methods, scientists have gained valuable insights into the substrates of Mpro. NetCorona analysis is widely used for substrate prediction. Intriguingly, some of the identified substrates have low NetCorona scores (Table 1), implying that other factors should be considered. Further information, such as binding affinity, may improve the original algorithm. Steric effects also play an important role in assessing substrate specificity. Notably, the cleavage sites of HDAC2 and PAICS are buried in the structure, warranting further study of the mechanism by which Mpro cleaves these two proteins. Deep learning applied to sequence-based prediction and structural analysis may likewise improve prediction accuracy.
Identification of viral host substrates helps determine specific virus-host interactions, including the cellular pathways involved and the mechanisms of viral replication and pathogenesis. Consequently, researchers can gain valuable insights into how viruses cause disease and develop strategies to control or treat viral infections. After COVID-19, certain individuals develop post-acute sequelae of SARS-CoV-2 infection (PASC), known as long COVID. The persistence of viral RNA or proteins for weeks in these patients implies an impaired immune response. Exploring the potential role of Mpro in this context would be valuable. In addition, identifying the specific sequences of host substrates targeted by Mpro can have significant implications for developing peptidomimetic protease inhibitors. Discovering new substrate sequences can enhance the design of effective antiviral strategies. Continued research is essential to improve our understanding of Mpro function and develop potent antiviral therapies against coronaviruses.

FIGURE 1. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Mpro cleavage sites in selected target proteins. The proteins are depicted along with the corresponding PDB ID, except for FAF1, which is predicted by AlphaFold. The predicted cleavage sequences (yellow) are shown, with P1 and P5 residues, and an asterisk denoting the P1-Gln residue.