RL-MLZerD: Multimeric protein docking using reinforcement learning

Numerous biological processes in a cell are carried out by protein complexes. To understand the molecular mechanisms of such processes, it is crucial to know the quaternary structures of the complexes. Although the structures of protein complexes have been determined by biophysical experiments at a rapid pace, there are still many important complex structures that are yet to be determined. To supplement experimental structure determination of complexes, many computational protein docking methods have been developed; however, most of these docking methods are designed only for docking with two chains. Here, we introduce a novel method, RL-MLZerD, which builds multiple protein complexes using reinforcement learning (RL). In RL-MLZerD a multi-chain assembly process is considered as a series of episodes of selecting and integrating pre-computed pairwise docking models in a RL framework. RL is effective in correctly selecting plausible pairwise models that fit well with other subunits in a complex. When tested on a benchmark dataset of protein complexes with three to five chains, RL-MLZerD showed better modeling performance than other existing multiple docking methods under different evaluation criteria, except against AlphaFold-Multimer in unbound docking. Also, it emerged that the docking order of multi-chain complexes can be naturally predicted by examining preferred paths of episodes in the RL computation.


Supplementary Material
Supplementary Combinations of four parameters used in RL were tested on three targets each for 3-chain, 4chain, and 5-chain complexes. The four parameters explored were the normalization factor (Eq. 2), the full reward, the reward given to a model that did not pass the Metropolis criterion, and the penalty for a model with too many clashes. We tested 2, 4, 6, and 8 for the normalization factor; 100, 50, and 10 for the full reward; 10, 5, and 1 for the reward to a rejected model; -2, -8 and -10 for the penalty. For each parameter combination, we recorded the best (smallest) RMSD among all the generated model. Then, in the table we reported the best and worst RMSD among all the parameter combinations as well as the average RMSD value. The majority reports the value that were selected for the largest number of times among the nine targets. Here we report the total number of models generated by RL-MLZerD for each target.

Supplementary
Supplementary Figure S1. The best RMSD within top 5 scored models relative to the secondary structure contents.
The best RMSD (Å) among the top 5 scored models were plotted relative to the percentage of residues in  helices,  strands, and loops. Thus, there are three data points for each target complex. The secondary structure was defined with DSSP (1). Residues classified as H, G, and I were considered as  helices, those classified as E and B were considered as  strands, and the rest were grouped as loop. There is no clear correlation between RMSD and the secondary structure of the structure. We investigated the effect of ions in modeling the complex structure. There are 13 protein complexes in our dataset that contain at least one ion. The complex structures were grouped into three; the group one includes structures with an ion at the interface (within 5 Å to heavy atoms from two or more chains in the complex); the second group are cases where an ion exist in the structure but not at the interface; and the third group are for structures with no ion. We evaluated statistical significance of difference of the RMSD value distribution. First, we considered group one and two, i.e. complexes that have ions at the interface and complexes with ions but not at the interface. We performed the Wilcoxon-Mann-Whitney (2) test using the overall best RMSD and the top 5 best RMSD. We used Wilcoxon-Mann-Whitney because the variance between the two groups was significantly different according to the Bartlett test (3) The p-value of the test was 0.14 and 0.26, respectively, for the overall best RMSD and the top 5 best RMSD. Therefore, the difference is not significant at p-value of 0.05.

Supplementary
Next, we compare group one and the rest of the dataset (groups two and three), i.e. complexes with an ion at the interface and the rest. We used t-test (4) this time because the variance assumption is met (by Bartlett test (3) the p-value was 0.24 and 0.76 for the overall best RMSD and the top 5 best RMSD, respectively). The p-value of the t-test was 0.22 and 0.63 for the overall best RMSD and the top 5 best RMSD, respectively. Based on these results, we concluded that ion presence does not significantly affect the modeling accuracy of RL-MLZerD. The best possible RMSD that can be generated from pairwise decoy set used was approximated as follows: We first selected the top 5 best RMSD models in the pairwise decoy pool of 1,000 decoys for each subunit pair and then generated all possible combinations of the top 5 best RMSD decoys and chose the complex with the best RMSD among them. The RMSD is reported in the column named the Best possible RMSD. When compared with the best RMSD actually generated by RL-MLZerD, the approximated best RMSD (or better) was identified among all the generated models for 36.7% (11/30) cases. For 83.3% (25/30) cases, a model within 1.0 Å RMSD was generated by RL-MLZerD with an average RMSD difference of 0.79 Å.

Supplementary
Supplementary Episodes of RL-MLZerD with the fixed probability were analyzed. The RMSD of the best possible RMSD that can be generated from pairwise decoy set is reported in the column named the Best possible RMSD. When compared with the best RMSD actually generated by RL-MLZerD in the unbound experiment, the approximated best RMSD (or better) was identified among all the generated models for 13.33% (4/30) cases. For 60.0% (18/30) cases, a model within 1.0 Å RMSD was generated by RL-MLZerD with an average RMSD difference of 1.67 Å.

Supplementary Information 1. Evidence of assemble order of the targets.
Only the proteins with assembly order information are discussed.

1A0R (3 chains)
Heterotrimeric complex of phosducin/transducin βγ (5). Phosducin (chain P) binds to the transducin β γ dimer (chains B and G) in a regulatory fashion. The association between the transducin β and γ subunits is very strong (5,6). Thus, the assembly order is BG>BGP. Evidence type: Biological inference is used to determine the interaction between transducin βγ (subunit B and G), Experimental results from (5, 6) supports the interaction of the transducin βγ and elucidate the impact of phosducin on the subcomplex. Hence the evidence of this assemble order is Biological inference (B) and Experimental evidence (E).

6GWJ (3 chains)
Fungal Gon7 (subunit D) was shown to be an intrinsic disordered protein that adopts a well-defined structure upon complex formation with (LAGE3) (subunit B) (7), therefore, subunit B and D form a complex first. Then, subunit K is assembled next as due to the linear arrangement of the five subunits (GON7, which is subunit D; LAGE3, which is subunit B; OSGEP, subunit K;) of the eukaryotic KEOPS complex in which the three subunits are a part. (8)(9)(10). Thus, the assembly order is BD> BDK. Evidence type: Experimental evidence from (7) highlights the complex formation between subunit B and D. Biological inference is then used to determine subunit K interaction with the rest of the subcomplex because of the linear nature of the interaction as shown in (8)(9)(10). Hence the evidence for this assemble is Biological inference (B) and Experimental evidence (E).

1VCB (3 chains)
The VHL-elonginC-elonginB structure (11). This complex is between elongin B and C (chains A and B) and von Hippel-Lindau (VHL) tumor suppressor (chain C). Elongin B and C separately have poor interaction with VHL (12). Thus, the assembly order is AB>ABC. Evidence type: Biological inference is used to determine subunit A and C subcomplex interaction because both are part of a larger complex. Experimental evidence from (12) discuss the implication of the VHL (subunit C) on subunit A and B. The authors further proposed a model of the complex where VHL is interacting with elongin-BC in absent of elongin-A. Finally due to the modeling order of the whole complex, we determine that the assembly evidence of the complex is Biological inference (B), Experimental evidence (E) and Model of assembly (M).

1A6A (3 chains)
The HLA class II histocompatibility membrane glycoprotein (alpha and beta, which are subunit A and B) forms a complex (13) and HLA class II histocompatibility gamma chain plays a critical role in MHC class II antigen (subunit C) processing by stabilizing peptide-free class II alpha/beta heterodimers (14). Thus, the assembly order is AB> ABC Evidence type: Biological Inference (B)

1IOD (3 chains)
The crystal structure of X-bp (Snaclec anticoagulant protein subunit A and B) in complex with the Gla domain peptide of factor X (subunit C) at a 2.3Å resolution. The two patches of the Gla domain essential for membrane binding are buried in the complex (subunit A and B) formation (15). Thus, the assembly order is AB> ABG. Evidence type: Biological Inference (B) because subunit A and B are from the same organism and (15) showed Factor X binding (subunit C) to form a complex thereafter.

4YX7 (3 chains)
Numerous studies have shown the SpaO homologues FliM (subunit A) and FliN (subunit B) to form a robust, stable ring (the 'C-ring') at the cytoplasmic face of the basal body (16). Furthermore, Notti R, et al. (17) showed a heterotypic interaction between SPOA domains (subunit A and B) serving as a scaffold for sorting platform assembly in both injectisome (subunit C) and flagellar T3SS. Solution nuclear magnetic resonance (NMR) data supports the crystallographic model, and structure-guided mutagenesis shows that this interaction is necessary for formation of the SpaO-OrgB-InvC complex. Thus, the assembly order is AB > ABC. Evidence type: Biological inference is used to determine the complex between subunit A and B and results from (16) supports this conclusion. Experimental evidence from (17) show the interaction modality between subunit C and the rest of the complex. Hence the assemble evidence for the complex is Biological Inference (B) and Experimental evidence (E).

2H47 (3 chains)
The subunit of aromatic amine dehydrogenase (AADH) light and heavy chain (subunit A and B) forms a complex (18). The transient kinetic studies of the electron transfer reaction from AADH to azurin (subunit C) was performed (19). Thus, the assembly order is AB>ABC. Evidence type: Experimental evidence (E) due to the experimental results of (18) and (19).

2GD4 (3 chains)
Coagulation Factor X is a glycoprotein composed of a heavy chain (subunit H) and light chain (subunit L) held together by a single disulfide bond (20). Antithrombin AT (subunit I) interacts with factors IXa (subunit H) which form a complex between pentasaccharide-activated AT and fXa. (21). Thus, the assembly order is HL> HIL Evidence type: Experimental evidence (E) due to the experimental results of (20) and (21).

2ASS (3 chains)
Skp1-Skp2 (subunit A and B) form a complex (22). Cks1 (subunit C) binds to the catalytic subunit (Skp2) of the cyclin dependent kinases and is essential for their biological function (23). Thus, the assemble order is AB> ABC. Evidence type: Biological inference is used to determine subunit A and B complex interaction, experimental results from (22) supports this conclusion. Experimental results from (23) was used to determine the interaction of subunit C with the other sub-complex. Therefore, the evidence for this target is Biological inference (B) and Experimental evidence (E).

1P3Q (3 chains)
The Vps9p CUE domain (subunit Q and R) forms extensive dimer contacts across a crystallographic two-fold axis (24). The CUE domain of the sorting protein Vps9p binds directly to monoubiquitin (subunit V) (25). Thus, the assemble order is QR>QRV. Evidence type: Biological inference is used to determine subunit Q and R complex interaction, experimental results from (24) supports this conclusion. Experimental results from (25) was used to determine the interaction of subunit V with the other sub-complex. Therefore, the evidence for this target is Biological inference (B) and Experimental evidence (E).
1JSU (3 chains) cyclin A (subunit B) and Cdk2 (subunit A) form a complex, Cyclin-dependent kinase inhibitor 1b (subunit C) interacts with both cyclin A (subunit B) and Cdk2 (subunit A) (26,27). Thus, the assemble order is AB>ABC. Evidence type: Experimental evidence (E) from both manuscripts suggest the initial complex formation of subunit A and B and the inhibitor activity of subunit C.

1QGW (4 chains)
The overall structure of Phycoerythrin 545 α1α2ββ dimer forms a boat-shaped molecule. The dimer is formed by the two tightly linked α1β monomers (subunit A and C) and α2β monomers (subunit B and D). The α subunits are not identical with the major sequence and structural difference residing in the C-terminal regions. The C-terminal extension of α1 is important in the chromophore binding. Almost all the protein-protein contact between the two monomers is mediated through the α subunits. A Cα tracing of the dimer forms the asymmetric unit of the complex (28). Thus, the assembly order is AC> ACBD. Evidence type: The author discussion from (28) elucidate the interaction between the subunits, hence the assembly evidence is Author Discussion (AD).

6MWR (4 chains)
Beta-2-microglobulim (subunit B) is a component of the class I major histocompatibility complex (MHC) (subunit A), hence both form a complex. γδ T cell receptors form a complex (subunit C and D) (29,30). The two subcomplexes come together to form a single complex as it is show that the diversity of the T cell is restricted by MHC class I related molecule (30). Thus, the assemble order is AB> ABCD or CD>ABCD. Evidence type: Biological inference is used to determine the interaction between subunit A and B. Experimental results from (29,30) show the interaction between subunit C and D. Finally, experimental results from (30) highlight the impact of both individual subunits coming together to form a single complex. Hence the evidence for this assemble is Biological inference (B) and Experimental evidence (E).

1W85 (5 chains)
There is a crystal structure where pyruvate dehydrogenase E1(D180N, E183Q) bound to the peripheral sub-unit binding domain of E2 (31) . This structure consists of two copies of the pyruvate dehydrogenase E1 α subunit (chains A and C in the PDB file, denoted as A and A', respectively), two copies of the β subunit (chains B and D, denoted as B and B'), and the peripheral subunit binding domain of E2 (chain I). Because the structure of E1 can be solved in the absence of E2 (32), it is expected that E1 will assemble before E2 binds. Thus, the assembly order is AA ′ BB ′ >AA ′ BB ′ I. Evidence type: Biological inference (B) as supported by (31) and (32).

4RT4 (5 chains)
The subunits (A, B, C and D) of Protein dpy-30 are an important part of the MLL1/MLL core complex (33). The dpy complex directly interact with the Compass component BRE2 (subunit E) through its Dpy30-binding motif (34). Thus, the assembly order is ABCD>ABCDE. Evidence type: Biological inference is used to determine the interaction between subunit A, B, C and D by drawing inference from (33). Experimental evidence from (34) is then used to determine subunit E interaction with the rest of the complex. Hence the evidence for this assemble is Biological inference (B) and Experimental evidence (E).