Recognition of fold- and function-specific sites in the ligand-binding domain of the thyroid hormone receptor-like family

Background The thyroid hormone receptor-like (THR-like) family is the largest transcription factors family belonging to the nuclear receptor superfamily, which directly binds to DNA and regulates the gene expression and thereby controls various metabolic processes in a ligand-dependent manner. The THR-like family contains receptors THRs, RARs, VDR, PPARs, RORs, Rev-erbs, CAR, PXR, LXRs, and others. THR-like receptors are involved in many aspects of human health, including development, metabolism and homeostasis. Therefore, it is considered an important therapeutic target for various diseases such as osteoporosis, rickets, diabetes, etc. Methods In this study, we have performed an extensive sequence and structure analysis of the ligand-binding domain (LBD) of the THR-like family spanning multiple taxa. We have use different computational tools (information-theoretic measures; relative entropy) to predict the key residues responsible for fold and functional specificity in the LBD of the THR-like family. The MSA of THR-like LBDs was further used as input in conservation studies and phylogenetic clustering studies. Results Phylogenetic analysis of the LBD domain of THR-like proteins resulted in the clustering of eight subfamilies based on their sequence homology. The conservation analysis by relative entropy (RE) revealed that structurally important residues are conserved throughout the LBDs in the THR-like family. The multi-harmony conservation analysis further predicted specificity in determining residues in LBDs of THR-like subfamilies. Finally, fold and functional specificity determining residues (residues critical for ligand, DBD and coregulators binding) were mapped on the three-dimensional structure of thyroid hormone receptor protein. We then compiled a list of natural mutations in THR-like LBDs and mapped them along with fold and function-specific mutations. Some of the mutations were found to have a link with severe diseases like hypothyroidism, rickets, obesity, lipodystrophy, epilepsy, etc. Conclusion Our study identifies fold and function-specific residues in THR-like LBDs. We believe that this study will be useful in exploring the role of these residues in the binding of different drugs, ligands, and protein-protein interaction among partner proteins. So this study might be helpful in the rational design of either ligands or receptors.

mutations. Some of the mutations were found to have a link with severe diseases like hypothyroidism, rickets, obesity, lipodystrophy, epilepsy, etc.
Conclusion: Our study identifies fold and function-specific residues in THRlike LBDs. We believe that this study will be useful in exploring the role of these residues in the binding of different drugs, ligands, and protein-protein interaction among partner proteins. So this study might be helpful in the rational design of either ligands or receptors. KEYWORDS nuclear receptor, thyroid hormone receptor-like family, ligand-binding domain, sequence conservation, phylogeny, relative entropy, multi-harmony, functional specificity

Background
The nuclear receptor (NR) superfamily is the most abundant transcription factor (TFs) of metazoans (1,2). NRs play an essential role in many biological processes such as development, metabolism, physiology, reproduction, and inflammation, as well as in many pathological processes such as cancer, diabetes, rheumatoid arthritis, neurologic and psychiatric syndromes, immunosuppression, hormone resistance syndromes, cardiovascular diseases, metabolic syndrome, etc. (3)(4)(5)(6)(7)(8). Most NRs share a similar structural architecture organized into five or six modular regions (A to F), sharing variable degrees of homology (9, 10) [ Figure 1A]. A brief description of the NR domains is as follows. A/B: N-terminal domain (NTD): The NTD is a highly disordered domain with little sequence conservation. The NTD contains the activator function-1 region (AF-1), which interacts with a variety of coregulator proteins in a cell and promoter specific manner and is subject to alternative splicing and post-translational modifications (10-12). C: DNA binding domain (DBD): The C region is the highly conserved DNA binding domain (DBD) responsible for recognition and binding to specific DNA sequences called hormone response elements (HREs) in the promoter region of a target gene. The DBD is composed of two tetracysteine zinc-finger motifs (the N-terminal motif (ZF-1) and C-terminal motif (ZF-2)) exclusive to NRs and other sequence elements known as P-, D-, T-and A-boxes ( Figure 1A) (13). These structural elements contribute to HRE specificity (Pbox), dimerization (D-box), and contact with the DNA backbone (T-and A-boxes) (13). In the RXR-TR DBD structure, the T-and A-boxes are shown to directly interact with the minor groove of the spacer element (i.e 4 bp in TR) ( Figure 1B) (14). The T and A-boxes also responsible for assisting DBDs to conduct domain-domain interactions on DNA (15). The C-terminal region of DBD contains the nuclear localization signal and is located in the (16) D: Hinge Region: The hinge region is a variable-length and poorly conserved linker region between the DBD and the LBD. Variability in the length of this region allows the C and E domains to adopt different conformations and consequently contributes to the selection of the DNA binding sites. E: Ligand-binding domain (LBD): LBD is less conserved than DBD. The LBD is an allosteric signaling domain that binds to ligands and also interacts directly with DBD and co-regulator proteins in the multi-domain structure of the receptor organized on its DNA response element ( Figures 1D-F) (17,18). This domain commonly contains 12 a-helices and four b-strands that fold into three parallel layers to form an alpha-helical sandwich and a hydrophobic ligand-binding pocket (LBP) at the base of the receptor ( Figure 1C). The C terminus of LBD contains another activation function surface (AF-2), which is composed of helices 3, 4, and 12. Helix 12, is conformationally dynamic and stabilized upon ligand binding, and appears to communicate with AF-2 to facilitate interaction with different co-regulator proteins (19). F: The F region is not present in all NRs and its function remains unclear.
According to sequence alignment and phylogenetic analyses of the conserved C and E domains, NRs were classified into eight subfamilies (NR1 to NR8) (Table 1) (9,20), which share the conserved modular structure ( Figure 1A), except NR7 members who have two DBDs (9). Non-canonical NRs lack one of the two conserved domains (C in vertebrates and E in invertebrates) and are grouped into the NR0 subfamily (9,20,21).
Data on the origin of nuclear receptors suggest that they were not hormonal receptors with a high affinity for a particular ligand, and that feature was acquired later during evolution (22). The THR-like family evolved as receptors with selective ligands (3,22). The thyroid hormone receptor-like (THR-like) family is one of the largest families belonging to the nuclear receptor superfamily. THR-like receptors are ligand-activated transcription factors that bind to the DNA and regulate gene Domain structure of NRs. (A) The modular domain structure of NRs is composed of an unstructured NTD that contains the Activation Function 1 (AF-1) surface, a DBD that contains two highly conserved zinc finger motifs with P-, D-, T-, A-boxes (), a flexible hinge region, an LBD that binds to ligands and interacts with co-regulator proteins (Nuclear receptor co-activator 2 (NCOA2), Cyan) through the Activation Function 2 (AF-2) surface and less conserved F domain. (B) A cartoon structure of TR-RXR DBD dimer bound to its DNA response element showing various motifs of DBD. (C) The TR LBD structure shows its alpha-helical sandwich fold in which helix H5, H8, and H9 (Yellow) are sandwiched between helix H2, H3 (pink) on one side and by helix H6, H7, H10 (green) on the other side. The ligand binding pocket is shown at the bottom of the LBD. (D) The RAR-RXR heterodimer on its DNA response elements (direct repeats with single-base spacer.) (E) The PPAR-RXR heterodimer on its DNA response element (direct-repeats with single-base spacer). (F) The LXR-RXR heterodimer on its DNA response element (direct repeats with fur-base spacer). The Co-activator LXXLL motif is seen (yellow) in all structures on both LBDs. expression of related pathways and play an essential role in the development, metabolism, and physiology of an organism (Table 2)  . Among all the domains DBD is highly conserved, while the LBD is moderately conserved (3,9,23). The ligand-binding domain is the interaction site for ligands and co-regulators (co-activators (SRC1) or corepressors (NCoR, SMRT)) (24). Ligand binding in the binding pocket induces a conformational change in LBD, leading to the release of the corepressors and the recruitment of co-activators. This conformational change in LBD propagates to the DNA binding domain through direct contact that promotes the DBD to bind the DNA and the subsequent transactivation of genes (9,25). Structural findings in multiple nearly full-length complexes have now shown that DBD and LBDs communicate, through direct contacts (13,15,17) (Figures 1D-F). THR-like receptors form monomers, homodimers and heterodimers, most commonly with RXR on bipartite HREs composed of tandem AGGTCA 'half site'. The HREs for RAR, PPAR, and LXR differ from one another only by the number of base pairs between two hexameric direct repeats (DR1-D5) ( Figures 1D-F) (9). The LBD is the main interaction site for ligands, coactivators, and corepressors and is also involved in the heteromerization of receptors. The ligands of THR-like receptors are small lipophilic molecules, including thyroid hormone, retinoic acid, fatty acids, heme, bile acids, and sterols. (Table 2). Based on sequence homology THR-like receptors are classified into 11 subfamilies i.e. (A) thyroid hormone receptor (THR) (B) retinoic acid receptor (RAR) (C) peroxisome proliferator-activated receptor (PPAR) (D) Rev-ErbA (E) E78C-like (F) RAR-related orphan receptor (ROR) (G) CNR14-like (H) Liver X receptorlike (I) Vitamin D receptor-like (J) Hr96-like (K) Hr1-like (Table 2) (20). THR-like receptors represent novel targets for the development of therapeutic agents for the treatment of numerous diseases, including type 2 diabetes, obesity dyslipidemia, atherosclerosis, and metabolic syndrome (7). There have been many recent advances in the development of new therapeutic agents for a subset of these receptors, including peroxisome proliferator-activated receptors, liver X receptors, and farnesoid X receptors (4,6,7). Considering the therapeutic potentials, we analyze the LBD of the THR-like family to predict the fold and function-specific sites by using informationtheoretic methods such as relative entropy and sequence harmony. In the present study, we perform extensive in-silico analysis to retrieve essential details regarding phylogenetic relationships and functionally important residues in the LBD of THR-like receptors. Taken together, the results provide a necessary framework for structural and functional studies of these important families of protein. This study would be helpful in detail in understanding the molecular basis of selective mutation, which might open new possibilities for the design of a new class of therapeutic agents.

Data collection
To extract the protein sequences of the THR-like family, we searched various protein databases (NCBI, UniProt, EMBL) (https://www.ncbi.nlm.nih.gov/, https://www.uniprot.org/, https://www.embl.org/). We have created an HMM profile (Hidden Markov model) from the nuclear receptor family 1 (NR1) sequence seed alignment (26) (Figure 2). The HMM profile was searched against each protein database and fulllength THR-like protein sequences were extracted using an internal script. CD-hit (Cluster Database at High Identity with Tolerance) was used to reduce sequence redundancy (27). Sequences that have more than 90% sequence similarities were removed from our database.

Mapping of ligand-binding domain and sequence extraction
THR-like family proteins have mainly two conserved domains DBD and LBD. To extract the LBD sequences from the full-length THR-like protein, we need to map the start and end of the domain in the sequence. To map the LBD in the protein sequences, we have designed a strategy. The THR-like LBD sequence seed alignment was downloaded from the conserved domain database (CDD) (28). Seed alignment was used to create an HMM profile using HMMER 3.0 (26). The HMM profile was searched for in THR-like full-length sequences. The Hmmsearch (26) output provides the start and end of the LBD in each sequence. The LBD sequences were extracted by using an in-house script ( Figure 2).

Sequence and structure alignment
All THR-like LBD sequences were aligned using MAFFT with the default parameters (29) (Figure 2). In an evolutionary process, protein structures are more conserved than sequences. Therefore, to include the structure conservation information, we have created a structure sequence multiple sequence alignment of THR-like LBD. Structures of THR-like LBD protein belonging to different subfamilies (Table S1) were downloaded from the PDB database and aligned using the program STAMP (30), which is an integral tool in VMD (31). A structure-sequence MSA of THR-like LBD was created by aligning the structure profile and the sequence profile using the profile-profile alignment command in MAFFT (29). Jalview was to visualize and edit the MSA (32). The sequences of the CNR14-like subfamily (nematode), Hr96-like, and Hr1-like subfamily were removed from the analysis due to low significance (short length, lower number). Motif analysis was carried out by using the MEME (Multiple EM for Motif Elicitation) suite (33). MEME searches for repeated, ungapped sequence patterns that occur in DNA or protein sequences for motif prediction.

Phylogenetic analysis of THR-like ligandbinding domain
A phylogenetic tree was generated from the alignment of multiple sequences of THR-like LBD using FastTree2 software with the default amino acid substitution model (34) ( Figure 2). FastTree2 is known for its ability to handle large alignments and run speeds that are faster than other software. FastTree2 adds minimum-evolution subtree-pruning-regrafting (SPRs) and maximum-likelihood NNIs. FastTree2 uses heuristics to restrict the search for better trees and estimates a rate of evolution for each site. Further, the phylogenetic tree was visualized and analyzed by using Archaeopteryx software (35) Calculation of relative entropy (RE) scores of THR-like LBD Relative Entropy (RE) scores were calculated by comparing the amino acid probability distribution for each column of the m u l t i p l e s e q u e n c e a l i g n m e n t w i t h t h a t o f t h e background distribution.
RE or the Kullback-Leibler divergence (KL divergence) was originally introduced by Solomon Kullback and Richard Leibler in 1951 as the direct divergence between two distributions (Kullback and Leibler 1951). It is used to measure the difference between an amino acid distribution P and some background distribution P null . The RE score of column i is defined as Pi(x) is the probability of the occurrence of amino acid x in column 'i' of the MSA, where Pnull is the background probability of amino acid x, which is generally calculated as the probability of finding an amino acid x in all available protein sequences, i.e. protein sequences in Swiss-Prot database.
We used our in-house perl script to do the calculations (36,37). The high-scoring residues were further mapped onto the thyroid hormone receptor alpha structure (PDB ID: 1NAV; Chain A) by carrying out profile-profile alignment between the sequence and structural alignment.

Multi-harmony analysis
Many protein families have subfamilies with functional specialization, such as the binding of different ligands or the participation of different protein-protein interactions. A small number of amino acids generally determine functional specificity. Multiple sequence alignments are often used to reveal functionally important residues within a protein family and identification of key residues that determine functional differences between protein subfamilies. We have used Multi-Harmony, an interactive web server for detecting sub-typespecific sites in LBD of THR-like receptors (38). The multi-Harmony server uses a combined method of sequence Harmony (SH) (39) and multi-Relief (MR) (40) to detect subfamilyspecific residues. Sequence Harmony is based on Shannon's entropy and determines to what extent amino acid compositions between groups differ. Multi-Relief identifies Workflow of the study: Flow diagram of the study to predict fold and function-specific residue via relative entropy and sequence harmony calculation. residues based on RELIEF, a state-of-the-art machine learning technique for feature weighting.
The SH score can be calculated as the relative entropy of group A relative to the sum of the probabilities of both groups ( p A + p B ).

Mutation analysis
A collection of currently known natural-occurring THR-like LBD mutations was assembled From the UniProtKB database (41). The resulting dataset includes naturally occurring LBD mutations in the following receptors: PPAR, RAR, THR, LXR, VDR and ROR (Table S2). Based on the generated list, all mutations were classified according to our scoring scheme as fold and function-specific mutations. Furthermore, mutations were mapped to have an association with any disease in humans.

Results
Retrieval and downstream filtering of sequences A total of 1276 THR-like receptor sequences were collected from the databases. We have used nuclear receptor family 1 (NR1) sequence seed alignment to search the database. Sequence redundancy was removed by CD-HIT software, keeping 90% sequence similarity as the clustering threshold. CD-hit provided 1176 sequence clusters. THR-like receptors consist of several domains out of them, two are major domains; DBD and LBD. As we were interested in studying LBD, we designed a strategy to fetch the LBD sequence out of a full-length THR-like sequence. We have created an HMM profile from the THR-like LBD sequence seed alignment downloaded from the conserved domain database. The HMM profile was searched against the THR-like full-length sequences database. We have used the bit score value (X=114) as the threshold per domain in the hmmsearch command against the THR-like sequence database. The hmmsearch output also provides the start and end of the LBD in each sequence. The LBD sequences were extracted by using an in-house script. The short length (between 100aa and 130aa) LBD domain sequences were removed, and finally, 1158 THR-like LBD sequences were used for the study.

Multiple sequence alignment of THR-like LBD
Multiple sequence alignment of THR-like LBD sequences was performed using MAFFT software. To check the accuracy of alignment, we performed a profile-profile alignment. As structures are more conserved than sequences, we aligned the THR-like LBD structure profile created using STAMP software (10 known structures of THR-like LBD proteins; details in Table 2) with the THR-like LBD sequence profile. We employed WebLogo 3 (http://Weblogo.threeplusone.com/ create.cgi) (42) to visualize the conservation patterns that emerge from the profile-profile alignment. The WebLogo shown in Figure 3 represents the conserved residues in the LBD region of the THR proteins. In the WebLogo plot, all the hydrophobic residues (YVMCLFIW) are colored black, while neutral amino acid (SGHTAP) and hydrophilic residues (RKDENQ) are colored green and blue, respectively ( Figure 3). MEME Motif analysis provided 12 motifs in the protein sequences ( Figure 3). These motifs of THR-like LBD provided information about seven highly conserved signaling motifs and five interaction sites. The interaction sites are the motifs where ligands, DBD and co-proteins interact. Motif A occupies consensus sequence positions 207-216. This is known as the inverse NR-box LLxxL motif and is one of the main ligand interaction sites. The NR box (NR interacting box LxxLL) plays an important role in the interaction of NRs with co-activators and therefore, in the regulation of transcription. Motif B occupies consensus sequence positions 222-232 and resides in a critical region for the coactivator function. Motif C also known as the LxxDDQ motif occupies positions 236-245 and is a region that is also critical to co-activator function.

Phylogenetic analysis of THR-like LBD
THR-like family is the largest ligand-activated nuclear receptor family, with members existing only in metazoans and not found in plants, algae, fungi, and protists. Fasttree2 (34) which shows good and fast performance with large alignments was used for the tree construction from the THR-like LBD multiple sequence alignment. LBD sequences are clustered based on sequence homology and lineage. The resultant monophyletic tree showed clustering of the sequences into 8 subfamilies (Figure 4). The clustering of the branches is from points near the center of the tree. Cluster 1 includes all thyroid hormone receptors (red in the phylogenetic tree). The second cluster includes all members of the retinoic acid subfamily (green in the phylogenetic tree). The third group is Vitamin D receptorlike subfamily (colored blue). The fourth cluster is The Liver X receptor-like subfamily. The fifth cluster is the peroxisome proliferator-activated receptor subfamily. The sixth cluster is The Rev-ErbA subfamily. The seventh cluster is The RAR-related orphan receptor subfamily. The eighth cluster is The E78c subfamily from the phylogenetic tree analysis; it is predicted that there all these sequences are clustering based on their evolutionary relationships. Here is a brief description of the subfamilies.
Subfamily 1-Thyroid hormone receptor: Thyroid hormone receptors are activated by binding to thyroid hormone. Thyroid hormone receptors play a critical role in the regulation of metabolism, heart rate, and development of organisms (3). TH receptors also have non-genomic effects leading to second messenger activation, and the corresponding cellular response (43). Thyroid hormone receptors regulate gene expression by binding to hormone response elements (HREs) in DNA as monomers, homodimers, or heterodimers with other nuclear THR-like LBD sequence conservation logo: The sequence logo shows conservation patterns that emerged from the multiple-sequence alignment of THR-like LBDs. The overall height of the stack represents the conservation of the sequence at a particular position, and the frequency of the respective amino acid at that position is indicated by the height of the amino acid symbol. The conserved motifs (red boxes) and interaction sites (site for ligand, coactivators/corepressors and DBD binding) are highlighted (green boxes). receptors (retinoid X receptor (RXR)). There are two main classes of the thyroid hormone receptor, alpha (THRA) and beta (THRB). Mutations in THRs are associated with thyroid hormone resistance disease (44). Clinically mutations have been observed in both the THR isoforms, however, THRB gene mutations are much more common.
Subfamily 2-Retinoic acid: Retinoic acid receptors (RAR) are activated by all-trans retinoic acid and 9-cis retinoic acid. There are three retinoic acid receptors, RAR-alpha, RAR-beta, and RAR-gamma. RARs form heterodimers with members of the retinoid X receptor (RXR) subfamily (25). RARs also have nongenomic effects via activating kinase signaling pathways, which fine-tune the transcription of the RA target genes. The disruption of RA signaling pathways is thought to underlie the etiology of several hematological and non-hematological malignancies, including leukemias, skin cancer, head/neck cancer, lung cancer, breast cancer, ovarian cancer, prostate cancer, renal cell carcinoma, pancreatic cancer, liver cancer, glioblastoma and neuroblastoma (45).
Subfamily 3-Vitamin D receptor-like: The VDR-like subfamily includes the vitamin D receptor, the pregnane X receptor, and the constitutive androstane receptor. Vitamin D receptor (VDR) heterodimerizes with retinoic acid upon binding calcitriol (an active form of vitamin D). The most potent natural agonist is calcitriol (1,25-dihydroxycholecalciferol) and the v i t a m i n D 2 h o m o l o g e r c a l c i t r i o l , 1 -a l p h a , 2 5dihydroergocalciferol) is also a strong activator (3). Mutations in the VDR gene are associated with type II vitamin D-resistant rickets. A disorder of vitamin D metabolism resulting in severe rickets, alopecia, secondary hyperparathyroidism and hypocalcemia. The pregnane X receptor (PXR), is a nuclear receptor that senses the presence of foreign toxic substances and in response up-regulates the expression of proteins involved in the detoxification and clearance of these substances from the body (46). PXR heterodimerizes with RXR and regulates the transcription of the cytochrome P450 gene CYP3A4. It is activated by a variety of compounds that induce CYP3A4, including dexamethasone and rifampicin (47). Constitutive androstane receptor (CAR) like PXR also functions as a sensor of endobiotic and xenobiotic substances. CAR and PXR play an important role in the detoxification of foreign substances and the metabolism of drugs. CAR-regulated genes are involved in drug metabolism and bilirubin clearance (48).
Subfamily 5-Peroxisome Proliferator-activated receptor: Peroxisome proliferator-activated receptors (PPARs) play essential roles in the regulation of cellular differentiation, development and metabolism (carbohydrate, lipid, protein) and tumor genesis of higher organisms. PPARs have three isoforms termed alpha, gamma, and delta (beta). PPARs heterodimerize with the retinoid X receptor (RXR) and regulate the transcription of targeted genes. Endogenous ligands for the PPARs include free fatty acids, eicosanoids, and Vitamin B3 (50).
Subfamily 6-Rev-ErbA: The Rev-Erb proteins are members of the nuclear receptor (NR) superfamily of intracellular transcription factors and key regulatory components of the circadian clock. There are two forms of the receptor, Rev-Erb alpha and Rev-Erb beta. These receptors act as key regulators of clock gene expression through transcriptional repression of Bmal1. Through their regulation of clock-controlled genes, the Rev-Erb proteins affect several physiological processes throughout the body, including metabolic, endocrine, and immune pathways (51).
Subfamily 7-RAR-related orphan receptor: The RAR-related orphan receptors (RORs) are intracellular transcription factors. There are three isoforms of ROR, ROR-a, -b, and -g. The RORs are somewhat unique in that they appear to bind as monomers to hormone response elements as compared to other nuclear receptors which bind as dimers. RORs are activated by oxysterols. Some other natural substances have also been reported to bind to RORs, such as all-trans-retinoic acid binds with high affinity to ROR-b and -g but not ROR-a. RORs act as lipid sensors and hence may play a role in the regulation of lipid metabolism (52).
Subfamily 8-Ecdysone-induced E78c: The members of the 78c protein induced by ecdysone are found in arthropods, trematodes, mollusks, and nematodes. E78c protein has been linked to insect development and metamorphosis (53). Ecdysone-induced protein 78C (E78), a nuclear hormone receptor closely related to Drosophila E75 and mammalian Rev-Erb and Peroxisome Proliferator-Activated Receptors was originally identified as an early ecdysone target. The previous study shows that E78C plays a role in oogenesis and is important for proper egg production and maternal control of early embryogenesis (53).

Prediction of fold-specific residues: Results of RE calculation
Fold-specific residues are responsible for maintaining the overall fold of the protein. Hence, such residues should be conserved across the THR protein alignment; an in-house Perl script was used to predict the fold-specific residues by employing a relative entropy calculation (RE). RE calculations identify residues whose probability distribution is significantly different from their background probability distribution. RE, therefore, predicts residues that are not identified by the traditional scoring methods (36). RE scores calculated for each MSA alignment position were plotted against sequence conservation in WebLogo of THR-like LBD. High RE scores are represented by a sharp upward peak (blue color) in the graph at an alignment position ( Figure 5). It was observed that the residues identified as fold-specific are scattered across the helices (Figures 6A, C). Interestingly, H4, H6, H8, H9, H10, and H11 contain certain residues with very high RE scores. To gain insight into the distribution of the high-scoring residues on a 3D structure, the top 30 residues with the highest scores were mapped onto the crystal structure of the thyroid hormone receptor alpha of homo sapiens (PDB ID: 1NAV). Table 3 summarizes the details of all residues which were indicated in the structure.

Prediction of function-specific residues: Results of multi-harmony calculation
Keeping in view the therapeutics of the THR-like family, we utilized another information-theoretic measure to explore distinct sequence signatures in the subfamilies. Though the proteins belonging to the THR-like family share the same protein fold, their ligand recognition and binding are different and very specific to each subfamily. It was of interest to explore the differentially conserved residues in one subfamily compared with another. The superposition of THR-like LBD structures reveals that the top part of the receptor is more or less similar compared to the bottom (which contains the LBP). This variability across THR-like receptors at the ligand-binding region allows THR-like receptors to recognize a diverse set of ligands. Therefore, to predict the specificity determining residues in the THR-like LBD we have used the multi-harmony web server. The Sequence Harmony (SH) and Multi-Relief (mR) methods in one web server allow simultaneous analysis and comparison of specificity residues. The SH scores calculated for each alignment position from the MSA were plotted against sequence conservation in weblogo of THR-like LBD ( Figure 5, green color). The SH score lies between 0 and 1. The lower score shows the differential conservation of the residues at that alignment position. Refinements with low SH scores are the specificities that determine the residues. Low SH scores are represented by a downward peak (green color) in the graph at an alignment position ( Figure 5). 30 residues with the minimum SH scores (considered 0.4 SH score as cutoff) are provided in Table 4. To get an idea of the distribution of the minimum SH scoring residues on a 3D structure, the top 30 residues with the minimum SH scores were mapped onto the homo sapiens crystal FIGURE 5 Mapping of RE (relative entropy, blue) and SH (sequence harmony, green) scores onto the sequence WebLogo of THR-like receptors LBDs. RE peeks (upward, blue) highlight the alignment position, which is structurally important (fold-specific), and SH peeks (Downward) show that the alignment is of functional significance (function-specific). Highlighted boxes represent the conserved motifs (red) and interaction sites (green). structure of the thyroid hormone receptor alpha (PDB ID: 1NAV) (Figures 6B, D). The Result shows that most of the minimum SH scoring residues are around the ligand-binding cavity ( Figure 6D). Table 4 summarizes the details of all residues which were indicated on the structure and a detailed list of residues with consensus sequences in all other subfamilies at that alignment position is provided in the supplementary information (Table S2).

Mapping of LBD mutants as function and fold-specific
In this study a list of all known natural mutations similar to THR-like LBD was composed in this study from the UniProt knowledge database and the literature search (Table S3). Natural mutations were found in THRa, THRb, RARa, RARb, VDR, LXRa, PPARa, PPARg and RORa, RORb, and RORg. Further mutations were classified according to our scoring scheme as fold and function specific (Table S3). THRA structure was used to map the predicted fold-, function-specific residues and natural mutations in THRA (Figures 7A-C). We have compiled a list of mutations that are associated with diseases in humans ( Tables 5, 6). On the basis of mutation and the analysis of their results, it was observed that very highly conserved residues are not prone to mutations due to their essential role in protein function. The mutation in THRA LBD lies around the cavity ( Figure 7C). We have observed that receptor-wise, THRB has more mutation compared to THRA, VDR, and PPARA ( Figure 8A). Mutations are present in all motifs and interaction sites but residues lying at motif G, motif A, motif C, motif F, motif B and interaction sites INS 5, INS 1, INS 2, INS 3 are more prone to mutations ( Figure 8B). Mutational analysis also shows that most mutations are at the sites of functional significance compared to fold ( Figure 8C).

Discussion
In this study, we have combined both sequence and structural information from the THR-like LBDs into the multiple sequence alignment. Length−wise all THR-like  ligand-binding domains are very similar. Finding conserved motifs should highlight regions of critical importance in ligand binding. Those regions can provide amino acid patterns that are a unique signature to the THR-like LBD. From the MSA we have found that the THR-like LBDs consist of seven conserved motifs and five interaction sites that are aligning with the previous studies (54). THR-like LBDs have the interaction site for the respective ligand and other co-proteins. It was observed that five interaction sites are present in the LBDs. Interaction site 1 lies within motif A, Interaction sites 2,3 and 4 lie between motifs C and D, and interaction site 4 lies between motifs F & G.
The NR−boxes (LxxLL) or inverse NR−boxes (LLXXL) can bind to specific NR regions; they may have a role in the interaction between NRs, specifically in hetero and homo−dimerization. The MSA was used to draw a phylogenetic tree of the THRlike LBDs. The phylogenetic analysis performed depicts the distinct separation of THR-like LBDs into the monophyletic branches. The THR-like LBDs are clustered according to the lineage.
Fold-and function-specific residues found in any protein of interest, and these residues are differentially conserved among the subfamilies and provide them functional specificity, such as binding at different ligands or being involved in different protein-protein interactions. To predict fold-specific and function-specific residues in the THR-like family, we used information-theoretic methods. A high RE score at the alignment position corresponds to the fold-specific residue, and on the other hand minimum, SH score at the alignment position in MSA corresponds to the function-specific site. The structure of the THR-like LBD is composed of 12 a-helices folded as an antiparallel tri-layer alpha-helical sandwich fold in which antiparallel alpha-helices (H5, H8, H9; the "sandwich filling") are flanked by two alpha-helices on one side (H2, H3) and three on the other (H6, H7, and H10; the "bread") (9). The ligand-binding cavity is within the interior of the LBD and just below three antiparallel alpha-helical sandwich "fillings" ( Figure 1C). The H12 is connected to H11 by a flexible loop which allows H12 to swing upon binding to a ligand, trapping the ligand and stabilizing the conformation of the NR (9). In this way, the LBD mediates recognition and interaction of ligands at the ligand-binding pocket (LBP), dimerization with LBDs from other NRs, interaction with co-regulators (coactivators [LxxLL motif in H4] or corepressors ([LxxxIxxxI/L motif in H1]) and ligand-dependent transactivation at the ligand-dependent activation function AF2 domain (H12), which enables the recruitment of co-regulators (21). The mapping of RE and SH scores on the human thyroid hormone structure has shown that high RE scoring residues are distributed onto the helixes in the LBD domain and are critical to maintaining the THR-like LBD fold, while the mapping of SH scores has shown that minimum SH scoring residues are distributed around the ligand bind pocket and contribute to ligand specificity in subfamilies ( Figure 6). The dominance of nonpolar residues is more compared to polar and charged residues in both lists (foldspecific and function-specific residues) ( Table 3, 4). Additionally, to validate our study, we have compiled a list of natural mutations in THR-like LBDs and how they affect their function (Table S3). Since the interaction sites are mainly responsible for the ligand selectivity or co-proteins interactions, mutation analysis provides information that all the interaction sites and conserved motifs are prone to mutation (Figure 8B). Motif G and INS5 have high mutation count as compared to other motifs. Motif G and INS5 lie in helix H12 and H11, which play a crucial role in ligand trapping, binding, and stabilization of the receptors. Any mutation in these regions will lead to impairment of ligand binding, dimerization, and coreceptor binding. The selected mutations A B C FIGURE 7 Mapping of mutations onto the THRA structure: (A, B) Mapping of fold-specific (red) and function-specific residues (blue) onto the human THRA LBD structure (PDB, 1NAV), respectively. (C) Mapping the mutation (sphere, gray) in THRA onto the structure shows that fold-and function specific sites are prone to mutations (spheres can be seen in blue and red regions).

RORg
Disruption phenotype Mice show decreased adipocyte size and high insulin sensitivity, leading to improved control of circulating fatty acids. Mutants are protected against hyperglycemia and insulin resistance in the state of obesity. Loss of circadian pattern of some clock genes expression in peripheral tissues and massive apoptosis of thymocytes. Knockout mice for isoform 2 lack all lymph nodes and Peyer patches, as well as LTi cells. They also show a reduction of T(H)17 cells in the lamina propria by at least 10 times to less than 1% of T (H) cells. Mice are less susceptible to autoimmune inflammatory diseases.

S.
No.

Receptor Mutation Classification Alignment Position
Disease-Associated were further classified into fold-and function-specific residues according to our scoring scheme and analyzed how the mutation affects the function of the protein. Most of the mutations are function-specific which signifies that mutations are linked to impairment in ligand and co-receptor binding that leads to malfunctioning of the receptor and is associated with diseases in humans (55)(56)(57)(58)(59)(60). We have found that 47 THR-like LBD mutations are critical and linked to diseases in humans (Table 7). Mutations in the LBD of THRa, THRb, RARa, RARb, VDR, PPARg, RORa, RORb, and RORg receptors are associated with several diseases (Table 6). A brief description of the diseases is provided for reference (Table 5). Among all receptors, THRb LBD is more prone to mutation and is involved in GRTHD, GRTHR, and PRTH diseases. THRb mutations A317T/S and A234T are involved in GRTHD and affect hormone binding. THRb mutation R316H/C is involved in PRTH and also impairs hormone binding (Table 6). THRb residues R316 and A317 constitute the ligand binding domain. The A317 side chain interacts with iodine, sculpts the T3 in the ligand-binding pocket, and R316 is a part of the polar cluster in THRb. Both mutants show decreased T3 affinity and weak transcription activity. The mutant R316H fails to dimerize which explains the weak transcription (61). The residue R316 forms two hydrogen bonds with the helix1, upon mutation to H316 both the hydrogen bond is lost which leads to displacement of helix1 and thereby loop after helix1. The stability of helix1 is required for ligand binding and dimerization (61). The mutant A317T repositions 3,5,3'triiodothyroacetic acid, distending the face of the receptor that binds the coregulators and (61). Mutation analysis has also shown that some alignment positions are very critical (Table 7). Two and more than two receptors having mutations at these positions lead to impairment of LBD function resulting in the occurrence of disease. Three receptors have the mutation, THRb (L346F), VDR (H305Q) and PPARg (F388L) at alignment position 363 and are associated with GRTHD, VDDR2A and FPLD3 diseases, respectively (55,57,62). These findings suggest that the spatial position of the residues is also important with the residue (polar, nonpolar, and neutral) at that position and may help in the selectivity, binding and interaction of ligands with the coactivators and co-repressors.

Conclusions
In conclusion, THR-like receptors are vital transcription regulators and therefore mentioned information acquired can be used in a variety of ways. Phylogenetic analysis also provided ; loss of calcitriol receptor activity; no effect on interaction with RXRA; changed interaction with NCOR1; loss of interaction with NCOA1; no effect on sequence-specific DNA binding.

39
VDR R274L/H Fold 257 in VDDR2A; loss of calcitriol receptor activity; decreased affinity for calcitriol by a factor of 1000; no effect on interaction with RXRA; changed interaction with NCOR1; loss of interaction with NCOA1; no effect on sequence-specific DNA binding. new insights for THR-like clustering and identified several key regions that exist through evolution in the NR LBD. The mutation analysis highlighted mutational hotspots while also providing insights into their effects, especially when they are localized in conserved motifs of THR-like LBD. Due to the diverse array of genes regulated by these proteins, along with the fact that many drugs are not explicitly specified for one receptor, drugs that target THR-like receptors tend to have unwanted side effects. Improving our understanding of THR-like receptors could pave the way for future therapeutics. Our study might be helpful to pharmaceutical companies and academic researchers in developing synthetic ligands that can specifically target these receptors. The conserved motifs and interaction sites can be used as selected targeted regions for rational design and discovery of novel drugs. The development of new drugs can be achieved through specific in silico techniques that can compose ligands, which can interact with specific conserved regions and force new alterations in the protein's dynamics.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors. ; loss of calcitriol receptor activity; decreased affinity for calcitriol by a factor of 1000; no effect on interaction with RXRA; changed interaction with NCOR1; loss of interaction with NCOA1; no effect on sequence-specific DNA binding.