Computational Exploration of Putative LuxR Solos in Archaea and Their Functional Implications in Quorum Sensing

LuxR solos are unexplored in Archaea, despite their vital role in the bacterial regulatory network. They assist bacteria in perceiving acyl homoserine lactones (AHLs) and/or non-AHLs signaling molecules for establishing intraspecies, interspecies, and interkingdom communication. In this study, we explored the potential LuxR solos of Archaea from InterPro v62.0 meta-database employing taxonomic, probable function, distribution, and evolutionary aspects to decipher their role in quorum sensing (QS). Our bioinformatics analyses showed that putative LuxR solos of Archaea shared few conserved domains with bacterial LuxR despite having less similarity within proteins. Functional characterization revealed their ability to bind various AHLs and/or non-AHLs signaling molecules that involve in QS cascades alike bacteria. Further, the phylogenetic study indicates that Archaeal LuxR solos (with less substitution per site) evolved divergently from bacteria and share distant homology along with instances of horizontal gene transfer. Moreover, Archaea possessing putative LuxR solos, exhibit the correlation between taxonomy and ecological niche despite being the inhabitant of diverse habitats like halophilic, thermophilic, barophilic, methanogenic, and chemolithotrophic. Therefore, this study would shed light in deciphering the role of the putative LuxR solos of Archaea to adapt varied habitats via multilevel communication with other organisms using QS.

Acylated homoserine lactones are characterized as the major signaling language for interaction among Gram-negative bacteria. It is processed by various homologs LuxI/LuxR type QS system in bacteria (Miller and Bassler, 2001). Two important proteins, LuxI and LuxR, control the expression of luciferase operon (luxICDABE), and thereof are the key regulators of QS circuit. LuxI homolog protein is AHL synthase that catalyzes the reaction between S-adenosyl methionine (SAM) and an acyl carrier protein (ACP) to produce AHL molecules (Rutherford and Bassler, 2012). While, LuxR-like proteins activates the transcription of the target DNA by binding to its cognate AHL molecule (Schauder and Bassler, 2001). Moreover, a LuxR homolog protein comprised of two domains, i.e., N-terminal region (response regulatory domain) that binds to its specific autoinducer and C-terminal region with Helix-Turn-Helix (HTH) motif responsible for binding the DNA and hence modulates the expression of genes (Donaldson et al., 1990;Hanzelka and Greenberg, 1995).
Previously, we have developed a database named SigMol, which encompasses information of all QSSMs reported in prokaryotes . Interestingly, in the database few species of Archaea was reported to exploit QS phenomenon. For example, Paggi et al. (2003) studied the presence of intraspecies communication in Natronococcus occultus through AHLs and showed their correlation with the production of extracellular proteases. Later on, FilI/FilR regulators were known to process carboxy-AHLs in Methanosaeta harundinacea strain 6Ac for cell assembly and carbon metabolic flux (Zhang et al., 2012). Moreover, some archaea like Methanosarcina mazei, Methanothermobacter thermautotrophicus (Zhang et al., 2012), Natrialba magadii (Montgomery et al., 2013), etc. were also proved to perform cross-talk through QS. However, there is a huge gap in the experimental exploration of QS potential among archaea due to difficulties in culturing them.
Despite, various bioinformatics resources available for QS like Quorumpeps (Wynendaele et al., 2013), QSPpred (Rajput et al., 2015), SigMol , etc., the attempts to explore QS mechanism computationally and evolutionarily in Archaea is lacking. To best of our knowledge, this is the first study that focused on investigating QS in archaea kingdom through multidimensional perspectives. We performed stepwise analyses to unveil the QS potential of LuxR solos in Archaea via their distribution, similarity with bacteria, functional characterization followed by correlation between taxonomy and ecological niche.
Amongst all the three domains, only LuxR (C-terminus DNA binding, IPR000792) domain is reported in 110 archaeal proteins (Supplementary Table S1). However, we classified LuxR proteins as solos due to the absence of cognate LuxI domain with them. Furthermore, we used 110 LuxR containing sequences in all the analyses to unveil the functionality of QS in Archaea.

Multiple Sequence Alignment
Alignment of LuxR archaeal sequences was done with respect to TraR of Agrobacterium tumefaciens, to observe the presence of functionally conserved key residues among them (W57, Y61, D70, P71, W85, G113, E178, L182, and G188) (Zhang et al., 2002). The TraR of A. tumefaciens was previously used as the reference during MSA for aligning bacterial LuxR sequences (Subramoni et al., 2015). MSA was performed using MAFFT v7.0, which employs variants of fast Fourier transform method for identifying homologs regions and alignment (Katoh et al., 2002). Further, the aligned sequences were visualized using MSAReveal.org 1 software. It disclosed the uniqueness in aligned sequences by showing the statistics for length, % identity, gaps, and consensus.

Domain Analysis
Domain analysis was done to check the possibility of the conserved portions of protein that can exist, evolve, and function independently from the rest protein chain. LuxR containing proteins was scanned for the presence of all the possible domains (universal) by employing two different strategies. Firstly, the domains among LuxR proteins that are reported in InterPro database were extracted. Secondly, NCBI-Conserved Domain Database (CDD) (Marchler-Bauer et al., 2015) is used for CD-search and only those domains are enlisted that comes out as "specific hit." The outcome of domains by employing both strategies (InterPro and NCBI-CDD) was further explained in two ways, i.e., occurrence of unique domains reported in all proteins and frequency of domain combination. Moreover, pictorial depiction of all domains in 110 LuxR proteins was constructed using Domain Draw tool (Fink and Hamilton, 2007).

Motif
Motif analysis was done to fetch the structural characteristic (or super secondary) in the protein. Firstly, we extracted the motifs from Archaea LuxR sequences and then scanned them with Gram-negative bacteria to examine the extent of their similarity. Motif discovery and scanning were done using Multiple Em for Motif Elicitation (MEME) and Motif alignment and search tool (MAST) v4.11.2 software (Bailey et al., 2015), respectively. MEME is used to identify novel and ungapped motifs from the input sequences. Consequently, MAST scans and sorts the sequences by the best-combined match to all extracted motifs by MEME.

Gene Ontology Annotation
Gene Ontology (GO) annotation enables the assignment of protein functions computationally. GO consortium constructed three structurally controlled vocabularies (ontologies) to portray the gene products linked with biological process, molecular function, and cellular components in species independent mode. GO annotation of LuxR containing archaeal proteins were done using GOA database to find the biological, molecular, and cellular function of the sequences (Ashburner et al., 2000). Molecular 1 http://www.bioinformatics.org/msareveal/ function annotation used to describe activities of sequence occurring at the molecular level. The Cellular function provides the information regarding the component of cell where the sequences are active. Moreover, the biological process determines a series of events being driven by some organized assemblies of molecular functions.
In the present study, we focused on highlighting the preference of all the assigned GO (n) functions among Archaeal proteins among three different domains of GO. Therefore, combinatorial mathematics based approach was employed. Firstly, we determined a maximum number of GO function combinations (1 to m) that can be assigned. Secondly, we extracted a number of proteins involved in all the combinations (1 to m) of GO functions. This combinatorial mathematics based approach resulted in the range (maximum to minimum) of proteins that possess common functions. Thirdly, the visualization were done using UpsetR (Lex et al., 2014) package in R. In this study, we gave all the possible information of GO annotation assigned functions like proteins involved in individual function and in all possible intersection sets (number of proteins involves in 2, 3, 4, functions etc.). For example, if all the 110 proteins reported to be involved in five GO functions (e.g., A, B, C, D, and E), then total combinations comes out from the formula given below: where, n is total number of functions assigned. So, in the case of five GO functions assigned "Total combinations" resulted in 31 (= 2 5 -1). Further, to know individual patterns (GO function) per combination. Following formula is used: Patterns per combination = n C k where, n is total number of functions assigned and k is number of combination of which we need to fetch out the patterns [single (A, B, C, D), double (AB, BC, CD), triple (ABC, BCD), etc.]. Therefore, to know the "patterns per combination" in which 31 "total combinations" are involved of five different GO functions: Patterns per 1 combination single = 5 (= 5 C 1 ) Patterns per 2 combination double = 10 (= 5 C 2 ) Patterns per 3 combination triple = 10 (= 5 C 3 ) Patterns per 4 combination quadruple = 5 (= 5 C 4 ) Patterns per 5 combination quintuple = 1 (= 5 C 5 ) All the combinations and their respective patterns are described in a user-friendly manner by plots in result section of the manuscript.

Ligand Binding Prediction
Prediction of ligand binding was accomplished to observe the potential of LuxR proteins to involve in QS phenomenon. The ligands were identified using COACH software available in I-TASSER package (Yang et al., 2013). It identifies the ligands using two approaches, i.e., structure-based (TM-SITE) and evolution-based (S-SITES) ligand-binding sites prediction.

Clustering
Grouping of the LuxR containing protein of Archaea was done to identify the closely related members among all LuxR regulators. All the sequences of archaea were clustered by CLANS (CLuster ANalysis of Sequences) (Frickey and Lupas, 2004). It is a Java-based application, which performs the clustering using network-based approach (unaligned sequences) by employing all-against-all BLAST searches and the pairwise attraction values were calculated based on high scoring segment pair p-values.

Phylogenetic Analyses
Phylogenetic analyses were executed to observe the evolutionary relationship of LuxR containing protein of Archaea with other families of LuxR and respective BLAST hits. It was done using Molecular Evolutionary Genetics Analysis (MEGA) 7.0 (Gupta et al., 2016;Kumar et al., 2016). Protein sequences were grouped and aligned using MUSCLE (Edgar, 2004;Hall, 2013) software integrated into same package. Further, the aligned sequences were analyzed phylogenetically employing Maximum Likelihood (ML) method.
The model used for LuxR proteins tree building was WAG (Whelan and Goldman, 2001) with rate variation among sites was formulated with a gamma distribution (shape parameter = 1). Additionally, 16s rRNA tree of 107 sequences (65 archaea and 42 bacteria) was build using Kimura-2-parameter (Kimura, 1980) model. Statistical support for evolutionary analyses was computed by bootstrap in ML (using 1000 pseudo-replicates) along with all the positions with less than 95% site coverage was removed (i.e., fewer than 5% alignment gaps, missing data, and ambiguous bases were allowed at that position).

Ecological Niche
For correlating the taxonomy and ecological distribution of Archaeal LuxR solos. We extracted the habitat of all Archaea possessing LuxR from various sources like Encyclopedia of life, PubMed, UniProt, JGI genome portal, GenBank, etc. Further, CIRCOS v0.69 stand-alone software was used to visualize the relationship between taxonomy and habitat of Archaea (Krzywinski et al., 2009).

Extent of the Distribution of Putative LuxR Solos in Archaea
To check the distribution of potential LuxR solos in Archaea, InterPro database was explored as described the methodology section. In total, 110 LuxR sequences were obtained from 94 unique Archaea strains. Further, we used 110 LuxR containing sequences for all the analyses, for unveiling their potential to participate in QS. The correlation between their length and frequency was determined to get the brief overview of their distribution. A pictorial summary of archaea sequences (LuxR containing) used in the study depicted as scattered plot with marginal histogram constructed in R for defining sequence length vs. number of sequences in Figure 1 with the average sequence length of 255 residues.

Similarity of Putative LuxR Solos of Archaea with Bacteria
Domain extraction was performed to examine the similarity of the independently existing functional unit in Archaea. The presence of all possible domain hits was done using two strategies: (i) InterPro, and (ii) NCBI-CDD database. Moreover, the combinations of all possible domains were also explored in every protein using both the strategies. By using the first strategy, 24 unique domains were reported among 110 LuxR containing archaeal sequences.  Figure 2 whereas the list of all the domains along with their occurrence is given in Supplementary Table S2. Moreover, the domain diagram showing the combination of InterPro assigned domains in all LuxR containing Archaeal proteins is provided in Figure 3.
The presence of motifs in putative LuxR solos of Archaea was examined. At e-value 1, we extracted 10 motifs using MEME tool that varies in length, sequence coverage from 16 to 50 and 11 to 110, respectively. The detailed information like sequence logo, motif width, regular expression and sequence coverage is provided in Supplementary Table S3. Further, the fetched motifs were searched in Gram-negative bacteria for observing their similarity with them. We found Archaeal LuxR motifs in 14350 out of 73131 Gram-negative bacterial LuxR sequences. The observations suggest that motifs in LuxR solos of Archaea displayed similarity with bacteria LuxR.
Alignment of Archaea LuxR proteins against TraR of A. tumefaciens was done to observe the conservation of key residues for ABD and HTH domains. ABD's residues that make extensive Van der Waals forces with pheromones like W57, Y61, D70, P71, W85, G113 found conserved in 19, 0, 12, 02, 21, and 01 archaeal proteins, respectively. Whereas, HTH binding key residues like E178, L182, G188 present in 25, 48, and 101 LuxR containing proteins. List of proteins possesses key conserved residues are given in Supplementary Table S4. The presence of maximum conserved residues of bacterial LuxR that participate in QS among archaea showed their relatedness with them.

Functional Characterization of Archaeal LuxR Solos
Gene Ontology analyses carried out for three aspects namely biological process, cellular component, and molecular function.

Cellular Component
Out of total GO annotation for cellular component, scanned archaeal LuxR containing proteins assigned to only three

Evolutionary Trend of Putative LuxR in Archaea
Evolutionary history of archaea LuxR containing sequences was checked by phylogenetic analyses. ML tree for LuxR containing

(Continued)
Frontiers in Microbiology | www.frontiersin.org  All sequences grouped into two major clades one of archaeal origin and another of bacterial. Archaeal clade was further divided into three sub-clades namely halophilic, methanogenic and thermophilic with high bootstrap values corresponding to their ecological niche.
There are some instances for the presence of members of different ecological niche in another clade with the exception of halophilic sub-clade that contains only halophiles. Whereas methanogenic sub-clade harbors Haloquadratum walsbyi, Halococcus hamelinensis 100A6, Haloferax sulfurifontis that are halophiles; and thermophilic sub-clade reported to have members of halophiles, ammonia oxidizing archaea (AOA), methanogens and mesophiles like Salinarchaeum sp., Nitrososphaera gargensis, Methanoregula formicica, and Euryarchaeota archaeon SM23-78.  Locations of four species that contain more than one LuxR containing proteins are interesting in phylogenetic tree. As, multiple LuxR copies (via gene duplication event) of three species like Haloterrigena turkmenica VKM B-1734, Natronomonas moolapensis CSW8.8.11, and Halonotius sp. J07HQW1 was found at distant places within their respective group halophiles. Moreover, one copy of Haloferax sulfurifontis ATCC BAA-897isdistantly placed with Methanococcus maripaludis in the phylogenetic tree.

Distribution of LuxR in Diverse Ecological Niche
The pattern of the distribution of LuxR containing protein in Archaea was examined according to their taxonomy and ecological niche. One hundred and ten LuxR proteins are from 94 unique archaeal species and scattered in 05 different phylum. Maximum archaea belonged to Euryarchaeota followed by TACK, DPANN, environmental samples and unclassified group as shown in Figure 7. Predominant habitats are halophilic followed by thermophilic, methanogenic, and anaerobic (Figure 7). On correlating the taxonomy and niche specific, we found that most members of Euryarchaeota possessing LuxR proteins are from halophiles or extreme halophiles. Whereas TACK group archaea preferred to be in thermophilic or extreme thermophilic habitat. Moreover, clustering analyses also suggest the significant correlation between taxonomy and habitat. Out of 110 LuxR sequences, 88 remain clustered in 15 groups at p-value 1e − 30 according to their habitat and taxonomy. Among 15 clusters, eight, three, one clusters are exclusively of halophiles, thermophile, and methanogens. However, three clusters possess species from mixed habitat like halophiles, thermophiles, mesophile, e.g., Halolamina sediminis, Candidatus Nitrosopumilus salaria BD31, uncultured marine thaumarchaeote KM3_43_G12, Thermoplasmatales archaeon SG8-52-4, etc. (Supplementary Figure S4 and Table S8).

DISCUSSION
LuxR solos are diversely distributed transcriptional regulators in bacteria known to play an important role to sense and respond to environmental cues (Venturi and Ahmer, 2015). They are able to sense internal as well as external signals and helps in the adaptation of microbes despite absence of cognate LuxI (Hudaiberdiev et al., 2015). However, they are well established to involve in QS among bacteria (Patankar and Gonzalez, 2009). Although, till date, they are extensively explored in the bacteria kingdom but their role in Archaea is unexplored. Therefore, in the present study, we tried to explore their distribution in Archaea, the similarity with bacterial LuxR, functional characterization, evolutionary trend and ecological relatedness. Subramoni et al. (2015) searched InterPro database to find the putative LuxR solos proteins. These LuxR solos have ABD and DNA binding domain in bacteria. Likewise, Santos et al. (2012) explored LuxI/LuxR in Pfam database to fetch putative proteins and identified the LuxR regulators with HTH transcriptional factors that involved in QS. We have used the similar strategy to searched LuxI/LuxR in InterPro database and recognized 110 LuxR solos in Archaea that lack ABD and possess only DNA binding, HTH domain.
Our analyses revealed that LuxR solos of Archaea shared similarity with bacteria and able to perceive small molecules. Although, some domains are not exclusive to archaea but also found in bacteria like Transcription regulator (LuxR, HTH), DNA binding domain, Signal receiver, etc. (Santos et al., 2012;Subramoni et al., 2015). Although, LuxR containing archaeal proteins explored in our study contains various type of domains that indicates the relationship of archaea in signal transduction and its response to wide range of environmental modulators as reported in bacteria (Skerker et al., 2005). Moreover, LuxR based QS signaling is different in Gram-negative (single transcription factors) and Gram-positive (two-component system) bacteria (Sturme et al., 2002). Domains repertoire extracted by our study belonged to one-component and two-component system that are found in Archaea and/or bacteria. Interestingly, we extracted domain from putative LuxR solos of Archaea that are involved in two-component system, which is reported to be acquired via HGT from bacteria (Koretke et al., 2000;Ulrich et al., 2005). From scanned domains, MerR and HTH_1 are the exclusive key component of one-component system extracted from putative LuxR solos of Archaea (Ulrich et al., 2005). Whereas domains like GerE, PAS, HTH, etc. are involved in both one-component and two-component systems (Taylor and Zhulin, 1999;Galperin et al., 2001). Moreover, we also found archetypal signal input (small molecules binding) domains like PAS, GAF, CheY in putative LuxR solos of Archaea (Galperin et al., 2001;Ulrich and Zhulin, 2010).

HTH motif (RGL[TS]XEE[IV]A[ED]AL[GD][IV]SRSTV
[LS]EH) of GerE domain present at C-terminal of LuxR proteins in bacteria is reported to involve in signal sensing or QS. Moreover, this motif is also reported in HMM logo in Pfam (PF00196) and sequence logo from PROSITE (PS50043) database as LuxR_HTH motif with their implication in QS. This motif (Motif 1) is also present in putative LuxR solos of Archaea. However, the majority of the motifs are conserved according to the ecological niche. Interestingly, alignment results showed 10-25% similarity of Archaeal LuxR solos with bacteria, which are almost same as found among bacterial LuxR solos (18-25%) (Subramoni et al., 2015). Furthermore, we also found substitution among invariant amino acids of ABD that displayed the diversity of LuxR solos to sense wide range to autoinducers (AHLs or non-AHLS).
Gene Ontotology annotation studies showed that Archaeal LuxR solos involved in regulation of transcription through autophosphorylation of a histidine kinase and transfer the phosphate moiety to aspartate that further acts as a phosphor donor to response regulator proteins. However, they also possess sigma factor activity that aids them in making sequence-specific contacts with the promoter elements. Further, they are also annotated to be functional intracellularly in the cell as that of bacterial LuxR regulators (Santos et al., 2012). However, GO-based functional assignment showed that Archaeal potential LuxR solos involved in signal sensing mechanism. Furthermore, to examine their role in QS, the ligand-binding prediction was performed. Although, the analysis showed that Archaeal LuxR solos are functionally characterized by the ability to bind AHLs and non-AHLs ligands as bacterial LuxR solos (Subramoni et al., 2015). It was further supported by MSA, which displayed that among 06 conserved key residues ABD, 05 are found conserved in few Archaea LuxR proteins (Supplementary Table S4). The substitution among invariant amino acids indicates their potential to sense a wide range of signaling molecules (AHLs and/or non-AHLs). However, the presence of AHLs as signaling molecules in Archaea was already reported in SigMol database and various other studies (Paggi et al., 2003;Zhang et al., 2012;Rajput et al., 2016). Additionally, the presence of non-AHL ligands like DKPs was also established in the previous study (Tommonaro et al., 2012). However, other non-AHLs ligands like, α-pyrones, dodecanoyl-CoA, pyruvic acid, amino acids, metal ions, etc. showed their similarity with bacterial LuxR solos (Patel et al., 2013;Brameyer et al., 2014;Heermann, 2015, 2016;Venturi and Ahmer, 2015).
Our analysis revealed that bacteria are remote homologs of Archaeal LuxR protein. The phylogenetic analyses of the LuxR solos protein of Archaea and bacteria showed that they both evolved separately with less substitution per site in archaea as compared to bacteria. However, analyses further confirm the presence of few cases for the transfer of LuxR copies between bacteria and Archaea through HGT. Moreover, placement of multiple LuxR solos copies in same Archaea like Haloterrigena spp., Natronomonas spp., Halonotius spp., Haloferax spp. both distantly with different microbial strains and with each other indicates that they are acquired through HGT and gene duplication events, therefore, possessing diverse ligand binding properties like the bacterial LuxR solos copies (Subramoni et al., 2015).
Our study is based on exploring the archaea for an imperative and fundamental phenomenon known as QS. All the analyses showed that Archaea LuxR solos could bind to AHLs and non-AHLs ligands and participate in QS. However, experimental details need to confirm the ligand specificity but difficulties in culturing the Archaea led this kingdom under-explored. Therefore, we used computational approach to explore the extent and functionality of Archaea against QS cascade. Varied computational analyses like similarity, functional characterization and evolutionary history showed their involvement in QS through AHLs and/or non-AHLs ligands. Moreover, potential ecological niche of archaea was collated from literature and correlated with the outcome of our analyses for better understanding for the trend of QS being exploited via extremophiles. However, the extent of the diversification for QS in archaea is still a question that needs to be further explored. Simultaneously, the evidence reported in the literature for the occurrence of dominant microbial lifestyle, i.e., biofilms in archaea mostly via syntropic interaction with bacteria strengthen our findings that these extremophiles have capabilities perform intraspecies, interspecies, and even interkingdom crosstalks and thrive extreme environment through QS.

AUTHOR CONTRIBUTIONS
The idea was conceived by MK and also helped in interpretation, analysis, and overall supervision. Data collection and analyses was performed by AR and MK. The manuscript was written by AR and MK.