AUTHOR=Clarke Neil D., Taylor John S. TITLE=Taxonomic distribution of opsin families inferred from UniProt Reference Proteomes and a suite of opsin-specific hidden Markov models JOURNAL=Frontiers in Ecology and Evolution VOLUME=Volume 11 - 2023 YEAR=2023 URL=https://www.frontiersin.org/journals/ecology-and-evolution/articles/10.3389/fevo.2023.1190549 DOI=10.3389/fevo.2023.1190549 ISSN=2296-701X ABSTRACT=Opsins are a large and sequence-diverse family of light-responsive G protein-coupled receptors involved in vision, circadian rhythm, and other processes. Numerous subfamilies have been defined based on sequence similarity, cell-type localization, signal transduction mechanism, or biological function, but there is no consensus classification system. Here, we retrieved more than 2000 opsin sequences from UniProt Reference Proteomes using multiple hidden Markov models (HMMs) that we built from diverse subfamilies of opsins. Opsin-specific HMMs were also used in an annotation procedure that represents each sequence as a vector of HMM scores and assesses the similarity of that vector to those of annotated sequences. UniProt Reference Proteomes are built from genome sequences, allowing us to make meaningful comparisons of the number of opsins in each of the 260 species available at the time of the survey, both in absolute terms and relative to the larger superfamily of which opsins are a member. Merging opsin counts into higher-order taxa gives a broad view of the taxonomic distribution of opsins, and of opsin subfamilies, annotated according to three different schemes.
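
The annotation procedure described in the abstract (a sequence represented as a vector of HMM scores, compared against the vectors of already annotated sequences) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which similarity measure was used, so cosine similarity and nearest-neighbour label transfer are assumptions here, and the subfamily labels, the three-HMM layout, and all score values are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector carries no usable signal
    return dot / (norm_u * norm_v)


def annotate(query_scores, references):
    """Return (label, similarity) of the most similar annotated vector.

    `references` is a list of (label, score_vector) pairs. Assumption:
    the label of the nearest annotated sequence is transferred to the query.
    """
    best_label, best_sim = None, -1.0
    for label, ref_scores in references:
        sim = cosine_similarity(query_scores, ref_scores)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label, best_sim


# Hypothetical scores of annotated sequences against three made-up HMMs.
references = [
    ("subfamily_A", [310.0, 120.0, 95.0]),
    ("subfamily_B", [115.0, 305.0, 90.0]),
    ("subfamily_C", [100.0, 95.0, 290.0]),
]

# Hypothetical score vector for an unannotated sequence.
query = [298.0, 131.0, 88.0]

label, sim = annotate(query, references)
print(label, round(sim, 3))
```

Comparing whole score vectors rather than taking only the single best-scoring HMM lets the pattern of scores across all models inform the assignment, which is the idea the abstract describes; the specific metric and decision rule above are placeholders.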