Computational Infrared Spectroscopy of 958 Phosphorus-bearing Molecules

Phosphine is now well established as a biosignature, which has risen to prominence with its recent tentative detection on Venus. To follow up this discovery and related future exoplanet biosignature detections, it is important to spectroscopically detect the presence of phosphorus-bearing atmospheric molecules that could be involved in the chemical networks producing, destroying or reacting with phosphine. We start by enumerating phosphorus-bearing molecules (P-molecules) that could potentially be detected spectroscopically in planetary atmospheres and collecting all available spectral data. Gaseous P-molecules are rare, with speciation information scarce. Very few molecules have high accuracy spectral data from experiment or theory; instead, the best available data is from the RASCALL approach and obtained using functional group theory. Here, we present a high-throughput approach utilising established computational quantum chemistry methods (CQC) to produce a database of approximate infrared spectra for 958 P-molecules. These data are of interest for astronomy and astrochemistry (importantly identifying potential ambiguities in molecular assignments), improving RASCALL's underlying data, big data spectral analysis and future machine learning applications. However, this data will probably not be sufficiently accurate for secure experimental detections of specific molecules within complex gaseous mixtures in laboratory or astronomy settings.


INTRODUCTION
Phosphine (PH 3 ) is currently a strong biosignature candidate as there are few, if any, non-biological formation pathways of phosphine for terrestrial planets . A tentative discovery of phosphine in the cloud decks of Venus was recently reported, with predicted abundances on the order of ppb (Greaves et al., 2020) 1 that cannot be explained by non-biological sources . To investigate the presence and formation mechanisms of phosphine on Venus, and to interpret future observations of planetary atmospheres, we must improve our understanding of the chemical networks that may include phosphine. A crucial tool in this process is the ability to detect phosphorus-bearing molecules (P-molecules) that can provide clues to the formation pathways of phosphine, and provide insight into the mechanisms of a possible phosphine-producing biosphere. Gaseous P-molecules can be remotely detected using spectroscopy, but currently very limited spectral data is available for these molecules.
A more in-depth understanding of planetary environments through the interpretation of both archival and future observational data, will require spectral data on all relevant atmospheric molecules. To follow-up potential phosphine detections in Venus and exoplanets will similarly require in-depth analyses of the wider context of these atmospheres, which in turn relies on our ability to detect the P-molecules that participate in the chemical networks where phosphine is present. Thus, discussions and explorations in this paper pioneer key processes and considerations by which an initial biosignature detection can be followed up, and as a by-product identify a wide variety of opportunities and challenges in the field of spectral detection of unknown chemistry (whether geochemical, photochemical or biochemical) that will be crucial for upcoming explorations of exoplanetary atmospheres. P-molecules are particularly interesting in astrobiology due to the phosphorus's ability to create complex organic molecules with unique functionality. Phosphorus plays a universally vital role in cellular metabolism (ATP), storage of genetic information (RNA/DNA), formation of cell membranes (Phosolipids) and in cell regulation (phosphate buffer). Phosphorus, in the form of phosphates (PO 4 3 -), plays an essential role in carbon chemistry as it: 1) maintains constant negative charge in biochemical conditions; 2) most phosphates (especially polyphosphates) are thermodynamically unstable and have multiple energetic intermediates that can enable polyphosphates like ATP to act as rechargeable batteries in nearly all cellular metabolism; and 3) it works as an efficient pH buffer with free phosphate in cellular plasma regulating acidity (Pasek, 2014). Phosphorus is present in all life on Earth (S. et al., 2016) and is expected to be central to life elsewhere. Therefore, understanding the abundance and chemical form of phosphorus on other planets will play an important role in the search for life beyond Earth (Elser, 2003;Hinkel et al., 2020).

Infrared Spectroscopy of Phosphorus-bearing Molecules
• a targeted approach, developed in sub-section 2.1, that iteratively builds up a list of molecules based on known or proposed chemistry in planetary atmospheres including Jupiter, Earth and Venus; • a reaction-agnostic approach, developed in sub-section 2.2, that simply enumerates all molecules that fulfil certain criteria.

Targeted Approach
Our goal in this section is to identify target P-molecules that may be detectable in planetary atmospheres, including species that are predicted to be important for understanding the phosphorus chemistry on Venus. Table 1 details the small number of atmospheric P-molecules that have been explicitly considered.
Let us first clarify important terminology and phosphorus chemistry concepts. PO 4 3is the most oxidised form of phosphorus, with a phosphorus oxidation state of +5, and is generally present in the atmosphere as phosphoric acid, H 3 PO 4 . Other forms of phosphorus are considered to be reduced phosphorus, with PH 3 being the most reduced form of phosphorus (oxidation state of −3), but not thermodynamically favoured at temperatures below 800 K with low hydrogen-pressure (Visscher et al., 2006) or in oxidising environments like modern Earth where it reacts rapidly with OH • and O • radicals. Therefore, the dominant forms of phosphorous on a planet will depend on the planet's (bio)geochemical cycles, as well as whether the atmosphere contains reduced (e.g. H 2 , CO) or oxidised gases (e.g. O 2 , H 2 O, CO 2 ).
Phosphorus compounds are generally categorised as organic (containing carbon) or inorganic (do not contain carbon). Only some of the P-molecules in the atmosphere are volatile, for example large quantities of inorganic phosphorous are dispersed into the atmosphere as coarse solid particles (aerosols) from dusts or combustion sources (Mahowald et al., 2008). Inorganic (poly)phosphates and several organic atmospheric phosphorous compounds are soluble in water and thus bioavailable. Additionally, plant activity can emit complex biogenic P-molecules that aggregate as coarse aerosols, which are insoluble and only transport and deposit organic phosphorous locally, rather than globally (Tipping et al., 2014). 2.1.1 P-molecules in hydrogen-rich reducing gas giants, e.g. Jupiter, Saturn In the reducing environments of Jupiter and Saturn, the most abundant P-molecule is phosphine (PH 3 ). Though phosphine is not the most thermodynamically favourable form of phosphorus at temperatures of the atmosphere, phosphine formed in the hot deep layers is brought to the top of the atmosphere through convection. In modelling this phenomena and seeking to understand the phosphorus chemistry of gas giants, Barshay and Lewis (1978); Fegley and Lodders (1994); Borunov et al. (1995) considered the abundances of PH 3 , PH 2 , PH, P 4 O 6 , P 4 O 7 , P 4 O 8 , P 4 O 9 , P 4 O 10 , PS, P 2 , P, PO, PO 2 , PF, PC, PCl, PN, P 4 and P 3 ; with many of these compounds having very low modelled abundances. P 4 O 6 and P 4 O 10 are particularly notable as they arise often in the literature considering P-molecules and gas-phase chemistry due to their stability, despite their large molecular weight. P 4 O 6 has a boiling point of 173.1 • C, while P 4 O 10 sublimes at 360 • C. These properties implies the vapour pressure and thus gaseous abundance of both compounds may be appreciable, especially in higher temperature environments. It has also been hypothesised that alkyl phosphines, i.e. PR 1 R 2 R 3 , may be formed in hydrogen-rich environment from the photolysis of PH 3 in the presence of hydrocarbons (Guillemin et al., 1995(Guillemin et al., , 1997.

P-molecules expected on Earth
The speciation of phosphorus in the Earth's atmosphere is quite different than gas giants. Earth's atmosphere is an oxidising environment and therefore the reduced species PH 3 is associated solely with biological and industrial activity . Instead, phosphates (PO 4 3 -) are most common, with H 3 PO 4 assumed as the dominant species and the only P-molecule with gas-phase kinetic data . In the context of this paper, the most notable thing about phosphorus on Earth is the almost complete absence of gas phase P-molecules. Most descriptions of Earth's phosphorus cycle (e.g. Schlesinger and Bernhardt (2020)) completely ignore any atmospheric involvement of P-molecules, and focus instead on the much more numerous and biologically critical processes by which phosphorus moves through the lithosphere, hydrosphere and biosphere.
The atmospheric impact of P-molecules can usually be neglected because most P-molecules either have low volatility (such as P 4 O 10 ) and quickly "rain out" into the hydrosphere, or are highly reactive and are destroyed in Earth's oxidising atmosphere. Consequently, few P-molecules are the subject of study in the Earth's atmosphere, and no P-molecules are included explicitly in the two most chemically comprehensive Earth atmospheric models, the Master Chemical Mechanism (Rickard and Young, 2005) and GEOS-chem (The International GEOS-Chem User Community, 2019). notation are provided for molecules with six or fewer non-hydrogen atoms. The abbreviations are as follows: R means in list of AllMol dataset with RASCALL data available (excluded molecules are generally intermediates, radicals or contain more than 6 non-hydrogen atoms), LL means high resolution line list available, ExpDB means present in experimental database other than NIST while ExpLit means we have identified spectra in the experimental literature (a non-comprehensive search), with s, l, aq, g indicating molecular state of spectra, with matrix indicating argon matrix spectra (similar to gas phase).
For the phosphorus that does cycle into the Earth's atmosphere, the conditions are so oxidising that atmospheric budgets have total atmospheric phosphorus (primarily from dust or biogenic aerosols) as generally being oxidised to PO 4 3and deposited into the Earth's oceans (Mahowald et al., 2008). Atmospheric deposition and biological fixation of phosphorus is typically considered negligible in comparison to total phosphorus, and subsequently ignored in terrestrial phosphorus budgets (Zhang et al., 2019).

Infrared Spectroscopy of Phosphorus-bearing Molecules
Experimentally, the speciation of atmospheric phosphorus on Earth is still poorly understood; typical analytical techniques destroy speciation information (Morton et al., 2003), such as acidification of samples to pH 1 in spectrophotometry (Mahowald et al., 2008). Contemporary techniques are able to distinguish between soluble/insoluble phosphorous and inorganic/organic phosphorous (Violaki et al., 2018), but an exact chemical inventory of these species has not been made. Recently, there has been recognition of plant emissions such as phosphate esters, i.e. P(OR 1 )(OR 2 )(OR 3 ), in contributing to atmospheric volatile organic phosphorus, not just to coarse biogenic aerosols ; but the overall impact of atmospheric organic phosphorous is not widely recognised yet.
A perhaps surprising source of information on potential gaseous P-molecules in atmospheres comes from the origin of life literature. Phosphorus is considered an essential component of life, yet dominant phosphorus sources (notably apatite) are only slightly soluble, raising the question of how phosphorus was introduced into the hydrosphere in sufficient quantities to enable life to emerge on Earth (e.g. Yamagata et al. (1991);Schwartz (2006)). Studies into the solution to this phosphorus problem -usually volcanoes, lightning, and meteorites -lead to consideration of some gas-phase P-molecules. For example, Yamagata et al. (1991) discussed the volatilisation of P 4 O 10 from high temperature apatite in volcanoes; P 4 O 10 can then be hydrolysed to form phosphates such as H 3 PO 4 . Schwartz (2006) considers the production of phosphite PO 3 3and phosphorus acid H 3 PO 3 by lightning in volcanoes. Ritson et al. (2020) also proposed that water can react with meteorite mineral to produce organophosphates by reacting with the phosphide species (containing P 3 -) from meteorite mineral enstatite chondrites to produce P-molecules with various oxidation states that are then fully oxidised through photochemical reactions to the bioavailable PO 4 3form. Ablation of cosmic particles can produce phosphorus gases such as PO 2 that then dissociates to PO (Carrillo-Sánchez et al., 2020).

P-molecules expected on Venus
In the observable upper and middle atmosphere, Venus is an oxidising environment due to the high concentration of sulfuric acid -a strong oxidising agent -and the high production rate of oxidising radicals through photolysis (Bierson and Zhang, 2020). Therefore, H 3 PO 4 is predicted to be the most dominant P-containing species in the upper atmosphere , with some phosphate in the form of dehydration products, e.g H 4 P 2 O 7 , H 5 P 3 O 10 . H 3 PO 3 concentration is predicted to be negligible, at tens of milligrams across the whole atmosphere . In the lower Venusian atmosphere, P 4 O 6 is thermodynamically favoured and dominates the chemistry of P-molecules at this altitude (Krasnopolsky, 1989), while P 4 O 10 is disfavoured.
Overall, similar geological processes are likely to occur in Venus as on Earth as the bulk composition for both planets is expected to be similar (Treiman, 2009;Shellnutt, 2013). Some differences are expected due to the differing atmospheric composition (far less O 2 , more CO 2 and more sulfuric acid) (Johnson and de Oliveira, 2019), lack of plate tectonics on Venus (Nimmo and McKenzie, 1998), the higher ground temperature (Taylor et al., 2018) and lack of water oceans (Taylor et al., 2018) on Venus. The effect of these differences on the atmospheric speciation of P-molecules is unexplored.

P-molecules from life on Earth
Life can produce a much richer range of molecular species than geological processes and has the potential to influence the sources and sinks of molecules to drastically impact the atmospheric composition, e.g. enabling 21% oxygen on Earth. P-molecule species produced by life (Seager et al., 2016) include CH 5 O 3 P, C 2 H 8 NO 2 P, two structural isomers of CH 5 O 4 P, H 3 PO 4 and, of most recent interest, phosphine (PH 3 ).
As reviewed by Sousa-Silva et al. (2020), on Earth, atmospheric PH 3 is associated exclusively with life, either through anthropogenic sources (e.g., agriculture), or through its production in anaerobic ecosystems (e.g., lake sediments, marshlands), but has very low abundance (ppt/ppq locally at sites of anaerobic activity). The largest sink for PH 3 on the Earth is destruction by OH • (Glindemann et al., 2005), causing a very short PH 3 lifetime measured in hours .
Despite phosphine being present in a range of environments -almost exclusively anoxic -the exact basis and mechanism for phosphine formation in nature is not well understood. Early work has reported the production of phosphine from mixed bacterial cultures (mixed acid and butyric acid bacteria) in the laboratory (Jenkins et al., 2000). Pasek and colleagues  proposed that phosphite Zapata Trujilo et al.

Infrared Spectroscopy of Phosphorus-bearing Molecules
[H 2 PO 3 ] -2 and hypophosphite [H 2 PO 2 ]are first produced through microbial metabolism, and these compounds are then converted to phosphine by other mechanisms. Bains et al. (2019) suggest that, in some environments, it is a combination of phosphate-reducing bacteria and the coupling with phosphite metabolism that results in phosphine release. Several very recent studies are beginning to provide more informed insights into the potential roles of specific microorganisms and pathways in phosphine production. For example, Fan et al. (2020a) indicated that the production of acetic acid via the tricarboxylic acid cycle promoted the production of phosphine. Most recently it was found that the phosphine production was enhanced when the hydrogen levels were increased (Fan et al., 2020b). The authors suggested that phosphine production was promoted with hydrogen as an electron donor (i.e. H 2 PO 4 -+ H + + 4 H 2 → PH 3 + 4H 2 O), and it was concluded that both reducing power and excess electrons are necessary prerequisites for the production of phosphine. The activity of the enzyme dehydrogenase was shown to be positively correlated with phosphine production (Fan et al., 2020b), suggesting that this enzyme's function in producing electrons and reducing agents contributes to phosphine generation. Furthermore, co-factors such as NADH and riboflavin vitamins were suggested to be key in phosphine production (Bains et al., 2019;Fan et al., 2020b). Given the limited studies and the debate surrounding the exact pathways and the diverse microorganisms potentially involved in phosphine production, significantly more work is needed in this area on the biological basis for phosphine production.

P-molecules potentially involved in phosphine production on Venus
Recently, phosphine has risen to prominence due to its potential as a strong biosignature (with few non-biological sources on temperate planets) and tentative detection on Venus. We refer to the very detailed previous publications for known and proposed geochemical and photochemical networks of phosphine production in exoplanets (Sousa-Silva et al., 2020) as well as an indepth consideration on Venus .
For this paper, we are interested in enumerating the P-molecules identified in these papers as part of the reaction network involving phosphine. The photochemical network proposed by Bains et al. (2020) for PH 3 formation involves many other P-molecules in a radical reaction network: H 4 PO 4 , H 2 PO 3 , HPO 3 , HPO 2 , HPO, PO 2 , PO, PH and PH 2 , several of which will be transient intermediates.

Discussion of targeted approach
The targeted approach followed in this section helped identify molecules of particular interest that are not obvious to non-specialists, such as P 4 O 10 , as well as identifying challenges to remote detection such as the relative low volatility of many P-molecules. This approach has the ability to synthesise relevant interdisciplinary knowledge across sub-fields and enhance the common understanding of the scope, context and limitations of existing disciplinary expertise.
Overall, there is a paucity of modelling work on the atmospheric speciation, kinetics and reaction networks of P-molecules. Though we can identify some species likely to be important, this poor understanding means that it is desirable to consider a much broader range of potential volatile P-molecules, as described in the next section, in order to detect the P-molecules present in a given atmosphere and facilitate the elucidation of atmospheric phosphorus reaction networks. This broad perspective will be particularly critical for characterising diverse exoplanet atmospheres.

Reaction-agnostic Approach
The search for extra-terrestrial life currently relies on the detection of biosignature gases, i.e. gases produced by life that accumulate in the atmosphere and could be detected remotely (Schwieterman et al., 2018a). A molecule's suitability as a biosignature is impacted by multiple factors (e.g. the host planet's atmospheric chemistry and geology) and so any selection criteria are necessarily broad. There is an unavoidable Earth-centric bias in the search for life in the universe; nonetheless, it is laudable to attempt to approach the search for biosignatures on exoplanets as agnostically as possible. With that in mind, Seager et al. (2016) proposed a list of more than 16,000 molecules (hereafter AllMol list) that may be associated with life and are likely to be volatile in the atmosphere of potentially habitable exoplanets.
The AllMol list contains molecules with up to six non-hydrogen atoms that are expected to be volatile and stable at Earth's standard temperature and pressure (STP). Volatile molecules were estimated as those with boiling points below 150 • C, as most molecules with boiling points above this temperature are likely to be nonvolatile. Stability was interpreted as molecules being able to remain as pure entities under STP Zapata Trujilo et al.

Infrared Spectroscopy of Phosphorus-bearing Molecules
conditions and not reacting readily with water. The cutoff of molecules with up to six non-hydrogen atoms was chosen as it implies volatility for a substantial fraction of molecules, including several molecules that are currently studied as biosignature gases.
The AllMol list contains 2,130 P-molecules made of the elements C, N, O, F, S, Cl, Se, Br and I. However, for our quantum chemistry studies, we have excluded molecules containing the rare elements Se, Br and I, as they posed additional computational difficulties that were not worth addressing given the rarity of these elements. Specifically, an initial guess geometry could not be generated for the Se-containing molecules, while Br-and I-containing molecules had much larger computational cost due to the large number of electrons. For completeness, we have included the water-reactive PCl 3 and PF 3 molecules in our calculations due to their relevance in organophosphorus chemistry, thus leading to a working list containing containing 962 P-molecules. Figure 1. Distribution of AllMol list's 2,130 P-molecules as a function of total number atoms and nonhydrogen atoms are shown to the left and right top plots, respectively. The total distribution is displayed in grey, with those in blue corresponding to molecules that do not contain Br, Se, and I; our working list of 962 molecules. Figure 1 presents the distribution of the molecules present in the full 2,130 P-molecules list and our working list with 962 molecules, considering the total number of atoms (left), the total number of nonhydrogen atoms (right). For both lists, most molecules have between 9 to 14 total atoms and five to six non-hydrogen atoms.
For further insight into our working list, Figure 2 shows the count of elemental composition and various combinations of elements, and the number of atoms in each combination of elements. Carbon and Hydrogen are the most abundant elements in P-molecules overall, but there are actually more C a H b P c Cl d than C a H b P c molecules. Almost all our working molecule set contained carbon, i.e. were organic; only 20 molecules were inorganic.
Another useful categorisation is to consider the bonds to the phosphorus atom within our working set of 962 P-molecules; broadly speaking, this determines the class of the compound. These statistics are detailed in Table 2. The dominant class were organophosphines (602/962) in which the phosphorus atom was bonded only to carbon or hydrogen. Phosphorus-oxygen single or double bonds were the other key phosphorus bond types, with 203 molecules with P=O only, 61 with P-O only and 73 with both P=O and P-O bonds. A small number of compounds have P-S, P=S and/or P-N bonds. Only 42 compounds had no P-C bonds at all.
Despite the large number of molecules considered, the constraints imposed when constructing AllMol exclude many molecules considered in our targeted approach, especially large phosphate oxides such as P 4 O 10 and radicals such as the diatomics PN, PC, PO and PH. Therefore, there is surprisingly little overlap between our targeted molecule set and our working list of 962 P-molecules. This small overlap probably occurs due to the sparsity of gas-phase P-molecules in Earth-conditions and would not be expected for many other elements. Figure 2. Distribution of elements within our subset of 962 P-molecules. The horizontal histogram on the left details the number of molecules that contain the corresponding element to the right. The dots create sets of elements that form various molecules, with the counts of those sets shown in the histogram above. For example the first row contains 197 molecules that are made up of only Cl, C, H, and P. The plot above the histogram details the distribution of molecules' size, given by total number of atoms in each set of elements.

INFRARED SPECTROSCOPY
Infrared (IR) spectroscopy is currently the technique of choice for future searches of extrasolar biosignatures in planetary atmospheres (Schwieterman et al., 2018b). The successful identification of molecules requires available reference spectroscopic data. Therefore, in this section we consider pre-existing experimentallyderived data, RASCALL-generated data based on functional group decomposition and the newly generated CQC spectral data based on computational quantum chemistry (CQC) calculations for the P-molecules considered.

Existing Experimentally-derived Data
The infrared spectral data available for P-molecules are relatively sparse, especially in the gas-phase. There are three main sources of data: (1) line lists of spectral positions and intensities, (2) experimental databases usually containing only un-digitised image spectra (often in liquid phase), and (3) individual papers. Of these, only line lists provide astronomers with accessible data in a format suitable for molecule detection.
Extensive line lists containing spectral line positions and intensities in the infrared spectral regions are available for 11 P-molecules. These data are generated individually for each molecule by combining the best available experimental data and ab intio CQC calculations, with the latter particularly necessary for dipole moments. There are two broad methodological approaches: the variational approach solving the nuclear motion Schrödinger equation on an explicit potential energy surface, and the empirical approach where model Hamiltonian constants are used. Specifically, line list data is available in the centralised ExoMol database (Tennyson et al., 2016bWang et al., 2020) in a standardised format for the Other Potentially Overlapping Categories 269 No P-H bonds 42 No P-C bonds 26 Contains P=S and/or P-S bond 21 Contains P-N bond 7 Contains Aromatic P-bearing ring Table 2. Categorisation of P-molecules within our 962 working set according to the phosphorus bonds in the molecule. Only categories with more than 5 molecules have been included.
Experimental infrared spectral absorption cross-sections have been collated mainly by national institutes such as the USA National Institute for Standards and Technology (NIST, Chu et al. 2020) and the Japanese National Institute of Advanced Industrial Science and Technology (AIST 3 ). NIST's extensive database of spectral data contains cross-sections with a wide range of accuracy, resolution, and instrumental setups. For hundreds of molecules, NIST is the only source of spectral data, and as such mistakes can go unnoticed for many years (e.g., liquid state spectra mislabelled as gas phase, incorrectly assigned spectra or vibrational modes (Sousa-Silva et al., 2019)). AIST's Spectral Database for Organic Compounds has a useful feature to sort by molecular weight as a proxy for volatility as well as a helpful ability to search by spectral features in a given spectral range. For the P-molecules considered here we have identified two (C 2 H 7 O 3 P (O=P(OC)OC), CH 5 O 3 P (O=P(O)(C)O)) 4 matches in the NIST database, and three ( Often when infrared data cannot be found in databases or linelists, they may still be found in individual papers. However, in the absence of a centralised database, identifying and processing data for individual Zapata Trujilo et al.

Infrared Spectroscopy of Phosphorus-bearing Molecules
molecules is time-consuming due to the diverse literature and poor data digitisation. For some target Pmolecules, we performed a non-exhaustive literature search for experimental data. Usually, the spectral data was contained solely in figures, and digitisation software would be needed. Papers containing experimental infrared data from the twentieth century, when a large body of these papers were written, often suffer from problems of saturation (concentration too high for linear absorption), low resolution, and insufficient detail to obtain absorption cross-sections. These problems can result in intensities that are inconsistent with lower pressure measurements, the fine structure being obscured, and difficulty assessing the abundance of the molecule being identified. In the case of more transient molecules generated by pyrolysis, photolysis or in situ reaction, the target species may be produced in a mixture of gases, its' partial pressure unknown and the spectra affected by bands from other species present. Thus, experimental data can be used for visual identification of molecules but is generally unsuitable for use by astronomers.
Data useful for astronomical purposes can be obtained by measurements of the gaseous, low-pressure, infrared spectra of molecules at the very low temperatures achievable in a jet expansion (representing the interstellar medium) and room temperature (representing temperate potentially habitable planets). This data should be produced at high resolution in the full mid-infrared range and distributed in digitised format either as a spectra or as individual line identifications. Typically, we would expect the difficulty of the experiment to depend primarily on the ease of acquiring a pure gaseous sample of a molecule. Stable molecules with a high vapour pressure that can be bought from commercial chemical companies would usually be measurable in low-resolution in a day though high resolution scans would take a few weeks, especially if data is taken for multiple temperatures. Unstable molecules or those not commercial available would require more extensive synthesis.

Predicted Spectra from Functional Group Decomposition: RASCALL
RASCALL was the first approach to address the large deficiencies in infrared spectral reference data for atmospheric molecules by mass data production using computational approaches. The RASCALL program produces spectral data stored in the RASCALL database; a living document constantly updated . Currently, the database contains spectral data for 15,477 out of the 16,368 AllMol molecules, with a total of 201,985 fundamental frequencies. The RASCALL database contains spectral data for 1,992 P-molecules with 44 different functional groups, only four of which specifically consider P atoms (namely P -H bend and stretch, P --O stretch, and (O --)PO -H stretch). Figure 3 illustrates the functional groups frequency distribution across the RASCALL data, highlighting those frequencies corresponding to P-molecules. The P-H stretch and bends are the most ubiquitious P-molecule-specific functional group, with significant numbers of P=O stretches. Of particular interest is the sparsity surrounding the P-H stretch functional group at 2360 cm −1 . This region could represent an interesting signal to look at when searching for molecules with the P-H functional group. The main contaminant will be CO 2 which has a strong absorption peak at 2,350 cm −1 that, in high abundances and at low resolution, can obfuscate the spectral feature from the P-H stretch. However, this CO 2 spectral band is usually much more narrow than that caused by P-H stretch, allowing for multiple strong transitions in the wings of the P-H band to be detectable (e.g., as is the case between PH 3 and CO 2 ), especially when the P-H stretch frequency is shifted slightly in different environments. The figure also shows that the majority of functional groups in the data set are shared by molecules with and without phosphorus, with the most prominent one corresponding to the C-H stretch near 3,000 cm −1 .
RASCALL is a computational method that does not utilise quantum chemistry but relies on structural chemistry, especially on functional group theory, to efficiently produce approximate molecular spectral data for arbitrary molecules (Sousa-Silva et al., 2019). As functional groups account for characteristic spectral features, RASCALL estimates the contribution of each functional group present in a given molecule to generate a first approximation to the molecule's vibrational spectrum. The spectrum given by RASCALL is composed of the approximate vibrational frequencies of the molecule's most common functional groups together with the qualitative intensity for each frequency. The functional group database contains more than 100 functional groups and is also a living document, updated as new spectrally active functional groups are identified.
RASCALL is an extremely quick and powerful approach, but the functional group approach has some inherent limitations. Most notably, the approximate spectra predicted by RASCALL are based on identified functional groups without taking into account their neighbouring atoms and bonds. This functional group approach makes it nearly impossible to predict the non-localised vibrational modes in the fingerprint Zapata Trujilo et al. Figure 3. Frequency distribution for all functional groups within RASCALL. The blue bars correspond to the functional groups present in all 1992 P-molecules in RASCALL that do not involve P and the orange bars are functional groups containing P. The green bars correspond to functional groups that are not included in any of the P-molecules present in RASCALL. Data bins of 20 cm −1 has been selected for visibility. spectral region from 500 -1450 cm −1 , and restricts the accuracy of local mode predictions in diverse environments.

Infrared Spectroscopy of Phosphorus-bearing Molecules
Ongoing RASCALL updates will expand functional group definitions to help address this weakness. For example, consider the O-H stretch region near 3,600 cm −1 in Figure 3, where there are few spectral features in our data. As the spectral behaviour of the O-H stretch strongly depend upon the remaining atoms and bonds in the molecule, and consequently vary widely between molecules, RASCALL does not consider it a single functional group. Instead, RASCALL uses a categorisation criteria based on different O-H sub-groups that must be considered individually to provide more realistic O-H stretch frequencies.
Currently, the RASCALL database has categorised only a small portion of the O-H variants and those affecting P-molecules have not been included yet.
RASCALL currently only provides qualitative intensities ranging from 1 to 3, representing weak to strong absorption, respectively; this is an area of active method development.

Method
An alternative to the RASCALL approach is to use standard computational quantum chemistry (CQC) approaches to directly solve the Schrodinger equation (within a given approximation) and predict vibrational frequencies and intensities of input molecules. Our goal here is to develop the first version of the harmonic CQC-H1 procedure: a high-throughput, largely automated, reliable approach that can be used for hundreds to thousands of molecules by taking as input the molecule's Simplified Molecular Input Line Entry System (SMILES) notation to produce computationally-derived infrared spectra.

Infrared Spectroscopy of Phosphorus-bearing Molecules
Harmonic frequency and intensity calculations for our CQC-H1 approach were performed under the standard double-harmonic approximation utilising the ωB97X-D hybrid functional (Chai and Head-Gordon, 2008;Alipour and Fallahzadeh, 2016) together with the augmented def2-SVPD basis set (Weigend and Ahlrichs, 2005;Rappoport and Furche, 2010). This model chemistry combination (i.e. hybrid functional/double-zeta basis set augmented with diffuse functions) was chosen as it reproduces reliable dipole moments (Zapata and McKemmish, 2020), a key component for vibrational intensities, and ωB97X-D represents a good general purpose hybrid density functional (Goerigk and Mehta, 2019). Harmonic frequencies were also scaled using a multiplicative scaling factor of 0.9542 (Kesharwani et al., 2015a). All calculations were performed with the Gaussian 16 quantum chemistry package (Frisch et al., 2016).
The initial geometries for all 962 P-molecules were optimised using a tight convergence criteria (maximum force and maximum displacement smaller than 1.5x10 −5 Hartree/Bohr and 6.0x10 −5Å , respectively) and an ultrafine integration grid (99 radial shells and 590 angular points per shell). For approximately 50 molecules, the jobs did not converge to a minima with the automated approach and needed manual intervention, most commonly recomputing an input geometry in Avogadro (Hanwell et al., 2012). Four molecules were excluded from our analysis due to geometry convergence problems (see sub-section 3.3.4), leading to a total of 958 P-molecules considered in our CQC-H1 approach. Figure 4 presents the frequency distribution of the scaled CQC-H1 harmonic frequencies compared to the RASCALL data, considering the 868 molecules with data available from both sources; incorporating CQC-H1 data from all 958 molecules (total of 28,152 frequencies) produces a similar frequency distribution. The fundamentals bands predicted by the harmonic calculations are predominantly found in the region below 500 cm −1 , within 600 -1,400 cm −1 (the fingerprint spectral region) and in the 2,900 -3,100 cm −1 domain characterised mostly by the C-H stretches. Like the RASCALL data, our calculated harmonic data also exhibits a small number of signals between 2,000 and 2,700 cm −1 , apart from the signals around 2,360 cm −1 , corresponding to the P-H stretch signal.   Figure 4 highlights how the CQC-H1 approach, unlike RASCALL, can differentiate the frequencies at which functional groups absorb based on the specific chemical environment surrounding that functional group. For example, RASCALL places all C-H stretches at a particular frequency value (prominent blue bar at 2,923 cm −1 ), whereas the scaled harmonic calculations for this functional group result in frequencies that are spread over a larger frequency window. The figure also demonstrates the capability of the quantum chemistry calculations to provide data in the fingerprint region of the spectrum (500 -1450 cm −1 ), as all normal modes are computed (by comparison, RASCALL only predicts around 45% of the normal modes). Calculation of normal mode frequencies in the fingerprint region poses a fundamental and probably insoluble challenge to the RASCALL approach, as these fingerprint modes involve motion of large portions of the molecule rather than the movement of isolated functional groups.  To further illuminate the differences in spectral predictions, Figures 6 compares the RASCALL and CQC-H1 vibrational spectra for a selection of P-molecules relevant to planetary bodies, and Figure 7 does the same with P-molecules formed through biotic processes (see Table 1). The SMILES code for each molecule as well as the maximum intensities for the harmonic data is shown in the figure and indicates the vertical scaling of the data.

Infrared Spectroscopy of Phosphorus-bearing Molecules
Across the molecules presented in these two figures, different degrees of agreement can be observed between RASCALL and the CQC-H1 data. Overall, for most molecules there is clear semi-quantiative agreement in the location of peaks across both sources of data, while RASCALL often overestimates the intensity of weak bands, especially around 3,000 cm −1 . As an example, the RASCALL spectrum for methylphosphonic acid (O=P(O)(C)O top right of Figure 7) shows a high intensity peak for the C-H stretch with nothing shown in the harmonic CQC-H1 data as the band intensity is significantly low, highlighting the limitations in RASCALL's intensity approximations. Regarding the band positions, several of the subplots in both figures show a shift of more the 20 cm −1 in the RASCALL data corresponding to the P-H stretch (around 2,360 cm −1 ). This likely arises from inadequacies in the P-H frequency data in RASCALL and could be easily corrected with an update using our new P-molecule CQC-H1 data. These figures also provide further evidence of the current deficiencies in the treatment of O-H stretches in RASCALL. Figure 6. Comparison of the RASCALL (blue) and scaled harmonic (grey) quantum chemistry data available for eight of the P-molecules mentioned in Table 1. The SMILES code is presented for each molecule as well as the largest values for the predicted intensities (km/mol) with the harmonic calculations. Figure 8 reports the vibrational spectra for two P-molecules for which both theoretical and experimental data from NIST is available. The top figure shows an overall fair agreement between the different sources of data, especially in the C-H stretch region where both the NIST and scaled harmonic CQC-H1 data are very alike. In the fingerprint domain (500 -1450 cm −1 ), the agreement is somewhat less obvious, but still a qualitative similarity is found between the NIST and CQC-H1 data. RASCALL, as previously stated, performs less accurately in this area. On the other hand, there is poorer agreement in the bottom figure as the data collected from NIST corresponds to the solid-phase spectrum for methylphosphonic acid Zapata Trujilo et al. Figure 7. Comparison of the RASCALL (blue) and scaled harmonic (grey) quantum chemistry data for six P-molecules produced by life (according to the AllMol list). The SMILES code is presented for each molecule as well as the largest values for the predicted intensities (km/mol) with the harmonic calculations.

Infrared Spectroscopy of Phosphorus-bearing Molecules
(O=P(O)(C)O) as gas-phase spectrum is not available (or at least not easily accessible). We can see that RASCALL only provides data for the C-H stretches, disregarding the O-H stretches present in the molecule. Though the scaled harmonic calculations do supply frequencies for both the C-H and O-H stretches, the calculated intensities for the C-H stretches are significantly lower and they are therefore overshadowed by the other frequencies. In the fingerprint region, the agreement between the experimental and scaled harmonic data is somewhat better, with the scaled harmonic frequencies being slightly off and missing some bands. This discrepancy could be due to anomalous frequencies or hydrogen bonding manifesting in the solid-phase spectrum. A gas-phase spectrum would be preferred for a more meaningful comparison.
Something particularly important to highlight from this analysis is that among all the 958 P-molecules considered in our study, we could only find easily accessible reference spectroscopic data for the two molecules illustrated in Figure 8, justifying the necessity of supplemental sources of reference data.

Consideration of Anharmonic Vibrational Treatment
Further improvements to our harmonic CQC-H1 approach may be achievable by performing the calculations with the more expensive and complete Generalised Vibrational Second-order Perturbation Theory (GVPT2) approximation (Barone, 2005;Puzzarini et al., 2019a); an anharmonic method that allows the calculation of overtones and combination bands along with the fundamental frequencies. We tested Zapata Trujilo et al. Figure 8. A comparison of the RASCALL, quantum chemistry and experimental data for two P-molecules for which experimental data is available. The figure also presents the SMILES code for each molecule.

Infrared Spectroscopy of Phosphorus-bearing Molecules
this approach (hereafter named as CQC-A1) by calculating anharmonic frequencies and intensities for 250 smaller molecules in our dataset, finding significant issues.
Specifically, substantial deviations were found between the scaled harmonic (CQC-H1) and anharmonic (CQC-A1) fundamental frequencies across the molecules tested. These deviations were predominately observed in two cases: (1) low-frequency transitions with small force constants, where differences of up to a factor of 50 between the scaled harmonic and anharmonic frequencies were found; and (2) the P-H stretch frequencies, where the anharmonic frequencies are 200 -700 cm −1 or higher over the scaled harmonic ones. The first issue is well-known for large amplitude vibrations (LAMs) and is an inherent limitation of perturbative approaches, while the second issue is far more concerning and can be traced back to deficiencies in the density functional to compute higher-order derivatives of the potential energy, as recently noted by Barone et al. (2020).
The theoretical foundations, technical specifications and analysis of our anharmonic results are provided in Appendix A, including a discussion of these two key failures and their likely causes.
For the purposes of the main article's objectives, we can conclude that (1) anharmonic GVPT2 calculations are not yet suitable for automated high-throughput calculations due to the prevalence of unexpected anomalous unreliable results and (2) to establish the best anharmonic treatment will require careful testing against experimental frequencies and some criteria that identifies when calculations are unreliable.

Challenges, Limitations and Future Directions
The calculation of vibrational spectra for all P-molecules has various challenges and limitations that are worth discussing.
Our automated approach for obtaining molecular geometries is generally successful, but some issues and limitations were found with its performance. First, as the current version of the libraries used in our automated script for initial geometries does not support the optimisation of molecules containing Se, these molecules were excluded from our final data set. Second, the generated geometries often optimised to saddle-points (indicated by one or more imaginary frequencies) and needed to be manually corrected. As a matter of fact, one of the C 4 H 7 P isomers (CC#CPC) had to be removed from our working set due to the prevalence of imaginary frequencies in the calculation despite testing the available options to deal with this problematic; future work will attempt to automate this process to enable high throughput calculations. Finally, for three molecules (namely C 2 H 4 FO 2 P (O=P1(O)C(F)C1), C 3 H 4 ClOP (O=PCC=CCl) and C 2 H 5 O 2 P (O=P1(O)CC1)), the geometry optimisation procedure led to their decomposition in the final state, due to their inherent instability. These molecules were identified through their anomalous vibrational partition functions that were calculated for a future application, but could have easily been missed.
Perhaps the most significant limitation of our current method is that only one conformer, as calculated by our automated approach, was considered. However, other conformers may certainly have lower energies than the ones used in our calculations. This limitation will be addressed in the future by consideration of multiple conformers generated by an automated semi-empirical conformational search followed by DFT optimisations. Data for the low energy conformers will be presented concurrently in the database alongside their relative energies to enable a Boltzmann-weighted summation of their contributions at the target temperature to be used in spectral predictions.
In this study, we have only considered one model chemistry (ωB97X-D/def2-SVPD); however, the level of theory (e.g. density functional approximation), basis set, vibrational treatment and software package can be easily modified within the same analysis framework as new software capabilities and better benchmarking results become available. Indeed, beyond the anharmonic approach discussed above, we are also very interested to explore a hybrid approach, where harmonic calculations are performed at a very high level of theory (usually corresponding to CCSD(T) or B2PLYP calculations with larger basis sets) and the calculated frequencies and intensities are then corrected by means of GVPT2 anharmonic calculations performed at a computationally less-demanding method (e.g. hybrid functionals coupled with double-zeta basis sets) (Biczysko et al., 2010;Barone et al., 2014;Biczysko et al., 2018). The method has shown to provide reliable results for small to medium-sized molecules at reasonable computational times (though significantly longer than the current method), and will be considered for future work.
Our analysis has not included isotopes because the non-dominant isotope has abundances below 4.5% in all cases except for chlorine. Nevertheless, expansion to isotopes is straightforward in Gaussian and will be considered in future work.
Ideally, the calculated spectra should incorporate the true rotational profiles associated with the vibrational bands. The necessary band-by-band A, B and C rotational constants and dipole moments in the principal molecular axis are given within the Gaussian output file. An automated method for the generation of rotational spectra and rotational envelopes for each vibrational band from calculated rotational constants and dipole moment components will be considered in a future publication.

Synergies between RASCALL and CQC data
RASCALL and our CQC approach are symbiotic methods. RASCALL supplies preliminary data on any arbitrary molecule, providing guidance and helping to prioritize theoretical calculations. Conversely, CQC data can easily contribute to the refining, expanding, and improving the functional group data that are the primary input for the creation of RASCALL data. For example, a major limitation of RASCALL is the reliance on good data for the prediction of the spectral behavior of different functional groups. In RASCALL 1.0 (Sousa-Silva et al., 2019), these data are generated from experimental spectra and/or theoretically extrapolating from existing functional group data. Future updates to the RASCALL database can use a small number of CQC calculations to parameterise these functional group data; specifically, infrared spectra can be computed for a representative series of molecules containing a functional group, with the average predicted vibrational frequencies and intensities extracted for the functional-group related vibrations (as identified most robustly through consideration of the vibrational eigenvectors). In this Zapata Trujilo et al.

Infrared Spectroscopy of Phosphorus-bearing Molecules
way, a relatively small number of high level CQC calculations can be used to parameterise RASCALL. Subsequently, RASCALL can predict vibrational spectra for very large molecules beyond the reach of traditional CQC methods, a key future application of this approach.

DISCUSSION
In this section, we discuss a few important aspects of this research, including: the diverse potential uses of these data; how the spectroscopic data work alongside the kinetic and reaction network data to enable better understanding of remote gaseous environments; and a brief discussion of the advantages and challenges of our interdisciplinary approach for biosignature detection follow-up.

Data Utilisation
Our data predicts semi-quantitative spectral intensities for most of the P-molecules studied for the first time, essential information to assess detectability in remote environments. Molecules with strong transition intensities can be far more easily detected than those with weaker transitions. For instance, on Earth, the very strong infrared absorption of CO 2 , which, at over 0.041% of the atmosphere, dominates associated spectroscopy and majorly influences global temperatures, while O 2 at 21% atmospheric concentration does not absorb infrared light due to selection rules and only has very weak (forbidden) visible transitions. The data in this paper provides sufficiently accurate intensity predictions to both rank molecular detectability and place good thresholds on the minimum observable abundance of molecules in a given environment.
Accuracy requirements for frequencies are much more demanding and certainly our CQC results, as expected, do not reach spectroscopic accuracy, unlike some molecule-specific line list approaches. Thorough error analysis is beyond the scope of this paper but is certainly a worthwhile future pursuit. For the CQC-H1 harmonic data, we estimate our errors as 38 cm −1 based on the root-mean-squared error of the scaling factor of the ωB97X-D/def2-SVPD model chemistry from Kesharwani et al. (2015a) which was calculated for 119 experimental frequencies of 30 molecules, similar to other model chemistries with hybrid functionals and augmented double-zeta basis sets. RASCALL errors are expected to be larger but this needs to be verified by comparison to experimentally-measured frequencies.
Despite being unsuitable for definitive molecular identification in complex gaseous mixtures such as remote atmospheres, our frequency information provides useful information for remote characterisation of gaseous environments such as planets. First, our data can be used to categorise molecules into groups that may be difficult to disambiguate with observational data at certain resolutions and spectral windows. Second, our data can help assess the difficulty of detecting a molecule or class of molecules and identify optimal spectral windows by considering the specific molecule amongst possible contaminants. For example, the major contaminant to the P-H peak prevalent in our P-molecules is the CO 2 infrared absorption, which can then be closely considered as discussed above. Finally, the scope and accuracy of our data is still enough to both comprehensively build up and selectively constrain a pool of molecular candidates that may be responsible for a particular signal.
The value of this is evident when considering the detection of phosphine on Venus and subsequent debate about whether the single observed microwave line possibly arose from a different molecule. According to the data available to the involved astronomers, the only possible contender for the signal was the nearby absorber, SO 2 , which could be ruled out. However, while the available data did cover the molecules that are likely to be most abundant in that context, it was limited in coverage; the data we produce as part of this work could support a much more comprehensive investigation for similar detections in the infrared region.
Another key use of this data is the assignment of experimental spectra. For example, computational quantum chemistry calculations have been used previously to correct misassignments in P-molecule infrared spectroscopy (Robertson and McNaughton, 2003;McNaughton and Robertson, 2006). The CQC approach can be useful to aid molecule identification for experiments with complex molecular mixtures formed, for example, by a discharge or as reaction products.
Finally, the generation of a large molecular dataset is worth consideration within the context of machine learning (ML). Certainly, the last few years have witnessed a delayed but definitive permeation of techniques and approaches from the latest wave of artificial-intelligence research, i.e. deep learning, into chemistry (Butler et al., 2018;Tkatchenko, 2020) and, more broadly, the physical sciences (Carleo et al., 2019). This influence has extended to the production of infrared spectra, with, for example, one study considering the hybridisation of ML and molecular-dynamics simulations (Gastegger et al., 2017).

Infrared Spectroscopy of Phosphorus-bearing Molecules
More recently, VPT2 calculations have been mixed with data generated by neural networks to explore anharmonic corrections to vibrational frequencies (Lam et al., 2020). However, ML is more traditionally used in processing pre-produced data, and, indeed, ML models can be trained on a variety of relations present in the dataset within this work. Certain coding packages support such an approach, for example, the Python-based DeepChem (Ramsundar et al., 2019), which wraps around RDKit (RDK, 2000) to convert molecular SMILES codes into hashed extended-connectivity fingerprints (Rogers and Hahn, 2010); these breakdowns of molecular structure are useful as input feature vectors for ML models. Consequently, it is possible to efficiently learn how combinations of molecular substructures influence infrared frequencies and intensities; RASCALL is based on a similar principle derived from domain knowledge in organic chemistry that functional groups determine infrared frequencies and approximate intensities. It is likely that ML can improve on the RASCALL dataset by providing updated functional group information extrapolated from CQC data. As a related example, Kovács et al. (2020) recently explored ML for predicting infrared spectra for polycyclic aromatic hydrocarbons based on a NASA Ames dataset of more than 3,000 spectra. Given that ML benefits from the statistical power provided by big data, the high-throughput nature of our CQC results is particularly valuable in fuelling the performance of future ML models.

Molecules in Reaction Network Modelling
Important species in reaction networks might be difficult to detect remotely as they have low concentrations, e.g. the important OH radical in Earth's atmosphere is mostly detected with in situ measurements (Stone et al., 2012;Piccioni et al., 2008). Therefore, the spectroscopic measurements need to be combined with a chemistry-based reaction network model that contains reaction rates for all molecules in the atmospheric system. For observable species, if the observed abundance is very different from the predicted abundance this could be due to incorrect model predictions, misinterpreted data, or could indicate unusual chemistry that warrants further investigation (e.g. the detection of phosphine on Venus).
To help readers understand the strengths and limitations of existing approaches in reaction network modelling and kinetics rate predictions, we provide appendices with brief summaries of the current approaches in these fields. Appendix B provides an overview of reaction network modelling, which is important to contextualise the sources and sinks of volatile molecules. Appendix B focuses on approaches to modelling the Earth's atmosphere and references some introductory texts on the more limited reaction network modelling of exoplanets. Appendix C introduces the fundamentals of theoretical kinetics calculations, which can be used to supplement rate constants whenever they are missing from reaction networks. Popular codes for performing theoretical kinetics calculations are also referenced in Appendix C.
The application of reaction networks and kinetics modelling is considered below for the specific situation of the potential for atmospheric formation of phosphine on Venus.

Constraining Models Involving Phosphine on Venus
The preceding sections of this paper present spectra for a wide array of P-molecules that could feed into PH 3 formation in the Venusian atmosphere. Ultimately, if PH 3 were being formed from volatile P-molecules and not through a geological process, these P-molecules must decompose into single-phosphorus molecules that can be successively reduced to PH 3 . To understand this process and elucidate potential abiotic pathways to PH 3 , we ideally want spectroscopic measurement of the Venusian atmospheric concentrations of these PH 3 -precursor P-molecules which could greatly constraint the Venus atmospheric models. The high-throughput spectra in this paper are a first step towards these future spectroscopic measurements.
The thick cloud layers in Venus's atmosphere around 48 -70 km prevent significant solar radiation from penetrating to the lower Venusian atmosphere (Titov et al., 2007;Bains et al., 2020). The atmospheres within and above the cloud deck, however, do receive significant solar radiation (Titov et al., 2007). Thus this middle Venusian atmosphere, like Earth, is largely driven by reactions with radical species formed through photochemistry (Prinn and Fegley, 1987). A photochemical network approach can therefore be used to simulate the composition of the middle atmospheres of Venus, which typically has temperatures of 200 -350 K (Ando et al., 2020). This approach is considered Bains et al. (2020) when modelling the production and destruction of phosphine in Venus.
However, the atmospheric processes of Venus are far less studied than for Earth and much data is missing. Even in modelling the most abundant species in the middle and lower atmospheres of Venus (SO x , CO x , Cl • /HCl), Bierson and Zhang (2020) note that ∼40% of their reaction rates used have no experimental measurements, and those that are measured are only upper limits, or for a single temperature.

Infrared Spectroscopy of Phosphorus-bearing Molecules
This statistic reveals much is unknown about the reaction rates of core Venusian atmospheric processes, especially minor cycles like phosphorus reactions. Bierson and Zhang (2020) highlight which rates contained in their Venus atmospheric model are of highest priority for experimental measurement or ab initio prediction. The photochemical model of PH 3 formation by Bains et al. (2020) presents similar crucial reactions that can be considered a priority in terms of: spectroscopic detection of these species (or their precursors), lab-based measurement of key reaction rates with radicals, or ab initio calculations. These are radical-mediated reactions that could generate the direct precursors of PH 3 , as shown in Scheme 1.
Proposed, network limiting, reactions of P-molecules in model of photochemical PH 3 production by Bains et al.
The spectroscopic detection and quantification of any of the intermediates shown in Scheme 1 would greatly help constrain photochemical models of PH 3 formation. High quality line lists are available for PO from ExoMol (Prajapat et al., 2017b). However, spectroscopic signatures of molecules in Scheme 1, such as the P -H stretch, P --O stretch and P -H bending modes, are common to multiple P-molecules (see Figure 3). Therefore a moiety-based approach to predict spectra, like RASCALL, will yield false positives, and the CQC spectra presented in this paper provide an improvement for the identification of these intermediates. In the absence of spectroscopically determined abundances of these P-molecules, reaction network modelling must be used.
The reactions in Scheme 1 generate immediate PH 3 -precursors and are the presumed bottle-necks in the photochemical reaction pathway. These key reactions also lack any reaction rate data. Instead, surrogate rate data from equivalent nitrogen-containing species undergoing the equivalent reaction are used . However, the nitrogen surrogate reaction energies differ by ∼50 -60 kJ/mol from calculated energies of the actual phosphorous compounds Chase, 1998). In theoretical kinetics calculations each 10 kJ/mol difference in activation energy can alter calculated reaction rates by an order of magnitude. Therefore, the use of nitrogen surrogates could lead to misestimation of rates by several orders of magnitude, with implications on the importance of Scheme 1 reactions as network bottlenecks. Therefore, rate data crucially needs to be determined for the true phosphorous compounds in Scheme 1, either experimentally or with ab initio calculations. The instability of HPO and PO • limits the availability of lab-based kinetics studies , but their chemistry can be calculated with high-level quantum chemical methods since only 2 -3 atoms are involved. High-accurate composite ab intio methods can calculate energies of these small systems to kJ/mol, or even sub-kJ/mol, accuracy (Karton, 2016;Tajti et al., 2004;Karton et al., 2006). After accurate calculation of the geometries and energies of these reactions, the theoretical kinetics methods outlined in Appendix B could be used to calculate reaction rates. In fact, many molecules in the atmospheric reaction networks are likely to be transient and hard to detect, so theory may provide the most viable route to good estimates of their reaction rates.

Initial Interdisciplinary Survey Approach to Biosignature Followup
Astrobiology and the related study of the chemistry of planetary atmospheres are such a diverse fields that no single person can be an expert on all aspects. Instead, interdisciplinary collaborative approaches are essential.
Establishing productive interdisciplinary collaborations is rewarding but challenging, and proved essential in this pilot to appreciate diverse aspects of biosignature follow-ups. We found that astronomers, geologists, origin of life researchers, experimental spectroscopists and computational spectroscopy theorists and data Zapata Trujilo et al.

Infrared Spectroscopy of Phosphorus-bearing Molecules
scientists all had significant core knowledge -sometimes trivial in their field but unknown to others and useful in combination. Identifying and refining the salient contributions of each sub-discipline -often not what was originally anticipated -and placing it within the context of this work required time and frequent communication, aided by modern technology tools. As a concrete example, the scarcity of gaseous P-molecules and the relative lack of knowledge on P-molecule speciation in Earth's atmosphere was surprising to many authors. Unexpectedly, most key knowledge on gas-phase P-molecules came not from modern atmospheric chemistry modelling, but from origin of life research. Atmospheric chemistry expertise instead was crucial in highlighting an under-appreciated limitation of spectroscopy in remote characterisation of atmospheres; crucially important intermediates and radicals may be unobservable remotely as their reactivity makes their atmospheric lifetime extremely short and prevents atmospheric buildup to observable concentrations.

CONCLUSIONS
The key new data presented in this paper is the calculated infrared spectra of 958 phosphorus-bearing molecules (P-molecules), which represents the best available data for almost all of these molecules. These data can be useful to highlight ambiguities in molecular detection in remote atmospheres and thus prevent misassignments of spectral features while suggesting potential assignments for a given spectral signal. These data also provide sufficiently reliable intensities of different spectral features between molecules to enable evaluation of the limits of detectability for different molecules.
These data were produced with a high-throughput mostly automated methodology using computational quantum chemistry (CQC) with the ωB97X-D/def2-SVPD model chemistry used to calculate harmonic frequencies and intensities (CQC-H1) for all 958 P-molecules. Compared to the previously available RASCALL spectral data which was produced based on the frequencies of functional groups within individual molecules, these new CQC data introduce for the first time quantitatively accurate predicted intensities and frequencies data for vibrations within the fingerprint spectral region (approximately 500 -1,450 cm −1 ) that involve large molecular motions as well as improved frequency predictions for higher frequency modes through consideration of detailed chemical environmental effects. Though further improvements to our CQC-H1 approach may be obtained by performing the calculations with anharmonic methodologies like GVPT2, we identified some challenges and limitations, particularly for anharmonic prediction of modes with low force constants, and highlighted future opportunities for methodology improvements, noting that modifications of the quantum chemistry procedure are trivial to implement within our framework. We also note the recurrence of the sporadic large errors in GVPT2 ωB97X-D calculations (first noted by Barone et al. (2020)), which seemed to affect mostly P-H stretches through for a significant number of molecules. Future work to determine an appropriate functional for anharmonic calculation is warranted as these calculations are the only data source for accurate frequencies and intensities for overtone and combination bands, which provides a more complete picture of molecular opacity and may help distinguish between some molecules.
The other key contribution of this paper is the demonstration of significant advantages with an interdisciplinary approach to follow-up of biosignature detection. Phosphine and P-molecules are certainly of broad interest astrophysically in gas giants and as potential biosignatures, but the immediate impetus for this paper was the tentative detection of PH 3 in the clouds of Venus with extraordinary high abundance . An important aspect of investigating this detection is to look for other gaseous Pmolecules that could be sources or sinks of phosphine in Venus and can provide insights into the possible atmospheric network that allows for the accumulation of phosphine. To identify the molecules of interest, we used two approaches; the targeted approach consolidating known or predicted chemistry to identify gas-phase P-molecules of particular interest for characterisation of remote planetary atmospheres, and the reaction agnostic approach which instead considered all potentially volatile stable P-molecules with six or fewer non-hydrogen atoms. We conclude that, given the low volatility of many P-molecules and the relative poor understanding of gaseous phosphorus chemistry, a more reaction-agnostic comprehensive search for volatile molecules is probably the most suitable path forward for P-molecules.

CONFLICT OF INTEREST STATEMENT
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

AUTHOR CONTRIBUTIONS
LKM conceived, designed and managed the project. MNG, ESC, LSDJ, FARL, AS and JCZT collated pre-existing data. JCZT, LKM, PK and JO developed the CQC approach. JCZT applied CQC to produce novel vibrational spectra. AS, JCZT produced the figures. JCZT, AS, ESC, FARL, JO analysed data. CM, ER and CDT provided expert knowledge and feedback to assist with analysis. JCZT, LKM, CS, KNR, BPB, LSDJ, DJK, GGS, LS and BLT provided expert knowledge and wrote significant sections of the paper. All authors reviewed literature and wrote, edited or provided feedback on sections of the paper. All authors reviewed and approved the final manuscript.

FUNDING
KNR is supported from a grant by the Australian Research Council (DP160101792).

ACKNOWLEDGMENTS
Thanks to Maria Cunningham, Maria Perez-Peña and Max Litherland for their enthusiastic participation in the hackathon that started this project.
LKM would also like to thank her awesome colleagues for the writing groups and active encouragement that fast-tracked this paper to submission in the difficult 2020 year.
This research was undertaken with the assistance of resources from the National Computational Infrastructure (NCI Australia), an NCRIS enabled capability supported by the Australian Government.

SUPPLEMENTAL DATA
For the purposes of review, this data has been made available at https://drive.google.com/ drive/folders/1FEr9eKwHmxg9EJNqW3bMhaDNQ_MxrI8g?usp=sharing The supplementary data consists of: • A read.me file explaining the full supplementary information contents; • A csv file listing all molecules considered with relevant information (e.g. SMILES code, boiling point); • A csv file with tabulated frequencies (cm −1 ) and intensities (km mole −1 ) including the empirical formula and SMILES code for each molecule, the mode to which the frequency and intensity belongs to, and the mode kind (i.e. fundamental, scaled fundamental, overtone or combination band); • A csv file containing the force constants for fundamental frequencies for the 250 molecules with GVPT2 anharmonic data available; • A zip file with individual folders for each molecules named by molecular formulae and SMILES codes.
Within each folder there is all RASCALL, CQC-H1 and where available CQC-A1 quantum chemistry data for the molecule, along with the raw Gaussian output files and links to all other known spectral data sources.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary materials, further inquiries can be directed to the corresponding author.
All data produced in this paper and all RASCALL data used in this paper is available in the Supplementary Information section. ExoMol data is available at exomol.com.

APPENDIX A: ANHARMONIC CALCULATIONS A1: Theory Background
Vibrational frequency and intensity calculations in quantum chemistry are mainly performed within the so-called double harmonic approximation, where the potential energy and the dipole moment are assumed to be quadratic and linear in the normal mode coordinates, respectively (Turrell, 2006). This approximation is particularly appealing in quantum chemistry due to its computational ease and affordable scaling with larger systems, thus it has become a valuable tool in the interpretation of experimental vibrational spectra. However, there are two major drawbacks to the double harmonic approximation that limit its performance. Firstly, harmonic frequency calculations tend to systematically overestimate experimental frequencies Zapata Trujilo et al.

Infrared Spectroscopy of Phosphorus-bearing Molecules
and secondly, the selection rules that govern the harmonic approximation allow only the prediction of fundamental frequencies, neglecting overtones and combination bands. Multiplicative scaling factors can be applied to the calculated frequencies to match experimental values (Scott and Radom, 1996;Irikura et al., 2005;Merrick et al., 2007;Alecu et al., 2010;Laury et al., 2012;Kesharwani et al., 2015b;Hanson-Heine, 2019). However, this approach only provides improvements in the frequencies' positions without considering their respective intensities, and it has no effect on the neglected overtones and combination bands.
To obtain computationally-derived vibrational spectra in better agreement with experimental data, anharmonicity must be explicitly considered in the calculations. Variational approaches like the Vibrational Self-consistent Field (VSCF) (Bowman, 1986;Jung and Gerber, 1996;Chaban et al., 1999;Bounouar and Scheurer, 2008;Bowman et al., 2008;Roy and Gerber, 2013;Panek et al., 2019) or Vibrational Configuration Interaction (VCI) (Christiansen, 2007;Bowman et al., 2008;Scribano et al., 2010) theories can be used to this end, yet they are often limited by the molecular size and memory requirements. Instead, Vibrational Second-order Perturbation Theory (VPT2; Nielsen 1951), has gained popularity in this matter as it represents a good compromise between accuracy and computational cost for medium-to-large sized molecules Harding et al., 2017;Grabska et al., 2017;Kirchler et al., 2017;Beć and Huck, 2019). In the context of VPT2, the vibrational Hamiltonian is divided into unperturbed (H 0 ) and perturbative terms (H 1 and H 2 ), where the former corresponds to the common harmonic Hamiltonian and the perturbative terms incorporate third and semi-diagonal fourth derivatives to the potential energy, respectively (the second perturbative term H 2 also includes a kinetic contribution from the vibrational angular momentum) (Puzzarini et al., 2019b). The perturbative processing of this Hamiltonian results in a handful of simple and general formulas that can be used to calculate the vibrational frequencies for fundamentals, overtones and combination bands (Barone, 2005;Bloino, 2015). In a similar fashion, equations for the calculation of vibrational intensities are also derived under the VPT2 framework by considering both mechanical and electrical (higher-order derivatives of the dipole moment function) anharmonicity Stanton, 2006, 2007;Barone et al., 2010;Bloino, 2015).
The analytical expressions for calculating both frequencies and intensities allow the simple and straightforward computation of more realistic vibrational spectra. However, a critical limitation arises when resonances or near-degenerate states appear (Nielsen, 1945;Amos et al., 1991). These states in most cases lead to nearly vanishing denominators in the VPT2 working equations, thus resulting in nonphysical values for the calculated frequencies and intensities. Vibrational energies are more commonly affected by type I (ω i ≈ 2ω j ) and II (ω i ≈ ω j + ω k ) Fermi resonances (FR), whereas vibrational intensities are plagued with both Fermi-type and the so-called Darling-Denninson resonances (DDR) (ω i ≈ ωj) (Darling and Dennison, 1940;Bloino et al., 2015). Several approaches have been proposed to deal with resonance states and here we aim to provide a general description of those most commonly used.
The first and most common approach is the so-called deperturbed VPT2 (DVPT2) method which consists on the identification and removal of resonance states from the perturbative formulation. As the resonance terms are completely disregarded from the calculations, the DVPT2 method is unable to provide a complete picture of the vibrational nature underlying systems plagued with resonance states. This drawback is overcome by the Generalised VPT2 (GVPT2) (Barone, 2005;Puzzarini et al., 2019a) method where the identified resonance states are treated separately through variational calculations and latter reintroduced as off-diagonal terms in the computations. In both cases (DVPT2 and GVPT2) the resonant terms are identified via two consecutive tests: a frequency difference threshold (∆ ω ) followed by the Martin test (Martin et al., 1995) to evaluate the deviation between the VPT2 result and a model variational calculation (K). Taking a different approach, the calculations can also be performed under the Degeneracy-corrected Second-order perturbation theory (DCPT2) method where all possible resonant terms are replaced by non-divergent expressions (Kuhler et al., 1996). This method allows to compute vibrational frequencies without further concerns for resonant terms in the perturbative formulation, but struggles when strong couplings between low-and high-frequency vibrations occur . As an alternative, Blonio and co-workers developed the Hybrid-degeneracy Corrected VPT2 (HDCPT2) method that mixes both standard VPT2 and DCPT2 frameworks to calculate anharmonic vibrational frequencies . Using a transition function, the method is able to identify those states that would be better treated under a VPT2 formulation and, likewise, those with the DCPT2 method. These VPT2 variants (currently coded in the Gaussian package) are mostly based on the conventional Rayleigh-Schrödiner Zapata Trujilo et al.

Infrared Spectroscopy of Phosphorus-bearing Molecules
perturbation theory allowing simple algebraic equations for the anharmonic frequencies and intensities. However, alternative significant work dealing with resonance states in VPT2 has also been performed recently, considering Van Vleck perturbation theory instead (Krasnoshchekov et al., 2014;Rosnik and Polik, 2014).
Despite the advantages provided by the aforementioned VPT2 flavours, there are some considerations that are worth mentioning. Though default values have been defined for the appropriate identification of resonant states in the DVPT2 and GVTP2 methods, in most cases, it is recommended to assign these values based on the specific molecular system under study. This limitation clearly hinders any high-throughput calculation of vibrational spectra as defining appropriate thresholds becomes impractical when assessing hundreds of molecules at once. Instead, one could make use of the HDCPT2 method that performs similarly to GVPT2, with the advantage of a threshold-free formulation. However, the current version of HDCPT2 only allows the calculation of vibrational frequencies and the extension to vibrational intensities is still under study . We thus use GVPT2 in our calculations.
Finally, it is important to note that, due to their perturbative nature, though all VPT2 approaches generally perform well for semi-rigid molecules, there are substantial errors when dealing with large-amplitude vibrations, torsion and inversion modes, in the presence of double-well potentials and when considering floppy molecules Bloino et al., 2016;Grabska et al., 2017;Puzzarini et al., 2019b,a).

A2: Further Computational Details
For the CQC-A1 approach, the GVPT2 method was used as it allows a general treatment of resonance states affecting both frequencies and intensities. These resonant states were identified using the following default thresholds: Where the 1 − 2 superscript corresponds to Fermi-type resonances and both 2 − 2 and 1 − 1 represent Darling-Denninson resonances. The cubic and semidiagonal quartic derivatives of the potential were obtained by numerical differentiation of the analytic second derivatives, with the default 0.01Å step. A3: Discussion of Anomolous Anharmonic Results Figure 9. Ratio between the scaled harmonic (SH) and GVPT2 anharmonic (Anh) frequencies as a function of the scaled harmonic frequencies (left panel) and force constants (right panel) for 250 molecules, highlighting problematic modes due to large amplitude vibrations (LAMs) in our data. Both y axes are given in logarithmic scale. Figure 9 shows the ratio between the scaled harmonic (SH) and anharmonic fundamental frequencies (Anh) as a function of frequency (left panel) and force constant (right panel). Generally, this ratio is very close to 1, i.e. the harmonic and anharmonic calculations have very similar predictions. However, there are two cases where the calculations differ substantially: (1) many low-frequency transitions with small Zapata Trujilo et al.

Infrared Spectroscopy of Phosphorus-bearing Molecules
force constants have very large differences (up to a factor of 50) between the harmonic and anharmonic frequency predictions and (2) the P-H stretch frequencies are in many molecules 200-700 cm −1 or more higher in the anharmonic compared to the harmonic calculations. In summary, the first issue is well-known for large amplitude vibrations (LAMs) and is an inherent limitation of perturbative approaches, while the second issue is far more concerning and can be traced back to deficiencies in the density functional.
The first issue, i.e. large scaled harmonic-anharmonic frequency (SH/Anh) differences for transitions with small force constants, is well known. The small force constants are characteristic of large amplitude vibration (LAMs) which are motions that occur along very flat areas of the potential energy surface. GVPT2 anharmonic frequencies for large amplitude vibrations (LAMs) can greatly differ from the scaled harmonic ones, due to the unsatisfactory performance of perturbation theory in these cases Bloino et al., 2016;Grabska et al., 2017;Puzzarini et al., 2019b,a). Under the assumption that perturbation theory fails when the correction is too large, we have chosen to flag anharmonic calculations as problematic due to LAM all frequencies with SH/Anh ratio < 0.8 and SH/Anh ratio > 1.2 that also have a small force constant < 1 aJ amu −1Å−2 . Figure 10. Comparison of the ExoMol, RASCALL, scaled harmonic and GVPT2 anharmonic frequencies for phosphine. A demonstration of the limitations present in the GVPT2 procedure for highly symmetric systems.
The second issue, large SH/Anh ratios for P-H stretches, is more unusual. The prototypical example is PH 3 , with results shown in Figure 10 for the harmonic, anharmonic, RASCALL and reference ExoMol spectra for PH 3 . For the P-H stretch signal (near 2,360 cm −1 ), the scaled harmonic and anharmonic calculations differ by over 1,000 cm −1 , with the scaled harmonic calculations seen as reliable by comparison to the ExoMol and RASCALL data sources. The highly symmetric nature of PH 3 causes intrinsic degeneracies that results in unreliable vibrational frequencies and intensities. Some improvement can be obtained by formally lowering the symmetry in the calculations (NoSymm option in Gaussian), as seen in Figure 10 comparing the results from the ωB97X-D -GVPT2 (black) and ωB97X-D -GVPT2 -NoSymm (green) calculations. However, unacceptably large errors still occur of 100s of cm −1 , and the harmonic calculations Zapata Trujilo et al.

Infrared Spectroscopy of Phosphorus-bearing Molecules
are far superior (compared to experimental data). Further, similar very large P-H anharmonic stretch frequencies were found in many unsymmetric molecules; for example, the CH 5 OP isomer with SMILES code OCP, exhibited differences of 405 and 467 cm −1 between the scaled harmonic and anharmonic P-H stretch frequencies as calculated by ωB97X-D/def2-SVPD. Therefore, use of NoSymm is helpful but not sufficient for resolving the issue of large SH/Anh ratios for P-H stretches.
We considered neglected resonances as a possible cause for large SH/Anh ratios for P-H stretches, but our tests showed that deuterating one of the hydrogens in the CH 5 OP isomer (which should alter the resonance patterns) did not remove these errors.
Recent results from (Barone et al., 2020) (published subsequent to our calculations) suggested that the density functional might be the issue; this benchmark study covered ten small molecules and found that the ωB97X-D functional can yield unstable anharmonic vibrational frequencies, potentially due to an inappropriate calculation of higher-order derivatives of the potential energy. Our CH 5 OP isomer output files showed warnings for large cubic and quartic force constants. We confirmed the culpability of the functional through tests of HF/def2-SVPD and PW6B95/def2-SVPD calculations for the PH 3 and CH 5 OP isomer that did not show the very large scaled harmonic-anharmonic shifts observed with ωB97X-D. Therefore, we conclude that a different choice of functional is required for reliable anharmonic calculations, despite the very good performance of ωB97X-D for general chemistry (e.g. Goerigk and Mehta (2019); Zapata and McKemmish (2020)). We defer detailed consideration of alternative functionals with anharmonic methods to a future publications in order to enable detailed comparison to experiment for a large number of molecules.
However, new density functionals are unlikely to enable the accurate anharmonic treatment of large amplitude modes, for which the harmonic approximation is not a sufficiently accurate approximation for perturbative treatments to be reliable; more expensive variational approaches like VCI are likely to be required. In automated high-throughput approach, the most practical solution is probably the identification of problematic transitions through flags, as described above. We will undertake future investigations that explicitly compare predicted harmonic and anharmonic computed frequencies against experimental data for a large number of molecules. This will enable us to improve this flagging process as well as develop methods to automatically identify likely problematic molecules.

APPENDIX B: REACTION NETWORK MODELLING
Significant work has gone into detailed kinetic models of the chemical reaction networks in the Earth's atmosphere. GEOS-Chem (The International GEOS-Chem User Community, 2019) and the Master Chemical Mechanism (MCM) (Rickard and Young, 2005) are two large scale chemistry focused models, with the MCM aiming to be a 'near-explicit' mechanism: modelling 6,500+ chemical species and 17,000+ reactions. This reaction network is developed from the explicit representation of the degradation of 143 volatile organic compounds (VOCs), with pre-defined rules of chemical reactivity following an initiation reaction (e.g. oxidation, ozonolysis, photolysis), generating the vast number of reactions that need to be represented (Saunders et al., 2003). Not all species represented in the atmospheric models have well-known behaviour, and these less well understood species are propagated through the reaction network until they form end-products whose chemistry is known (e.g. CO, ROH species).
The aim of making atmospheric models comprehensive is complicated by a combinatorial explosion of possible species and minor reaction channels. To remedy the overabundance of VOCs to be modelled, a process known as 'lumping' is used (Wang et al., 1998;Whitehouse et al., 2004), where only species with branching fractions >5% are explicitly represented (Jenkin et al., 1997), and a generic intermediate is used to represent all molecules that ultimately form similar products. A fundamental challenge of modelling the Earth's atmospheric chemistry is therefore the reduction of model complexity, so the models are computationally tractable and can be used for simulation.
On Earth, field campaigns and lab-based measurements can more easily acquire relevant experimental data, in contrast to the data needed for exoplanet atmospheres. Nevertheless, due to the diversity of VOCs known and predicted, Earth's atmospheric models are increasingly relying on theoretical methods to fill out missing experimental data. A review of how theory, and its interplay with experiment, has advanced our understanding of Earth's atmospheric reactions networks is provided in (Vereecken and Francisco, 2012), with practical examples given in (Vereecken et al., 2015). In Appendix C, various theoretical kinetics methods for predicting reaction rates are outlined briefly.

Infrared Spectroscopy of Phosphorus-bearing Molecules
Exoplanetary atmospheres will be far more diverse, with hot gas giants a particular current focus as these will be the first exoplanet atmospheres to be characterised as they are easiest to observe. The reaction networks are generally more limited, but are appropriate for a wider range of physical conditions and temperatures. Catling and Kasting (2017); Heng (2017) provide good introductions to exoplanetary atmosphere modelling.

APPENDIX C: KINETIC RATES
Prediction of molecular reaction rates is usually more challenging both experimentally and computationally than thermochemistry, with accuracies within an order of magnitude for rates generally considered very good. Molecular reaction rates can be determined using transition state theory (TST) (Petersson, 2000). The foundations of TST was the insight by Eyring (1935) that the transition state represents a dividing surface that minimises the reactive flux between reactants and products. The transition state is a first order saddle point: the point of maximum energy (i.e. bond strain) along the minimum energy reaction pathway. Standard TST rates can be calculated for bimolecular reactions from the activation energy, and partition functions of the TS and the reactants. Unimolecular reaction rates can be calculated by Rice-Ramsperger-Kassel-Marcus (RRKM) theory, where the reaction rate at a given energy is the ratio of the density of (vibrational) states at the minimum energy well and at the TS (Forst, 1973). The vibrational states are usually treated as coupled harmonic oscillators with fast intramolecular vibrational energy redistribution between them.
Where a reaction is barrierless along the minimum energy pathway, i.e. does not have an activation energy, modifications to TST are required. An example of barrierless reactions are radical-radical recombination reactions, which combine with no barrier, or conversely, dissociation of a closed-shell molecule into two radicals. These barrierless reactions are often important to photochemistry and atmospheric chemistry. In barrierless reactions, a variational approach (VTST) is used to determine a dividing surface along the reaction coordinate that minimises the reactive flux by maximising the Gibbs free energy through consideration of the entropy changes of the reactants (Bao and Truhlar, 2017). For reactions with a "loose" transition state, the variable reaction coordinate (VRC-TST) approach can be used to determine the dividing surface more flexibly, through use of optimised pivot points between the reacting fragments (Georgievskii and Klippenstein, 2003). Several codes are available with different methodologies for calculating barrierless reaction rate constants, including: MultiWell (VTST) (Barker, 2001), Polyrate (VRC-TST) (Zheng et al., 2017), and MESMER where the inverse Laplace transform approach is used when high pressure limit data are available (Glowacki et al., 2012).
Theoretical rate constants can often deviate by one or two orders of magnitude from experiment, but provide mechanistic insight and are generally more suitable to model a particular reaction than a "surrogate" experimental reaction rate from an analogous system. Theoretical kinetics therefore provides an important pathway to supplement reaction networks with estimates of reaction rates whenever experimental data are missing or difficult to obtain.