Exploring Carbon Mineral Systems: Recent Advances in C Mineral Evolution, Mineral Ecology, and Network Analysis

Large and growing data resources on the spatial and temporal diversity and distribution of the more than 400 carbon-bearing mineral species reveal patterns of mineral evolution and ecology. Recent advances in analytical and visualization techniques leverage these data and are propelling mineralogy from a largely descriptive field into one of prediction within complex, integrated, multidimensional systems. These discoveries include: (1) systematic changes in the character of carbon minerals and their networks of coexisting species through deep time; (2) improved statistical predictions of the number and types of carbon minerals that occur on Earth but are yet to be discovered and described; and (3) a range of proposed and ongoing studies related to the quantification of network structures and trends, relation of mineral “natural kinds” to their genetic environments, prediction of the location of mineral species across the globe, examination of the tectonic drivers of mineralization through deep time, quantification of preservational and sampling bias in the mineralogical record, and characterization of feedback relationships between minerals and geochemical environments with microbial populations. These aspects of Earth’s carbon mineralogy underscore the complex co-evolution of the geosphere and biosphere and highlight the possibility for scientific discovery in Earth and planetary systems.


INTRODUCTION
Minerals, including carbon-bearing phases, are the oldest available materials from the ancient history of our planet and other bodies in our solar system; they record information about their genetic environments and any subsequent weathering and alteration they underwent, offering a glimpse of ancient environments through deep time. In this work, we describe some of the important carbon mineral data resources and outline a number of new advances in data-driven discovery in carbon and other mineral systems, including new insights from mineral network visualizations and statistical modeling. We also preview upcoming studies and research directions related to the diversity and distribution of mineral species; statistical modeling of complex, multidimensional data objects and their underlying trends; tectonic drivers of mineralization; characterization of relationships among microbial populations, their expressed protein functions, and the geochemical environment; quantification of preservational and sampling bias in the mineralogical record; and a number of predictive algorithms, including those that predict the formational environments of minerals and the locations of previously unknown mineral localities.
Carbon minerals are particularly compelling for multidimensional analysis because of their diverse bonding behaviors, paragenetic modes, mineral properties, and ages. Carbon minerals are among the first condensed phases to form in a solar system and among the hardest materials known, yet they are also among the most recently formed and most ephemeral crystalline phases. Carbon can behave as a cation, anion, or neutral atom, bonding with itself and with more than 80 other elements, with coordination numbers of 2, 3, and 4 and valence states of −4, +2, and +4 (Hazen et al., 2013a). Many of the first crystals formed in our cooling solar system were refractory carbon-bearing phases, including diamond (C) (Lewis et al., 1987), graphite (C) (Amari et al., 1990), and moissanite (SiC) (Zinner et al., 1987). Carbon and its mineral phases are intrinsically linked to organic and biological processes: biomineralization is responsible for a significant portion of the rhombohedral carbonates at Earth's surface, and organic minerals make up nearly 15% of the 411 known carbon mineral species (as of June 2019; rruff.info/ima). Carbon's widely varying character offers a fascinating opportunity to employ rapidly developing analytics and visualization techniques to characterize its complex, multivariate systems and to answer previously inaccessible questions at the interface of the Earth, planetary, and life sciences.

MINERAL DATA RESOURCES
The International Mineralogical Association List of Mineral Species
The International Mineralogical Association (IMA) list of mineral species (RRUFF.info/IMA) is part of the RRUFF Project (Lafuente et al., 2015), a mineral library and series of databases that aims to provide robust, diverse mineralogical data, including high-quality chemical, spectral, and diffraction data; the IMA list of approved mineral species; the American Mineralogist Crystal Structure Database (AMCSD; RRUFF.geo.arizona.edu/AMS/amcsd.php); mineral locality age information (see "Mineral Evolution Database" section below); and other mineral properties (see "Mineral Properties Database" section below). The IMA list allows users to search the more than 5400 (as of June 2019) mineral species by name, chemical composition, unit-cell parameters and crystallography, crystal structure group, paragenetic mode, and the availability of ancillary data, including crystal structure files in the AMCSD or direct RRUFF Project analyses. This database also provides useful information about each mineral species, including composition, oldest known age, crystal structure group, and unit-cell parameters along with corresponding compositions, all of which can be downloaded in a number of machine-readable file formats. Lastly, this page offers links to a number of related informational pages and websites, including the Handbook of Mineralogy (Anthony et al., 2003), measured data in RRUFF Project databases, crystal structure files in the AMCSD, mineral locality information at Mindat.org (see "Mindat.org" section below), and age and locality data in the Mineral Evolution Database (MED).

The Mineral Evolution Database (MED)
The Mineral Evolution Database (MED; RRUFF.info/Evolution; Golden et al., 2016; Prabhu et al., 2020) was created to support mineral evolution and ecology studies: studies that examine and characterize spatial and temporal mineral diversity and distribution in relation to geologic, biologic, and planetary processes (Hazen et al., 2008, 2011, 2013b,c, 2014a,b, 2016, 2017a,b, 2019b; Hazen and Ferry, 2010; McMillan et al., 2010; Golden et al., 2013; Hazen, 2013, 2018, 2019; Grew and Hazen, 2014; Zalasiewicz et al., 2014; Grew et al., 2015; Hystad et al., 2015a,b, 2019a; Liu et al., 2017a,b, 2018b; Ma et al., 2017; Morrison et al., 2017b; Glikson and Pirajno, 2018). The MED contains mineral locality and age information extracted from the primary literature and from the mineral-locality database Mindat.org. As of 14 June 2019, 15,906 unique ages for 6,253 directly dated mineral localities, documenting 810,907 mineral-locality pairs and 194,090 mineral-locality-age triples, are available in the MED. Specific to the 411 known carbon-bearing phases, the MED contains 8,635 dated carbon mineral localities, 94,677 carbon mineral-locality pairs, and 20,773 dated carbon mineral-locality pairs, as of June 2019. These data have been assembled and documented to maximize the accuracy and transparency of age associations, which include data on specific mineral formations, mineralization events, element concentrations, and/or deposit formations. The MED interface offers many sorting and display options, including sorting by age or locality name, and displaying either all of the queried minerals at a given locality or one line of data for each mineral-locality pair. These data are available for download directly from the MED (RRUFF.info/Evolution) in various file formats.
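The distinction between localities, mineral-locality pairs, and mineral-locality-age triples can be made concrete with a small sketch. It assumes a hypothetical in-memory record layout (one row per mineral reported at a locality, with an optional age in Ma); it does not reproduce the MED's actual download format or column names.

```python
from collections import namedtuple

# Hypothetical record layout: one row per mineral reported at a locality,
# with an age in Ma, or None if the locality is undated.
Record = namedtuple("Record", ["mineral", "locality", "age_ma"])


def summarize(records):
    """Count localities, mineral-locality pairs, and mineral-locality-age triples."""
    dated = [r for r in records if r.age_ma is not None]
    return {
        "localities": len({r.locality for r in records}),
        "dated_localities": len({r.locality for r in dated}),
        "pairs": len({(r.mineral, r.locality) for r in records}),
        "dated_pairs": len({(r.mineral, r.locality) for r in dated}),
        "triples": len({(r.mineral, r.locality, r.age_ma) for r in dated}),
    }
```

A locality with two reported ages contributes one pair per mineral but two triples per mineral, which is why triple counts can exceed dated-pair counts.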

The Mineral Properties Database (MPD)
The Mineral Properties Database (MPD; Morrison et al., 2017b; Prabhu et al., 2019a) was created with the goal of better understanding the multidimensional, multivariate trends among mineral species and their relationships to geologic materials, preservational and sampling biases, and geologic, biologic, and planetary processes. At present, this database contains dozens of parameters for copper, uranium, and carbon minerals, including age, color, redox state, structural complexity (Krivovichev, 2012, 2013, 2016, 2018; Hazen et al., 2013b; Grew et al., 2016; Krivovichev et al., 2017, 2018), and method of discovery. Efforts are ongoing to expand it to minerals of every element in the periodic table. These data, coupled with those of the MED, offer the opportunity to study changes in redox conditions through deep time and are the basis for mineral network analysis studies (Morrison et al., 2017b; Perry et al., 2018; Hazen, 2019; Hazen et al., 2019b). This database will be made publicly available through the RRUFF Project on the Open Data Repository platform (ODR; opendatarepository.org), whose interface will maximize the flexibility with which users view, explore, subset, and download data of interest.
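Structural complexity, one of the MPD parameters listed above, is commonly quantified following Krivovichev's approach as Shannon information: I_G = −Σ p_i log₂ p_i bits per atom, where p_i = m_i/v, m_i is the multiplicity of the i-th crystallographic orbit, and v is the number of atoms in the reduced unit cell; multiplying by v gives bits per unit cell. The minimal sketch below covers only this structural term, assumes the orbit multiplicities are already known (e.g., read from a crystal structure file), and uses a hypothetical function name.

```python
import math


def structural_complexity(orbit_multiplicities):
    """Shannon information content of a crystal structure.

    orbit_multiplicities: number of atoms in each crystallographic orbit of
    the reduced unit cell. Returns (bits per atom, bits per unit cell).
    """
    v = sum(orbit_multiplicities)  # atoms in the reduced unit cell
    bits_per_atom = -sum((m / v) * math.log2(m / v) for m in orbit_multiplicities)
    return bits_per_atom, v * bits_per_atom
```

For example, two orbits of two atoms each give 1 bit per atom and 4 bits per cell, whereas a structure with all atoms in a single orbit carries no structural information under this measure.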

Mindat.org
The Hudson Institute of Mineralogy's Mindat.org is an interactive mineral occurrence database with a wealth of information on mineral localities around the globe, as well as on Apollo lunar samples and meteorites. At present, Mindat houses nearly 300,000 mineral localities, with more than 1.2 million mineral-locality pairs and nearly one million mineral photographs. Most of the mineral occurrence information available on Mindat.org comes from the published literature, but users can also add localities, mineral-locality pairs, photographs, and references. The MED interfaces directly with Mindat, harnessing and incorporating the large volume of mineral locality data it holds. Mindat has been an important resource for scientific research and discovery; many studies on the diversity and distribution of minerals on Earth's surface have relied in part on its mineral locality information (Hazen et al., 2008, 2011, 2013b,c, 2014a,b, 2016, 2017a,b, 2019b; Hazen and Ferry, 2010; McMillan et al., 2010; Golden et al., 2013; Hazen, 2013, 2018, 2019; Grew and Hazen, 2014; Grew et al., 2015; Hystad et al., 2015a,b, 2019a; Liu et al., 2017a,b, 2018b; Ma et al., 2017; Morrison et al., 2017b). These studies include those of carbon mineral ecology detailed below.

Global Earth Mineral Inventory (GEMI)
The Global Earth Mineral Inventory (GEMI¹) is a Deep Carbon Observatory (DCO) data legacy project born out of the diverse data types collected in conjunction with the DCO's broad range of scientific driving questions (Prabhu et al., 2019a). Specifically, Prabhu et al. (2019a, 2020) aimed to support and facilitate scientific discovery by merging and integrating DCO data products, such as the MPD and MED, into a digestible, accessible, and user-friendly format for exploration, statistical analysis, and visualization. GEMI is therefore a faceted, searchable knowledge graph, or network, in which each node represents a feature of the MED and MPD, allowing users to explore, query, and extract the specific subsets or combinations of data necessary for their research goals.
¹ https://dx.deepcarbon.net/11121/6200-6954-6634-8243-CC

CARBON MINERAL ECOLOGY
Statistical approaches are particularly useful for characterizing surface and near-surface environments, where biology and reaction kinetics play a major role in mineral formation and stability, as opposed to the subsurface, where thermodynamics dominates. Mineral ecology studies employ the MED and Mindat.org to examine and characterize the spatial diversity and distribution of mineral species on planetary bodies (Hystad et al., 2015a,b; Grew et al., 2017; Liu et al., 2017a, 2018a). "Mineral species" in this case are those recognized by the IMA Commission on New Minerals, Nomenclature and Classification (CNMNC), whose definitions often do not account for subtle variations in chemistry or formational processes (see section "Natural Kind Clustering"). Previous studies have found that minerals on Earth's surface follow a distinct "Large Number of Rare Events" (LNRE) frequency distribution, in which most mineral species are rare, occurring at fewer than five geologic localities, and only a few species are very common (Hystad et al., 2015a,b).
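The frequency spectrum that underlies an LNRE analysis can be tabulated directly from mineral-locality pairs. The sketch below, with hypothetical function names and toy data, counts for each frequency class m the number of species recorded at exactly m localities, and the fraction of species found at fewer than five localities.

```python
from collections import Counter


def frequency_spectrum(pairs):
    """Map each frequency class m to the number of species found at exactly m localities.

    pairs: iterable of (mineral, locality) tuples; duplicate pairs are ignored.
    """
    localities_per_species = Counter(mineral for mineral, _ in set(pairs))
    return dict(sorted(Counter(localities_per_species.values()).items()))


def fraction_rare(pairs, threshold=5):
    """Fraction of species recorded at fewer than `threshold` localities."""
    localities_per_species = Counter(mineral for mineral, _ in set(pairs))
    rare = sum(1 for n in localities_per_species.values() if n < threshold)
    return rare / len(localities_per_species)
```

An LNRE-like data set shows most of its species in the lowest frequency classes and a long tail of a few species in very high classes.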
The discovery of an LNRE frequency distribution across all mineral systems on Earth enabled the modeling of accumulation curves and, thereby, the prediction of the number of mineral species that occur on Earth but have yet to be discovered. Carbon minerals are no exception to the LNRE trend, and Hazen et al. (2016) explored their ecology, finding that in addition to the 400 known carbon mineral species, at least 145 more species were likely awaiting discovery. Hazen et al. (2016) also examined the likely candidates among the missing species by generating accumulation curves for carbon minerals with and without oxygen, hydrogen, calcium, and sodium. They predicted that, of the 145 as-yet undiscovered carbon minerals, 129 would contain oxygen, 118 would contain hydrogen, 52 would contain calcium, and 63 would contain sodium. This study led to the Carbon Mineral Challenge (mineralchallenge.net), a DCO initiative to engage scientists and collectors in finding and identifying the missing carbon phases. As of June 2019, the Carbon Mineral Challenge has yielded 30 new mineral species approved by the IMA, a number of which were predicted in Hazen et al. (2016).
At the time of the initial mineral ecology studies, it was understood that the models, and the predictions based upon them, should be treated as lower limits on the number of missing mineral species. This is due, in part, to sampling bias toward well-crystallized, colorful, or economically valuable specimens. An additional factor is the advent of new, unforeseen technologies that can identify and distinguish minerals at increasingly finer scales. While it is difficult to predict the next technological advance, we can develop better models to make predictions from our existing data. With this in mind, Hystad et al. (2019a) developed a new Bayesian technique for modeling mineral frequency distributions and predicting the number of undiscovered mineral species on Earth's surface. Hystad et al. (2019a) raised the estimated total number of mineral species on Earth from the previous minimum of 6394 (Hystad et al., 2015a,b) to 9308, with a 95% posterior interval of (8650, 10,070). Note that this new, higher value is still a low estimate because of the unknowns of future technology.
Here, we apply the Poisson lognormal model of Hystad et al. (2019a) to the 411 currently known carbon mineral species and their 50,095 localities (with 94,677 mineral-locality pairs, 22% of which have associated ages, as of June 2019). Figure 1A illustrates the carbon mineral frequency distribution with the Poisson lognormal LNRE model overlaid in blue. The frequency distribution is used to generate an accumulation curve (Figure 1B), which models the expected number of carbon mineral species as a function of the number of localities characterized and therefore predicts the number of carbon mineral species present on Earth's surface, many of which remain undiscovered. The new estimate of carbon mineral diversity on Earth is 993, with a 95% posterior interval of (759, 1268), up from the former prediction of 548 carbon mineral species (Hazen et al., 2016). As with the estimate above, this prediction should be considered a lower bound given the unknowns associated with future technological advances and their impact on mineral discovery.
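The link between a frequency model and an accumulation curve can be illustrated with the expectation formula for a Poisson lognormal model. In this sketch, a population of S species is assumed; species i is recorded Poisson(N·p_i) times in a sample of size N, with log p_i normally distributed with mean μ and standard deviation σ, so the expected number of distinct species observed is E[V(N)] = S·(1 − E[exp(−N·p)]). The parameter values are hypothetical placeholders, not the fitted values behind Figure 1, and the Bayesian estimation of S, μ, and σ by Hystad et al. (2019a) is not reproduced.

```python
import math


def expected_species(n, total_species, mu, sigma, steps=4000, zmax=10.0):
    """Expected number of distinct species observed in a sample of size n.

    Poisson lognormal sketch: each of `total_species` species is recorded
    Poisson(n * p) times, with log p ~ Normal(mu, sigma**2), so
    E[V(n)] = S * (1 - E[exp(-n * p)]). The expectation over the lognormal
    is approximated with a midpoint rule on the standard normal variable z.
    """
    dz = 2.0 * zmax / steps
    weight_sum = 0.0
    unseen = 0.0
    for k in range(steps):
        z = -zmax + (k + 0.5) * dz
        w = math.exp(-0.5 * z * z)      # unnormalized N(0, 1) density
        p = math.exp(mu + sigma * z)    # per-unit-sample occurrence rate
        weight_sum += w
        unseen += w * math.exp(-n * p)  # probability the species is still unseen
    return total_species * (1.0 - unseen / weight_sum)


# Tracing an accumulation curve with hypothetical parameters:
curve = [(n, expected_species(n, 1000, mu=-10.0, sigma=2.0)) for n in (1e3, 1e4, 1e5, 1e6)]
```

The curve rises steeply at first and flattens toward S as N grows; in practice S, μ, and σ are estimated from the observed frequency spectrum, and the asymptote is the predicted total diversity.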

Network Analysis
Network analysis, a subfield of graph theory (Otte and Rousseau, 2002; Clauset et al., 2004; Newman, 2006; Kolaczyk, 2009a; Abraham et al., 2010; Newman and Mark, 2010), is particularly useful for visualizing many variables of a multidimensional system in a digestible and meaningful way, especially when the questions concern the interrelationships of many entities and their properties, as is the case for mineralogical systems in the context of Earth and planetary processes. Networks are composed of nodes (or vertices) representing entities and edges (or links) between nodes representing a relationship between the two connected nodes. Nodes can be sized, shaped, colored, etc., according to any variable of interest. Likewise, edges can be directed, colored, or textured, their thickness can be adjusted to represent any parameter of choice, and their length can be scaled in proportion to the strength of the connecting variable. With all of these options, it is possible to display eight or more variables within one network. Network renderings are projections from (N−1)-dimensional space (where N is the number of different mineral species) into two or three dimensions, although the multidimensionality is preserved in the original data object and therefore in any statistical metrics derived from the network data. Network metrics fall into two categories, the first of which comprises "local" metrics that describe the role and significance of individual nodes in a network.
Local metrics include degree, which is the number of links connected to a given node, and betweenness, a measure of the number of geodesic (shortest) paths that pass through a given node. The second category comprises "global" metrics, which are used to evaluate overall trends within a network and allow comparison of different networks, such as networks of minerals of different elements, from different environments or planetary bodies, or a series of networks over a time interval. Global metrics include density, which is the number of links divided by the number of possible links (i.e., a measure of the interconnectedness of a network), and centralization, a measure of how central a network's most central node is relative to all the other nodes (i.e., indicating whether there are many highly interconnected nodes or a few key "broker" nodes). Additionally, a number of network modularity and community detection algorithms allow users to determine whether there are distinct groups within their network and which nodes belong to those groups. With further exploration, users can determine what characteristics are shared within each group and/or between groups. Furthermore, random forest or decision tree algorithms can offer insight into the relative importance, or weight, of each characteristic in the network partitioning.
FIGURE 1 | (A) The frequency distribution of carbon minerals on Earth's surface. The x-axis, "frequency class," is the number of localities at which a mineral species occurs. The y-axis is the number of mineral species that occur at exactly the corresponding frequency class (i.e., nearly 100 carbon mineral species occur at exactly one locality). The blue line represents the Poisson lognormal LNRE model. (B) Accumulation curve for the mean number of carbon mineral species versus the number of localities sampled, N, based on the Poisson lognormal LNRE model. Today, there are 411 known carbon mineral species based on N = 92,466 sampled localities. As N approaches infinity, the median number of predicted carbon mineral species is 993 with a 95% posterior interval of (759, 1268).
Frontiers in Earth Science | www.frontiersin.org
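The local and global metrics described above can be computed with a few lines of standard-library Python. The sketch below uses an adjacency-set representation (an assumption for illustration), Brandes' algorithm for unnormalized betweenness, and Freeman's degree centralization, which is one of several centralization indices; the specific index used in the studies cited here is not implied.

```python
from collections import deque


def adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    """Local metric: number of links attached to each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def density(adj):
    """Global metric: links present divided by links possible."""
    n = len(adj)
    links = sum(len(nbrs) for nbrs in adj.values()) / 2
    return links / (n * (n - 1) / 2) if n > 1 else 0.0


def degree_centralization(adj):
    """Global metric (Freeman): 1.0 for a star, 0.0 when all degrees are equal."""
    n = len(adj)
    if n < 3:
        return 0.0
    deg = degree(adj)
    top = max(deg.values())
    return sum(top - d for d in deg.values()) / ((n - 1) * (n - 2))


def betweenness(adj):
    """Local metric: Brandes' algorithm, unweighted and undirected, unnormalized."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):       # accumulate dependencies back toward s
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted from both ends
```

A star-shaped network (one hub, such as a dominant mineral, linked to every other node) has maximal centralization, whereas a fully interconnected network has density 1 and zero centralization.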

Mineral Network Analysis
Mineral network analysis, a powerful approach to exploring complex multidimensional and multivariate systems, facilitates a holistic, integrated, higher-dimensional understanding of Earth and planetary systems (Morrison et al., 2017b; Hazen et al., 2019a,b). The Fruchterman-Reingold force-directed (Fruchterman and Reingold, 1991; Csardi and Nepusz, 2006) mineral coexistence networks rendered herein are of two types: unipartite and bipartite. Interactive, manipulable versions of these networks, including node labels, can be found at https://dtdi.carnegiescience.edu/node/4557.
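The force-directed layout named above can be sketched compactly. This is a minimal, unoptimized rendition of the Fruchterman-Reingold scheme (repulsion k²/d between all node pairs, attraction d²/k along edges, and a cooling "temperature" that caps each step); it is not the implementation cited above (Csardi and Nepusz, 2006). Two choices are assumptions for illustration: an edge weight (e.g., a co-occurrence count) multiplies the attractive force so that frequently co-occurring minerals sit closer together, and a `dim` parameter allows true 3D layouts.

```python
import math
import random


def fruchterman_reingold(nodes, edges, dim=2, iterations=200, size=1.0, seed=0):
    """Force-directed layout; edges are (u, v, weight) tuples."""
    rng = random.Random(seed)
    nodes = list(nodes)
    half = size / 2
    pos = {v: [rng.uniform(-half, half) for _ in range(dim)] for v in nodes}
    k = (size ** dim / len(nodes)) ** (1.0 / dim)  # ideal pairwise distance
    t0 = size / 10                                  # initial temperature

    def offset(u, v):
        delta = [a - b for a, b in zip(pos[u], pos[v])]
        dist = math.sqrt(sum(c * c for c in delta)) or 1e-9
        return delta, dist

    for step in range(iterations):
        temp = t0 * (1 - step / iterations)  # linear cooling
        disp = {v: [0.0] * dim for v in nodes}
        for i, u in enumerate(nodes):        # repulsion between every pair
            for v in nodes[i + 1:]:
                delta, dist = offset(u, v)
                f = k * k / dist
                for j in range(dim):
                    disp[u][j] += delta[j] / dist * f
                    disp[v][j] -= delta[j] / dist * f
        for u, v, w in edges:                # weighted attraction along edges
            delta, dist = offset(u, v)
            f = w * dist * dist / k
            for j in range(dim):
                disp[u][j] -= delta[j] / dist * f
                disp[v][j] += delta[j] / dist * f
        for v in nodes:                      # move, capped by temperature, kept in frame
            length = math.sqrt(sum(c * c for c in disp[v])) or 1e-9
            move = min(length, temp)
            for j in range(dim):
                new = pos[v][j] + disp[v][j] / length * move
                pos[v][j] = max(-half, min(half, new))
    return {v: tuple(p) for v, p in pos.items()}
```

The all-pairs repulsion makes each iteration quadratic in the number of nodes, which is acceptable for a sketch but is why production libraries use approximations for large networks.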

Unipartite Mineral Networks
In the unipartite networks (Figures 2-4), each node represents a mineral species; the nodes are sized according to the frequency of occurrence of each species and colored according to chemistry, paragenetic mode, or structural complexity; the edges represent co-occurrence (which may or may not correspond to an equilibrium assemblage) of two mineral species at a locality on Earth's surface, and edge length is scaled inversely proportional to the frequency of co-occurrence (i.e., when two species occur together more frequently, they are closer together in the graph). Note that while the nodes of Figures 2-4 are colored according to various parameters (e.g., composition, paragenetic mode), those parameters were not coded into the network layout; the network topology and any trends are strictly a function of mineral co-occurrence. A number of interesting trends can be observed in the topologies of unipartite mineral co-occurrence networks. First, the copper (Cu) networks show a high density and low centralization; in the Cu network colored by chemistry (Figure 2A), there is strong chemical segregation in which sulfides (red nodes) cluster together, as do sulfates (yellow nodes) and Cu minerals containing oxygen but no sulfur (blue nodes) (Morrison et al., 2017b; Hazen et al., 2019a,b). This chemical segregation produces chemical trendlines across the graph, including sulfur fugacity, fS₂, increasing from the bottom of the graph (oxides) to the top (through sulfates and into sulfides), and oxygen fugacity, fO₂, increasing from the top left (sulfides) to the bottom (sulfates and oxides). For any variable that exhibits an embedded trendline, that trend can be used to predict the value of the variable for any node in which the value is unknown. For chemical variables in mineral networks representing equilibrium assemblages, this could allow the extraction of thermochemical parameters.
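Building a unipartite co-occurrence network and using it to fill in a missing value can both be sketched from mineral-locality pairs. Here the edge weight is the number of shared localities (its inverse could serve as an edge length), and an unknown node value is estimated as the co-occurrence-weighted mean of its neighbors' known values. This neighbor-averaging rule is a hypothetical stand-in for a fitted predictive model, and the mineral labels and values are toy data.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(pairs):
    """Weighted unipartite network: edge weight = number of shared localities."""
    minerals_at = defaultdict(set)
    for mineral, locality in pairs:
        minerals_at[locality].add(mineral)
    weights = defaultdict(int)
    for minerals in minerals_at.values():
        for a, b in combinations(sorted(minerals), 2):
            weights[(a, b)] += 1
    return dict(weights)


def predict_from_neighbors(weights, known, target):
    """Estimate an unknown node value as the weighted mean of its neighbors' values."""
    total = value = 0.0
    for (a, b), w in weights.items():
        other = b if a == target else a if b == target else None
        if other is not None and other in known:
            total += w
            value += w * known[other]
    return value / total if total else None  # None when no neighbor has a value
```

Because strongly co-occurring neighbors dominate the average, this rule exploits the same spatial clustering that produces visible trendlines in the rendered network.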
Second, Figure 2B renders the Cu network with nodes colored by crystal structural complexity (Hazen et al., 2013b; Krivovichev, 2013, 2016, 2018; Krivovichev et al., 2017, 2018). Structural complexity is a mathematical measure that evaluates the symmetry and chemical complexity of a mineral's crystal structure and IMA-approved ideal chemical formula and converts that complexity into information, measured in bits. Krivovichev et al. (2018) hypothesize that the crystal structure complexity exhibited by minerals has increased through deep time, with the simplest structures appearing at the earliest stages of mineral evolution and structures becoming increasingly complex toward the present day. In this network, there is segregation resulting in a trendline from the simplest crystal structures to moderately complex structures. The most complex structures are few and scattered throughout the network, an unexpected trend that warrants further investigation, as does the question of whether age of first occurrence plays a role in the structural complexity trends observed in Figure 2B. Third, the chromium (Cr) network (Figure 3) has a very low density and high centralization, with the mineral chromite having the highest centrality (Morrison et al., 2017b; Hazen et al., 2019a,b). The most notable feature of the Cr co-occurrence network is its strong clustering by paragenetic mode, indicating that formational environment and mode are the strongest drivers of Cr mineral co-occurrence. Lastly, Figure 4 illustrates the changes in carbon mineral co-occurrence through deep time. The earliest known carbon minerals are few and form a dense, highly interconnected network with low centralization.
Toward the present day, the density decreases slightly while the centralization becomes significantly more pronounced, forming two lobes of carbon mineral populations connected by a few key nodes of high centrality; this structure begins to emerge as early as 799 Ma and becomes very distinct by 251 Ma, contemporaneous with the end-Permian mass extinction. The two lobes comprise different populations of carbon mineral chemistry: the left lobe contains a much higher proportion of organic carbon minerals and of hydrous phases containing transition elements, lanthanides, and/or actinides, whereas the right lobe has a higher frequency of anhydrous phases lacking transition elements, lanthanides, or actinides. This unexpected trend and its underlying geologic or biologic implications are the subject of further study.
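Cumulative time bins like those of Figure 4 can be approximated as follows, under assumptions that are ours and purely illustrative: a mineral enters the network at cutoff t (Ma) if its oldest known occurrence is at least t Ma old, and co-occurrence edges are read from present-day locality lists restricted to those minerals. Present-day co-occurrence does not guarantee that two minerals coexisted at time t, so this is only a rough proxy. The sketch reports node count, edge count, and density for each cutoff; the function name is hypothetical.

```python
from itertools import combinations


def cumulative_networks(oldest_age_ma, minerals_at_locality, cutoffs_ma):
    """Node count, edge count, and density of cumulative co-occurrence networks.

    oldest_age_ma: {mineral: age of oldest known occurrence in Ma}
    minerals_at_locality: {locality: iterable of minerals reported there}
    """
    results = {}
    for t in cutoffs_ma:
        # minerals that had already appeared by time t
        present = {m for m, age in oldest_age_ma.items() if age >= t}
        edges = set()
        for minerals in minerals_at_locality.values():
            edges.update(combinations(sorted(present.intersection(minerals)), 2))
        n = len(present)
        possible = n * (n - 1) / 2
        results[t] = (n, len(edges), len(edges) / possible if possible else 0.0)
    return results
```

Applied to real data, a decreasing density across successively younger cutoffs would mirror the trend described for the carbon networks, while centralization (not computed here) would track the emergence of broker nodes.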

Bipartite Mineral Network
In the bipartite network rendering (Figure 5), one set of colored nodes represents carbon mineral species, sized by their frequency of occurrence and colored according to the age of the oldest known occurrence (Hazen, 2019; Hazen et al., 2019a). The other set of nodes, in black, represents the localities at which the carbon minerals occur, sized proportionally to their carbon mineral diversity (i.e., the number of mineral species found at a locality). An edge between two nodes signifies that a mineral occurs at a locality. Mineral bipartite diagrams illustrate many relationships between carbon minerals and their locations on Earth's surface. The first surprising feature of the network is the "U-shaped" (or "vase-shaped" in 3D; see "Advanced Mineral Network Visualizations" section below) distribution of locality nodes. This topology provides a striking visual representation of mineral ecology, specifically the LNRE frequency distribution in which there are a few very common species (such as calcite and aragonite), but most species are rare. In the network graph, the most common minerals fall at the bottom of the locality "U"; the frequency of occurrence falls off quickly moving up and out of the locality "U", ultimately radiating outward and around the locality nodes, where the majority of carbon minerals lie, most of which have small radii (i.e., they occur at very few localities). Another related feature clearly visible in the rendering is that rare mineral species tend to occur at localities rich in other rare species, as opposed to localities dominated by more common species. This is visible at the individual node level, but also in the overall topology of the network: the mineral diversity of the localities, and therefore the size of the locality nodes, decreases from top to bottom as the network trends from rarer mineral species to more common ones. This trend gives researchers exploration targets for new, rare mineral species: localities already known to host other rare mineral species. This qualitative observation can be parlayed into a quantitative method, specifically affinity analysis (see "Affinity Analysis" in the future directions section below), for predicting new locations of existing mineral species, predicting which minerals are likely to occur but have not been reported at a given locality, and possibly predicting the most likely locations for finding new mineral species. Additionally, an embedded timeline can be observed in the carbon mineral-locality network topology. The nodes of Figure 5 are colored according to the age of first occurrence; however, their ages were not coded into the network layout, meaning that the network topology is strictly a function of mineral-locality occurrence.
FIGURE 3 | Unipartite chromium mineral network. Force-directed, unipartite, chromium (Cr) mineral network rendering. Nodes represent Cr mineral species, sized according to their frequency of occurrence and colored according to their paragenetic mode. Edges represent co-occurrence of mineral species at localities on Earth's surface; edge length is scaled inversely proportional to frequency of co-occurrence.
FIGURE 4 | Evolving unipartite carbon mineral networks. Force-directed, unipartite, carbon mineral network renderings. Nodes represent carbon mineral species, sized according to their frequency of occurrence and colored according to composition. Edges represent co-occurrence of mineral species at localities; edge length is scaled inversely proportional to frequency of co-occurrence. Each network represents a cumulative time bin, illustrating the changes in carbon mineralogy on Earth's surface through deep time.
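Affinity analysis is not specified in detail here; one common form is pairwise association-rule mining, borrowed from market-basket analysis, sketched below with hypothetical function names and toy data. For a pair of minerals x and y, support counts localities hosting both, confidence is the fraction of x-bearing localities that also host y, and lift compares that confidence with y's overall frequency (lift above 1 suggests affinity). Minerals implied by confident rules but not yet reported at a locality become candidate exploration targets.

```python
from collections import Counter
from itertools import combinations


def association_rules(minerals_at_locality, min_support=2):
    """Pairwise rules x -> y as (x, y, support, confidence, lift), highest lift first."""
    n = len(minerals_at_locality)
    single, pair = Counter(), Counter()
    for minerals in minerals_at_locality.values():
        unique = set(minerals)
        single.update(unique)
        pair.update(combinations(sorted(unique), 2))
    rules = []
    for (a, b), support in pair.items():
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = support / single[x]     # share of x-bearing localities with y
            lift = confidence / (single[y] / n)  # > 1: co-occur more than independence predicts
            rules.append((x, y, support, confidence, lift))
    return sorted(rules, key=lambda r: (-r[4], r[0], r[1]))


def suggest(minerals_present, rules, min_confidence=0.5):
    """Minerals implied by confident rules but not yet reported at a locality."""
    present = set(minerals_present)
    return sorted({y for x, y, _, conf, _ in rules
                   if x in present and y not in present and conf >= min_confidence})
```

With rare species, supports are small by definition, so the `min_support` threshold trades off sensitivity to rare affinities against noise from chance co-occurrences.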
Despite the lack of age information encoded in the topological layout, the oldest known minerals occur at the bottom of the locality "U" and radiate up and outward as the minerals become younger, with the youngest minerals skirting the outside of the locality "U". Observing trendlines in any network system can lead to predictions of missing values, but age, in particular, offers the opportunity to pin other parameters, such as chemistry, structural complexity, and bioavailability, to a timeline and therefore to relate these parameters to geologic, biologic, or planetary events throughout deep time.

Advanced Mineral Network Visualizations
Network renderings are projections from multidimensional space into two dimensions and inherently lose some information. It is therefore important to explore variations in visualization technique that maximize the accuracy and amount of information rendered. With this in mind, we are developing 3D networks and exploring virtual reality (VR) techniques for visualization and network manipulation. VR offers two primary benefits for visual analytics: (1) the ability to employ true 3D layouts that are not projected onto 2D displays, which offer additional insight, especially for very dense networks; and (2) direct, natural interaction with a network, together with observation of how the network responds, which creates an additional dimension for analysis not captured in static or non-interactive visualizations. A video demonstration of an early VR visualization prototype of the bipartite carbon mineral-locality network in Figure 5 can be viewed at the following link². The locality "U-shape" observed in the 2D version becomes a "vase" shape in 3D, with the most common, oldest carbon minerals at the base of the locality vase and the youngest, rarest carbon minerals radiating out of the top of the vase and down its sides. This and other networks can also be explored immersively in VR.
² https://www.youtube.com/watch?v=5GDnpqpOokU

ORGANIC CARBON MINERALOGY IN EARLY EARTH ENVIRONMENTS AND PLANETARY SYSTEMS
Currently, there are more than 50 organic mineral species approved by the IMA (Skinner, 2005; Perry et al., 2007; Echigo and Kimata, 2010; Hazen et al., 2013a; Piro and Baran, 2018), most of which form through alteration of biological materials (Oftedal, 1922; Rost, 1942; Nasdala and Pekov, 1993; Perry et al., 2007; Chesnokov et al., 2008; Witzke et al., 2015; Pekov et al., 2016; Hummer et al., 2017). Recent discoveries and new studies of organic minerals (Bojar et al., 2017; Hummer et al., 2017; Mills et al., 2017) and of minerals with metal-organic framework structures, which contain metal centers bonded via molecular linkers into porous assemblies of different dimensionalities (Huskić et al., 2016), led to the formulation of novel geomimetic approaches to the design and synthesis of metal-organic frameworks (Huskić and Friščić, 2018, 2019; Li et al., 2019). Most organic minerals observed on Earth today are oxalates and carboxylates of low nutrient value to microbes (Benner et al., 2010) and are therefore able to persist on a planet teeming with life. The presence of life limits the long-term survival of other organic crystals on modern Earth, but such crystals, including co-crystals, could have existed on early Earth and may currently exist on other planetary bodies (Hazen, 2018; Maynard-Casely et al., 2018; Morrison et al., 2018). Organic molecules can be created by abiotic processes (Glasby, 2006; Fu et al., 2007; Kolesnikov et al., 2009; McCollom, 2013; Sephton and Hazen, 2013; Huang et al., 2017).
They have been found in many planetary settings, including meteorites (Cooper et al., 2001; Sephton, 2002; Pizzarello et al., 2006; Burton et al., 2012; Sephton and Hazen, 2013; Kebukawa and Cody, 2015; Cooper and Rios, 2016; Koga and Naraoka, 2017) and comets (Kimura and Kitadai, 2015), and have been detected or are hypothesized to exist on many other bodies in our solar system, including Mars, Titan, Enceladus, Callisto, and Ganymede (McCord et al., 1997; Formisano et al., 2004; Cable et al., 2012, 2018; Kimura and Kitadai, 2015; Webster et al., 2015, 2018; Zolensky et al., 2015; Hand, 2018; Maynard-Casely et al., 2018). On bodies with low temperatures, clathrates may also contain and protect organic molecules (Kvenvolden, 1995; Buffett, 2000; Shin et al., 2012; Hazen et al., 2013c; Maynard-Casely et al., 2018). Therefore, the earliest, prebiotic minerals on Earth's surface, many of which may currently be present on other planetary bodies, were likely organic crystalline compounds, such as amino acids, nucleobases, hydrocarbons, cocrystals, clathrates, and other species that have since been consumed by cellular life here on Earth (Hazen, 2018; Morrison et al., 2018).

FIGURE 5 | Bipartite carbon mineral-locality network. Force-directed, bipartite carbon mineral-locality network rendering. Colored nodes represent carbon mineral species, sized according to their frequency of occurrence and colored according to age of first occurrence. Black nodes represent carbon mineral localities on Earth and are sized according to mineral diversity (i.e., the number of mineral species found at a locality). Edges represent the occurrence of a mineral species at a locality.

Network Structure Quantification
Many trends associated with geologic or planetary processes have been recognized in the topologies of mineral networks, and many more likely remain unrecognized within mineral network topologies and/or the underlying data objects. It is therefore imperative to develop statistical methods for quantifying mineral network structures and relating these structures to their underlying geologic, biologic, or planetary drivers (Hystad et al., 2019b). Such methods will allow for the systematic study of network features, such as degree distribution, distribution of shared partners, centrality, clustering, connected subgraphs, and cliques, and will employ exponential random graph models (ERGMs) (Frank and Strauss, 1986; Snijders, 2002; Hunter and Handcock, 2006; Snijders et al., 2006; Hunter, 2007; Pattison et al., 2007; Lusher et al., 2012). The models will determine whether substructures within a network occur more often than would be expected by chance. They will also determine which attributes are most significant to mineral co-occurrence, or to any other relationship of interest: for example, whether minerals of the same paragenetic mode tend to be found at the same locality or whether some other parameter is more influential. The ERGM will be expanded to include multilevel networks (Wang et al., 2013), such as one linking mineral species, their localities, and their chemical compositions. The multilevel approach will provide a means to model the complex dependence structures and interactions among the many network parameters. Additionally, we will employ a latent network model, which incorporates latent variables to represent unobserved factors that underlie the expression of network structures (Kolaczyk, 2009b; Kolaczyk and Csárdi, 2014).
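The core question an ERGM answers, whether a substructure occurs more often than chance would predict, can be previewed with a much simpler test. The sketch below is not an ERGM fit (ERGMs are typically estimated with dedicated software such as the statnet `ergm` package for R); it swaps in a Monte Carlo comparison of the observed triangle count in a unipartite mineral co-occurrence network against uniform random graphs with the same numbers of nodes and edges. The toy network is hypothetical.

```python
import itertools
import random


def count_triangles(nodes, edges):
    """Count triangles (3-cliques) in an undirected graph."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return sum(
        1
        for u, v, w in itertools.combinations(nodes, 3)
        if v in adj[u] and w in adj[u] and w in adj[v]
    )


def triangle_excess_test(nodes, edges, n_sims=1000, seed=0):
    """Compare the observed triangle count with uniform random graphs G(n, m).

    Returns (observed count, Monte Carlo p-value for 'at least this many triangles').
    """
    rng = random.Random(seed)
    all_pairs = list(itertools.combinations(nodes, 2))
    observed = count_triangles(nodes, edges)
    at_least_as_many = 0
    for _ in range(n_sims):
        simulated = rng.sample(all_pairs, len(edges))  # same node set and edge count
        if count_triangles(nodes, simulated) >= observed:
            at_least_as_many += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (at_least_as_many + 1) / (n_sims + 1)


# Hypothetical co-occurrence network: an edge joins two minerals found at a common locality.
minerals = ["calcite", "dolomite", "aragonite", "malachite", "azurite", "cerussite"]
cooccurrence = [
    ("calcite", "dolomite"), ("calcite", "aragonite"), ("dolomite", "aragonite"),
    ("malachite", "azurite"), ("malachite", "cerussite"), ("azurite", "cerussite"),
]
observed, p_value = triangle_excess_test(minerals, cooccurrence)
print(f"observed triangles: {observed}, Monte Carlo p = {p_value:.3f}")
```

A small p-value would suggest more clustering than a random graph of the same density produces; an ERGM goes further by modeling several such statistics, and node attributes such as paragenetic mode, jointly.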

Natural Kind Clustering
The physical and chemical attributes of minerals are a direct product of, and therefore encode, their formational conditions and any subsequent weathering and alteration. Multivariate correlation of these attributes will therefore allow minerals to be associated with their paragenetic modes, resulting in a number of distinct "natural kinds" within a mineral species (Hazen, 2019). For example, diamond may have many "natural kinds," including stellar vapor-deposited diamonds (Hazen et al., 2008; Ott, 2009; Hazen and Morrison, 2019), Type I (Davies, 1984; Shirey et al., 2013; Sverjensky and Huang, 2015), Type II (Smith et al., 2016), and carbonado (Heaney et al., 2005; Garai et al., 2006). Cluster analysis and classification algorithms will allow the characterization and designation of the various natural kinds of each mineral species and thereby relate the wealth of information contained within mineral samples to their geologic, biologic, and/or planetary origins. The natural kinds of minerals in the earliest environments of our universe are designated in Hazen and Morrison (2019) and Morrison and Hazen (2020), and preliminary work is underway to classify the natural kinds of many other mineral species, with a particular focus on carbon-bearing phases, including diamond, calcite, and aragonite (Boujibar et al., 2019; Zhang et al., 2019).
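As an illustration of the clustering step, the sketch below standardizes a table of mineral-sample attributes and groups the samples with k-means, one of many possible clustering algorithms. The two attributes, their values, and the choice of k = 2 are hypothetical; a real natural-kind study would use many more measured properties and would need to justify the number of clusters.

```python
import math
import statistics


def standardize(rows):
    """Rescale each attribute (column) to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.pstdev(c) or 1.0 for c in columns]  # guard constant columns
    return [tuple((x - m) / s for x, m, s in zip(row, means, sds)) for row in rows]


def kmeans(points, k, n_iter=100):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each sample joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(tuple(map(statistics.mean, zip(*members))) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical attribute table: one row per sample, (attribute A, attribute B).
samples = [(1200, 2.0), (1100, 2.5), (1300, 1.8), (5, 9.0), (10, 8.5), (2, 9.4)]
labels, _ = kmeans(standardize(samples), k=2)
print(labels)
```

Standardizing first matters because attributes measured on very different scales would otherwise let the largest-valued attribute dominate the distance calculation.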

Affinity Analysis
The mineral co-occurrence information stored in the MED and Mindat.org provides the means to predict the most likely locations of certain mineral species, geologic settings, deposits, and/or planetary environments, as well as a probabilistic list of minerals likely to occur at any given locality (Prabhu et al., 2019b; …). Affinity analysis is a machine learning method that discovers relationships between entities in a dataset: it analyzes co-occurrence data and identifies strong rules based on associations between entities. Agrawal and Srikant (1994) presented two algorithms for generating such association rules, Apriori and AprioriTid. Apriori uses a bottom-up approach in which subsets of entities that frequently co-occur are generated as candidates for testing against the data. The occurrences of the candidates are then counted, and patterns observed in these counts are used to generate rules. For example, consider the following small carbon mineral dataset:

Based on the occurrence of candidates, we can create the following rules:
• 75% of the sets with malachite also contain calcite.
• 75% of the sets with malachite also contain azurite.
• 50% of the sets with malachite and calcite also contain azurite.
• 25% of the sets with malachite and calcite also contain dolomite.
• 25% of the sets with malachite and azurite also contain dolomite.
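The bottom-up candidate generation described above can be sketched in a few lines. This is a minimal, unoptimized Apriori-style implementation (level-wise join and prune steps) plus a rule-confidence helper; the five "locality" sets are hypothetical and are not the dataset from which the rules above were derived.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset(itemset): support} for frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    size = 1
    while candidates:
        # Count candidates against the data and keep those meeting the threshold.
        level = {c: support(c) for c in candidates}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        size += 1
        # Join step: merge frequent itemsets into candidates one item larger.
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: a candidate survives only if all its subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return frequent


def confidence(transactions, antecedent, consequent):
    """Share of transactions containing `antecedent` that also contain `consequent`."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    with_a = [t for t in map(frozenset, transactions) if a <= t]
    return sum(both <= t for t in with_a) / len(with_a)


# Hypothetical carbon mineral assemblages, one set per locality.
localities = [
    {"malachite", "calcite", "azurite"},
    {"malachite", "calcite", "dolomite"},
    {"malachite", "azurite"},
    {"malachite", "calcite", "azurite", "dolomite"},
    {"calcite", "dolomite"},
]
frequent = apriori(localities, min_support=0.4)
print(confidence(localities, {"malachite"}, {"calcite"}))  # rule malachite -> calcite: 0.75
```

Raising `min_support` shrinks the candidate sets at every level, which is what makes the level-wise pruning efficient on large occurrence tables.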
Such association rules can be used to predict the probability of occurrence of certain minerals or mineral assemblages, given the currently known mineralogy of a locality. The method therefore allows prediction of the most probable locations on Earth or other planetary bodies to find mineral species or assemblages of interest, as well as certain geologic settings, deposits, or environments. Likewise, it can assess the probability of finding any given mineral species at a locality in question. In a preliminary case study on Mindat.org mineral occurrence data, pairwise correlations (i.e., candidates of size 2) were used to predict a likely locality of the mineral species wulfenite. The model predicted the Surprise Mine, Cookes Peak District, Luna County, NM, United States, as a very likely new locality for wulfenite (locality 3). Erin Delventhal, a member of the Mindat.org management team, validated this prediction by visiting Cookes Peak and positively identifying an occurrence of wulfenite (image of collected sample 4). These preliminary results highlight the promise of affinity analysis for discovery in mineral systems.
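A pairwise (size-2) version of this idea can be sketched as a ranking: for each locality that does not yet report a target mineral, take the highest confidence of any rule "x → target" among the minerals x already known there. This max-confidence scoring is a hypothetical illustration, not the model used in the wulfenite case study, and the locality data are invented.

```python
from collections import Counter
from itertools import combinations


def pairwise_confidence(localities):
    """Build conf(x, target) = P(target present | x present) from co-occurrence counts."""
    occurrences = Counter()
    pairs = Counter()
    for minerals in localities.values():
        occurrences.update(minerals)
        for a, b in combinations(sorted(minerals), 2):
            pairs[(a, b)] += 1

    def conf(x, target):
        return pairs[tuple(sorted((x, target)))] / occurrences[x]

    return conf


def rank_localities(localities, target):
    """Score localities lacking `target` by their strongest single-mineral rule."""
    conf = pairwise_confidence(localities)
    scores = {
        name: max((conf(x, target) for x in minerals), default=0.0)
        for name, minerals in localities.items()
        if target not in minerals
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical locality inventories.
localities = {
    "Locality A": {"wulfenite", "cerussite", "mimetite"},
    "Locality B": {"wulfenite", "cerussite"},
    "Locality C": {"cerussite", "mimetite"},
    "Locality D": {"calcite", "dolomite"},
}
for name, score in rank_localities(localities, "wulfenite"):
    print(f"{name}: {score:.2f}")
```

Taking the maximum rewards a single strong association; averaging or combining rules of larger size would be equally plausible design choices.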

GPlates Plate Tectonic Reconstructions
GPlates is open-source, cross-platform plate reconstruction software that enables users to incorporate any vector or raster data into digital community plate motion models (Merdith et al., 2017; Müller et al., 2018; Young et al., 2019). Incorporating mineralogical data into plate tectonic reconstructions (Figure 6) will illuminate tectonic drivers and feedbacks of mineralization through deep time, for example by identifying tectonic settings that preferentially generate or focus particular mineral species. We will begin to answer questions about the subduction conditions of subduction-related mineralization (e.g., depth of mantle wedge interaction, slab angle, rate of subduction, and devolatilization of the subducting slab), characterize mineralization associated with mantle plumes, hydrothermal settings, and collisional regimes, and identify mineralization that is clearly not controlled by tectonic influences. A video of a preliminary reconstruction model of carbon mineralization through deep time (from the present day back to 1.0 Ga) is available at https://4d.carnegiescience.edu/explore-our-science.
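Plate reconstructions of this kind move present-day points back to their paleopositions by finite rotations about Euler poles. The sketch below applies one such rotation to a mineral locality's coordinates using Rodrigues' rotation formula on a unit sphere. It does not use the GPlates software or its Python bindings, and the pole, angle, and locality coordinates are hypothetical; an actual reconstruction would take the rotation for the locality's plate ID and age from a published rotation model.

```python
import math


def to_xyz(lat_deg, lon_deg):
    """Latitude/longitude in degrees to a unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def to_latlon(v):
    """Unit vector to (latitude, longitude) in degrees."""
    x, y, z = v
    return math.degrees(math.asin(max(-1.0, min(1.0, z)))), math.degrees(math.atan2(y, x))


def rotate_about_pole(lat_deg, lon_deg, pole_lat, pole_lon, angle_deg):
    """Finite rotation of a point about an Euler pole (Rodrigues' formula).

    Positive angles rotate counterclockwise when viewed from above the pole.
    """
    v = to_xyz(lat_deg, lon_deg)
    k = to_xyz(pole_lat, pole_lon)
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_cross_v = (k[1] * v[2] - k[2] * v[1], k[2] * v[0] - k[0] * v[2], k[0] * v[1] - k[1] * v[0])
    k_dot_v = sum(a * b for a, b in zip(k, v))
    rotated = tuple(
        v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1 - cos_t) for i in range(3)
    )
    return to_latlon(rotated)


# Hypothetical locality and Euler rotation (not taken from any published model).
paleo_lat, paleo_lon = rotate_about_pole(-23.5, 135.0, pole_lat=60.0, pole_lon=20.0, angle_deg=-35.0)
print(f"reconstructed position: {paleo_lat:.2f}, {paleo_lon:.2f}")
```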

Quantifying Preservational Bias
Preservational and sampling biases are inherent to geologic materials; their magnitude is not uniform through time or across a system and can therefore be very difficult to quantify. Recent data-driven studies of mineralization associated with the assembly of Rodinia (Liu et al., …, 2018b) have examined differences in the mineralogy and geochemistry of igneous rocks associated with the assembly of the Rodinian supercontinent, as igneous rocks of Rodinian age tend to have different geochemical signatures from those associated with other supercontinents. The question remains: how much of the trend reflects conditions and processes during assembly, and how much reflects preservational and sampling bias? The same question arises in Figure 6, where major increases in carbon mineralization are associated with the younger megacontinent Gondwana and supercontinent Pangea, while the signal related to the assembly of Nuna and Rodinia is more subdued in the cumulative frequency plot. This question must also be asked of many other formational environments, including those relevant to carbon mineralization (e.g., carbonate platforms, carbonatites). Ongoing and future studies will attempt to quantify preservational bias in the mineralogical record by examining factors that contribute to preservation, such as mineral characteristics (e.g., solubility, hardness) and the common tectonic settings of mineral formation. It is also important to consider the human factors that govern sampling and may introduce sampling bias into datasets, including the economic significance of the material, its physical characteristics (e.g., color, crystal habit, size, luster), and its scientific importance. These data will be used to develop statistical models that predict the amount of erosional loss through deep time.

Microbial Populations and Mineral Systems
An underlying motivation for studying Earth's mineralogy through deep time is to gain insight into the co-evolution of the geosphere and biosphere. Mineral evolution studies characterize Earth's mineralogy during the time of life's emergence and throughout its evolution, but how do we understand the direct influences and feedbacks between Earth materials (e.g., the "geochemical environment") and microbial populations? Given the dearth of ancient microbial samples, we can examine modern-day equivalents, particularly in geochemical environments most likely to be analogous to ancient environments (e.g., hydrothermal vents, hot springs). A study is therefore underway to employ advanced analytics and visualization, including network analysis, to characterize the complex, multidimensional, multivariate relationships between the metagenomes of extant microbial populations and their geochemical environments (Buongiorno et al., 2019a,b, 2020; Giovannelli et al., 2019). Figures 7A,B illustrate a preliminary look at bipartite networks of sampling sites and their metagenomes (A) and mineralogy (B). A multilevel network approach (see the "Network Structure Quantification" section above) and transfer learning techniques will be used to relate location, metagenomic data, and mineralogy (Figures 7A,B), as well as aqueous geochemistry, temperature, pressure, pH, salinity, and other parameters, and to generate models quantifying the complex relationships among them. These studies are examining trends in metagenomic and geochemical parameters across a single arc system (Barry et al., 2019a,b), across multiple systems such as volcanic arcs, mid-ocean ridges, and hotspots, and across disparate systems around our planet, as depicted in Figures 7A,B (e.g., acid mine drainage, permafrost, and hot springs).
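One simple way to start relating the two bipartite layers in Figures 7A,B is to join them through their shared sampling sites, counting how many sites host each enzyme class together with each mineral. The sketch below does only that counting step; the site names, enzyme subclasses, and minerals are hypothetical, and the multilevel and transfer learning models described above would build on such counts rather than stop at them.

```python
from collections import Counter


def cross_layer_counts(site_enzymes, site_minerals):
    """Count sites shared by each (enzyme, mineral) pair across two bipartite layers."""
    counts = Counter()
    shared_sites = site_enzymes.keys() & site_minerals.keys()  # sites present in both layers
    for site in shared_sites:
        for enzyme in site_enzymes[site]:
            for mineral in site_minerals[site]:
                counts[(enzyme, mineral)] += 1
    return counts


# Hypothetical site inventories (EC 1 subclasses and minerals reported at each site).
site_enzymes = {
    "Site 1": {"EC 1.1", "EC 1.9"},
    "Site 2": {"EC 1.9"},
    "Site 3": {"EC 1.1"},
}
site_minerals = {
    "Site 1": {"calcite", "pyrite"},
    "Site 2": {"pyrite"},
    "Site 4": {"calcite"},
}
for (enzyme, mineral), n_sites in cross_layer_counts(site_enzymes, site_minerals).most_common():
    print(f"{enzyme} with {mineral}: {n_sites} site(s)")
```

Sites that appear in only one layer (Site 3 and Site 4 here) are ignored, since they carry no information about enzyme-mineral pairing.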
Targeting closely related systems, such as a single volcanic arc or all hotspot-related hot spring systems, allows changes in geochemical conditions to be tightly correlated with changes in microbial communities, because there is less variance in the environmental parameters. A more global comparison, by contrast, allows examination of the full range of environmental and microbial variables. Preliminary results show distinct, complex trends in geochemical parameters related to changes in the protein functions of microbial populations.

DISCUSSION
Motivated by the goal of understanding Earth's mineral diversity and distribution through deep time, the bioavailability of redox-sensitive elements during the emergence and evolution of life, biosignatures at mineralogical and planetary scales, and the underlying geologic and biologic drivers of mineralization, we have made many discoveries in carbon science, including: (1) Earth's mineralogy is a function of physical, chemical, and biological processes that differ at each stage of planetary evolution.
(2) Earth's mineral diversity and spatial distribution follow a Large Number of Rare Events (LNRE) distribution, a trend that is visually represented in the topology of mineral-locality bipartite network renderings and is likely a planetary-scale biosignature.
(3) Prediction of as-yet undiscovered mineral species, which spurred the Carbon Mineral Challenge, an initiative that has reported 30 new carbon mineral species in less than 3 years.
(4) Recognition of embedded trend lines in network topologies, such as those of chemical composition, crystal structural complexity, time, and paragenetic mode. In addition, this work has developed new tools for visualizing mineral systems, including mineral networks and 3D and VR platforms for exploring them. Furthermore, we are exploring, and are on the brink of discoveries related to, (1) quantifying mineral network structures and their underlying geologic, biologic, and planetary drivers, (2) predicting mineral paragenetic modes on Earth and other planetary bodies through natural kind clustering, (3) predicting new locations of mineral occurrence and missing minerals at specified locations on Earth's surface via affinity analysis, (4) investigating the tectonic drivers of mineralization through deep time by integration with paleotectonic reconstructions, (5) understanding the complex feedback systems that govern the relationships of mineralogy and the geochemical environment with microbial populations and their enzymatic functions, and (6) quantifying preservational and sampling bias in the mineralogical record. These recent discoveries and new research directions show great promise for further unraveling the complexities surrounding carbon mineral formation, the deep carbon cycle, and life's coevolution with Earth materials and processes.

FIGURE 6 | GPlates plate tectonic reconstruction snapshots with carbon mineral occurrences, and a cumulative frequency plot highlighting that some increases in carbon mineral occurrences are contemporaneous with changes in the supercontinent cycle. Full video (present day to 1.0 Ga) available at https://4d.carnegiescience.edu/explore-our-science.

FIGURE 7 | (A,B) Bipartite microbial population-locality and mineral-locality networks. Force-directed, bipartite networks from a preliminary analysis of the interaction between the metagenomes and mineralogy of the same sampling sites. (A) Metalloprotein oxidoreductases (colored nodes; Enzyme Commission class EC 1) and the sites where they were found (black nodes). Enzyme nodes are sized according to their counts and colored by subclass. (B) Bipartite network of the mineral diversity at the same sites. Mineral nodes are shown in gray, sized according to their mineral diversity; site nodes are shown in black.

AUTHOR CONTRIBUTIONS
The individuals listed with the following projects or databases provided discussion, performed analyses, and/or collected/curated data. JG, RH, RD, and SM: the International Mineralogical Association list of mineral species (RRUFF.info/IMA). CL, DH, JG, JR, RH, RD, SR, and SM: the Mineral Evolution Database (MED). AE, AP, JR, RH, RD, SM, and SK: the Mineral Properties Database (MPD). JR: Mindat.org. AE, AP, JG, JR, PF, RH, RD, SM, and SK: Global