The Gulf of Mexico in trouble: Big data solutions to climate change science

The latest technological advancements in the development and production of sensors have led to their increased use in marine science, expanding the volume and rate of data in the field. Extensive data collection efforts to monitor and maintain the health of marine environments support data-driven learning, which can help policymakers make effective decisions. Machine learning techniques show considerable promise for improving the quality and scope of marine research by detecting implicit patterns and hidden trends, especially in big datasets that are difficult to analyze with traditional methods. Machine learning is used extensively on marine science data collected in various regions, but it has not been applied in a significant way to data generated in the Gulf of Mexico (GOM). Machine learning methods applied to ocean science data are showing encouraging results and are therefore drawing interest from data science researchers and marine scientists. The purpose of this paper is to review existing approaches to studying GOM data and the state of the art in machine learning as applied to the GOM, and to propose solutions to GOM data problems. We review several issues faced by marine environments in the GOM in addition to climate change and its effects. We also present machine learning techniques and methods used elsewhere to address similar problems and propose applications to problems in the GOM. We find that harmful algal blooms (HABs), hypoxia, and sea-level rise have received less attention than other climate change problems, and that within the machine learning literature the impacts on estuaries and coastal systems, as well as oyster mortality (also major problems for the GOM), have been understudied; we identify these as important areas for improvement.
We anticipate that this manuscript will serve as a baseline for data science researchers and marine scientists to solve problems in the GOM collaboratively and/or independently.


Introduction
The Gulf of Mexico was formed by plate tectonics around 300 million years ago. Shallow continental shelf waters cover nearly half of the basin. The Gulf of Mexico's most significant biological and geological provinces are the coastal zone, continental shelf, continental slope, and abyssal plain. The coastal zone consists of tidal marshes, sandy beaches, mangrove-covered areas, and numerous bays, estuaries, and lagoons. Buried salt domes, which are associated with economically valuable oil and natural gas resources, occur at varying depths on the shelf and on the slope that descends to the abyssal plain. The Gulf of Mexico receives water from more than 150 rivers and runoff from 31 of the 50 states, making it a focal point for water quality research and improvement. The GOM has many natural resources and is home to many species of fish. Fishing is an important commercial activity, with red snapper, amberjack, tilefish, swordfish, and various groupers, as well as shrimp and crabs, being the most common catches. Many of the bays and sounds are also used to harvest oysters on a massive scale. Shipping, petrochemical processing and storage, military use, paper manufacturing, and tourism are other key sectors along the Gulf coast (Worrall and Snelson, 1989).
The Gulf of Mexico has mixed semi-diurnal and diurnal tidal cycles due to the unusual shape of its basin. The warm water of the Gulf of Mexico can support major Atlantic hurricanes that bring widespread death and destruction, as Hurricane Katrina did in 2005. Continental shelf shallower than 200 m makes up approximately 30% of the GOM, and the shallow waters of the northern GOM are warm temperate. Hurricanes do intensify over these shallow waters despite their low tropical cyclone heat potential (Potter et al., 2019). Although the water temperature drops when a hurricane passes through, it quickly recovers and becomes capable of sustaining another tropical cyclone (NASA Earth Observatory, 2005). The nitrogen and phosphate loads carried into the Gulf by the massive freshwater discharge of the Mississippi River and other rivers have increased (Tian et al., 2020), and as a consequence hypoxic zones have formed in the Gulf (Diaz and Rosenberg, 2008). In addition, the GOM is one of the world's major offshore petroleum-producing zones, accounting for one-sixth of all US output. The GOM has suffered catastrophic oil spills such as Deepwater Horizon and Ixtoc I, and it experiences thousands of minor oil spills every year. Oil spills, agricultural runoff, red-tide algal blooms, and hypoxic zones are some of the environmental threats that, in addition to climate change, affect the GOM.
As the largest ecosystem on Earth, the oceans are heavily affected by climate change. On local to global scales, oceans significantly affect the weather, and changes in climate can adversely affect many of the ocean's properties. The primary indicators of climate change in the oceans are sea surface temperature, sea-level rise, coastal flooding, and ocean acidity. The GOM exhibits these primary indicators and is therefore vulnerable to climate change (Epstein, 2005). Rising ocean surface temperatures, freshwater and nutrient inputs, and atmospheric CO2 levels will further aggravate these problems. Increasing ocean acidity and freshwater runoff also exacerbate hypoxic zones and algal blooms. The Deepwater Horizon oil spill was a catastrophic event that led to expanded knowledge of the GOM and to restoration efforts. However, there is still much to be learned, and we should not wait for further disasters to happen.
Today, we are flooded with data originating from many sources in a wide range of application areas. It is fair to say that we are in an era of "big data" (Data, 2008). With the abundance of data, decisions that previously relied on speculation or simulated models can now be grounded in data. Big data is revolutionizing scientific research, and many sectors benefit from using historical and real-time data for better analysis. However, oceanography has yet to fully embrace the era of big data. Ocean data come from models, satellites, buoys, research vessels, sensor networks, unmanned marine vehicles, and other sources, but the majority consists of remote sensing and model data. Most ocean data are hosted by regional and global ocean observation platforms such as OceanCube (Stanic et al., 2020), the U.S. Integrated Ocean Observing System (IOOS) Program, and the Gulf of Mexico Coastal Ocean Observing System (GCOOS). Researchers, modelers, and industries use these platforms to obtain data and make better use of it. However, our capacity for data analysis has not kept pace, and the growing gap is becoming a major bottleneck for making efficient use of the available data, as well as a roadblock to expanding data collection further.
Machine learning, itself a subfield of artificial intelligence, is broadly used in fields such as epidemiology (Appice et al., 2020), physics (Carleo et al., 2019), ecology (Huang et al., 2018), data science (Huang et al., 2016), agriculture (Liakos et al., 2018), and language processing (Iliev et al., 2022), but it has only recently been adopted in marine science. Given the abundance of ocean science data available, machine learning has great potential to address many problems and expand marine science research. Machine learning techniques have been used to address climate change, climate analysis, ecological environments, natural disasters, and global patterns (Rasouli et al., 2012; Kim et al., 2014; Deo and Sahin, 2015; Mosavi et al., 2018; Xiao et al., 2019; Mansfield et al., 2020; Rosso et al., 2020). However, although climate change is a global problem and oceans cover more than 70% of Earth's surface, marine science has yet to fully adopt machine learning to address local and global problems using the available ocean data. While machine learning is an active topic for researchers from many fields, it is not being used to its full potential with the data available for the GOM region to address climate change and regional problems. This paper aims to provide an overview of critical problems in the GOM that need the attention of machine learning experts who can make use of the ocean science data originating in the GOM. We discuss innovative engineering efforts and ocean observing platforms in the GOM that provide historical and real-time data from several sources, which marine researchers and data scientists can use to better understand the GOM. We highlight topics including, but not limited to, climate change and its impacts, CO2 emissions, ocean acidification, algal blooms, oyster reefs, severe weather events, and port/coastal resilience.

Climate change and other concerns for the Gulf of Mexico
The Gulf of Mexico is a vast ecological system of environmental and economic importance. It supports diverse ecosystems that provide significant benefits such as wildlife habitat, erosion prevention, stable shorelines, hurricane buffering, fish nurseries, improved water quality, and tourism. The major sectors of the GOM's economy rely directly and indirectly on these coastal ecosystems. However, global warming and climate change are having a drastic impact on these ecosystems, posing a serious threat that needs more attention from diverse researchers to understand, analyze, and mitigate future impacts.
Climate controls many of the fundamental factors that regulate algal growth, such as water temperature, nutrients, light, and grazers, and can therefore be expected to drive changes in the species composition, trophic structure, and function of aquatic ecosystems. Global temperatures are rising, a fact now widely acknowledged to be linked to human activity (Lee, 2007). Over the next century, average sea surface temperatures are anticipated to rise by up to 5°C, resulting in ice melt and altered precipitation in many maritime locations (Scavia et al., 2002; Levina et al., 2007; Doney, 2010; Hoegh-Guldberg and Bruno, 2010; Doney et al., 2012). Regional minimum and maximum temperatures have already increased by 0.8°C and 0.4°C, respectively, since 1950 (Levina et al., 2007), and the GOM sea surface temperature is estimated to be warming at 0.19°C per decade (Wang et al., 2023). Under future climate conditions, coupled model simulations show a weakening of the Loop Current in the GOM during the 21st century (Liu et al., 2012). Misra et al. (2019) suggest that in the projected climate, warming of the GOM will be enhanced by increased heat flux from the atmosphere to the ocean and by increased anomalous advective heat flux convergence resulting from the slowing of heat transport across the Loop Current. Seasonal warming of ocean surfaces, intensified by global warming and rising atmospheric carbon dioxide levels, is likely to increase water stratification and worldwide deoxygenation, aggravating red tides, hypoxia, aquatic species migration, and diseases (Keeling et al., 2010; Cheng et al., 2017). At the same time, anthropogenic CO2 uptake is predicted to lower ocean pH and carbonate saturation levels (Sarmiento et al., 1998; Feely et al., 2009).
This confluence of stressors could have serious consequences for ocean ecosystems and fisheries, particularly in coastal areas where eutrophication has already resulted in hypoxic and acidic conditions (Boyd and Doney, 2003; Bopp et al., 2013; Melzner et al., 2013; Altieri and Gedan, 2015; Breitburg et al., 2015; Flynn et al., 2015; Levin et al., 2015; Laurent et al., 2018).
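Warming rates such as the 0.19°C per decade cited above are usually reported as the slope of a linear trend fitted to a temperature time series. The minimal sketch below, which is not the method used by Wang et al. (2023), fits an ordinary least-squares line to annual mean values and rescales the slope to degrees per decade. The function name and the synthetic series (built with an exact trend of 0.019°C per year) are hypothetical and serve only to check the arithmetic; a real analysis would also need to handle seasonality, missing data, and serial correlation.

```python
def linear_trend_per_decade(years, values):
    """Ordinary least-squares slope of values against years, in units per decade."""
    if len(years) != len(values) or len(years) < 2:
        raise ValueError("need at least two paired observations")
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    # Covariance and variance terms of the OLS slope estimator
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    slope_per_year = sxy / sxx
    return slope_per_year * 10.0  # convert per-year slope to per-decade


# Hypothetical synthetic series: 26.0 degC in 1980, warming by exactly 0.019 degC/yr
years = list(range(1980, 2021))
sst = [26.0 + 0.019 * (y - 1980) for y in years]
print(round(linear_trend_per_decade(years, sst), 3))  # 0.19
```

Because the synthetic series is perfectly linear, the fit recovers the imposed trend exactly; with observed data the same function yields an estimate whose uncertainty must be assessed separately.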
Climate change is poised to exacerbate the impacts of coastal eutrophication in the northern Gulf of Mexico (Laurent et al., 2018). Coastal eutrophication is often accompanied by hypoxia (Howarth, 1995; Nixon et al., 1995; Vitousek et al., 1997; Caraco and Cole, 1999; Bennett et al., 2001). The increasing frequency of hypoxia and anoxia in shallow coastal and estuarine areas is very likely a result of human activities (Diaz et al., 1995). One of the key stresses affecting estuarine and coastal ecosystems is nutrient over-enrichment from anthropogenic sources (Bricker, 1999; Howarth et al., 2000; Cloern, 2001). In many parts of the world, there is growing concern that an overabundance of nutrients from various sources is having widespread biological repercussions in shallow coastal waters. Reduced light penetration, increased abundance of nuisance macroalgae, loss of aquatic habitat such as seagrass or macroalgal beds, noxious and toxic algal blooms, hypoxia and anoxia, disruptions in trophic interactions and food webs, and impacts on living resources are just a few of the consequences (Vitousek et al., 1997; Schramm, 1999; Anderson et al., 2002; Rabalais, 2002; Rabalais et al., 2002). In many coastal locations, excessive nitrogen inputs from rivers drive algal growth and its subsequent decomposition in bottom waters. In combination with stratified waters, the resulting oxygen consumption and dissolved inorganic carbon production frequently produce hypoxic ([O2] < 62.5 mmol O2 m⁻³) and acidified conditions (Rabouille et al., 2008; Cai et al., 2011). Hypoxic conditions caused by eutrophication are a common and growing problem that will most likely be aggravated by global warming (Diaz and Rosenberg, 2008; Rabalais et al., 2010).
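The hypoxia threshold quoted above is the commonly used 2 mg L⁻¹ criterion expressed in molar units: dividing 2 mg L⁻¹ by the molar mass of O2 (about 32 g mol⁻¹) gives 0.0625 mmol L⁻¹, which is 62.5 mmol m⁻³. The sketch below performs this conversion and flags hypoxic readings; the function names and the bottom-water values are hypothetical, and the molar mass is rounded to 32 g mol⁻¹ so that 2 mg L⁻¹ maps exactly onto the threshold.

```python
HYPOXIA_THRESHOLD_MMOL_M3 = 62.5  # [O2] threshold quoted in the text
O2_MOLAR_MASS_G_PER_MOL = 32.0    # rounded molar mass of O2


def mg_per_l_to_mmol_per_m3(conc_mg_l):
    """Convert dissolved O2 from mg/L to mmol/m^3."""
    # mg/L divided by g/mol (numerically equal to mg/mmol) gives mmol/L;
    # multiplying by 1000 L/m^3 gives mmol/m^3.
    return conc_mg_l / O2_MOLAR_MASS_G_PER_MOL * 1000.0


def is_hypoxic(conc_mmol_m3, threshold=HYPOXIA_THRESHOLD_MMOL_M3):
    """True when the concentration is strictly below the hypoxia threshold."""
    return conc_mmol_m3 < threshold


# Hypothetical bottom-water dissolved oxygen readings in mg/L
for reading in [6.1, 3.2, 1.8, 0.9]:
    molar = mg_per_l_to_mmol_per_m3(reading)
    print(f"{reading:.1f} mg/L -> {molar:.1f} mmol/m^3, hypoxic={is_hypoxic(molar)}")
```

Sensor data reported per kilogram of seawater (µmol kg⁻¹) would additionally need a seawater density to convert to per-volume units; that step is omitted here.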
Anthropogenic nutrient overabundance, combined with rising temperatures and a growing frequency of extreme hydrologic events (storms and droughts), is hastening eutrophication and encouraging the spread of harmful algal blooms (HABs) across the freshwater-to-marine continuum (Paerl et al., 2018). Cyanobacterial blooms threaten water resources, fisheries, recreational use, tourism, and property values. Climate change makes predicting changes in HAB frequency, intensity, and proliferation much more difficult (Paerl and Huisman, 2009; Havens and Paerl, 2015; Wells et al., 2015; Paerl et al., 2018). Changes in ocean and lake circulation, stratification, upwelling, wind speed, and cyclone frequency and intensity, as well as global warming, altered precipitation patterns, and sea-level rise, all play prominent roles in modulating HAB dynamics (Paerl and Paul, 2012). When combined with excessive fertilizer loading, hydrologic changes and climate change allow HABs to grow larger and persist longer (Paerl et al., 2016; Paerl, 2017). Hu et al. (2011) investigated a suspected rise in chlorophyll-a (chl-a) concentration after oil spills and reported that the northeastern Gulf of Mexico became greener after the Deepwater Horizon oil spill. The thousands of oil spills the GOM experiences every year may also be contributing to increased chl-a. Within days after Hurricane Isabel in 2003, a large phytoplankton bloom developed in Chesapeake Bay, linked to increased nutrient loads (Miller et al., 2005). Algal blooms and extensive hypoxia/anoxia occurred shortly after several hurricanes affected the lagoonal Neuse River Estuary in the 1990s (Burkholder and Glibert, 2006; …). A bloom of the picocyanobacterium Synechococcus in eastern Florida Bay, lasting more than 18 months, followed an input of nutrients from the high freshwater discharge caused by Hurricanes Katrina, Rita, and Wilma in 2005 (Glibert et al., 2009).
Human actions have altered the atmosphere and ocean environment in ways that affect storms and extreme climate events. Changes in extremes, which fall outside the bounds of prior weather, are the most common way in which climate change is perceived. Trenberth (2012) and Solomon et al. (2007) state that severe weather events are also a result of climate change. The intensity of hurricanes and the occurrence of storms increased over the past decade. More frequent hurricanes and storms can increase wetland loss and coastal erosion. According to published assessments of tide gauge data, the average worldwide rate of sea-level rise over the 20th century was 1.7 mm/yr (Bindoff et al., 2007). Climate scientists project a global sea-level rise of 0.25 to 0.5 m by 2100, which under some carbon emission scenarios is more than double the pace of rise during the 20th century (Meehl et al., 2007). Sea-level rise projections and impacts in coming years, global and regional sea-level rise scenarios, and climate change impacts are well documented in reports (Reidmiller et al., 2017; Fox-Kemper et al., 2021; Sweet et al., 2022) produced by the National Oceanic and Atmospheric Administration (NOAA) and the Intergovernmental Panel on Climate Change (IPCC). Assessments of coastal vulnerability indexes can reveal the relative risk of coastal change resulting from future sea-level rise. Barrier islands are dynamic habitats that experience gradual change from waves, tides, and currents, as well as rapid change from catastrophic storms. Because of increased sea-level rise and changes in the frequency and intensity of storms, these islands are likely to change dramatically during the next century.
Because of the dynamic nature of barrier islands and the importance of these areas, natural resource managers must understand how habitats on barrier islands are changing or may change over time in order to identify when and where management activities are necessary (Enwright et al., 2021). The findings of nearly three decades of marine geological research in the northwest Gulf of Mexico have been compiled in an effort to better understand the factors (e.g., sea-level rise, sediment supply, subsidence, and antecedent topography) that influenced coastal evolution during the last eustatic cycle (120 ka to present). The geological record shows that the northwestern Gulf of Mexico's low-gradient coastal ecosystems will continue to be adversely impacted by continuous sea-level rise, decreasing sediment supply, and human interference (Anderson et al., 2014).
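The coastal vulnerability indexes mentioned above can take several forms. One widely used formulation, from U.S. Geological Survey (USGS) national assessments, ranks six variables (geomorphology, shoreline change rate, coastal slope, relative sea-level change rate, mean significant wave height, and mean tidal range) from 1 (very low vulnerability) to 5 (very high vulnerability) and combines them as the square root of their product divided by six. The minimal sketch below implements that formula; the variable keys and the example ranks for a barrier island segment are hypothetical, and deriving the ranks from observations is outside its scope.

```python
import math

# Variables of the USGS-style CVI, each ranked 1 (very low) to 5 (very high vulnerability).
# The key names are illustrative, not a standard vocabulary.
CVI_VARIABLES = (
    "geomorphology",
    "shoreline_change_rate",
    "coastal_slope",
    "relative_sea_level_change",
    "mean_wave_height",
    "mean_tidal_range",
)


def coastal_vulnerability_index(ranks):
    """CVI = sqrt(product of the six ranks / 6)."""
    missing = [name for name in CVI_VARIABLES if name not in ranks]
    if missing:
        raise ValueError(f"missing ranks for: {missing}")
    product = 1
    for name in CVI_VARIABLES:
        rank = ranks[name]
        if rank not in (1, 2, 3, 4, 5):
            raise ValueError(f"rank for {name} must be an integer 1-5, got {rank!r}")
        product *= rank
    return math.sqrt(product / len(CVI_VARIABLES))


# Hypothetical ranks for one shoreline segment on a low-lying barrier island
segment = {
    "geomorphology": 5,
    "shoreline_change_rate": 4,
    "coastal_slope": 5,
    "relative_sea_level_change": 4,
    "mean_wave_height": 2,
    "mean_tidal_range": 4,
}
print(round(coastal_vulnerability_index(segment), 2))
```

The index is relative: its values are meaningful only when compared with other segments ranked under the same scheme, which matches the text's framing of CVIs as revealing relative risk.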
Aquatic life and wildlife are also suffering the impacts of climate change. The GOM has around 22 percent of the world's nonarctic tidal marsh, accounting for about 62 percent of North America's tidal marsh habitat. These tidal marshes, which cover around 9,880 square kilometers of the Gulf coast (Greenberg and Maldonado, 2006), feature both riverine and marine-dominated habitats with varied tidal influence, freshwater and sediment input, and plant species diversity. The biology and ecology of tidal marsh birds along the GOM are poorly understood, especially in comparison to Atlantic coast populations (Rush et al., 2009, 2010a, 2010b), and we have limited information on the current population status and trends of most marsh birds (Conway, 2011). Despite the potential fragility of tidal marshes along the northern Gulf Coast, little is known about the life histories, behavior, distribution, and ecological affinities of the many marsh bird species. Given current information gaps, predicting the future of tidal habitats and native marsh bird species in the context of global climate change is difficult. A warming climate will bring regional environmental changes that affect marine ecosystems. These changes will include variations in ocean circulation, rising ocean acidity, changes in riverine discharge, variations in precipitation and evaporation rates, and loss of coastal habitat due to flooding (Scavia et al., 2002; Roessig et al., 2004). Because of physiological and behavioral responses to environmental gradients, fish and fisheries are likely to be affected by shifts in species ranges, loss or degradation of nearshore fish habitat, and modification of larval dispersal pathways (Scavia et al., 2002; Perry et al., 2005; Hare et al., 2010). The Atlantic bluefin tuna (BFT) is a highly migratory species that feeds in frigid North Atlantic waters before migrating to tropical areas to reproduce.
A projected warmer future climate will cause upper ocean temperatures in the western Atlantic BFT spawning habitat in the GOM to rise dramatically, which may affect the temporal and spatial extent of BFT spawning activity. Muhling et al. (2011) used an ensemble of 20 climate model simulations to project mean temperature changes within the GOM and combined them with larval BFT data to understand how warming may affect the suitability of the GOM as a spawning ground. Coleman and Koenig (2010) present the effects of climate, fishing, and other anthropogenic disturbances on three economically important GOM species known as habitat engineers: Atlantic goliath grouper, red grouper, and tilefish. These habitat engineers illustrate the intricate vertical (benthic-pelagic coupling) and horizontal (inshore to offshore) links between habitats, species, and marine strata at various spatial and temporal scales. Ocean currents connect them physically, and ontogenetic migrations and trophic interactions connect them biologically (Coleman and Koenig, 2010). At temperate latitudes, climate change may be characterized by rising temperatures and decreasing freeze frequency and intensity, potentially shifting the latitudinal limits of vegetation and animals (Bakkenes et al., 2002; Walther et al., 2002; Loarie et al., 2008). Comeaux et al. (2012) examined the implications of mangrove expansion into coastal wetlands. Heavy rain and snowmelt in the Midwest prompted catastrophic flooding of the Mississippi River in the spring and summer of 2019, necessitating two openings of the Bonnet Carre Spillway (BCS) to relieve pressure on New Orleans' levees. These openings allowed a massive amount of freshwater to flow into Lake Pontchartrain and then into the Mississippi Sound.
If the frequency or duration of BCS openings increases as a result of increased precipitation, oyster populations in Mississippi may become unsustainable for harvesting unless future freshwater intrusions are factored into management plans (Gledhill et al., 2020). Although oyster reefs around the world are essential for maintaining healthy ecosystems, most of them have disappeared over the past 200 years (Beck et al., 2011;Grabowski et al., 2012). About 69 percent of the US commercial wild eastern oyster harvest is produced by the GOM, which is home to the greatest remaining wild oyster fisheries (Beck et al., 2011;NOAA, 2022).
Changing coastlines and rising sea levels may, in the long run, necessitate the relocation of highways, rail lines, or airport runways, with significant implications for port facilities and coastal infrastructure. More frequent hurricanes and other extreme weather events could have ramifications for emergency evacuation planning, facility maintenance, and safety management for surface transportation, maritime vessels, and aviation. Changes in rain and snowfall patterns, as well as periodic flooding, may affect safety and maintenance. To better understand the possible effects on ports and maritime shipping, storms, sea-level rise, sedimentation and erosion rates and mechanisms, and changes in critical variables such as prevailing winds, waves, currents, and precipitation rates must all be addressed. Research priorities include enhancing climate models to replicate storm events, determining the impact of severe events on ports and shipping, and simulating local sea-level rise near ports and shipping channels. Figure 1 summarizes the literature cited in this section, showing the proportion of the literature in each category.

Ocean of data
Data about America's seas and coasts come from a variety of sources across several sectors and are used for a wide range of purposes. These data are used to manage ocean ecosystems and fisheries, promote the sustainable and equitable growth of the blue economy, conserve endangered marine species, and help the global community mitigate and prepare for climate change. Ocean science data and climate and hydrodynamic models support, to name a few applications, the protection of endangered species, the identification of suitable areas for offshore wind energy facilities, improved storm prediction models, understanding of marine species migration, habitat change, and warming waters, and planning for sea-level rise and other climate change impacts on coastal populations. With new technology, new uses, and renewed national commitments to understanding and managing our oceans, ocean science data collection and access are undergoing a revolution. We have an unrivaled ability to gather and evaluate data on our environment and on human use of marine natural resources, with enormous potential for improving science and decision-making. However, the ability of government organizations to interpret ocean data from new sources, including new technology, and incorporate them into decision-making is currently limited. Specifically, data management infrastructure has not kept up with the nearly exponential increase in data being collected by the public and private sectors. In this section, we discuss the ocean science data originating from the Gulf of Mexico, where these data are available, and some characteristics of ocean science data.

Ocean data categories
Ocean science data are collected by many entities for various purposes. Academia, state and federal government agencies, and private industry collect most ocean science data to fulfill their project requirements. Ocean data can be broadly categorized into five types: physical, biological, chemical, geological, and socioeconomic data (Trice et al., 2021).

Physical ocean data
Physical ocean data represent the physical attributes and dynamic processes of the oceans, including ocean currents, coastal dynamics, internal waves and tides, temperature and salinity structure, and more. These data are captured by variables such as sea surface temperature, sea surface salinity, ocean surface heat flux, wave conditions, surface and subsurface currents, bathymetry, and seabed forms. Physical ocean data are important to the many industries and government agencies that rely on the oceans. Sea surface temperature and wave conditions are used extensively in weather prediction models. Physical data, which also include atmospheric data such as heat and hydrological fluxes, are used to study climate change. Ports and research and commercial vessels also use physical ocean data for navigation, as they face challenges from shifting currents and tides and variable water levels. Physical ocean data are collected by in situ instruments such as sensors and unmanned marine vehicles, and some datasets can also be collected remotely by satellites.

Biological ocean data
Biological data relate to marine organisms, their ecology, and how they interact with the ocean environment. Biological ocean data are used to track and protect endangered species, maintain and improve ecosystem health, minimize and mitigate the impacts of human activities on marine wildlife, and study climate change effects. Fisheries data collected by the commercial and recreational fishing industries, and fishery-independent data collected by researchers and government-led organizations, are also good sources of biological data. Ocean sound and acoustic data (e.g., on larval recruitment, biodiversity, marine mammals, and biotic sounds) and underwater video and photographs (e.g., for reef mosaicking, fish communities, and species identification) also contribute to biological ocean data. Biological data have traditionally been collected by taking samples in situ, which requires human input. Sample collection at sea, data post-processing, and physical sample archiving are expensive and tedious. Emerging methods such as image recognition and environmental DNA (eDNA) can help reduce post-processing time and ease data storage and reuse.

Chemical ocean data
Chemical ocean data describe the properties of ocean water, its processes and cycles, and how it interacts with the atmosphere and seafloor. These data help in understanding the ocean's role in climate change, since the oceans are a major carbon sink. Ocean acidification is also a central topic investigated using chemical ocean data. Chemical data variables include inorganic carbon, oxygen, pH, nutrients, ocean color, and dissolved organic carbon. Chemical ocean data are collected using in situ sensors, such as pH and pCO2 sensors, and through remote sensing satellites.

Geological ocean data
Geological ocean data relate to seafloor and sub-seafloor features. These data give insights into plate tectonics, volcanic processes, and other phenomena. Geological data may overlap with biological, chemical, and physical data in studies of ocean circulation, sedimentation patterns, biological productivity, sediment coring, paleoceanography, stratigraphy, estuarine systems, beach erosion, and sea-level rise.

Socioeconomic ocean data
Socioeconomic ocean data relate to ocean-based industries such as shipping, fishing, tourism, and offshore renewable energy. Data on community impacts in coastal areas, on the oceans, and on ocean-related industries can also be considered socioeconomic data. Analyzing socioeconomic data can help identify climate change indicators, as climate change affects employment, exports, demographics, and unemployment rates.
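When datasets from many sources are pooled, the five categories above can serve as a simple organizing scheme. The sketch below encodes them as an enumeration and groups variable names by category. The variable names are hypothetical labels drawn from the examples in the preceding subsections, not a standard vocabulary, and each maps to a tuple because, as noted for geological data, a variable can belong to more than one category.

```python
from enum import Enum


class OceanDataCategory(Enum):
    """The five broad ocean data categories of Trice et al. (2021)."""
    PHYSICAL = "physical"
    BIOLOGICAL = "biological"
    CHEMICAL = "chemical"
    GEOLOGICAL = "geological"
    SOCIOECONOMIC = "socioeconomic"


C = OceanDataCategory

# Hypothetical variable names taken from the examples in the text
VARIABLE_CATEGORIES = {
    "sea_surface_temperature": (C.PHYSICAL,),
    "sea_surface_salinity": (C.PHYSICAL,),
    "wave_conditions": (C.PHYSICAL,),
    "bathymetry": (C.PHYSICAL,),
    "sea_level": (C.PHYSICAL, C.GEOLOGICAL),
    "fishery_independent_survey": (C.BIOLOGICAL,),
    "passive_acoustics": (C.BIOLOGICAL,),
    "edna": (C.BIOLOGICAL,),
    "ph": (C.CHEMICAL,),
    "oxygen": (C.CHEMICAL,),
    "nutrients": (C.CHEMICAL,),
    "ocean_color": (C.CHEMICAL,),
    "sediment_core": (C.GEOLOGICAL,),
    "fishing_employment": (C.SOCIOECONOMIC,),
    "port_exports": (C.SOCIOECONOMIC,),
}


def group_by_category(variables):
    """Return {category: [variable, ...]}; unrecognized variables are listed under None."""
    groups = {}
    for name in variables:
        for category in VARIABLE_CATEGORIES.get(name, (None,)):
            groups.setdefault(category, []).append(name)
    return groups


print(group_by_category(["ph", "sea_level", "bathymetry", "unknown_var"]))
```

Listing unrecognized names under `None`, rather than discarding them, makes gaps in the mapping visible when new data sources are added.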

Ocean data platforms
The aforementioned ocean data types can be collected in situ and/or by remote sensing, using different ocean technologies and methods. In situ data collection in the water uses portable ocean sensors, sensors on ships, submersibles, and unmanned marine vehicles. Satellites make it possible to sense ocean data remotely, without direct contact with the medium; they are equipped with sensors that measure ocean properties such as temperature, wave height, and water color. Sampling is another data collection method, in which specimens such as water and sediments are collected in situ. Mathematical models can also be developed to generate ocean data in a simulated environment. Below we discuss several data platforms that collect different sets of ocean science data. Ocean data platforms are broad topics in their own right, and discussing them in depth is beyond the scope of this paper.

Unmanned marine vehicles
Unmanned marine vehicles (UMVs) are vehicles that can be controlled remotely over a wired or wireless link. Operators can control the vehicles from a ship or from shore. UMVs can also be preprogrammed to go to a specific location at a specific time for data collection. UMVs can be equipped with a variety of sensors, for example for temperature, pressure, and dissolved oxygen, as well as active and passive acoustic sensors. UMVs can store data on board for retrieval when the UMV is recovered, and some UMVs can also transmit data wirelessly via satellite, Wi-Fi, and/or cellular networks. UMVs are battery powered; some can be recharged via solar or wind energy, while others are recharged aboard the ship. Depending on their size, range, and endurance, UMVs can be deployed for long periods and collect data at the desired sampling rates.
Gliders are a type of preprogrammed unmanned marine vehicle that travels underwater along a planned path. Gliders carry sensors that record data throughout their track until they reach the surface. These data can be used to generate 3D visualizations of ocean variables such as temperature and salinity, as well as other measurements such as acoustics. Gliders transmit the recorded data via satellite, and users can then download it. Gliders can receive new instructions from operators over the satellite link and can also change their data sampling rates.
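A common first step toward 3D views of glider data is to average the along-track samples into regular latitude, longitude, and depth cells. The minimal sketch below does simple bin averaging; it is a stand-in for, not an implementation of, the interpolation or objective-analysis methods often used for gridding. The function name, the sample tuple layout, and the default cell sizes (0.1° horizontally, 10 m vertically) are hypothetical choices.

```python
import math
from collections import defaultdict


def grid_profile_samples(samples, dlat=0.1, dlon=0.1, dz=10.0):
    """Average (lat, lon, depth_m, value) samples into 3D grid cells.

    Returns {(i_lat, i_lon, i_depth): mean_value}, where each index is
    floor(coordinate / cell_size). Empty cells are simply absent.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, depth, value in samples:
        key = (math.floor(lat / dlat), math.floor(lon / dlon), math.floor(depth / dz))
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical glider temperature samples (degC) from one dive in the northern GOM
dive = [
    (27.03, -88.47, 2.0, 24.0),
    (27.04, -88.46, 5.0, 26.0),   # same cell as the first sample
    (27.04, -88.46, 55.0, 18.0),  # deeper cell
]
print(grid_profile_samples(dive))
```

Bin averaging is easy to reason about but leaves gaps where the glider did not pass; filling those gaps requires an explicit interpolation choice.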
Remotely operated vehicles (ROVs) are surface and underwater vehicles that are controlled by operators on the surface. ROVs are tethered to a main control board that can stay on the ship or be connected to a human-occupied vehicle (HOV). ROVs and HOVs have onboard cameras and manipulator arms that can be controlled by the operator. ROVs and HOVs send real-time data feeds to the mother ship via the tether cable. They thus provide telepresence to the scientific community and the public during underwater exploration from the Exploration Vessel (E/V) Nautilus, NOAA Ship Okeanos Explorer, and Schmidt Ocean Institute's (SOI) Research Vessel (R/V) Falkor (Raineault, 2019). ROVs and HOVs are also used to collect specimens and sediment samples and to capture marine organisms.

Buoys and floats
Buoys are platforms that carry several instruments and sensors. While static buoys are anchored to the seafloor, others, such as Argo floats, drift freely: they provide vertical profiles of the water column by repeatedly descending and ascending, drifting with the current at a prescribed depth (pressure) between profiles until the end of their operational life. Buoys equipped with mission-specific sensors and instruments can be deployed at a desired location. A buoy's central control board gathers the data from all onboard sensors and transmits either averaged data or the full record via satellite. Nearshore buoys may instead use cellular networks, avoiding expensive satellite data transmission rates. Some buoys are also equipped with profilers that carry instruments/sensors vertically through the water column to sample it.
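The averaging step mentioned above can be illustrated with a short sketch. It assumes a hypothetical buoy that logs (seconds-since-deployment, value) pairs and sends one mean per 60-minute window; actual buoy firmware, averaging windows, and message formats vary by platform and are not specified in the text.

```python
from collections import defaultdict


def block_average(samples, window_s=3600):
    """Average (t_seconds, value) samples over fixed, non-overlapping windows.

    Samples whose value is None (e.g. a sensor dropout) are skipped.
    Returns a list of (window_start_seconds, mean_value) sorted by time.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, value in samples:
        if value is None:
            continue
        start = (t // window_s) * window_s  # start of the window containing t
        sums[start] += value
        counts[start] += 1
    return [(start, sums[start] / counts[start]) for start in sorted(sums)]


# Illustrative readings only: two samples in the first hour, one valid in the second.
readings = [(0, 28.0), (1800, 28.5), (3600, 27.75), (3700, None)]
print(block_average(readings))  # [(0, 28.25), (3600, 27.75)]
```

Sending one mean per window reduces the number of values transmitted (here from four raw samples to two), at the cost of losing within-window variability.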

Research vessels
Research vessels, which are essentially floating laboratories, also act as data platforms. Vessels can carry instruments and sensors that collect data along their route, and they can take specimen samples as they go. These data can be processed and analyzed onboard for rapid, preliminary analysis.

Satellites and aircraft
Satellites carrying suitable instruments and sensors can observe the ocean remotely and non-invasively, measuring variables such as sea surface height, ocean color, and sea surface temperature. Satellites also relay data from buoys and unmanned marine vehicles to land stations. Aircraft are likewise used for remote observation, obtaining images and other data on the ocean landscape and its properties. The Earth Observing System Data and Information System (EOSDIS) and the National Environmental Satellite, Data, and Information Service (NESDIS) maintain environmental data for land, ocean, and atmosphere applications collected by satellites, aircraft, and field measurements. Remote sensing data are also used to study the spatial and temporal characteristics of the Earth's landscape, which helps in understanding the effects of human activities, population growth, and natural and biological factors (Chen et al., 2020a; 2021). Studies such as Chen et al. (2022) have developed methods for extracting coastline information from remote sensing images to study the evolution of coastlines.
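To make the coastline-extraction idea concrete, here is a minimal sketch of one common generic approach, not the method of Chen et al. (2022): compute the Normalized Difference Water Index, NDWI = (Green - NIR) / (Green + NIR), threshold it at zero to separate water from land, and mark water pixels that border land as shoreline. The band values are toy numbers, the zero threshold is an assumption that usually needs tuning per scene, and real workflows read georeferenced rasters rather than nested lists.

```python
def ndwi(green, nir):
    """Per-pixel NDWI = (Green - NIR) / (Green + NIR) for two equal-sized bands."""
    return [[(g - n) / (g + n) if (g + n) != 0 else 0.0
             for g, n in zip(g_row, n_row)]
            for g_row, n_row in zip(green, nir)]


def shoreline_pixels(green, nir, threshold=0.0):
    """Return (row, col) of water pixels (NDWI > threshold) that touch land."""
    index = ndwi(green, nir)
    rows, cols = len(index), len(index[0])
    water = [[v > threshold for v in row] for row in index]
    edge = []
    for r in range(rows):
        for c in range(cols):
            if not water[r][c]:
                continue
            # 4-connected neighbours; those outside the image are ignored
            neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            if any(0 <= rr < rows and 0 <= cc < cols and not water[rr][cc]
                   for rr, cc in neighbours):
                edge.append((r, c))
    return edge


# Toy 3x3 scene (made-up reflectances): column 0 is land, columns 1-2 are water.
green = [[0.1, 0.3, 0.3] for _ in range(3)]
nir = [[0.4, 0.05, 0.05] for _ in range(3)]
print(shoreline_pixels(green, nir))  # [(0, 1), (1, 1), (2, 1)]
```

Repeating this on images from different dates and comparing the shoreline pixels is one simple way to track coastline change over time.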

Models
A mathematical model can simulate the ocean state and describe ocean conditions and trends in a parameterized, quantitative way, generating data such as temperature, salinity, and waves. Such model output is used extensively to understand the oceans. With rapid improvements in computing power, numerical ocean simulation has advanced greatly and has become a major source of ocean science data. Models that generate simulated ocean science data include the Hybrid Coordinate Ocean Model (HYCOM), AMSEAS (based on the Navy Coastal Ocean Model), and the Regional Ocean Modeling System (ROMS) for ocean circulation; Simulating WAves Nearshore (SWAN) and WAVEWATCH for waves; ecosystem models (ECOPATH-ECOSIM) for the dynamics of natural ecosystems; biophysical models, such as that of Daigle et al. (2016), for larval dispersal; and biogeochemical models, such as Fennel ROMS, for examining the physical and biogeochemical mechanisms behind the formation and breakdown of seasonal hypoxia on the Texas-Louisiana (TX-LA) shelf.

Data sources
As in other regions, ocean science data for the Gulf of Mexico are collected by many kinds of instruments: moored surface and underwater buoys, drifting buoys, surface and subsurface sensor networks, Coastal Ocean Dynamics Applications Radar (CODAR) stations, remote sensing satellites, unmanned maritime vehicles, research vessels, and numerical ocean models. Regional, state, and national ocean observing platforms host most of these data. The data and platforms are managed by government, educational, and private organizations, whose common goal is to realize the full potential of ocean data. NOAA is responsible for one of the government's largest data inventories, collecting, organizing, and releasing data on everything from the deep ocean to the atmosphere and space. The agency, which traces its origins to 1807 and the U.S. Coast and Geodetic Survey, now oversees a complex system of data on America's seas, Great Lakes, and coastlines in collaboration with other federal agencies, regional organizations, businesses, academic partners, and data producers and users. In addition, commercial firms, citizen scientists, non-governmental organizations, and others are using new, low-cost technology such as drones and cellphones to contribute to the growing pool of ocean data. For legal, regulatory, proprietary, or technological reasons, certain data may not be shared with government agencies or the general public, which limits their usefulness. Unrestricted ocean science data, however, are publicly available for independent research and can be accessed through ERDDAP servers (https://coastwatch.pfeg.noaa.gov/erddap/index.html) and the Open-source Project for a Network Data Access Protocol (OPeNDAP), depending on which protocols each data source supports.
Locating and downloading ocean science data of interest is tedious for data scientists and marine researchers, because the data for a given region may be collected by several instruments and organizations for different purposes. As a result, similar data for the same region, recorded by the same or different instruments and by different organizations, may be spread across the data platforms of various regional, state, and federal organizations. Below we present some of the data sources that data scientists and marine researchers commonly use to download and/or analyze GOM ocean science data.
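ERDDAP serves tabular datasets through request URLs of the form `<server>/erddap/tabledap/<datasetID>.<fileType>?<variables>&<constraints>`. The sketch below only assembles such a URL with the Python standard library and makes no network request. The dataset ID `gom_buoy_example`, the variable names, and the latitude bounds are hypothetical placeholders; check a real server's dataset listing for valid IDs and variable names.

```python
from urllib.parse import quote

ERDDAP_BASE = "https://coastwatch.pfeg.noaa.gov/erddap"


def tabledap_url(dataset_id, variables, constraints, file_type="csv",
                 base=ERDDAP_BASE):
    """Build an ERDDAP tabledap request URL without contacting the server.

    variables   -- variable names to return, e.g. ["time", "latitude"]
    constraints -- filter strings, e.g. ["time>=2021-06-01T00:00:00Z"]
    """
    query = ",".join(variables)
    if constraints:
        query += "&" + "&".join(constraints)
    # Percent-encode comparison operators such as '>' and '<' while keeping
    # the separators ERDDAP uses (',' between variables, '&' between filters).
    encoded = quote(query, safe=",&=:-._")
    return f"{base}/tabledap/{dataset_id}.{file_type}?{encoded}"


# "gom_buoy_example" is a hypothetical dataset ID used only for illustration.
url = tabledap_url("gom_buoy_example",
                   ["time", "sea_water_temperature"],
                   ["time>=2021-06-01T00:00:00Z", "latitude>=18", "latitude<=31"])
print(url)
```

The resulting URL can be opened with any HTTP client or CSV reader. ERDDAP offers other output types through the same pattern, for example `.nc` for netCDF.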

United States geological survey
USGS is a major data contributor, particularly on water, energy, minerals, and other natural resources people rely on; natural hazards that threaten lives and property; the health of ecosystems and the environment; and the impacts of climate and land-use change. USGS is the nation's largest water, earth, and biological science and civilian mapping agency. In addition to scientific data, USGS provides web tools such as interactive maps, alerts and notifications, data analysis and visualization tools, and data repositories. USGS produces accurate geologic maps, topographic maps, and 3-D geologic frameworks that provide critical data for sustaining and improving the quality of life and economic vitality of the Nation. It also organizes, maintains, and publishes the geospatial baseline of the Nation's topography, natural landscape, built environment, and more. USGS also offers multimedia data such as videos, images, audio, before-and-after imagery, and webcams. The USGS tracks and researches a wide range of water conditions and resources, including streamflow, groundwater, river discharge, water quality, and water use and availability. USGS operates and maintains 500 laboratories nationwide, 60 science centers, and 5 volcano observatories, and it both supports research and supplies data to external users. USGS serves multiple types of users, including scientists (at USGS, other federal agencies, and academia), regulators and resource managers (federal, state, and local), and private companies.

National centers for environmental information
NCEI manages one of the world's largest archives of atmospheric, coastal, geophysical, and oceanic research data. It houses a large portion of the data collected by NOAA scientists, observational systems, and research projects, in a repository that spans many time periods, monitoring systems, scientific fields, and geographic locations. NCEI held 44 petabytes of data as of May 2022 and expects its holdings to grow to 250 petabytes by 2030 (NOAA, 2012). Figure 2 shows the current and forecast NCEI archival volume. NCEI's data access services provide free access to its archive of global coastal, oceanographic, geophysical, climate, and historical weather data, including quality-controlled hourly, sub-hourly, daily, monthly, seasonal, and yearly measurements of NOAA's archived environmental data. Data are available through direct download or subsetting services, and most of these data can also be ordered as certified hard copies for legal use. NCEI provides several services to monitor, access, archive, and download multi-domain ocean science data. NCEI also produces a comprehensive annual report on the global climate system, which summarizes the global and regional climate of the preceding calendar year with input from hundreds of authors and points to several directions for further research.
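For a sense of scale, the two archive figures above imply a compound annual growth rate. A quick calculation, assuming steady growth over the roughly eight years between 2022 and 2030 (a simplification; the NCEI forecast need not follow an exponential curve):

```python
import math


def implied_annual_growth(start_volume, end_volume, years):
    """Constant (compound) annual growth rate taking start_volume to end_volume."""
    return (end_volume / start_volume) ** (1.0 / years) - 1.0


# Figures from the text: 44 PB in 2022 and a projected 250 PB in 2030.
rate = implied_annual_growth(44, 250, 2030 - 2022)
doubling_years = math.log(2) / math.log(1 + rate)
print(f"{rate:.1%} per year, doubling every {doubling_years:.1f} years")
```

This works out to roughly 24% growth per year, meaning the archive would double about every 3.2 years.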

National data buoy center
The National Data Buoy Center (NDBC) maintains a network of ocean buoys and coastal stations and provides the meteorological and oceanographic data they collect. NDBC transmits hourly observations from this network to a ground facility operated by NOAA, where the data are recorded and processed by the National Oceanographic Data Center. The NDBC website displays all of the buoys it manages on a map, along with resources for downloading their observations. The NDBC Distributed Oceanographic Data System (DODS) makes netCDF files available to the science community and the general public through the website. DODS uses the Open-source Project for a Network Data Access Protocol (OPeNDAP) software, which lets data providers share data with each other and with end users.
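As an illustration of working with these downloads, the sketch below parses text laid out like NDBC's realtime standard meteorological files: a `#`-prefixed row of column names, a `#`-prefixed row of units, and then whitespace-separated values, with `MM` marking missing data. The column subset and the two data rows are synthetic. Check the format details against NDBC's own file documentation before relying on this.

```python
def parse_ndbc_stdmet(text):
    """Parse whitespace-delimited, NDBC-style standard meteorological text.

    Assumes the first line lists column names after '#', later '#'-prefixed
    lines (such as the units row) are comments, and 'MM' marks a missing value.
    Returns a list of dicts mapping column name -> float or None.
    """
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    columns = lines[0].lstrip("#").split()
    rows = []
    for line in lines[1:]:
        if line.startswith("#"):  # units row or other comment
            continue
        values = line.split()
        rows.append({name: (None if v == "MM" else float(v))
                     for name, v in zip(columns, values)})
    return rows


# Synthetic example for illustration only (not real buoy observations).
sample = """
#YY  MM DD hh mm WDIR WSPD  WVHT  WTMP
#yr  mo dy hr mn degT m/s      m  degC
2023 06 01 12 00  120  5.0   0.8  29.0
2023 06 01 13 00  130  5.5    MM  29.1
"""
obs = parse_ndbc_stdmet(sample)
print(obs[1]["WVHT"], obs[1]["WTMP"])  # None 29.1
```

Representing missing values as `None` instead of a numeric sentinel keeps them from being silently averaged into later statistics.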

Integrated ocean observing systems
The Integrated Ocean Observing System (IOOS) maintains near-real-time observations collected by 11 regional associations spanning the nation's coasts. It is a multidisciplinary system that provides data in the forms and at the speeds that decision makers require to accomplish several societal goals. IOOS is a cooperative effort of federal and non-federal entities to provide new data, tools, and forecasts that improve marine safety, enhance the economy, and protect the U.S. coastal and ocean environment. Tools and resources for accessing and visualizing multi-domain ocean science data are readily available through the IOOS web interface. IOOS also contributes to, and shares resources with, the Global Ocean Observing System (GOOS). IOOS manages data assembly centers for gliders, the Animal Telemetry Network, and the High Frequency (HF) Radar network. IOOS also created the Coastal and Ocean Modeling Testbed, which allows numerical ocean model output to be shared alongside observations and uses software tools for integration, scientific analysis, comparison, and data archival.
Figure 2. NCEI archival volume history and forecast (NCEI, 2022).
Sunkara et al. 10.3389/fmars.2023.1075822 Frontiers in Marine Science frontiersin.org

Gulf of Mexico coastal ocean observing system
GCOOS is at the center of data gathering for the Gulf of Mexico's ocean and coastal waters. It collects thousands of data points from sensors and ensures that the data are trustworthy, timely, and correct before they are disseminated to the ocean sectors that rely on them. GCOOS represents the academic, industrial, government, and non-government sectors, whose organizations stream data, information, and products on marine and estuarine systems to the GCOOS online platform, where thousands of users, from ocean modelers to ship captains, have instant access. GCOOS integrates physical, biological, meteorological, biogeochemical, bathymetric, and other ocean science data from diverse providers, and it delivers timely and reliable data, products, and services to IOOS, decision makers, and the general public to benefit human communities, the economy, and natural ecosystems. GCOOS is one of the 11 regional associations of U.S. IOOS.

Coastal CUBEnet
The Coastal CUBEnet, developed by the University of Southern Mississippi, is a high-resolution network of coastal ocean sensors, models, and data-sharing services. It provides the integrated, multidimensional, open infrastructure needed for collaborative ocean research. CUBEnet's centralized environmental intelligence resources are critical for sharing high-resolution data, model forecasts, and other research products with end users and the community. CUBEnet and its environmental intelligence tools serve as a platform that brings together expertise, insights, methods, and tools from multiple disciplines, including oceanography, climate science, biology, natural resource management, computer/data science, public policy, and economics. CUBEnet provides data collected by moored buoys, underwater sensor networks, HF Radar stations, unmanned maritime systems, and satellites, together with high-resolution hydrodynamic models for several GOM coastal areas. Its web interface provides tools and resources to visualize, analyze, and download the data. CUBEnet currently hosts data for the coastal regions of Louisiana, Mississippi, Alabama, and West Florida.
CUBEnet's data are passed to GCOOS for quality assurance and quality control (QA/QC) and are then archived at the Global Telecommunication System.
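U.S. IOOS publishes the QARTOD (Quality Assurance/Quality Control of Real-Time Oceanographic Data) manuals, which define automated QC tests and a common set of flag values (1 pass, 2 not evaluated, 3 suspect, 4 fail, 9 missing). The sketch below implements only a QARTOD-style gross range test and does not reproduce GCOOS's actual pipeline, which involves additional tests. The sensor and user limits in the example are hypothetical.

```python
PASS, NOT_EVALUATED, SUSPECT, FAIL, MISSING = 1, 2, 3, 4, 9


def gross_range_flags(values, sensor_min, sensor_max, user_min, user_max):
    """Flag each value with a QARTOD-style gross range test.

    Values outside the sensor's physical limits fail; values inside those
    limits but outside a (regional or seasonal) user range are suspect;
    None is flagged as missing.
    """
    flags = []
    for v in values:
        if v is None:
            flags.append(MISSING)
        elif v < sensor_min or v > sensor_max:
            flags.append(FAIL)
        elif v < user_min or v > user_max:
            flags.append(SUSPECT)
        else:
            flags.append(PASS)
    return flags


# Hypothetical limits for a sea-water temperature sensor (degrees C).
temps = [24.5, 36.2, 55.0, None, 12.0]
print(gross_range_flags(temps, sensor_min=-5, sensor_max=45, user_min=15, user_max=33))
# [1, 3, 4, 9, 3]
```

Flagging, rather than deleting, questionable values lets downstream users decide how strictly to filter the data for their own purposes.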

Earth observing system data and information system
EOSDIS is the primary system for NASA's Earth science data collected from satellites, aircraft, field measurements, and other programs. EOSDIS provides Worldview, an open-source web application that lets users browse satellite imagery interactively. The Earth observation data supported by EOSDIS include solar irradiance, oceanic, atmospheric, land surface, and subsurface data, among others. Worldview uses the Global Imagery Browse Services (GIBS) to retrieve imagery rapidly for interactive browsing. GIBS is a core EOSDIS component that provides a scalable, responsive, highly available, community-standards-based set of imagery services, and its archive includes over 100 imagery products representing visualized science parameters.

National environmental satellite, data, and information service
NOAA's mission is to understand and predict changes in climate, weather, oceans, and coasts; to share that knowledge and information with others; and to conserve and manage coastal and marine ecosystems and resources as the Nation's authoritative environmental intelligence agency. NESDIS supports NOAA's mission of Science, Service, and Stewardship through its satellite missions, data centers, data and information products and services, and use-inspired science. The United States depends on NOAA for satellite data and imagery that support meteorological forecasts, emergency services, and continuity of government. NESDIS is responsible for collecting and providing the critical satellite Earth observations and other essential environmental information needed for disaster preparedness, all-hazards response and recovery, and the protection of the Nation's critical infrastructure and natural resources. Its 24/7 global coverage generates an uninterrupted stream of information and products. These enable services used across the country to prepare for events that affect climate, weather, oceans, daily life, and national safety, and they provide essential information for national, regional, and local planners and officials. Massive amounts of satellite data are processed into products, tools, and services that help decision makers inform the public and safeguard the environment. These data are archived at NCEI and can be accessed through NOAA's data search platform, OneStop.

Data issues
Ocean science data are as complex as the oceans themselves. As data density and resolution improve with technological advances, the volume of ocean data is expected to expand dramatically, surpassing 250 petabytes by 2030 (NOAA, 2012). Huge volumes of data from multiple ocean science disciplines create significant challenges for data handling and management. The output of any sensor collecting ocean science data should describe both the data and the file structure. Because different datasets may use different terms for the same ocean variable, metadata descriptions in the output files are what make correct interpretation possible. The Network Common Data Form (netCDF) is one such self-describing format and is widely used by data platforms. Turning raw data into something that can guide management choices, clarify long-term patterns, and answer key scientific questions takes considerable time and skill. The requirements vary with the kind of data and frequently depend on how the data were obtained. The processing steps must also be accurately recorded so that research using the same data is reproducible. Given the volume of data currently available and expected in the future, traditional manual processing and analysis will become a severe bottleneck. Data science skills and expertise are therefore needed in the ocean sciences, and collaboration between data scientists and marine scientists can help solve many data-related problems.
Despite the large amounts of ocean data currently available and expected, the vastness of the oceans and the dependence of many economic sectors on them mean that data needs and gaps remain in several ocean data categories. Beyond acquiring suitable ocean data, data management and access systems must be able to deliver data to decision makers and other users in forms that are useful to them. Planning for data collection should go hand in hand with planning for data use, taking into account a variety of purposes and users. This includes users whose willingness to provide data may depend largely on whether their requirements are met and their interests respected. Moreover, data collected by some academic and private-sector communities, data withheld by private industry for privacy and confidentiality reasons, and data collected by scientific groups working on projects with short-term funding are often not openly available to the public. As a result, large amounts of ocean science data remain inaccessible, underscoring the need for data sharing.
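netCDF files that follow the Climate and Forecast (CF) metadata conventions attach attributes such as `standard_name` and `units` to each variable. These attributes are what let software interpret differently named variables consistently. The sketch below mimics that idea with plain Python dictionaries rather than a netCDF library. The provider variable names and values are hypothetical, while `sea_water_temperature` and `sea_water_practical_salinity` are CF standard names. Unit strings are matched against a small assumed list rather than parsed with a units library.

```python
def to_celsius(value, units):
    """Convert a temperature to degrees Celsius using the units named in metadata."""
    if units in ("degC", "degree_Celsius"):
        return value
    if units in ("degF", "degree_Fahrenheit"):
        return (value - 32.0) * 5.0 / 9.0
    if units in ("K", "kelvin"):
        return value - 273.15
    raise ValueError(f"unrecognised temperature units: {units!r}")


def sea_water_temperature(variables):
    """Gather sea-water temperature (deg C) from self-describing variables.

    variables maps a provider-specific variable name to
    {"attrs": {...}, "data": [...]}. The CF-style 'standard_name' attribute,
    not the variable name, decides what a variable holds, and its 'units'
    attribute drives the conversion to Celsius.
    """
    values = []
    for var in variables.values():
        attrs = var["attrs"]
        if attrs.get("standard_name") == "sea_water_temperature":
            values.extend(to_celsius(v, attrs["units"]) for v in var["data"])
    return values


# Two hypothetical providers name and scale the same quantity differently.
merged = {
    "WTMP": {"attrs": {"standard_name": "sea_water_temperature", "units": "degC"},
             "data": [28.5]},
    "temp_f": {"attrs": {"standard_name": "sea_water_temperature", "units": "degF"},
               "data": [86.0]},
    "sal": {"attrs": {"standard_name": "sea_water_practical_salinity", "units": "1"},
            "data": [35.0]},
}
print(sea_water_temperature(merged))  # [28.5, 30.0]
```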

Machine learning and its application in the Gulf of Mexico
Machine learning (ML) learns models from data to support data-driven decisions, and ML approaches can be applied to high-dimensional, complex, non-linear, big-data problems. ML can tackle problems that are infeasible or too difficult for traditional methods, which require substantial personnel, resources, time, and effort to achieve the required precision. ML can deliver effective, robust, and accurate solutions, and it is efficient because it can process massive volumes of data quickly. Many ML techniques also perform well on noisy data. ML has been adopted by many scientific fields and has proven particularly effective in fields that generate large volumes of data. Because ocean science data include real-time, near-real-time, and historical records, ML techniques have been effective in addressing ocean-related problems. In this section, we discuss how ML is being used to address some of the problems faced by the GOM and other ocean waters.
Wind and wave conditions are critical for many marine industries, so accurate forecasts and predictions of wind and wave parameters are valuable. With knowledge of wind and wave characteristics, shipping routes can be optimized to avoid rough seas, aquaculture harvesting can be improved, and testing and performance evaluation of unmanned marine vehicles can be planned for navy and military operations. Although physics-based models are used in wave prediction (Hsu and Holland, 2007; Kalourazi et al., 2021), physics-based wind and wave modelling is computationally expensive. Machine learning approaches, which are faster and can be more accurate, are therefore increasingly preferred. James et al. (2018) used the outputs of thousands of wave model runs as a training dataset for a surrogate model, a data-driven technique that empirically approximates the response of a physics-based model. Peres et al. (2015) fed reanalysis wind data to an artificial neural network to extend observed time series of significant wave height. Liu et al. (2019) used data recorded by buoys carrying inertial sensors to train a convolutional neural network that predicts wave height and period. Mafi and Amirinia (2017) used support vector machines, random forests, and artificial neural networks to forecast hurricane wave heights over the GOM from data collected by six buoys at different locations. While some data-driven approaches forecast wind and wave characteristics from real-time and historical buoy observations, Ellenson et al. (2020) present a hybrid approach that improves predictions by using a machine learning method to correct a physics-based model.
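The hybrid idea of using observations to learn a correction to physics-based model output can be sketched in a few lines. This is a deliberately simple stand-in: it fits an ordinary least-squares line that maps modelled significant wave height to buoy-observed wave height. Ellenson et al. (2020) used a more elaborate machine learning method. The numbers below are synthetic.

```python
def fit_linear(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def correct(model_hs, a, b):
    """Apply the learned correction to new physics-model wave heights (m)."""
    return [a + b * h for h in model_hs]


# Synthetic training pairs: the "physics model" underestimates buoy wave height.
model_hs = [1.0, 2.0, 3.0, 4.0]
buoy_hs = [1.25, 2.5, 3.75, 5.0]
a, b = fit_linear(model_hs, buoy_hs)
print(a, b)                  # 0.0 1.25
print(correct([2.4], a, b))  # corrected value for a new model output
```

In practice the correction would be trained and validated on separate time periods, and richer ML models can take additional predictors such as wind speed or wave direction.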
Although satellite remote sensing is an important tool for the spatial and temporal estimation of some ocean parameters, satellite mapping in many models is developed for a specific oceanic region, often one dominated by a single major oceanic process. Numerical or data-driven models developed to estimate ocean parameters for one region may therefore apply poorly to others. Sea surface temperature, pCO2, and pH are critical parameters for understanding climate change and global warming. Yet estimating these variables, mostly from satellite remote sensing, is difficult because of their complex relationships with other environmental variables. Most research that estimates pCO2, whether from remote sensing images or observational datasets, uses statistical techniques such as multiple linear regression (MLR) (Lefevre et al., 2002; Olsen et al., 2004; Jamet et al., 2007), multiple polynomial regression (MPR) (Stephens et al., 1995; Ono et al., 2004; Gloege et al., 2022), and principal component regression (PCR) (Lohrenz and Cai, 2006; Lohrenz et al., 2010). Machine learning algorithms such as self-organizing maps (SOMs) and feedforward neural networks (Moussa et al., 2016) have performed well in estimating pCO2. Statistical approaches, on the other hand, still do not explain the mechanisms that make coastal regions act as carbon sinks or sources (Dai et al., 2013), and their estimates cannot readily be interpreted in terms of physical, chemical, or biological processes. Fu et al. (2020) combine Cubist and semi-mechanistic methods to predict ocean surface pCO2 from sea surface temperature, sea surface salinity, chlorophyll, and the diffuse attenuation coefficient. Using the GOM as the study area, they achieved satisfactory performance, providing a solid foundation for extending the approach to other areas with similar environmental and geographic conditions.
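Multiple linear regression, the most common of the statistical baselines listed above, fits pCO2 as a weighted sum of predictors plus an intercept. The dependency-free sketch below assumes sea surface temperature, salinity, and chlorophyll-a as predictors and solves the normal equations directly. The data are synthetic and are generated from invented coefficients so the fit can be checked. They are not a published pCO2 relationship.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_mlr(X, y):
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(features) for features in X]  # prepend intercept column
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)


# Synthetic predictors: (SST degC, SSS, Chl-a mg/m^3); coefficients are invented.
X = [(25, 35.0, 0.5), (27, 36.0, 0.2), (29, 34.0, 1.0),
     (30, 36.5, 0.3), (26, 33.0, 2.0), (28, 35.5, 0.8)]
true_coef = [150.0, 9.0, -2.0, -15.0]
y = [true_coef[0] + sum(c * v for c, v in zip(true_coef[1:], row)) for row in X]
coef = fit_mlr(X, y)
print([round(c, 3) for c in coef])  # approximately [150.0, 9.0, -2.0, -15.0]
```

For real data, a library least-squares solver such as NumPy's `numpy.linalg.lstsq` is preferable, because forming the normal equations squares the condition number of the problem.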
The ML approaches most commonly used to study water quality and key parameters such as pH, temperature, chlorophyll-a (Chl-a), salinity, and dissolved oxygen (DO) are Artificial Neural Networks (ANN), Support Vector Machines (SVM), Random Forests (RF), Decision Trees (DT), Multilayer Perceptrons (MLP), Cubist, and Gaussian Processes (GP) (Hassan and Woo, 2021). These approaches are also used to monitor water quality at regional and global scales. Water temperature, pH, salinity, and DO are common water quality indicators. A variety of others, such as turbidity, Chromophoric Dissolved Organic Matter (CDOM), Chl-a, suspended solids concentrations, and harmful algae, are also used to assess the water quality of a water body. Satellite remote sensing data are a common source of training data for most ML models that predict, forecast, and monitor water quality (Kim et al., 2014a; Chang et al., 2017; Shehhi and Kaya, 2020; Gómez et al., 2021; Zhu et al., 2022). However, real-time, near-real-time, and historical sensor measurements are also used to predict water quality and its contributing parameters (Lu and Huang, 2009; Xiang and Jiang, 2009; Solanki et al., 2015; Khan and See, 2016; Lorenzo et al., 2019; Manimegalai et al., 2020).
Data-driven models (Minsker et al., 2006; Coopersmith et al., 2011; Li et al., 2020; Yu et al., 2020) have also been developed to understand, predict, and forecast hypoxic and anoxic waters in coastal regions using dissolved oxygen data together with other environmental variables. The GOM is prone to coastal hypoxia, which affects marine industries, aquaculture, and tourism. Freshwater discharges are linked to hypoxic conditions and thus affect coastal water quality, which is a major concern (Alizadeh et al., 2018; Dzwonkowski et al., 2018). The GOM is also prone to harmful algal blooms in coastal waters, which result in fish kills and marine wildlife deaths (Blondeau-Patissier et al., 2014; Le et al., 2019; Yñiguez and Ottong, 2020). Several machine learning approaches that predict chlorophyll and dissolved oxygen content have been developed to investigate algal blooms (Kumar and Bhandarkar, 2017; Hill et al., 2020; Chen et al., 2020b; Yerrapothu, 2021; Yu et al., 2021). Computer vision techniques have also been studied for detecting and classifying algal blooms (Samantaray et al., 2018; Pant et al., 2020).
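Decision trees, and the random forests built from them, make predictions by recursively splitting the data on feature thresholds. The sketch below fits only the simplest case, a single split on one predictor, to show the mechanics. The temperature and Chl-a values are synthetic, and the step-shaped relationship between them is assumed purely for illustration. A random forest would instead grow many deeper trees on bootstrap samples and average their predictions.

```python
def fit_stump(x, y):
    """Fit a depth-one regression tree (a "stump") on a single feature.

    Every midpoint between consecutive distinct sorted x values is tried as a
    threshold; the split minimising the summed squared error of the two leaf
    means is kept. Returns (threshold, left_mean, right_mean).
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical x values cannot be separated
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((v - left_mean) ** 2 for v in left)
               + sum((v - right_mean) ** 2 for v in right))
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1:]


def predict_stump(stump, value):
    """Predict with a fitted stump: left leaf mean if value <= threshold."""
    threshold, left_mean, right_mean = stump
    return left_mean if value <= threshold else right_mean


# Synthetic example: Chl-a (mg/m^3) as a step function of water temperature (degC).
temperature = [18.0, 19.0, 20.0, 26.0, 27.0, 28.0]
chl_a = [1.0, 1.0, 1.0, 4.0, 4.0, 4.0]
stump = fit_stump(temperature, chl_a)
print(stump)                       # (23.0, 1.0, 4.0)
print(predict_stump(stump, 25.0))  # 4.0
```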
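Many of these models are trained on hypoxia labels derived from dissolved oxygen measurements. The sketch below applies the commonly used threshold of 2 mg/L (the definition used, for example, in monitoring of the northern GOM hypoxic zone) to synthetic bottom-water readings. The threshold is kept as a parameter because some studies use other cut-offs. The labelling step is a generic illustration, not the pipeline of any cited study.

```python
HYPOXIA_MG_L = 2.0  # commonly used hypoxia threshold, mg/L of dissolved oxygen


def hypoxia_labels(do_mg_l, threshold=HYPOXIA_MG_L):
    """Label DO samples as hypoxic (1), not hypoxic (0) or missing (None)."""
    return [None if v is None else int(v < threshold) for v in do_mg_l]


def hypoxic_fraction(labels):
    """Share of non-missing samples that are hypoxic."""
    valid = [lab for lab in labels if lab is not None]
    return sum(valid) / len(valid) if valid else float("nan")


bottom_do = [5.1, 1.4, None, 0.9, 3.2]  # synthetic bottom-water readings, mg/L
labels = hypoxia_labels(bottom_do)
print(labels, hypoxic_fraction(labels))  # [0, 1, None, 1, 0] 0.5
```

Binary labels like these can serve as targets for a classifier, while the continuous DO values remain available for regression models.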
With the extensive observational data available for the GOM, machine learning has been used to predict the severity of weather events (Ramachandra, 2019). Data-driven machine learning approaches have been used to predict Loop Current evolution and Loop Current ring formation in the Gulf of Mexico. These include forecasts of the sea surface height of the Loop Current System (Zeng et al., 2015; Wang et al., 2021) and of the velocity structures of the Loop Current and its eddies (Huang et al., 2021; Muhamed Ali et al., 2021; Huang et al., 2022b). All ocean basins are experiencing sea-level rise and warming due to climate change and global warming, and machine learning approaches are also used to predict and understand sea-level rise (Roshni et al., 2019; Morovati et al., 2021; Nieves et al., 2021; Tur et al., 2021). Oil spills can be detected by manually analyzing satellite images from remote sensing, but machine learning models can also help automate the detection and tracking of oil spills (Estes and Senger, 1971; Kubat et al., 1998; Shamsudeen, 2020). While remote sensing data are used to detect oil spills over large areas, oil spill detection in confined areas such as ports has also been carried out with aerial vehicles, thermal infrared images, and a trained convolutional neural network (De Kerf et al., 2020). To support faster emergency oil spill response, Huang et al. (2022a) developed a novel faster region-based convolutional neural network model that uses satellite synthetic aperture radar imagery.
Flood warning systems and forecasting techniques play an important role in mitigating hazards in flood-prone areas, where floods have severe economic impacts. While physics-based models have long been used to predict storms, rainfall, and other hydrologic events (Costabile et al., 2013; Fernández-Pato et al., 2016), data-driven machine learning models show promising results and can produce flood forecasts faster than physics-based models (Xu and Li, 2002; Mekanik et al., 2013; Kim et al., 2016; Mosavi et al., 2017).
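The SAR-based oil spill detection described above usually begins by finding dark, low-backscatter patches, because oil films damp short surface waves. These candidates are then classified as oil or look-alikes. The sketch below illustrates only that first step, using a single global threshold. The factor `k = 1.5` and the toy scene are assumptions. The CNN-based methods cited above, such as that of Huang et al. (2022a), learn detection from labelled imagery instead of relying on a fixed threshold.

```python
from statistics import mean, pstdev


def dark_spot_mask(backscatter_db, k=1.5):
    """Mark candidate oil-slick pixels in a SAR backscatter image (values in dB).

    Pixels more than k standard deviations below the scene mean are flagged.
    This is only a candidate step: look-alikes such as low-wind areas are also
    dark and must be screened out afterwards, e.g. by a trained classifier.
    """
    pixels = [v for row in backscatter_db for v in row]
    cutoff = mean(pixels) - k * pstdev(pixels)
    return [[v < cutoff for v in row] for row in backscatter_db]


# Toy 3x3 scene: uniform sea at -8 dB with one dark pixel at -20 dB.
scene = [[-8.0, -8.0, -8.0],
         [-8.0, -20.0, -8.0],
         [-8.0, -8.0, -8.0]]
for row in dark_spot_mask(scene):
    print(row)
```

A global threshold is sensitive to wind speed and incidence angle across a scene, which is one reason operational methods use local (windowed) thresholds or learned detectors.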
Within machine learning, deep learning (DL) models focused on ocean data analysis have also been developed in recent years (Lou et al., 2021). Many of these models address object detection and the sound, image, or video processing of marine and ocean objects. DL models have been used to identify fish species in underwater drone footage (Meng et al., 2018) and to classify coral reefs in image processing applications (Mary and Dharma, 2017). With advances in underwater drones, ocean data can also take the form of videos, and DL models have been applied to detect objects in such videos when the quality is too low for manual detection (Sun et al., 2018). Computer vision and other DL models can detect numerous types of ocean objects, such as seagrass meadows, in a wide range of ocean datasets (Moniruzzaman et al., 2017). Specific applications include deep-water sound processing to identify sources of sound pollution or detect specific marine species (Mishachandar and Vairamuthu, 2021), and the detection and classification of fish call types in the northern Gulf of Mexico (Waddell et al., 2021). Satellite ocean data, such as those discussed above, are also amenable to DL analysis (Ducournau and Fablet, 2016), including for ocean data forecasting (Choi et al., 2022). DL approaches also show significant promise for climate change modeling of ocean data, such as wave energy forecasting (Bento et al., 2021) and the analysis of sea surface temperature patterns to identify ocean extremes (Prochaska et al., 2021). They are also effective tools for ocean data quality control, for example when applied to ocean temperature data with potential gross errors (Mieruch et al., 2021). To overcome limited sampling rates and low-resolution ocean data, Bolton and Zanna (2019) use DL approaches to predict unresolved turbulent processes and subsurface flow fields.
Figure 3 summarizes this section, showing the major areas of ML application in the GOM and the proportion of the literature in each category of interest.

Discussion and conclusions
The reviewed literature identifies several approaches that can be applied to GOM problems, and Figures 1 and 3 depict the breadth of studies on a wide variety of relevant topics. The figures also highlight areas that need further research, both within the ML-based literature and in the broader climate change literature. Algal blooms, HABs/fish kills, and sea-level rise (major problems in the GOM) have received less attention than other climate change problems. Within the ML-based literature, the effects on estuaries and coastal systems and oyster mortality (both major problems for the GOM) have been understudied, and we identify these as important areas for improvement. Our overview, combined with the abundance of data and constantly developing methods, should serve as a template for identifying the areas of research most in need of further work. We have attempted to provide that template, together with suggestions for possible data sources and methods that can be used to analyze the available data. As we have pointed out, the GOM is of major economic, ecological, and public policy interest, so such future research holds tremendous promise for broad impact. We anticipate that this paper will help data scientists looking to enter ocean science find research problems, and help ocean science researchers find ways in which ML can be applied to their specific research problems.

Author contributions
VS wrote the manuscript with the support of JM, SK, II and DNB. All authors contributed to the article and approved the submitted version.

Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.